Monitoring Riak

A brief talk covering the basics of monitoring a Riak cluster. Delivered at the Boston Riak Meetup, inspired by Monitorama. I'm sorry for all the Wire references (not really).

Tom Santero

March 27, 2013
Transcript

  1. $ bin/riak ping
     pong
     OK! √
     $ bin/riak ping
     Node not responding to pings
     Friday, March 29, 13
  2. $ bin/riak ping
     pong
     OK! √
     $ bin/riak ping
     Node not responding to pings
     OHNOES! X
  3. $ riak-admin test
     Attempting to restart script through sudo -H -u riak
     Successfully completed 1 read/write cycle to 'riak@devnull'
  4. Metric            Threshold
     CPU               75% * num_cores
     Memory            70% - buffers
     Disk Space        75%
     Disk IO           80% sustained
     Network           70% sustained
     File Descriptors  75% of ulimit
     Swap              > 0KB
  5. $ riak-admin status
     1-minute stats for 'riak@devnull'
     -------------------------------------------
     vnode_gets : 600
     vnode_gets_total : 714
     vnode_puts : 600
     vnode_puts_total : 714
     vnode_index_reads : 0
     vnode_index_reads_total : 0
     vnode_index_writes : 0
     vnode_index_writes_total : 0
     vnode_index_writes_postings : 0
     vnode_index_writes_postings_total : 0
     vnode_index_deletes : 0
     vnode_index_deletes_total : 0
     vnode_index_deletes_postings : 0
     vnode_index_deletes_postings_total : 0
     node_gets : 585
     node_gets_total : 694
     node_get_fsm_siblings_mean : 0
     node_get_fsm_siblings_median : 0
     node_get_fsm_siblings_95 : 0
     node_get_fsm_time_99 : 743
     node_get_fsm_time_100
     <-- snip -->
  6. ** Reason for termination ==
     ** {error,system_limit,[{erlang,open_port,[{spawn,"zlib_drv"},[binary]],[]},
         {zlib,open,0,[]},
         {zlib,zip,1,[]},
         {riak_kv_pb_object,process,2,[{file,"src/riak_kv_pb_object.erl"},{line,218}]},
         {riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,203}]},
         {riak_api_pb_server,handle_info,2,[{file,"src/riak_api_pb_server.erl"},{line,123}]},
         {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,607}]},
         {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
     2013-03-26 17:24:17 =CRASH REPORT====
       crasher:
         initial call: riak_api_pb_server:init/1
         pid: <0.15785.5260>
  7. Riaknostic
     $ riak-admin diag --level debug
     18:34:19.708 [debug] Lager installed handler lager_console_backend into lager_event
     18:34:19.720 [debug] Lager installed handler error_logger_lager_h into error_logger
     18:34:19.720 [info] Application lager started on node nonode@nohost
     18:34:20.736 [debug] Not connected to the local Riak node, trying to connect. alive:false connect_failed:undefined
     18:34:20.737 [debug] Starting distributed Erlang.
     18:34:20.740 [debug] Supervisor net_sup started erl_epmd:start_link() at pid <0.42.0>
     18:34:20.742 [debug] Supervisor net_sup started auth:start_link() at pid <0.43.0>
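
The thresholds listed on slide 4 can be turned into a simple check. The following is a minimal sketch, not something shown in the talk: the `Sample` class, its field names, and the `breached` function are all hypothetical, CPU is assumed to be measured as a 1-minute load average compared against 75% of the core count, and memory is assumed to be reported with buffers already subtracted. It evaluates a single reading, so the "sustained" qualifier for disk IO and network is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sample:
    """One reading of host metrics; every field name here is hypothetical."""
    load_avg: float        # assumed CPU measure: 1-minute load average
    num_cores: int
    mem_used_pct: float    # assumed to exclude buffers already
    disk_space_pct: float
    disk_io_pct: float
    network_pct: float
    open_fds: int
    fd_ulimit: int
    swap_used_kb: int


def breached(s: Sample) -> List[str]:
    """Return the names of the metrics that exceed the slide 4 thresholds."""
    checks = {
        "CPU": s.load_avg > 0.75 * s.num_cores,
        "Memory": s.mem_used_pct > 70,
        "Disk Space": s.disk_space_pct > 75,
        "Disk IO": s.disk_io_pct > 80,
        "Network": s.network_pct > 70,
        "File Descriptors": s.open_fds > 0.75 * s.fd_ulimit,
        "Swap": s.swap_used_kb > 0,
    }
    # Dicts preserve insertion order, so results follow the slide's ordering.
    return [name for name, over in checks.items() if over]
```

To approximate "sustained", one option would be to alert only when the same metric is breached across several consecutive samples.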
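
The `riak-admin status` output on slide 5 is a list of `name : value` lines, which makes it easy to feed into a metrics system. The sketch below is an illustration under assumptions rather than part of the talk: `parse_status` is a hypothetical helper, and it assumes every stat sits on its own line in the `name : value` form shown on the slide. Lines without a value (the header, the dashed rule, the snipped entry) are skipped, and values that are not integers are kept as strings.

```python
from typing import Dict, Union


def parse_status(text: str) -> Dict[str, Union[int, str]]:
    """Parse 'name : value' lines into a dict, skipping anything else."""
    stats: Dict[str, Union[int, str]] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not name or not value:
            continue  # header, separator rule, or a stat with no value
        try:
            stats[name] = int(value)
        except ValueError:
            stats[name] = value  # keep non-integer values verbatim
    return stats
```

In practice the text would come from running `riak-admin status` (for example through `subprocess.run` with `capture_output=True`), and the resulting dict could be polled periodically and diffed to derive rates from the `_total` counters.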