Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Debugging Ruby Performance

Aman Karmani
September 21, 2011

Debugging Ruby Performance

Ruby might be slow, but bad code only makes it worse. This talk will teach you how to use powerful tools to see how your code is executed, so you can understand, debug and optimize it.

Aman Karmani

September 21, 2011
Tweet

More Decks by Aman Karmani

Other Decks in Programming

Transcript

  1. DEBUGGING RUBY
    PERFORMANCE
    Aman Gupta
    @tmm1
    speakerdeck.com/u/tmm1/p/debugging-ruby-performance

    View Slide

  2. AS RUBYISTS, WE'VE SEEN...

    View Slide

  3. ejpphoto (flickr)
    nasty bugs

    View Slide

  4. code
    fatboyke (flickr)

    View Slide

  5. 37prime (flickr)
    memory bloat

    View Slide

  6. THIS TALK IS ABOUT...
    TOOLS TO FIX THESE ISSUES.

    View Slide

  7. TOOLS FOR LINUX.
    pleeker (flickr)
    lsof
    strace
    ltrace

    View Slide

  8. TOOLS FOR C CODE.
    booddin (flickr)
    perftools
    gdb

    View Slide

  9. pascal.charest (flickr)
    TOOLS FOR NETWORKS.
    tcpdump
    ngrep

    View Slide

  10. marksze (flickr)
    TOOLS FOR CPU USAGE.
    perftools
    perftools.rb

    View Slide

  11. kgrocki (flickr)
    TOOLS FOR MEMORY USAGE.
    bleak_house
    gdb.rb
    memprof

    View Slide

  12. mayu (flickr) delgrossodotcom (flickr)
    IGNORE THE FINE PRINT

    View Slide

  13. list open files
    LSOF
    lsof -nPp

    View Slide

  14. lsof -nPp
    -n
    Inhibits the conversion of network numbers to host names.
    -P
    Inhibits the conversion of port numbers to names for network files
    FD TYPE NAME
    cwd DIR /var/www/myapp
    txt REG /usr/bin/ruby
    mem REG /json-1.1.9/ext/json/ext/generator.so
    mem REG /json-1.1.9/ext/json/ext/parser.so
    mem REG /memcached-0.17.4/lib/rlibmemcached.so
    mem REG /mysql-2.8.1/lib/mysql_api.so
    0u CHR /dev/null
    1w REG /usr/local/nginx/logs/error.log
    2w REG /usr/local/nginx/logs/error.log
    3u IPv4 10.8.85.66:33326->10.8.85.68:3306 (ESTABLISHED)
    10u IPv4 10.8.85.66:33327->10.8.85.68:3306 (ESTABLISHED)
    11u IPv4 127.0.0.1:58273->127.0.0.1:11211 (ESTABLISHED)
    12u REG /tmp/RackMultipart.28957.0
    33u IPv4 174.36.83.42:37466->69.63.180.21:80 (ESTABLISHED)
    json
    memcached
    mysql
    http

    View Slide

  15. dump traffic on a network
    TCPDUMP
    tcpdump -i eth0 -s 0 -nqA
    tcp dst port 3306

    View Slide

  16. tcpdump -i -s -nqA
    -i
    Network interface.
    -s
    Snarf len bytes of data from each packet.
    -n
    Don't convert addresses (host addresses, port numbers) to names.
    -q
    Quiet output. Print less protocol information.
    -A
    Print each packet (minus its link level header) in ASCII.
    -w
    Write the raw packets to file rather than printing them out.

    libpcap expression, for example:
    tcp src port 80
    tcp dst port 3306
    tcpdump -i -w

    View Slide

  17. tcp dst port 80
    19:52:20.216294 IP 24.203.197.27.40105 >
    174.37.48.236.80: tcp 438
    E...*[email protected].%&.....%0....POx..%s.oP.......
    GET /poll_images/cld99erh0/logo.png HTTP/1.1
    Accept: */*
    Referer: http://apps.facebook.com/realpolls/?
    _fb_q=1

    View Slide

  18. tcp dst port 3306
    19:51:06.501632 IP 10.8.85.66.50443 >
    10.8.85.68.3306: tcp 98
    E..."K@[email protected]
    .UB
    .UD.....z....L............
    GZ.y3b..[......W....
    SELECT * FROM `votes` WHERE (`poll_id` =
    72621) LIMIT 1

    View Slide

  19. tcpdump -w

    View Slide

  20. trace system calls and signals
    STRACE
    strace -cp
    strace -ttTp -o

    View Slide

  21. strace -cp
    -c
    Count time, calls, and errors for each system call and report a
    summary on program exit.
    -p pid
    Attach to the process with the process ID pid and begin tracing.
    % time seconds usecs/call calls errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    50.39 0.000064 0 1197 592 read
    34.65 0.000044 0 609 writev
    14.96 0.000019 0 1226 epoll_ctl
    0.00 0.000000 0 4 close
    0.00 0.000000 0 1 select
    0.00 0.000000 0 4 socket
    0.00 0.000000 0 4 4 connect
    0.00 0.000000 0 1057 epoll_wait
    ------ ----------- ----------- --------- --------- ----------------
    100.00 0.000127 4134 596 total

    View Slide

  22. strace -ttTp -o
    -t
    Prefix each line of the trace with the time of day.
    -tt
    If given twice, the time printed will include the microseconds.
    -T
    Show the time spent in system calls.
    -o filename
    Write the trace output to the file filename rather than to stderr.
    epoll_wait(9, {{EPOLLIN, {u32=68841296, u64=68841296}}}, 4096, 50) = 1 <0.033109>
    accept(10, {sin_port=38313, sin_addr="127.0.0.1"}, [1226]) = 22 <0.000014>
    fcntl(22, F_GETFL) = 0x2 (flags O_RDWR) <0.000007>
    fcntl(22, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000008>
    setsockopt(22, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000008>
    accept(10, 0x7fff5d9c07d0, [1226]) = -1 EAGAIN <0.000014>
    epoll_ctl(9, EPOLL_CTL_ADD, 22, {EPOLLIN, {u32=108750368, u64=108750368}}) = 0 <0.000009>
    epoll_wait(9, {{EPOLLIN, {u32=108750368, u64=108750368}}}, 4096, 50) = 1 <0.000007>
    read(22, "GET / HTTP/1.1\r"..., 16384) = 772 <0.000012>
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000007>
    poll([{fd=5, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) <0.000008>
    write(5, "1\0\0\0\0\0\0-\0\0\0\3SELECT * FROM `table`"..., 56) = 56 <0.000023>
    read(5, "\25\1\0\1,\2\0x\234m"..., 16384) = 284 <1.300897>

    View Slide

  23. read(22, "GET / HTTP/1.1\r"..., 16384) = 772 <0.0012>
    http client connection
    read 772 bytes
    incoming http request
    took 0.0012s

    View Slide

  24. write(5, "SELECT * FROM `table`"..., 56) = 56 <0.0023>
    read(5, "\25\1\0\1,\2\0x\234m"..., 16384) = 284 <1.30>
    mysql connection
    write sql query to db
    read query
    response
    slow query

    View Slide

  25. stracing ruby: SIGVTALRM
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    rt_sigreturn(0x1a) = 2207807 <0.000009>
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    rt_sigreturn(0x1a) = 0 <0.000009>
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    rt_sigreturn(0x1a) = 140734552062624 <0.000009>
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    rt_sigreturn(0x1a) = 140734552066688 <0.000009>
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    rt_sigreturn(0x1a) = 11333952 <0.000008>
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    rt_sigreturn(0x1a) = 0 <0.000009>
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    rt_sigreturn(0x1a) = 1 <0.000010>
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    • ruby 1.8 uses signals to schedule its green threads
    • process receives a SIGVTALRM signal every 10ms

    View Slide

  26. stracing ruby: sigprocmask
    • debian/redhat compile ruby with --enable-pthread
    • uses a native thread timer for SIGVTALRM
    • causes excessive calls to sigprocmask: 30% slowdown!
    % time seconds usecs/call calls errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00 0.326334 0 3568567 rt_sigprocmask
    0.00 0.000000 0 9 read
    0.00 0.000000 0 10 open
    0.00 0.000000 0 10 close
    0.00 0.000000 0 9 fstat
    0.00 0.000000 0 25 mmap
    ------ ----------- ----------- --------- --------- ----------------
    100.00 0.326334 3568685 0 total

    View Slide

  27. % time seconds calls syscall
    ------ ----------- ------ -------
    95.94 76.685658 1086 clone
    95s in the github test suite in clone()
    `git cmd` calls fork() which calls clone()
    posix-spawn gem sped up tests by 2x!
    stracing ruby: tests

    View Slide

  28. trace library calls
    LTRACE
    ltrace -cp
    ltrace -ttTp -o

    View Slide

  29. % time seconds usecs/call calls function
    ------ ----------- ----------- --------- --------------------
    48.65 11.741295 617 19009 memcpy
    30.16 7.279634 831 8751 longjmp
    9.78 2.359889 135 17357 _setjmp
    8.91 2.150565 285 7540 malloc
    1.10 0.265946 20 13021 memset
    0.81 0.195272 19 10105 __ctype_b_loc
    0.35 0.084575 19 4361 strcmp
    0.19 0.046163 19 2377 strlen
    0.03 0.006272 23 265 realloc
    ------ ----------- ----------- --------- --------------------
    100.00 24.134999 82999 total
    ltrace -c ruby threaded_em.rb
    01:24:48.769408 --- SIGVTALRM (Virtual timer expired) ---
    01:24:48.769616 memcpy(0x1216000, "", 1086328) = 0x1216000 <0.000578>
    01:24:48.770555 memcpy(0x6e32670, "\240&\343v", 1086328) = 0x6e32670 <0.000418>
    01:24:49.899414 --- SIGVTALRM (Virtual timer expired) ---
    01:24:49.899490 memcpy(0x1320000, "", 1082584) = 0x1320000 <0.000628>
    01:24:49.900474 memcpy(0x6e32670, "", 1086328) = 0x6e32670 <0.000479>
    ltrace -ttT -e memcpy ruby threaded_em.rb

    View Slide

  30. trace ruby method calls
    RBTRACE
    require 'rbtrace'
    rbtrace -p -m
    github.com/tmm1/rbtrace

    View Slide

  31. Unicorn::App#call(env['PATH_INFO']="/cakephp/cakephp/downloads")
    Smoke::Git#cat_ref(args=["2.0.0-dev"]) <0.046590>
    Smoke::Git#cat_ref(args=["1.3-dev"]) <0.053170>
    Smoke::Git#cat_ref(args=["1.3.7"]) <0.046643>
    Smoke::Git#cat_ref(args=["1.3.6"]) <0.045075>
    Smoke::Git#cat_ref(args=["1.3.5"]) <0.043925>
    Smoke::Git#cat_ref(args=["1.3.4"]) <0.054496>
    Smoke::Git#cat_ref(args=["1.3.3"]) <0.045860>
    rbtrace -p -m
    "Unicorn::App#call(env['PATH_INFO'])"
    "Smoke::Git#(args)"
    downloads page makes N+1 git calls to
    resolve tags

    View Slide

  32. $ rbtrace -p 20052 --slow=50
    *** attached to process 20052
    Array#include? <0.075737>
    Array#include? <0.077302>
    Array#include? <0.076444>
    Array#include? <0.077122>
    Array#include? <0.074956>
    $ rbtrace -p 20052 --slow=50 -m 'include?(__source__)'
    *** attached to process 20052
    Array#include?(__source__="lib/jekyll/site.rb:179") <0.075737>
    Array#include?(__source__="lib/jekyll/site.rb:179") <0.077302>
    Array#include?(__source__="lib/jekyll/site.rb:179") <0.076444>
    slow jekyll processing fixed by switching to
    Set#include?

    View Slide

  33. Albino#colorize(@target.size=180, @options[:l]="plain") <0.151861>
    Albino#colorize(@target.size=1009, @options[:l]="ruby") <0.233931>
    Albino#colorize(@target.size=9024, @options[:l]="js") <0.278415>
    Albino#colorize(@target.size=243, @options[:l]="plain") <0.155549>
    Albino#colorize(@target.size=596, @options[:l]="diff") <0.169390>
    Albino#colorize(@target.size=2994, @options[:l]="ruby") <0.305385>
    Albino#colorize(@target.size=2372, @options[:l]="js") <0.227131>
    Albino#colorize(@target.size=42444, @options[:l]="php") <0.448636>
    Albino#colorize(@target.size=1795, @options[:l]="ruby") <0.259931>
    Albino#colorize(@target.size=3088, @options[:l]="plain") <0.480678>
    ~500ms to shell out to python pygmentize
    use pygments.rb for 15x faster syntax highlighting!
    (uses FFI to embed python inside ruby)

    View Slide

  34. rbtrace -c
    MysqlAdapter#execute(sql="SELECT * FROM `users` WHERE (`users`.`id` =
    105775) ") <0.000472>
    MysqlAdapter#execute(sql="SELECT * FROM `gists` WHERE (`gists`.`public` =
    1 AND `gists`.`id` = '871708' ) LIMIT 1") <0.002153>
    MysqlAdapter#execute(sql="SELECT count(*) AS count_all FROM
    `gist_comments` WHERE (`gist_comments`.gist_id = 871708) ") <0.000866>
    Redis::Client#process(commands=[[:zscore, "gwiki:597840:updated",
    ["home"]]]) <0.000439>
    Redis::Client#process(commands=[[:get, "gh.storage.network.1745996"]])
    <0.000923>
    Redis::Client#process(commands=[[:sismember, "gh:beta:users", "14850"]])
    <0.000520>
    Redis::Client#process(commands=[[:rpush, "resque:queue:high", "{...}"]])
    <0.000428>
    built-in tracers for
    activerecord, redis, mongo, eventmachine, unicorn,
    testunit

    View Slide

  35. google's performance tools
    PERFTOOLS
    CPUPROFILE=/tmp/myprof ./myapp
    pprof ./myapp /tmp/myprof

    View Slide

  36. wget http://google-perftools.googlecode.com/files/google-
    perftools-1.6.tar.gz
    tar zxvf google-perftools-1.6.tar.gz
    cd google-perftools-1.6
    ./configure --prefix=/opt
    make
    sudo make install
    # for linux
    export LD_PRELOAD=/opt/lib/libprofiler.so
    # for osx
    export DYLD_INSERT_LIBRARIES=/opt/lib/libprofiler.dylib
    CPUPROFILE=/tmp/ruby.prof ruby -e'
    5_000_000.times{ "hello world" }
    '
    pprof `which ruby` --text /tmp/ruby.prof
    download
    compile
    profile
    report
    setup

    View Slide

  37. Total: 103 samples
    95 92.2% rb_yield_0
    103 100.0% rb_eval
    12 11.7% gc_sweep
    52 50.5% rb_str_new3
    3 2.9% obj_free
    103 100.0% int_dotimes
    12 11.7% gc_mark
    pprof ruby
    ruby.prof --text
    pprof ruby
    ruby.prof --gif

    View Slide

  38. Profiling MRI
    • 10% of production
    VM time spent in
    rb_str_sub_bang
    • String#sub!
    • called from
    Time.parse
    return unless str.sub!(/\A(\d{1,2})/, '')
    return unless str.sub!(/\A( \d|\d{1,2})/, '')
    return unless str.sub!(/\A( \d|\d{1,2})/, '')
    return unless str.sub!(/\A(\d{1,3})/, '')
    return unless str.sub!(/\A(\d{1,2})/, '')
    return unless str.sub!(/\A(\d{1,2})/, '')

    View Slide

  39. Profiling EM + threads
    • known issue: EM+threads =
    slow
    • memcpy??
    • thread context switches copy
    the stack w/ memcpy
    • EM allocates huge buffer on
    the stack
    • solution: move buffer to the
    heap
    Total: 3763 samples
    2764 73.5% catch_timer
    989 26.3% memcpy
    3 0.1% st_lookup
    2 0.1% rb_thread_schedule
    1 0.0% rb_eval
    1 0.0% rb_newobj
    1 0.0% rb_gc_force_recycle

    View Slide

  40. perftools for ruby code
    PERFTOOLS.RB
    pprof.rb /tmp/myrbprof
    github.com/tmm1/perftools.rb

    View Slide

  41. gem install perftools.rb
    RUBYOPT="-r`gem which perftools | tail -1`"
    CPUPROFILE=/tmp/myrbprof
    ruby myapp.rb
    pprof.rb /tmp/myrbprof --text
    pprof.rb /tmp/myrbprof --gif > /tmp/myrbprof.gif

    View Slide

  42. • Sampling profiler:
    • 232 samples total
    • 83 samples were in /compute
    • 118 samples had /compute on
    the stack but were in
    another function
    • /compute accounts for 50%
    of process, but only 35% of
    time was in /compute itself
    require 'sinatra'
    get '/sleep' do
    sleep 0.25
    'done'
    end
    get '/compute' do
    proc{ |n|
    a,b=0,1
    n.times{ a,b = b,a+b }
    b
    }.call(10_000)
    'done'
    end
    $ ab -c 1 -n 50 http://127.0.0.1:4567/compute
    $ ab -c 1 -n 50 http://127.0.0.1:4567/sleep
    == Sinatra has ended his set (crowd applauds)
    PROFILE: interrupts/evictions/bytes = 232/0/2152
    Total: 232 samples
    83 35.8% 35.8% 118 50.9% Sinatra::Application#GET /compute
    56 24.1% 59.9% 56 24.1% garbage_collector
    35 15.1% 75.0% 113 48.7% Integer#times

    View Slide

  43. CPUPROFILE=app.prof
    CPUPROFILE_REALTIME=1
    CPUPROFILE=app-rt.prof

    View Slide

  44. redis-rb bottleneck

    View Slide

  45. why is rubygems slow?

    View Slide

  46. slow nanite tests

    View Slide

  47. faster
    bundle
    install
    • 23% spent in
    Gem::Version#<=>
    • simple patch to rubygems
    improved overall install
    performance by 15%
    • http://gist.github.com/
    458185

    View Slide

  48. CPUPROFILE_OBJECTS=1
    CPUPROFILE=app-objs.prof
    • object allocation profiler
    mode built-in
    • 1 sample = 1 object
    created
    • Time parsing is both
    CPU and object
    allocation intensive
    • using mysql2 moves
    this to C

    View Slide

  49. rack middleware for perftools.rb
    RACK-PERFTOOLS
    gem install rack-
    perftools_profiler
    github.com/bhb/rack-perftools_profiler

    View Slide

  50. require 'rack/perftools_profiler'
    config.middleware.insert(
    0,
    Rack::PerftoolsProfiler,
    :default_printer => 'gif'
    )
    $ curl http://localhost:3000/__start__
    $ curl http://localhost:3000/home
    $ curl http://localhost:3000/about
    $ curl http://localhost:3000/__stop__
    $ curl http://localhost:3000/__data__ -o profile.gif
    $ curl http://localhost:3000/__data__?printer=text -o profile.txt
    $ curl "http://localhost:3000/home?profile=true&times=10"
    -o 10_requests_to_homepage.gif

    View Slide

  51. the GNU debugger
    GDB
    gdb
    gdb attach

    View Slide

  52. Debugging Ruby Segfaults
    #include "ruby.h"
    VALUE
    segv()
    {
    VALUE array[1];
    array[1000000] = NULL;
    return Qnil;
    }
    void
    Init_segv()
    {
    rb_define_method(rb_cObject, "segv", segv, 0);
    }
    test_segv.rb:4: [BUG] Segmentation fault
    ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-darwin9.7.0]
    def test
    require 'segv'
    4.times do
    Dir.chdir '/tmp' do
    Hash.new{ segv }[0]
    end
    end
    end
    sleep 10
    test()

    View Slide

  53. 1. Attach to running process
    $ sudo gdb ruby 23611
    Attaching to program: ruby, process 23611
    0x00007fa5113c0c93 in nanosleep () from /lib/libc.so.6
    (gdb) c
    Continuing.
    Program received signal SIGBUS, Bus error.
    segv () at segv.c:7
    7 array[1000000] = NULL;
    $ ps aux | grep ruby
    joe 23611 0.0 0.1 25424 7540 S Dec01 0:00 ruby test_segv.rb
    2. Use a coredump
    $ sudo mkdir /cores
    $ sudo chmod 777 /cores
    $ sudo sysctl kernel.core_pattern=/cores/%e.core.%s.%p.%t
    Process.setrlimit Process::RLIMIT_CORE, 300*1024*1024
    $ sudo gdb ruby /cores/ruby.core.6.23611.1259781224

    View Slide

  54. def test
    require 'segv'
    4.times do
    Dir.chdir '/tmp' do
    Hash.new{ segv }[0]
    end
    end
    end
    test()
    (gdb) where
    #0 segv () at segv.c:7
    #1 0x000000000041f2be in call_cfunc () at eval.c:5727
    ...
    #13 0x000000000043ba8c in rb_hash_default () at hash.c:521
    ...
    #19 0x000000000043b92a in rb_hash_aref () at hash.c:429
    ...
    #26 0x00000000004bb7bc in chdir_yield () at dir.c:728
    #27 0x000000000041d8d7 in rb_ensure () at eval.c:5528
    #28 0x00000000004bb93a in dir_s_chdir () at dir.c:816
    ...
    #35 0x000000000041c444 in rb_yield () at eval.c:5142
    #36 0x0000000000450690 in int_dotimes () at numeric.c:2834
    ...
    #48 0x0000000000412a90 in ruby_run () at eval.c:1678
    #49 0x000000000041014e in main () at main.c:48

    View Slide

  55. gdb with MRI hooks
    GDB.RB
    gem install gdb.rb
    gdb.rb
    github.com/tmm1/gdb.rb

    View Slide

  56. (gdb) ruby eval 1+2
    3
    (gdb) ruby eval Thread.current
    #
    (gdb) ruby eval Thread.list.size
    8

    View Slide

  57. (gdb) ruby threads list
    0x15890 main thread THREAD_STOPPED WAIT_JOIN(0x19ef4)
    0x19ef4 thread THREAD_STOPPED WAIT_TIME(57.10s)
    0x19e34 thread THREAD_STOPPED WAIT_FD(5)
    0x19dc4 thread THREAD_STOPPED WAIT_NONE
    0x19dc8 thread THREAD_STOPPED WAIT_NONE
    0x19dcc thread THREAD_STOPPED WAIT_NONE
    0x22668 thread THREAD_STOPPED WAIT_NONE
    0x1d630 curr thread THREAD_RUNNABLE WAIT_NONE

    View Slide

  58. (gdb) ruby objects
    HEAPS 8
    SLOTS 1686252
    LIVE 893327 (52.98%)
    FREE 792925 (47.02%)
    scope 1641 (0.18%)
    regexp 2255 (0.25%)
    data 3539 (0.40%)
    class 3680 (0.41%)
    hash 6196 (0.69%)
    object 8785 (0.98%)
    array 13850 (1.55%)
    string 105350 (11.79%)
    node 742346 (83.10%)

    View Slide

  59. (gdb) ruby objects strings
    140 u'lib'
    158 u'0'
    294 u'\n'
    619 u''
    30503 unique strings
    3187435 bytes

    View Slide

  60. def test
    require 'segv'
    4.times do
    Dir.chdir '/tmp' do
    Hash.new{ segv }[0]
    end
    end
    end
    test()
    (gdb) ruby threads
    0xa3e000 main curr thread THREAD_RUNNABLE WAIT_NONE
    node_vcall segv in test_segv.rb:5
    node_call test in test_segv.rb:5
    node_call call in test_segv.rb:5
    node_call default in test_segv.rb:5
    node_call [] in test_segv.rb:5
    node_call test in test_segv.rb:4
    node_call chdir in test_segv.rb:4
    node_call test in test_segv.rb:3
    node_call times in test_segv.rb:3
    node_vcall test in test_segv.rb:9

    View Slide

  61. rails_warden leak
    (gdb) ruby objects classes
    1197 MIME::Type
    2657 NewRelic::MetricSpec
    2719 TZInfo::TimezoneTransitionInfo
    4124 Warden::Manager
    4124 MethodOverrideForAll
    4124 AccountMiddleware
    4124 Rack::Cookies
    4125 ActiveRecord::ConnectionAdapters::ConnectionManagement
    4125 ActionController::Session::CookieStore
    4125 ActionController::Failsafe
    4125 ActionController::ParamsParser
    4125 Rack::Lock
    4125 ActionController::Dispatcher
    4125 ActiveRecord::QueryCache
    4125 ActiveSupport::MessageVerifier
    4125 Rack::Head
    middleware chain leaking per request

    View Slide

  62. mongrel sleeper thread
    0x16814c00 thread THREAD_STOPPED WAIT_TIME(0.47) 1522 bytes
    node_fcall sleep in lib/mongrel/configurator.rb:285
    node_fcall run in lib/mongrel/configurator.rb:285
    node_fcall loop in lib/mongrel/configurator.rb:285
    node_call run in lib/mongrel/configurator.rb:285
    node_call initialize in lib/mongrel/configurator.rb:285
    node_call new in lib/mongrel/configurator.rb:285
    node_call run in bin/mongrel_rails:128
    node_call run in lib/mongrel/command.rb:212
    node_call run in bin/mongrel_rails:281
    node_fcall (unknown) in bin/mongrel_rails:19
    def run
    @listeners.each {|name,s|
    s.run
    }
    $mongrel_sleeper_thread = Thread.new { loop { sleep 1 } }
    end

    View Slide

  63. god memory leaks
    (gdb) ruby objects arrays
    elements instances
    94310 3
    94311 3
    94314 2
    94316 1
    5369 arrays
    2863364 member elements
    many arrays with
    90k+ elements!
    5 separate god leaks fixed by Eric
    Lindvall with the help of gdb.rb!
    43 God::Process
    43 God::Watch
    43 God::Driver
    43 God::DriverEventQueue
    43 God::Conditions::MemoryUsage
    43 God::Conditions::ProcessRunning
    43 God::Behaviors::CleanPidFile
    45 Process::Status
    86 God::Metric
    327 God::System::SlashProcPoller
    327 God::System::Process
    406 God::DriverEvent

    View Slide

  64. a heap visualizer for ruby
    MEMPROF
    gem install memprof
    open http://memprof.com
    github.com/ice799/memprof

    View Slide

  65. • memprof.track
    • memprof.dump
    • memprof.dump_all
    • memprof.com
    memprof features

    View Slide

  66. Memprof.track{
    100.times{ "abc" }
    100.times{ 1.23 + 1 }
    100.times{ Module.new }
    }
    100 file.rb:2:String
    100 file.rb:3:Float
    100 file.rb:4:Module
    • like bleak_house, but for a given block of code
    • use Memprof::Middleware in your webapps to run track
    per request

    View Slide

  67. • memprof.track
    • memprof.dump
    • memprof.dump_all
    • memprof.com
    memprof features

    View Slide

  68. {
    "_id": "0x19c610",
    "file": "file.rb",
    "line": 2,
    "type": "string",
    "class": "0x1ba7f0",
    "class_name": "String",
    "length": 10,
    "data": "helloworld"
    }
    memory address of object
    file and line where string
    was created
    length and contents
    of this string instance
    address of the class
    “String”
    strings Memprof.dump{
    "hello" + "world"
    }

    View Slide

  69. floats and strings are
    separate ruby objects
    {
    "_id": "0x19c5c0",
    "class": "0x1b0d18",
    "class_name": "Array",
    "length": 4,
    "data": [
    1,
    ":b",
    "0x19c750",
    "0x19c598"
    ]
    }
    integers and symbols are
    stored in the array itself
    arrays Memprof.dump{
    [
    1,
    :b,
    2.2,
    "d"
    ]
    }

    View Slide

  70. hashes
    {
    "_id": "0x19c598",
    "type": "hash",
    "class": "0x1af170",
    "class_name": "Hash",
    "default": null,
    "length": 2,
    "data": [
    [ ":a", 1 ],
    [ "0xc728", "0xc750" ]
    ]
    }
    hash entries as key/value
    pairs
    no default proc
    Memprof.dump{
    {
    :a => 1,
    "b" => 2.2
    }
    }

    View Slide

  71. classes Memprof.dump{
    class Hello
    @@var=1
    Const=2
    def world() end
    end
    }
    {
    "_id": "0x19c408",
    "type": "class",
    "name": "Hello",
    "super": "0x1bfa48",
    "super_name": "Object",
    "ivars": {
    "@@var": 1,
    "Const": 2
    },
    "methods": {
    "world": "0x19c318"
    }
    }
    class variables and constants
    are stored in the instance
    variable table
    superclass object reference
    references to method objects

    View Slide

  72. • memprof.track
    • memprof.dump
    • memprof.dump_all
    • memprof.com
    memprof features

    View Slide

  73. Memprof.dump_all("myapp_heap.json")
    • dump out every single live object as json
    • one per line to specified file
    • analyze via
    • jsawk/grep
    • mongodb/couchdb
    • custom ruby scripts
    • libyajl + Boost Graph Library

    View Slide

  74. • memprof.track
    • memprof.dump
    • memprof.dump_all
    • memprof.com
    memprof features

    View Slide

  75. a web-based heap visualizer and leak analyzer
    memprof.com

    View Slide

  76. plugging a leak in rails3
    • in dev mode, rails3 is leaking 10mb per request
    # in environment.rb
    require `gem which memprof/signal`.strip
    let’s use memprof to find it!

    View Slide

  77. plugging a leak
    in rails3
    tell memprof to dump
    out the entire heap to
    json
    $ memprof
    --pid
    --name
    --key
    send the app some
    requests so it leaks
    $ ab -c 1 -n 30
    http://localhost:3000/

    View Slide

  78. 2519 classes
    30 copies of
    TestController
    mongo query for all
    TestController classes
    details for one copy of
    TestController

    View Slide

  79. find references to object
    holding references
    to all controllers
    “leak” is on line 178

    View Slide

  80. • In development mode, Rails reloads all your
    application code on every request
    • ActionView::Partials::PartialRenderer is caching
    partials used by each controller as an optimization
    • But.. it ends up holding a reference to every single
    reloaded version of those controllers

    View Slide

  81. a performance tool for rails
    HOTSPOTS
    gem install hotspots
    hotspots /tmp/*.json report/
    github.com/tmm1/hotspots

    View Slide

  82. config.middleware.use(Memprof::Tracer)
    {
    "time": 4.3442,
    "rails": {
    "controller": "test",
    "action": "index"
    },
    "request": {
    "REQUEST_PATH": "/test",
    "REQUEST_METHOD": "GET"
    },
    total time for request
    rails controller/action
    request env info

    View Slide

  83. "mysql": {
    "queries": 3,
    "time": 0.00109302
    },
    "gc": {
    "calls": 8,
    "time": 2.04925
    },
    config.middleware.use(Memprof::Tracer)
    8 calls to GC
    2 secs spent in GC
    3 mysql queries
    config.middleware.use(Memprof::Tracer)

    View Slide

  84. "objects": {
    "created": 3911103,
    "types": {
    "none": 1168831,
    "object": 1127,
    "float": 627,
    "string": 1334637,
    "array": 609313,
    "hash": 3676,
    "match": 70211
    }
    }
    }
    config.middleware.use(Memprof::Tracer)
    3 million objs created
    lots of strings
    lots of arrays
    regexp matches
    object instances
    1 million method calls
    config.middleware.use(Memprof::Tracer)

    View Slide

  85. Requests over time

    View Slide

  86. Top controllers and actions

    View Slide

  87. Request and response types

    View Slide

  88. Response time breakdown

    View Slide

  89. Garbage Collection overhead

    View Slide

  90. Objects created per request

    View Slide

  91. SQL queries and types

    View Slide

  92. time-series analytics
    GRAPHITE
    ./carbon-cache.py start
    gunicorn_django
    launchpad.net/graphite

    View Slide

  93. collectd: frontend load

    View Slide

  94. collectd: frontend cpu usage

    View Slide

  95. collectd: all frontend cpu usage

    View Slide

  96. custom metrics: unicorn requests
    diff = Time.now - @start
    $stats.timing("unicorn.#{host}.response_time",
    diff * 1000)

    View Slide

  97. custom metrics: unicorn response time

    View Slide

  98. custom metrics: unicorn response time

    View Slide

  99. custom metrics: unicorn GC time

    View Slide

  100. custom metrics: deployments

    View Slide

  101. custom metrics: deployments

    View Slide

  102. custom metrics: GitHub for Mac
    $stats.increment('macapp.install')

    View Slide

  103. custom metrics: merge button

    View Slide

  104. custom metrics: merge button

    View Slide

  105. custom metrics: github users!

    View Slide

  106. custom metrics: github users!

    View Slide

  107. TOOLS MAKE
    DEBUGGING EASIER.

    View Slide

  108. LSOF
    TCPDUMP
    STRACE
    LTRACE
    RBTRACE
    GDB
    GDB.RB
    PERFTOOLS
    PERFTOOLS.RB
    RACK-PERFTOOLS
    MEMPROF
    HOTSPOTS
    GRAPHITE

    View Slide

  109. LEARN THESE TOOLS.
    USE THESE TOOLS.
    yujie (flickr)

    View Slide

  110. QUESTIONS?
    Aman Gupta
    @tmm1
    speakerdeck.com/u/tmm1/p/debugging-ruby-performance

    View Slide