GC in Ruby 2.2

Zete
December 17, 2014

Slides of my local tech share in Beijing, 2014-12-06

Transcript

  1. Memory Management: Application View
    • stack
    • alloca
    • heap
    • with regard to stack: RAII, auto_ptr, __attribute__(destructor), …
    • manual with help: arena, buddy memory, memory pool, …
    • reference counted (shared_ptr, regexp, IO)
    • GC
  2. Memory Management: Operating System View
    • Virtual memory
    • Segment: a segment fault is a serious error… however, segments are rarely used now
    • Page: a page fault is not an error, but it may trigger very slow disk access
    • Address translations are cached in the TLB; the page table base is held in cr3 (movl cr3, eax)
    • IPC memory
    • Pipes
    • Process shared memory
    • With regard to IPC memory management, raptor uses mbuf and many other ways to avoid copying
  3. Memory Management: Hardware View
    • CPU talks to memory through the Address Bus and Data Bus; a bus clock cycle is several times slower than a CPU clock cycle
    • SDRAM and RDRAM are high bandwidth (throughput), high latency (100+ CPU cycles)
    • L1, L2, L3 caches: 90% of memory accesses go through cache
    • Multi-way cache lines: the more “ways”, the more precise and the more complicated the circuit
    • DMA (direct memory access) mode: a device reads from or writes to memory directly
    • Memory fences: LoadLoad, LoadStore, StoreLoad, StoreStore, volatile (rb_gc_guarded_ptr_val)
  4. Implementation Considerations…
    • CPU interruptions (Boehm GC page-fault)
    • Locality (heap allocations)
    • Predicting performance (G1GC -XX:MaxGCPauseMillis)
    • Debugging (how to debug a segfault in GC?)
    • Pointer compression (Jikes VM, LLVM compression on linked lists)
    • Language features (Erlang and Haskell take advantage of immutability)
    • Internals of C APIs (tcmalloc, jemalloc, … which to use?)
    • OS APIs (mmap)
    • (Disable) Compiler optimisations (volatile)
    • CPU arch (memory fence to ensure execution order)
  5. Many GCs

                     conservative   mark sweep   generational
    CMS (Java)       N              Y            Y
    G1GC (Java 7)    N              Y            Infinite
    CPython          N              N            Y
    Rubinius         N              Y            Y
    Lua              N              Y            N
    Go               Y              Y            N
    Boehm GC         Y              Y            N
  6. CRuby GC
    • Conservative
    • Bit marking
    • Lazy sweep
    • Generational
    • Incremental marking
  7. Implementation Choices
    • Ruby is not fast, but it is easy to optimize with C-ext; a conservative GC makes C-ext easier to write
    • With the GIL, the GC does not need locks or spinlocks yet
    • Cross-architecture requirements and code simplicity
    • GC provides tools for C-ext authors to use
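
To make "conservative" concrete, here is a toy model in Ruby, not CRuby's actual code. The heap addresses, the `HEAP` table and the `conservative_roots` helper are all hypothetical names invented for this sketch. The idea it illustrates: the collector scans raw machine words and treats any word that happens to look like a valid heap address as a live reference, so a C extension does not have to register its local variables.

```ruby
# Toy model of conservative root scanning (illustrative only).
# HEAP maps hypothetical slot addresses to objects.
HEAP = {
  0x1000 => "string A",
  0x1028 => "string B",
  0x1050 => "string C"
}

# A fake machine stack: raw words with no type information attached.
stack_words = [42, 0x1028, 0xdead_beef, 0x1000, 7]

# Any word that equals a valid slot address is treated as a pointer,
# whether or not the program really meant it as one.
def conservative_roots(words, heap)
  words.select { |w| heap.key?(w) }.uniq
end

roots = conservative_roots(stack_words, HEAP)
roots.each { |addr| puts format("0x%x is treated as live: %s", addr, HEAP[addr]) }
```

Because a plain integer that merely resembles an address also pins its target, a collector working this way cannot safely move objects and rewrite the "pointer".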
  8. Parallel GC
    • Many threads mark and sweep
    • Java 6: CMS (concurrent mark and sweep) is in fact a parallel GC… (-XX:+CMSIncrementalMode)
  9. Concurrent GC
    • Low to zero stop time
    • Usually achieved by incremental mark/sweep or a separate GC thread
    • No STW (stop-the-world)
  10. How CRuby Achieves “Concurrent”
    • Trade throughput (~10%) and code complexity to reduce pause time
    • one mark -> one sweep
    • one mark -> many sweeps (lazy sweep)
    • many marks -> many sweeps (tri-color marking)
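
The "many marks" idea can be sketched as a toy tri-color marker in Ruby. This is an illustration under stated assumptions, not CRuby's implementation: `GRAPH`, `IncrementalMarker` and the per-step `budget` are hypothetical, and the write barrier a real incremental collector needs (to cope with the program mutating references between steps) is left out.

```ruby
# Toy tri-color incremental marker (illustrative only).
# GRAPH maps each object to the objects it references.
GRAPH = {
  root: [:a, :b],
  a:    [:c],
  b:    [],
  c:    [],
  d:    [:a]   # not reachable from :root
}

class IncrementalMarker
  attr_reader :color

  def initialize(graph, roots)
    @graph = graph
    @color = Hash.new(:white)   # white = not seen yet
    @gray  = []                 # worklist of gray objects
    roots.each { |r| shade(r) }
  end

  # Scan at most `budget` objects, then hand control back to the program.
  # Returns true once no gray objects remain, i.e. marking is finished.
  def step(budget)
    budget.times do
      obj = @gray.shift or break
      @graph[obj].each { |child| shade(child) }
      @color[obj] = :black      # black = scanned, children already shaded
    end
    @gray.empty?
  end

  private

  def shade(obj)
    return unless @color[obj] == :white
    @color[obj] = :gray         # gray = seen, children still pending
    @gray << obj
  end
end

marker = IncrementalMarker.new(GRAPH, [:root])
steps = 0
loop do
  steps += 1
  break if marker.step(2)       # the program would run between steps
end

garbage = GRAPH.keys.select { |o| marker.color[o] == :white }
puts "marking took #{steps} steps, garbage: #{garbage.inspect}"
```

Objects still white when marking ends are garbage; each `step` call stands for one short pause instead of a single long one.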
  11. Generational
    • Based on a heuristic: young objects die young
    • Cannot do semispace or mark-compact GC: the GC is conservative, efficient pointer rewriting is hard to do across platforms, and it is hard on C-ext
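
From Ruby code, the minor/major split is visible through `GC.start` and `GC.stat`. The sketch below assumes CRuby 2.1 or later, where `GC.start` accepts the `full_mark:` keyword and `GC.stat` reports `:minor_gc_count` and `:major_gc_count`. A requested minor GC may still be upgraded to a major one by the runtime, so the counters are printed rather than assumed.

```ruby
before = GC.stat

GC.start(full_mark: false)   # ask for a minor GC (young objects only)
after_minor = GC.stat

GC.start                     # full_mark defaults to true: a major GC
after_major = GC.stat

puts "total GCs: #{before[:count]} -> #{after_major[:count]}"
puts "minor GCs: #{before[:minor_gc_count]} -> #{after_minor[:minor_gc_count]}"
puts "major GCs: #{after_minor[:major_gc_count]} -> #{after_major[:major_gc_count]}"
```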
  12. Other Optimizations: Bit Marking
    • Bit marking was already there, to be copy-on-write friendly
    • To represent colors, 2 bits per object are used (the result is 4 bits)
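
A side bitmap with 2 bits of color per slot can be sketched as follows. The `ColorBitmap` class and its white/gray/black encoding are assumptions made for illustration; CRuby's real bitmaps live in C and are laid out differently. The point is only that the mark state is stored apart from the objects, so marking does not dirty the memory pages that hold the objects themselves.

```ruby
# Toy side bitmap: 2 bits of color per heap slot (illustrative encoding).
class ColorBitmap
  COLORS = { white: 0b00, gray: 0b01, black: 0b10 }.freeze

  def initialize
    @bits = 0   # one arbitrary-precision Integer acts as the bitmap
  end

  def set(slot, color)
    shift = slot * 2
    @bits &= ~(0b11 << shift)               # clear this slot's two bits
    @bits |= COLORS.fetch(color) << shift   # write the new color
  end

  def get(slot)
    COLORS.key((@bits >> (slot * 2)) & 0b11)
  end
end

bitmap = ColorBitmap.new
bitmap.set(0, :gray)
bitmap.set(3, :black)
bitmap.set(0, :black)   # slot 0 moves from gray to black
p [bitmap.get(0), bitmap.get(1), bitmap.get(3)]
```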
  13. Ways to Control GC
    • OOB GC (out-of-band GC) in unicorn, passenger
    • GC.stress
    • GC.disable … GC.start
    • rb_gc_mark(), rb_gc_register()
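
A minimal sketch of the out-of-band idea using only the core `GC` module: switch automatic GC off around latency-sensitive work, then collect at a moment of our choosing. `handle_request` is a hypothetical stand-in for real work, and this is not how unicorn or passenger implement OOB GC internally.

```ruby
# Hypothetical stand-in for a latency-sensitive request handler.
def handle_request
  Array.new(10_000) { |i| "item #{i}" }.size
end

GC.disable                   # no automatic GC from here on
count_before = GC.count
result = handle_request
count_during = GC.count - count_before

GC.enable
GC.start                     # explicit collection "between requests"
count_after = GC.count - count_before

puts "handled #{result} items"
puts "GC runs during request: #{count_during}, after explicit GC: #{count_after}"
```

The opposite knob is `GC.stress = true`, which asks for a GC at every opportunity. It is extremely slow, but useful for flushing out objects that a C extension forgot to mark.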
  14. Performance Tools
    • gdb/lldb
    • rbtrace
    • tmm1/stackprof
    • ko1/gc_tracer
      • require 'gc_tracer'
      • GC::Tracer.start_logging("log.txt")
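
gc_tracer is a separate gem, so as a dependency-free stand-in here is a sketch with the built-in `GC::Profiler`, a different and coarser tool than the ones listed above. The allocation loop is an arbitrary workload chosen for the example. Note that the profiler's data is read while it is still enabled.

```ruby
GC::Profiler.enable

# Arbitrary workload: allocate many short-lived strings, then force a GC.
200_000.times.map { |i| "str #{i}" }
GC.start

runs  = GC::Profiler.raw_data.size   # one record per GC run
total = GC::Profiler.total_time      # seconds spent in GC

puts "GC runs recorded: #{runs}"
puts format("total GC time: %.4f s", total)

GC::Profiler.disable
GC::Profiler.clear
```

`GC::Profiler.report` prints a per-run table of the same data if a quick human-readable summary is enough.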