Optimizing Production Performance with MRI JIT / RubyConf 2021

RubyConf 2021
https://rubyconf.org/

Takashi Kokubun

October 27, 2021

Transcript

  1. Agenda
     • Introduction to MRI JIT
     • Tuning JIT performance for Rails
     • Warming up MRI JIT
     • The future of MRI JIT
  2. MJIT
     • Merged in Ruby 2.6
     • Optionally enabled by --jit
     • Run a C compiler at runtime
     • Support GCC, Clang, and MSVC
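Because MJIT is opt-in, it can help to confirm from inside the process that it is actually on. The following is a minimal sketch, assuming an interpreter that ships MJIT and exposes `RubyVM::MJIT.enabled?`; the helper name `mjit_enabled?` is illustrative and not from the talk.

```ruby
# Illustrative helper (not from the talk): reports whether MJIT is on.
# On interpreters without the RubyVM::MJIT constant it returns false
# instead of raising.
def mjit_enabled?
  return false unless defined?(RubyVM::MJIT)
  return false unless RubyVM::MJIT.respond_to?(:enabled?)
  RubyVM::MJIT.enabled? ? true : false
end

puts "MJIT enabled: #{mjit_enabled?}"
```

Run it once with and once without --jit to see the difference.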
  3. YJIT
     • To be merged in Ruby 3.1
     • Optionally enabled by --yjit
     • In-process x86 assembler
  4. MIR
     • JIT framework, motivated by MJIT
     • Planned to be integrated with MRI
     • Inline C functions without a C compiler
  5. Layers of JIT implementation
     [Diagram. Layers: VM, JIT compiler, Codegen. Labels: RTL, YARV, RTL-MJIT, MJIT (C compiler), YARV-MJIT (mjit_compile.c), MIR, YJIT, yjit_codegen.c, MIR-based JIT, Ruby 2.6~3.0, Feature #12589]
  6. Key ideas
     • Some versions are slow
     • TracePoint, GC.compact, and Ractor
     • Change --jit-max-cache for Ruby 3.0
     • Wait until everything is compiled
  7. Some versions are slow
     • Don't use Ruby 2.x
         ◦ Ruby 3.0 has better CPU cache efficiency
     • Even Ruby 3 has slow versions
         ◦ MJIT doesn't work properly in Ruby 3.0.1
         ◦ Ruby 3.0.0 is OK, but others might have throttling issues
  8. TracePoint, GC.compact, and Ractor
     • MJIT can be disabled when GC.compact or TracePoint is used
         ◦ Ruby 3.1 shows "JIT cancel" on --jit-verbose=1 when it happens
     • However, Ruby 3.1 supported TracePoint :class events for Zeitwerk
     • MJIT has performance issues when you have Ractors
  9. Change --jit-max-cache for Ruby 3.0
     • The default --jit-max-cache is 100 in Ruby 3.0
     • It should be large enough to compile everything, like 10,000
         ◦ Use --jit-verbose=1 to see what's happening
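As a rough sketch of how the flags on this slide fit together, the snippet below assembles a command line for Ruby 3.0. Only --jit, --jit-max-cache, and --jit-verbose come from the slides; the helper name `mjit_flags` and the script name `app.rb` are hypothetical placeholders.

```ruby
# Hypothetical helper (not from the talk): builds MJIT flags for Ruby 3.0
# using only the options named on the slides.
def mjit_flags(max_cache: 10_000, verbose: 1)
  ["--jit", "--jit-max-cache=#{max_cache}", "--jit-verbose=#{verbose}"]
end

# app.rb is a placeholder for whatever script boots your application.
puts "ruby #{mjit_flags.join(' ')} app.rb"
```

The value 10,000 is the slide's example of "large enough"; the verbose output is the place to confirm that the cache limit is no longer what stops compilation.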
  10. Wait until everything is compiled
     • When a C compiler is running, the interpreter becomes slower
         ◦ We've found no workaround so far
     • So be sure to see the end of compilation with --jit-verbose=1
         ◦ This can take some minutes
  11. --jit-min-calls
     • The default of --jit-min-calls is 10,000
     • You need to wait until the benchmarked path is used 10,000 times
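One way to act on this when benchmarking is to call the target path at least 10,000 times before timing anything. The sketch below assumes the default --jit-min-calls from the slide; `hot_path` and `warm_up_then_measure` are hypothetical names standing in for your own code.

```ruby
require "benchmark"

# Default --jit-min-calls value stated on the slide.
JIT_MIN_CALLS = 10_000

# Hypothetical stand-in for the code path you want to benchmark.
def hot_path(n)
  (1..n).sum { |i| i * i }
end

# Call the path at least JIT_MIN_CALLS times so MJIT considers it for
# compilation, then time a separate batch of calls.
def warm_up_then_measure(iterations: 1_000)
  JIT_MIN_CALLS.times { hot_path(100) }
  Benchmark.realtime { iterations.times { hot_path(100) } }
end

elapsed = warm_up_then_measure
puts format("measured %d calls in %.4fs after warm-up", 1_000, elapsed)
```

Reaching the call count only makes a method eligible for compilation. The compiler still has to finish, so the timing is meaningful only once the --jit-verbose=1 output has gone quiet.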
  12. The lifecycle of JIT-ed code
     • MJIT's code has multiple stages:
         ◦ Fragmented code with full optimizations
         ◦ Fragmented code with partial optimizations
         ◦ Compacted code with partial optimizations
     • All methods should be in the last stage to see the peak performance
  13. JIT recompile
     • MJIT disables optimizations that didn't work and recompiles the code
     • Look for "JIT recompile" shown by --jit-verbose=1
     • Your log should NOT end with "JIT recompile" to see the peak performance
  14. Optimization switches for each method
     • disable_ivar_cache
     • disable_exivar_cache
     • disable_send_cache
     • disable_inlining
     • disable_const_cache
  15. JIT compaction
     • Once everything is compiled, MJIT schedules "JIT compaction"
     • Your --jit-verbose=1 log should end with this to see the peak performance
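The two log rules (do not end on "JIT recompile", do end on "JIT compaction") can be checked mechanically. This is a minimal sketch that matches only the phrases quoted on the slides; the function name, the returned symbols, and the sample lines are hypothetical, and real --jit-verbose=1 output carries more detail than shown here.

```ruby
# Phrases the slides say to look for in --jit-verbose=1 output.
RECOMPILE  = "JIT recompile"
COMPACTION = "JIT compaction"

# Returns :peak if the last JIT-related line is a compaction,
# :recompiling if it is a recompile, and :warming_up otherwise.
def jit_warmup_state(log_lines)
  last = log_lines.reverse_each.find { |line| line.include?("JIT ") }
  return :warming_up if last.nil?
  return :peak if last.include?(COMPACTION)
  return :recompiling if last.include?(RECOMPILE)
  :warming_up
end

# Hypothetical, simplified log lines for illustration only.
sample = [
  "JIT recompile: example_method",
  "JIT compaction: example",
]
puts jit_warmup_state(sample)
```

A benchmark harness could poll a captured log with a check like this and start measuring only once it reports `:peak`.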
  16. Why do we have multiple JITs?
     • Are we competing?
         ◦ No, we contribute to each other's projects as well
     • Multi-tier JIT?
         ◦ Efficiently mixing the code of MJIT and YJIT might be hard
         ◦ At least MJIT needs to be replaced by MIR for better control
  17. A short-term idea
     • We should probably focus on YJIT
         ◦ It is already faster and has more developers than MJIT
         ◦ MJIT's warmup is too slow by design
  18. A long-term idea
     • Unblock inlining over C methods
         ◦ YJIT cannot inline and optimize C methods as is
         ◦ MJIT has Ruby → C inlining, but not C → Ruby yet
         ◦ Rewrite more C methods to Ruby and/or integrate MIR
  19. Conclusion
     • There's a way to speed up Rails with MJIT
     • We're shifting to YJIT for better performance