
20 Years of JRuby

headius
August 07, 2024

A talk on the past 20 years of JRuby development and how it and the JDK have evolved over that time.

Presented at the JVM Language Summit 2024 in Santa Clara, California.



Transcript

  1. I'm Back! • Charles Oliver Nutter • @headius(@mastodon.social) • [email protected]

    • JRuby developer for 20 years • JVM user for nearly 30 years
  2. JRuby • Ruby on the JVM • Primary focus on

    Ruby compatibility and experience • Standard Ruby libs, frameworks, workflow all are the same • JVM for Ruby • Bring the best of the JVM to the Ruby world • Bring the challenges of Ruby to the JVM
  3. Better Scaling • Requests per second per MB of memory (16-way concurrency)

    [Bar chart] Series: CRuby, CRuby + YJIT, JRuby • Data labels: 1.72 rps/MB, 0.92 rps/MB, 0.8 rps/MB
  4. JRuby Users • Webapps that need to scale • Packaged

    software for sale • Mobile and embedded uses • Unusual platforms with JVMs • Scripting support for Java apps • For better libraries from Java
  5. JRuby to Me • Exploring all aspects of a language

    runtime • Much more interesting than former life as JavaEE architect • Sandbox for utilizing new JVM features • Easy excuse to pester all of you for features, performance • Interesting solutions to difficult problems • Immediately applicable to real-world users • Help Ruby scale, grow, gain access to larger enterprises
  6. JRuby to You • Pushing edges you didn't know you

    had • JRuby drove early stages of invokedynamic, method handles • Demands for native FFI, lightweight threads, startup/warmup • Brutal test case • If you're not testing against JRuby you should be • If we're not utilizing your work, show us how
  7. JRuby Timeline • 2001: JRuby's first commits • 2004: My first contribution

    • 2006: JRuby runs Rails, team joins Sun Micro • 2008: First JVM bytecode JIT, early production use, first JVMLS • 2015: New IR compiler, interpreter, new JVM JIT • 2024: Leap forward to Java 17 or 21
  8. JVM Language Summit? • Languages on the agenda in 2008:

    • Fan, Groovy, Scala, Fortress, JRuby, PHP, Dynalang (framework), Clojure, Python/Jython, LINQ, Kawa, NetRexx, Common Lisp, Parrot (VM) • Languages on the agenda in 2024: • JRuby (and CodeModel/HAT, sort of?)
  9. JVM Dynamic Languages • #1 Python (Jython): no recent development,

    stuck at 2.7 • #6 JavaScript (Rhino, Nashorn): no recent development, old spec • #8 Visual Basic: Promising efforts late 2000s, none active now • #13 PHP (Quercus): long ago abandoned • #15 Ruby (JRuby): active development, ongoing prod usage • Groovy, Clojure, others: semi-dynamic, active niche use
  10. Ruby Implementations • CRuby: Still 99% of all Ruby usage

    • Ongoing native JIT effort, reducing C exts, weird parallelism • Rubinius: meta-circular VM from scratch, abandoned >10 years • TruffleRuby: Development has slowed, no major production users • JRuby: Silently powering large users in the 1% • Next gen JVM usage, optimizations right around the corner
  11. JRuby High-level • Parser: port of Bison to Java (through

    10), native parser (10+) • Compiler: register-based IR compiler, blocks, operands, passes • Execution: IR interpretation, JVM bytecode generation • Core classes: Similar to C logic, numerous support libraries • Libraries: Ruby ok, Java ports of C extensions, wrapped JVM libs
  12. Parsing • Extremely complex, context-sensitive grammar • Historically, only one

    complete LALR parser • Bison grammar, ported to Jay for JRuby • Recent momentum on hand-written recursive-descent parser • Likely future for us to maintain feature parity
  13. Prism Parser • Simple C-based RDP • Several years of

    work, 99% accurate • Error-tolerant, returns error AST nodes and attempts to continue • Weak detection of *invalid* syntax • Native library bound with JNI or Panama • Ship major platforms • WASM on Chicory on JVM for others (mostly to bootstrap)
  14. Execution • All Ruby code lives as source • Must

    be parsed, compiled, loaded at boot • Workflow does not make room for a compile phase • Eventual goal is Ruby as JVM bytecode • But also have reasonable startup time
  15. Language Compatibility • High compatibility • 99% of Ruby 3.4

    language specs • Edge cases rarely encountered by users • Prism will ease upgrade path for each Ruby release
  16. JRuby Compiler Pipeline

    [Diagram] JRuby internals: Ruby (.rb) → parse → Ruby instructions (IR) → interpret (interpreter), or JIT → Java instructions (Java bytecode) • Java Virtual Machine: bytecode interpreter executes Java bytecode; C1 compiles to native code; C2 compiles to better native code
  17. Mixed-Mode • Reducing work at startup • Compiling entire scripts?

    90% of code is never used • Compile method at call time? Early methods called once • Compile at threshold! Save JIT for valuable targets • Unique challenges similar to JVM itself • Evolving representation of a body of code • Interpreter and bytecode frames on JVM stack
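
The "compile at threshold" idea above can be sketched in a few lines of Ruby. This is only an illustration, not JRuby's implementation: `MixedModeMethod`, its `threshold:` parameter, and the lambdas standing in for the interpreter and the JIT are all hypothetical names.

```ruby
# Hypothetical sketch: a method body starts out "interpreted" and is swapped
# for a "compiled" body only after it has been called often enough.
class MixedModeMethod
  attr_reader :calls, :mode

  def initialize(threshold:, interpreted:, compiler:)
    @threshold = threshold
    @body = interpreted      # cheap to set up, slower to run
    @compiler = compiler     # produces the optimized body on demand
    @calls = 0
    @mode = :interpreted
  end

  def call(*args)
    @calls += 1
    if @mode == :interpreted && @calls >= @threshold
      @body = @compiler.call # pay the compile cost once, only for hot methods
      @mode = :compiled
    end
    @body.call(*args)
  end
end

square = MixedModeMethod.new(
  threshold: 3,
  interpreted: ->(x) { x * x },
  compiler: -> { ->(x) { x * x } }
)

modes = []
results = 5.times.map do |i|
  value = square.call(i)
  modes << square.mode
  value
end
```

Methods that run once or twice never reach the threshold, so they never pay for compilation, which is the startup saving the slide describes.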
  18. at org.jruby.RubyKernel.gets(org.jruby.dist/RubyKernel.java:322)

    at org.jruby.RubyKernel$INVOKER$s$0$1$gets.call(org.jruby.dist/RubyKernel$INVOKER$s$0$1$gets.gen)
    at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(org.jruby.dist/JavaMethod.java:821)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(org.jruby.dist/DynamicMethod.java:212)
    at org.jruby.runtime.callsite.CachingCallSite.call(org.jruby.dist/CachingCallSite.java:193)
    at org.jruby.ir.interpreter.InterpreterEngine.processCall(org.jruby.dist/InterpreterEngine.java:350)
    at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(org.jruby.dist/StartupInterpreterEngine.java:64)
    at org.jruby.ir.interpreter.Interpreter.interpretFrameScope(org.jruby.dist/Interpreter.java:177)
    at org.jruby.ir.interpreter.Interpreter.INTERPRET_METHOD(org.jruby.dist/Interpreter.java:148)
    at org.jruby.internal.runtime.methods.InterpretedIRMethod.call(org.jruby.dist/InterpretedIRMethod.java:130)
    at org.jruby.runtime.callsite.CachingCallSite.call(org.jruby.dist/CachingCallSite.java:193)
    at org.jruby.ir.interpreter.InterpreterEngine.processCall(org.jruby.dist/InterpreterEngine.java:350)
    at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(org.jruby.dist/StartupInterpreterEngine.java:64)
    at org.jruby.ir.interpreter.Interpreter.interpretFrameScope(org.jruby.dist/Interpreter.java:177)
    at org.jruby.ir.interpreter.Interpreter.INTERPRET_METHOD(org.jruby.dist/Interpreter.java:148)
    at org.jruby.internal.runtime.methods.InterpretedIRMethod.call(org.jruby.dist/InterpretedIRMethod.java:130)
    at org.jruby.runtime.callsite.CachingCallSite.call(org.jruby.dist/CachingCallSite.java:193)
    at org.jruby.ir.interpreter.InterpreterEngine.processCall(org.jruby.dist/InterpreterEngine.java:350)
    at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(org.jruby.dist/StartupInterpreterEngine.java:64)
    at org.jruby.ir.interpreter.Interpreter.INTERPRET_BLOCK(org.jruby.dist/Interpreter.java:123)
  19. Mixed-Mode Stack Trace • Interpreted backtrace managed on heap •

    Update line, walk JVM trace to find interpreter frames • Splice into modified stack trace • JIT backtrace encoded in method name • Look for marker on stack, unpack backtrace element • Include relevant Java frames from JRuby
  21. at blah .️ ❤ def bar #2(blah.rb:6)

    at blah .️ ❤ def foo #1(blah.rb:2)
    at blah .️ ❤ {} \=\^main\_ #0(blah.rb:9)
  22. at blah .️ ❤ def foo #1(-e:1) • blah: run

    at command line using jruby blah.rb • ❤: Ruby marker (formerly $RUBY$) • def: Method definition marker (formerly $method$) • foo: Method name (encoded for JVMS as needed) • #1: Unique body within compilation unit
  23. at blah .️ ❤ {} \=\^main\_ #0(-e:1) • blah: run

    at command line using jruby blah.rb • ❤: Ruby marker • {}: Method definition marker • \=\^main\_: Main body marker (encoded for JVMS) • #0: Unique body within compilation unit
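
The "look for marker, unpack backtrace element" step can be sketched as a small decoder. This is a hedged illustration and not JRuby's code. It assumes the JIT method name has the shape `<marker> <kind> <name> #<id>` as shown on the slides, and that `\=` flags an encoded name in which `\^` stands for `<` and `\_` stands for `>`. `decode_frame`, `demangle`, and the `:body` label are hypothetical names.

```ruby
# Assumed escape pairs for characters the JVM forbids in method names.
# Only the two needed to recover "<main>" are handled in this sketch.
ESCAPES = { "\\^" => "<", "\\_" => ">" }.freeze

def demangle(name)
  return name unless name.start_with?("\\=") # "\=" flags an encoded name
  name.delete_prefix("\\=").gsub(/\\[\^_]/) { |esc| ESCAPES.fetch(esc) }
end

# marker, kind ("def" or "{}"), encoded name, unique body id
FRAME = /\A❤ (?<kind>def|\{\}) (?<name>\S+) #(?<id>\d+)\z/

def decode_frame(method_name)
  m = FRAME.match(method_name)
  return nil unless m # no marker: not a JIT-compiled Ruby frame
  {
    kind: m[:kind] == "def" ? :method : :body,
    name: demangle(m[:name]),
    id: m[:id].to_i
  }
end

method_frame = decode_frame("❤ def foo #1")
main_frame   = decode_frame("❤ {} \\=\\^main\\_ #0")
java_frame   = decode_frame("linkToCallSite")
```

Frames without the marker return `nil`, which is how a backtrace builder could tell plain Java frames apart from compiled Ruby bodies.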
  24. at blah .️ ❤ def bar #2(blah.rb:6)

    at blah .️ ❤ def foo #1(blah.rb:2)
    at blah .️ ❤ {} \=\^main\_ #0(blah.rb:9)
  25. at blah .️ ❤ def bar #2(blah.rb:6)

    at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@21/DirectMethodHandle$Holder)
    at java.lang.invoke.LambdaForm$MH/0x00000070015ab000.invoke(java.base@21/LambdaForm$MH)
    at java.lang.invoke.LambdaForm$MH/0x00000070015abc00.reinvoke(java.base@21/LambdaForm$MH)
    at java.lang.invoke.LambdaForm$MH/0x00000070015ac000.guard(java.base@21/LambdaForm$MH)
    at java.lang.invoke.LambdaForm$MH/0x00000070015abc00.reinvoke(java.base@21/LambdaForm$MH)
    at java.lang.invoke.LambdaForm$MH/0x00000070015ac000.guard(java.base@21/LambdaForm$MH)
    at java.lang.invoke.Invokers$Holder.linkToCallSite(java.base@21/Invokers$Holder)
    at blah .️ ❤ def foo #1(blah.rb:2)
    at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@21/DirectMethodHandle$Holder)
    at java.lang.invoke.LambdaForm$MH/0x00000070015ab000.invoke(java.base@21/LambdaForm$MH)
    at java.lang.invoke.LambdaForm$MH/0x00000070015abc00.reinvoke(java.base@21/LambdaForm$MH)
    at java.lang.invoke.LambdaForm$MH/0x00000070015ac000.guard(java.base@21/LambdaForm$MH)
    at java.lang.invoke.LambdaForm$MH/0x00000070015abc00.reinvoke(java.base@21/LambdaForm$MH)
    at java.lang.invoke.LambdaForm$MH/0x00000070015ac000.guard(java.base@21/LambdaForm$MH)
    at java.lang.invoke.Invokers$Holder.linkToCallSite(java.base@21/Invokers$Holder)
    at blah .️ ❤ {} \=\^main\_ #0(blah.rb:9)
  26. InvokeDynamic • InvokeDynamic makes JRuby work • Bytecode output consists

    of loads, stores, and invokedynamic • Controlled use of MH chains, keep the graph compact • Default execution uses minimal indy • Startup hit is nontrivial • Really want to be all indy all the time
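
The `guard` frames in the preceding trace hint at what an indy call site does: check that the receiver still looks like last time, then jump straight to the cached target. A minimal Ruby sketch of that guard-plus-cache idea follows. It assumes a simple monomorphic cache keyed on the receiver's class and ignores singleton methods and invalidation. `CachingSite` is a hypothetical name and is not JRuby's `CachingCallSite`.

```ruby
# Hypothetical monomorphic call-site cache: look the method up once per
# receiver class, then reuse it while the guard (same class) keeps passing.
class CachingSite
  attr_reader :misses

  def initialize(method_name)
    @method_name = method_name
    @cached_class = nil
    @cached_method = nil
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass.equal?(@cached_class) # guard failed: relink the site
      @misses += 1
      @cached_method = klass.instance_method(@method_name)
      @cached_class = klass
    end
    @cached_method.bind(receiver).call(*args)
  end
end

site = CachingSite.new(:succ)
first  = site.call(1)    # miss: caches Integer#succ
second = site.call(41)   # hit: guard passes, no lookup
third  = site.call("a")  # miss: receiver class changed, relink to String#succ
```

On the JVM the guard and the cached target are method handles that the JIT can inline through, which is where the speedup comes from. This sketch only shows the control flow.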
  27. InvokeDynamic Performance • Times faster with indy enabled

    [Bar chart] Benchmarks: Mandelbrot, Red/Black • Series: Java 8 indy, Java 11 indy, Java 17 indy • Data labels: 4.05x, 3.92x, 3.74x, 3.68x, 3.72x, 1.97x
  28. Indy + PEA? • Times faster with indy enabled

    [Bar chart] Benchmarks: Mandelbrot, Red/Black • Series: Java 8 indy, Java 11 indy, Java 17 indy, GraalVM indy • Data labels: 3.13x, 15.7x, 4.05x, 3.92x, 3.74x, 3.68x, 3.72x, 1.97x • Annotations: Escape analysis; But not always better
  29. Are We Fast Yet? • It's Complicated

    • TruffleRuby optimizes much more aggressively • Method splitting and specialization, deep inlining, PEA • Uses many times more memory and warms up much slower • Support for C extensions cripples optimizer on real-world apps • JRuby balances optimization and usability
  30. LambdaForm is Fat IR • Generally do optimize well given

    enough time • High overhead early on • Easy to trip up and then inlining falls apart • Nightmare for tooling: thread dumps, profile splitting and pollution • Hard to figure out why things fail to optimize • Synchronicity with hidden classes or code models?
  31. Startup Time • Despite all our tricks, still far from

    CRuby • Parser, compiler, interpreters, core: all start cold • Significant amount of Ruby code loaded at every boot • Interpreter heats up faster than precompiled Ruby • Thousands of classes at boot • Thousands more as methods JIT, indy sites stabilize
  32. ruby -e 1

    [Bar chart, ruby -e 1] CRuby 3.2: 0.053s • JRuby 9.4: 1.686s
  33. rails new testapp

    [Bar chart, rails new testapp] CRuby: 0.314s • JRuby: 5.918s
  34. --dev flag • Hello -client mode! • Turn everything

    off • Startup interpreter only • Limited invokedynamic • Parallel GC • -XX:TieredStopAtLevel=1
  35. rails new testapp

    [Bar chart, rails new testapp --skip-bundle] CRuby: 0.314s • JRuby: 5.918s • JRuby --dev: 2.7s
  36. CRaC • Coordinated Restore at Checkpoint • CRIU: Checkpoint and

    Restore In Userspace • Snapshot a process and exit • Clone that and jump back in for new processes • Primarily a Linux technology (work ongoing) • Cumbersome to deal with external resources (open IO, native stuff)
  37. rails new testapp

    [Bar chart, rails new testapp --skip-bundle] CRuby: 0.314s • JRuby: 5.918s • JRuby --dev: 2.7s • JRuby CRaC: 0.89s
  38. Leyden • Holy Grail for startup, warmup? • Many levels

    of "AOT" • Super CDS • Method profiles • Compiled code • Precompilation of a subset of libraries becomes feasible
  39. ruby -e 1

    [Bar chart, ruby -e 1] CRuby 3.2: 0.053s • JRuby 9.4: 1.686s • JRuby 9.4 --dev: 1.271s • JRuby 9.4 CRaC: 0.21s • Leyden: 0.625s
  40. rails new testapp

    [Bar chart, rails new testapp --skip-bundle] CRuby: 0.314s • JRuby: 5.918s • JRuby --dev: 2.7s • JRuby CRaC: 0.89s • Leyden: 4.55s
  41. Core Compatibility • High level of compatibility for core classes

    • String, Regexp, Array, Hash, numerics, IO, Thread, Fiber • 98% passing Ruby specs • Hidden complexity in many of these
  42. String • Custom String implementation to match CRuby • byte[]

    contains encoded characters • Encoding defines structure, manipulation of those characters • Necessitates a custom Regexp implementation • Inefficient to roundtrip through String/char[] • java.util.regex falls apart easily (deep alternations overflow, etc)
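
The "bytes plus an encoding" model is visible from plain Ruby, which is why JRuby cannot simply reuse `java.lang.String`. A small sketch using only core `String` methods (the variable names are illustrative):

```ruby
text = "h\u00E9llo"            # "héllo"; \u escapes produce a UTF-8 string

utf8_chars = text.length       # characters, as defined by the encoding
utf8_bytes = text.bytesize     # raw bytes: "é" takes two bytes in UTF-8

# Relabel the same bytes as ISO-8859-1: nothing is converted, but the
# encoding now says every byte is its own character.
relabeled   = text.dup.force_encoding(Encoding::ISO_8859_1)
latin_chars = relabeled.length
same_bytes  = relabeled.bytes == text.bytes
```

Here the character count changes from 5 to 6 while the six underlying bytes stay identical, so the encoding alone determines how the bytes are interpreted.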
  43. jcodings and joni • jcodings: character encoding and transcoding library

    • Designed based on CRuby encoding subsystem • Direct byte[] to byte[] transcoding of most encodings • joni: flexible bytecoded regex engine • Works directly against byte[] • Pluggable encoding and dialect support
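
From the Ruby side, transcoding is what `String#encode` does: it produces new bytes in the target encoding rather than relabeling existing ones. The sketch below uses only core Ruby. That JRuby routes this call through jcodings is an assumption based on the slide, not something the code demonstrates.

```ruby
utf8   = "h\u00E9llo"                      # "é" is two bytes in UTF-8
latin1 = utf8.encode(Encoding::ISO_8859_1) # transcode: "é" becomes one byte

latin1_bytes = latin1.bytes                # the converted byte sequence
round_trip   = latin1.encode(Encoding::UTF_8)
```

`latin1_bytes` holds five bytes, with 233 (0xE9) in place of the two-byte UTF-8 sequence, and encoding back to UTF-8 restores the original string.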
  44. IO • Interruptible threads require selectable channels • Standard NIO

    channels self-destruct on interrupt • Only Sockets, Pipes provide selection • Files, fifos, stdio, process IO must be native in JRuby • Channel abstraction atop native IO via FFI • Can't safely access file descriptor from Channel
  45. Java Native Runtime • Presented here in 2013 • Led

    to my first and only JEP: 191 • Precursor to Panama • Minimal FFI stub, utility libs for binding, common POSIX functions, native IO, unix sockets • Jorn Vernee (Oracle) did a large rework of JNR atop Panama
  46. Panama Expectations • Faster native integration • Obviously for core

    IO • Better Ruby FFI to help eliminate C extensions • Code generation for us and for Ruby • jextract to generate our JNR wrapper APIs • jextract as a library to generate Ruby FFI compatible with CRuby
  47. Early Success Story • SQLite-JDBC • Used by JRuby to

    support Ruby database wrappers • Java Native Interface (JNI) currently limits throughput • Proof-of-concept Panama-based version working now • 2x performance for most operations
  48. Fibers • Introduced in Ruby 1.9 • Userland cooperative threading

    • Separate call stacks • Explicit scheduling, direct hand-off • Growing use in CRuby to simulate parallelism • Hide blocking operations behind coroutines
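
"Explicit scheduling, direct hand-off" means control moves only when code asks for it: `resume` enters the fiber and `Fiber.yield` hands control back to whoever resumed it. A minimal sketch with the core `Fiber` API:

```ruby
require "fiber" # only needed for Fiber#alive? on older Rubies

fiber = Fiber.new do
  Fiber.yield 1 # pause here and hand 1 back to the caller
  Fiber.yield 2 # pause again on the next resume
  3             # the block's final value is returned by the last resume
end

values = [fiber.resume, fiber.resume, fiber.resume]
finished = !fiber.alive?
```

Each `resume` continues on the fiber's own call stack exactly where it last yielded, which is what makes fibers usable for hiding blocking operations.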
  49. Internal and External Iteration

    ary = ["foo", "bar", "baz"]
    ary.each {|str| puts str}     # internal iteration
    enum = ary.each
    3.times { puts enum.next }    # external iteration via fiber
  50. Structured Concurrency • Throw a fiber at every session

    • Blocking IO goes back to scheduler and picks a ready Fiber • Single thread handling 1000s of sessions without IO reactor • Hand-off must be fast for throughput

    server = TCPServer.new(...)
    loop {
      socket = server.accept
      Fiber.new { handle_request(socket) }
    }

    def handle_request(socket)
      request = socket.read
      result = do_something_with(request)
      socket.write(result)
    end
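
The "goes back to scheduler and picks a ready Fiber" loop can be sketched without any real IO. This is a toy round-robin scheduler, not Ruby's fiber scheduler interface: `Fiber.yield` stands in for "this read would block", and `spawn`, `ready`, and `log` are hypothetical names.

```ruby
require "fiber" # only needed for Fiber#alive? on older Rubies

log = []
ready = [] # fibers that can make progress right now

# Hypothetical helper: each "session" does some work, then would block on IO.
spawn = ->(name, steps) do
  ready << Fiber.new do
    steps.times do |i|
      log << "#{name}:#{i}"
      Fiber.yield # stand-in for a blocking read: hand control back
    end
  end
end

spawn.call("a", 2)
spawn.call("b", 3)

# The scheduler: one thread, resuming whichever fiber is next in line.
until ready.empty?
  fiber = ready.shift
  fiber.resume
  ready << fiber if fiber.alive?
end
```

Every iteration of the scheduler loop is one hand-off into a fiber and one back out, so the cost of `resume` and `yield` directly bounds throughput, as the slide notes.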
  51. Fibers on JRuby • Before Java 21: native thread per

    fiber • Limited scaling, poor performance characteristics • Extensive contortions to tidy up abandoned fibers • Java 21+: virtual thread per fiber • Practically no other changes in JRuby were required • Already avoiding monitors for interruptibility
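
To see why "thread per fiber" works at all, and why it is costly, here is a hedged sketch of a coroutine built from one thread and two queues. It is a teaching model in Ruby, not JRuby's Java implementation. `ThreadFiber` and `yield_value` are hypothetical names.

```ruby
# Hypothetical coroutine: the body runs on its own thread, but only one side
# (caller or body) is ever running, because each blocks on a queue while the
# other works.
class ThreadFiber
  def initialize(&body)
    @to_fiber = Queue.new
    @to_caller = Queue.new
    @done = false
    @thread = Thread.new do
      first = @to_fiber.pop # park until the first resume
      result = body.call(self, first)
      @done = true
      @to_caller.push(result)
    end
  end

  # Caller side: wake the body, then block until it yields or finishes.
  def resume(value = nil)
    raise "dead fiber" if @done
    @to_fiber.push(value)
    @to_caller.pop
  end

  # Body side: hand a value to the caller, then block until resumed again.
  def yield_value(value = nil)
    @to_caller.push(value)
    @to_fiber.pop
  end

  def alive?
    !@done
  end
end

f = ThreadFiber.new do |fiber, first|
  second = fiber.yield_value(first + 1)
  fiber.yield_value(second * 2)
  :done
end

results = [f.resume(1), f.resume(10), f.resume]
```

Every hand-off here is a cross-thread wake-up, and a `ThreadFiber` that is never resumed to completion leaves a parked thread behind, which matches the scaling and clean-up problems listed above. With virtual threads the same structure keeps working while the parked threads become cheap.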
  52. 10M Fiber resume + yield (lower is better)

    [Chart] X-axis: 10, 100, 1000, 10000, 100000, 1000000 • Series: CRuby, JRuby vthreads, JRuby threads • Data labels (seconds): 36.32, 43.5, 36.3, 133, 58, 56.1, 47.8, 46.1, 46.9, 136, 130, 3.99, 2.05, 1.34, 1.32
  53. 1M Fiber create, resume x2

    [Bar chart] Series: CRuby, JRuby vthreads • Data labels (seconds): 24.5, 1.12
  54. Loom and JRuby • Loom is almost exactly what we

    need • Clearly makes many fibers possible • Hand-off performance isn't that bad • But it solves a bigger problem than we need • We are implementing coroutines on vthreads on coroutines • Ruby folks implementing a scheduler...that we run on a scheduler
  55. Summary • JVM is a quirky but powerful target for

    dynamic languages • InvokeDynamic works well if we can massage bytecode enough • CDS, CRaC, Leyden will probably make startup time "good enough" • Panama will do wonders for native integration, C ext alternatives • Loom is like a 90% solution for fibers • Every other talk here brings new potential to JRuby
  56. JRuby Future • Largely caught up with current Ruby HEAD

    compatibility • Making a move to 17 or 21 minimum • Free to do much more aggressive optimizations • If Ruby succeeds, JRuby will continue to succeed
  57. JRuby Funding • 2006-2024: corporate sponsorship • Sun Microsystems, Engine

    Yard, Red Hat • Today: it's just me • Sponsors on GitHub (https://github.com/sponsors/headius) • Commercial support (https://headius.com/support) • Seeking OSS, research grants for new work
  58. JVMLS Future • What happened to my DaVinci Machine? •

    The dream of a multi-language JVM • JRuby has proven it can be done • We want to go on proving it • How do we lure the others back? • What's missing? What stops them?
  59. Thank You! • Charles Oliver Nutter • [email protected] • @headius(@mastodon.social)

    • blog.headius.com • github.com/sponsors/headius • headius.com/support • github.com/jruby/jruby • github.com/jruby/jcodings • github.com/jruby/joni • github.com/jnr