Improve my own Ruby

RubyKaigi2025

monochrome

April 18, 2025
Transcript

  1. About me
     - General Surgeon (hardware engineer for humans)
     - @s_isshiki1969
     - https://github.com/sisshiki1969
     - Loves Ruby, Rust, and JIT compilers
     - My grandmother was born in Matsuyama
     - My father spent his childhood in Matsuyama
  2. monoruby
     - https://github.com/sisshiki1969/monoruby
     - Yet another Ruby implementation with a JIT compiler
     - Written in Rust from (almost) scratch: parser, garbage collector, interpreter
     - Only x86-64 / Linux is supported
     - New: supports RubyGems
     - Not yet: still struggling with Bundler
  3. Compatibility
     - Supports
       - Bignum, Fiber, Binding
       - Redefining basic ops methods (like Integer#+)
     - Does NOT support
       - Native C extensions (but has alternatives)
       - Native threads
       - Encoding: supports only UTF-8 and ASCII-8BIT
       - ObjectSpace, TracePoint, Refinements, call/cc, ...
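
To make "redefining basic ops methods" concrete, here is a minimal Ruby sketch of what a program may legally do, and what any optimized fast path for integer addition therefore has to stay correct under. The helper name `with_redefined_plus` and the alias `original_plus` are made up for illustration; this shows plain Ruby behaviour and is not taken from monoruby.

```ruby
# Temporarily redefine Integer#+ so that every addition is off by one,
# then restore the original definition.
# `with_redefined_plus` and `original_plus` are hypothetical names.
def with_redefined_plus
  Integer.class_eval do
    alias_method :original_plus, :+
    define_method(:+) { |other| original_plus(other).original_plus(1) }
  end
  yield
ensure
  Integer.class_eval do
    alias_method :+, :original_plus
    remove_method :original_plus
  end
end

one = 1
inside = with_redefined_plus { one + 2 }  # uses the redefined Integer#+
outside = one + 2                         # original behaviour is back
```

Inside the block the addition goes through the redefined method; afterwards the original definition applies again.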
  4. Debug option “--profile”

         deoptimization stats (top 20)
         FuncId  func name                                [index]   count
         ----------------------------------------------------------------
         ( 1884) CPU#fetch                                [:00003]   9483  %2 = %2.[%1] [Method][Integer]
         ( 2281) block in PPU#setup_lut                   [:00001]   1018  %2 = %1.object_id() POLY [Array] FuncId(11)
         ( 2370) block in Parser#find_option              [:00001]    210  %2 = %1.to_s() POLY [Symbol] FuncId(14)
         ( 2013) block in #<Class:Optcarrot::CPU>#op      [:00003]    191  %6 = %6.is_a?(%7) POLY [Array] FuncId(21)

         global method cache stats (top 20)
         func name     class                        count
         ------------------------------------------------
         _bne          Optcarrot::CPU               17914
         _clc          Optcarrot::CPU                7898

         full method exploration stats (top 20)
         func name     class                        count
         ------------------------------------------------
         attr_reader   #<Class:Optcarrot::Config>      30
         key?          Hash                            29

         jit recompile stats (top 20)
         FuncId  func name          class                        count
         -------------------------------------------------------------
         ( 2346) block in Config    #<Class:Optcarrot::Config>      12
         (  492) Array#map          Array                            7
  5. Debug option “--deopt”

         <-- non-traced branch in <block in Specification> FuncId(846). [:00011] _%6 = %6 === %5 [TrueClass][NilClass]
         <-- non-traced branch in <block in Specification> FuncId(846). [:00011] _%6 = %6 === %5 [TrueClass][Array]
         <-- non-traced branch in <block in Specification> FuncId(846). [:00011] _%6 = %6 === %5 [TrueClass][Gem::Requirement]
         <-- deopt occurs in <block in Specification> FuncId(847). [:00002] %3 = %2.is_a?(%3) POLY [Array] FuncId(21) caused by #<Gem::Requirement:0x00007fb96d691dc0>
         <-- deopt occurs in <block in Specification> FuncId(847). [:00002] %3 = %2.is_a?(%3) POLY [Gem::Requirement] FuncId(21) caused by #<Gem::Requirement:0x00007fb96d6920c0>
         <-- deopt occurs in <block in Specification> FuncId(847). [:00002] %3 = %2.is_a?(%3) POLY [Gem::Requirement] FuncId(21) caused by "3.6.2"
         <-- deopt occurs in <block in Specification> FuncId(848). [:00003] %2 = %2.nil?() POLY [Array] FuncId(64) caused by #<Gem::Requirement:0x00007fb96d691dc0>
         <-- deopt occurs in <block in Specification> FuncId(848). [:00003] %2 = %2.nil?() POLY [Gem::Requirement] FuncId(64) caused by #<Gem::Requirement:0x00007fb96d6920c0>
         <-- deopt occurs in <block in Specification> FuncId(848). [:00003] %2 = %2.nil?() POLY [Gem::Requirement] FuncId(64) caused by "3.6.2"
         <-- deopt occurs in <block in Symbol#to_proc> FuncId(521). [:00004] %3 = %1.send(%3,*%4) [Symbol] FuncId(22) caused by :__version_guard
         <-- deopt occurs in <Requirement#initialize> FuncId(1275). [:00001] %1 = %1.flatten() [Array] FuncId(307) caused by :__version_guard
         <-- deopt occurs in <Array#map!> FuncId(491). [:00001] %3 = %0.block_given?() [Array] FuncId(74) caused by :__version_guard
         <-- deopt occurs in <Array#map!> FuncId(491). [:00012] %2 = %0.size() [Array] FuncId(250) caused by :__version_guard
         <-- non-traced branch in <block in Requirement#initialize> FuncId(1296). [:00003] %2 = %2.parse(%1) [#<Class:Gem::Requirement>] FuncId(1274)
         <-- deopt occurs in <#<Class:Gem::Requirement>#parse> FuncId(1274). [:00001] %2 = Gem::Version [Gem::Version]
         <-- deopt occurs in <#<Class:Gem::Version>#new> FuncId(1313). [:00001] %2 = Gem::Version [Gem::Version]
         <-- non-traced branch in <Version#initialize> FuncId(1315). [:00003] %2 = %2.correct?(%1) [#<Class:Gem::Version>] FuncId(1311)
         <-- deopt occurs in <#<Class:Gem::Version>#correct?> FuncId(1311). [:00001] %2 = %1.nil?() POLY [String] FuncId(64) caused by "0.3.3"
         <-- deopt occurs in <Array#map> FuncId(492). [:00001] %4 = %0.block_given?() [Array] FuncId(74) caused by :__version_guard
         <-- deopt occurs in <Specification#add_bindir> FuncId(925). [:00001] %2 = %1.nil?() [Array] FuncId(64) caused by ["erb"]
         <-- non-traced branch in <Specification#files> FuncId(850). [:00009] %1 = %1.flatten() [Array] FuncId(307)
         <-- deopt occurs in <block in #<Class:Gem>#register_default_spec> FuncId(665). [:00002] %2 = %1.start_with?(%2) [String] FuncId(195) caused by :__version_guard
  6. Bytecode (Virtual Machine instruction) (diagram)
     - Each instruction: 8 bytes (opcode, operand) + 8 bytes (trace info)
     - ADD_RR: 0xaa %dst %lhs %rhs | ClassId(lhs) ClassId(rhs)
     - METHOD_CALL: 0x01 %dst CallsiteId | cached FuncId
     - 0x82 %rcv %args pos | cached ClassId, cached version
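
As a rough illustration of a fixed-width instruction that carries its own trace info, the sketch below packs an ADD_RR-like instruction into 16 bytes with Ruby's `Array#pack`. Only the "8 bytes + 8 bytes" split, the opcode `0xaa`, and the field names come from the slide; the exact byte offsets, the padding byte, and the helper names (`encode_add_rr`, `record_trace`, `decode`) are assumptions made for this example and may not match monoruby's real encoding.

```ruby
# Assumed layout of one 16-byte ADD_RR instruction (offsets are guesses):
#   byte  0      opcode (0xaa)
#   byte  1      padding
#   bytes 2-7    %dst, %lhs, %rhs as 16-bit register numbers
#   bytes 8-15   trace info: ClassId(lhs), ClassId(rhs) as 32-bit ids
ADD_RR = 0xaa
LAYOUT = "CCS<S<S<L<L<"

# Build an instruction whose trace half is still empty (all zero).
def encode_add_rr(dst, lhs, rhs)
  [ADD_RR, 0, dst, lhs, rhs, 0, 0].pack(LAYOUT)
end

# Return a copy whose trace half records the operand classes that were seen.
def record_trace(insn, lhs_class_id, rhs_class_id)
  insn[0, 8] + [lhs_class_id, rhs_class_id].pack("L<L<")
end

def decode(insn)
  opcode, _pad, dst, lhs, rhs, lhs_class, rhs_class = insn.unpack(LAYOUT)
  { opcode: opcode, dst: dst, lhs: lhs, rhs: rhs, trace: [lhs_class, rhs_class] }
end

# %2 = %1 + %1, observed with class id 6 on both sides
# (6 is the value the interpreter listing on slide 8 stores into the trace words).
insn = record_trace(encode_add_rr(2, 1, 1), 6, 6)
decoded = decode(insn)
```

Keeping the trace info next to the operands means the compiler can later read the observed classes straight out of the instruction stream.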
  7. Stack frame (diagram)
     - Control frame: return addr, prev rbp, prev cfp, lfp
     - Local frame: outer, meta, block, %0 (self), %1, %2, %3
     - Links between frames: prev cfp, outer lfp
     - Example call chain: method a, block b, method c
  8. Interpreter and JIT code (interpreter --compile--> JIT code, JIT code --deoptimize--> interpreter)

     Interpreter (fetch & dispatch, execute):

         movsx  rsi,WORD PTR [r13-0x10]
         movzx  rdi,WORD PTR [r13-0xe]
         movzx  r15,WORD PTR [r13-0xc]
         neg    rdi
         mov    rdi,QWORD PTR [r14+rdi*8-0x30]
         neg    rsi
         mov    rsi,QWORD PTR [r14+rsi*8-0x30]
         neg    r15
         lea    r15,[r14+r15*8-0x30]
         test   rdi,0x1
         je     slow_path
         test   rsi,0x1
         je     slow_path
         mov    DWORD PTR [r13-0x8],0x6
         mov    DWORD PTR [r13-0x4],0x6
         mov    rax,rdi
         sub    al,0x1
         add    rax,rsi
         jo     slow_path
         mov    QWORD PTR [r15],rax
         movabs r15,0x561fe2169000
         movzx  rax,BYTE PTR [r13+0x6]
         add    r13,0x10
         jmp    QWORD PTR [r15+rax*8]

     JIT code (execute, deoptimize):

         mov    rdi,QWORD PTR [r14-0x38]
         test   rdi,0x1
         je     deopt
         mov    rsi,QWORD PTR [r14-0x40]
         test   rsi,0x1
         je     deopt
         sub    rdi,0x1
         add    rdi,rsi
         jo     deopt
         mov    r15,rdi
       deopt:
         mov    r13, (pc)
         jmp    interpreter
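
The JIT listing above can be read as "check both operands are small integers, add, bail out on overflow". The Ruby sketch below simulates that logic on tagged 64-bit words. It assumes that a small integer `n` is stored as `(n << 1) | 1`, which is what the `test ..., 0x1` guards suggest; the helper names (`tag`, `untag`, `add_fast_path`) and the `:deopt` return value are inventions for this example.

```ruby
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# Assumed encoding of a small integer: value shifted left, lowest bit set.
def tag(n)
  (n << 1) | 1
end

def untag(word)
  word >> 1
end

# Returns the tagged sum, or :deopt when any guard fails.
def add_fast_path(lhs, rhs)
  return :deopt unless (lhs & 1) == 1   # test rdi,0x1 / je deopt
  return :deopt unless (rhs & 1) == 1   # test rsi,0x1 / je deopt
  sum = (lhs - 1) + rhs                 # sub rdi,0x1 / add rdi,rsi
  return :deopt unless sum.between?(INT64_MIN, INT64_MAX)  # jo deopt
  sum
end

ok = untag(add_fast_path(tag(3), tag(4)))            # fast path succeeds
not_integer = add_fast_path(8, tag(4))               # lowest bit clear: guard fails
overflow = add_fast_path(tag(2**62 - 1), tag(1))     # result no longer fits in 64 bits
```

Subtracting 1 from one operand before the add removes one of the two tag bits, so the sum of two tagged words is already a correctly tagged result.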
  9. Hardware resources (diagram)
     - Memory: stack slots in the local frame, %0 %1 %2 ... (self, local variables, temporaries)
     - CPU registers
       - GPRs: RAX, RDI, ..., R15
       - FPRs: xmm0, xmm1, xmm2, ... (float)
  10. “State” of register (in interpreter) (diagram)
      - Stack: %1 = 100, %2 = 10, %3 = 10.0, %4 = 3.14
      - Reg: (nothing held in CPU registers)
  11. “State” of register (in compiler) (diagram)
      - %1: GP(R15) INTEGER (100 in R15)
      - %2: Stack VALUE (10 in its stack slot)
      - %3: FP(XMM3) FLOAT (10.0 in xmm3)
      - %4: Concrete(3.14) FLOAT
      - Registers shown: GPRs R15; FPRs xmm2, xmm3, xmm4
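
One way to picture this compile-time "state" is as a table from virtual register to its current location and type. The sketch below models the four cases named on the slide and derives what would have to be written back so that every stack slot is valid again (for example before returning to the interpreter). The `Slot` struct, the `write_back_plan` helper, and the idea of printing a textual plan are hypothetical; they are my reading of the diagram, not monoruby's data structures.

```ruby
# Hypothetical compile-time description of where each virtual register lives.
Slot = Struct.new(:kind, :where, :type)

STATE = {
  1 => Slot.new(:gp, "R15", :integer),      # %1: GP(R15) INTEGER
  2 => Slot.new(:stack, nil, :value),       # %2: Stack VALUE
  3 => Slot.new(:fp, "XMM3", :float),       # %3: FP(XMM3) FLOAT
  4 => Slot.new(:concrete, 3.14, :float),   # %4: Concrete(3.14) FLOAT
}

# List the actions needed to make every stack slot hold an up-to-date value.
def write_back_plan(state)
  state.filter_map do |reg, slot|
    case slot.kind
    when :stack    then nil  # already in memory, nothing to do
    when :gp       then "store #{slot.where} into %#{reg}"
    when :fp       then "box #{slot.where} as Float and store into %#{reg}"
    when :concrete then "store constant #{slot.where} into %#{reg}"
    end
  end
end

plan = write_back_plan(STATE)
```

A register that is already on the stack costs nothing, which is why tracking the state precisely lets the compiler skip redundant loads and stores.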
  12. Example: compiling area

          def area(r)
            r * r * 3.14
          end
          area(10)

      Bytecode:

          :00000 init_method
          :00001 %2 = %1 * %1 [INTEGER][INTEGER]
          :00002 %3 = literal[3.14]
          :00003 %2 = %2 * %3 [INTEGER][FLOAT]
          :00004 ret %2

      - Stack: %1 = 10, %2 and %3 empty
      - Reg: R15, xmm2, xmm3, xmm4 empty
  13. Bytecode: same listing as slide 12; this step handles :00001 (%2 = %1 * %1)
      - class guard %1:Integer
      - R15 := %1 * %1
      - check overflow
      - link %2 to R15

          000023: mov  rdi,QWORD PTR [r14-0x20]
          000027: mov  rsi,QWORD PTR [r14-0x20]
          00002b: test rdi,0x1
          000032: je   0xfffb572
          000038: sar  rsi,1
          00003b: sub  rdi,0x1
          00003f: imul rdi,rsi
          000043: jo   0xfffb5a2
          000049: or   rdi,0x1
          00004d: mov  r15,rdi

      - Stack: %1 = 10
      - Reg: R15 = 100; xmm2, xmm3, xmm4 empty
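
The multiply sequence on this slide uses a small trick: one operand keeps its shifted form while the other is shifted back, so the product comes out already shifted and only the tag bit must be restored. The Ruby sketch below replays that arithmetic on simulated tagged words. It assumes the `(n << 1) | 1` encoding suggested by the `test rdi,0x1` guard; `tag`, `mul_fast_path`, and the `:deopt` symbol are names invented for this example.

```ruby
INT64_RANGE = (-(2**63))..(2**63 - 1)

# Assumed encoding of a small integer: value shifted left, lowest bit set.
def tag(n)
  (n << 1) | 1
end

# Returns the tagged product, or :deopt when a guard fails.
# Only lhs is checked, mirroring the listing, where both operands are %1.
def mul_fast_path(lhs, rhs)
  return :deopt unless (lhs & 1) == 1               # test rdi,0x1 / je
  product = (lhs - 1) * (rhs >> 1)                  # sub rdi,0x1 / sar rsi,1 / imul rdi,rsi
  return :deopt unless INT64_RANGE.cover?(product)  # jo (overflow check)
  product | 1                                       # or rdi,0x1
end

squared = mul_fast_path(tag(10), tag(10))
too_big = mul_fast_path(tag(2**32), tag(2**32))
```

With `lhs - 1` equal to `2 * a` and `rhs >> 1` equal to `b`, the product is `2 * a * b`, which becomes the tagged form of `a * b` once the lowest bit is set.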
  14. Bytecode: same listing as slide 12; this step handles :00002 (%3 = literal[3.14])
      - link %3 to constant 3.14
      - Stack: %1 = 10
      - Reg: R15 = 100 (%2); xmm2, xmm3, xmm4 empty
  15. Bytecode: same listing as slide 12; this step handles :00003 (%2 = %2 * %3)
      - class guard %2:Integer
      - class guard %3:Float
      - xmm2 := (Value to f64) R15
      - xmm3 := 3.14

          000050: mov      QWORD PTR [r14-0x28],r15
          000054: sar      r15,1
          000057: cvtsi2sd xmm2,r15
          00005c: movq     xmm3,QWORD PTR [rip+0x20]

      - Stack: %1 = 10, %2 = 100
      - Reg: R15 = 100; xmm2 = 100.0, xmm3 = 3.14, xmm4 empty
  16. Bytecode: same listing as slide 12; still at :00003 (%2 = %2 * %3)
      - xmm4 := xmm2 * xmm3
      - link %2 to xmm4

          000064: movq  xmm4,xmm2
          000068: mulsd xmm4,xmm3

      - Stack: %1 = 10, %2 = 100
      - Reg: R15 = 100; xmm2 = 100.0, xmm3 = 3.14, xmm4 = 314.0
  17. Bytecode: same listing as slide 12; state after :00003
      - Stack: %1 = 10, %2 = 100
      - Reg: R15 = 100; xmm2 = 100.0, xmm3 = 3.14, xmm4 = 314.0
  18. Bytecode: same listing as slide 12; this step handles :00004 (ret %2)
      - %2 := (f64 to Value) xmm4
      - return %2

          00006c: movq  xmm0,xmm4
          000070: call  0xfffcfa57
          000075: mov   QWORD PTR [r14-0x28],rax
          000079: leave
          00007a: ret

      - Stack: %1 = 10, %2 = 314.0
      - Reg: R15 = 100; xmm2 = 100.0, xmm3 = 3.14, xmm4 = 314.0
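
For reference, the method walked through on slides 12 to 18 can be run as ordinary Ruby. The sketch below shows the two operand-type combinations recorded in the trace info, Integer times Integer followed by Integer times Float. The second call with a Float argument is my own addition to show a case the `[INTEGER][INTEGER]` trace does not cover; that such a call would fail the class guards in the compiled code is an assumption based on the guards shown above.

```ruby
def area(r)
  r * r * 3.14
end

int_result = area(10)      # Integer * Integer, then Integer * Float => 314.0
float_result = area(2.0)   # Float * Float throughout
intermediate = 10 * 10     # the value held in R15 on slide 13
```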
  19. Specialization: “generic” Array#each

          class Array
            def each
              ..
              yield
              ..
            end
          end

      Callers:

          ary.each do |i| puts i end
          ary.each do |x,y| x + y end
          ary.each.flat_map

      We don’t know until runtime:
      1. which block is given (or not given)
      2. the signature of the given block
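
The three call shapes on this slide can be observed directly in Ruby. The sketch below uses `Proc#arity` to show that the two blocks have different signatures, and shows that the same element is passed whole to a one-parameter block but destructured for a two-parameter block. The variable names are made up for this example.

```ruby
ary = [[1, 2], [3, 4]]

one_param = proc { |i| i }
two_params = proc { |x, y| x + y }

arities = [one_param.arity, two_params.arity]

whole_items = ary.map(&one_param)   # each element arrives as one array
summed = ary.map(&two_params)       # each element is destructured into x and y
no_block = ary.each                 # no block given: an Enumerator comes back
```

A single generic `each` has to be prepared for all of these, which is the motivation for compiling a separate version per call site.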
  20. Specialization: “specialized” Array#each

          1000.times do |i|
            puts i
          end

          class Array
            def each
              ..
              yield
              ..
            end
          end

      - The method and the given block (do |i| puts i end) are compiled at one time
  21. Array#each: Ruby source and bytecode

          class Array
            def each
              return self.to_enum(:each) if !block_given?
              i = 0
              while i < self.size
                yield self[i]
                i += 1
              end
              self
            end
          end

      Bytecode of <Array#each>:

          BB0
          :00000 [02] init_method reg:3 arg:0 stack_offset:7
          :00001 [04] %3 = %0.block_given?() [Array] FuncId(74)
          :00003 [04] %2 = %3
          :00004 [03] %2 = !%2
          :00005 [02] condnotbr %2 => BB2
          BB1
          :00006 [03] %2 = :each
          :00007 [03] %2 = %0.to_enum(%2) [<INVALID>] -
          :00009 [02] ret %2
          BB2
          :00010 [02] %1 = 0: i32
          BB3
          :00011 [02] loop_start counter=0 jit-addr=0x0
          :00012 [03] %2 = %0.size() [Array] FuncId(249)
          :00014 [03] _%2 = %1 < %2 [Integer][Integer]
          :00015 [02] condnotbr _%2 => BB5
          BB4
          :00016 [03] %2 = %0.[%1] [Array][Integer]
          :00017 [03] _ = yield(%2)
          :00019 [02] %1 = %1 + 1: i16 [Integer][Integer]
          :00020 [02] br => BB3
          BB5
          :00021 [02] loop_end
          BB6
          :00022 [02] ret %0
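
The Ruby source on this slide reopens `Array`. To try the same control flow without touching the core class, the sketch below puts it in a standalone module function. The module name `EachSketch` and passing the array as an explicit argument are changes made for this example; the loop itself follows the slide.

```ruby
# Standalone copy of the slide's Array#each logic.
# `EachSketch` is a hypothetical name; the receiver becomes the `ary` argument.
module EachSketch
  def self.each(ary)
    return ary.to_enum(:each) unless block_given?
    i = 0
    while i < ary.size
      yield ary[i]
      i += 1
    end
    ary
  end
end

doubled = []
returned = EachSketch.each([1, 2, 3]) { |x| doubled << x * 2 }
enum = EachSketch.each([1, 2, 3])
```

The early return corresponds to BB1 in the bytecode, and the `while` loop to BB3 and BB4.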
  22. Bytecode of Array#each

          BB 0
          :00000 init_method
          :00001 %3 = %0.block_given?() [Array] FuncId(74)
          :00003 %2 = %3
          :00004 %2 = !%2
          :00005 condnotbr %2 => BB2
          BB 1
          :00006 %2 = :each
          :00007 %2 = %0.to_enum(%2)
          :00009 ret %2
          BB 2
          :00010 %1 = 0: i32
          BB 3
          :00011 loop_start
          :00012 %2 = %0.size() [Array] FuncId(249)
          :00014 _%2 = %1 < %2 [Integer][Integer]
          :00015 condnotbr _%2 => BB5
          BB 4
          :00016 %2 = %0.[%1] [Array][Integer]
          :00017 _ = yield(%2)
          :00019 %1 = %1 + 1: i16 [Integer][Integer]
          :00020 br => BB3
          BB 5
          :00021 loop_end
          BB 6
          :00022 ret %0
  23. Bytecode of Array#each, entry part (BB 0 to BB 2)

          BB 0
          :00000 init_method
          :00001 %3 = %0.block_given?() [Array] FuncId(74)
          :00003 %2 = %3
          :00004 %2 = !%2
          :00005 condnotbr %2 => BB2
          BB 1
          :00006 %2 = :each
          :00007 %2 = %0.to_enum(%2)
          :00009 ret %2
          BB 2
          :00010 %1 = 0: i32
  24. Kernel#block_given?: inline (dynamic) assembly

          fn kernel_block_given(...) -> bool {
              let dst = callsite.dst;
              if let Some(true) = jitctx.has_block() {
                  // If we know the block is given, no code is generated (state change only).
                  if let Some(dst) = dst {
                      bb.def_concrete_value(dst, Value::bool(true));
                  }
              } else {
                  // If not, we can generate asm directly.
                  ir.inline(|gen, _, _| {
                      let exit = gen.jit.label();
                      monoasm! { &mut gen.jit,
                          movq rax, (FALSE_VALUE);
                          movq rdi, [r14 - (LFP_BLOCK)];
                          testq rdi, rdi;
                          jz exit;
                          cmpq rdi, (NIL_VALUE);
                          jeq exit;
                          movq rax, (TRUE_VALUE);
                      exit:
                      }
                  });
                  bb.rax2acc(ir, dst);
              }
              true
          }
  25. Bytecode of Array#each, entry part, in the specialized context

          BB 0
          :00000 init_method
          :00001 %3 = %0.block_given?() [Array] FuncId(74)
          :00003 %2 = %3
          :00004 %2 = !%2
          :00005 condnotbr %2 => BB2
          BB 1
          :00006 %2 = :each
          :00007 %2 = %0.to_enum(%2)
          :00009 ret %2
          BB 2
          :00010 %1 = 0: i32

      - In this specialized context, block_given? is always true
      - We can remove this branch
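
The branch removal described here is ordinary constant propagation over the entry block. The Ruby sketch below walks a hand-written copy of the four BB 0 instructions and decides where `condnotbr` goes once the presence of a block is known. The instruction tuples, the register symbols, and the function `fold_entry_block` are all invented for this illustration; the real compiler works on its own IR, as the Rust code on slide 24 shows.

```ruby
# Hand-written stand-in for BB 0 of Array#each (names are hypothetical).
BB0 = [
  [:block_given, :r3],        # %3 = %0.block_given?()
  [:mov, :r2, :r3],           # %2 = %3
  [:not, :r2, :r2],           # %2 = !%2
  [:condnotbr, :r2, :BB2],    # jump to BB2 when %2 is falsy
].freeze

# has_block: true or false when known at compile time, nil when unknown.
# Returns the branch target, :fallthrough, or :unknown.
def fold_entry_block(insns, has_block)
  known = {}
  insns.each do |op, a, b|
    case op
    when :block_given
      if has_block.nil? then known.delete(a) else known[a] = has_block end
    when :mov
      if known.key?(b) then known[a] = known[b] else known.delete(a) end
    when :not
      if known.key?(b) then known[a] = !known[b] else known.delete(a) end
    when :condnotbr
      return :unknown unless known.key?(a)
      return known[a] ? :fallthrough : b
    end
  end
  :fallthrough
end

with_block = fold_entry_block(BB0, true)
without_block = fold_entry_block(BB0, false)
unknown_block = fold_entry_block(BB0, nil)
```

When the block is known to be present the branch always goes to BB2, so BB 1 (the `to_enum` path) is dead code in that specialized version.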
  26. Bytecode of Array#each in the specialized context
      - Same listing as slide 22 (BB0 to BB6)
      - Two of the instructions are marked "asm inlined"
  27. Generally, yield is slow
      - We cannot know which block is given
      - We cannot know the signature of the callee
      - We must use an indirect branch

          BB 4
          :00016 %2 = %0.[%1]
          :00017 _ = yield(%2)
          :00019 %1 = %1 + 1: i16
          :00020 br => BB3

      - In this specialized context, we know the signature of the callee block at compile time
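
To show what knowing the callee block buys, the sketch below writes the same computation twice in Ruby: once through a generic `each` that yields to an arbitrary block, and once with the loop and the block body merged by hand. This is only a source-level analogy for what a specializing compiler could do; `sum_generic` and `sum_specialized` are hypothetical names, and the example makes no claim about the code monoruby actually emits.

```ruby
# Generic version: Array#each yields to whatever block it receives.
def sum_generic(ary)
  total = 0
  ary.each { |x| total += x }
  total
end

# Hand-specialized version: the loop of `each` and the body of the block
# are merged, so no block call remains.
def sum_specialized(ary)
  total = 0
  i = 0
  while i < ary.size
    total += ary[i]
    i += 1
  end
  total
end

generic = sum_generic([1, 2, 3, 4])
specialized = sum_specialized([1, 2, 3, 4])
```

Both versions compute the same result; the second one simply has no call through an unknown block in its loop body.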