Upgrade to Pro — share decks privately, control downloads, hide ads and more …

On sending methods

Avatar for Urabe Shyouhei Urabe Shyouhei
September 04, 2020

On sending methods

Avatar for Urabe Shyouhei

Urabe Shyouhei

September 04, 2020
Tweet

More Decks by Urabe Shyouhei

Other Decks in Technology

Transcript

  1. About myself • Urabe, Shyouhei • @shyouhei • Ruby committer

    @ Money Forward, Inc. • 2020’s #2 most active `ruby/ruby` committer, according to GitHub.
  2. Samples: 39K of event 'cycles', Event count (approx.): 499087050327 Overhead

    Command Shared Object Symbol 24.91% thread_pool.rb* ruby [.] vm_exec_core 2.90% thread_pool.rb* ruby [.] vm_call_iseq_setup 2.59% thread_pool.rb* ruby [.] method_entry_get 2.52% thread_pool.rb* ruby [.] vm_call_cfunc_with_frame.isra.0 2.48% thread_pool.rb* ruby [.] st_lookup 2.47% thread_pool.rb* ruby [.] rb_id_table_lookup 2.15% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_0params_0local 1.82% thread_pool.rb* ruby [.] rb_wb_protected_newobj_of 1.71% thread_pool.rb* ruby [.] find_table_entry_ind 1.70% thread_pool.rb* ruby [.] gc_sweep_step 1.47% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_1params_1local 1.30% thread_pool.rb* ruby [.] find_entry 1.17% thread_pool.rb* ruby [.] setup_parameters_complex 1.11% thread_pool.rb* libc-2.31.so [.] malloc 1.03% thread_pool.rb* ruby [.] rgengc_check_relation.part.0 0.93% thread_pool.rb* ruby [.] rb_obj_is_kind_of 0.92% thread_pool.rb* libc-2.31.so [.] malloc_usable_size 0.90% thread_pool.rb* ruby [.] ruby_sip_hash13 0.89% thread_pool.rb* ruby [.] gc_mark_children 0.89% thread_pool.rb* ruby [.] vm_call_method_each_type 0.81% thread_pool.rb* ruby [.] vm_search_method.isra.0 0.75% thread_pool.rb* ruby [.] vm_call_ivar 0.67% thread_pool.rb* ruby [.] match_at
  3. So what is vm_call_iseq_setup? zsh % global -xr vm_call_iseq_setup vm_call_iseq_setup

    134 vm_eval.c vm_call_iseq_setup(ec, reg_cfp, calli vm_call_iseq_setup 2757 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_iseq_setup, T vm_call_iseq_setup 2758 vm_insnhelper.c return vm_call_iseq_setup(ec, cfp, callin zsh %
  4. So what is vm_call_iseq_setup? • It sets up VM stack

    frame, right at the moment when a method is about to run. • This function is not designed to be called frequently. • Normal method calls must skip it, because there are inline method caches.
  5. Inline method cache == disasm: #<ISeq:foo@-e:1 (1,0)-(1,21)> (catch: FALSE) 0000

    putself ( 1)[LiCa 0001 putself 0002 opt_send_without_block <calldata!mid:pp, argc:1, FCALL|ARGS_SIMPLE> 0004 leave [Re]
  6. zsh % global -xr CC_SET_FASTPATH CC_SET_FASTPATH 1635 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_iseq_setup_func(ci,

    param_size, CC_SET_FASTPATH 2239 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_iseq_setup, TRUE); CC_SET_FASTPATH 2244 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_cfunc, TRUE); CC_SET_FASTPATH 2251 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_attrset, !((ci->flag & VM_CALL_AR CC_SET_FASTPATH 2258 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_ivar, !(ci->flag & VM_CALL_ARGS_ CC_SET_FASTPATH 2263 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_method_missing, TRUE); CC_SET_FASTPATH 2267 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_bmethod, TRUE); CC_SET_FASTPATH 2278 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_opt_send, TRUE); CC_SET_FASTPATH 2281 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_opt_call, TRUE); CC_SET_FASTPATH 2284 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_opt_block_call, TRUE); CC_SET_FASTPATH 2361 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_method_missing, TRUE); CC_SET_FASTPATH 2382 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_method_missing, TRUE); CC_SET_FASTPATH 2492 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_method_missing, TRUE); CC_SET_FASTPATH 2497 vm_insnhelper.c CC_SET_FASTPATH(cc, vm_call_super_method, TRUE); What is cc->call() ?
  7. In short… • `vm_call_iseq_setup` is called when inline method cache

    mishits. • This function to show up in the #2 most frequent operation in a profiler output means the cache mishits very frequently. • This is of course not by design!
  8. Cache hit rate • Recap `vm_search_method`, where you see there

    already are cache hit rate counters. • When enabled, we get this:
  9. [RUBY_DEBUG_COUNTER] 42122 killed by signal. [RUBY_DEBUG_COUNTER] mc_inline_hit 928210882 [RUBY_DEBUG_COUNTER] mc_inline_miss

    93975080 [RUBY_DEBUG_COUNTER] mc_global_hit 226048406 [RUBY_DEBUG_COUNTER] mc_global_miss 24044279 [RUBY_DEBUG_COUNTER] mc_global_state_miss 9246312 [RUBY_DEBUG_COUNTER] mc_class_serial_miss 84728768 [RUBY_DEBUG_COUNTER] mc_cme_complement 40695315 [RUBY_DEBUG_COUNTER] mc_cme_complement_hit 40690629 [RUBY_DEBUG_COUNTER] mc_search_super 242234049 [RUBY_DEBUG_COUNTER] mc_miss_by_nome 35088 [RUBY_DEBUG_COUNTER] mc_miss_by_distinct 31311981 [RUBY_DEBUG_COUNTER] mc_miss_by_refine 0 [RUBY_DEBUG_COUNTER] mc_miss_by_visi 1002898 [RUBY_DEBUG_COUNTER] mc_miss_spurious 61625113 [RUBY_DEBUG_COUNTER] ccf_general 37693318 [RUBY_DEBUG_COUNTER] ccf_iseq_setup 105021309
  10. Cache hit rate • Recap `vm_search_method`, where you see there

    already are cache hit rate counters. • When enabled, we get this: 928,210,882 928,210,882 + 93,975,080 ≅ 90.80%
  11. Deep dive into cache mishits • Now that we know

    the cache mechanism is somehow not working… • In order to fix it we have to know what exactly is going on when a cache mishits.
  12. zsh % LC_ALL=C bundle exec script/bench.rb 2>&1 | sort |

    uniq -c | sort -nr 14894134 cache eviction: ActiveModel::AttributeSet#fetch_value is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:39 4139225 cache eviction: Concurrent::Map#[] is called from concurrent-ruby-1.1.6/lib/concurrent-ruby/concurrent/map.rb:133 3912799 cache eviction: Topic#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:17 3329993 cache eviction: User#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/primary_key.rb:19 1739540 cache eviction: User#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:17 1722911 cache eviction: ActionView::OutputBuffer#<< is called from actionview-6.0.3.1/lib/action_view/buffers.rb:29 1670132 cache eviction: Module#site_id_lookup is called from message_bus-3.3.1/lib/message_bus/distributed_cache.rb:154 1285677 cache eviction: ActiveRecord::Type::HashLookupTypeMap#fetch is called from activerecord-6.0.3.1/lib/active_record/connection_adapters/postgresql_adapte 1158602 cache eviction: Post#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:17 1111963 cache eviction: ActionDispatch::Journey::Visitors::Each#visit is called from actionpack-6.0.3.1/lib/action_dispatch/journey/visitors.rb:162 1079094 cache eviction: NilClass#blank? is called from activesupport-6.0.3.1/lib/active_support/core_ext/object/blank.rb:26 973103 cache eviction: ActiveRecord::Reflection::ThroughReflection#options is called from activerecord-6.0.3.1/lib/active_record/reflection.rb:806 973103 cache eviction: ActiveRecord::Reflection::ThroughReflection#active_record is called from activerecord-6.0.3.1/lib/active_record/reflection.rb:806 940974 cache eviction: Arel::Visitors::PostgreSQL#visit_Arel_Attributes_Attribute is called from activerecord-6.0.3.1/lib/arel/visitors/visitor.rb:30 740641 cache eviction: Post#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/associations/belongs_to_association.rb:119 686061 cache eviction: Topic#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/primary_key.rb:19 675719 cache eviction: ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#_quote is called from activerecord-6.0.3.1/lib/active_record/connection_adapters/po 610875 cache eviction: User#respond_to? is called from activerecord-6.0.3.1/lib/active_record/attribute_methods.rb:212 609323 cache eviction: Class#type_for_attribute is called from activerecord-6.0.3.1/lib/active_record/table_metadata.rb:33 604621 cache eviction: Topic#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:32 555429 cache eviction: User#read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods.rb:323 555429 cache eviction: User#_read_attribute is called from activerecord-6.0.3.1/lib/active_record/attribute_methods/read.rb:32 544500 cache eviction: Arel::Visitors::PostgreSQL#visit_Arel_Nodes_BindParam is called from activerecord-6.0.3.1/lib/arel/visitors/visitor.rb:30 541906 cache eviction: ActiveModel::Type::Integer#cast is called from activemodel-6.0.3.1/lib/active_model/type/helpers/numeric.rb:19
  13. So? • `ActiveRecord::AttributeMethods::Read` is… • A Module. • Included from

    (`discourse`’s) `User` etc. • Which means… • Its `#read_attribute` is shared across several classes.
  14. Actual scenario 1. `User#read_attribute` is called. 1. Runs `AR::AM::Read#read_attribute` 2.

    Populates its call cache with `User#_read_attribute` 2. `Post#read_attribute` is called. 1. Runs `AR::AM::Read#read_attribute` 2. Evicts previous `User#_read_attribute` entry and populates its own `Post#_read_attribute` 3. `User#read_attribute` is called. 1. Runs `AR::AM::Read#read_attribute` 2. Evicts previous `Post#_read_attribute` entry and populates its own `User#_read_attribute`
  15. Recap the problem • Inline method cache mishits too frequently.

    • Mishits happen when a method is shared among several classes possibly due to: • Inheritance • Module inclusion
  16. How do we tackle this? • Inline call cache to

    consider class hierarchy? • NG; there are refinements. Method resolution can vary from call site to call site. Class hierarchy is not suitable for the needs. • Also, it is dead slow to scan inheritance tree every time a method is called. • Remember (class, method) tuple for each and every call site • → Way to go.
  17. The new design • An inline cache was a map

    from (class, symbol) tuple to a method body. We modified it to stores multiple classes, which share the identical method name for identical method body. • Ko1 told me that this is a variant of polymorphic inline caching. • Hölzle et al., “Optimizing dynamically-typed object-oriented languages with polymorphic inline caches” in Proceedings of ECOOP ’91. LNCS 512. https://doi.org/10.1007/BFb0057013
  18. Samples: 39K of event 'cycles', Event count (approx.): 499087050327 Overhead

    Command Shared Object Symbol 24.91% thread_pool.rb* ruby [.] vm_exec_core 2.90% thread_pool.rb* ruby [.] vm_call_iseq_setup 2.59% thread_pool.rb* ruby [.] method_entry_get 2.52% thread_pool.rb* ruby [.] vm_call_cfunc_with_frame.isra.0 2.48% thread_pool.rb* ruby [.] st_lookup 2.47% thread_pool.rb* ruby [.] rb_id_table_lookup 2.15% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_0params_0local 1.82% thread_pool.rb* ruby [.] rb_wb_protected_newobj_of 1.71% thread_pool.rb* ruby [.] find_table_entry_ind 1.70% thread_pool.rb* ruby [.] gc_sweep_step 1.47% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_1params_1local 1.30% thread_pool.rb* ruby [.] find_entry 1.17% thread_pool.rb* ruby [.] setup_parameters_complex 1.11% thread_pool.rb* libc-2.31.so [.] malloc 1.03% thread_pool.rb* ruby [.] rgengc_check_relation.part.0 0.93% thread_pool.rb* ruby [.] rb_obj_is_kind_of 0.92% thread_pool.rb* libc-2.31.so [.] malloc_usable_size 0.90% thread_pool.rb* ruby [.] ruby_sip_hash13 0.89% thread_pool.rb* ruby [.] gc_mark_children 0.89% thread_pool.rb* ruby [.] vm_call_method_each_type 0.81% thread_pool.rb* ruby [.] vm_search_method.isra.0 0.75% thread_pool.rb* ruby [.] vm_call_ivar 0.67% thread_pool.rb* ruby [.] match_at
  19. Samples: 40K of event 'cycles', Event count (approx.): 502933506043 Overhead

    Command Shared Object Symbol 25.22% thread_pool.rb* ruby [.] vm_exec_core 2.61% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_0params_0local 2.60% thread_pool.rb* ruby [.] vm_call_cfunc 2.56% thread_pool.rb* ruby [.] rb_st_lookup 2.16% thread_pool.rb* ruby [.] gc_sweep_step 2.12% thread_pool.rb* ruby [.] vm_call_iseq_setup 2.06% thread_pool.rb* ruby [.] find_table_entry_ind 2.04% thread_pool.rb* ruby [.] method_entry_get 1.61% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_0start_1params_1local 1.58% thread_pool.rb* ruby [.] rb_id_table_lookup 1.56% thread_pool.rb* libc-2.31.so [.] malloc_usable_size 0.98% thread_pool.rb* ruby [.] rb_obj_is_kind_of 0.87% thread_pool.rb* ruby [.] vm_call_ivar 0.86% thread_pool.rb* ruby [.] ruby_sip_hash13 0.76% thread_pool.rb* ruby [.] gc_mark_children 0.74% thread_pool.rb* ruby [.] rgengc_check_relation.part.0 0.71% thread_pool.rb* ruby [.] rb_call0 0.71% thread_pool.rb* ruby [.] rb_get_alloc_func 0.69% thread_pool.rb* ruby [.] vm_call_iseq_setup_normal_opt_start 0.68% thread_pool.rb* ruby [.] match_at 0.68% thread_pool.rb* ruby [.] rb_vm_search_method_slowpath 0.66% thread_pool.rb* ruby [.] rb_wb_protected_newobj_of 0.62% thread_pool.rb* ruby [.] gc_mark_ptr
  20. # Event 'cycles' # # Baseline Delta Abs Shared Object

    Symbol # ........ ......... .................. .................................................... # 2.39% -0.75% ruby [.] rb_id_table_lookup 25.90% -0.34% ruby [.] vm_exec_core 2.35% -0.27% ruby [.] method_entry_get 0.88% -0.19% ruby [.] rb_vm_search_method_slowpath 0.15% +0.17% ruby [.] check_match 1.98% +0.16% ruby [.] find_table_entry_ind 0.57% +0.15% ruby [.] rb_call0 0.80% -0.15% libc-2.31.so [.] malloc +0.14% ruby [.] vm_search_method_fastpath 0.45% +0.13% ruby [.] vm_callee_setup_arg 2.52% +0.11% ruby [.] vm_call_iseq_setup_normal_0start_0params_0locals 0.41% +0.10% ruby [.] rb_vm_exec 0.61% +0.10% ruby [.] rb_get_alloc_func 0.60% +0.09% ruby [.] match_at 0.50% +0.08% ruby [.] vm_call_method_each_type 0.80% +0.08% ruby [.] vm_call_ivar 0.50% +0.07% ruby [.] rb_vm_invoke_bmethod 0.31% -0.07% ruby [.] tbl_update 0.64% -0.07% ruby [.] setup_parameters_complex 0.62% +0.07% ruby [.] rb_wb_protected_newobj_of
  21. 0 128 256 384 512 64 80 96 112 128

    144 160 Histgram Response time [msec] [ `discourse` | `script/bench.rb` | `home` ] result before after Faster
  22. Key result • Quantitative improvements: • Cache hit rate improved

    by 90.80% → 95.06% • Response time median decreased by 70msec → 69msec • Response time mode decreased by 69msec → 67msec • Profiler no longer considers `vm_call_iseq_setup` one of the hottest path. Specialised functions to show up instead.
  23. Conclusions • Found that the way interpreter caches method resolution

    does not work well with inheritances, situations like e.g. Rails. • Fixed that problem by storing multiple keys per call site. • Results in improved cache hit rate, actual Rails performance boost.
  24. Conclusions • The changeset has already merged into 2.7. •

    You can boost your Rails app TODAY by just using newer ruby!
  25. Future? • After 2.7 was released, inline cache mechanism (not

    only what I described today) was entirely discarded then rewritten by @ko1. • This was for Ractor (then-called Guild) migration. • https://bugs.ruby-lang.org/issues/16614 • This slowed things down a bit, according to him. • Would 2.7 be the fastest lethal ruby implementation we ever made? Watch out for Ruby 3! • It might be possible to re-implement polymorphic inline caches on top of Ractor. But cannot reuse this original design.