Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Make Regexp#match much faster

Make Regexp#match much faster

TSUYUSATO Kitsune

May 11, 2023
Tweet

More Decks by TSUYUSATO Kitsune

Other Decks in Programming

Transcript

  1. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Hiroya Fujinami ‣ Ph.D. student at SOKENDAI (NII) ‣ t @make_now_just g @makenowjust ‣ Ruby committer since 2022/12 - I contributed for speed up Regexp#match in CRuby ⚡ at Cookpad Inc. internship 
  2. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Agenda ‣ Regexp is powerful 
 Powerful features and applications in Ruby's Regexp ‣ ReDoS — a vulnerability about Regexp 
 What a vulnerability is ReDoS (Regular Expression Denial of Service)? ‣ How to speed up Regexp matching 
 The optimization by memoization makes Regexp matching faster. ‣ Future work 
 Talking about feature vision on Regexp matching 
  3. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Regexp is powerful 
 Powerful features and applications in Ruby's Regexp 
  4. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Ruby's Regexp is powerful! ‣ Ruby's Regexp has some rich features: - multiple character encoding support, - look-around operators, atomic groups, conditional branches, absent operator, - subexpression calls, and back-references (irregular extensions). → These features help our developments and hobbies.  (?=foo) (?!foo) (?<=foo) (?<!foo) (?>foo) (?(x)foo|bar) look-around operators atomic group conditional branches (?<x>foo){0}\g<x> subexpression call (?<x>foo*)\k<x> back-reference (?~foo) absent operator
  5. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan How powerful is it? ‣ The following Regexp computes (veri fi es) a sum equation of two binary digits. - e.g. "01+01=10" and "110+001=111" "01+01=00"  /(?<s>[01](?=(?:(?<l0>(?<=0))|(?<l1>(?<=1)))[01]*\+(?<r>(?:\k<r-1>|(?!\k<r-1>)) [01])(?:(?<r0>(?<=0))|(?<r1>(?<=1)))[01]*=(?<a>(?:\k<a-1>|(?!\k<a-1>))[01])(?:(? <a0>(?<=0))|(?<a1>(?<=1)))[01]*\z(?:(?<c>\k<l0+0>\k<r0+0>\k<a1+0>| \k<l1+0>\k<r0+0>\k<a0+0>|\k<l0+0>\k<r1+0>\k<a0+0>|\k<l1+0>\k<r1+0>\k<a1+0>)| \k<l0+0>\k<r0+0>\k<a0+0>|\k<l1+0>\k<r0+0>\k<a1+0>|\k<l0+0>\k<r1+0>\k<a1+0>| \k<l1+0>\k<r1+0>\k<a0+0>)(?:\k<l0+0>(?:\k<r0+0>(?:\k<a0+0>(?!\k<c-1>)|\k<a1+0>(?! \k<c-1>))|\k<r1+0>(?:\k<a0+0>\k<c-1>|\k<a1+0>(?!\k<c-1>)))|\k<l1+0>(?:\k<r0+0>(?: \k<a0+0>\k<c-1>|\k<a1+0>(?!\k<c-1>))|\k<r1+0>(?:\k<a0+0>\k<c-1>| \k<a1+0>\k<c-1>))))\g<s>|(?!\k<c-1>)\+\k<r-1>=\k<a-1>){0}\A\g<s>\z/ Theoretically, this language (a set of strings) is neither regular nor context-free.
  6. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan How powerful is it? ‣ A brainf*ck implementation in Ruby's Regexp exists. - https://github.com/shinh/hack/blob/master/bf_rb_reg/bf.rb - BF_REG =~ bf + BF_SUFFIX executes a bf program on Regexp matching 
 (loops may stop after 256 iterations.) ‣ Conclusion → ultimately powerful! 
  7. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan With great power 
 comes great responsibility. 
  8. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan ReDoS — a vulnerability about Regexp 
 What a vulnerability is ReDoS (Regular Expression Denial of Service)? 
  9. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan ReDoS (Regular Expression Denial of Service) ‣ A vulnerability about Regexp matching. ‣ The time taken for Regexp matching can be explosive. - time ruby -e '/\A(a|a)*\z/ =~ "a" * 30 + "b"' 
 30.99s user 0.09s system 99% cpu 31.188 total - The Regexp /\A(a|a)*\z/ takes an exponential matching time 
 against such an input string "a" * n + "b". 
  10. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan What a problem is Regexp matching explosion? One scenario: ‣ A web app uses /\A[^@]+@[^@]+([.][^@]+)+\z/ 
 for validating E-mail address. ‣ Then, one user sends "a@" + "." * 50 + "@" to this web app. ‣ For processing this request, web app takes some minutes 
 because the Regexp matching time is explosive. ‣ Other users see a loading window or 500 Internal Server Error ¯\_(π)_/¯. → ReDoS gives a bad experience to users and hurts a business opportunity. 
  11. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan ReDoS is a threat in the real world. ‣ For example, Cloud fl are was down for 27 minutes due to ReDoS. 
 https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/ ‣ Recently, many ReDoS vulnerabilities in Ruby gems are reported. - CVE-2022-24836 (Nokogiri), CVE-2022-30122 (Rack), etc. 
 (See details in Japanese, https://zenn.dev/ooooooo_q/articles/ruby_3_2_redos) ‣ ReDoS vulnerabilities in Ruby's core libraries are also reported. → We need to prevent ReDoS in Ruby itself. 
  12. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan How to prevent ReDoS ‣ Some regexes cause ReDoS and some do not. ‣ ReDoS can be prevented if we are careful...? ‣ One way: using a ReDoS detection tool. - https://github.com/makenowjust-labs/recheck ‣ Another way: improving Regexp matching implementation (today's talk). - Ruby 3.2+ has the improved Regexp implementation that prevents ReDoS. 
  13. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Improvement result (1) polynomial case  https://www.ruby-lang.org/ja/news/2022/12/25/ruby-3-2-0-released/
  14. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Improvement result (2) exponential case  https://www.ruby-lang.org/ja/news/2022/12/25/ruby-3-2-0-released/
  15. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan How to speed up Regexp matching ⚡ The optimization by memoization makes Regexp matching faster. 
  16. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan VM-based Regexp engine ‣ CRuby's Regexp engine (Onigmo) is VM-based. ‣ VM-based means - Regexp is compiled into byte codes, and - byte code execution (matching) uses backtracking. ‣ push @label pushes a @label and the current position 
 into the stack for backtracking.  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end /\A(?:a|a)*\z/ Regexp byte codes compile
  17. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Memoization ‣ Memoization is a technique that records states and their results, 
 and omits computations for the same state. ‣ On the VM-based matching, a pair of the current PC (program counter) 
 and the position on the input string is recorded as a state. - If the same state is reached again, it is not necessary to record the result, 
 since the matching has failed once from that state. ‣ Starting on the next slide, I will explain how VM-based matching works. 
  18. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Example: VM-based Regexp matching (1) ‣ Matching against the string "aab" is started. ‣ Initially, the position is 0, and the stack is empty.  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes 0 Position Stack []
  19. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Example: VM-based Regexp matching (2) ‣ The position is 0, then begin-buf (\A) test is passed.  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes 0 Position Stack []
  20. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Example: VM-based Regexp matching (3) ‣ push @loop-end and push @branch push 
 pairs of labels and the current position into the stack.  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes 0 Position Stack [[@loop-end, 0], [@branch, 0]]
  21. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Example: VM-based Regexp matching (4)  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes 1 Position Stack [[@loop-end, 0], [@branch, 0]] ‣ The position 0 points a character 'a', then exact1 'a' test is passed. ‣ Further, exact1 'a' advances the position. 0
  22. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Example: VM-based Regexp matching (5) ‣ jump @branch-end and jump @loop-begin update the PC (program counter). ‣ Then, the PC is @loop-begin (push @loop-end).  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes 1 Position Stack [[@loop-end, 0], [@branch, 0]]
  23. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Example: VM-based Regexp matching (6) ‣ push @loop-end and push @branch push 
 pairs of labels and the current position into the stack.  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes 1 Position Stack [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@branch, 1]]
  24. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Example: VM-based Regexp matching (7)  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes 2 Position Stack [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@branch, 1]] ‣ The position 1 points a character 'a', then exact1 'a' test is passed. ‣ Further, exact1 'a' advances the position. 1
  25. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Example: VM-based Regexp matching (8) ‣ jump @branch-end and jump @loop-begin update the PC. ‣ Then, the PC is @loop-begin (push @loop-end).  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes Position Stack [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@branch, 1]] 2
  26. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Example: VM-based Regexp matching (9) ‣ push @loop-end and push @branch push 
 pairs of labels and the current position into the stack.  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes Position Stack [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@branch, 1], [@loop-end, 2], [@branch, 2]] 2
  27. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Example: VM-based Regexp matching (10)  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes Position Stack [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@branch, 1], [@loop-end, 2]] 2 Backtrack [@branch, 2] pop! ‣ The position 2 points a character 'b', then exact1 'a' test is failed. ‣ VM pops the label and the position from the stack 
 and does backtrack.
  28. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan ‣ The position 2 points a character 'b', then exact1 'a' test is failed. ‣ VM pops the label and the position from the stack 
 and does backtrack. Example: VM-based Regexp matching (11)  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes Position Stack [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@branch, 1]] 2 Backtrack [@loop-end, 2] pop!
  29. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan ‣ The position 2 is not the end position, then end-buf (\z) test is failed. ‣ VM pops the label and the position from the stack 
 and does backtrack. Example: VM-based Regexp matching (12)  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes Position Stack [[@loop-end, 0], [@branch, 0], [@loop-end, 1]] 1 Backtrack [@branch, 1] pop! 2
  30. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Example: VM-based Regexp matching (13)  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes 2 Position Stack [[@loop-end, 0], [@branch, 0], [@loop-end, 1]] ‣ The position 1 points a character 'a', then exact1 'a' test is passed. ‣ Further, exact1 'a' advances the position. 1
  31. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Example: VM-based Regexp matching (14) ‣ jump @loop-begin updates the PC. ‣ Then, the PC is @loop-begin (push @loop-end).  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes 2 Position Stack [[@loop-end, 0], [@branch, 0], [@loop-end, 1]]
  32. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Example: VM-based Regexp matching (15) ‣ push @loop-end and push @branch push 
 pairs of labels and the current position into the stack. ‣ But, we have already reached this situation at (9). 
 → Memoization  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes 2 Position Stack [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@loop-end, 2], 
 [@branch, 2]] (9)
  33. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Memoization ‣ In the previous example, (9) and (15) look similar situation except for the stack. - Both PCs are @loop-begin (push @loop-end), and positions are 2. ‣ That is, we already know the matching will be failed from (15). ‣ By recording the PC and the position pairs once reached, 
 we can reduce unnecessary backtracking. → Let's introduce memoization!  (9) (15)
  34. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan ‣ [push @loop-end, 2] is already reached (memoized) at (9). ‣ Thus, immediately VM does backtrack. ‣ After this, many backtracks are omitted 
 by memoization, and the matching fails. ‣ Details are described in this article (in Japanese). 
 https://techlife.cookpad.com/entry/2022/12/12/162023 After introducing memoization  begin-buf @loop-begin 
 push @loop-end push @branch exact1 'a' jump @branch-end @branch exact1 'a' @branch-end jump @loop-begin @loop-end end-buf end "aab" Input byte codes 1 Position Stack [[@loop-end, 0], [@branch, 0]] Backtrack [@loop-end, 1] pop! 2
  35. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Theoretical background of memoization ‣ [David et al, SP '21] studied memoization for Regexp matching. ‣ According to the paper, 
 memoization makes matchings linear time against input string length . - Matching by backtracking may be exponential or polynomial time. - Thus, linear time is much faster. O(n) n 
  36. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Pros of memoization ‣ Memoization is implemented to the original VM directly. ‣ Therefore, - we can guarantee high-level backward compatibility, and - the optimization can be implemented with fewer modi fi cations. ‣ Actual PR: https://github.com/ruby/ruby/pull/6486 - Diff +744 -12 
  37. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Cons of memoization (1): Memory consumption ‣ Memory consumption is greater for memoization than not using memoization. - A memoization table is a bit-array, and the memory consumption 
 will increase by (string length x number of branch instructions) / 8. • Typically, the number of branch instructions is about 80 at most. - Allocating memoization table is delayed until the number of backtracks 
 exceeds a certain number. ‣ Thus, we conclude memory consumption is not a big problem. 
  38. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Cons of memoization (2): Unsupported features ‣ Memoization is not enabled when the following features are used. - look-around operators, atomic groups, conditional branches, absent operator, - subexpression calls, back-references, etc. ‣ There are also strange limitations due to the implementation. - Nested ranged repetitions are not supported. - Nested null loops are not supported too. ‣ Not all Regexps can be optimized by memoization. 
  39. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Regexp.linear_time? ‣ Ruby 3.2 also introduced Regexp.linear_time?. ‣ Regexp.linear_time?(re) checks 
 whether the given re can be optimized by memoization or not. - e.g. Regexp.linear_time?(/^(a|a)*$/) #=> true 
 e.g. Regexp.linear_time?(/^(a*)\1*$/) #=> false 
  40. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Future work 
 Talking about feature vision on Regexp matching 
  41. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Future work ‣ Showing a warning if memoization is not enabled. - Adding a new Rubocop rule. - Adding a new warning to Ruby. ‣ Introducing a new backtrack-less Regexp engine. → DFA-based Regexp engine 
  42. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan DFA-based Regexp engine ‣ DFA (Deterministic Finite-state Automaton) ‣ This can be done in linear time for matching. ‣ Go and Rust (modern languages) use such Regexp engines. ‣ Irregular extensions (subexpression calls, back-references) cannot be supported. ‣ We would like to use a DFA-based engine in Ruby if possible. 
  43. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Implementation plan ‣ We would reuse existing components as much as possible for backward compatibility. - Regexp parser, character class inclusion test, ignore-case expansion, etc.  (Just a personal vision...) Compiler Matcher Parser ignore-case 
 expansion char-class 
 inclusion DFA-based 
 Matcher
  44. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Implementation plan ‣ The recent paper [Moseley et al, PLDI '23] uses complex data structures. - It seems hard to implement it in C. ‣ Therefore, we would like to be able to implement a Regexp engine in Ruby. - It is getting ready to write fast code in Ruby (e.g. RJIT). ‣ To-do for implementing a Regexp engine in Ruby - Expose a Regexp internal APIs (parser, character-class, etc.) to Ruby. - Allow a Regexp engine to be replaced from the Ruby side.  (Just a personal vision...)
  45. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan TRegex ‣ TRegex is a Regexp matching engine used by Truf fl eRuby. ‣ It is DFA-based and JIT-enabled [Daloze and Haider, RubyKaigi '21]. ‣ However, TRegex is - implemented in Java (on GraalVM), and - not based on a CRuby's Regex engine 
 (Thus, a backward compatibility issue may exist...?) 
  46. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan Summary of today's talk ‣ ReDoS is a vulnerability about Regexp matching. ‣ To prevent ReDoS, the optimization by memoization is 
 introduced by makenowjust (me). ‣ Memoization speeds up matchings in many cases, 
 but in some cases memoization is not enabled. - Some extensions (look-around, back-references, etc.) are not supported. - There are some limitations (nested null loops, etc.) due to the implementation. ‣ Next: a new DFA-based Regexp engine written in Ruby...?  Thank you!
  47. Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi

    2023 at Matsumoto, Japan References ‣ Davis, James C., Francisco Servant, and Dongyoon Lee. "Using selective memoization to defeat regular expression denial of service (ReDoS)." 2021 IEEE symposium on security and privacy (SP). IEEE, 2021. ‣ Dan Moseley, Mario Nishio, Jose Perez Rodriguez, Olli Saarikivi, Stephen Toub, Margus Veanes, Tiki Wan, Eric Xu. "Derivative Based Nonbacktracking Real-World Regex Matching with Backtracking Semantics" Proceeding of ACM SIGPLAN 2023 conference on Programming Language Design and Implementation. 2023. ‣ Benoit Daloze, and Josef Haider. "Just-in-Time Compiling Ruby Regexps on Truf fl eRuby" RubyKaigi 2021. 2021.