Upgrade to Pro — share decks privately, control downloads, hide ads and more …

The Implementations of Advanced LR Parser Algor...

The Implementations of Advanced LR Parser Algorithm

Junichi Kobayashi

April 17, 2025
Tweet

More Decks by Junichi Kobayashi

Other Decks in Programming

Transcript

  1. RubyKaigi 2025 Ehime Prefectural Convention Hall 2025/04/17(Thu.) The Implementations of

    Advanced LR Parser Algorithm Junichi Kobayashi (@junk0612) ESM, Inc.
  2. 自己紹介 Junichi Kobayashi • X / GitHub: @junk0612 • Working

    at ESM, Inc. ◦ Work as a Rails engineer ◦ A Member of Parser Club • Committer of Lrama • Hobbies ◦ Parsers ◦ Rhythm games ◦ Board games ◦ Haiku
  3. Attendee @nsgc Attendee @colorbox Attendees from ESM Attendee @S.H. Karaoke

    @fugakkbn Speaker @koic Attendee @wai-doi Speaker @junk0612 Attendee @kasumi8pon Attendee @mhirata Attendee @haruguchi Attendee @maimux2x
  4. Usage $ gem install lrama # from CLI $ lrama

    -D lr.type=ielr grammar.y # in grammar file %define lr.type ielr
  5. ✦ Theoretical side ✦ Utilize the tokens read so far

    to narrow down the set of possible next tokens more accurately than LALR ✦ Practical side ✦ Propagating conflict info backward and lookahead forward ✦ Identify the conflict's cause and split states if necessary "After All, What is IELR?"
  6. ✦ Shift-Reduce Parsing ✦ One of Bottom-up Parsing ✦ Use

    2 types of actions ✦ Shift: Get a next symbol from Lexer ✦ Reduce: Collapse recognized symbols into a parent node ✦ Use one of the actions depending on the situation ✦ Match one of grammar rules: Reduce ✦ Don't match any grammar rules: Shift Basis of LR Parser
  7. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1
  8. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1
  9. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1
  10. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1 num
  11. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1 num exp
  12. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1 num exp
  13. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1 num exp
  14. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1 num exp num
  15. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1 num exp num exp
  16. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1 num exp num exp
  17. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1 num exp num exp
  18. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1 num exp num exp num
  19. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1 num exp num exp num exp
  20. Behavior Example of LR Parser exp: exp + num |

    exp * num | num num: 0 | 1 1 + 0 * 1 num exp num exp num exp
  21. ✦ Modifier conditional statements Motivating Examples statement: method_call 'if' exp

    | method_call 'unless' exp method_call: method_name | method_name '(' args ')'
  22. statement: method_call 'if' exp | method_call 'unless' exp method_call: method_name

    | method_name '(' args ')' save!(name: 'Junichi') save! if valid? save! end valid?
  23. statement: method_call 'if' exp | method_call 'unless' exp method_call: method_name

    | method_name '(' args ')' save!(name: 'Junichi') save! if valid? save! end valid?
  24. statement: method_call 'if' exp | method_call 'unless' exp method_call: method_name

    | method_name '(' args ')' save!(name: 'Junichi') save! if valid? save! end valid?
  25. statement: method_call 'if' exp | method_call 'unless' exp method_call: method_name

    | method_name '(' args ')' save!(name: 'Junichi') save! if valid? save! end valid?
  26. statement: method_call 'if' exp | method_call 'unless' exp method_call: method_name

    | method_name '(' args ')' save!(name: 'Junichi') save! if valid? save! end valid?
  27. statement: method_call 'if' exp | method_call 'unless' exp method_call: method_name

    | method_name '(' args ')' save!(name: 'Junichi') save! if valid? save! end valid?
  28. statement: method_call 'if' exp | method_call 'unless' exp method_call: method_name

    | method_name '(' args ')' save!(name: 'Junichi') save! if valid? save! end valid?
  29. statement: method_call 'if' exp | method_call 'unless' exp method_call: method_name

    | method_name '(' args ')' save!(name: 'Junichi') save! if valid? save! end valid?
  30. statement: method_call 'if' exp | method_call 'unless' exp method_call: method_name

    | method_name '(' args ')' save!(name: 'Junichi') save! if valid? save! end valid?
  31. statement: method_call 'if' exp | method_call 'unless' exp method_call: method_name

    | method_name '(' args ')' save!(name: 'Junichi') save! if valid? save! end valid?
  32. statement: method_call 'if' exp | method_call 'unless' exp method_call: method_name

    | method_name '(' args ')' save!(name: 'Junichi') save! if valid? save! end valid?
  33. ✦ The next token is... ✦ included in same (or

    child) rule → Shift ✦ included in parent (or ancestors / sibling) rule → Reduce ✦ not included in both of the above → Error Select Actions
  34. ✦ A set of tokens used to determine whether to

    Reduce ✦ LALR and IELR differ in how they compute their lookahead sets ✦ LALR: The set of all tokens that can appear immediately after a given nonterminal, based on the entire language ✦ IELR: The set of all tokens that can appear immediately after a given nonterminal, but only within the subset of language elements matching the tokens read so far Lookahead Set
  35. ✦ conditional statements Motivating Examples cond_stmt: 'if' method_call 'then' body

    'end' | 'while' method_call 'do' body 'end' method_call: method_name | method_name '(' args ')'
  36. cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do'

    body 'end' method_call: method_name | method_name '(' args ')' if valid? then save! end while valid? do save! end if valid? do save! end
  37. cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do'

    body 'end' method_call: method_name | method_name '(' args ')' if valid? then save! end while valid? do save! end if valid? do save! end
  38. cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do'

    body 'end' method_call: method_name | method_name '(' args ')' if valid? then save! end while valid? do save! end if valid? do save! end
  39. cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do'

    body 'end' method_call: method_name | method_name '(' args ')' if valid? then save! end while valid? do save! end if valid? do save! end
  40. cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do'

    body 'end' method_call: method_name | method_name '(' args ')' if valid? then save! end while valid? do save! end if valid? do save! end
  41. cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do'

    body 'end' method_call: method_name | method_name '(' args ')' if valid? then save! end while valid? do save! end if valid? do save! end
  42. cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do'

    body 'end' method_call: method_name | method_name '(' args ')' if valid? then save! end while valid? do save! end if valid? do save! end
  43. cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do'

    body 'end' method_call: method_name | method_name '(' args ')' if valid? then save! end while valid? do save! end if valid? do save! end
  44. cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do'

    body 'end' method_call: method_name | method_name '(' args ')' if valid? then save! end while valid? do save! end if valid? do save! end
  45. cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do'

    body 'end' method_call: method_name | method_name '(' args ')' if valid? then save! end while valid? do save! end if valid? do save! end
  46. cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do'

    body 'end' method_call: method_name | method_name '(' args ')' if valid? then save! end while valid? do save! end if valid? do save! end
  47. cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do'

    body 'end' method_call: method_name | method_name '(' args ')' if valid? then save! end while valid? do save! end if valid? do save! end
  48. ✦ original example in the paper Motivating Examples S: 'a'

    A 'a' | 'b' A 'b' A: 'a' | 'a' 'a' aaa / aaaa / bab / baab
  49. ✦ A Conflict occurs when the parser can't uniquely decide

    the next action ✦ There are 2 types of Conflict ✦ Shift/Reduce Conflict ✦ Reduce/Reduce Conflict Conflicts
  50. LR Parser Model Parser State 0 State 1 State 2

    State 3 NUM + exp State 4 State 5 State 6 * ( exp NUM - … Token Stream Source Code Lexer Grammar File Parser Generator 8 4 1 0
  51. ✦ Can be parsed IELR but not LALR ✦ Caused

    by the difference of lookahead set ✦ May resolve by splitting states ✦ There are 3 types of Mysterious Conflicts ✦ Mysterious New Conflicts ✦ Mysterious Invasive Conflicts ✦ Mysterious Mutated Conflicts Mysterious Conflicts
  52. Lrama's Implementation States State @states Transition Shift Goto Reduce @transitions

    @reduces @items @conflicts @id @next_sym @from_state @to_state @next_sym @from_state @to_state @look_ahead @item
  53. ✦ Annotate conflicting states and their predecessors ✦ Recompute lookahead

    set from scratch ✦ Propagate it to next state ✦ Split the next state if the propagating lookahead set has no "compatibilities" with already propagated ones IELR Basic Idea
  54. ✦ Indicates potential contribution to conflicts ✦ Has the following

    conflict information inside ✦ State ✦ Token ✦ Actions ✦ Items Inadequacy Annotations
  55. Inadequacy Annotation State: 22 Token: 'end' Actions: [S, R1, R2]

    Items: {R1 => [...], R2 => [...]} State: 27 Token: 'do' Actions: [R5, R6] Items: {R5 => [...], R6 => [...]}
  56. Inadequacy Annotation State: 22 Token: 'end' Actions: [S, R1, R2]

    Items: {R1 => [...], R2 => [...]} State: 27 Token: 'do' Actions: [R5, R6] Items: {R5 => [...], R6 => [...]}
  57. Inadequacy Annotation State: 22 Token: 'end' Actions: [S, R1, R2]

    Items: {R1 => [...], R2 => [...]} State: 27 Token: 'do' Actions: [R5, R6] Items: {R5 => [...], R6 => [...]}
  58. Inadequacy Annotation State: 22 Token: 'end' Actions: [S, R1, R2]

    Items: {R1 => [...], R2 => [...]} State: 27 Token: 'do' Actions: [R5, R6] Items: {R5 => [...], R6 => [...]}
  59. Inadequacy Annotation State: 22 Token: 'end' Actions: [S, R1, R2]

    Items: {R1 => [...], R2 => [...]} State: 27 Token: 'do' Actions: [R5, R6] Items: {R5 => [...], R6 => [...]}
  60. Inadequacy Annotation State: 22 Token: 'end' Actions: [S, R1, R2]

    Items: {R1 => [...], R2 => [...]} State: 27 Token: 'do' Actions: [R5, R6] Items: {R5 => [...], R6 => [...]}
  61. Inadequacy Annotation State: 22 Token: 'end' Actions: [S, R1, R2]

    Items: {R1 => [...], R2 => [...]} State: 27 Token: 'do' Actions: [R5, R6] Items: {R5 => [...], R6 => [...]}
  62. 1. Lookahead sets: from its associated item's lookahead 2. Kernel

    lookahead: from the same item's lookahead in predecessor states (dot is one symbol to the left) 3. Non-kernel lookahead: from the follow set of the state's goto on the LHS (i.e., tokens that can appear next) 4. Goto follow set: a. the remainder of the RHS after the goto’s nonterminal b. The lookahead sets of items where the remainder is nullable Recompute Lookahead Set original: https://www.sciencedirect.com/science/article/pii/S0167642309001191 summarized by: @junk0612
  63. Check Compatibilities • A: 'a'・, ['a', 'b'] • A: 'a'・'a',

    ['a', 'b'] • S: 'b'・A 'b', ['#'] • A:・'a', ['b'] • A:・'a' 'a', ['b'] • S: 'a'・A 'a', ['#'] • A:・'a', ['a'] • A:・'a' 'a', ['a'] a a
  64. Check Compatibilities • A: 'a'・, ['a', 'b'] • A: 'a'・'a',

    ['a', 'b'] • S: 'b'・A 'b', ['#'] • A:・'a', ['b'] • A:・'a' 'a', ['b'] • S: 'a'・A 'a', ['#'] • A:・'a', ['a'] • A:・'a' 'a', ['a'] a a
  65. Check Compatibilities • A: 'a'・, ['a', 'b'] • A: 'a'・'a',

    ['a', 'b'] • S: 'b'・A 'b', ['#'] • A:・'a', ['b'] • A:・'a' 'a', ['b'] • S: 'a'・A 'a', ['#'] • A:・'a', ['a'] • A:・'a' 'a', ['a'] a a Token: 'a' Actions: [S, R] Items: {R => [A: 'a'・]}
  66. Check Compatibilities • A: 'a'・, ['a'] • A: 'a'・'a', ['a']

    • S: 'b'・A 'b', ['#'] • A:・'a', ['b'] • A:・'a' 'a', ['b'] • S: 'a'・A 'a', ['#'] • A:・'a', ['a'] • A:・'a' 'a', ['a'] a a Token: 'a' Actions: [S, R] Items: {R => [A: 'a'・]}
  67. Check Compatibilities • A: 'a'・, ['a'] • A: 'a'・'a', ['a']

    • S: 'b'・A 'b', ['#'] • A:・'a', ['b'] • A:・'a' 'a', ['b'] • S: 'a'・A 'a', ['#'] • A:・'a', ['a'] • A:・'a' 'a', ['a'] a a Token: 'a' Actions: [S, R] Items: {R => [A: 'a'・]}
  68. Check Compatibilities • A: 'a'・, ['b'] • A: 'a'・'a', ['b']

    • S: 'b'・A 'b', ['#'] • A:・'a', ['b'] • A:・'a' 'a', ['b'] • S: 'a'・A 'a', ['#'] • A:・'a', ['a'] • A:・'a' 'a', ['a'] a a Token: 'a' Actions: [S, R] Items: {R => [A: 'a'・]}
  69. Check Compatibilities • A: 'a'・, ['b'] • A: 'a'・'a', ['b']

    • S: 'b'・A 'b', ['#'] • A:・'a', ['b'] • A:・'a' 'a', ['b'] • S: 'a'・A 'a', ['#'] • A:・'a', ['a'] • A:・'a' 'a', ['a'] a a Token: 'a' Actions: [S, R] Items: {R => [A: 'a'・]}
  70. Check Compatibilities • A: 'a'・, ['a'] • A: 'a'・'a', ['a']

    • S: 'b'・A 'b', ['#'] • A:・'a', ['b'] • A:・'a' 'a', ['b'] • S: 'a'・A 'a', ['#'] • A:・'a', ['a'] • A:・'a' 'a', ['a'] a a • A: 'a'・, ['b'] • A: 'a'・'a', ['b']
  71. Check Compatibilities • A: 'a'・, ['a'] • A: 'a'・'a', ['a']

    • S: 'b'・A 'b', ['#'] • A:・'a', ['b'] • A:・'a' 'a', ['b'] • S: 'a'・A 'a', ['#'] • A:・'a', ['a'] • A:・'a' 'a', ['a'] a a • A: 'a'・, ['b'] • A: 'a'・'a', ['b'] No Conflicts!
  72. ✦ Automaton has loops ✦ Compute lookahead sets slowly ✦

    Create many "useless" annotations Why got stack
  73. ✦ Introduce some caches ✦ Use strongly connected components algorithm

    ✦ Following the LALR Lookahead set computation Fasten Computation
  74. Generate IELR parser simply $ lrama --report=states parse.y $ mv

    parse.output lalr.output $ lrama --report=states -D lr.type=ielr parse.y $ mv parse.output ielr.output $ diff ielr.output lalr.output #=> no diffs!
  75. ✦ There are 4 'do's in Ruby ✦ keyword_do ✦

    keyword_do_cond ✦ keyword_do_block ✦ keyword_do_LAMBDA ✦ The tokens read so far provide sufficient information to decide Identify 4 'do's Smart obj.m(arg) do ... end while(true) do ... end obj.m arg do ... end -> (arg) do ... end
  76. while obj.m do #=> keyword_do_cond while(obj.m do #=> keyword_do while(obj.m

    arg do #=> keyword_do_block while -> do #=> keyword_do_LAMBDA while obj.m -> do #=> keyword_do_LAMBDA while(obj.m -> do end do #=> keyword_do_block
  77. ✦ @yui-knk, @ydah ✦ #LR_parser_gangs ✦ @koic, @S.H. ✦ #esm_parser_club

    ✦ My Wife ✦ #a_beautiful_life Acknowledgements
  78. ✦ Theoretical side ✦ Utilize the tokens read so far

    to narrow down the set of possible next tokens more accurately than LALR ✦ Practical side ✦ Propagating conflict info backward and lookahead forward ✦ Identify the conflict's cause and split states if necessary "After All, What is IELR?"