Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Parsing RBS

Parsing RBS

RubyKaigi 2023

Soutaro Matsumoto

May 13, 2023
Tweet

More Decks by Soutaro Matsumoto

Other Decks in Programming

Transcript

  1. Soutaro station Soutaro timetable Soutaro bus stop Soutaro cedar trees

    (Forget the transliteration variations in photos)
  2. Soutaro station Soutaro timetable Soutaro bus stop Soutaro cedar trees

    Soutaro park (Forget the transliteration variations in photos)
  3. New syntaxes in RBS 3.0 Class/module alias syntax Use syntax

    (Import in Java/C# for RBS) (RBS) (Ruby)
  4. Signature help • A method signature list pops up on

    method calls to help developers typing arguments
  5. Signature help • A method signature list pops up on

    method calls to help developers typing arguments
  6. Better type name completion • Typing chan resolves to Parseg::TokenFactory::change

    • It inserts shorter names based on the current module nesting context
  7. Better type name completion • Typing chan resolves to Parseg::TokenFactory::change

    • It inserts shorter names based on the current module nesting context
  8. • When parameter type is being typed, it has syntax

    error and the module nesting context is lost → Absolute type name is inserted 🤷 module Parseg module ParsingSession def intersect?: (Parseg::TokenFactory::change) end end module Parseg module ParsingSession def intersect?: (c) end end
  9. • When return type is being typed, it's valid syntax

    → Relative type name is inserted 🙆 module Parseg module ParsingSession def intersect?: ... -> TokenFactory::change end end module Parseg module ParsingSession def intersect?: ... -> c end end
  10. Parsing broken RBS matters • The inconsistency is caused by

    parsing errors • We need a parser that continue working even with syntax errors to provide advanced IDE features
  11. 1. Demo 2. Top-down parser outline 3. Error recovery (1)

    4. Error recovery (2) 5. Error recovery (3) You will be able to write a top-down parser with error recovery. 💪
  12. Error tolerant parser generator • Generates top-down parser with error

    recovery • Grammar de fi nition in Ruby DSL • (Doesn't generate any parser code yet 😜) https://github.com/soutaro/parseg
  13. Grammar De fi nition class_decl ::= class module_name 
 class_member*

    
 end module_name ::= UIDENT class_member ::= class_decl | method_definition | attr_reader | ...
  14. Grammar De fi nition class_decl ::= class module_name 
 class_member*

    
 end module_name ::= UIDENT class_member ::= class_decl | method_definition | attr_reader | ...
  15. Grammar De fi nition class_decl ::= class module_name 
 class_member*

    
 end module_name ::= UIDENT class_member ::= class_decl | method_definition | attr_reader | ...
  16. Grammar De fi nition class_decl ::= class module_name 
 class_member*

    
 end module_name ::= UIDENT class_member ::= class_decl | method_definition | attr_reader | ...
  17. Grammar De fi nition class_decl ::= class module_name 
 class_member*

    
 end module_name ::= UIDENT class_member ::= class_decl | method_definition | attr_reader | ...
  18. Grammar De fi nition class_decl ::= class module_name 
 class_member*

    
 end module_name ::= UIDENT class_member ::= class_decl | method_definition | attr_reader | ...
  19. Parser implementation class_decl ::= class module_name 
 class_member* 
 end

    • Each non-terminal symbols has corresponding method • Call the parsing methods to construct the content of the tree
  20. Parser implementation class_decl ::= class module_name 
 class_member* 
 end

    • Each non-terminal symbols has corresponding method • Call the parsing methods to construct the content of the tree
  21. Parser implementation class_decl ::= class module_name 
 class_member* 
 end

    • Each non-terminal symbols has corresponding method • Call the parsing methods to construct the content of the tree
  22. Parser implementation class_decl ::= class module_name 
 class_member* 
 end

    • Each non-terminal symbols has corresponding method • Call the parsing methods to construct the content of the tree
  23. Parser implementation class_decl ::= class module_name 
 class_member* 
 end

    • Each non-terminal symbols has corresponding method • Call the parsing methods to construct the content of the tree
  24. Parser implementation class_decl ::= class module_name 
 class_member* 
 end

    • Each non-terminal symbols has corresponding method • Call the parsing methods to construct the content of the tree
  25. • Alternation is implemented with case analysis on the fi

    rst token of the input class_member ::= class_decl | method_definition | attr_reader | ...
  26. • Alternation is implemented with case analysis on the fi

    rst token of the input class_member ::= class_decl | method_definition | attr_reader | ...
  27. • Alternation is implemented with case analysis on the fi

    rst token of the input class_member ::= class_decl | method_definition | attr_reader | ...
  28. • Alternation is implemented with case analysis on the fi

    rst token of the input class_member ::= class_decl | method_definition | attr_reader | ...
  29. Parsing error • We can fi nd some structure from

    the input, even it has a syntax error • There is a class declaration • There is a method de fi nition • Non tolerant parser tells you nothing
  30. Skip tokens • One token blocks further parsing when no

    rule handles the token • 💡 Skip that tokens to continue parsing
  31. • Tokens that may be consumed by the parsing methods

    are: • Possible fi rst tokens of type (UIDENT, void, untyped, ...) • class, attr_reader, and def for next class_member • end for closing the class declaration • class for next class declaration attr_reader ::= attr_reader attribute_name : type
  32. Implementation • Skips tokens that cannot be consumed in the

    rule before processing every rule • (And calculate the consumable tokens set)
  33. Error tolerant parser (2) • Inserts MissingTree instead of raising

    errors • Skip tokens that cannot be consumed with other possible rules 😃 This is well-known error recovery strategy for top-down parsers. 
 (https://github.com/microsoft/tolerant-php-parser)
  34. Nested declaration • Inner class declaration eats the following method

    de fi nition • Conference#initialize disappears and unexpected type error will be detected • Better error recovery is to close the Talk de fi nition immediately
  35. What was happening? class Conference def initialize: (String, Integer) ->

    void end class Conference class Talk def initialize: (String, Integer) -> void end
  36. What was happening? class Conference def initialize: (String, Integer) ->

    void end class Conference class Talk def initialize: (String, Integer) -> void end class Conference class Talk end def initialize: (String, Integer) -> void end
  37. What was happening? class Conference def initialize: (String, Integer) ->

    void end class Conference class Talk def initialize: (String, Integer) -> void end class Conference class Talk end def initialize: (String, Integer) -> void end class Conference
  38. Key ideas • Let parser use the changes made on

    input since the last successful parsing result • Avoid moving existing elements into new trees
  39. Key ideas • Let parser use the changes made on

    input since the last successful parsing result • Avoid moving existing elements into new trees 😵 😁
  40. Change based error recovery • Identify which tokens are changed

    since the last successful parsing • Closes the declaration at the end of change Inserted tokens Close the declaration
  41. class Conference def initialize : ... class Conference class Talk

    def initialize : ... Changed tokens Text inserted class Talk
  42. class Conference def initialize : ... class Conference class Talk

    def initialize : ... Changed tokens Text inserted class Talk class Conference class Talk [EOC] def initialize : ... Inserts a marker token
  43. Change based error recovery • The error recovery runs only

    after normal parsing fails to keep successful results identical to the results of original parser
  44. Change based error recovery • 👍 Minimal grammar modi fi

    cation • 👍 Token based change detection • No tree di ff calculation required • Changed tokens are easily detected by LSP edit noti fi cations • 😵 Unsupported text editing patterns may result in confusing errors
  45. Error tolerant parser (3) • Inserts MissingTree instead of raising

    errors • Skip tokens that cannot be consumed with other possible rules • Avoid moving existing elements if parsing fails 😃
  46. Open problems • Translating the concrete syntax tree to AST

    • AST de fi nes a successful parsing result Attribute declarations must have names and types
  47. Summary • Planning to replace RBS parser for better development

    experience • Making a top-down parser error tolerant • Generates parsing tree even with syntax errors • Change based error recovery
  48. • Trial 1: Based on ML's type inference (2007) •

    Trial 2: Based on control fl ow analysis (2009) • (Break until Oedo RubyKaigi 2017) • Trial 3: Steep -- introducing type declarations My 15 years for type checking Ruby programs
  49. • Trial 1: Based on ML's type inference (2007) •

    Trial 2: Based on control fl ow analysis (2009) • (Break until Oedo RubyKaigi 2017) • Trial 3: Steep -- introducing type declarations My 15 years for type checking Ruby programs
  50. • Trial 1: Based on ML's type inference (2007) •

    Trial 2: Based on control fl ow analysis (2009) • (Break until Oedo RubyKaigi 2017) • Trial 3: Steep -- introducing type declarations My 15 years for type checking Ruby programs