GoDays
January 23, 2020

Writing a language parser in 15min (or less) - Xavier Coulon - Red Hat

Using regular expressions to process content may be enough in some cases, but as the grammar grows in complexity, they become a nightmare to maintain. This is where parsers based on Parsing Expression Grammars (PEG) come to the rescue.
In this talk, we will see how to build such a parser to handle a small subset of the Asciidoc markup language.

Transcript

  1. Writing a Language Parser in 15min (or less)

    Xavier Coulon - Red Hat
    twitter.com/xcoulon
    medium.com/xcoulon
  2. About me

    - Working at Red Hat for 8+ years
    - Working on OpenShift.io and its successor (we are hiring!)
    - In my free time, coding on a library to convert Asciidoc to HTML
  3. PEG to the Rescue

    Parsing Expression Grammars:
    - Describe a language, using rules to recognize strings
    - Try the rules in order: the first matching rule is applied recursively until the end of the document is reached; when a rule fails, the parser tries the next one
  4. Defining the Grammar

    Document <- DocumentBlock* EOF
    DocumentBlock <- Paragraph / BlankLine
    Paragraph <- ParagraphLine+
    ParagraphLine <- (QuotedText / Text / WS)+ EOL
    QuotedText <- BoldText / ItalicText / MonospaceText
    BoldText <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'
    BoldTextElement <- ItalicText / MonospaceText / Text
    ItalicText <- ...
    MonospaceText <- ...
    Text <- [a-zA-Z0-9]+
  5. Defining the Grammar (1/2)

    Document <- DocumentBlock* EOF
    DocumentBlock <- Paragraph / BlankLine
    BlankLine <- WS* Newline
    Paragraph <- ParagraphLine+
    ParagraphLine <- (QuotedText / Text / WS)+ EOL
    QuotedText <- BoldText / ItalicText / MonospaceText
    WS <- " "
    Newline <- "\r\n" / "\r" / "\n"
    EOF <- !.
    EOL <- Newline / EOF
  6. Defining the Grammar (2/2)

    BoldText <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'
    BoldTextElement <- ItalicText / MonospaceText / Text
    ItalicText <- ...
    MonospaceText <- ...
    Text <- [a-zA-Z0-9]+
  7. Grammar in action

    Some *bold and _italic and `monospace text`_*

    Document <- DocumentBlock* EOF
    DocumentBlock <- Paragraph / BlankLine
    Paragraph <- ParagraphLine+
    ParagraphLine <- (QuotedText / Text / WS)+ EOL
    QuotedText <- BoldText / ItalicText / MonospaceText
    BoldText <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'
    ItalicText <- '_' !WS ItalicTextElement (WS+ ItalicTextElement)* '_'
    MonospaceText <- '`' !WS MonospaceTextElement (WS+ MonospaceTextElement)* '`'
    Text <- [a-zA-Z0-9]+
    WS <- " "
  8. Generating (rather than writing) a parser in Go

    BoldText <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'

    BoldText <- '*' !WS elements:(BoldTextElement (WS+ BoldTextElement)*) '*' {
        return types.NewQuotedText(types.Bold, elements.([]interface{}))
    }

    {
        package parser

        import (...)
    }
  9. Advanced features of mna/pigeon

    // predicate to skip a rule if a condition is not met
    // check characters without processing
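
To make the grammar from the slides more concrete, here is a minimal hand-written sketch in Go of the inline rules (Text, WS, BoldText, ItalicText, MonospaceText). It is not the code that mna/pigeon generates, and it does not use the `types` package from the talk. The names `rule`, `has`, `choice`, `quoted` and `ParseLine` are hypothetical, the HTML tags are an assumed rendering, and the element rules for italic and monospace text (elided on the slides) are assumed to mirror BoldTextElement. Newline handling is left out: the sketch only accepts a single line terminated by EOF.

```go
package main

import "fmt"

// rule tries to match in at pos and returns the HTML produced plus the next
// position. Positions are plain values, so a failed rule needs no undo step.
type rule func(in string, pos int) (out string, next int, ok bool)

// has peeks: it reports whether byte c sits at pos, consuming nothing.
func has(in string, pos int, c byte) bool { return pos < len(in) && in[pos] == c }

// choice is PEG ordered choice (a / b / c): the first rule that matches wins.
func choice(rules ...rule) rule {
	return func(in string, pos int) (string, int, bool) {
		for _, r := range rules {
			if out, next, ok := r(in, pos); ok {
				return out, next, true
			}
		}
		return "", pos, false
	}
}

// text implements Text <- [a-zA-Z0-9]+
func text(in string, pos int) (string, int, bool) {
	end := pos
	for end < len(in) && isAlnum(in[end]) {
		end++
	}
	return in[pos:end], end, end > pos
}

func isAlnum(c byte) bool {
	return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9'
}

// ws implements WS <- " "
func ws(in string, pos int) (string, int, bool) {
	if has(in, pos, ' ') {
		return " ", pos + 1, true
	}
	return "", pos, false
}

// quoted implements: delim !WS Element (WS+ Element)* delim
func quoted(tag string, delim byte, element rule) rule {
	return func(in string, pos int) (string, int, bool) {
		// Opening delimiter, then the !WS not-predicate (a pure lookahead).
		if !has(in, pos, delim) || has(in, pos+1, ' ') {
			return "", pos, false
		}
		out, cur, ok := element(in, pos+1)
		if !ok {
			return "", pos, false
		}
		for { // (WS+ Element)*
			next := cur
			for has(in, next, ' ') {
				next++
			}
			if next == cur {
				break // no WS+, so the repetition ends here
			}
			el, after, ok := element(in, next)
			if !ok {
				break // spaces without an element are not consumed
			}
			out, cur = out+in[cur:next]+el, after
		}
		if !has(in, cur, delim) {
			return "", pos, false
		}
		return "<" + tag + ">" + out + "</" + tag + ">", cur + 1, true
	}
}

// Assumption: italic and monospace elements mirror BoldTextElement.
func bold(in string, pos int) (string, int, bool) { return quoted("strong", '*', choice(italic, monospace, text))(in, pos) }
func italic(in string, pos int) (string, int, bool) { return quoted("em", '_', choice(bold, monospace, text))(in, pos) }
func monospace(in string, pos int) (string, int, bool) { return quoted("code", '`', choice(bold, italic, text))(in, pos) }

// ParseLine implements (QuotedText / Text / WS)+ followed by EOF (!.).
func ParseLine(in string) (string, error) {
	inline := choice(bold, italic, monospace, text, ws)
	out, pos := "", 0
	for s, next, ok := inline(in, 0); ok; s, next, ok = inline(in, next) {
		out, pos = out+s, next
	}
	if pos == 0 || pos != len(in) {
		return "", fmt.Errorf("no rule matched at offset %d", pos)
	}
	return out, nil
}

func main() {
	fmt.Println(ParseLine("Some *bold and _italic and `monospace text`_*"))
}
```

Run on the sample line from slide 7, `ParseLine` returns `Some <strong>bold and <em>italic and <code>monospace text</code></em></strong>` and a nil error. Inputs such as `* notbold*` (rejected by `!WS`) or `*unclosed` return an error, because `Text` only accepts `[a-zA-Z0-9]+` and no other rule can consume the `*`. Writing each rule by hand like this is the kind of repetitive work that generating the parser from the grammar avoids.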