
An Analysis of the Redesign of the CoffeeScript Compiler

Michael Ficarra

November 30, 2012

Transcript

  1. Michael Ficarra /michaelficarra
     • CoffeeScript maintainer
       ◦ worked on jashkenas/coffee-script for years
       ◦ influential in the language's development
     • contribute to many ECMAScript projects
       ◦ constellation/escodegen
       ◦ constellation/esmangle
       ◦ documentcloud/underscore
       ◦ kriskowal/es5-shim
     • and plenty of my own -- check them out
  2. Project Goals
     • separation of concerns
       ◦ modularity
       ◦ use and expose standardised IRs
     • bug fixes
       ◦ especially two-pass symbol generation
     • source maps
     • better error reporting
     • mild extensibility
       ◦ support multiple (similar) compilation targets
       ◦ syntax extension is out of scope
  3. Where do we start?
     • Definitions: define the language
       ◦ jashkenas/coffee-script is overly permissive
         ▪ loosely defines the language as whatever passes through the compiler without an error
         ▪ these need to be disallowed
       ◦ jashkenas/coffee-script is sometimes too restrictive
         ▪ mostly due to parser failings
         ▪ these need to be allowed

     $ coffee -bep 'a is b and c = d'
     var c;
     a === b && (c = d);

     $ coffee -bep 'fn ->, ->'
     Error: Parse error on line 1: Unexpected ','
  4. Where do we start?
     • Definitions: define the language with
       ◦ consistent syntactic rules
       ◦ consistent semantics to go with them
       ◦ an AST format that can represent CoffeeScript programs
     • Process
       ◦ break down compilation into individual components
       ◦ provide an interface for composition
  5. Independent Components
     • Preprocessor: CS → context-free CS
     • Parser: context-free CS → CS AST
     • Compiler: CS AST → JS AST
     • Code Generator: JS AST → JS + source map
  6. Independent Components
     • CS Code Generator: CS AST → CS
     • Optimiser: CS AST → CS AST
     • Analysis (Predicate): CS AST → Yes / No
  7. Compositions
     • jashkenas/coffee-script (CS → JS)
       ◦ preprocessor
       ◦ parser
       ◦ compiler
       ◦ JS code generator
       ◦ discard the source map
     • Syntax Formatter (CS → CS)
       ◦ preprocessor
       ◦ parser
       ◦ CS code generator
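The two compositions above can be pictured as plain function composition. The following is a minimal sketch, not the project's actual interface: every stage function here is a hypothetical placeholder that only tags its input, so that the data flow is visible.

```javascript
// Compose stages left to right: the output of one stage feeds the next.
const pipeline = (...stages) => (input) =>
  stages.reduce((value, stage) => stage(value), input);

// Placeholder stages (hypothetical names and return shapes, for illustration only).
const preprocessor = (cs) => ({ stage: 'context-free CS', from: cs });
const parser = (contextFreeCs) => ({ stage: 'CS AST', from: contextFreeCs });
const compiler = (csAst) => ({ stage: 'JS AST', from: csAst });
const jsCodeGenerator = (jsAst) => ({ js: '/* generated JS */', sourceMap: { from: jsAst } });
const csCodeGenerator = (csAst) => '# generated CS';
const discardSourceMap = (result) => result.js;

// CS -> JS, in the manner of jashkenas/coffee-script.
const coffeeToJs = pipeline(preprocessor, parser, compiler, jsCodeGenerator, discardSourceMap);

// CS -> CS, a syntax formatter.
const syntaxFormatter = pipeline(preprocessor, parser, csCodeGenerator);

console.log(coffeeToJs('a = 1'));
console.log(syntaxFormatter('a = 1'));
```

Because each stage only depends on the IR it consumes, a new tool is just a different list of stages.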
  8. CLI: Composition and I/O
     • pipeline: CS → context-free CS → CS AST → JS AST → JS + source map (and CS AST → CS)
     • input source (defaults to stdin): --input, --cli
     • output destination: --output
     • output formats
       ◦ preprocessed (not standardised): N/A
       ◦ parsed: --parse
       ◦ compiled: --compile
       ◦ JavaScript: --js
       ◦ source map: --source-map
       ◦ CoffeeScript: --cscodegen
  9. Parsing
     • Chose to generate the parser from a parsing expression grammar (PEG)
     • Upsides of PEGs
       ◦ operates in time linear to input length
       ◦ better error reporting
         ▪ can enumerate all valid inputs following read position
       ◦ good JS tooling support available at the time
       ◦ fully describe the syntax of the language in one place
         ▪ no separate lexer
  10. Parsing
      • Chose to generate the parser from a parsing expression grammar (PEG)
      • Downsides of PEGs
        ◦ not runtime extensible like parser combinators
          ▪ combinators build parsers from other parsers
          ▪ those parsers are built at runtime, so may be overridden or extended
        ◦ can only accept context-free languages
          ▪ parser for context-sensitive languages needs an additional stack
          ▪ a pushdown automaton (PDA) accepts context-free languages
          ▪ a linear bounded automaton (LBA) is needed to accept context-sensitive languages
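The runtime extensibility of parser combinators mentioned above can be illustrated with a minimal sketch. None of these names come from the talk or from any particular library; `literal`, `choice`, and `sequence` are hypothetical helpers written for this example.

```javascript
// A parser is a function (input, pos) => { ok: true, pos, value } or { ok: false, pos }.
const literal = (s) => (input, pos) =>
  input.startsWith(s, pos)
    ? { ok: true, pos: pos + s.length, value: s }
    : { ok: false, pos };

// Ordered choice, as in a PEG: try alternatives in order, keep the first success.
const choice = (...parsers) => (input, pos) => {
  for (const p of parsers) {
    const result = p(input, pos);
    if (result.ok) return result;
  }
  return { ok: false, pos };
};

// Sequence: every parser must succeed, each starting where the previous one stopped.
const sequence = (...parsers) => (input, pos) => {
  const values = [];
  let current = pos;
  for (const p of parsers) {
    const result = p(input, current);
    if (!result.ok) return { ok: false, pos };
    values.push(result.value);
    current = result.pos;
  }
  return { ok: true, pos: current, value: values };
};

// Parsers are ordinary values, so the grammar can be changed while the program runs.
let keyword = choice(literal('if'), literal('while'));
const statement = (input, pos) =>
  sequence(keyword, literal(' '), literal('x'))(input, pos);

const before = statement('unless x', 0).ok; // 'unless' is not a keyword yet
keyword = choice(keyword, literal('unless')); // extend the grammar at runtime
const after = statement('unless x', 0).ok;

console.log(before, after);
```

A parser generated ahead of time from a PEG has no equivalent of the reassignment of `keyword`; the grammar is fixed when the parser is generated.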
  11. Preprocessing
      • one really simple job
        ◦ keep stack of context tokens as input is read
        ◦ insert context boundary markers
      • additional benefits
        ◦ ensures pairing chars are paired before parsing
        ◦ enforces consistent indentation style
      • context boundaries (opening / closing):
        ◦ (INDENT) (DEDENT)
        ◦ " "
        ◦ """ """
        ◦ { }
        ◦ ` `
        ◦ ' '
        ◦ ''' '''
        ◦ ( )
        ◦ #{ }
        ◦ / /
        ◦ /// ///
        ◦ [ ]
        ◦ # (line terminator)
        ◦ ### ###
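The stack-based idea above can be sketched for the indentation context alone. This is an illustrative simplification, not the project's preprocessor: the function name `markIndentation` is hypothetical, only space indentation is handled, and the other context pairs (strings, brackets, comments) are ignored.

```javascript
// Emit (INDENT)/(DEDENT) markers around indented blocks, using a stack of open contexts.
function markIndentation(source) {
  const stack = [0]; // indentation widths of the currently open blocks
  const out = [];
  for (const line of source.split('\n')) {
    if (line.trim() === '') {
      out.push(line); // blank lines do not change the context
      continue;
    }
    const width = line.match(/^ */)[0].length;
    if (width > stack[stack.length - 1]) {
      stack.push(width);
      out.push('(INDENT)');
    } else {
      while (width < stack[stack.length - 1]) {
        stack.pop();
        out.push('(DEDENT)');
      }
      if (width !== stack[stack.length - 1]) {
        // Dedenting to a width that never opened a block: reject it.
        throw new Error('inconsistent indentation: ' + JSON.stringify(line));
      }
    }
    out.push(line.trim());
  }
  while (stack.length > 1) {
    stack.pop();
    out.push('(DEDENT)'); // close any blocks still open at end of input
  }
  return out.join('\n');
}

console.log(markIndentation('if a\n  b\n  c\nd'));
```

With explicit markers in place, the indentation structure becomes ordinary paired tokens that a context-free grammar can match, and inconsistent indentation is rejected before parsing starts.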
  12. Spidermonkey AST Example
      ariya/esprima input:
          { block: statement }
      ariya/esprima output:
          {
            type: 'Program',
            body: [
              {
                type: 'BlockStatement',
                body: [
                  {
                    type: 'LabeledStatement',
                    label: { type: 'Identifier', name: 'block' },
                    body: {
                      type: 'ExpressionStatement',
                      expression: { type: 'Identifier', name: 'statement' }
                    }
                  }
                ]
              }
            ]
          }
      ariya/esprima input:
          ({object: expression})
      ariya/esprima output:
          {
            type: 'Program',
            body: [
              {
                type: 'ExpressionStatement',
                expression: {
                  type: 'ObjectExpression',
                  properties: [
                    {
                      type: 'Property',
                      key: { type: 'Identifier', name: 'object' },
                      value: { type: 'Identifier', name: 'expression' },
                      kind: 'init'
                    }
                  ]
                }
              }
            ]
          }
  13. Spidermonkey AST Tools
      • ariya/esprima (JS → JS AST)
        ◦ ECMAScript 5 parser
        ◦ extremely true to spec.
          ▪ aside from some minor restrictions around early errors
        ◦ harmony branch
      • yahoo/istanbul (JS AST → instrumented JS AST)
        ◦ instruments Spidermonkey AST for code coverage
        ◦ instrumented code produces standardised report (LCOV)
  14. Spidermonkey AST Tools
      • constellation/escodegen (JS AST → JS)
        ◦ JS code generator
        ◦ configurable formatting with minification defaults
        ◦ guarantees parse(gen(tree)) == tree
      • mozilla/sweet.js (JS using macros + JS macro defs. → JS AST)
        ◦ result of Tim Disney's Mozilla internship
        ◦ creates augmented parser using user-provided macro definitions
  15. Spidermonkey AST Tools
      • constellation/esmangle (JS AST → JS AST)
        ◦ generates semantically equivalent, syntactically minimal AST
        ◦ more difficult (and fun) than it sounds
        ◦ name mangling
        ◦ constant folding
        ◦ fixed-point evaluation of set of declarative rules
        ◦ 2 phases
          ▪ AST simplification rules: !!!a => !a
          ▪ syntactic simplification (AST expansion) rules: a.Infinity => a[1/0], true => !0
        ◦ declarative rule specification is extensible and modular
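Fixed-point evaluation of declarative rules can be sketched as follows. This is an assumption-laden illustration and not esmangle's implementation: the rule format, `applyOnce`, and `simplify` are hypothetical, and only the `!!!a => !a` simplification rule from the slide is included.

```javascript
// Helpers for building Spidermonkey-style nodes in this example.
const id = (name) => ({ type: 'Identifier', name });
const not = (argument) => ({ type: 'UnaryExpression', operator: '!', prefix: true, argument });

const isNot = (node) =>
  Boolean(node) && node.type === 'UnaryExpression' && node.operator === '!';

// Each rule returns a replacement node, or null when it does not apply.
const rules = [
  // !!!a => !a
  (node) =>
    isNot(node) && isNot(node.argument) && isNot(node.argument.argument)
      ? node.argument.argument
      : null,
];

// One bottom-up pass: rewrite children first, then try every rule on the node itself.
function applyOnce(node, state) {
  if (Array.isArray(node)) return node.map((child) => applyOnce(child, state));
  if (node === null || typeof node !== 'object') return node;
  const copy = {};
  for (const key of Object.keys(node)) copy[key] = applyOnce(node[key], state);
  for (const rule of rules) {
    const replacement = rule(copy);
    if (replacement !== null) {
      state.changed = true;
      return replacement;
    }
  }
  return copy;
}

// Repeat passes until a pass changes nothing: that tree is the fixed point.
function simplify(ast) {
  let current = ast;
  for (;;) {
    const state = { changed: false };
    current = applyOnce(current, state);
    if (!state.changed) return current;
  }
}

// !!!!!a has five negations and simplifies to !a.
console.log(JSON.stringify(simplify(not(not(not(not(not(id('a')))))))));
```

Adding an optimisation means appending a function to `rules`; the driver loop does not change, which is the sense in which a declarative rule set is extensible and modular.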
  16. Spidermonkey AST Tools
      • constellation/estraverse
        ◦ extracted from esmangle project
        ◦ escodegen also uses it
        ◦ provides AST traversal functions
        ◦ implements simple visitor pattern on Spidermonkey AST
      • pufuwozu/brushtail
        ◦ tail call elimination on Spidermonkey ASTs
        ◦ uses estraverse and escope
      • constellation/escope
        ◦ extracted from esmangle project
        ◦ provides static scope analysis
        ◦ predicates such as
          ▪ isStatic (detects global, with, presence of direct eval)
          ▪ isArgumentsMaterialized
        ◦ you probably don't know catch variables are block scoped in JS
          ▪ escope does
          ▪ (and CoffeeScript fixes this for you anyway)
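A visitor-style traversal over a Spidermonkey AST can be sketched in a few lines. This shows the general pattern only and is not estraverse's implementation; the `traverse` function below is written for this example and treats any object with a string `type` property as a node.

```javascript
// Depth-first traversal with optional enter/leave callbacks.
function traverse(node, visitor) {
  if (visitor.enter) visitor.enter(node);
  for (const key of Object.keys(node)) {
    const value = node[key];
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (child && typeof child === 'object' && typeof child.type === 'string') {
        traverse(child, visitor);
      }
    }
  }
  if (visitor.leave) visitor.leave(node);
}

// The labelled-statement tree from the esprima example: { block: statement }
const ast = {
  type: 'Program',
  body: [{
    type: 'BlockStatement',
    body: [{
      type: 'LabeledStatement',
      label: { type: 'Identifier', name: 'block' },
      body: {
        type: 'ExpressionStatement',
        expression: { type: 'Identifier', name: 'statement' },
      },
    }],
  }],
};

// Collect every identifier name in source order.
const names = [];
traverse(ast, {
  enter(node) {
    if (node.type === 'Identifier') names.push(node.name);
  },
});

console.log(names);
```

Because the node shapes are standardised, the same traversal works on the output of any tool that produces this AST format.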
  17. Spidermonkey AST
      • not perfect
        ◦ some trees are impossible syntactic constructs
            {
              type: 'IfStatement',
              test: ...,
              consequent: {
                type: 'IfStatement',
                test: ...,
                consequent: ...,
                alternate: null
              },
              alternate: ...
            }
        ◦ no way to represent directive statements
      • still better than alternatives
        ◦ adoption has hit critical mass
        ◦ interop with those tools is too valuable
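The nested IfStatement tree above is the classic dangling-else case: printed naively, the outer `else` attaches to the inner `if` when the text is parsed again. The sketch below uses a toy generator covering only three node types; `gen` and `endsWithOpenIf` are hypothetical names, and real generators such as escodegen handle far more cases.

```javascript
// True when the statement ends in an `if` that has no `else` of its own,
// so a following `else` would bind to it.
function endsWithOpenIf(node) {
  if (node.type !== 'IfStatement') return false;
  return node.alternate ? endsWithOpenIf(node.alternate) : true;
}

// Toy code generator. With safe = false it prints the tree naively.
function gen(node, safe) {
  switch (node.type) {
    case 'Identifier':
      return node.name;
    case 'ExpressionStatement':
      return gen(node.expression, safe) + ';';
    case 'IfStatement': {
      let consequent = gen(node.consequent, safe);
      if (safe && node.alternate && endsWithOpenIf(node.consequent)) {
        consequent = '{ ' + consequent + ' }'; // a block keeps the outer else attached
      }
      const head = 'if (' + gen(node.test, safe) + ') ' + consequent;
      return node.alternate ? head + ' else ' + gen(node.alternate, safe) : head;
    }
    default:
      throw new Error('unsupported node type: ' + node.type);
  }
}

const id = (name) => ({ type: 'Identifier', name });
const stmt = (name) => ({ type: 'ExpressionStatement', expression: id(name) });

const tree = {
  type: 'IfStatement',
  test: id('a'),
  consequent: { type: 'IfStatement', test: id('b'), consequent: stmt('c'), alternate: null },
  alternate: stmt('d'),
};

console.log(gen(tree, false)); // the else now reads as belonging to `if (b)`
console.log(gen(tree, true));
```

The safe variant changes the printed structure (it adds a block) in order to preserve the meaning of the tree, which is why such trees cannot be printed one-to-one.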
  18. Use Standardised IRs!
      • take advantage of other open source projects
      • your users can extract parts of your project
      • in case of jashkenas/coffee-script
        ◦ compiler and parser/rewriter are highly coupled
        ◦ code generation is intermixed with compilation
        ◦ code gen bugs are common
        ◦ code gen logic is strewn throughout the compiler
        ◦ no consistent concept of target's syntax
          ▪ statement vs. expression ({} is different in different positions)
          ▪ operator precedence
          ▪ special syntactic constructs (esp. surrounding `new` operator)
          ▪ significant whitespace
  19. Doing it Right
      • esprima
      • acorn
      • estraverse
      • escope
      • escodegen
      • esmangle
      • brushtail
      • Sweet.js
      • istanbul
      • ibrik
      • code painter
      • LLJS
      • RumCoke
      • JSX
  20. Calling You Out
      • TypeScript
      • ClojureScript
      • UglifyJS
      • UglifyJS2 (sigh)
      • Dart
      • Google Closure Compiler
      • Roy (soon!)
      • LiveScript (soon!)
      • jashkenas/coffee-script
  21. Optimisation / Compilation
      • declarative rule specification
        ◦ inherently extensible
      • optimiser: fixpoint evaluation strategy
      • Optimiser: CS AST → CS AST
      • Compiler: CS AST → JS AST
  22. Symbol Generation
      • long-running problem with jashkenas/coffee-script
      • common issue for our users
      • very difficult to fix with the current compiler design

      $ coffee -bep '_this = 0; fn = => this'
      var fn,
        _this = this;
      _this = 0;
      fn = function() { return _this; };
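The collision above (the compiler's `_this` clashing with the user's `_this`) is what a two-pass approach avoids: first collect every name the program uses, then generate names that are known to be free. The sketch below illustrates that idea under simplifying assumptions; `collectNames` and `makeGensym` are hypothetical helpers, and scoping is ignored entirely.

```javascript
// Pass 1: collect every identifier name used anywhere in a Spidermonkey-style AST.
function collectNames(node, names = new Set()) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectNames(child, names));
    return names;
  }
  if (node === null || typeof node !== 'object') return names;
  if (node.type === 'Identifier') names.add(node.name);
  for (const key of Object.keys(node)) collectNames(node[key], names);
  return names;
}

// Pass 2: hand out names that cannot collide with anything already in use.
function makeGensym(used) {
  return (base) => {
    let candidate = base;
    for (let i = 1; used.has(candidate); i++) candidate = base + i;
    used.add(candidate); // later requests must avoid this name too
    return candidate;
  };
}

// AST for the user's statement: _this = 0
const program = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'AssignmentExpression',
      operator: '=',
      left: { type: 'Identifier', name: '_this' },
      right: { type: 'Literal', value: 0 },
    },
  }],
};

const gensym = makeGensym(collectNames(program));
const thisAlias = gensym('_this'); // the user already owns _this, so a suffix is added
const refName = gensym('_ref');    // _ref is free, so it is used unchanged

console.log(thisAlias, refName);
```

A single-pass compiler has to pick helper names before it has seen the whole program, which is why this is hard to retrofit.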
  23. Symbol Generation
      • did you catch my hypocrisy?
      • that IR is neither standardised nor exposed
      • don't want to force this to be two operations
        ◦ steps can be interleaved for performance
        ◦ but the IR might actually be useful; it's a tradeoff
      • Compiler (in reality): CS AST → JS AST + gensyms → JS AST
  24. Source Maps
      • a set of mappings from each section of generated JavaScript to the section of source text directly responsible for producing it
      • supported in Chrome
      • Firefox support coming soon
        ◦ see bugzilla #771597
      • debug as if the source text is actually running in your JS interpreter
  25. Source Maps
      1. preserve source info in parser
      2. preserve source info through transformations
         ◦ optimiser
         ◦ compiler
      3. modify escodegen to create a CST instead of a string
      4. use mozilla/source-map to generate source map and flatten CST to JS
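Step 4 can be pictured as a walk over the CST that concatenates text while tracking the generated position. The sketch below is illustrative only: the CST shape (`loc` plus `children`) and the `flatten` function are assumptions made for this example, and it records plain mapping objects rather than calling mozilla/source-map.

```javascript
// A CST node is either a string of generated text, or
// { loc: { line, column }, children: [...] } where loc is the original source position.
function flatten(cst) {
  let code = '';
  let line = 1;   // generated line, 1-based
  let column = 0; // generated column, 0-based
  const mappings = [];

  (function walk(node) {
    if (typeof node === 'string') {
      for (let i = 0; i < node.length; i++) {
        if (node[i] === '\n') {
          line++;
          column = 0;
        } else {
          column++;
        }
      }
      code += node;
      return;
    }
    // Record where this node's output starts and where it came from.
    if (node.loc) mappings.push({ generated: { line, column }, original: node.loc });
    node.children.forEach(walk);
  })(cst);

  return { code, mappings };
}

// CST for the JS `a === b;` generated from the CoffeeScript `a is b`.
const cst = {
  loc: { line: 1, column: 0 },
  children: [
    { loc: { line: 1, column: 0 }, children: ['a'] },
    ' === ',
    { loc: { line: 1, column: 5 }, children: ['b'] },
    ';',
  ],
};

const result = flatten(cst);
console.log(result.code);
console.log(JSON.stringify(result.mappings));
```

Each recorded pair says "this generated position came from that original position"; a source map library would then encode such pairs into the source map format.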
  26. Current Status
      • fixed over 50 open bugs
      • implemented 20 accepted enhancements
      • fairly stable interfaces
      • 98% feature complete
      • extensible design
      • source map generation + esmangle integration
      • great parser and runtime error reporting
      • being integrated with a popular IDE

      People are using it and contributing!
  27. Future Work
      • minor bug fixes
      • loosen some whitespace restrictions
      • more complete test suite
      • rewrite parser actions in CoffeeScript
      • remove some accidental mutation in compiler and optimiser rules
      • update text editor plugins
      • consider performance
      • release 2.0, replace jashkenas/coffee-script
      • fork and make it my own
  28. Summary
      • carefully choose your IRs
        ◦ use standards whenever possible
        ◦ expose them
        ◦ take advantage of others' tools that operate on your IRs
        ◦ for structured JS representation, use the AST format of Mozilla's Spidermonkey Parser API
        ◦ JS code gen in JS from this representation is a solved problem; use escodegen or equivalent
      • declarative behaviour specification is inherently extensible
      • this compiler is a huge improvement over what we had before
        ◦ start using it right now
        ◦ report bugs and tell me what to work on next