E4X parser implementation & ECMA 357 spec issues

E4X, that is, the system that unveils the world

E4X parser implementation & ECMA 357 spec issues
Yusuke Suzuki a.k.a. @Constellation
At E4X memorial service

Introduction
•  E4X, I want to see live E4X commentary!
•  More precisely, I want to see the faces of people whose add-ons all stop working because of E4X. Abyaa!
   –  Ai Mai Mi

Outline
•  ECMA357 parsing problem
•  How to implement an ECMA357 scanner
•  How to implement an ECMA357 parser
•  Demo
•  ECMA357 spec issues
•  Summary

E4X – ECMA357
•  ECMAScript for XML – ECMA357
•  Defines the syntax & semantics

Parsing E4X is hard
•  E4X's lexical rules differ from ECMAScript's
   –  Example:

          XMLElement :
              < XMLTagContent XMLWhitespace_opt />
              < XMLTagContent XMLWhitespace_opt > XMLElementContent_opt </ XMLTagName XMLWhitespace_opt >

   –  XMLWhitespace is itself a token in the grammar
   –  And XMLName is not the same as Identifier
•  The lexer needs two modes, ECMAScript and E4X
   –  When a certain punctuator arrives and the current context is PrimaryExpression, switch the lexer mode and parse E4X
   –  This is similar to RegExp scanning
•  http://disnetdev.com/blog/2012/12/20/how-to-read-macros/

Scanning E4X - Whitespace
•  Unlike the ECMAScript scanner, the E4X scanner scans whitespace as one token
•  The parser needs to know whether whitespace is present or not (between < and XMLTagContent, whitespace is not allowed)
•  The whitespace definition also differs from ECMA262
   –  Vertical tab, \u2028, \u2029, BOM etc. are not included in ECMA357
   –  So a special scanner for E4X is needed

       XMLElement :
           < XMLTagContent XMLWhitespace_opt />
           < XMLTagContent XMLWhitespace_opt > XMLElementContent_opt </ XMLTagName XMLWhitespace_opt >
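The whitespace rule above can be sketched as a small scanner function. This is a minimal illustration, not the code from the Esprima fork: the name scanXMLWhitespace and the token shape are assumptions, and the character set is the four characters (space, tab, CR, LF) that ECMA357 lists for XMLWhitespaceCharacter.

```javascript
// ECMA357 XML whitespace: space, tab, carriage return, line feed.
// Vertical tab, form feed, U+2028, U+2029 and the BOM are deliberately absent.
const XML_WHITESPACE = new Set([' ', '\t', '\r', '\n']);

// Hypothetical helper: scans a maximal run of XML whitespace starting at `index`.
// Returns one token, or null when there is no whitespace, so the parser can
// distinguish "whitespace here" from "no whitespace here".
function scanXMLWhitespace(source, index) {
  let end = index;
  while (end < source.length && XML_WHITESPACE.has(source[end])) {
    end += 1;
  }
  if (end === index) {
    return null;
  }
  return { type: 'XMLWhitespace', value: source.slice(index, end), range: [index, end] };
}
```

Returning null instead of an empty token lets the parser reject input such as "< name>" simply by checking that no whitespace token follows "<".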

Scanning E4X - XMLName
•  Scan XMLName; this is not an Identifier
•  -, :, and other characters are accepted
   –  soap:Envelope, family-name

    var message = <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
                                 soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
        <soap:Body>
            <m:GetLastTradePrice xmlns:m="http://mycompany.com/stocks">
                <symbol>DIS</symbol>
            </m:GetLastTradePrice>
        </soap:Body>
    </soap:Envelope>
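A rough sketch of XMLName scanning follows. It is simplified to ASCII as an assumption (the spec works in terms of Unicode letters and digits), and the function name scanXMLName and the token shape are hypothetical.

```javascript
// ASCII-only approximation (assumption): the start character may be a letter,
// "_" or ":"; later characters may also be digits, "." or "-".
const XML_NAME_START = /[A-Za-z_:]/;
const XML_NAME_PART = /[A-Za-z0-9_:.\-]/;

// Hypothetical helper: scans one XMLName at `index`, or returns null.
function scanXMLName(source, index) {
  if (index >= source.length || !XML_NAME_START.test(source[index])) {
    return null;
  }
  let end = index + 1;
  while (end < source.length && XML_NAME_PART.test(source[end])) {
    end += 1;
  }
  return { type: 'XMLName', value: source.slice(index, end), range: [index, end] };
}
```

With this, "soap:Envelope" and "family-name" each come out as a single token, where an ECMAScript scanner would split them at ":" and "-".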

Scanning E4X – Special Nodes
•  XMLComment, XMLCdata, XMLProcessingInstruction
•  Scan each of them as one token
   –  Because they have no child nodes

    var comment = <!-- This is XML comment -->;
    // We loved this; it was our here document!!
    var cdata = <><![CDATA[ This is CDATA section ]]></>;
    var instr = <?xml-stylesheet href="style.css" title="Stylesheet" type="text/css"?>;
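Because these nodes have no children, each can be consumed by searching for its closing delimiter. The sketch below illustrates that idea; the function name scanXMLSpecialNode and the token shape are assumptions, and the token type names follow the slide.

```javascript
// Opening and closing delimiters of the three childless node kinds.
const SPECIAL_NODES = [
  { type: 'XMLComment', open: '<!--', close: '-->' },
  { type: 'XMLCdata', open: '<![CDATA[', close: ']]>' },
  { type: 'XMLProcessingInstruction', open: '<?', close: '?>' },
];

// Hypothetical helper: returns the whole node as one token, null if `index`
// does not start a special node, and throws if the node is never closed.
function scanXMLSpecialNode(source, index) {
  for (const { type, open, close } of SPECIAL_NODES) {
    if (!source.startsWith(open, index)) {
      continue;
    }
    const closeAt = source.indexOf(close, index + open.length);
    if (closeAt === -1) {
      throw new SyntaxError(`Unterminated ${type}`);
    }
    const end = closeAt + close.length;
    return { type, value: source.slice(index, end), range: [index, end] };
  }
  return null;
}
```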

Scanning E4X – new tokens
•  Some new tokens are needed
•  Add the new tokens "..", "@", "::"

    // XMLAttributeSelector
    var encodingStyle = message.@soap::encodingStyle;
    // XMLQualifiedIdentifier
    var body = message.soap::Body;
    var orderChildren = order.*;
    var orderAttributes = order.@*;
    // XMLFilterExpression
    var twoemployees = e.employee.(@id == 0 || @id == 1);
    // XML descendant accessor
    var names = e..name;
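Adding the three punctuators to an existing scanner is mostly a lookup. The sketch below shows one way it might look; scanE4XPunctuator and the token shape are assumptions, and the guard against three dots only matters if the host scanner has a "..." token of its own.

```javascript
// The punctuators E4X adds on top of ECMAScript.
const E4X_PUNCTUATORS = ['..', '::', '@'];

// Hypothetical helper: returns an E4X punctuator token at `index`, or null
// so that the ordinary ECMAScript punctuator scanner can take over.
function scanE4XPunctuator(source, index) {
  // Leave "..." to the host scanner (assumption: it may define such a token).
  if (source.startsWith('...', index)) {
    return null;
  }
  for (const value of E4X_PUNCTUATORS) {
    if (source.startsWith(value, index)) {
      return { type: 'Punctuator', value, range: [index, index + value.length] };
    }
  }
  return null;
}
```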

Parsing E4X – Add syntax extensions
•  Add rules to parse the syntax extensions
•  Parse them as
   –  New nodes (XMLAttributeSelector, XMLFilterExpression etc.)
   –  Ordinary nodes (BinaryExpression with operator "::")
•  This is the easy part of E4X parsing

    var encodingStyle = message.@soap::encodingStyle;
    var twoemployees = e.employee.(@id == 0 || @id == 1);
    var names = e..name;

Parsing E4X – Add statements
•  Add the new statements
•  The for-each statement is part of E4X!

    // namespace
    default xml namespace = '';
    // for-each
    for each (var p in e..employee) {
        with (p) {
            if (@id == 0 || @id == 1) {
                twoEmployees[i++] = p;
            }
        }
    }

Parsing E4X - XML
•  The scanner mode needs to change
•  When "<" arrives (it is a punctuator), put the scanner into E4X mode and rescan
   –  At first, the ECMAScript scanner produces "<" (because it is used as a relational operator, e.g. a < b)
   –  If a PrimaryExpression starts with it, it is XML
•  This is similar to RegExp parsing

    var customer = <customer>
        <firstname>John</firstname>
        <lastname>{lastName}</lastname>
    </customer>;
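The rescan step can be illustrated with a deliberately tiny recursive-descent parser. Everything here is an assumption made for the sketch: the Scanner class, the function names, and the restriction to identifiers, the "<" operator and empty-element tags such as <a/>. It only shows where the mode switch happens, not how the real parser is organised.

```javascript
class Scanner {
  constructor(source) {
    this.source = source;
    this.index = 0;
  }

  // ECMAScript mode, reduced to identifiers and one-character punctuators.
  next() {
    while (this.index < this.source.length && /\s/.test(this.source[this.index])) {
      this.index += 1;
    }
    const start = this.index;
    if (start >= this.source.length) {
      return { type: 'EOF', start };
    }
    const word = /^[A-Za-z_$][\w$]*/.exec(this.source.slice(start));
    if (word) {
      this.index += word[0].length;
      return { type: 'Identifier', value: word[0], start };
    }
    this.index += 1;
    return { type: 'Punctuator', value: this.source[start], start };
  }

  // E4X mode: rewind to the "<" token and rescan from there.
  rescanXMLTag(lessThan) {
    const tag = /^<([A-Za-z_:][\w:.\-]*)[ \t\r\n]*\/>/.exec(this.source.slice(lessThan.start));
    if (!tag) {
      throw new SyntaxError('Only <name/> is handled in this sketch');
    }
    this.index = lessThan.start + tag[0].length;
    return { type: 'XMLElement', name: tag[1] };
  }
}

function parsePrimaryExpression(scanner) {
  const token = scanner.next();
  // "<" where an operand is expected cannot be a relational operator: it is XML.
  if (token.type === 'Punctuator' && token.value === '<') {
    return scanner.rescanXMLTag(token);
  }
  if (token.type === 'Identifier') {
    return { type: 'Identifier', name: token.value };
  }
  throw new SyntaxError(`Unexpected token at ${token.start}`);
}

function parseRelationalExpression(scanner) {
  let left = parsePrimaryExpression(scanner);
  for (;;) {
    const saved = scanner.index;
    const token = scanner.next();
    // "<" after an operand is the ordinary relational operator.
    if (token.type !== 'Punctuator' || token.value !== '<') {
      scanner.index = saved;
      return left;
    }
    left = { type: 'BinaryExpression', operator: '<', left, right: parsePrimaryExpression(scanner) };
  }
}
```

The scanner never decides on its own what "<" means. The parser knows whether it is expecting an operand, so the decision lives in parsePrimaryExpression, much like the "/" decision for regular expression literals.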

Parsing E4X - XMLEscape
•  When "{" arrives at an XMLEscape position, switch the scanner back to ECMAScript mode, parse an expression, then return to E4X mode

    var customer = <customer>
        <firstname>John</firstname>
        <lastname>{lastName}</lastname>
    </customer>;
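The round trip between the two modes can be sketched for element content. The names below are hypothetical, and parseExpression is a stand-in that accepts only a single identifier; a real implementation would call the full ECMAScript expression parser at that point.

```javascript
// Hypothetical stand-in for the ECMAScript expression parser: one identifier only.
function parseExpression(source, index) {
  const match = /^\s*([A-Za-z_$][\w$]*)\s*/.exec(source.slice(index));
  if (!match) {
    throw new SyntaxError('This sketch only handles identifier expressions');
  }
  return { node: { type: 'Identifier', name: match[1] }, end: index + match[0].length };
}

// Scans element content up to the next "<", switching modes at each "{".
function scanXMLElementContent(source, index) {
  const parts = [];
  let textStart = index;
  let i = index;
  while (i < source.length && source[i] !== '<') {
    if (source[i] !== '{') {
      i += 1;
      continue;
    }
    if (i > textStart) {
      parts.push({ type: 'XMLText', value: source.slice(textStart, i) });
    }
    // ECMAScript mode for the embedded expression.
    const { node, end } = parseExpression(source, i + 1);
    if (source[end] !== '}') {
      throw new SyntaxError('Expected "}" after embedded expression');
    }
    parts.push({ type: 'XMLEscape', expression: node });
    // Back to E4X mode right after the closing brace.
    i = end + 1;
    textStart = i;
  }
  if (i > textStart) {
    parts.push({ type: 'XMLText', value: source.slice(textStart, i) });
  }
  return { parts, end: i };
}
```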

Parsing E4X – Mozilla Extension
•  function namespace support
   –  This is not part of ECMA357
•  When "function" arrives at an XMLQualifiedIdentifier position, parse it as XMLFunctionQualifiedIdentifier

    var length = message.function::length;
    xml.list.(function::hasOwnProperty('@id') && @id === "b")

Parsing E4X – IdentifierName
•  Edge case
   –  Control whether IdentifierName or Identifier is accepted in XMLQualifiedIdentifier

    // keyword is allowed
    xml.list.(function::default('@id') && @id === "b")
    // keyword is not allowed
    default::test;
    // But function is allowed for the Mozilla extension
    function::test;
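The edge case can be written down as a small validation step. This is only a sketch under assumptions: the helper name makeQualifiedIdentifier, the mozillaExtension option and the node shapes are hypothetical, the keyword list is partial, and qualifiers such as "*" are ignored.

```javascript
// Partial reserved-word list; a real parser would use the full ECMA262 set.
const KEYWORDS = new Set([
  'break', 'case', 'default', 'delete', 'for', 'function', 'if', 'in',
  'new', 'return', 'this', 'var', 'while', 'with',
]);

// Hypothetical helper: builds the node for `qualifier::name`.
// The left side must be an Identifier (no keywords); the right side is an
// IdentifierName, so any keyword is accepted there.
function makeQualifiedIdentifier(qualifier, name, options = {}) {
  const { mozillaExtension = true } = options;
  if (qualifier === 'function') {
    if (!mozillaExtension) {
      throw new SyntaxError('"function::" requires the Mozilla extension');
    }
    return { type: 'XMLFunctionQualifiedIdentifier', name };
  }
  if (KEYWORDS.has(qualifier)) {
    throw new SyntaxError(`Keyword "${qualifier}" cannot be used as a qualifier`);
  }
  return { type: 'XMLQualifiedIdentifier', qualifier, name };
}
```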

Demo – Esprima with E4X
•  Implemented a COMPLETE ECMA357 parser on Esprima
•  http://constellation.github.io/demo/e4x/index.html

ECMA357 spec issue
•  While implementing the ECMA357 parser, we found a spec issue
•  See section 11.1.4

    XMLAttribute :
        XMLWhitespace XMLName XMLWhitespace_opt = XMLWhitespace_opt { Expression }
        XMLWhitespace XMLName XMLWhitespace_opt = XMLWhitespace_opt XMLAttributeValue

•  This BNF doesn't allow

    var xml = <name {escapedAttribute}="value"></name>;

•  But this form is used in the spec example (11.1.4)

    var tagname = "name";
    var attributename = "id";
    var attributevalue = 5;
    var content = "Fred";
    var x = <{tagname} {attributename}={attributevalue}>{content}</{tagname}>;

Conclusion
•  Implemented a complete ECMA357 parser
•  Described HOW TO IMPLEMENT IT
•  Showed an ECMA357 spec issue

Thank you for your attention
Any questions?
