the world
Outline
• ECMA357 parsing problem
• How to implement ECMA357 scanner
• How to implement ECMA357 parser
• Demo
• ECMA357 spec issues
• Summary
Parsing E4X is hard
• E4X rules are different from ECMAScript rules
  – Example: the XMLElement grammar below
  – XMLWhitespace is itself a token…
  – And XMLName is not the same as Identifier
• The lexer must switch modes between ECMAScript and E4X
  – When "<" appears and the current context is PrimaryExpression, change the lexer mode and parse E4X
  – This is similar to RegExp scanning
• http://disnetdev.com/blog/2012/12/20/how-to-read-macros/
XMLElement :
    < XMLTagContent XMLWhitespaceopt />
    < XMLTagContent XMLWhitespaceopt > XMLElementContentopt </ XMLTagName XMLWhitespaceopt >
Scanning E4X – Whitespace
• Unlike the ECMAScript scanner, the E4X scanner scans whitespace as one token
• The parser needs to know whether whitespace is present (whitespace is not allowed between < and XMLTagContent)
• The whitespace definition also differs from ECMA262
  – Vertical tab, \u2028, \u2029, BOM, etc. are not included in ECMA357
  – So a special scanner for E4X is needed
XMLElement :
    < XMLTagContent XMLWhitespaceopt />
    < XMLTagContent XMLWhitespaceopt > XMLElementContentopt </ XMLTagName XMLWhitespaceopt >
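A minimal sketch of the whitespace rule, assuming a scanner that works on a JavaScript string and returns plain token objects (the function names and token shape are hypothetical). It consumes a whole run of XMLWhitespace as one token and returns null when there is none, which is how a caller can reject whitespace directly after "<".

```javascript
// ECMA-357 XMLWhitespaceCharacter is only SP, TAB, CR and LF.
// VT, FF, NBSP, BOM, \u2028 and \u2029 (ECMA-262 whitespace or line
// terminators) are deliberately NOT accepted here.
function isXMLWhitespace(ch) {
  return ch === " " || ch === "\t" || ch === "\r" || ch === "\n";
}

// Scan a run of XMLWhitespace starting at `pos` as ONE token.
// Returns null when no whitespace is present, so the caller can tell
// "<a" apart from "< a".
function scanXMLWhitespace(source, pos) {
  let end = pos;
  while (end < source.length && isXMLWhitespace(source[end])) {
    end++;
  }
  if (end === pos) return null;
  return { type: "XMLWhitespace", value: source.slice(pos, end), end };
}
```

For example, scanning "<a" at offset 1 yields null, so the parser knows XMLTagContent follows "<" directly.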
Scanning E4X – XMLName
• Scan XMLName, which is not an Identifier
• "-", ":" and other characters are accepted
  – soap:Envelope, family-name
    var message = <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
                                 soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <soap:Body>
        <m:GetLastTradePrice xmlns:m="http://mycompany.com/stocks">
          <symbol>DIS</symbol>
        </m:GetLastTradePrice>
      </soap:Body>
    </soap:Envelope>;
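A sketch of an XMLName scanner following the ECMA-357 lexical grammar as we read it (XMLNameStart: letter, "_" or ":"; XMLNamePart: letter, digit, ".", "-", "_" or ":"). Unicode property escapes approximate UnicodeLetter and UnicodeDigit, only BMP characters are handled, and the function name is hypothetical.

```javascript
// XMLNameStart: UnicodeLetter (L*, Nl), "_" or ":"
const NAME_START = /[\p{L}\p{Nl}_:]/u;
// XMLNamePart: UnicodeLetter, UnicodeDigit (Nd), ".", "-", "_" or ":"
const NAME_PART = /[\p{L}\p{Nl}\p{Nd}._:-]/u;

// Scan an XMLName at `pos`; unlike Identifier, "-" and ":" are allowed.
// Works on UTF-16 code units, so astral-plane letters are not handled.
function scanXMLName(source, pos) {
  if (pos >= source.length || !NAME_START.test(source[pos])) return null;
  let end = pos + 1;
  while (end < source.length && NAME_PART.test(source[end])) end++;
  return { type: "XMLName", value: source.slice(pos, end), end };
}
```

With this, "soap:Envelope" and "family-name" each come out as a single name token instead of several ECMAScript tokens.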
Scanning E4X – Special Nodes
• XMLComment, XMLCdata, XMLProcessingInstruction
• Scan each of them as one token
  – Because they have no child nodes
    var comment = <!-- This is XML comment -->; // We loved this: it gave us here documents!!
    var cdata = <><![CDATA[ This is CDATA section ]]></>;
    var instr = <?xml-stylesheet href="style.css" title="Stylesheet" type="text/css"?>;
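Because these nodes have no children, a scanner can consume each one by searching for its closing delimiter. A minimal sketch; the token type names follow the ECMA-357 grammar (XMLComment, XMLCDATA, XMLPI), the function name is hypothetical, and well-formedness checks such as rejecting "--" inside a comment are omitted.

```javascript
// Opening and closing delimiters of the childless XML nodes.
const SPECIAL_NODES = [
  { type: "XMLComment", open: "<!--", close: "-->" },
  { type: "XMLCDATA", open: "<![CDATA[", close: "]]>" },
  { type: "XMLPI", open: "<?", close: "?>" },
];

// Scan a comment, CDATA section or processing instruction as ONE token.
// Returns null if none starts at `pos`.
function scanSpecialNode(source, pos) {
  for (const { type, open, close } of SPECIAL_NODES) {
    if (!source.startsWith(open, pos)) continue;
    const closeAt = source.indexOf(close, pos + open.length);
    if (closeAt < 0) throw new SyntaxError(`Unterminated ${type}`);
    const end = closeAt + close.length;
    return { type, value: source.slice(pos, end), end };
  }
  return null;
}
```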
Scanning E4X – New Tokens
• Some new tokens are needed
• Add the new tokens "..", "@" and "::"
    // XMLAttributeSelector
    var encodingStyle = message.@soap::encodingStyle;
    // XMLQualifiedIdentifier
    var body = message.soap::Body;
    var orderChildren = order.*;
    var orderAttributes = order.@*;
    // XMLFilterExpression
    var twoemployees = e.employee.(@id == 0 || @id == 1);
    // XML descendant accessor
    var names = e..name;
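One way to add "..", "@" and "::" is to extend the punctuator table and keep longest-match scanning, so that "::" wins over ":" and ".." over ".". A sketch with a hypothetical toy tokenizer: only a subset of ES5 punctuators is listed, and numeric literals such as .5 are not handled.

```javascript
// Subset of ES5 punctuators plus the E4X tokens "..", "@" and "::".
// Sorted longest first so that longest match wins.
const PUNCTUATORS = [
  "===", "!==", "..", "::", "==", "!=", "<=", ">=", "&&", "||", "++", "--",
  ".", ":", "@", "(", ")", "{", "}", "[", "]", ";", ",",
  "<", ">", "=", "*", "+", "-", "!", "?",
].sort((a, b) => b.length - a.length);

function scanPunctuator(source, pos) {
  for (const p of PUNCTUATORS) {
    if (source.startsWith(p, pos)) {
      return { type: "Punctuator", value: p, end: pos + p.length };
    }
  }
  return null;
}

// Toy tokenizer: whitespace, ASCII identifiers and punctuators only.
function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    if (/\s/.test(source[pos])) { pos++; continue; }
    const ident = /^[A-Za-z_$][\w$]*/.exec(source.slice(pos));
    const tok = ident
      ? { type: "Identifier", value: ident[0], end: pos + ident[0].length }
      : scanPunctuator(source, pos);
    if (!tok) throw new SyntaxError(`Unexpected character at ${pos}`);
    tokens.push(tok.value);
    pos = tok.end;
  }
  return tokens;
}
```

For instance, tokenize("message.@soap::encodingStyle") splits into message, ".", "@", soap, "::", encodingStyle.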
Parsing E4X – Add Syntax Extensions
• Add rules to parse the syntax extensions
• Parse them as
  – New nodes (XMLAttributeSelector, XMLFilterExpression, etc.)
  – Ordinary nodes (e.g. a BinaryExpression with operator "::")
• This is the easy part of E4X parsing
    var encodingStyle = message.@soap::encodingStyle;
    var twoemployees = e.employee.(@id == 0 || @id == 1);
    var names = e..name;
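A sketch of how the member-access extensions could map to nodes, working on an already tokenized chain. The node type names are hypothetical apart from BinaryExpression, "::" becomes an ordinary BinaryExpression as the slide suggests, and the filter body is kept as raw tokens instead of being parsed as an Expression.

```javascript
function parseMember(tokens) {
  let i = 0;
  const next = () => tokens[i++];
  const peek = () => tokens[i];

  // Name: "*" | Identifier | Identifier "::" Identifier
  function parseName() {
    if (peek() === "*") {
      next();
      return { type: "XMLAnyName" };
    }
    const left = { type: "Identifier", name: next() };
    if (peek() !== "::") return left;
    next(); // consume "::"
    const right = { type: "Identifier", name: next() };
    return { type: "BinaryExpression", operator: "::", left, right };
  }

  let node = { type: "Identifier", name: next() };
  while (i < tokens.length) {
    const op = next();
    if (op === "..") {
      node = { type: "XMLDescendantExpression", object: node, property: parseName() };
    } else if (op === "." && peek() === "@") {
      next(); // consume "@"
      node = { type: "XMLAttributeSelector", object: node, attribute: parseName() };
    } else if (op === "." && peek() === "(") {
      next(); // consume "("
      const filter = []; // simplification: raw tokens up to the matching ")"
      for (let depth = 1; ; ) {
        const t = next();
        if (t === undefined) throw new SyntaxError("Unterminated filter");
        if (t === "(") depth++;
        if (t === ")" && --depth === 0) break;
        filter.push(t);
      }
      node = { type: "XMLFilterExpression", object: node, filter };
    } else if (op === ".") {
      node = { type: "MemberExpression", object: node, property: parseName() };
    } else {
      throw new SyntaxError(`Unexpected token ${op}`);
    }
  }
  return node;
}
```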
Parsing E4X – Add Statements
• Add new statements
• The for-each statement is part of E4X!
    // default xml namespace
    default xml namespace = '';
    // for-each
    for each (var p in e..employee) {
        with (p) {
            if (@id == 0 || @id == 1) {
                twoEmployees[i++] = p;
            }
        }
    }
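Since each, xml and namespace are not reserved words, one reasonable approach is to recognize both statement heads contextually, right after the for and default keywords. A sketch over a token array; the returned statement type names are hypothetical.

```javascript
// Classify a statement by its leading tokens.
// `each`, `xml` and `namespace` are matched contextually, so they remain
// usable as ordinary identifiers elsewhere.
function classifyStatement(tokens) {
  const [t0, t1, t2, t3] = tokens;
  if (t0 === "for" && t1 === "each" && t2 === "(") {
    return "ForEachInStatement";
  }
  if (t0 === "default" && t1 === "xml" && t2 === "namespace" && t3 === "=") {
    return "DefaultXMLNamespaceStatement";
  }
  if (t0 === "for") return "ForStatement";
  // e.g. `default :` is still the ordinary switch clause
  return "Other";
}
```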
Parsing E4X – XML
• The scanner mode must change
• When "<" (a punctuator) appears, switch the scanner to E4X mode and rescan
  – At first, the ECMAScript scanner produces "<" (because it is also the relational operator, e.g. a < b)
  – If a PrimaryExpression starts with it, it is XML
• This is similar to RegExp parsing
    var customer = <customer>
                     <firstname>John</firstname>
                     <lastname>{lastName}</lastname>
                   </customer>;
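A sketch of the rescan trick under heavy simplifications: the ECMAScript mode only knows identifiers, integers and one-character punctuators, and the XML mode only reads balanced tags without attributes, escapes or self-closing tags. The class and function names are hypothetical. The scanner emits "<" first, and the parser rewinds and rescans in XML mode only when "<" begins a PrimaryExpression.

```javascript
class Scanner {
  constructor(source) {
    this.source = source;
    this.pos = 0;
  }

  // ECMAScript mode (tiny subset): identifiers, integers, 1-char punctuators.
  next() {
    while (this.pos < this.source.length && /\s/.test(this.source[this.pos])) this.pos++;
    const start = this.pos;
    const m = /^(?:[A-Za-z_$][\w$]*|\d+|\S)/.exec(this.source.slice(start));
    if (!m) return { type: "EOF", start };
    this.pos += m[0].length;
    return { type: "Token", value: m[0], start };
  }

  // XML mode (simplified): balanced <tag>...</tag> only.
  scanXMLElement(start) {
    const tag = /<(\/?)([\w:.-]+)>/g;
    tag.lastIndex = start;
    let depth = 0;
    for (let m; (m = tag.exec(this.source)) !== null; ) {
      depth += m[1] ? -1 : 1;
      if (depth === 0) {
        this.pos = tag.lastIndex;
        return { type: "XMLLiteral", value: this.source.slice(start, this.pos) };
      }
    }
    throw new SyntaxError("Unterminated XML literal");
  }
}

function parsePrimary(scanner) {
  const tok = scanner.next();
  if (tok.value === "<") {
    // "<" cannot start an ECMAScript PrimaryExpression, so rewind to the
    // token start and rescan the same characters in XML mode.
    return scanner.scanXMLElement(tok.start);
  }
  return { type: "Primary", value: tok.value };
}

function parseRelational(scanner) {
  let left = parsePrimary(scanner);
  for (;;) {
    const saved = scanner.pos;
    const tok = scanner.next();
    if (tok.value !== "<") {
      scanner.pos = saved; // not a "<": un-read the token
      return left;
    }
    // In operator position "<" stays a relational operator.
    left = { type: "BinaryExpression", operator: "<", left, right: parsePrimary(scanner) };
  }
}
```

For a < b the "<" stays a relational operator; for a < <b></b> the second "<" begins a PrimaryExpression and is rescanned as XML.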
Parsing E4X – XMLEscape
• When "{" appears at an XMLEscape position, switch the scanner back to ECMAScript mode, parse the expression, and then return to E4X mode
    var customer = <customer>
                     <firstname>John</firstname>
                     <lastname>{lastName}</lastname>
                   </customer>;
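A sketch of XMLEscape handling inside element content. The ECMAScript expression parser is passed in as a callback (parseExpression, hypothetical), so the XML-mode loop only hands over at "{", checks for "}", and resumes. Nested tags are not handled: scanning stops at the first "</". The toy callback below reads a single identifier.

```javascript
// Scan element content (text and {escapes} only) starting at `pos`.
function scanElementContent(source, pos, parseExpression) {
  const parts = [];
  let text = "";
  while (pos < source.length && !source.startsWith("</", pos)) {
    if (source[pos] !== "{") {
      text += source[pos++];
      continue;
    }
    if (text) {
      parts.push({ type: "XMLText", value: text });
      text = "";
    }
    // Switch to ECMAScript mode for the escaped expression.
    const { node, end } = parseExpression(source, pos + 1);
    if (source[end] !== "}") throw new SyntaxError("Expected } after XMLEscape");
    parts.push({ type: "XMLEscape", expression: node });
    pos = end + 1; // back in XML mode
  }
  if (text) parts.push({ type: "XMLText", value: text });
  return { parts, end: pos };
}

// Toy ECMAScript-mode parser: reads one identifier, skipping whitespace.
function parseIdentifierExpression(source, pos) {
  const m = /^\s*([A-Za-z_$][\w$]*)\s*/.exec(source.slice(pos));
  if (!m) throw new SyntaxError("Expected expression");
  return { node: { type: "Identifier", name: m[1] }, end: pos + m[0].length };
}
```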
Parsing E4X – Mozilla Extension
• Support for the function namespace
  – This is not part of ECMA357
• When "function" appears at an XMLQualifiedIdentifier position, parse it as XMLFunctionQualifiedIdentifier
    var length = message.function::length;
    xml.list.(function::hasOwnProperty('@id') && @id === "b")
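A sketch of the lookahead this needs. In ordinary code "function" starts a function expression or declaration, so the parser peeks one token ahead and treats "function" followed by "::" at an XMLQualifiedIdentifier position as the Mozilla form. The function name and the node shapes are hypothetical.

```javascript
// Parse a (possibly qualified) name starting at tokens[i].
// Returns the node and the index of the next unread token.
function parsePropertyName(tokens, i) {
  // Mozilla extension: `function::name` (one-token lookahead for "::").
  if (tokens[i] === "function" && tokens[i + 1] === "::") {
    return {
      node: { type: "XMLFunctionQualifiedIdentifier", name: tokens[i + 2] },
      next: i + 3,
    };
  }
  // Standard E4X: `namespace::name`.
  if (tokens[i + 1] === "::") {
    return {
      node: { type: "XMLQualifiedIdentifier", namespace: tokens[i], name: tokens[i + 2] },
      next: i + 3,
    };
  }
  return { node: { type: "Identifier", name: tokens[i] }, next: i + 1 };
}
```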
Parsing E4X – IdentifierName
• Edge case
  – Distinguish IdentifierName from Identifier in XMLQualifiedIdentifier
    // a keyword is allowed after ::
    xml.list.(function::default('@id') && @id === "b")
    // a keyword is not allowed before ::
    default::test;
    // But function is allowed by the Mozilla extension
    function::test;
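A sketch of the check, assuming ES5 reserved words (non-strict mode) and an ASCII-only IdentifierName for brevity; the function name and the mozillaExtensions option are hypothetical. The local name after "::" may be any IdentifierName, while the namespace before "::" must be an Identifier, with "function" allowed only under the Mozilla extension.

```javascript
// ES5 keywords, future reserved words (non-strict) and literal names.
const RESERVED = new Set([
  "break", "case", "catch", "continue", "debugger", "default", "delete", "do",
  "else", "finally", "for", "function", "if", "in", "instanceof", "new",
  "return", "switch", "this", "throw", "try", "typeof", "var", "void",
  "while", "with", "class", "const", "enum", "export", "extends", "import",
  "super", "null", "true", "false",
]);
// ASCII approximation of IdentifierName.
const IDENTIFIER_NAME = /^[A-Za-z_$][\w$]*$/;

function checkQualifiedIdentifier(namespace, name, { mozillaExtensions = true } = {}) {
  if (!IDENTIFIER_NAME.test(namespace) || !IDENTIFIER_NAME.test(name)) return false;
  if (RESERVED.has(namespace)) {
    // Before "::" only an Identifier is allowed, except Mozilla's `function`.
    return mozillaExtensions && namespace === "function";
  }
  return true; // after "::" any IdentifierName is fine, e.g. "default"
}
```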
ECMA357 Spec Issue
• While implementing an ECMA357 parser, we found a spec issue
• See section 11.1.4
    XMLAttribute :
        XMLWhitespace XMLName XMLWhitespaceopt = XMLWhitespaceopt { Expression }
        XMLWhitespace XMLName XMLWhitespaceopt = XMLWhitespaceopt XMLAttributeValue
• This BNF doesn't allow an XMLEscape as the attribute name:
    var xml = <name {escapedAttribute}="value"></name>;
• But this form is used in the spec's own example (11.1.4):
    var tagname = "name";
    var attributename = "id";
    var attributevalue = 5;
    var content = "Fred";
    var x = <{tagname} {attributename}={attributevalue}>{content}</{tagname}>;
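A sketch of a lenient attribute scanner that accepts an XMLEscape as the attribute name, which the 11.1.4 example needs but the BNF does not allow. Escape bodies are kept as raw text (a real parser would parse them as Expressions), names are matched with an ASCII approximation of XMLName, and the function name is hypothetical.

```javascript
// Scan one attribute after a tag name, accepting {expr} OR XMLName as the
// attribute name, and {expr} OR a quoted string as the value.
function scanAttribute(source, pos) {
  const ws = /^[ \t\r\n]+/.exec(source.slice(pos));
  if (!ws) return null; // XMLWhitespace is required before each attribute
  pos += ws[0].length;

  // Try a regex at the current position; advance `pos` on success.
  const part = (re) => {
    const m = re.exec(source.slice(pos));
    if (m) pos += m[0].length;
    return m;
  };

  // Lenient: XMLEscape name (used by the spec example) or plain XMLName.
  const name = part(/^\{[^}]*\}/) || part(/^[A-Za-z_:][\w:.-]*/);
  if (!name) return null;
  if (!part(/^[ \t\r\n]*=[ \t\r\n]*/)) throw new SyntaxError("Expected =");
  const value = part(/^\{[^}]*\}/) || part(/^"[^"]*"|^'[^']*'/);
  if (!value) throw new SyntaxError("Expected attribute value");
  return { name: name[0], value: value[0], end: pos };
}
```

A strict scanner would drop the first alternative for the name; keeping it lets the spec's own example parse.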