Implementing a Lightweight Markup Language Parser in Go / Gohn: parser written in Go
aereal
August 04, 2017
Presented at builderscon tokyo 2017
Transcript
Implementing a Lightweight Markup Language Parser in Go • id:aereal @ builderscon tokyo 2017
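Topics • Lightweight markup languages and Hatena notation • Text processing and why parser generators are needed • Introducing a Hatena notation parser built with Go/goyacc • Advanced goyacc techniques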
About me • id:aereal • GitHub: aereal • Application engineer at Hatena Co., Ltd.
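⚠ Disclaimer ⚠ • This is a private project that grew out of problems I personally noticed while working with the service • Whether the service will adopt it is unknown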
References • http://b.hatena.ne.jp/aereal/2017gokyoto/ • Bookmarked under a tag on Hatena Bookmark
Lightweight Markup Languages and Hatena Notation
What is a lightweight markup language? • LML = Lightweight Markup Language • Sits between HTML/XML and plain text • Markdown, Textile, Hatena notation, etc.
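What is Hatena notation? • An LML available in several services provided by Hatena • Hatena Blog, Hatena Diary, etc. • A convenient notation that is converted to HTML • Its grammar is somewhat similar to org-mode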
* Heading 1
** Heading 2
[http://127.0.0.1/:title=This is my IP]
- Ruby
- Perl
- Go
+ Introduction
+ Development
+ Twist
+ Conclusion
<h1>Heading 1</h1>
<h2>Heading 2</h2>
<p>
  <a href="http://127.0.0.1/">This is my IP</a>
</p>
<ul>
  <li>Ruby</li>
  <li>Perl</li>
  <li>Go</li>
</ul>
<ol>
  <li>Introduction</li>
  <li>Development</li>
  <li>Twist</li>
  <li>Conclusion</li>
</ol>
Various implementations • Hatena Blog, Hatena Diary, Hatena Group • Text-Hatena (CPAN) • Text-Xatena (CPAN) • chris4403/WikiTextConverter • motemen/pandoc
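Various implementations • Specification ≈ implementation • There are many implementations • In other words • There are as many specifications as there are implementations • Learning the specification means reading through Perl and regular expressions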
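No handy implementation exists • I want to use Hatena notation in applications written in something other than Perl, but... • The existing code leans heavily on Perl's extended regular expressions, so porting is hard • Many parsers go all the way to HTML conversion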
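Portability • We want things like live preview in the browser • I want to fix only the output (AST) for a given input • I want to be able to write it in Perl, Go, Scala, JavaScript, and others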
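I don't want the parser to do HTML conversion • Many parser implementations go all the way to HTML conversion • However, how much HTML is allowed in the input differs per service (!= per parser implementation) • → I want to separate the parser from HTML conversion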
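The Hatena notation parser I want • A plain implementation that can serve as a reference • = one that does not try too hard to solve everything with regular expressions • Parsing yields an intermediate representation rather than HTML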
In three lines • Give me an AST!!!
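Coming up next • Isn't there a nice text-processing tool that does not depend on a particular language? • Excluding (extended) regular expressions • We go looking for a promising text-processing technique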
Text Processing and Parser Generators
A tour of text processing • A look at several text-processing techniques • In some cases you do not even need to write a parser
Token position
    "id:aereal".substring(3) // => "aereal"
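Token position • If the token always appears at a fixed position, this is all you need • It falls apart once the position varies • And most grammars consist almost entirely of variable-position tokens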
Regular expressions
    "id:aereal".match(/id:(.+)/)[1] // => "aereal"
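Regular expressions • Cannot detect unbalanced brackets • (Impossible with POSIX regular expressions; Perl's extended regular expressions should be able to do it) • If you cannot catch the token in one shot, you need the state management described next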
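The two points above can be sketched in Go, the language used for the parser later in the talk. This is only an illustration under assumptions: `extractID` and `balanced` are hypothetical helper names, and the bracket check uses a plain depth counter, which is the smallest form of the "state management" the slide refers to. Go's `regexp` package cannot match nested brackets, so it serves as the regex side of the comparison.

```go
package main

import (
	"fmt"
	"regexp"
)

// idPattern mirrors the /id:(.+)/ example from the slide.
var idPattern = regexp.MustCompile(`id:(.+)`)

// extractID returns the captured name, or "" when the input has no id notation.
func extractID(s string) string {
	m := idPattern.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

// balanced reports whether every '[' has a matching ']'.
// A depth counter is needed because a regular expression alone cannot track nesting.
func balanced(s string) bool {
	depth := 0
	for _, r := range s {
		switch r {
		case '[':
			depth++
		case ']':
			depth--
			if depth < 0 {
				return false // a ']' appeared before its '['
			}
		}
	}
	return depth == 0
}

func main() {
	fmt.Println(extractID("id:aereal"))
	fmt.Println(balanced("[http://example.com/]"))
	fmt.Println(balanced("[[unclosed]"))
}
```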
Managing state transitions
    var isInIdNotation = false;
    while (1) {
      if (isInIdNotation) {
        var name = readText(); // => "aereal"
      } else {
        switch (readChar()) {
          case ':':
            isInIdNotation = true;
            break; // do not fall through into default
          default:
            // ...
        }
      }
    }
Managing state transitions
    var isInIdNotation = false;
    var isInHeading = false;
    var isInUnorderedList = false;
    var isInOrderedList = false;
    while (1) {
      if (isInIdNotation) { /* ... */ }
      if (isInHeading) { /* ... */ }
      if (isInUnorderedList) { /* ... */ }
      if (isInOrderedList) { /* ... */ }
    }
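For comparison, here is a minimal Go sketch of the same hand-managed state approach, reduced to a single notation. Everything in it is an assumption made for illustration: `findIDs`, `isNameByte`, and the two states are hypothetical, and the scanner ignores word boundaries, so the `id:` inside `valid:` would also match. Even at this size, each additional notation would add a state and more branches to the loop.

```go
package main

import (
	"fmt"
	"strings"
)

type state int

const (
	stateText state = iota // outside any notation
	stateID                // reading the name that follows "id:"
)

// isNameByte reports whether b may appear in a name (an assumed character set).
func isNameByte(b byte) bool {
	return b == '-' || b == '_' ||
		('a' <= b && b <= 'z') || ('A' <= b && b <= 'Z') || ('0' <= b && b <= '9')
}

// findIDs walks the input byte by byte and collects every NAME written as id:NAME.
func findIDs(input string) []string {
	var ids []string
	st := stateText
	start := 0
	for i := 0; i < len(input); {
		switch st {
		case stateText:
			if strings.HasPrefix(input[i:], "id:") {
				st = stateID
				i += len("id:")
				start = i
			} else {
				i++
			}
		case stateID:
			if isNameByte(input[i]) {
				i++
			} else {
				if i > start {
					ids = append(ids, input[start:i])
				}
				st = stateText // re-examine this byte as ordinary text
			}
		}
	}
	if st == stateID && len(input) > start {
		ids = append(ids, input[start:]) // name ran to the end of the input
	}
	return ids
}

func main() {
	fmt.Println(findIDs("hello id:aereal and id:foo"))
}
```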
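• Every approach so far makes it hard to see the grammar as a whole • Modularizing is difficult • → What if we could build the parser by stacking up small parts...?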
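Enter yacc • One of several parser generators • Generates a parser from syntax rules that resemble BNF • Builds up one rule by combining several smaller rules • Turns rules into program code in a callback style (reduction, "reduce")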
https://tools.ietf.org/html/rfc7230
    HTTP-message = start-line
                   *( header-field CRLF )
                   CRLF
                   [ message-body ]
    start-line   = request-line / status-line
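yacc • Being able to describe the grammar abstractly, in BNF, is a plus • More portable than a DSL embedded in one language • The lexer (lexical analyzer) has to be implemented separately • The callback parts of the syntax rules are not highlighted in the editor (there is probably a good way around this)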
Coming up next • We learned that yacc looks promising • Can Go and yacc be combined? • Will we really be able to build a Hatena notation parser?
https://git.io/v7gcD github.com/aereal/gohn
gohn • Written in Go w/goyacc • Pronounced `gone` • The major notations are already implemented
gohn's design • Reads Hatena notation from standard input • Writes the AST, serialized as JSON, to standard output • → Conversion to HTML is implemented separately • Very UNIX-like
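The stdin-to-JSON pipeline described above might look roughly like the following Go sketch. It is not gohn's code: the `Node` type and its fields are hypothetical (the real shape is defined by gohn's schema.json), and the toy `parse` function recognizes only `*` headings, treating every other non-empty line as a paragraph.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// Node is a hypothetical AST node, not the structure gohn actually emits.
type Node struct {
	Type  string `json:"type"`
	Level int    `json:"level,omitempty"`
	Text  string `json:"text"`
}

// parse recognizes only "*"-prefixed headings; other non-empty lines become paragraphs.
func parse(r io.Reader) ([]Node, error) {
	nodes := []Node{}
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			continue
		}
		level := len(line) - len(strings.TrimLeft(line, "*"))
		if level > 0 {
			nodes = append(nodes, Node{Type: "heading", Level: level, Text: strings.TrimSpace(line[level:])})
			continue
		}
		nodes = append(nodes, Node{Type: "paragraph", Text: line})
	}
	return nodes, sc.Err()
}

// parseString parses src and returns the AST serialized as JSON.
func parseString(src string) (string, error) {
	nodes, err := parse(strings.NewReader(src))
	if err != nil {
		return "", err
	}
	b, err := json.Marshal(nodes)
	return string(b), err
}

func main() {
	// Read the notation from standard input, write the AST as JSON to standard output.
	nodes, err := parse(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := json.NewEncoder(os.Stdout).Encode(nodes); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Because the program only reads stdin and writes stdout, any downstream tool in any language can consume the JSON, which is the portability argument made earlier in the talk.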
AST • Serialized as JSON • The JSON schema is published • It should be possible to generate an HTML converter automatically from the schema • https://github.com/aereal/gohn/blob/master/schema.json
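Keeping HTML conversion outside the parser means a converter only has to read the JSON. The sketch below shows what such a separate step could look like in Go. The `Node` fields are assumptions and do not follow gohn's published schema, and `renderHTML` is a hypothetical name. Because escaping lives here rather than in the parser, each service can apply its own policy on how much HTML to allow.

```go
package main

import (
	"encoding/json"
	"fmt"
	"html"
	"strings"
)

// Node is a hypothetical AST node; gohn's real schema may differ.
type Node struct {
	Type  string `json:"type"`
	Level int    `json:"level,omitempty"`
	Text  string `json:"text"`
}

// renderHTML decodes a JSON-serialized AST and renders it as HTML.
// All text is escaped; a service that allows inline HTML would relax this step.
func renderHTML(astJSON string) (string, error) {
	var nodes []Node
	if err := json.Unmarshal([]byte(astJSON), &nodes); err != nil {
		return "", err
	}
	var b strings.Builder
	for _, n := range nodes {
		text := html.EscapeString(n.Text)
		switch n.Type {
		case "heading":
			fmt.Fprintf(&b, "<h%d>%s</h%d>\n", n.Level, text, n.Level)
		default:
			fmt.Fprintf(&b, "<p>%s</p>\n", text)
		}
	}
	return b.String(), nil
}

func main() {
	out, err := renderHTML(`[{"type":"heading","level":1,"text":"Heading 1"},{"type":"paragraph","text":"a < b"}]`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```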
Go and yacc • There is a tool called goyacc • go get golang.org/x/tools/cmd/goyacc • Actions can be written in Go
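Go and lexical analysis • Lexical analysis = returning what the characters just read mean • The standard package text/scanner is handy • Its behavior can be customized • It records the position of the last token consumed, which makes building error messages easy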
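As a rough illustration of those points, the sketch below uses `text/scanner` to read an `id:NAME` notation and includes the scanner's recorded position in the error message. The function `parseIDNotation` and the file name `"input"` are made up for this example. The only customization shown is restricting `Mode` to identifiers, and gohn's actual lexer may be configured differently.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// parseIDNotation expects "id", ':', and a name, in that order.
// On failure the error carries the position text/scanner recorded for the token.
func parseIDNotation(src string) (string, error) {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	s.Filename = "input"        // shown as the prefix of each position
	s.Mode = scanner.ScanIdents // customize: only identifiers are grouped into tokens

	expect := func(want rune, desc string) error {
		if tok := s.Scan(); tok != want {
			return fmt.Errorf("%s: expected %s, found %q", s.Position, desc, s.TokenText())
		}
		return nil
	}

	if err := expect(scanner.Ident, "identifier"); err != nil {
		return "", err
	}
	if s.TokenText() != "id" {
		return "", fmt.Errorf("%s: expected \"id\", found %q", s.Position, s.TokenText())
	}
	if err := expect(':', "':'"); err != nil {
		return "", err
	}
	if err := expect(scanner.Ident, "user name"); err != nil {
		return "", err
	}
	return s.TokenText(), nil
}

func main() {
	fmt.Println(parseIDNotation("id:aereal"))
	fmt.Println(parseIDNotation("id aereal"))
}
```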
Demo
Advanced Topics
HTTP notation
    [http://example.com/]
    # <a href="http://example.com/">
    #   http://example.com/
    # </a>
    [http://127.0.0.1/:title=My IP]
    # <a href="http://127.0.0.1/">
    #   My IP
    # </a>
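HTTP notation • A notation that is converted into an anchor link • Expansion options can be written at the end, each following a `:` • `:` can also appear as part of the URL • → Reading just the next character is not enough to tell whether `:title` is starting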
Decided to treat the first `:` as part of the scheme and ignore it
    if !l.seenColon {
        l.seenColon = true
        return false // maybe part of URL
    } else {
        return true
    }
https://github.com/aereal/gohn/blob/master/parser/lex.go#L100
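To show the effect of that rule end to end, here is a small self-contained Go sketch. It is not gohn's lexer: `splitHTTPNotation` is a hypothetical helper that applies the same "skip the first colon" idea to the text between the brackets. It shares the rule's limitation, so a URL with a port number such as `http://localhost:8080/` would be split at the port.

```go
package main

import "fmt"

// splitHTTPNotation separates the URL from the trailing ":option" parts.
// The first ':' is assumed to belong to the URL scheme and is not a separator.
func splitHTTPNotation(body string) (string, []string) {
	seenColon := false
	fields := []string{}
	start := 0
	for i := 0; i < len(body); i++ {
		if body[i] != ':' {
			continue
		}
		if !seenColon {
			seenColon = true // maybe part of the URL, so keep reading
			continue
		}
		fields = append(fields, body[start:i])
		start = i + 1
	}
	fields = append(fields, body[start:])
	return fields[0], fields[1:]
}

func main() {
	url, options := splitHTTPNotation("http://127.0.0.1/:title=My IP")
	fmt.Println(url, options)
}
```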
Recursive rules • How to write a rule made up of N > 1 child rules • Just be careful not to get the order of append wrong
http_options:
    http_option
    {
        $$ = []string{$1}
    }
    | http_option http_options
    {
        options := $2
        $$ = append([]string{$1}, options...)
    }
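Why the append order matters can be seen by imitating the reductions in plain Go. In this sketch, written for illustration only, recursion stands in for the parser: with a right-recursive rule the tail (`$2`) is reduced before the head (`$1`), so the head must be placed in front of the tail. `reduceOptions` and `reduceOptionsWrongOrder` are hypothetical names and the option strings are sample data.

```go
package main

import "fmt"

// reduceOptions imitates the right-recursive rule: the tail is reduced first,
// then the head is placed in front, as in $$ = append([]string{$1}, $2...).
func reduceOptions(tokens []string) []string {
	if len(tokens) == 0 {
		return nil
	}
	if len(tokens) == 1 {
		return []string{tokens[0]} // http_options: http_option
	}
	tail := reduceOptions(tokens[1:])           // $2
	return append([]string{tokens[0]}, tail...) // head first keeps input order
}

// reduceOptionsWrongOrder appends the head after the tail, which reverses the list.
func reduceOptionsWrongOrder(tokens []string) []string {
	if len(tokens) == 0 {
		return nil
	}
	if len(tokens) == 1 {
		return []string{tokens[0]}
	}
	tail := reduceOptionsWrongOrder(tokens[1:])
	return append(tail, tokens[0])
}

func main() {
	in := []string{"title=Example", "image", "bookmark"}
	fmt.Println(reduceOptions(in))
	fmt.Println(reduceOptionsWrongOrder(in))
}
```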
Testing • Table-driven tests are recommended • https://github.com/golang/go/wiki/TableDrivenTests • Functional tests of the parser, lexer included, are probably sufficient • https://github.com/aereal/gohn/blob/master/parser/parser_test.go#L17
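The table-driven style can be sketched as follows. To stay runnable as a standalone program, the table is checked from `main`; in a real `_test.go` file the same loop would sit inside a `func TestXxx(t *testing.T)` and call `t.Errorf`. `headingLevel` is a hypothetical stand-in for the parser under test and is unrelated to gohn's API.

```go
package main

import (
	"fmt"
	"strings"
)

// headingLevel stands in for the code under test: it counts leading '*' characters.
func headingLevel(line string) int {
	return len(line) - len(strings.TrimLeft(line, "*"))
}

// cases is the table: each row names one input and its expected result.
var cases = []struct {
	name  string
	input string
	want  int
}{
	{"level 1 heading", "* Heading 1", 1},
	{"level 2 heading", "** Heading 2", 2},
	{"plain text", "no heading", 0},
}

// runCases returns one message per failing row.
// In a _test.go file the loop body would call t.Errorf instead.
func runCases() []string {
	var failures []string
	for _, c := range cases {
		if got := headingLevel(c.input); got != c.want {
			failures = append(failures, fmt.Sprintf("%s: got %d, want %d", c.name, got, c.want))
		}
	}
	return failures
}

func main() {
	if failures := runCases(); len(failures) > 0 {
		fmt.Println(strings.Join(failures, "\n"))
		return
	}
	fmt.Println("all cases passed")
}
```

Adding a new case is a one-line change to the table, which suits a parser whose notations are implemented one at a time.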
Debugging • It helps to define a method that looks up a token's name (string) from its identifier (int) • Useful whether you print or use a debugger • https://github.com/aereal/gohn/blob/master/parser/lex.go#L29
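A reverse lookup of that kind might look like the sketch below. The token constants and the `tokenName` function are hypothetical and do not come from gohn. The starting value 57346 follows the numbering goyacc uses for named tokens in generated code, but it is treated here as an assumption.

```go
package main

import "fmt"

// Hypothetical token identifiers. goyacc-generated constants are plain ints;
// the starting value here is assumed to match its usual numbering.
const (
	TEXT = iota + 57346
	COLON
	LBRACKET
	RBRACKET
)

// tokenNames maps each identifier back to a readable name.
var tokenNames = map[int]string{
	TEXT:     "TEXT",
	COLON:    "COLON",
	LBRACKET: "LBRACKET",
	RBRACKET: "RBRACKET",
}

// tokenName returns a printable name for tok, for use in logs or in a debugger.
func tokenName(tok int) string {
	if name, ok := tokenNames[tok]; ok {
		return name
	}
	if tok > 0 && tok < 128 {
		return fmt.Sprintf("%q", rune(tok)) // single-character tokens are their own code
	}
	return fmt.Sprintf("token(%d)", tok)
}

func main() {
	for _, tok := range []int{TEXT, COLON, int('['), 99999} {
		fmt.Println(tok, tokenName(tok))
	}
}
```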
Summary
Go/goyacc is handy • Go is well suited to building complex CLIs in a portable way • goyacc (yacc) is well suited to parsers for complex grammars
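Lightweight markup languages are hard • What is easy for humans to read and write differs from what is easy for machines to read and write • Rather than a parser that enforces strict grammar rules, might one that corrects mistakes be more practical?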
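Building a parser is fun • A goal that nicely balances what I can do, what I want to do, and what I am interested in • The Web and text processing • You can keep stacking up small goals, one at a time • "Today I got list notation working!"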
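If this caught your interest • Writing a JSON parser is a good first step • Then extend it toward the RFC, relaxed JSON, etc. • Next, try writing a lexer by hand • After that, try writing the parser by hand as well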