More than Regexp
Zete
November 18, 2012
Programming
RubyConfChina 2012: Regular Expressions Beyond Regular Expressions
Transcript
More Than Regexp https://github.com/luikore Monday, November 19, 12
Who am I
I mastered Basic, C, C++, Objective-C, Java, C#, Mathematica, Ruby, Perl, Python, CoffeeScript, Haskell, Scala, Groovy, R, SML, Erlang, F#, MASM / GNU assembly, LLVM assembly's hello world
I know quite a lot about web development technologies, compiler techniques, functional programming, quantum mechanics, abstract algebra, category theory and medieval history, a little bit
I am a rubyist just like you
On Languages
Parsing: parsing is reading, the computer way
Regexp: the built-in parsing tool of Ruby
★ Search and replace
★ Validate form data
★ Implement protocols
★ Virus scan
★ Match mRNA motifs
★ ...
A Brief History
★ Stephen Cole Kleene, 1950s
★ Ken Thompson, ed, g/re/p
★ Larry Wall, Perl
Modern implementations go far beyond the original definition
★ Lookaround
★ Unicode support
★ Matching history
★ A PEG engine, in fact
★ Ruby 1.9: Oniguruma (鬼車)
★ Ruby 2.0: Onigmo (鬼雲)
Regexp in Ruby
RUBY_VERSION =~ /1\.9|2\.0/
s = "肿么办?"
s[/肿么/] = "怎么" #=> "怎么办?"
%r(#{words.join '|'})
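The three idioms on this slide can be tried out with a small sketch like the one below. The sample strings, the helper `fix_greeting`, and the constants are made up for illustration; `Regexp.union` is swapped in for the interpolated `%r(...)` because it escapes metacharacters in the words.

```ruby
# =~ returns the index of the first match, or nil when nothing matches.
VERSION_RE = /1\.9|2\.0/
"2.0.0" =~ VERSION_RE    # => 0
"1.8.7" =~ VERSION_RE    # => nil

# String#[]= with a Regexp replaces the first match in place.
# (hypothetical helper, not from the talk)
def fix_greeting(text)
  s = text.dup           # work on a copy so the caller's string is untouched
  s[/helo/] = "hello"
  s
end
fix_greeting("helo world")   # => "hello world"

# Regexp.union escapes each word and joins them with |, similar to
# %r(#{words.join '|'}) but safe for words containing metacharacters.
WORDS_RE = Regexp.union(%w[foo bar a.b])
"xx bar" =~ WORDS_RE     # => 3
"aXb" =~ WORDS_RE        # => nil, because the dot was escaped
```

Note that `s[/re/] = ...` raises `IndexError` when the pattern does not match, so it suits cases where a match is expected.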
foo     "foo" exactly
.       matches ANY char (except newline, unless /m is set)
a|b     or
a?      maybe yes, maybe no
a*      Kleene star
a{0}    repeat 0 times
(a)          group
\1           back ref (fixed: the text the group already matched)
(?<name>a)   define named group
\g<name>     use named ref (runs the group's pattern again)
Difference between back ref and named group
backref = /(\w+) \1/
backref =~ 'ha ha' # 0
backref =~ 'ha ho' # nil
named = /(?<word>\w+) \g<word>/
named =~ 'ha ha' # 0
named =~ 'ha ho' # 0
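A runnable version of the comparison, as a sketch: the constant names are ours. A back reference must see the same text again, while `\g<word>` re-runs the pattern `\w+` and accepts any word.

```ruby
# Back reference: \1 must repeat the exact text captured by group 1.
BACKREF = /(\w+) \1/
# Subexpression call: \g<word> runs the pattern \w+ a second time.
SUBCALL = /(?<word>\w+) \g<word>/

BACKREF =~ 'ha ha'   # => 0
BACKREF =~ 'ha ho'   # => nil ("ho" is not the captured "ha")
SUBCALL =~ 'ha ha'   # => 0
SUBCALL =~ 'ha ho'   # => 0   ("ho" is just another \w+)
```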
★ A complex regexp contains a lot of information
★ Add whitespace to make it human-readable
★ Try not to write overly complex regexps
What does it do?
/^[ \t]*(?:class)\s*(.*?)\s*(<.*?)?\s*(#.*)?$/
Add margins and padding (in /x mode a literal # must be escaped, or it starts a comment)
/^
[\ \t]*
(?:class)\s*
(.*?)\s*
(<.*?)?\s*
(\#.*)?
$/x
Alignment reduces visual complexity:
/^
[\ \t]*
(?:class) \s*
(.*?)     \s*
(<.*?)?   \s*
(\#.*)?
$/x
Add comments
r = /^
[\ \t]*
(?:class) \s*
(.*?)     \s* # class name
(<.*?)?   \s* # inheritance
(\#.*)?       # line comment
$/x
r =~ "class A < B # match!"
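A sketch of the commented regexp as something you can run. The constant name `CLASS_LINE` is ours. Note the escaped `\#`: in extended (`/x`) mode an unescaped `#` starts a comment that runs to the end of the line, which would swallow the closing parenthesis of the last group.

```ruby
CLASS_LINE = /^
  [\ \t]*              # leading indentation (space must be escaped in x mode)
  (?:class) \s*
  (.*?)     \s*        # class name
  (<.*?)?   \s*        # inheritance
  (\#.*)?              # line comment (literal '#' escaped)
$/x

md = CLASS_LINE.match("class A < B # match!")
md[1]   # => "A"
md[2]   # => "< B"
md[3]   # => "# match!"
```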
Formal Language Theory
Mathematical modeling of languages
ֶ 语จ
A regular expression expresses a regular grammar, which recognizes a regular language, which is non-recursive
A parsing expression grammar (PEG) recognizes a parsing expression language, which can be recursive
Example -- Match the following strings:
զಓ
զಓ㟬ಓզಓ
զಓ㟬ಓզಓ㟬ಓզಓ
զಓ㟬ಓզಓ㟬ಓզಓ㟬ಓզಓ
...
The Language (A Regular Language):
L = { զಓ, զಓ㟬ಓզಓ, ... }
The Regexp (A Regular Grammar):
/զಓ(㟬ಓզಓ)*/
Structural Analysis:
զಓ(㟬ಓ( զಓ(զಓ( 㟬ಓ(զಓ) )) ))
Grammar in BNF Notation (Tail Recursion):
<A> ::= "զಓ" <B>?
<B> ::= "㟬ಓ" <A>
The "PEG-Flavored" Regexp:
/
(?<A> զಓ \g<B>? ){0}
(?<B> 㟬ಓ \g<A> ){0}
\g<A>
/x
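The same shape can be sketched with ASCII stand-ins. The literals `ab` and `cd`, the group names, and the `\A`/`\z` anchors are assumptions for illustration, not the strings on the slide. Each `(?<name> ... ){0}` defines a rule without matching anything, and the final `\g<a>` is the start rule.

```ruby
# <a> ::= "ab" <b>?
# <b> ::= "cd" <a>
CHAIN = /
  (?<a> ab \g<b>? ){0}   # rule a: "ab", optionally followed by rule b
  (?<b> cd \g<a>  ){0}   # rule b: "cd", then rule a again
  \A \g<a> \z            # start rule, anchored to the whole string
/x

CHAIN.match?("ab")           # => true
CHAIN.match?("abcdab")       # => true
CHAIN.match?("abcdabcdab")   # => true
CHAIN.match?("abcd")         # => false (rule b needs another a)
```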
The really-recursive example (1):
ᥨښੋষ⻥鱼ത࢜తၰࢠ
ओ语 (subject), 谓语 (predicate), 宾语 (object)
(?<ओ语>ᥨښ)
(?<谓语>ੋ)
(?<宾语>ষ⻥鱼ത࢜తၰࢠ)
(?<陈ड़۟>\g<ओ语>\g<谓语>\g<宾语>)
The really-recursive example (2):
ᥨښੋষ⻥鱼ത࢜తၰࢠ, Ҽ为ᥨষ⻥鱼༗ീḰ
陈ड़۟ (statement), ݪҼဓ۟ (reason clause)
(?<ݪҼဓ۟>Ҽ为ᥨষ⻥鱼༗ീḰ)
(?<ྫྷস话>\g<陈ड़۟>,\g<ݪҼဓ۟>)
The really-recursive example (3):
զࡏRubyConfChina্讲ྃྫྷস话: “ᥨښੋষ⻥鱼ത࢜తၰࢠ, Ҽ为ᥨষ⻥鱼༗ീḰ”
陈ड़۟ (statement), 宾语ಉҐ语 (object appositive)
(?<ओ语>ᥨښ|զ)
(?<谓语>ੋ|ࡏRubyConfChina্讲ྃ)
(?<宾语>ষ⻥鱼ത࢜తၰࢠ|ྫྷস话:\g<宾语ಉҐ语>)
(?<宾语ಉҐ语>“\g<陈ड़۟>”)
The really-recursive example (4):
զࡏRubyConfChina্讲ྃྫྷস话: “ᥨښੋষ⻥鱼ത࢜తၰࢠ, Ҽ为ᥨষ⻥鱼༗ീḰ”,ୠେՈ༗স
转ંဓ۟ (contrast clause)
(?<转ંဓ۟>ୠେՈ༗স)
(?<ྫྷস话>\g<陈ड़۟>, (\g<ݪҼဓ۟>|\g<转ંဓ۟>))
Combine them all
/
(?<ओ语> ᥨښ|զ ){0}
(?<谓语> ੋ|ࡏRubyConfChina্讲ྃ ){0}
(?<宾语> ষ⻥鱼ത࢜తၰࢠ|ྫྷস话:\g<宾语ಉҐ语> ){0}
(?<陈ड़۟> \g<ओ语>\g<谓语>\g<宾语> ){0}
(?<ྫྷস话> \g<陈ड़۟>,(\g<ݪҼဓ۟>|\g<转ંဓ۟>) ){0}
(?<ݪҼဓ۟> Ҽ为ᥨষ⻥鱼༗ീḰ ){0}
(?<宾语ಉҐ语> “\g<ྫྷস话>” ){0}
(?<转ંဓ۟> ୠେՈ༗স ){0}
\g<ྫྷস话>
/x
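To see the recursion work without the original sentences, here is a much smaller English analog. The vocabulary, the rule names, and the simplified rule set are hypothetical stand-ins, not a translation of the slide. A statement may contain a quoted object, and the quote contains a statement again.

```ruby
SENTENCE = /
  (?<subject>   I | the\ parser   ){0}
  (?<verb>      said | works      ){0}
  (?<object>    " \g<statement> " ){0}   # a quoted statement: the recursion
  (?<statement> \g<subject> \s \g<verb> (?: \s \g<object> )? ){0}
  \A \g<statement> \z
/x

SENTENCE.match?('the parser works')                       # => true
SENTENCE.match?('I said "the parser works"')              # => true
SENTENCE.match?('I said "I said "the parser works""')     # => true
SENTENCE.match?('I said "the parser')                     # => false
```

Rules may be referenced before they are defined (`object` calls `statement`), which is what lets the grammar be written top-down or bottom-up.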
Using dictionaries for the sentence components, you can build a natural language parser with "Regexp" (a PEG, in fact)
(?<ओ语>ᥨښ|㫴ښ|౮笼ښ|...)
Real-world languages need a bit more than PEG: they are generally described by a Context Free Grammar (CFG).
In a CFG the branches are not ordered: if A and B are two rules, A|B is the same as B|A. In a PEG, A|B tries A first and commits to it once it matches.
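Ordered choice can be observed in Ruby with an atomic group, used here as an approximation of PEG's committed choice (plain Ruby alternation is also tried left to right, but it may backtrack into later branches). The constant names are ours.

```ruby
# Inside (?>...) the first branch that matches wins and is never revisited.
SHORT_FIRST = /\A(?>a|ab)c\z/   # "a" wins, then "c" fails against "b"
LONG_FIRST  = /\A(?>ab|a)c\z/   # "ab" wins, then "c" matches

SHORT_FIRST.match?("abc")   # => false
LONG_FIRST.match?("abc")    # => true
LONG_FIRST.match?("ac")     # => true ("ab" fails, so "a" is tried)
```

With unordered CFG alternatives both spellings would accept the same strings, so swapping branches changes the language only under ordered choice.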
Even a CFG can't resolve some ambiguities:
༗Ұେၣረਖ਼ࡏۙ
( ) ??
Now you know more than regexp: parsec, rsec, treetop, parslet ...
Real world example: a simple Markdown parser in 130 lines (many features ignored, but...)
It supports nested parens! (while ruby-china doesn't)
[ruby](http://en.wikipedia.org/wiki/Ruby_(programming_language))
<a href='http://en.wikipedia.org/wiki/Ruby_(programming_language)'>ruby</a>
The parser for nested parens:
/(?<paren>
\(
(
[^\(\)]+ # non-paren chars
| # or
\g<paren> # a paren
)*
\)
)/x
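A runnable sketch of that pattern, applied to the Markdown link from the previous slide. The constant name `PAREN` and the sample variables are ours, and the inner group is written as non-capturing.

```ruby
PAREN = /(?<paren>
  \(
    (?:
      [^()]+       # non-paren chars
      |            # or
      \g<paren>    # a nested paren group (recursive call)
    )*
  \)
)/x

link = "[ruby](http://en.wikipedia.org/wiki/Ruby_(programming_language))"
link[PAREN]
# => "(http://en.wikipedia.org/wiki/Ruby_(programming_language))"

"(a(b)"[PAREN]   # => "(b)", the outer paren is never closed
```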
Helpful Links
★ ri Regexp
★ https://github.com/k-takata/Onigmo/tree/master/doc/RE
★ http://en.wikipedia.org/wiki/Parsing_Expression_Grammar
More sugar with Onigmo (if there's time...)
/\p{Han}/
/\p{Hiragana,Katakana}/
/\g'name'/
/\g<3>/
/\g'-3'/
/\k<1>/ # /\1/
/\k'name'/
/\k<-1>/
/\k'-n-level'/
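A few of these can be tried directly. The sample strings and the constant `QUOTED` are invented for illustration, and the two kana scripts are combined with a character class here, a spelling that is known to work, rather than the comma form on the slide.

```ruby
# Unicode script properties
"Ruby 正则表达式".scan(/\p{Han}+/)
# => ["正则表达式"]
"かな カナ abc".scan(/[\p{Hiragana}\p{Katakana}]+/)
# => ["かな", "カナ"]

# \k<name> is a named back reference: the closing quote must be
# the same character that the group q captured.
QUOTED = /(?<q>['"]).*?\k<q>/
%q(say 'hi' "there")[QUOTED]   # => "'hi'"
```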
/(?<=backward)something(?=forward)/
/(?<!backward)something(?!forward)/
/(?<\b)hello/ # 1.9 :( # 2.0 :)
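The four lookaround forms assert context without consuming it. A small sketch with made-up strings:

```ruby
# (?<=...) lookbehind: digits preceded by a dollar sign
"price: $42"[/(?<=\$)\d+/]        # => "42"

# (?<!...) negative lookbehind: a number NOT preceded by a dollar sign
"$42 and 7"[/(?<!\$)\b\d+/]       # => "7"

# (?=...) lookahead: a word followed by an opening paren
"call foo(1)"[/\w+(?=\()/]        # => "foo"

# (?!...) negative lookahead: "foo" not followed by "bar"
"foobar foobaz"[/foo(?!bar)\w+/]  # => "foobaz"
```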
? + * +? *? ?+ ++ *+
Can you tell them all apart?
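As a hint for the quiz: the plain forms are greedy, a trailing `?` makes them lazy, and a trailing `+` makes them possessive (no backtracking). A sketch on one made-up string:

```ruby
s = "<a><b>"

s[/<.+>/]    # => "<a><b>"  greedy: take as much as possible, then give back
s[/<.+?>/]   # => "<a>"     lazy: take as little as possible
s[/<.++>/]   # => nil       possessive: .++ eats the final ">" and never
             #              gives it back, so the closing ">" cannot match
```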
Thanks