
Regex-fu


Presented on September 10 2020 at the PHPBenelux virtual meetup.
https://www.meetup.com/phpbenelux/events/273015264/
---------------------------------------------------------------
Regular expressions: you either hate them or you love them, but do you really know how to harness their power? Based on the PCRE implementation, this talk will show you how to get the most out of your /^regex(es)?$/, how switches affect your results, how to be less greedy, how to assert your power and, let's not forget, when *not* to use regex.
---------------------------------------------------------------

Juliette Reinders Folmer

September 10, 2020

Transcript

  1. Regex Engines

    POSIX, PCRE, ECMAscript, Oniguruma, Boost, DEELX, RE2, TRE, Pattwo, GRETA, GLib/GRegex, FREJ, RGX, QT, CL-PPCRE, Jakarta, Henry Spencer's regex
  2. Regex Engines

    Boost, DEELX, RE2, TRE, Pattwo, GRETA, GLib/GRegex, FREJ, RGX, QT, CL-PPCRE, Jakarta, Henry Spencer's regex, Oniguruma, POSIX, ECMAscript, PCRE
  3. Basic Syntax

    Literals:                  A a 1
    Wildcard:                  .
    Quantifiers:               ? * + {#}
    Character ranges:          [...]
    Grouping and alternation:  ( ... | ... )
    Anchors:                   ^ ... $
    Shorthand character codes: \w \d \s
    Modifiers:                 g m s i
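
As a rough illustration of the syntax groups on this slide, the sketch below uses Python's `re` module as a stand-in for PCRE (an assumption; the talk itself is PCRE-based). The sample text and patterns are invented for this example. Python has no `g` modifier, so `re.findall` plays that role here, and the `m` and `i` modifiers are passed as flags.

```python
import re

# Hypothetical sample text, not taken from the slides.
text = "Cat cot cut\ncart 42"

# Literals plus the wildcard: "c", any one character, "t".
wildcard = re.findall(r"c.t", text)

# Shorthand character code with a quantifier: one or more digits.
digits = re.findall(r"\d+", text)

# Character range, grouping/alternation and an anchor combined,
# with the multiline and case-insensitive modifiers switched on.
anchored = re.findall(r"^(cat|c[a-z]rt)\b", text, re.MULTILINE | re.IGNORECASE)

print(wildcard)
print(digits)
print(anchored)
```

Without `re.IGNORECASE` the first line would no longer match, because the literal `c` is case-sensitive by default.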
  4. Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

    Jamie Zawinski, August 1997, alt.religion.emacs
  5. 2. Nothing in life is to be feared. It is only to be understood.

    Marie Curie
  6. / /

    o
    on
    one
    one.
    one.*
    one.*s
    one.*s.
    one.*s.?
    one.*s.?t
    one.*s.?t[a-z]
    one.*s.?t[a-z]+
    one.*s.?t[a-z]+p
    one.*s.?t[a-z]+p      (p is followed by a space)
    one.*s.?t[a-z]+p .
    one.*s.?t[a-z]+p .{2,}
    one.*s.?t[a-z]+p .{2,},

    We take one step forward, two steps back
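
To see what the finished pattern from this slide does with the sample sentence, here is a minimal sketch in Python's `re` module (used as a stand-in for PCRE, which is an assumption; the pattern uses no PCRE-only features). The variable names are illustrative.

```python
import re

subject = "We take one step forward, two steps back"

# The final pattern from the slide, including the literal space after "p".
pattern = re.compile(r"one.*s.?t[a-z]+p .{2,},")

match = pattern.search(subject)
matched_text = match.group(0) if match else None
print(matched_text)
```

The greedy `.*` and `.{2,}` first grab as much as they can and then give characters back. Because the sentence contains only one comma, the engine has to backtrack until the match ends there, which yields `one step forward,`.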
  7. Character classes                   PCRE     POSIX

    [0-9]          [^0-9]             \d  \D   [[:digit:]]  [^[:digit:]]
    [A-Za-z0-9_]   [^A-Za-z0-9_]      \w  \W   [[:word:]]   [^[:word:]]
    [\t\f\r\n \v]  [^\t\f\r\n \v]     \s  \S   [[:space:]]  [^[:space:]]
    [\t\f ]        [^\t\f ]           \h  \H   [[:blank:]]  [^[:blank:]]
    [\r\n]         [^\r\n]            \v  \V   -            -
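
The first three rows of this table can be checked with a short sketch. It uses Python's `re` module as a stand-in for PCRE, which is an assumption with real limits: Python has no `\h`, `\H` or POSIX bracket classes, and in Python `\v` means a single vertical tab, so only the `\d`, `\w` and `\s` rows are covered. The `re.ASCII` flag keeps the shorthands to ASCII so they line up with the spelled-out classes; the sample string is invented.

```python
import re

# Hypothetical ASCII-only sample: letter, digit, underscore, space, tab, hyphen.
sample = "a1_ \t-"

# (shorthand, spelled-out class) pairs taken from the table rows.
pairs = [
    (r"\d", r"[0-9]"),
    (r"\D", r"[^0-9]"),
    (r"\w", r"[A-Za-z0-9_]"),
    (r"\W", r"[^A-Za-z0-9_]"),
    (r"\s", r"[\t\f\r\n \v]"),
    (r"\S", r"[^\t\f\r\n \v]"),
]

# For each pair, check that both forms pick out the same characters.
results = {}
for shorthand, explicit in pairs:
    results[shorthand] = (
        re.findall(shorthand, sample, re.ASCII)
        == re.findall(explicit, sample, re.ASCII)
    )

print(results)
```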
  8. What to Escape?

    String delimiter: for the programming language
    Regex delimiter:  for the regex and for the programming language
    Meta-characters:  for the regex and for the programming language
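
The two escaping layers on this slide can be sketched in Python. This is an illustration under stated assumptions: Python has no regex delimiter, so only the string-delimiter and meta-character layers appear, and the sample path is made up.

```python
import re

# The text being searched contains: path "C:\temp\a.txt"
subject = 'path "C:\\temp\\a.txt"'

# Layer 1 (programming language): in an ordinary string literal the quote
# that delimits the string and every backslash must be escaped.
# Layer 2 (regex): the backslash and the dot are meta-characters.
plain_literal = "\"C:\\\\temp\\\\a\\.txt\""

# A raw string removes most of layer 1 and leaves only the regex escapes.
raw_literal = r'"C:\\temp\\a\.txt"'

same_pattern = plain_literal == raw_literal
found = re.search(raw_literal, subject) is not None
print(same_pattern, found)
```

Both literals produce the same pattern text; the raw string is simply easier to read because only the regex layer remains visible.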
  9. Escaping Meta Characters

    Special Meaning:  [ ] ( ) | . ? * + { } ^ $ \ / (delimiter)
    Literals:         \[ \] \( \) \| \. \? \* \+ \{ \} \^ \$ \\ \/
  10. Escaping Meta Characters

    Special Meaning:  [ ] ( ) | . ? * + { } ^ $ \ / (delimiter)
    Literals:         [(] [)] [|] [.] [?] [*] [+] [{] [}] [$] [/]
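
The two escaping styles shown on the last two slides, a backslash versus a one-character class, can be compared side by side. The sketch uses Python's `re` module as a stand-in for PCRE (an assumption) and an invented sample string.

```python
import re

# Hypothetical sample text containing a dot and parentheses.
subject = "Total: 3.50 (approx.)"

# Backslash escaping: each meta-character is prefixed with "\".
backslash = re.findall(r"\(|\)|\.", subject)

# Character-class escaping: each meta-character sits alone inside [...].
bracketed = re.findall(r"[(]|[)]|[.]", subject)

print(backslash)
print(backslash == bracketed)
```

The bracket list on the slide leaves out `[`, `]`, `^` and `\`. Those characters keep a special meaning inside a character class, so they still need a backslash there.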
  11. Escaping Arbitrary Strings

    Java:         String.quote(), quoteReplacement()
    PHP:          preg_quote()
    Matlab:       regexptranslate()
    Python:       re.escape()
    Objective-C:  escapedTemplateForString(), escapedPatternForString()
    Ruby:         Regexp.escape(), Regexp.quote()

    // Javascript:
    function escapeInputString( str ) {
        return str.replace(/[[\]\/\\{}()|?+^$*.-]/g, "\\$&");
    }
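
Of the functions listed on this slide, Python's `re.escape()` is easy to try out. The sketch below shows why arbitrary input should be escaped before it is embedded in a pattern; the input and text are invented for the example.

```python
import re

# Hypothetical user input that happens to be full of meta-characters.
user_input = "1+1=2 (really?)"
text = "They say 1+1=2 (really?) all the time."

# re.escape() backslash-escapes the special characters so the input
# is matched literally.
escaped = re.escape(user_input)
found_escaped = re.search(escaped, text) is not None

# Unescaped, "+" and "?" act as quantifiers and "( )" forms a group,
# so the literal text is not found.
found_raw = re.search(user_input, text) is not None

print(found_escaped, found_raw)
```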
  12. /^((
          25[0-5]|            # Match 250-255 range
          2[0-4][0-9]|        # Match 200-249 range
          [01]?[0-9]{1,2}     # Match 0-199 range
      )\.){3}                 # Repeat 3 times with period
      (25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})   # and once without
      $/x
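
The `x` modifier on this slide has a close counterpart in Python's `re.VERBOSE` flag, which likewise ignores whitespace and `#` comments inside the pattern. The following is a sketch of the same pattern under that assumption; the helper name `is_ipv4` is hypothetical.

```python
import re

# Same pattern as on the slide; re.VERBOSE stands in for PCRE's /x modifier.
IPV4 = re.compile(
    r"""
    ^((
        25[0-5]|            # Match 250-255 range
        2[0-4][0-9]|        # Match 200-249 range
        [01]?[0-9]{1,2}     # Match 0-199 range
    )\.){3}                 # Repeat 3 times with period
    (25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})   # and once without
    $
    """,
    re.VERBOSE,
)


def is_ipv4(candidate: str) -> bool:
    """Return True if the candidate looks like a dotted-quad IPv4 address."""
    return IPV4.match(candidate) is not None


for candidate in ("192.168.0.1", "255.255.255.255", "256.1.1.1", "1.2.3"):
    print(candidate, is_ipv4(candidate))
```

Note that the third alternative also accepts forms with leading zeros such as `099`, exactly as the slide's pattern does.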
  13. Match Array

    [0] – Complete match
    [1] – Match against sub-pattern 1
    [2] – Match against sub-pattern 2
    [3] – Match against sub-pattern 3
    ...

    Photo by Petr Kratochvil
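
The layout of the match array can be sketched with Python's match object, which numbers groups the same way: index 0 is the complete match and 1, 2, 3 are the sub-patterns in order of their opening parenthesis. The date pattern and sample text are assumptions made for this example; in PHP the same information would land in the `$matches` array filled by `preg_match()`.

```python
import re

# Hypothetical pattern with three sub-patterns: year, month, day.
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Presented on 2020-09-10")

complete = match.group(0)      # [0]: complete match
sub_patterns = match.groups()  # [1], [2], [3]: the sub-patterns

print(complete)
print(sub_patterns)
```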
  14. Match Array

    [0] – Complete match
    [firstname] – Match against named sub-pattern firstname
    [lastname] – Match against named sub-pattern lastname
    ...

    Photo by Petr Kratochvil
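
Named sub-patterns work along the same lines. The sketch below uses Python's `(?P<name>...)` syntax, which PCRE also accepts alongside `(?<name>...)`. The group names come from the slide; the pattern itself and the sample name are assumptions.

```python
import re

# Group names firstname/lastname are from the slide; the pattern is illustrative.
pattern = re.compile(r"(?P<firstname>\w+) (?P<lastname>\w+)")
match = pattern.search("Marie Curie")

complete = match.group(0)   # [0]: complete match
named = match.groupdict()   # named sub-patterns keyed by name

print(complete)
print(named["firstname"], named["lastname"])
```

Named groups remain reachable by number as well, so `match.group(1)` returns the same text as `match.group("firstname")`.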
  15. Know how to solve every problem that has been solved. What I cannot create, I do not understand.

    Richard Feynman

    Photo by Gleick, J. Genius. p. 310f