PCRE With PHP
PHP Benelux 2015
Thomas Weinert
January 24, 2015
Transcript
PCRE WITH PHP @Thomas Weinert
ABOUT
PHP functions and classes
PCRE syntax
WARNING!
Slides contain a lot of example source.
Most of the examples are really stupid.
PREG_MATCH()
Arguments: Pattern, Subject, Matches, Flags, Offset
PREG_MATCH() EXAMPLE
preg_match('(a.?)', 'abac', $match);
var_dump($match);
array(1) { [0]=> string(2) "ab" }
FLAG: PREG_OFFSET_CAPTURE
preg_match('(a.?)', 'abac', $match, PREG_OFFSET_CAPTURE, 2);
var_dump($match);
array(1) { [0]=> array(2) { [0]=> string(2) "ac" [1]=> int(2) } }
OFFSET
$subject = 'aa ab ac ad';
$offset = 0;
$length = strlen($subject);
while ($offset < $length) {
    if (preg_match('(a.)', $subject, $match, PREG_OFFSET_CAPTURE, $offset)) {
        $offset = $match[0][1] + strlen($match[0][0]);
        var_dump($match[0][0]);
    } else {
        break;
    }
}
string(2) "aa"
string(2) "ab"
string(2) "ac"
string(2) "ad"
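One detail of the offset argument is easy to miss: the offset moves the start of the search, but "^" still refers to the start of the subject. The sketch below, with a subject and patterns made up for illustration, contrasts "^" with "\G", which anchors at the offset.

```php
<?php
// Assumption: subject and patterns are illustrative, not from the slides.
$subject = 'ab';

// "^" still means "start of the subject", so it cannot match at offset 1.
$caret = preg_match('(^b)', $subject, $match, 0, 1);

// "\G" matches at the position where the search started (the offset).
$anchored = preg_match('(\\Gb)', $subject, $match, 0, 1);

var_dump($caret, $anchored); // int(0) int(1)
```

This is also why passing an offset is not the same as matching against substr($subject, $offset).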
PATTERN
Delimiter, Expression, Modifiers
/expression/x
DELIMITER
Any non-alphanumeric character (except backslash and whitespace)
Escaping
Special meaning
Brackets
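As a rough illustration of the delimiter choices listed above, the following sketch matches the same text with three delimiters. The subject and patterns are made up for this example.

```php
<?php
// Illustrative subject; none of these patterns appear in the slides.
$subject = 'path/to/file';

$results = [
    // Slash delimiter: a literal "/" inside the pattern must be escaped.
    'slash'   => preg_match('/to\/file/', $subject),
    // Another delimiter character avoids that escaping.
    'hash'    => preg_match('#to/file#', $subject),
    // A bracket pair works as delimiter; brackets inside keep their meaning.
    'bracket' => preg_match('(to/(file))', $subject, $match),
];

var_dump($results, $match[1]);
```

All three calls should return 1, and $match[1] holds "file" from the inner group of the bracket-delimited pattern.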
DELIMITER: BRACKETS
preg_match('((one)(two))', 'onetwo', $match);
var_dump($match);
array(3) { [0]=> string(6) "onetwo" [1]=> string(3) "one" [2]=> string(3) "two" }
PATTERN
String Escaping
$pattern = '(\\\n)';
$text = <<<'TEXT'
foo\nbar
TEXT;
preg_match($pattern, $text, $match);
var_dump($pattern, $text, $match);
string(5) "(\\n)"
string(8) "foo\nbar"
array(1) { [0]=> string(2) "\n" }
MODIFIERS
x - PCRE_EXTENDED
u - PCRE_UTF8
D - PCRE_DOLLAR_ENDONLY
s - PCRE_DOTALL
m - PCRE_MULTILINE
i - PCRE_CASELESS
...
PCRE_EXTENDED
$pattern = <<<'REGEX'
(^
  (d-)?    # optional country prefix
  (\d{5})  # German zip code
$)Dix
REGEX;
var_dump((bool)preg_match($pattern, 'D-50670'));
bool(true)
PCRE_UTF8
(^.*$)u
Pattern and subject need to be valid UTF-8!
UTF-8
1 to 4 bytes per character (5- and 6-byte sequences are invalid)
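To make the two UTF-8 points concrete, here is a small sketch. The sample characters and the invalid byte are chosen for illustration and written as byte escapes, so the encoding of the source file does not matter.

```php
<?php
// Byte lengths of four UTF-8 encoded characters:
// "a", e with acute accent, the euro sign and an emoji.
$lengths = array_map('strlen', ["a", "\xC3\xA9", "\xE2\x82\xAC", "\xF0\x9F\x98\x80"]);

// The byte 0xFF never occurs in valid UTF-8, so matching with "u" fails.
$result = preg_match('(^.*$)u', "abc\xFF", $match);
$isUtf8Error = preg_last_error() === PREG_BAD_UTF8_ERROR;

var_dump($lengths, $result, $isUtf8Error);
```

Note that preg_match() returns false here rather than 0, so a strict comparison is needed to tell "no match" from "invalid input".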
PCRE_DOLLAR_ENDONLY
$examples = [
    ["(^\\d+$)", "123"],
    ["(^\\d+$)", "123\n"],
    ["(^\\d+$)D", "123\n"],
    ["(\\A\\d+\\z)", "123\n"]
];
foreach ($examples as $example) {
    var_dump((bool)preg_match($example[0], $example[1], $match));
}
bool(true)
bool(true)
bool(false)
bool(false)
PCRE_DOTALL
$examples = [
    ["(^.+$)", "123"],
    ["(^.+$)", "123\n456"],
    ["(^.+$)s", "123\n456"]
];
foreach ($examples as $example) {
    preg_match($example[0], $example[1], $match);
    var_dump($match);
}
array(1) { [0]=> string(3) "123" }
array(0) { }
array(1) { [0]=> string(7) "123
456" }
PCRE_MULTILINE
$examples = [
    ["(^.+$)", "123"],
    ["(^.+$)", "123\n456"],
    ["(^.+$)m", "123\n456"]
];
foreach ($examples as $example) {
    preg_match($example[0], $example[1], $match);
    var_dump($match);
}
array(1) { [0]=> string(3) "123" }
array(0) { }
array(1) { [0]=> string(3) "123" }
PCRE_CASELESS
$examples = [
    ["(foo)", "foo"],
    ["(foo)", "FOO"],
    ["(foo)i", "FOO"]
];
foreach ($examples as $example) {
    preg_match($example[0], $example[1], $match);
    var_dump($match);
}
array(1) { [0]=> string(3) "foo" }
array(0) { }
array(1) { [0]=> string(3) "FOO" }
PREG_MATCH_ALL()
$subject = 'aa ab ac ad';
preg_match_all('(a.)', $subject, $match);
var_dump($match);
array(1) {
  [0]=> array(4) { [0]=> string(2) "aa" [1]=> string(2) "ab" [2]=> string(2) "ac" [3]=> string(2) "ad" }
}
PREG_PATTERN_ORDER
$subject = 'ab ac';
preg_match_all('(a(.))', $subject, $match);
var_dump($match);
array(2) {
  [0]=> array(2) { [0]=> string(2) "ab" [1]=> string(2) "ac" }
  [1]=> array(2) { [0]=> string(1) "b" [1]=> string(1) "c" }
}
PREG_SET_ORDER
$subject = 'ab ac';
preg_match_all('(a(.))', $subject, $match, PREG_SET_ORDER);
var_dump($match);
array(2) {
  [0]=> array(2) { [0]=> string(2) "ab" [1]=> string(1) "b" }
  [1]=> array(2) { [0]=> string(2) "ac" [1]=> string(1) "c" }
}
PREG_REPLACE()
var_dump(
    preg_replace("(')", '"', "'Hello'")
);
string(7) ""Hello""
ARRAY ARGUMENTS
var_dump(
    preg_replace(['(\\\r)', '(\\\n)'], ['CR', 'LF'], '\\r and \\n')
);
string(9) "CR and LF"
REFERENCING SUBPATTERNS
var_dump(
    preg_replace('(a(.))', 'a#${1}#', 'ab ac')
);
string(9) "a#b# a#c#"
\\1, $1, ${1}
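The three reference forms are interchangeable until a digit follows the reference directly. The sketch below uses a made-up replacement to show where the braced form becomes necessary.

```php
<?php
// Goal (illustrative): output the first group followed by a literal "1".
$subject = 'ab';

// "${1}1" is unambiguous: group 1, then the character "1".
$braced = preg_replace('(a(.))', '${1}1', $subject);

// "$11" is read as a reference to group 11, which this pattern does not have.
$ambiguous = preg_replace('(a(.))', '$11', $subject);

var_dump($braced, $ambiguous);
```

The first call yields "b1". The second does not, because the two digits are consumed as one group number.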
PREG_REPLACE_CALLBACK()
No need for modifier "e" (PREG_REPLACE_EVAL)
var_dump(
    preg_replace_callback(
        '(a(.))',
        function ($match) {
            return strtoupper($match[1]);
        },
        'ab ac'
    )
);
string(3) "B C"
FUNCTOR
class Replacer {
    public function __invoke($match) {
        return strtoupper($match[1]);
    }
}
var_dump(
    preg_replace_callback(
        '(a(.))', new Replacer(), 'ab ac'
    )
);
PREG_SPLIT()
$pattern = '(\\R)u';
$subject = "one\rtwo\n\nthree\r\nfour";
$match = preg_split($pattern, $subject);
var_dump($match);
array(5) { [0]=> string(3) "one" [1]=> string(3) "two" [2]=> string(0) "" [3]=> string(5) "three" [4]=> string(4) "four" }
PREG_SPLIT_NO_EMPTY
$pattern = '(\\R)u';
$subject = "one\rtwo\n\nthree\r\nfour";
$match = preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY);
var_dump($match);
array(4) { [0]=> string(3) "one" [1]=> string(3) "two" [2]=> string(5) "three" [3]=> string(4) "four" }
PREG_SPLIT_OFFSET_CAPTURE
$pattern = '(\\R)u';
$subject = "one\rtwo\n\nthree";
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE;
$match = preg_split($pattern, $subject, -1, $flags);
var_dump($match);
array(3) {
  [0]=> array(2) { [0]=> string(3) "one" [1]=> int(0) }
  [1]=> array(2) { [0]=> string(3) "two" [1]=> int(4) }
  [2]=> array(2) { [0]=> string(5) "three" [1]=> int(9) }
}
PREG_SPLIT_DELIM_CAPTURE
$highlights = ['small' => '*', 'short' => '_'];
$pattern = '((small|short))u';
$subject = "A small, short example";
$match = preg_split($pattern, $subject, -1, PREG_SPLIT_DELIM_CAPTURE);
foreach ($match as $part) {
    if (isset($highlights[$part])) {
        echo $highlights[$part], $part, $highlights[$part];
    } else {
        echo $part;
    }
}
A *small*, _short_ example
PREG_QUOTE()
var_dump('('.preg_quote('/.*/').')');
string(8) "(/\.\*/)"
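The slide example works because the brackets are the delimiter and "/" needs no escaping. When the delimiter itself can occur in the input, preg_quote() accepts it as a second argument. A minimal sketch, assuming a hypothetical search string:

```php
<?php
// Hypothetical user input that should be matched literally.
$needle = 'a.b/c';

// The second argument names the delimiter so that it gets escaped too.
$quoted = preg_quote($needle, '/');
$pattern = '/' . $quoted . '/';

var_dump($quoted);                         // string(7) "a\.b\/c"
var_dump(preg_match($pattern, 'xa.b/cx')); // int(1)
var_dump(preg_match($pattern, 'xaXb/cx')); // int(0)
```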
REGEXITERATOR
$data = new ArrayIterator(['aa', 'ab']);
$iterator = new RegexIterator(
    $data, '(.(.))', RegexIterator::REPLACE
);
$iterator->replacement = '$1';
var_dump(iterator_to_array($iterator));
array(2) { [0] => string(1) "a" [1] => string(1) "b" }
REGEXITERATOR MODES
Modes: MATCH, GET_MATCH, ALL_MATCHES, SPLIT, REPLACE
Flag: USE_KEY
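The following sketch shows two of the modes and the USE_KEY flag side by side. The data and patterns are made up for illustration.

```php
<?php
// Illustrative data, not taken from the slides.
$data = ['aa', 'ab', 'ba'];

// MATCH (the default mode) acts as a filter and keeps the original values.
$filtered = iterator_to_array(
    new RegexIterator(new ArrayIterator($data), '(^a)', RegexIterator::MATCH)
);

// GET_MATCH yields the match array, like the third argument of preg_match().
$groups = iterator_to_array(
    new RegexIterator(new ArrayIterator($data), '(^a(.))', RegexIterator::GET_MATCH)
);

// USE_KEY is a flag rather than a mode: the pattern is applied to the keys.
$byKey = iterator_to_array(
    new RegexIterator(
        new ArrayIterator(['apple' => 1, 'banana' => 2, 'avocado' => 3]),
        '(^a)',
        RegexIterator::MATCH,
        RegexIterator::USE_KEY
    )
);

var_dump($filtered, $groups, $byKey);
```

In all three cases the keys of the inner iterator are preserved, so "ba" and "banana" simply leave gaps in the result.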
UNICODE
Modifier u
All: \X
Token: \x{A9}
Category: \p{L}
Negation: \P{L}, \p{^L}
Scripts: \p{Hangul}
Blocks: \p{Arrows}
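A short sketch of three of these tokens in use. The sample text is an assumption for this example and is written with \u{...} escapes, which need PHP 7.0 or later.

```php
<?php
// "Seoul", a space, then the Hangul syllables U+C11C and U+C6B8.
$text = "Seoul \u{C11C}\u{C6B8}";

// Script property: only the Hangul run matches.
preg_match('(\\p{Hangul}+)u', $text, $hangul);

// Negated category: a run of characters that are not letters (here the space).
preg_match('(\\P{L}+)u', $text, $nonLetters);

// \X matches one extended grapheme cluster:
// "e" plus a combining acute accent counts once, "a" counts once.
$clusters = preg_match_all('(\\X)u', "e\u{0301}a");

var_dump($hangul[0], $nonLetters[0], $clusters);
```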
UNICODE EXAMPLE
$data = <<<'DATA'
English German 한국어 日本語
DATA;
preg_match_all('(\\pL+)u', $data, $match);
var_dump($match[0]);
array(4) { [0] => string(7) "English" [1] => string(6) "German" [2] => string(9) "한국어" [3] => string(9) "日本語" }
NON-CAPTURING SUBPATTERNS
preg_match('((?:one)(two))', 'onetwo', $match);
var_dump($match);
array(2) { [0]=> string(6) "onetwo" [1]=> string(3) "two" }
SUBPATTERN MODIFIERS
(?i-sm)
$examples = [
    ["((?i)foo)", "FOO"],
    ["((?-i)foo)i", "FOO"]
];
foreach ($examples as $example) {
    preg_match($example[0], $example[1], $match);
    var_dump($match);
}
array(1) { [0]=> string(3) "FOO" }
array(0) { }
NAMED SUBPATTERNS
$pattern = "(^
  (?P<year>\d{4})
  (?:-(?<month>\d{1,2}))?
  (?:-(?'day'\d{1,2}))?
)x";
preg_match($pattern, "2015-01-24", $match);
var_dump($match);
array(7) {
  [0]=> string(10) "2015-01-24"
  ["year"]=> string(4) "2015"
  [1]=> string(4) "2015"
  ["month"]=> string(2) "01"
  [2]=> string(2) "01"
  ["day"]=> string(2) "24"
  [3]=> string(2) "24"
}
PRE-DEFINED SUBROUTINES
$pattern = "(
  ^
  (?&number)
  (?:\\.(?&number)){3}
  $
  (?(DEFINE)
    (?'number'25[0-5]|2[0-4]\d|1\d{2}|\d{1,2})
  )
)x";
var_dump((bool)preg_match($pattern, "127.0.0.1", $match));
var_dump((bool)preg_match($pattern, "355.0.0.1", $match));
bool(true)
bool(false)
ASSERTIONS
Look Around
Look Ahead
Look Behind
LOOK AHEAD
$examples = [
    ["(h(?=e))", "hello"],
    ["(h(?=e)llo)", "hello"],
    ["(h(?=e).llo)", "hello"]
];
foreach ($examples as $example) {
    preg_match($example[0], $example[1], $match);
    var_dump($match);
}
array(1) { [0]=> string(1) "h" }
array(0) { }
array(1) { [0]=> string(5) "hello" }
LOOK AHEAD - NEGATION
$examples = [
    ["(h(?!e))", "hello"],
    ["(h(?!e))", "hallo"]
];
foreach ($examples as $example) {
    preg_match($example[0], $example[1], $match);
    var_dump($match);
}
array(0) { }
array(1) { [0]=> string(1) "h" }
LOOK BEHIND
$examples = [
    ["((?<=h).)", "hello"],
    ["((?<!h).)", "hallo"]
];
foreach ($examples as $example) {
    preg_match($example[0], $example[1], $match);
    var_dump($match);
}
array(1) { [0]=> string(1) "e" }
array(1) { [0]=> string(1) "h" }
LOOK BEHIND - ALTERNATIVES
$examples = [
    ["((?<=e|ha|.{2})l)", "hello"],
    ["((?<=e|ha)l)", "hallo"],
    ["((?<=e|.{2})l)", "hallo"]
];
foreach ($examples as $example) {
    preg_match($example[0], $example[1], $match);
    var_dump($match);
}
array(1) { [0]=> string(1) "l" }
array(1) { [0]=> string(1) "l" }
array(1) { [0]=> string(1) "l" }
LOOK BEHIND - UNKNOWN LENGTH
preg_match("((?<=.{2,})l)", 'hello', $match);
Warning: preg_match(): Compilation failed: lookbehind assertion is not fixed length at offset 9 in /tmp... on line 2
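One possible way around the fixed-length restriction is "\K", which the slides do not cover. It is a different technique from a look behind: the prefix is matched normally and then dropped from the reported match. A minimal sketch, assuming the same intent as the failing pattern:

```php
<?php
// Intent: an "l" that is preceded by at least two characters.
// The lazy ".{2,}?" consumes the prefix, "\K" resets the start of the match.
preg_match('(.{2,}?\\Kl)', 'hello', $match, PREG_OFFSET_CAPTURE);

var_dump($match[0]); // "l" at byte offset 2
```

Unlike a real look behind, the prefix is consumed, so it cannot overlap with a previous match when the pattern is applied repeatedly.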
CONDITIONALS
$pattern = '((?<quote>[\'"])?(?(quote).*?\\k<quote>|\\w+))';
$data = ['foo', '"foo"', "'foo'", 'foo bar', '"foo bar"'];
foreach ($data as $subject) {
    if (preg_match($pattern, $subject, $match)) {
        echo $match[0], "\n";
    }
}
foo
"foo"
'foo'
foo
"foo bar"
RECURSIONS
$pattern = <<<'PCRE'
(
  \(
  (
    (?>[^()]+)
    |
    (?R)
  )*
  \)
)Ux
PCRE;
preg_match_all($pattern, '(ab(cd)ef)(gh)', $match);
var_dump($match);
array(2) {
  [0] => array(2) { [0] => string(10) "(ab(cd)ef)" [1] => string(4) "(gh)" }
  [1] => array(2) { [0] => string(1) "f" [1] => string(1) "h" }
}
START OF PATTERN MODIFIERS
(*UTF), (*UTF8), (*UTF16), (*UTF32)
(*UTF)(*UCP) = u
(*CR), (*LF), (*CRLF), (*ANYCRLF), (*ANY)
(*BSR_ANYCRLF), (*BSR_UNICODE) - \R
(*LIMIT_MATCH=x), (*LIMIT_RECURSION=d)
(*NO_AUTO_POSSESS), (*NO_START_OPT)
(*NOTEMPTY), (*NOTEMPTY_ATSTART)
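These settings have to appear at the very start of the pattern, directly after the opening delimiter. A small sketch with two of them follows; the subjects are made up, and only the generic (*UTF) form is used here.

```php
<?php
// e with acute accent as two UTF-8 bytes, written as byte escapes.
$subject = "\xC3\xA9";

// Without UTF mode the dot matches a single byte, so "^.$" fails.
$bytes = preg_match('(^.$)', $subject);

// (*UTF) at the very start of the pattern switches UTF mode on.
$utf = preg_match('((*UTF)^.$)', $subject);

// (*ANYCRLF) makes "^" and "$" treat CR, LF and CRLF as line endings.
preg_match_all('((*ANYCRLF)^\\w+$)m', "one\r\ntwo\rthree", $lines);

var_dump($bytes, $utf, $lines[0]);
```

With the bracket delimiter the pattern starts with two opening brackets: the first is the delimiter, the second belongs to the setting.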
CONTROL VERBS
(*SKIP)(?!)
(*PRUNE)
(*THEN)
(*COMMIT)
(*ACCEPT)
http://perldoc.perl.org/perlre.html#Special-Backtracking-Control-Verbs
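The combination of (*SKIP) with the always-failing (?!) is commonly used to exclude parts of the subject from matching. A sketch of that idiom, with an invented task and subject:

```php
<?php
// Illustrative task: collect words that are not inside double quotes.
$subject = 'foo "bar baz" qux';

// First alternative: consume a quoted string, then (*SKIP)(?!) forces a
// failure and makes the engine resume searching after the quoted part.
// Second alternative: a plain word.
preg_match_all('/"[^"]*"(*SKIP)(?!)|\w+/', $subject, $match);

var_dump($match[0]);
```

The result contains "foo" and "qux" only. Without the verb, the engine would retry inside the quotes and also report "bar" and "baz".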
REGEX101.COM
VERSIONS
PCRE2 10.0: 2015-01-05
PCRE 8.36: 2014-09-26
3V4L.ORG
PHP7, HHVM >= 3.3: 8.35 2014-04-04
PHP >= 5.5.10: 8.34 2013-12-15
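Which PCRE version a given PHP binary uses can be read from the predefined constant PCRE_VERSION. A minimal sketch follows; the version threshold is only an example.

```php
<?php
// PCRE_VERSION holds the version number and release date,
// for example "10.42 2022-12-11".
echo 'PHP ', PHP_VERSION, ' uses PCRE ', PCRE_VERSION, "\n";

// The leading number is enough for a feature check (illustrative threshold).
$isPcre2 = version_compare(strtok(PCRE_VERSION, ' '), '10.0', '>=');
var_dump($isPcre2);
```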
LINKS
http://www.rexegg.com/
http://www.regular-expressions.info/
https://www.regex101.com/
THANKS