Learn to Love Regular Expressions

Regular expressions, or regex, need not be scary and so need not be a barrier to using them in your code. This session introduces the basic building blocks, shows how easy they are to test and use via PowerShell, and goes into grouping, which is arguably their most useful feature since it lets you validate and parse/tokenise strings in a single statement.

Guy Leech

April 03, 2020

Transcript

  1. LEARN TO LOVE REGULAR EXPRESSIONS
     Guy Leech (@guyrleech), Citrix Technology Professional (CTP), VMware vExpert
     CUGC Leeds (Online), March 2020
  2. WHAT IS A REGULAR EXPRESSION (AKA REGEX)?
     • A search pattern
     • Pattern matching on steroids
     • Originated in 1951
     • Used in many programming languages and products
       • PowerShell / PowerShell_ISE
       • Visual Studio (Code)
       • Perl
       • Vi
       • Findstr.exe (which has a quick reference in its help)
       • Notepad++
       • …
     • Be aware that there are a number of subtly different implementations
  3. WHY USE REGULAR EXPRESSIONS?
     • To find some text
     • Global search and replace
     • Checking log files
     • Validating input
     • To get a specific string or number from text
     • To replace some text with different text
     • To insert or delete text
     • To tokenise a string for further processing
  4. BASIC REGULAR EXPRESSION BUILDING BLOCKS
     • . (any character): "fred" -match '.'
     • * (zero or more of preceding item): "1234" -match '\d*'
     • + (one or more of preceding item): "1234" -match '\d+'
       • The difference matters: * also matches zero occurrences, so "abc" -match '\d*' is true as well
     • ? (one or none): "plural" -match 'plurals?'
     • ^ (start of string or line): "Guy Leech" -match "^Guy"
       • Also used to negate a range in [ ], e.g. [^0-9]
     • $ (end of string or line): "Guy Leech" -match "Leech$"
     • Beware of arrays: with an array on the left, -match returns the matching elements rather than true/false
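To try these building blocks interactively, the lines below can be pasted into a PowerShell console. This is a sketch rather than anything from the original session: the sample strings and the $names/$guys variables are made up for illustration, and the comments show what each line should return in Windows PowerShell 5.1 or PowerShell 7.

```powershell
# Each line returns True or False because the left-hand side is a single string
"fred"      -match '.'         # True: any character
"1234"      -match '\d*'       # True
"abc"       -match '\d*'       # True as well: \d* happily matches zero digits
"abc"       -match '\d+'       # False: + needs at least one digit
"plural"    -match 'plurals?'  # True: the trailing s is optional
"Guy Leech" -match '^Guy'      # True: anchored at the start
"Guy Leech" -match 'Leech$'    # True: anchored at the end

# With an array on the left, -match filters the array instead (and $Matches is not set)
$names = 'Guy Leech', 'Fred Smith', 'Guy Fawkes'
$guys  = $names -match '^Guy'  # 'Guy Leech', 'Guy Fawkes'
```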
  5. BASIC REGULAR EXPRESSION BUILDING BLOCKS
     • {} (count): "1234" -match '\d{2,4}'
     • | (or): $surname -match "Leech|Smith|Jones"
     • [] (any one of, or a range): "Bit" -match 'B[aeiou]t', "Bit" -match 'B[a-z]t'
     • \ (escape character, also introduces character classes)
       • \d (digit)
       • \s (whitespace)
       • \\ (literal backslash)
       • \w (word character: letter, digit or underscore)
       • \b (word boundary)
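A similar console sketch for these building blocks. The inputs are illustrative only (the 'Smithson' surname is a hypothetical value), and the example also shows why anchoring matters: without ^ and $, both {2,4} and | are happy to match inside a longer string.

```powershell
"1234"  -match '\d{2,4}'           # True: 2 to 4 digits
"12345" -match '\d{2,4}'           # Also True: it finds 4 digits inside the 5
"12345" -match '^\d{2,4}$'         # False once anchored at both ends

$surname = 'Smithson'                                # hypothetical input
$loose   = $surname -match 'Leech|Smith|Jones'       # True: 'Smith' is found inside 'Smithson'
$exact   = $surname -match '^(Leech|Smith|Jones)$'   # False: the whole string must be one of the three

"Bit" -match 'B[aeiou]t'           # True: one vowel
"Bit" -match 'B[^aeiou]t'          # False: [^ ] negates the set
"Bit" -match 'B[a-z]t'             # True: a range

"file_name1" -match '^\w+$'        # True: \w includes digits and underscore
"Guy Leech"  -match '^\w+\s\w+$'   # True: \s matches the space
"C:\Temp"    -match '^C:\\Temp$'   # True: \\ is a literal backslash
"cat flap"   -match '\bcat\b'      # True
"catalogue"  -match '\bcat\b'      # False: no word boundary after 'cat'
```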
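  6. EXAMPLE REGULAR EXPRESSIONS
     • 'Windows 10'
     • 'Error.*(\d+)' (beware: greedy .* leaves only the last digit for (\d+); 'Error.*?(\d+)' captures the whole number)
     • 'Yes|No'
     • '[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}' (a GUID)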
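The two less obvious examples can be exercised like this. A minimal sketch, assuming a made-up log line and the widely used example GUID 3f2504e0-4f89-11d3-9a0c-0305e82c3301; it shows the greedy capture pitfall and that -match, being case-insensitive, also accepts lower-case hex digits.

```powershell
$line = 'Error code 1234 returned'     # hypothetical log line

# Greedy .* grabs as much as it can, leaving just the last digit for (\d+)
if ($line -match 'Error.*(\d+)')  { $greedy = $Matches[1] }   # '4'
# Lazy .*? takes as little as it can, so (\d+) gets the whole number
if ($line -match 'Error.*?(\d+)') { $lazy = $Matches[1] }     # '1234'

# The GUID pattern, anchored so that nothing else may surround it
$GUID = '[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}'
$id   = '3f2504e0-4f89-11d3-9a0c-0305e82c3301'   # example GUID in lower case
$isGuid      = $id -match  "^$GUID$"   # True: -match ignores case, so a-f match A-F
$isUpperGuid = $id -cmatch "^$GUID$"   # False: -cmatch insists on upper-case A-F
```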
  7. REGULAR EXPRESSIONS IN POWERSHELL
     • Operators
       • -match (case-insensitive by default)
       • -cmatch (case-sensitive)
       • -imatch (explicitly case-insensitive)
       • -notmatch
       • -split (-csplit & -isplit)
         • -split can take an optional maximum number of substrings
         • Not the Split() method, which does not use regex
       • -replace (-creplace & -ireplace)
         • Don't specify a replacement to delete the matched text
         • Not the Replace() method, which does not use regex
     • Cmdlets
       • Select-String (alias sls), with -CaseSensitive
     • Statements
       • switch -Regex -CaseSensitive
     • Scripts
       • Lots of mine: regex is more flexible than -like
     • Largely compatible with Perl 5 regex syntax
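The operators, switch statement and cmdlet above can be compared side by side as follows. A sketch only: the log lines and variable names are hypothetical, and it assumes Windows PowerShell 5.1 or PowerShell 7.

```powershell
'Guy Leech' -match  'guy'              # True: case-insensitive by default
'Guy Leech' -cmatch 'guy'              # False: case-sensitive

$tokens  = 'a1b22c333' -split '\d+'    # 'a', 'b', 'c' and a trailing empty string
$twoBits = 'a,b;c d' -split '[,; ]', 2 # 'a' and 'b;c d': at most 2 substrings
$dots    = 'a.b.c' -split '\.'         # 'a', 'b', 'c': '.' must be escaped for -split
$method  = 'a.b.c'.Split('.')          # 'a', 'b', 'c': the Split() method is not regex
$deleted = 'Error: 42' -replace '\d'   # 'Error: ' since no replacement deletes the match

$log    = 'INFO started', 'WARN disk low', 'ERROR failed 3 times'   # hypothetical log lines
$counts = @{ Info = 0; Warn = 0; Error = 0 }
switch -Regex ($log) {                 # each element is tested against each pattern
    '^INFO'  { $counts['Info']++ }
    '^WARN'  { $counts['Warn']++ }
    '^ERROR' { $counts['Error']++ }
}

# Select-String is case-insensitive unless -CaseSensitive is given
$found   = @($log | Select-String -Pattern '^error')                  # one MatchInfo object
$missing = @($log | Select-String -Pattern '^error' -CaseSensitive)   # none
```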
  8. GROUPING
     • A way to use matched items afterwards, e.g. when parsing log records
     • No need to match and then split
     • '[Local SSD]\Folder\Filename' -match '\[([\w\s]+)\](.*)'
       • $Matches[1] is 'Local SSD' and $Matches[2] is '\Folder\Filename'
     • 'machine1 # this is a comment' -match '([^\s#]+)'
     • Named groups
       • Easier to understand when the matches are used afterwards
       • But make the regex harder to read/understand
       • If the regex changes (e.g. a group is added), code using $Matches items by name doesn't need to change
       • '[Local SSD]\Folder\File' -match '\[(?<Datastore>[\w\s]+)\](?<Filename>.*)'
     • Can also use groups in -replace in PowerShell
       • '[Local SSD]\Folder\Filename' -replace '\[([\w\s]+)\](.*)', 'vmstore:\$1$2'
       • '[Local SSD]\Folder\Filename' -replace '\[([\w\s]+)\](.*)', "vmstore:\`$1`$2" (in double quotes, escape $ with a backtick so PowerShell doesn't expand $1 and $2 as variables)
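The grouping examples with the captured values pulled out of $Matches. A sketch assuming the datastore path used on the slide; the variable names are illustrative, and the ${name} form in the replacement is standard .NET substitution syntax rather than something shown in the original deck.

```powershell
$path = '[Local SSD]\Folder\Filename'

# Numbered groups: $Matches[0] is the whole match, [1] and [2] are the ( ) groups
if ($path -match '\[([\w\s]+)\](.*)') {
    $datastore = $Matches[1]   # 'Local SSD'
    $file      = $Matches[2]   # '\Folder\Filename'
}

# [^\s#]+ stops at the first space or #, leaving the comment behind
if ('machine1 # this is a comment' -match '([^\s#]+)') { $machine = $Matches[1] }   # 'machine1'

# Named groups: the same values, looked up by name rather than by position
if ($path -match '\[(?<Datastore>[\w\s]+)\](?<Filename>.*)') {
    $namedStore = $Matches.Datastore   # 'Local SSD'
    $namedFile  = $Matches.Filename    # '\Folder\Filename'
}

# Groups in -replace: single quotes stop PowerShell expanding $1 and $2 itself
$vmPath = $path -replace '\[([\w\s]+)\](.*)', 'vmstore:\$1$2'
# .NET also accepts ${name} for named groups in the replacement text
$vmPathNamed = $path -replace '\[(?<Datastore>[\w\s]+)\](?<Filename>.*)', 'vmstore:\${Datastore}${Filename}'
```

Both replacements should produce 'vmstore:\Local SSD\Folder\Filename'.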
  9. TIPS’N’TRICKS
     • Use variables for complex-looking patterns
       • $GUID = '[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}'
       • $logLine -match "A log line with a $GUID" (avoid naming the variable $input, which is a PowerShell automatic variable)
     • Use $Matches to see what matched
       • It's a hashtable (dictionary), not an array
     • Build patterns up bit by bit interactively and test as you go with a string literal
       • if( "Bit" -match 'B[aeiou]t' ){ $Matches }
     • Use matching groups to tokenise strings via $Matches
       • If you match and then split, tokenise, etc., there may be a more efficient way
     • Put an example of the text you are trying to match in a comment
     • Anchoring to the start ^ or end $ of the line can speed up matching
     • There are usually multiple ways of matching: use whichever you find easiest
     • Flatten arrays via Out-String if not iterating line by line, so -match tests the whole text and sets $Matches
     • Use [regex]::Escape() if you are using user input as part of a bigger regex
     • Use -or rather than trying to concoct a single fiendish regex
     • $Matches is overwritten by the next successful -match, so use $Matches.Clone() if necessary
       • Or use [regex]::Match() to assign the match to a variable
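A few of these tips in runnable form. A minimal sketch with hypothetical inputs ('user=guy', 'host=leeds' and a made-up folder name); it shows cloning $Matches, keeping a [regex]::Match() result instead, and escaping user input. Note that [regex]::Match() is case-sensitive by default, unlike -match.

```powershell
# $Matches is replaced by the next successful -match, so clone it to keep it
$null  = 'user=guy' -match '^(\w+)=(\w+)$'
$first = $Matches.Clone()               # a separate copy of the hashtable
$null  = 'host=leeds' -match '^(\w+)=(\w+)$'
$firstValue  = $first[2]                # still 'guy'
$latestValue = $Matches[2]              # now 'leeds'

# Or keep the Match object itself (case-sensitive by default)
$m = [regex]::Match('host=leeds', '^(\w+)=(\w+)$')
if ($m.Success) { $hostValue = $m.Groups[2].Value }   # 'leeds'

# Escape user input before building it into a bigger pattern;
# without Escape() the \ and ( ) would be treated as regex syntax
$userInput = 'C:\Temp (old)'            # hypothetical user input with regex metacharacters
$pattern   = '^' + [regex]::Escape($userInput) + '\\'
$underIt   = 'C:\Temp (old)\file.txt' -match $pattern   # True
```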
  10. RESOURCES
     • https://www.rexegg.com/regex-quickstart.html
     • https://regex101.com/
     • https://regexr.com/
     • Interactive PowerShell
     • https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_regular_expressions
     • https://download.microsoft.com/download/D/2/4/D240EBF6-A9BA-4E4F-A63F-AEB6DA0B921C/Regular%20expressions%20quick%20reference.pdf
     • Web search: there's a lot out there
  11. GUY LEECH
     • Independent consultant, developer, trainer, adviser, troubleshooter, comedian
     • @guyrleech
     • guyrleech.wordpress.com
     • linkedin.com/in/guyrleech/
     • github.com/guyrleech
     • Available for hire