Regular Expression

A regular expression, or regex for short, is a pattern notation for describing and matching strings.

Exact Match

  • “abc” matches “abc”
  • “a&c” matches “a&c”
  • “a\u548cc” matches “a和c”; the Unicode code point of “和” is U+548C
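
A minimal sketch of exact matching, assuming Python's standard re module (the notes do not name a particular engine, and the variable names are only illustrative). re.fullmatch succeeds only when the whole string matches the pattern:

```python
import re

# A literal pattern matches itself character for character.
exact_abc = re.fullmatch(r"abc", "abc") is not None
exact_amp = re.fullmatch(r"a&c", "a&c") is not None

# "\u548c" is the escape sequence for the character "和" (U+548C).
exact_unicode = re.fullmatch("a\u548cc", "a和c") is not None

# A literal pattern does not match a different string.
mismatch = re.fullmatch(r"abc", "abd") is not None

print(exact_abc, exact_amp, exact_unicode, mismatch)
```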

Fuzzy Match

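  • ‘.’ matches any single character (in most engines, any character except a newline by default)
    • ‘a.c’ matches “abc”, “a&c”, “acc”
  • ‘\w’ matches any single letter, digit, or underscore
    • ‘\W’ matches any single character that is none of the above
  • ‘\d’ matches any single digit
    • ‘\D’ matches any single non-digit
  • ‘\s’ matches any single whitespace character (space, tab, newline, and so on)
    • ‘\S’ matches any single non-whitespace character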
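
The character classes above can be tried out with a short sketch, again assuming Python's re module. The helper name matches is hypothetical and exists only for this illustration:

```python
import re

def matches(pattern, text):
    """Return True if the whole text matches the pattern (illustrative helper)."""
    return re.fullmatch(pattern, text) is not None

# '.' stands for exactly one character, so "ac" is too short to match.
dot_results = [matches(r"a.c", s) for s in ["abc", "a&c", "acc", "ac"]]

# '\w' accepts letters, digits, and the underscore; '\W' is its complement.
word = matches(r"\w", "_")
non_word = matches(r"\W", "&")

# '\d' accepts a digit; '\D' is its complement.
digit = matches(r"\d", "7")
non_digit = matches(r"\D", "x")

# '\s' accepts whitespace such as space, tab, and newline; '\S' is its complement.
whitespace = [matches(r"\s", c) for c in [" ", "\t", "\n"]]
non_space = matches(r"\S", "a")

print(dot_results, word, non_word, digit, non_digit, whitespace, non_space)
```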

Repetition Match

  • ‘*’ matches zero or more repetitions of the preceding element
    • ‘A\d*’ matches ‘A’, ‘A0’, ‘A138’
  • ‘+’ matches one or more repetitions of the preceding element
  • ‘?’ matches zero or one occurrence of the preceding element
  • ‘{n}’ matches exactly n repetitions
    • ‘A\d{3}’ matches ‘A320’
  • ‘{n,m}’ matches between n and m repetitions, inclusive
    • ‘A\d{1,3}’ matches ‘A0’, ‘A00’, ‘A000’
  • ‘{n,}’ matches at least n repetitions
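
As a rough illustration of the quantifiers above, the following sketch (assuming Python's re module; matches is a hypothetical helper, and the extra test strings are made up for the example) checks which strings each pattern accepts in full:

```python
import re

def matches(pattern, text):
    """Return True if the whole text matches the pattern (illustrative helper)."""
    return re.fullmatch(pattern, text) is not None

# '*' allows zero or more digits after 'A'.
star = [matches(r"A\d*", s) for s in ["A", "A0", "A138"]]

# '+' requires at least one digit, so a bare "A" fails.
plus = [matches(r"A\d+", s) for s in ["A", "A0"]]

# '?' allows zero or one digit, so "A00" fails.
optional = [matches(r"A\d?", s) for s in ["A", "A0", "A00"]]

# '{3}' requires exactly three digits.
exactly_three = [matches(r"A\d{3}", s) for s in ["A320", "A32"]]

# '{1,3}' allows one to three digits.
one_to_three = [matches(r"A\d{1,3}", s) for s in ["A0", "A00", "A000", "A0000"]]

# '{2,}' requires two or more digits.
at_least_two = [matches(r"A\d{2,}", s) for s in ["A0", "A00", "A00000"]]

print(star, plus, optional, exactly_three, one_to_three, at_least_two)
```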

Complex Match

  • ‘^’ matches the beginning of a string
  • ‘$’ matches the end of a string
  • ‘[…]’ matches any single character in the given set or range
    • ‘[ABC]’ matches A, B, or C
    • ‘[0-9]’ matches any digit from 0 to 9
    • ‘[a-f]’ matches any lowercase letter from a to f
    • ‘[^1-9]’ matches any single character other than 1 to 9
  • ‘(|)’ means alternation (“or”)
    • ‘(A|B)’ matches A or B
  • ‘?’ placed after a quantifier makes it non-greedy (it matches as few characters as possible instead of as many as possible)
    • ‘\d??’ prefers to match 0 characters: ‘\d?’ matches 0 or 1 digit, and the second ‘?’ makes it non-greedy, so the 0-digit option is tried first
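
The anchors, character sets, alternation, and non-greedy behavior above can be sketched as follows, assuming Python's re module. The sample strings are invented for the illustration:

```python
import re

# '^' and '$' pin the pattern to the start and end of the string.
anchored = re.search(r"^A\d+$", "A320") is not None
not_at_start = re.search(r"^A\d+$", "xA320") is not None

# A set matches one character from it; a leading '^' inside the set negates it.
hex_letters = re.findall(r"[a-f]", "cafe42")
not_one_to_nine = re.findall(r"[^1-9]", "a0b5")

# Alternation accepts either branch.
either = [re.fullmatch(r"(A|B)", s) is not None for s in ["A", "B", "C"]]

# Greedy '+' takes as many digits as it can; non-greedy '+?' takes as few as it can.
greedy = re.match(r"\d+", "12345").group()
lazy = re.match(r"\d+?", "12345").group()

# '\d??' tries the zero-digit option first, so a prefix match is empty.
lazy_optional = re.match(r"\d??", "7").group()
# If the whole string must match, the engine falls back to consuming the digit.
lazy_forced = re.fullmatch(r"\d??", "7").group()

print(anchored, not_at_start, hex_letters, not_one_to_nine, either)
print(repr(greedy), repr(lazy), repr(lazy_optional), repr(lazy_forced))
```

Non-greedy is a preference rather than a hard limit: the engine still consumes more characters when that is the only way for the overall pattern to succeed.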

Group Match

  • ‘()’ captures the matched substring as a group, which can be retrieved after the match
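
A small sketch of group capture, assuming Python's re module. The pattern and the input "555-1234" are made up for illustration. Group 0 is the whole match, and groups are numbered from 1 in the order of their opening parentheses:

```python
import re

# Two capturing groups: three digits, a hyphen, then four digits.
m = re.fullmatch(r"(\d{3})-(\d{4})", "555-1234")

whole = m.group(0)   # the entire matched string
first = m.group(1)   # text captured by the first pair of parentheses
second = m.group(2)  # text captured by the second pair of parentheses
both = m.groups()    # all captured groups as a tuple

print(whole, first, second, both)
```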