Regular Expression
A regular expression, or regex for short, is a pattern that describes a set of strings.
Exact Match
- “abc” matches “abc”
- “a&c” matches “a&c”
- “a\u548cc” matches “a和c”; the Unicode code point of “和” is U+548C
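
The exact matches above can be checked with a short sketch. It assumes Python's `re` module, which these notes do not name; `is_exact` is a hypothetical helper introduced only for illustration.

```python
import re


def is_exact(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# Literal characters match themselves.
print(is_exact(r"abc", "abc"))        # True
print(is_exact(r"a&c", "a&c"))        # True

# Python's re understands \uXXXX escapes inside a pattern.
print(is_exact(r"a\u548cc", "a和c"))  # True

# Any differing character makes the match fail.
print(is_exact(r"abc", "abd"))        # False
```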
Fuzzy Match
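- ‘.’ matches any one character (in most engines, any character except a newline)
- ‘a.c’ matches “abc”, “a&c”, “acc”
- ‘\w’ matches any one letter, digit, or underscore
- ‘\W’ matches any one character that is not a letter, digit, or underscore
- ‘\d’ matches any one digit
- ‘\D’ matches any one non-digit
- ‘\s’ matches any one whitespace character (space, tab, newline)
- ‘\S’ matches any one non-whitespace character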
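
A minimal sketch of the character classes above, again assuming Python's `re` module. The sample strings are made up for illustration. Note that in Python, ‘\w’ and ‘\d’ also accept non-ASCII letters and digits unless the `re.ASCII` flag is passed.

```python
import re

# 'ac' is not matched: '.' needs exactly one character between a and c.
dot_hits = re.findall(r"a.c", "abc a&c acc ac")

word_hits = re.findall(r"\w", "a_1-")        # letters, digits, underscore
non_word_hits = re.findall(r"\W", "a_1- ")   # everything else
digit_hits = re.findall(r"\d", "A320")
non_digit_hits = re.findall(r"\D", "A320")
space_hits = re.findall(r"\s", "a b\tc")     # space and tab are both whitespace
non_space_hits = re.findall(r"\S", "a b\tc")

print(dot_hits)        # ['abc', 'a&c', 'acc']
print(word_hits)       # ['a', '_', '1']
print(non_word_hits)   # ['-', ' ']
print(digit_hits)      # ['3', '2', '0']
print(non_digit_hits)  # ['A']
print(space_hits)      # [' ', '\t']
print(non_space_hits)  # ['a', 'b', 'c']
```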
Duplicate Match
- ‘*’ matches the preceding element any number of times, including zero
- ‘A\d*’ matches ‘A’, ‘A0’, ‘A138’
- ‘+’ matches the preceding element at least once
- ‘?’ matches the preceding element 0 or 1 times
- ‘{n}’ matches the preceding element exactly n times
- ‘A\d{3}’ matches ‘A320’
- ‘{n,m}’ matches the preceding element n to m times
- ‘A\d{1,3}’ matches ‘A0’, ‘A00’, ‘A000’
- ‘{n,}’ matches the preceding element at least n times
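
The quantifiers above can be compared side by side with the sketch below. It assumes Python's `re` module; `matches` and the `samples` list are hypothetical names used only for this example.

```python
import re


def matches(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


samples = ["A", "A0", "A00", "A000", "A138", "A320", "A0000"]

star = matches(r"A\d*", samples)          # zero or more digits
plus = matches(r"A\d+", samples)          # one or more digits
optional = matches(r"A\d?", samples)      # zero or one digit
exactly3 = matches(r"A\d{3}", samples)    # exactly three digits
one_to_3 = matches(r"A\d{1,3}", samples)  # one to three digits
at_least2 = matches(r"A\d{2,}", samples)  # two or more digits

print(star)       # every sample
print(plus)       # every sample except 'A'
print(optional)   # ['A', 'A0']
print(exactly3)   # ['A000', 'A138', 'A320']
print(one_to_3)   # ['A0', 'A00', 'A000', 'A138', 'A320']
print(at_least2)  # ['A00', 'A000', 'A138', 'A320', 'A0000']
```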
Complex Match
- ‘^’ matches the beginning of a string
- ‘$’ matches the end of a string
- ‘[…]’ matches any one character in the set
- ‘[ABC]’ matches A, B, or C
- ‘[0-9]’ matches any digit from 0 to 9
- ‘[a-f]’ matches any lowercase letter from a to f
- ‘[^1-9]’ matches any one character that is not a digit from 1 to 9
- ‘(|)’ means alternation (“or”)
- ‘(A|B)’ matches A or B
- ‘?’ after a quantifier makes it non-greedy (it matches as few characters as possible)
- ‘\d??’ prefers to match 0 digits: ‘\d?’ matches 0 or 1 digit, and the second ‘?’ makes it non-greedy, so it takes 0 whenever the rest of the pattern allows
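
The anchors, character sets, alternation, and non-greedy quantifiers from this section can be sketched as follows. This assumes Python's `re` module, and the sample strings are invented for illustration.

```python
import re

# Anchors: '^' pins the match to the start, '$' to the end.
starts_with_abc = re.search(r"^abc", "abcdef") is not None
not_at_start = re.search(r"^abc", "xabc") is not None
ends_with_abc = re.search(r"abc$", "xabc") is not None

# Character sets match exactly one character from the set.
set_hits = re.findall(r"[ABC]", "ABCD")
digit_hits = re.findall(r"[0-9]", "a1b22")
range_hits = re.findall(r"[a-f]", "afgz")
negated_hits = re.findall(r"[^1-9]", "102a")  # '0' is outside 1-9

# Alternation; with one group, findall returns the group contents.
either_hits = re.findall(r"(A|B)", "ABC")

# Greedy versus non-greedy quantifiers.
greedy_optional = re.match(r"\d?", "7").group()
lazy_optional = re.match(r"\d??", "7").group()
greedy_plus = re.match(r"\d+", "12345").group()
lazy_plus = re.match(r"\d+?", "12345").group()

# A lazy quantifier still takes more when the rest of the pattern needs it.
lazy_forced = re.fullmatch(r"\d??", "7").group()

print(starts_with_abc, not_at_start, ends_with_abc)  # True False True
print(set_hits, digit_hits, range_hits)  # ['A', 'B', 'C'] ['1', '2', '2'] ['a', 'f']
print(negated_hits, either_hits)         # ['0', 'a'] ['A', 'B']
print(repr(greedy_optional), repr(lazy_optional))  # '7' ''
print(repr(greedy_plus), repr(lazy_plus))          # '12345' '1'
print(repr(lazy_forced))                           # '7'
```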
Group Match
- ‘()’ captures the matched substring as a group that can be retrieved afterwards
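
A minimal sketch of group capture, assuming Python's `re` module. The pattern `(A)(\d{3})` is a hypothetical example built from the ‘A\d{3}’ pattern used earlier in these notes.

```python
import re

m = re.fullmatch(r"(A)(\d{3})", "A320")

whole = m.group(0)   # group 0 is always the entire match
letter = m.group(1)  # first pair of parentheses
number = m.group(2)  # second pair of parentheses
both = m.groups()    # all captured groups as a tuple

print(whole)   # A320
print(letter)  # A
print(number)  # 320
print(both)    # ('A', '320')
```

Groups are numbered from 1 in the order their opening parentheses appear in the pattern.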