PHP Tips and Tricks – Match Patterns with PHP Regular Expressions

PHP offers a powerful way to create and match patterns in text. You can create rules that let you look for patterns in strings of text. These rules are referred to as regular expressions, or regex.

In simple terms, a regular expression is a rule used to match patterns in one or more strings.

For example, let's look at a regular expression that matches 10 digits in a row: /^\d\d\d\d\d\d\d\d\d\d$/. This pattern matches a string that consists of exactly 10 digits.
The string will not match the pattern if it is longer or shorter than 10 characters, or if it contains anything that is not a digit.

A more concise way of writing the same regular expression uses curly braces, which indicate repetition: /^\d{10}$/.
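
As a minimal sketch of how the two forms behave in PHP, the snippet below runs both through preg_match(). The helper name isTenDigits() and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: true when $text consists of exactly 10 digits.
function isTenDigits(string $text): bool
{
    // preg_match() returns 1 on a match and 0 when there is no match.
    return preg_match('/^\d{10}$/', $text) === 1;
}

// The long form and the curly-brace form describe the same rule.
$long  = '/^\d\d\d\d\d\d\d\d\d\d$/';
$short = '/^\d{10}$/';

// Made-up sample inputs: valid, too short, too long, contains a letter.
foreach (['5551234567', '555123456', '55512345678', '555123456a'] as $s) {
    $same = preg_match($long, $s) === preg_match($short, $s);
    echo $s, ' => ', isTenDigits($s) ? 'match' : 'no match',
         $same ? '' : ' (patterns disagree)', PHP_EOL;
}
```

Only the first sample should be reported as a match, and both patterns should agree on every input.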

Regular expressions are cryptic and often difficult to read, but they are powerful.
Besides \d, which matches digits, PHP’s regex syntax offers more special expressions that match other kinds of characters. These expressions are called metacharacters.

Frequently used Metacharacters:

\d

  • This metacharacter looks for digits.
  • It will match any single digit from 0 to 9.
  • \d matches just one digit. To match multiple digits, say 2, repeat the metacharacter: either \d\d or \d{2}.

\s

  • This metacharacter looks for whitespace.
  • This is not just the space character that you get by hitting the space bar. It also matches a tab, a newline, or a carriage return.
  • \s matches just a single whitespace character. To match multiple whitespace characters, say 2, use \s\s or \s{2}.

\w

  • This metacharacter looks for a "word" character: a letter, a digit, or an underscore.
  • It will match one character from a-z (lowercase), A-Z (uppercase), 0-9 (digits), or the underscore (_).

^

  • The caret metacharacter looks for the beginning of a string. Use it to require that the match happens at the start of the text, instead of anywhere in the string.
  • For example, /^\d{2}/ will match the string “20”, but it will not match the string “test 20”.

.

  • The period metacharacter matches any one character except a newline. It can match a letter or digit, just like \w. It can also match a space or a tab, like \s.

$

  • This metacharacter looks for the end of a string.
  • For example, /^\w{5}\s\d{3}$/ will match “Abcde 311”, but it will not match “Abcde 311 test” or “Temp Abcde 311”.
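
The metacharacters above can be tried out with a short script. This is only a sketch: matches() is a hypothetical wrapper around preg_match(), and the test strings are the ones from the bullets plus a few made-up ones.

```php
<?php
// Hypothetical helper that wraps preg_match() and returns a boolean.
function matches(string $pattern, string $text): bool
{
    return preg_match($pattern, $text) === 1;
}

// ^ anchors the match to the start of the string.
var_dump(matches('/^\d{2}/', '20'));                       // true
var_dump(matches('/^\d{2}/', 'test 20'));                  // false

// ^ and $ together force the whole string to fit the pattern.
var_dump(matches('/^\w{5}\s\d{3}$/', 'Abcde 311'));        // true
var_dump(matches('/^\w{5}\s\d{3}$/', 'Abcde 311 test'));   // false
var_dump(matches('/^\w{5}\s\d{3}$/', 'Temp Abcde 311'));   // false

// \s also accepts a tab, and \w also accepts an underscore.
var_dump(matches('/^\s$/', "\t"));                         // true
var_dump(matches('/^\w$/', '_'));                          // true

// . matches a space but not a newline.
var_dump(matches('/^a.c$/', 'a c'));                       // true
var_dump(matches('/^a.c$/', "a\nc"));                      // false
```

Single-quoted PHP strings are used for the patterns so that backslashes such as \d reach the regex engine unchanged.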

Examples: Phone number pattern and corresponding strings

Regex                     Matching string
/^\d{3}\s\d{7}$/          555 1234567
/^\d{3}\s\d{3}\s\d{4}$/   555 123 4567
/^\d{3}\d{3}-\d{4}$/      555123-4567
/^\d{3}-\d{3}-\d{4}$/     555-123-4567
/^\d{3}\s\w\w\s\w{5}$/    555 ME MYSQL
/^\d{10}$/                5551234567
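
The pairs above can be checked mechanically. The following sketch stores each pattern with its sample string in an array (the variable names are chosen here for illustration) and reports whether preg_match() agrees.

```php
<?php
// Pattern => sample string pairs, copied from the examples above.
$examples = [
    '/^\d{3}\s\d{7}$/'        => '555 1234567',
    '/^\d{3}\s\d{3}\s\d{4}$/' => '555 123 4567',
    '/^\d{3}\d{3}-\d{4}$/'    => '555123-4567',
    '/^\d{3}-\d{3}-\d{4}$/'   => '555-123-4567',
    '/^\d{3}\s\w\w\s\w{5}$/'  => '555 ME MYSQL',
    '/^\d{10}$/'              => '5551234567',
];

// Collect the result for each pair so it can be inspected afterwards.
$results = [];
foreach ($examples as $pattern => $text) {
    $results[$pattern] = preg_match($pattern, $text) === 1;
    printf("%-24s %-14s %s\n", $pattern, $text,
           $results[$pattern] ? 'match' : 'no match');
}
```

Each line of output should end in "match", since every sample string was written to fit its pattern.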

Quantifiers:
Suppose we have the pattern /^\d{3}-\d{3}-\d{4}-\d{4}$/ for phone numbers, where the last four digits represent a phone number extension. Often the extension is not present. In such cases, we can use regular expressions to mark part of the string as optional.
Regex supports a feature called quantifiers that lets you specify how many times a character or metacharacter should appear in a pattern. We are already familiar with one quantifier: in \d{10}, the curly braces act as a quantifier.

Other frequently used quantifiers:
{min,max}
When there are two numbers in curly braces separated by a comma, they indicate the range of times the preceding character or metacharacter may be repeated. For example, {2,4} means it should appear 2, 3, or 4 times in a row. Write it without spaces, since older PCRE versions treat a brace expression containing spaces as literal text rather than a quantifier.

+
The preceding character or metacharacter must appear one or more times.

*
The preceding character or metacharacter can appear zero or more times, that is, any number of times or not at all.

?
The preceding character or metacharacter must appear once or not at all.
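
Here is a small sketch that exercises each of these quantifiers. The helper quantifierMatch() and the sample patterns are made up for illustration.

```php
<?php
// Hypothetical helper that wraps preg_match() and returns a boolean.
function quantifierMatch(string $pattern, string $text): bool
{
    return preg_match($pattern, $text) === 1;
}

// {2,4}: two, three, or four digits.
var_dump(quantifierMatch('/^\d{2,4}$/', '123'));      // true
var_dump(quantifierMatch('/^\d{2,4}$/', '12345'));    // false

// +: at least one digit.
var_dump(quantifierMatch('/^\d+$/', '7'));            // true
var_dump(quantifierMatch('/^\d+$/', ''));             // false

// *: any number of "b" characters, including none.
var_dump(quantifierMatch('/^ab*$/', 'a'));            // true
var_dump(quantifierMatch('/^ab*$/', 'abbb'));         // true

// ?: the "u" may appear once or not at all.
var_dump(quantifierMatch('/^colou?r$/', 'color'));    // true
var_dump(quantifierMatch('/^colou?r$/', 'colour'));   // true
var_dump(quantifierMatch('/^colou?r$/', 'colouur'));  // false
```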

Now, let's come back to our main discussion about matching optional digits at the end of our phone numbers. The ? quantifier lets us mark the extension as optional.
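
One possible way to write such a pattern is sketched below. It assumes the hyphen and the four extension digits are wrapped in parentheses so that ? applies to the whole group; grouping is not covered in this post, so treat the pattern as an illustration rather than the only answer. The function name is hypothetical.

```php
<?php
// Assumed pattern: the group (-\d{4}) holds the hyphen plus the
// extension, and the trailing ? makes the whole group optional.
$extensionPattern = '/^\d{3}-\d{3}-\d{4}(-\d{4})?$/';

// Hypothetical helper for checking a phone number string.
function hasPhoneFormat(string $pattern, string $text): bool
{
    return preg_match($pattern, $text) === 1;
}

var_dump(hasPhoneFormat($extensionPattern, '555-123-4567'));       // true
var_dump(hasPhoneFormat($extensionPattern, '555-123-4567-8901'));  // true
var_dump(hasPhoneFormat($extensionPattern, '555-123-4567-89'));    // false
var_dump(hasPhoneFormat($extensionPattern, '555-123-4567-'));      // false
```

Because the group is all-or-nothing, a partial extension or a dangling hyphen is rejected.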


Character Class
A character class lets you match characters from a specific set of values. For example, you can look for a range of digits with a character class, or use a caret to look for everything that isn’t in the set.
To write a character class, surround the set of characters or metacharacters with square brackets, [].

In simple terms, a character class is a set of rules for matching a single character.

Examples of Character Class:
[0-2]
This matches a range of digits. It will match 0, 1, or 2. Write the range without spaces, because a space inside the brackets is treated as part of the set.

[A-D]
This will match A, B, C, or D.

[^b-f]
Here the caret has a different meaning. It does not mean “the match must happen at the start”. Inside square brackets it means “match everything except”. This will match any single character except b, c, d, e, or f.
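
The character classes above can be checked with a few calls. This is a sketch; classMatch() is a hypothetical helper, and each pattern is anchored with ^ and $ so that it tests exactly one character.

```php
<?php
// Hypothetical helper that wraps preg_match() and returns a boolean.
function classMatch(string $pattern, string $text): bool
{
    return preg_match($pattern, $text) === 1;
}

// A range of digits: 0, 1, or 2.
var_dump(classMatch('/^[0-2]$/', '1'));    // true
var_dump(classMatch('/^[0-2]$/', '3'));    // false

// A range of uppercase letters; matching is case-sensitive by default.
var_dump(classMatch('/^[A-D]$/', 'C'));    // true
var_dump(classMatch('/^[A-D]$/', 'c'));    // false

// A negated class: anything except b, c, d, e, or f.
var_dump(classMatch('/^[^b-f]$/', 'a'));   // true
var_dump(classMatch('/^[^b-f]$/', 'c'));   // false

// Spaces inside the brackets become part of the set, so this class
// no longer describes the range 0 to 2.
var_dump(classMatch('/^[0 - 2]$/', '1'));  // false
```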

Final Example:
International (US) phone number pattern (note: a US phone number cannot start with 0 or 1, because 0 is for operator assistance and 1 is for long distance):
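
A pattern along these lines could look like the sketch below. It assumes the hyphen-separated 3-3-4 format used earlier in this post and uses the character class [2-9] to rule out a leading 0 or 1. The function name isUsPhoneNumber() is made up, and the pattern does not try to capture every real-world numbering rule.

```php
<?php
// Assumed format: NNN-NNN-NNNN, where the first digit is 2 through 9.
function isUsPhoneNumber(string $text): bool
{
    // [2-9] rejects a leading 0 or 1; \d{2} completes the first group.
    return preg_match('/^[2-9]\d{2}-\d{3}-\d{4}$/', $text) === 1;
}

var_dump(isUsPhoneNumber('555-123-4567'));  // true
var_dump(isUsPhoneNumber('055-123-4567'));  // false, starts with 0
var_dump(isUsPhoneNumber('155-123-4567'));  // false, starts with 1
var_dump(isUsPhoneNumber('5551234567'));    // false, hyphens are required
```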
