What is Regex?

Introduction

A regex, or regular expression, is a sequence of characters that defines a search pattern for a string. Imagine that, given a string, we need to write a Java program to differentiate between “A2”, “ABC”, “123”, “-123”, “-123.123”, “123.123” or “1 2 3”. Without regex, we would have to implement a tedious brute-force check that loops over the string one character at a time and compares each one against the expected character. Using regex, we can define a pattern string and check whether string.matches(pattern) is true.
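To make the contrast concrete, here is a minimal sketch that checks whether a string consists only of digits, first with a character-by-character loop and then with String.matches. The class and method names (DigitCheck, isAllDigitsLoop, isAllDigitsRegex) are made up for this illustration.

```java
class DigitCheck {
    // Brute-force version: inspect each character in turn.
    static boolean isAllDigitsLoop(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // Regex version: String.matches succeeds only if the WHOLE string fits the pattern.
    static boolean isAllDigitsRegex(String s) {
        return s.matches("[0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigitsLoop("123"));    // true
        System.out.println(isAllDigitsRegex("123"));   // true
        System.out.println(isAllDigitsRegex("1 2 3")); // false
    }
}
```

Both methods give the same answer, but the regex version states the rule in a single expression.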

Rules of Writing Regex

Common Matching Symbol

Defines matching behavior

.         Matches any character (in Java, line terminators are excluded by default)
^regex    Finds regex that must match at the beginning of line
regex$    Finds regex that must match at the end of line
[abc]     Set definition, can match a/b/c
[abc][de] Set definition, can match a/b/c followed by d/e
[^abc]    Set definition, anything except a/b/c
[a-z0-9]  Ranges, matches from a to z or 0 to 9
X|Z       Matches X or Z
XZ        X followed by Z
$         Check if line end follows
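
The symbols above can be tried out with a short sketch like the following. The helper found is a hypothetical name; it uses Matcher.find, which searches anywhere in the input, whereas String.matches requires the whole string to fit the pattern.

```java
import java.util.regex.Pattern;

class MatchingSymbols {
    // Returns true if the regex occurs anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("abc".matches("a.c"));      // true: . stands for any one character
        System.out.println(found("^ab", "abc"));       // true: "ab" is at the beginning
        System.out.println(found("^bc", "abc"));       // false: "bc" is not at the beginning
        System.out.println(found("bc$", "abc"));       // true: "bc" is at the end
        System.out.println("bd".matches("[abc][de]")); // true: a/b/c followed by d/e
        System.out.println("a".matches("[^abc]"));     // false: a is excluded from the set
        System.out.println("7".matches("[a-z0-9]"));   // true: 7 falls in the range 0 to 9
        System.out.println("Z".matches("X|Z"));        // true: either alternative is accepted
    }
}
```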

Meta Characters

Defines what the character is

\d      a digit, short for [0-9]
\D      a non-digit, short for [^0-9]
\s      a whitespace character
\S      a non-whitespace character
\w      a word character, short for [a-zA-Z0-9_]
\W      a non-word character
\S+     1 or more non-whitespace characters
\b      Matches a word boundary, where a word character is [a-zA-Z0-9_]
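
In a regex, each of these meta characters is written with a leading backslash, and in Java source that backslash must itself be escaped inside a string literal, so a digit is written as "\\d". The sketch below illustrates this; the class name and the helper found are made up for the example.

```java
import java.util.regex.Pattern;

class MetaCharacters {
    // Returns true if the regex occurs anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("7".matches("\\d"));      // true: a digit
        System.out.println("x".matches("\\D"));      // true: not a digit
        System.out.println(" ".matches("\\s"));      // true: whitespace
        System.out.println("_".matches("\\w"));      // true: underscore is a word character
        System.out.println("-".matches("\\W"));      // true: hyphen is not a word character
        System.out.println("abc".matches("\\S+"));   // true: one or more non-whitespace characters
        System.out.println(found("\\bcat\\b", "a cat sat"));   // true: "cat" stands alone as a word
        System.out.println(found("\\bcat\\b", "concatenate")); // false: "cat" is inside a longer word
    }
}
```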

Quantifier

Defines how often a character should occur

*       Occurs zero or more times
+       Occurs 1 or more times
?       Occurs either zero or 1 time
{X}     Occurs X times
{X,Y}   Occurs between X and Y times (inclusive)
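
As a rough illustration of the quantifiers above, the following sketch applies each one to runs of the letter a. The class name Quantifiers and the helper fits are hypothetical.

```java
class Quantifiers {
    // Thin wrapper so each check reads as "does this input fit this regex entirely?"
    static boolean fits(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fits("a*", ""));          // true: zero occurrences are allowed
        System.out.println(fits("a*", "aaa"));       // true
        System.out.println(fits("a+", ""));          // false: at least one is required
        System.out.println(fits("a?", "a"));         // true
        System.out.println(fits("a?", "aa"));        // false: at most one is allowed
        System.out.println(fits("a{3}", "aaa"));     // true: exactly three
        System.out.println(fits("a{2,4}", "aa"));    // true: within 2 to 4
        System.out.println(fits("a{2,4}", "aaaaa")); // false: five is too many
    }
}
```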

Grouping

Several characters can be grouped together with a common quantifier using ()
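
A small sketch of grouping follows, with made-up class and method names. Without the parentheses, a quantifier applies only to the single character before it; with them, it applies to the whole group.

```java
class Grouping {
    // (ab)+ repeats the two-character group "ab" one or more times.
    static boolean isRepeatedAb(String s) {
        return s.matches("(ab)+");
    }

    // (\.\d+)? makes the entire decimal part optional: a dot followed by digits, or nothing.
    static boolean isNumber(String s) {
        return s.matches("-?\\d+(\\.\\d+)?");
    }

    public static void main(String[] args) {
        System.out.println(isRepeatedAb("ababab")); // true
        System.out.println(isRepeatedAb("abb"));    // false
        System.out.println("abb".matches("ab+"));   // true: here + applies to b only
        System.out.println(isNumber("123"));        // true: decimal part omitted
        System.out.println(isNumber("123.123"));    // true: decimal part present
        System.out.println(isNumber("123."));       // false: a dot must be followed by digits
    }
}
```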

Example Usage

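// In Java source, a regex backslash is written as \\ inside a string literal.
String line1 = "A2";
String pattern1 = "[a-zA-Z]\\d";          // one letter followed by one digit
String line2 = "ABC";
String pattern2 = "[a-zA-Z]+";            // one or more letters
String line3 = "123";
String line4 = "-123";
String line5 = "-123.123";
String line6 = "123.123";
String pattern3 = "-?\\d+(\\.\\d+)?";     // optional minus sign, digits, optional decimal part
String line7 = "3 3";
String pattern4 = "\\d\\s\\d";            // digit, whitespace, digit
String line8 = "dav1d_cheah17@gmail.com";
String line9 = "d2vid-cheah1796@gmail.net";
// The dot before (com|net) is escaped so that it matches only a literal "."
String pattern5 = "[a-zA-Z0-9]+[\\-]?[\\_]?[a-zA-Z0-9]+@gmail\\.(com|net)";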
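
One possible way to put these patterns to work is a small classifier that tries each pattern in turn and reports which kind of string it was given. This is only a sketch: the class name, the method name and the returned labels are invented for illustration.

```java
class Classifier {
    // Tries each pattern in order and returns a label for the first one that matches.
    static String classify(String s) {
        if (s.matches("[a-zA-Z]\\d")) {
            return "letter and digit";
        }
        if (s.matches("[a-zA-Z]+")) {
            return "letters";
        }
        if (s.matches("-?\\d+(\\.\\d+)?")) {
            return "number";
        }
        if (s.matches("\\d\\s\\d")) {
            return "digit, space, digit";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        String[] inputs = {"A2", "ABC", "123", "-123", "-123.123", "123.123", "3 3", "1 2 3"};
        for (String input : inputs) {
            System.out.println(input + " -> " + classify(input));
        }
    }
}
```

Note that “1 2 3” falls through to "unknown" because none of the four patterns allows three digits separated by spaces.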

(BONUS) Finding the right Regex using Sublime or Atom

In Sublime or Atom, press Command (Mac) or Control (Windows) together with F to bring up the find panel. Select .* to enable regex mode and Aa to make the search case sensitive. Type the regular expression into the search field and the matches will be highlighted.

Conclusion

Regex is powerful: with just two lines, a pattern and a call to matches, you can achieve string matching that previously required brute-force code.

About the author

Founder of tattweicheah.com. Loves music, sport and most importantly software development.
