As the old joke goes: “I had a problem that I decided to fix by using RegEx. Now I have two problems.”
Let’s face it, using regular expressions is like a trip to the dentist. We are always glad when it’s over, and we hope to keep the frequency to once or twice a year. The reality is that as systems administrators and developers we are faced with fun challenges that may require the use of regular expressions for searching in different tools and documents.
What exactly is a regular expression? For those lucky folks who have never run into one, a regular expression (RegEx for short) is a sequence of characters that defines a search pattern. It is a profoundly powerful tool, but it also comes with some baggage because RegEx is known for its complexity. Why would we use it if it is so complex? It is widely used because it is built into many programming languages and search engines.
NOTE: RegEx is sometimes language and platform specific. Behavior may differ between Java, Ruby, PowerShell, and Perl, as well as within certain search platforms. The Linux tool grep (whose name comes from “globally search for a regular expression and print”) also behaves differently, because its default basic syntax does not support some operators or requires them to be written differently.
Testing RegEx to Make Life Easier
Luckily, in our internet connected world where we can always say “Hey, there’s an app for that!” we truly can save some time and pain. There is a great web site called RegExr that lets you test drive some regular expressions right in your browser with sample text. This has been an invaluable resource for me and for many admins and developers, so hopefully it can ease the pain a little with RegEx for you too.
Take a look at the RegExr site and try out the search criteria listed in this post. After that we can see what is next in your RegEx journey and dive into some very cool ways that you can use RegEx in your day to day tasks. This is just a little taste, so don’t worry. There are many more lessons to go through which will hopefully get your RegEx game up in a big way 🙂
Getting Started with RegEx
You already know the typical search wildcards like the asterisk (*) to mean anything, or the question mark (?), which usually means any single character. RegEx has many more selection criteria, and as such needs a few more lessons to get rolling.
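To see how a familiar wildcard differs from a regular expression, here is a minimal sketch in Python, assuming the standard fnmatch module as a stand-in for shell-style wildcards and the re module for RegEx. The sample word and patterns are only illustrative.

```python
import fnmatch
import re

# Shell-style wildcard: * on its own stands for any run of characters.
glob_hit = fnmatch.fnmatchcase("done", "d*e")

# The same text read as a regex: * now means "zero or more of the preceding d".
regex_same_text = re.fullmatch(r"d*e", "done")

# The regex equivalent of the wildcard: . (any character) repeated by *.
regex_equivalent = re.fullmatch(r"d.*e", "done")

print(glob_hit)                      # True
print(regex_same_text is not None)   # False
print(regex_equivalent is not None)  # True
```

The takeaway is that a pattern copied straight from a wildcard search will often behave differently once it is treated as a regular expression.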
Here are some basic operators that you can use:
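. means any single character (except a newline), so searching for d.g would match ‘dag’, ‘dbg’, ‘dcg’, ‘ddg’, etc.
* means zero or more of the preceding character, so d*e would match ‘e’, ‘de’ or ‘dde’. To match anything between the two letters, as in ‘done’ or ‘discoposse’, combine it with the dot: d.*e
| means OR, so left|right would find ‘left’ or ‘right’ (Note: in grep’s default mode a bare | is treated as a literal character, so use grep -E, and quote the pattern so that the shell does not treat the pipe as its own operator)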
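The three operators can be tried out in a few lines of Python. This is a small sketch using the standard re module, with made-up sample words; other languages and tools may behave a little differently.

```python
import re

# . matches exactly one character (anything except a newline).
dot_hits = [w for w in ["dag", "dog", "dg", "doug"] if re.fullmatch(r"d.g", w)]

# * repeats the element just before it zero or more times.
star_hits = [w for w in ["e", "de", "ddde", "done"] if re.fullmatch(r"d*e", w)]

# | matches the alternative on either side of it.
or_hits = re.findall(r"left|right", "turn left, then right, then left")

print(dot_hits)   # ['dag', 'dog']
print(star_hits)  # ['e', 'de', 'ddde']
print(or_hits)    # ['left', 'right', 'left']
```

re.fullmatch is used here so that the whole word has to fit the pattern, which makes it easier to see exactly what each operator allows.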
A big challenge in using RegEx is that you may want to search for a character that has a special meaning within a regular expression. For instance, what if you needed to search for an asterisk within a string? Searching for parentheses, asterisks, periods and other special characters means that you will have to escape them within your RegEx. Escape characters can be a little challenging to the eyes sometimes, but they make sense once you realize how many special characters must be escaped.
\ escapes the following character, so \* matches a literal asterisk instead of treating it as an operator
\. would find a period in the string you are searching
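As a rough illustration of escaping, the sketch below uses Python's re module with an invented sample string. re.escape is a standard library helper that adds the backslashes for you when the text you want to find is not known in advance.

```python
import re

text = "Price: 3.50 (was 4*50)"  # made-up sample text

# Unescaped, . is a wildcard, so the pattern 3.50 also matches "3x50".
loose = re.fullmatch(r"3.50", "3x50")

# Escaped, \. matches only a literal period.
strict = re.fullmatch(r"3\.50", "3x50")

# \( \) and \* match literal parentheses and a literal asterisk.
literal = re.search(r"\(was 4\*50\)", text)

# re.escape builds the escaped pattern from a plain string.
escaped = re.escape("4*50")

print(loose is not None)   # True
print(strict is not None)  # False
print(literal.group())     # (was 4*50)
print(escaped)             # 4\*50
```

The r"..." raw string prefix keeps Python from interpreting the backslashes itself before the pattern reaches the RegEx engine.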
For example, if you wanted to find
in your string, you would need to use
There are also character classes which can add to the mix:
\w means any single word character: a letter, a digit, or an underscore
\w+ means any word, which is a series of one or more word characters
\W means any single character that isn’t a word character
\d means any single digit from 0-9
\D means any single character that is not a digit
\s means any whitespace character, such as a space, tab, or newline
\S means any single character that is not whitespace
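Here is a quick sketch of the character classes in action, again assuming Python's re module and invented sample text. One Python-specific caveat: on ordinary strings in Python 3, \w and \d also match non-ASCII letters and digits by default.

```python
import re

line = "Order 66 shipped to bay_7"  # made-up sample text

words = re.findall(r"\w+", line)           # runs of word characters
single_digits = re.findall(r"\d", line)    # one digit at a time
numbers = re.findall(r"\d+", line)         # runs of digits
non_digits = re.findall(r"\D+", "a1b22")   # runs of non-digits
non_words = re.findall(r"\W+", "a, b!")    # runs of non-word characters
fields = re.split(r"\s+", "one  two\tthree")     # split on any whitespace
chunks = re.findall(r"\S+", "one  two\tthree")   # runs of non-whitespace

print(words)          # ['Order', '66', 'shipped', 'to', 'bay_7']
print(single_digits)  # ['6', '6', '7']
print(numbers)        # ['66', '7']
print(non_digits)     # ['a', 'b']
print(non_words)      # [', ', '!']
print(fields)         # ['one', 'two', 'three']
print(chunks)         # ['one', 'two', 'three']
```

Notice that each class on its own matches a single character, and it is the + that turns it into a whole word or number.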
Confusing? Yes. Powerful? Also, yes! We will do some really cool examples in our next post in the series to help illustrate real world examples of how RegEx can be used.
Image Source: http://www.quickmeme.com/meme/3rs412