Regular Expressions - Access 2 Learn

A regular expression, often called RegEx, is a sequence of characters that defines a search pattern to run against another string of data. Regular expressions are often used to perform a find and replace, or to validate input.

Different languages implement regular expressions differently. Some support the full range of special characters and commands within the pattern itself, while others expose some of the options as properties set on the regex object instead.
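As one illustration, Python's `re` module takes options such as case-insensitivity as a flag argument rather than as part of the pattern. Python is used for the sketches in this article only as an assumption, since the article itself does not tie RegEx to one language.

```python
import re

# Assumption: Python 3's re module. Options are passed as flags, not
# written into the pattern (JavaScript would write this as /ape/i).
case_insensitive = re.compile("ape", re.IGNORECASE)

print(case_insensitive.search("APE") is not None)  # True
print(re.search("ape", "APE") is not None)         # False: case matters by default
```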

While the idea has been around since the 1950s, regular expressions became more popular in the 1980s as they were built into more programming languages and applications. Today they can be found in word processors for finding substrings, in search engines, and in text and lexical analysis tools.

POSIX defines a widely used standard for these rules, and it is what we’ll look at, since it is what you are likely to find in real-world environments. There are various extended formats, but we’ll try to keep it simple.

The Basic Rules

RegEx has a few basic rules, and knowing them will let you start building your own regular expressions. Together they define what we are searching for, often called a token.

Most often the token will need to be enclosed within certain characters. Which characters depends on the language or library being used, but most commonly it will be either forward slashes (as in JavaScript’s regex literals) or a pair of double quotes.
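A small sketch of the delimiter point, assuming Python 3's `re` module: Python has no slash-delimited regex literal, so the token is simply a quoted string, and slashes written inside it are treated as ordinary characters to match.

```python
import re

# Python has no /.../ literal: the token is an ordinary quoted string,
# usually a raw string (r"...") so backslashes are passed through unchanged.
token = r"a.e"
print(re.search(token, "grape") is not None)     # True: "ape" appears inside "grape"

# Slashes inside a Python pattern are literal characters, not delimiters.
print(re.search(r"/a.e/", "grape") is not None)  # False
```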

Wildcards

The wildcard character, the period (.), matches any single character. There is no limit to which character it can represent.

a.e

This example will find any three-character match that starts with an a and ends with an e. So while ape, ate, and ale will all match, so will a8e, aue, and a$e.

While the wildcard is powerful, it can cause issues with “catching” too many possible combinations.
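Here is a minimal Python sketch of `a.e` run against a made-up list of words. It assumes Python 3's `re` module, which is not a POSIX engine, so details such as newline handling may differ from other implementations.

```python
import re

candidates = ["ape", "ate", "ale", "a8e", "aue", "a$e", "ae", "apple"]

# fullmatch requires the whole string to match the pattern, so only
# three-character strings starting with "a" and ending with "e" pass.
matches = [word for word in candidates if re.fullmatch(r"a.e", word)]
print(matches)  # ['ape', 'ate', 'ale', 'a8e', 'aue', 'a$e']

# In Python the period does not match a newline unless re.DOTALL is set.
print(re.fullmatch(r"a.e", "a\ne"))  # None
```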

Boolean Or

Since the wildcard can be a little too “wild” at times, we might want to explicitly specify which options we want to see. For example, if we only wanted to match ape, ate, or ale, we could define a regex that does just that.

The usual way is to use a Boolean OR, written as the pipe symbol (|), to specify value1 or value2. This can be extended to as many alternatives as you need.

value1|value2
ape|ate|ale
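The same made-up word list can be filtered with the alternation instead of the wildcard. This is a sketch assuming Python 3's `re` module.

```python
import re

words = ["ape", "ate", "ale", "a8e", "axe"]

# fullmatch requires the entire word to equal one of the alternatives.
allowed = [w for w in words if re.fullmatch(r"ape|ate|ale", w)]
print(allowed)  # ['ape', 'ate', 'ale']
```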

Grouping

Regular expressions can use grouping to make it easier to match a fixed string with small variations. Consider the previous example, where two characters are fixed and one changes. By combining grouping with the Boolean OR, we can write a shorter regex. Some people find this form harder to read, though that seems to vary from person to person.

a(p|t|l)e

In this example, we start by looking for an a, then for a p, t, or l. If one of those is found, we then look for an e.

A group can contain as many characters as you want, including whole words.
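A short Python sketch (assuming Python 3's `re` module) of why the parentheses matter: without them, the pipe splits the entire pattern rather than just the middle character. The sentence "I like dogs" is made-up sample text showing a group of whole words.

```python
import re

# The group confines the | to the middle character.
grouped = [w for w in ["ape", "ate", "ale", "axe"] if re.fullmatch(r"a(p|t|l)e", w)]
print(grouped)  # ['ape', 'ate', 'ale']

# Without the group, | splits the whole pattern: "ap" or "t" or "le".
print(re.fullmatch(r"ap|t|le", "ate"))  # None

# A group can hold whole words, and the matched text can be read back.
m = re.fullmatch(r"I like (cats|dogs)", "I like dogs")
print(m.group(1))  # dogs
```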

Brackets

Square brackets match any one of the characters listed inside them. So if you wanted to match a, b, or c, you would write it as below.

[abc]

In many cases this is easier than using the Boolean OR, especially when working with ranges, which can be included within brackets.
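As a quick check, assuming Python 3's `re` module, the bracket form `a[ptl]e` accepts the same words as the grouped form `a(p|t|l)e` from the previous section. Note that a bracket matches exactly one character, so a four-letter word like "aple" is rejected.

```python
import re

words = ["ape", "ate", "ale", "axe", "aple"]

# [ptl] matches exactly one character: p, t, or l.
bracketed = [w for w in words if re.fullmatch(r"a[ptl]e", w)]
grouped = [w for w in words if re.fullmatch(r"a(p|t|l)e", w)]

print(bracketed)             # ['ape', 'ate', 'ale']
print(bracketed == grouped)  # True
```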

Range

Imagine the complexity of trying to specify every number or every letter using the Boolean OR. Not only would it take a long time, it would be difficult, if not impossible, to do without errors. Even with brackets, listing every character can be a real challenge.

Therefore, inside brackets we can use a dash to include all characters between the two listed. For example, if we wanted to match any lowercase letter, we could use the following.

[a-z]

To include all uppercase letters as well, we’d modify it to be:

[a-zA-Z]
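A small Python sketch, assuming Python 3's `re` module, comparing the two ranges on a few single characters:

```python
import re

for ch in ["q", "Q", "7"]:
    lower_only = re.fullmatch(r"[a-z]", ch) is not None
    any_letter = re.fullmatch(r"[a-zA-Z]", ch) is not None
    print(ch, lower_only, any_letter)
# q True True
# Q False True
# 7 False False
```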

Quantification

Quantification specifies how many times a character is supposed to appear. Regex describes five basic formats (with a variation for a sixth) for building a quantifier that states how many times a token must appear to count as a match.

A quantifier can require as few as zero or one occurrences, or allow as many as you want. Here are the rules for quantification.

`?`	The question mark indicates zero or one occurrences of the preceding element. For example, `colou?r` matches both “color” and “colour”.
`*`	The asterisk indicates zero or more occurrences of the preceding element. For example, `ab*c` matches “ac”, “abc”, “abbc”, “abbbc”, and so on.
`+`	The plus sign indicates one or more occurrences of the preceding element. For example, `ab+c` matches “abc”, “abbc”, “abbbc”, and so on, but not “ac”.
`{n}`	The preceding item is matched exactly n times.
`{min,}`	The preceding item is matched min or more times.
`{min,max}`	The preceding item is matched at least min times, but not more than max times.
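The quantifiers above can be tried in Python, assuming Python 3's `re` module. `is_full_match` is a hypothetical helper defined here only to keep the checks short.

```python
import re

def is_full_match(pattern, text):
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(is_full_match(r"colou?r", "color"), is_full_match(r"colou?r", "colour"))  # True True
print(is_full_match(r"ab*c", "ac"), is_full_match(r"ab+c", "ac"))              # True False
print(is_full_match(r"a{3}", "aaa"), is_full_match(r"a{3}", "aaaa"))           # True False
print(is_full_match(r"a{2,}", "aaaaa"))                                        # True
print(is_full_match(r"a{2,4}", "aaaaa"))                                       # False
```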

If you want a quantifier to apply to more than one character, you will need to group those characters in parentheses.
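A minimal Python sketch of the difference, assuming Python 3's `re` module: without a group, `+` repeats only the character just before it.

```python
import re

# Without a group, + applies only to the "b".
print(re.fullmatch(r"ab+", "abbb") is not None)   # True
print(re.fullmatch(r"ab+", "abab") is not None)   # False

# With a group, + applies to the whole "ab" sequence.
print(re.fullmatch(r"(ab)+", "abab") is not None)  # True
```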

Let’s say you wanted to match a US Social Security Number written in the form 123-45-6789. You could do it with:

[0-9]{3}-[0-9]{2}-[0-9]{4}
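Here is a sketch of that pattern in Python, assuming Python 3's `re` module; the numbers are made-up sample values. It also shows why matching the whole string matters: a plain search will happily find a valid-looking number inside longer text.

```python
import re

ssn = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")

print(ssn.fullmatch("123-45-6789") is not None)   # True
print(ssn.fullmatch("123456789") is not None)     # False: dashes are required
print(ssn.search("ID 123-45-67890") is not None)  # True: found inside longer text
print(ssn.fullmatch("123-45-67890") is not None)  # False: one digit too many
```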

A phone number seems simple, until you start looking at country codes and separators that may be dashes, parentheses, dots, or nothing at all. This is how regex gets complicated: the real-world rules for an email address, a phone number, and so on are actually very complex. Even simple things are not simple. Consider a rule for defining an identifier (the name of a variable, object, or function/method) in C++. It might look something like:

[a-zA-Z]{1}[a-zA-Z_0-9]{0,31}
[a-zA-Z]{1}[a-zA-Z_0-9]*

The first pattern meets the requirement of starting with a single letter, uppercase or lowercase, followed by between 0 and 31 letters, numbers, or underscores. (This is simplified: C++ also allows an identifier to start with an underscore.) It limits names to 32 characters in total, a limit some old compilers imposed for their own internal memory handling.

The second pattern removes the 32-character limit, allowing any number of characters after the first letter, which may not be the best choice.
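A quick Python comparison of the two identifier patterns, assuming Python 3's `re` module; the names tested are made-up examples.

```python
import re

limited = re.compile(r"[a-zA-Z]{1}[a-zA-Z_0-9]{0,31}")
unlimited = re.compile(r"[a-zA-Z]{1}[a-zA-Z_0-9]*")

short_name = "total_count2"
long_name = "a" * 40  # a 40-character identifier

for name in [short_name, long_name, "2fast"]:
    print(len(name), bool(limited.fullmatch(name)), bool(unlimited.fullmatch(name)))
# 12 True True
# 40 False True
# 5 False False
```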

This is just an introduction to RegEx. As your patterns grow more complex, they can still produce simple, obvious results, but they also become harder to build and debug.

I always recommend starting simple and adding complexity one step at a time, checking as you go, so you can be sure your regular expression returns the correct results.
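One way to follow this advice is to keep a small list of strings that should and should not match, and re-check it after every change. The `check` helper below is hypothetical, a sketch assuming Python 3's `re` module, using the ape/ate/ale example from earlier.

```python
import re

def check(pattern, should_match, should_not_match):
    """Hypothetical helper: return the strings the pattern handles incorrectly."""
    failures = [s for s in should_match if not re.fullmatch(pattern, s)]
    failures += [s for s in should_not_match if re.fullmatch(pattern, s)]
    return failures

good = ["ape", "ate", "ale"]
bad = ["axe", "a8e", "apes"]

# First attempt: too loose, the wildcard lets "axe" and "a8e" through.
print(check(r"a.e", good, bad))      # ['axe', 'a8e']
# Refined attempt: only the intended words pass.
print(check(r"a[ptl]e", good, bad))  # []
```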

Regular Expressions was originally found on Access 2 Learn

Walter Wimberly

Assistant Professor

Walter Wimberly is an Assistant Professor at a regional college in Tennessee, teaching Computer Science in the Software Engineering track. He works as a student advisor, oversees curriculum changes, develops new courses, and manages the advisory panel.
Walter taught full time for about 7 years before going back into “industry” as a full-stack software developer for a dozen years. There he focused on web-based projects, coding in JavaScript/jQuery and using the Bootstrap CSS framework on the front end, and coding in PHP, ASP/ASP.Net, and SQL on the back end.

Since he loves teaching, he taught web and digital media classes as an adjunct for eight (8) years while working in industry, and has since returned to teaching full time.

He has been married for over 25 years and is the father of several special needs boys. As such, he is working on projects to help people with special needs become self-sufficient and to support their caregivers. Check out his Autism blog for more info.
