Standard regex for multiple different domains
Optimized regex for URLs with the same domain (reading the section "How does it work" in the FAQ before using this option is highly recommended)
Ignore protocols (http:// + https://) ->
Ignore trailing slashes (/) ->
Use double slash escaping (\\.) ->
Regex end matching changes
Regex start matching changes
What is URL Regexator
URL Regexator is a tool for creating regex from multiple URLs. It will help you with creating custom segments in Google Analytics and analyzing website data. Using Regexator is much faster than creating custom spreadsheets in Excel or Google Sheets with many tricky functions.
How Does It Work
Enter a dataset of URLs or domains into the first text area. These will be used to create your final regex. The optimal amount is approximately 1,000 rows. Then, based on your data, choose one of two ways to create your regex.
Regex from dataset with different domains
The Regex (different domains) button creates a basic, longer, and less efficient regular expression. On the other hand, it is much more foolproof and reliable to use. Let’s see an example with four URLs, four different subdomains, four different paths, one domain name, and 109 characters on input.
The final regex is 137 characters long. You can see that some parts are repeated over and over again, wasting a lot of space. Specifically, it’s the domain. This type of expression is best used for multiple different domains.
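To make the idea concrete, here is a minimal Python sketch of how such a "different domains" expression could be assembled: every row is escaped and joined into one alternation. The function name and the sample rows are hypothetical, and Python's re.escape does not escape slashes, so the output will look slightly different from the tool's.

```python
import re

def regex_different_domains(urls):
    """Escape every row and join the rows into one alternation: ^(a|b|...).*"""
    rows = [u.strip() for u in urls if u.strip()]
    alternatives = "|".join(re.escape(u) for u in rows)
    return "^(" + alternatives + ").*"

# Hypothetical sample rows, not the dataset from the example above.
urls = [
    "blog.link-brain.com/articles",
    "shop.link-brain.com/products",
]
pattern = regex_different_domains(urls)
print(pattern)
print(len(pattern), "characters")
```

Note how the domain appears once per row in the result, which is exactly the repetition described above.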
Regex from dataset with the same domain
The second option, triggered with the Regex (same domain) button, is useful for a dataset with URLs from one single domain. It’s more efficient and saves a lot of characters. Unfortunately, it’s also trickier, and you should be pretty damn sure you know precisely what you are doing and have superb knowledge of your data.
First of all, you need to specify your domain name in the text box below the button. This is necessary because there is no way to detect the domain with 100% certainty. There are several ways to specify the domain. We’ll get to that later. For now, let’s input the same dataset of URLs as before, type link-brain.com into the domain detection field, and check the result.
It’s only 87 characters long and looks a bit more compact. There is still some room for improvement. In the text box, you can specify more or fewer details. Type a dot before the domain and/or a trailing slash after it. This can further reduce the final length of the expression, but it will also influence the regex matching logic. Let’s check a few more quick examples.
Now type the domain with a dot at the beginning: .link-brain.com. You will save six characters, for a total regex length of 81 characters.
And now try a dot at the beginning and a slash at the end: .link-brain.com/. You will save another six characters, for a total regex length of 75 characters.
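The savings come from moving shared characters out of every alternative and into the domain part, which is written only once. The following Python sketch shows one possible way to build such a factored expression. It is an assumption for illustration only: the function name, the sample rows, and the exact output shape are hypothetical, and the tool's real output may differ.

```python
import re

def regex_same_domain(urls, domain):
    """Factor the shared domain out: ^(prefixes)domain(suffixes).*"""
    prefixes, suffixes = [], []
    for url in (u.strip() for u in urls):
        if not url or domain not in url:
            continue  # skip empty rows and rows without the domain
        before, _, after = url.partition(domain)
        if before not in prefixes:
            prefixes.append(before)
        if after not in suffixes:
            suffixes.append(after)

    def group(parts):
        return "(" + "|".join(re.escape(p) for p in parts) + ")"

    return "^" + group(prefixes) + re.escape(domain) + group(suffixes) + ".*"

# Hypothetical sample rows.
urls = ["blog.link-brain.com/articles", "shop.link-brain.com/products"]

short = regex_same_domain(urls, "link-brain.com")
shorter = regex_same_domain(urls, ".link-brain.com/")
print(len(short), short)
print(len(shorter), shorter)
```

In this sketch the factored expression also matches combinations that were never in the input, such as blog.link-brain.com/products. That is one reason why this option requires good knowledge of your data.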
Empty rows, empty groups, whitespaces
All empty rows are automatically ignored, and so are all empty groups. For example, instead of a regex with an empty group, you will get a nice and clean expression. Whitespace is also trimmed automatically.
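The row cleanup described above can be sketched in a few lines of Python. The function name and the sample input are hypothetical; this only illustrates the trimming and skipping of rows, not the tool's actual code.

```python
def clean_rows(text):
    """Trim whitespace on every row and drop rows that end up empty."""
    return [row.strip() for row in text.splitlines() if row.strip()]

# Hypothetical input with padding, a blank row, and a whitespace-only row.
raw = "  link-brain.com/page \n\n\t\nlink-brain.com/blog\n"
print(clean_rows(raw))
```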
How to use this tool depends on your data and knowledge of it. Some level of understanding of regular expressions is necessary. URL Regexator is simple automation for more advanced users. You should know at least a little bit about what you are doing and how it all works.
Basic and Advanced Customization
You can customize your final regex with two select buttons for ignoring protocols (http:// and https://) at the beginning and trailing slashes at the end of your URLs.
Generally, protocols are neither useful nor necessary for matching. With trailing slashes, it depends on the specific situation. It’s up to you.
By default, both options are inactive. If you want to use them, you must check them before clicking on any Regex creation button.
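As a rough illustration, the two options can be thought of as a normalization step applied to every row before the regex is built. The Python sketch below assumes exactly that; the function and parameter names are hypothetical.

```python
import re

def normalize(url, ignore_protocol=False, ignore_trailing_slash=False):
    """Optionally strip a leading http:// or https:// and trailing slashes."""
    url = url.strip()
    if ignore_protocol:
        url = re.sub(r"^https?://", "", url)
    if ignore_trailing_slash:
        url = url.rstrip("/")
    return url

# Both options are off by default, so the row is returned unchanged.
print(normalize("https://link-brain.com/page/"))
# With both options on, protocol and trailing slash are removed.
print(normalize("https://link-brain.com/page/", True, True))
```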
Wild card matching with end
At the moment, you can customize the end of your regex. By default, the ending characters are .*. This makes the regex open-ended, so it will match any characters that follow the pattern. Let’s check a few examples.
^link\-brain\.com\/page.* will match all of the following URLs:
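A minimal Python sketch of this open-ended behavior follows. The URLs are hypothetical and chosen for illustration; they are not the tool's own examples.

```python
import re

pattern = re.compile(r"^link\-brain\.com\/page.*")

# Hypothetical URLs that all begin with link-brain.com/page.
urls = [
    "link-brain.com/page",
    "link-brain.com/page/",
    "link-brain.com/page-2",
    "link-brain.com/pages/about",
]
matched = [u for u in urls if pattern.match(u)]
print(matched)
```

Because .* accepts anything after "page", every URL in the list is matched.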
Exact matching with end
You can change .* to $. The dollar sign is an anchor used at the end of the regex. It marks the end of the string, so only URLs that match exactly will be selected.
Let’s change the example regex to ^link\-brain\.com\/page$. From the URLs above, only one will be matched:
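The effect of the dollar anchor can be checked with a short Python sketch. The URLs are hypothetical stand-ins, not the tool's own examples.

```python
import re

pattern = re.compile(r"^link\-brain\.com\/page$")

# Hypothetical URLs; only the first is an exact match.
urls = [
    "link-brain.com/page",
    "link-brain.com/page/",
    "link-brain.com/page-2",
    "link-brain.com/pages/about",
]
matched = [u for u in urls if pattern.match(u)]
print(matched)
```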
Exact matching with start
The final character to explain is also an anchor. The caret ^ marks the beginning of the string, which is where matching must start. Strings with a different start won’t be matched. Let’s use the open-ended regex ^link\-brain\.com\/page.* again and slightly modify some of the URLs.
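Here is a small Python sketch of the caret anchor at work, using hypothetical URLs. It uses re.search on purpose: unlike re.match, re.search scans the whole string, so the caret alone is what rejects the modified URLs.

```python
import re

pattern = re.compile(r"^link\-brain\.com\/page.*")

# Hypothetical URLs: the last two have something in front of the domain.
urls = [
    "link-brain.com/page",
    "www.link-brain.com/page",
    "https://link-brain.com/page",
]
matched = [u for u in urls if pattern.search(u)]
print(matched)
```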
Wild card matching with start
You can also replace the caret with .* to perform a wildcard match. Let’s check a quick example with the following regex
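As a rough Python sketch of a wildcard start, the pattern and URLs below are hypothetical stand-ins and not the tool's own example. With .* in place of the caret, anything may come before the domain.

```python
import re

# Hypothetical pattern: .* at the start instead of the ^ anchor.
pattern = re.compile(r".*link\-brain\.com\/page.*")

urls = [
    "link-brain.com/page",
    "www.link-brain.com/page",
    "https://blog.link-brain.com/page/2",
    "link-brain.com/blog",
]
matched = [u for u in urls if pattern.match(u)]
print(matched)
```

Only the last URL is rejected, because it does not contain link-brain.com/page at all.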
What Are The Limits
Theoretically, there are no limits. But it’s a good idea to use this tool for approximately 1,000 URLs. With 10,000 URLs, it will be noticeably slower. Above these limits, a task can take 60 seconds or more.
If you need to process more data, you should try Excel, Google Sheets, or another spreadsheet application. URL Regexator is suitable for up to 1,000-10,000 URLs. Note that Google Analytics also has its own limits: 1,000 URLs in a single regex is quite a lot, and there is a good chance GA won’t handle it well.