Question 1

What is URL Regexator

Accepted Answer

URL Regexator is a tool for building a regular expression from multiple URLs. It helps you create custom segments in Google Analytics and analyze website data. Using Regexator is much faster than building custom spreadsheets in Excel or Google Sheets with many tricky functions.

Question 2

How Does It Work

Accepted Answer

Paste your dataset of URLs or domains into the first text area. These rows will be used to create your final regex. The optimal amount is approximately 1,000 rows. Then, depending on your data, choose one of the two ways to create your regex.

Regex from dataset with different domains

The Regex (different domains) button creates a basic, longer, and less efficient regular expression. On the other hand, it is much more foolproof and more reliable to use. Let’s see an example with four URLs, four different subdomains, four different paths, one domain name, and 109 characters of input.

www.link-brain.com/page
test.link-brain.com/contact
alpha.link-brain.com/resources
omega.link-brain.com/tools

The final regex will be 137 characters long. You can see that some parts are repeated over and over, wasting a lot of space. Specifically, it’s the domain. This type of expression is best suited to datasets with multiple different domains.
^www\.link\-brain\.com\/page.*|^test\.link\-brain\.com\/contact.*|^alpha\.link\-brain\.com\/resources.*|^omega\.link\-brain\.com\/tools.*
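
As a rough illustration of the idea (not the tool's actual source), the different-domains regex can be reproduced in a few lines of Python. The function names `escape_url` and `regex_different_domains` are hypothetical, and the input is the set of four URLs that the regex above corresponds to.

```python
import re


def escape_url(url: str) -> str:
    """Escape regex metacharacters; also escape '/' as the tool's output does."""
    # Assumes Python 3.7+, where re.escape escapes '.' and '-' but leaves '/' alone.
    return re.escape(url).replace("/", r"\/")


def regex_different_domains(urls, start="^", end=".*"):
    """One anchored alternative per URL, joined with '|'."""
    return "|".join(f"{start}{escape_url(u)}{end}" for u in urls)


urls = [
    "www.link-brain.com/page",
    "test.link-brain.com/contact",
    "alpha.link-brain.com/resources",
    "omega.link-brain.com/tools",
]
pattern = regex_different_domains(urls)
print(len(pattern))
print(pattern)
```

Every alternative carries the full domain, which is why this form grows quickly but stays easy to reason about: each URL is matched only by its own branch.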

Regex from dataset with the same domain

The second option, triggered by the Regex (same domain) button, is useful for a dataset with URLs from one single domain. It’s more efficient and saves a lot of characters. Unfortunately, it’s also trickier, and you should be pretty damn sure you know precisely what you are doing and have thorough knowledge of your data.

First of all, you need to specify your domain name in the text box below the button. This is necessary because there is no way to detect the domain with 100% certainty. There are several ways to specify the domain; we’ll get to that later. For now, let’s input the same dataset of URLs as before, type link-brain.com into the domain detection field, and check the result.

^(www\.|test\.|alpha\.|omega\.)link\-brain\.com(\/page|\/contact|\/resources|\/tools).*

It’s only 87 characters long and looks a bit more compact. There is still some room for improvement. In the text box, you can specify more or fewer details: type a dot before the domain and/or a trailing slash after it. This can further reduce the final length of the expression, but it will also influence the regex matching logic.
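
The same-domain grouping can be sketched in Python as follows. This is only an approximation under stated assumptions: the names `escape_url`, `group`, and `regex_same_domain` are hypothetical, and splitting each URL around the text typed into the domain field is a guess at how the tool behaves, not a description of its source.

```python
import re


def escape_url(text: str) -> str:
    # Assumes Python 3.7+; '/' is escaped by hand to mirror the tool's output.
    return re.escape(text).replace("/", r"\/")


def group(parts):
    """Build '(a|b|c)' from non-empty, de-duplicated parts; '' if nothing is left."""
    unique = list(dict.fromkeys(p for p in parts if p))
    return "(" + "|".join(escape_url(p) for p in unique) + ")" if unique else ""


def regex_same_domain(urls, domain, start="^", end=".*"):
    """Split every URL around `domain` and group what comes before and after it."""
    before, after = [], []
    for url in urls:
        head, found, tail = url.partition(domain)
        if not found:
            raise ValueError(f"{domain!r} not found in {url!r}")
        before.append(head)
        after.append(tail)
    return f"{start}{group(before)}{escape_url(domain)}{group(after)}{end}"


urls = [
    "www.link-brain.com/page",
    "test.link-brain.com/contact",
    "alpha.link-brain.com/resources",
    "omega.link-brain.com/tools",
]
pattern = regex_same_domain(urls, "link-brain.com")
shorter = regex_same_domain(urls, ".link-brain.com")  # dot typed before the domain
print(len(pattern), pattern)
print(len(shorter), shorter)
```

Note that the grouped form matches every combination of prefix and suffix, so a URL such as www.link-brain.com/tools is matched even though it was not in the input. This is one reason the same-domain option needs good knowledge of the data.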

Empty rows, empty groups, whitespaces

All empty rows are automatically ignored, and so are all empty groups. For example, instead of a regex with an empty group, you will get a nice and clean expression. Whitespace is also trimmed automatically.
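
A minimal sketch of this kind of input clean-up, assuming the input arrives as one block of text with one URL per row (the helper name `clean_rows` is hypothetical):

```python
def clean_rows(text):
    """Trim whitespace from every row and drop rows that end up empty."""
    return [row.strip() for row in text.splitlines() if row.strip()]


raw = "  link-brain.com/page  \n\n\t\nlink-brain.com/tools\n"
print(clean_rows(raw))
```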

Final tip

How to use this tool depends on your data and knowledge of it. Some level of understanding of regular expressions is necessary. URL Regexator is simple automation for more advanced users. You should know at least a little bit about what you are doing and how it all works.

Question 3

Basic and Advanced Customization

Accepted Answer

You can customize your final regex with two select buttons for ignoring protocols (http:// and https://) at the beginning and trailing slashes at the end of your URLs.

Generally, protocols are neither very useful nor necessary for matching. With trailing slashes, it depends on the specific situation. It’s up to you.

By default, both options are inactive. If you want to use them, you must check them before clicking on any Regex creation button.
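
To make the effect of the two options concrete, here is a small Python sketch. The function `normalize` and its parameter names are hypothetical, and removing every trailing slash (rather than just one) is an assumption.

```python
import re


def normalize(url, ignore_protocol=False, ignore_trailing_slash=False):
    """Apply the two optional clean-ups to one URL before the regex is built."""
    if ignore_protocol:
        # Drop a leading http:// or https://
        url = re.sub(r"^https?://", "", url)
    if ignore_trailing_slash:
        # Assumption: all trailing slashes are removed, not only the last one.
        url = url.rstrip("/")
    return url


print(normalize("https://link-brain.com/page/"))
print(normalize("https://link-brain.com/page/", ignore_protocol=True))
print(normalize("https://link-brain.com/page/", True, True))
```

With both options off, the URL is passed through unchanged, which mirrors the default described above.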

Wild card matching with end `.*`

At the moment, you can customize the end of your regex. By default, the ending characters are .*. This makes the regex open-ended: it will match any characters that follow the given URL. Let’s check a few examples.

Regex ^link\-brain\.com\/page.* will match all of the following URLs:

link-brain.com/page
link-brain.com/page/
link-brain.com/page=123
link-brain.com/pagespeed

Exact matching with end `$`

You can change .* to $. The dollar sign is an anchor used at the end of the regex. It marks the end of the string, so only URLs with an exact match will be selected.

Let’s change the example regex to ^link\-brain\.com\/page$. Of the following URLs, only the first one, link-brain.com/page, will be matched:

link-brain.com/page
link-brain.com/page/
link-brain.com/page=123
link-brain.com/pagespeed
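
The difference between the two endings can be checked with Python's `re` module, used here only as a stand-in; the regex engine behind Google Analytics may differ in details. The helper `matching` is a hypothetical name.

```python
import re

urls = [
    "link-brain.com/page",
    "link-brain.com/page/",
    "link-brain.com/page=123",
    "link-brain.com/pagespeed",
]

open_ended = r"^link\-brain\.com\/page.*"  # default ending
exact_end = r"^link\-brain\.com\/page$"    # ending changed to $


def matching(pattern, candidates):
    """Return the candidates that the pattern matches."""
    return [u for u in candidates if re.search(pattern, u)]


print(matching(open_ended, urls))
print(matching(exact_end, urls))
```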

Exact matching with start `^`

The final character to explain is also an anchor. The caret ^ marks the beginning of the string and of the match. Strings with a different start won’t be matched. Let’s use the open-ended regex ^link\-brain\.com\/page.* again and slightly modify some of the URLs. Only the first and third URLs are matched; the second and fourth start differently.

link-brain.com/page
www.link-brain.com/page/
link-brain.com/page=123
brain.com/pagespeed

Wild card matching with start `.*`

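You can also replace the caret with .* to perform a wildcard match at the start. Let’s check a quick example with the regex .*link\-brain\.com\/page.*. The first three URLs are matched; only brain.com/pagespeed is not, because it does not contain link-brain.com/page.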

link-brain.com/page
www.link-brain.com/page/
link-brain.com/page=123
brain.com/pagespeed
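
Again using Python's `re` module as a stand-in for whatever engine finally evaluates the regex, the two start variants can be compared like this (`matching` is a hypothetical helper):

```python
import re

urls = [
    "link-brain.com/page",
    "www.link-brain.com/page/",
    "link-brain.com/page=123",
    "brain.com/pagespeed",
]

anchored_start = r"^link\-brain\.com\/page.*"   # caret at the start
wildcard_start = r".*link\-brain\.com\/page.*"  # caret replaced with .*


def matching(pattern, candidates):
    """Return the candidates that the pattern matches."""
    return [u for u in candidates if re.search(pattern, u)]


print(matching(anchored_start, urls))
print(matching(wildcard_start, urls))
```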

Question 4

What Are The Limits

Accepted Answer

Theoretically, there are no limits. But it’s a good idea to use this tool for approximately 1,000 URLs. With 10,000 URLs, it will be noticeably slower. Above that, every task can take 60 seconds or more.

If you need to process more data, you should try Excel, Google Sheets, or another spreadsheet application. URL Regexator is suitable for up to 1,000 to 10,000 URLs. It’s also worth noting that Google Analytics has its own limits: 1,000 URLs in a regex is quite a lot, and there is a good chance GA won’t handle it well.

Question 5

Privacy Policy

Accepted Answer

This tool does not gather personal data, inputs, or outputs. Usage is fully anonymous, free of charge, and free of ads.

Question 6

Contact

Accepted Answer

info@entrop.ee
