Excel Formulas for SEOs: Extracting URL Strings

Whether you’re mapping keywords, auditing backlinks, analysing log files or just playing with data, an understanding of applicable Excel formulas can help you complete your work in half the time.

We’ve already published articles on how to use Vlookup and understanding Power Query and Power Pivot. This article will focus on manipulating URLs, providing ready-to-use Excel formulas with clear explanations on how these functions can work together.

Extract the domain name from URLs

How to extract the domain name from a URL including HTTP(S) and www

=LEFT(A2,FIND("/",A2,9)-1)

Starting from the LEFT side, look at a specific cell A2 and FIND the first instance of the slash (“/”) in the same cell A2 starting from the 9th character (this way, it omits the ‘http..://’ bit).

Optional: you can remove the last character from the returned value by adding -1 so the domain does not end with a slash.

How to extract the domain name from a URL not including HTTP(S)

=MID(A2,SEARCH("://",A2)+3,SEARCH("/",A2,9)-SEARCH("://",A2)-3)

Starting from the middle (MID) of cell A2, SEARCH the first occurrence of a specific character (here, “://”) in cell A2. Add +3 characters to the returned value from the search formula (this way, it omits the double trailing slash). The second step is to SEARCH for another specific character (here, “/”) in cell A2 starting from the 9th character (this way it omits the protocol bit and up to the first two slashes). Lastly, subtract () the returned value from a SEARCH for “://” in cell A2 and remove () the last three characters from the returned value.

Note: if you want your domain to finish with a trailing slash, change -3 to -2 at the end of the formula.

How to extract the domain name from a URL not including HTTP(S) and www

=MID(SUBSTITUTE(A2,"www.",""),SEARCH(":",A2)+3,SEARCH("/",SUBSTITUTE(A2,"www.",""),9)-SEARCH(":",A2)-3)

Starting from the middle (MID) of a text string, SUBSTITUTE in cell A2 the “www” subdomain with nothing (“”). SEARCH the first occurrence of “:” in cell A2. Add +3 characters to the returned value (this way it omits the double trailing slash). Do a second SEARCH, this time looking for “/” within the returned value when SUBSTITUTE-ing in cell A2 the “www” subdomain with nothing (“”) and specify the search should start from the 9th character (to again omit the protocol bit and up to the first two slashes). Lastly, subtract () the returned value from a SEARCH for “://” in cell A2 and remove () the last 3 characters from the returned value.

Note: if you want your domain to finish with a trailing slash, change -3 to -2 at the end of the formula.

How to extract the website name from a URL

=LEFT(MID(SUBSTITUTE(A2,"www.",""),SEARCH("://",A2)+3,SEARCH("/",SUBSTITUTE(A2,"www.",""),9)-SEARCH("://",A2)-3),FIND(".",MID(SUBSTITUTE(A2,"www.",""),SEARCH("://",A2)+3,SEARCH("/",SUBSTITUTE(A2,"www.",""),9)-SEARCH("://",A2)-3))-1)

As the amount of functions in the above formula is quite large, you can simplify it by dividing it into two steps.

Step 1

The first step in column B cleans up the text so only the domain without the http(s) protocol and “www” subdomain is returned. The formula for this is:

=MID(SUBSTITUTE(A2,"www.",""),SEARCH(":",A2)+3,SEARCH("/",SUBSTITUTE(A2,"www.",""),9)-SEARCH(":",A2)-3)

Note: a detailed explanation of this formula is provided in the section above.

Step 2

The second step in column C is in charge of returning the text before the dot from the value returned from the first step. The formula is:

=LEFT(B2,FIND(".",B2)-1)

Starting from the LEFT side, look at a specific cell B2 and FIND the first instance of the slash (“.”) in the same cell B2. Subtract -1 character so it omits the dot.

Merge the two formulae by changing all references to B2 in the formula in step 2 by the full formula in step 1. Use this method whenever we mention ‘merge the two formulae’ throughout the article.

Extract or remove URL strings

How to extract the protocol only from a URL

=LEFT(A2,(FIND(":",A2)-1))

Starting from the LEFT in cell A2, FIND the character “:” in cell A2 and extract everything on the left of and not including “:” since -1 is added at the end.

How to extract the domain name extension from a URL

=MID(MID(SUBSTITUTE(IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")),"www.",""),SEARCH("://",IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")))+3,SEARCH("/",SUBSTITUTE(IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")),"www.",""),9)-SEARCH("://",IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")))-3),FIND(".",MID(SUBSTITUTE(IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")),"www.",""),SEARCH("://",IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")))+3,SEARCH("/",SUBSTITUTE(IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")),"www.",""),9)-SEARCH("://",IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")))-3)),LEN(MID(SUBSTITUTE(IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")),"www.",""),SEARCH("://",IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")))+3,SEARCH("/",SUBSTITUTE(IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")),"www.",""),9)-SEARCH("://",IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")))-3)))

Again, you can break the formula down into two steps to make it easier.

Step 1

The first step ensures all URLs in your list end with a trailing slash for consistency.

=IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/"))

IF starting from the RIGHT in cell A2 the first character (1) is “/”, then return the exact same value in cell A2. Otherwise, CONCAT the value in cell A2 and “/” so the returned value finished with a slash.

Step 2

The second step is in charge of returning the text between the last dot and slash from the value in cell B2.

=MID(MID(SUBSTITUTE(B2,"www.",""),SEARCH("://",B2)+3,SEARCH("/",SUBSTITUTE(B2,"www.",""),9)-SEARCH("://",B2)-3),FIND(".",MID(SUBSTITUTE(B2,"www.",""),SEARCH("://",B2)+3,SEARCH("/",SUBSTITUTE(B2,"www.",""),9)-SEARCH("://",B2)-3)),LEN(MID(SUBSTITUTE(B2,"www.",""),SEARCH("://",B2)+3,SEARCH("/",SUBSTITUTE(B2,"www.",""),9)-SEARCH("://",B2)-3))-2)

Using a mixture of MID, SUBSTITUTE, SEARCH, FIND and LEN functions, the formula from step 2 is able to extract the extension from the domain name – no matter what the domain name was originally (http and https, www and non-www versions, .com and .co.uk, ending with a trailing slash or not, etc).

Merge the two formulae.

How to see which URLs end with a trailing slash or not

=IF(RIGHT(A2,1)="/","Ends with a trailing slash","Does not end with a trailing slash")

IF starting from the RIGHT in cell A2 the first character (1) is “/”, then return “Ends with trailing slash”), or “Does not end with trailing slash”.

How to extract the URL path

=RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1)

Starting from the RIGHT in cell A2, calculate the length (LEN) of that entire cell A2 and subtract () a specific part, which you can FIND by looking for “/” in cell A2. Also FIND “/” in cell A2 that occurs for the second time (+2) and add +1 character to the returned value so the slash at the beginning of the URL path is included.

How to extract any URL folder from a URL

=IFERROR(LEFT(TRIM(MID(A2,FIND("#",SUBSTITUTE(A2,"/","#",5))+1,255)),FIND("/",TRIM(MID(A2,FIND("#",SUBSTITUTE(A2,"/","#",5))+1,255)))-1),TRIM(MID(A2,FIND("#",SUBSTITUTE(A2,"/","#",5))+1,255)))

Let’s break the formula into three steps to make it easier.

Goal example: Let’s say you want to fetch the third folder from a URL:

Step 1

Extract anything AFTER the fifth instance of a trailing slash “/”. This is because the fifth trailing slash in the full URL precedes the name of the third folder:

=TRIM(MID(A2,FIND("#",SUBSTITUTE(A2,"/","#",5))+1,255))

Step 2

From the above extraction, extract anything BEFORE the first occurrence of a trailing slash. Add an IFERROR function to ensure URLs that do not have a fourth folder and do not end with a trailing slash are not missed out. If that’s the case, the IFERROR function should return the exact same value returned in the previous step (this is showcased on row 5 on the screenshot below).

=IFERROR(LEFT(B3,FIND("/",B3)),B3)

Step 3

Merge it all together to only have one formula:

=IFERROR(LEFT(TRIM(MID(A2,FIND("#",SUBSTITUTE(A2,"/","#",5))+1,255)),FIND("/",TRIM(MID(A2,FIND("#",SUBSTITUTE(A2,"/","#",5))+1,255)))-1),TRIM(MID(A2,FIND("#",SUBSTITUTE(A2,"/","#",5))+1,255)))

This formula is valid to return *any* URL folder. Simply adjust the number ‘5’ coloured in blue in the formula with the desired number/occurrence.

To extract the first folder, replace “5” by “3”, to extract the second folder, replace “5” by “2”, to extract the fourth folder, replace “5” by “6”, and so on.

Note: for URLs that do not have a third  folder in our example, the formula will return a blank cell (see row 4):

How to extract anything before or after a parameter in a URL

Extract text after the parameter in URL

=RIGHT(A2,LEN(A2)-FIND("?",A2))

Starting from the RIGHT in cell A2, calculate the total length (LEN) of the text in cell A2 and subtract () the text you FIND before (“?”) in cell A2 so only the text after the parameter is extracted.

Extract text before the parameter in URL

=IFERROR(LEFT(A2,FIND("?",A2)-1),"No parameter in URL")

Starting from the LEFT in cell A2, extract the text you FIND from “?” in cell A2. Remove -1 character so the parameter itself is also excluded from the extracted text.

How to extract anything from second to last folder

=SUBSTITUTE(TRIM(LEFT(RIGHT(SUBSTITUTE("/"&IF(RIGHT(A2,1)="/",LEFT(A2,LEN(A2)-1),A2),"/",REPT(" ",99)),2*99),999))," ","/")

Using a mixture of SUBSTITUTE, TRIM, LEFT, RIGHT, IF and REPT functions, the formula can be broken down into three steps this time.

Step 1

The first step in column B cleans the full URLs so none of them ends with a trailing slash using:

=IF(RIGHT(A2,1)="/",LEFT(A2,LEN(A2)-1),A2)

Step 2

The second step in column C extracts anything in column B from the second to last folder using:

=TRIM(LEFT(RIGHT(SUBSTITUTE("/"&B2,"/",REPT(" ",99)),2*99),999))

Step 3

The third and last step is to replace blanks with slashes in column D using:

=SUBSTITUTE(C2," ","/")


Hopefully now you now have a better understanding of particular Excel formulae and how combining them can achieve SEO tasks much faster.

And if you use any formulas that are more advanced or better composed, please let me know in the comments. I’d love to hear them!

Comments are closed.


Join the Inner Circle

Industry leading insights direct to your inbox every month.