SEO

Excel Formulas for SEOs: Extracting URL Strings

by on 27th August 2019

Whether you’re mapping keywords, auditing backlinks, analysing log files or just playing with data, an understanding of applicable Excel formulas can help you complete your work in half the time.

We’ve already published articles on how to use Vlookup and understanding Power Query and Power Pivot. This article will focus on manipulating URLs, providing ready-to-use Excel formulas with clear explanations on how these functions can work together.

Extract the domain name from URLs

How to extract the domain name from a URL including HTTP(S) and www

=LEFT(A2,FIND("/",A2,9)-1)

Starting from the LEFT side, look at a specific cell A2 and FIND the first instance of the slash (“/”) in the same cell A2 starting from the 9th character (this way, it omits the ‘http..://’ bit).

Optional: you can remove the last character from the returned value by adding -1 so the domain does not end with a slash.

How to extract the domain name from a URL not including HTTP(S)

=MID(A2,SEARCH("://",A2)+3,SEARCH("/",A2,9)-SEARCH("://",A2)-3)

Starting from the middle (MID) of cell A2, SEARCH the first occurrence of a specific character (here, “://”) in cell A2. Add +3 characters to the returned value from the search formula (this way, it omits the double trailing slash). The second step is to SEARCH for another specific character (here, “/”) in cell A2 starting from the 9th character (this way it omits the protocol bit and up to the first two slashes). Lastly, subtract () the returned value from a SEARCH for “://” in cell A2 and remove () the last three characters from the returned value.

Note: if you want your domain to finish with a trailing slash, change -3 to -2 at the end of the formula.

How to extract the domain name from a URL not including HTTP(S) and www

=MID(SUBSTITUTE(A2,"www.",""),SEARCH(":",A2)+3,SEARCH("/",SUBSTITUTE(A2,"www.",""),9)-SEARCH(":",A2)-3)

Starting from the middle (MID) of a text string, SUBSTITUTE in cell A2 the “www” subdomain with nothing (“”). SEARCH the first occurrence of “:” in cell A2. Add +3 characters to the returned value (this way it omits the double trailing slash). Do a second SEARCH, this time looking for “/” within the returned value when SUBSTITUTE-ing in cell A2 the “www” subdomain with nothing (“”) and specify the search should start from the 9th character (to again omit the protocol bit and up to the first two slashes). Lastly, subtract () the returned value from a SEARCH for “://” in cell A2 and remove () the last 3 characters from the returned value.

Note: if you want your domain to finish with a trailing slash, change -3 to -2 at the end of the formula.

How to extract the website name from a URL

=LEFT(MID(SUBSTITUTE(A2,"www.",""),SEARCH("://",A2)+3,SEARCH("/",SUBSTITUTE(A2,"www.",""),9)-SEARCH("://",A2)-3),FIND(".",MID(SUBSTITUTE(A2,"www.",""),SEARCH("://",A2)+3,SEARCH("/",SUBSTITUTE(A2,"www.",""),9)-SEARCH("://",A2)-3))-1)

As the amount of functions in the above formula is quite large, you can simplify it by dividing it into two steps.

Step 1

The first step in column B cleans up the text so only the domain without the http(s) protocol and “www” subdomain is returned. The formula for this is:

=MID(SUBSTITUTE(A2,"www.",""),SEARCH(":",A2)+3,SEARCH("/",SUBSTITUTE(A2,"www.",""),9)-SEARCH(":",A2)-3)

Note: a detailed explanation of this formula is provided in the section above.

Step 2

The second step in column C is in charge of returning the text before the dot from the value returned from the first step. The formula is:

=LEFT(B2,FIND(".",B2)-1)

Starting from the LEFT side, look at a specific cell B2 and FIND the first instance of the slash (“.”) in the same cell B2. Subtract -1 character so it omits the dot.

Merge the two formulae by changing all references to B2 in the formula in step 2 by the full formula in step 1. Use this method whenever we mention ‘merge the two formulae’ throughout the article.

Extract or remove URL strings

How to extract the protocol only from a URL

=LEFT(A2,(FIND(":",A2)-1))

Starting from the LEFT in cell A2, FIND the character “:” in cell A2 and extract everything on the left of and not including “:” since -1 is added at the end.

How to extract the domain name extension from a URL

=MID(MID(SUBSTITUTE(IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")),"www.",""),SEARCH("://",IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")))+3,SEARCH("/",SUBSTITUTE(IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")),"www.",""),9)-SEARCH("://",IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")))-3),FIND(".",MID(SUBSTITUTE(IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")),"www.",""),SEARCH("://",IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")))+3,SEARCH("/",SUBSTITUTE(IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")),"www.",""),9)-SEARCH("://",IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")))-3)),LEN(MID(SUBSTITUTE(IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")),"www.",""),SEARCH("://",IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")))+3,SEARCH("/",SUBSTITUTE(IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")),"www.",""),9)-SEARCH("://",IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/")))-3)))

Again, you can break the formula down into two steps to make it easier.

Step 1

The first step ensures all URLs in your list end with a trailing slash for consistency.

=IF(RIGHT(A2,1)="/",A2,CONCAT(A2,"/"))

IF starting from the RIGHT in cell A2 the first character (1) is “/”, then return the exact same value in cell A2. Otherwise, CONCAT the value in cell A2 and “/” so the returned value finished with a slash.

Step 2

The second step is in charge of returning the text between the last dot and slash from the value in cell B2.

=MID(MID(SUBSTITUTE(B2,"www.",""),SEARCH("://",B2)+3,SEARCH("/",SUBSTITUTE(B2,"www.",""),9)-SEARCH("://",B2)-3),FIND(".",MID(SUBSTITUTE(B2,"www.",""),SEARCH("://",B2)+3,SEARCH("/",SUBSTITUTE(B2,"www.",""),9)-SEARCH("://",B2)-3)),LEN(MID(SUBSTITUTE(B2,"www.",""),SEARCH("://",B2)+3,SEARCH("/",SUBSTITUTE(B2,"www.",""),9)-SEARCH("://",B2)-3))-2)

Using a mixture of MID, SUBSTITUTE, SEARCH, FIND and LEN functions, the formula from step 2 is able to extract the extension from the domain name – no matter what the domain name was originally (http and https, www and non-www versions, .com and .co.uk, ending with a trailing slash or not, etc).

Merge the two formulae.

How to see which URLs end with a trailing slash or not

=IF(RIGHT(A2,1)="/","Ends with a trailing slash","Does not end with a trailing slash")

IF starting from the RIGHT in cell A2 the first character (1) is “/”, then return “Ends with trailing slash”), or “Does not end with trailing slash”.

How to extract the URL path

=RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1)

Starting from the RIGHT in cell A2, calculate the length (LEN) of that entire cell A2 and subtract () a specific part, which you can FIND by looking for “/” in cell A2. Also FIND “/” in cell A2 that occurs for the second time (+2) and add +1 character to the returned value so the slash at the beginning of the URL path is included.

How to extract the first URL folder from a URL

=IFERROR(IF(RIGHT(IF(ISERROR(FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),1)+1)),RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),LEFT(RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)- FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),1)+1))),1)="/",LEFT(IF(ISERROR(FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),1)+1)),RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),LEFT(RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),1)+1))),LEN(IF(ISERROR(FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),1)+1)),RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),LEFT(RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),1)+1))))-1),IF(ISERROR(FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),1)+1)),RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),LEFT(RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),1)+1)))),"-")

You can split this formula into two steps:

Step 1

The first step cleans up the text so only the URL folder is returned.

=IF(ISERROR(FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),1)+1)),RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),LEFT(RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)- FIND("/",A2,FIND("/",A2)+2)+1),FIND("/",RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),1)+1)))

Step 2

The second step ensures all values returned are consistent (here, not ending with a trailing slash). It also uses the IFERROR function to ensure that for all returned values resulting in an error (whatever the reason is), it returns a hyphen instead of #VALUE! or #N/A.

=IFERROR(IF(RIGHT(B2,1)="/",LEFT(B2,LEN(B2)-1),B2),"-")

Merge the two formulae.

How to extract the second URL folder from a URL

=IFERROR(IF(TRIM(MID(SUBSTITUTE(RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),"/",REPT(" ",100)),200,100))="","-",TRIM(MID(SUBSTITUTE(RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1),"/",REPT(" ",100)),200,100))),"-")

Here’s how to split the formula:

Step 1

The first step fetches the entire URL path:

=RIGHT(A2,LEN(A2)-FIND("/",A2,FIND("/",A2)+2)+1

Note: a detailed explanation of this formula is provided in the “How to extract the URL path” section.

Step 2

The second step fetches only the second folder from the full URL path.

=IFERROR(IF(TRIM(MID(SUBSTITUTE(B2,"/",REPT(" ",100)),200,100))="","-",TRIM(MID(SUBSTITUTE(B2,"/",REPT(" ",100)),200,100))),"-")

The core formula is = TRIM(MID(SUBSTITUTE(B2,"/",REPT(" ",100)),200,100)). We added the IFERROR and IF functions in order to return a cleaner value:

  • IF makes sure the original URL is only a domain name with no path. The returned value is a hyphen instead of a blank cell.
  • IFERROR ensures that for all returned values resulting in an error (whatever the reason is), it returns a hyphen instead of #VALUE! or #N/A.

Merge the two formulae.

How to extract anything before or after a parameter in a URL

Extract text after the parameter in URL

=RIGHT(A2,LEN(A2)-FIND("?",A2))

Starting from the RIGHT in cell A2, calculate the total length (LEN) of the text in cell A2 and subtract () the text you FIND before (“?”) in cell A2 so only the text after the parameter is extracted.

Extract text before the parameter in URL

=IFERROR(RIGHT(A2,LEN(A2)-FIND("?",A2)),"No parameter in URL")

Starting from the LEFT in cell A2, extract the text you FIND from “?” in cell A2. Remove -1 character so the parameter itself is also excluded from the extracted text.

How to extract anything from second to last folder

=SUBSTITUTE(TRIM(LEFT(RIGHT(SUBSTITUTE("/"&IF(RIGHT(A2,1)="/",LEFT(A2,LEN(A2)-1),A2),"/",REPT(" ",99)),2*99),999))," ","/")

Using a mixture of SUBSTITUTE, TRIM, LEFT, RIGHT, IF and REPT functions, the formula can be broken down into three steps this time.

Step 1

The first step in column B cleans the full URLs so none of them ends with a trailing slash using:

=IF(RIGHT(A2,1)="/",LEFT(A2,LEN(A2)-1),A2)

Step 2

The second step in column C extracts anything in column B from the second to last folder using:

=TRIM(LEFT(RIGHT(SUBSTITUTE("/"&B2,"/",REPT(" ",99)),2*99),999))

Step 3

The third and last step is to replace blanks with slashes in column D using:

=SUBSTITUTE(C2," ","/")


Hopefully now you now have a better understanding of particular Excel formulae and how combining them can achieve SEO tasks much faster.

And if you use any formulas that are more advanced or better composed, please let me know in the comments. I’d love to hear them!

Responses

  1. Formulas like these make me wish MS excel would support regex

  2. These are great.

    I’ve bookmarked this page so I can come keep coming back to this – as a quick cheat sheet.

  3. Hi Fiona,
    This is a great and helpfull post. Especially for Tech SEO’s.

    Isn’t it possible to publish the whole Excel also with all the formula’s in it?

  4. Just dropping by to say that this is awesome! The subdomain extraction formula is beautiful and elegant – the lads at stackexchange are having some trouble with this stuff, might want to spread this post around different networks. Best one on the net.

Your comment

We're hiring – check out our careers page Careers

Get insights straight to your inbox

Stay one step ahead of the competition with our monthly Inner Circle email full of resources, industry developments and opinions from around the web.