## Extract or remove text or number from a string

### Extract anything before or after the 1^{st} occurrence of a character

To extract anything ** before** the 1

^{st}occurrence of a character, input this formula:

```
LEFT(A2,FIND("[specific-character]",A2)-1)
```

For instance, use the following formula to extract anything ** before** the first space:

`LEFT(A2,FIND(" ",A2)-1)`

To extract anything

**the 1st occurrence of a character, input this formula:**

__after__```
RIGHT(A2,LEN(A2)-FIND("[specific-character]",A2))
```

In human language, the above formula says starting from the right in cell A2, calculate the full length of cell A2 and subtract (or remove) whatever you find before a specific character, including that character, to the whole content of A2.

For instance, the following formula can be used to extract anything ** after** the first space:

`RIGHT(A2,LEN(A2)-FIND(" ",A2))`

### Extract anything before or after the 2^{nd} occurrence of a character

To extract anything ** before** the 2

^{nd}occurrence of a character, input this formula:

```
LEFT(A2,FIND("#",SUBSTITUTE(A2,"[specific-character]","#",2))-1)
```

For instance, use the following formula to extract anything ** before** the second space:

`LEFT(A2,FIND("#",SUBSTITUTE(A2," ","#",2))-1)`

One thing that is important to mention here is that the number 2 represents the __second __** occurrence** of the specific character. It can be replaced with the desired occurrence number. For instance, to extract before the 3rd occurrence of the character, replace 2 by 3.

To extract anything ** after** the 2nd occurrence of a character, input this formula:

```
TRIM(MID(A2, FIND("[specific-character]",A2,FIND("[specific-character]", A2)+1)+1,256))
```

For instance, use the following formula to extract anything ** after** the second space:

`TRIM(MID(A2, FIND(" ",A2,FIND(" ", A2)+1)+1,256))`

### Extract anything after the second to last occurrence of a character

You may want to fetch a folder path in a URL that is positioned ** after** the

**of a slash. In this case, input this formula:**

__second to last occurrence__```
TRIM(LEFT(RIGHT(SUBSTITUTE("[specific-character]"&A2,"[specific-character]",REPT(" ",99)),2*99),999))
```

For instance, use the following formula to extract anything ** after** the second to last space:

`TRIM(LEFT(RIGHT(SUBSTITUTE(" "&A2," ",REPT(" ",99)),2*99),999))`

### Extract anything before or after the 3rd occurrence of a character

To remove anything ** before** the 3

^{rd}occurrence of a character, input this formula:

```
LEFT(A2,FIND("#",SUBSTITUTE(A2,"[specific-character]","#",3))-1)
```

Note: the number 3 represents the third occurrence of the specific character. It can be replaced with the desired occurrence number. For instance, to extract before the 4th occurrence of the character, replace 3 by 4.

For instance, use the following formula to extract anything ** before** the third space:

`LEFT(A2,FIND("#",SUBSTITUTE(A2," ","#",3))-1)`

To remove anything after the 3rd occurrence of a character, input this formula:

```
TRIM(RIGHT(SUBSTITUTE(A2,"[specific-character]",CHAR(9),3),LEN(A2)-FIND(CHAR(9),SUBSTITUTE(A2,"[specific-character]",CHAR(9),3),1)+1))
```

For instance, use the following formula to extract anything ** after** the third space:

`TRIM(RIGHT(SUBSTITUTE(A2," ",CHAR(9),3),LEN(A2)-FIND(CHAR(9),SUBSTITUTE(A2," ",CHAR(9),3),1)+1))`

### Extract a piece of text located in the middle of a string

```
TRIM(MID(A2,FIND("[piece-of-text-located-right-before-the-wanted-bit]",A2)+LEN("[piece-of-text-located-right-before-the-wanted-bit]"),FIND("[piece-of-text-located-right-before-the-wanted-bit]",A2)-FIND("[piece-of-text-located-right-before-the-wanted-bit]",A2)-LEN("[piece-of-text-located-right-before-the-wanted-bit]")))
```

The formula works with text or URLs (see screenshot) but the data has to be consistent (ie. the piece ** before and after** the wanted bit must be the same for all instances).

For instance, use the following formula to extract the city within the H1 of a page:`TRIM(MID(A2,FIND("jobs in",A2)+LEN("jobs in"),FIND("available on",A2)-FIND("jobs in",A2)-LEN("jobs in")))`

Note the TRIM function was added at the beginning of the formula to ensure no additional space is returned.

### Extract text between the 2^{nd} and 3^{rd} occurrence of a character

```
TRIM(MID(SUBSTITUTE(A2,"[specific-character]",REPT(" ",100)),200,100))
```

For instance, use the following formula to extract anything between the 2nd and the 3rd space:`TRIM(MID(SUBSTITUTE(A2," ",REPT(" ",100)),200,100))`

### Extract anything after the last instance of a character

```
MID(A2,FIND("=",SUBSTITUTE(A2,"[specific-character]","=",LEN(A2)-LEN(SUBSTITUTE(A2,"[specific-character]",""))))+1,256)
```

For instance, use the following formula to extract anything after the last space:`MID(A2,FIND("=",SUBSTITUTE(A2," ","=",LEN(A2)-LEN(SUBSTITUTE(A2," ",""))))+1,256)`

### Remove duplicates from one or multiple columns and return ONLY a unique list of values in a specific column

Removing duplicates is something we do on a daily basis. Commonly, we use the ‘Remove duplicates’ feature available from the ‘Data tools’ option in the ribbon. However, that can be time consuming, especially when dealing with multiple columns. Thankfully, there’s a =UNIQUE function that does the job brilliantly on Google Sheets and the beta version of Excel: just insert **=UNIQUE(targeted_range)** and Voila! But, what about a formula that does the same task on the basic version of Excel?

Starting with the one column version (column A used as default, with data from row 2 to row 8), the ** array** formula to input is:

```
INDIRECT(TEXT(MIN(IF(($A$2:$A$8<>"")*(COUNTIF($B$1:B1,$A$2:$A$8)=0),ROW($2:$8)*100+COLUMN($A:$A),7^8)),"R0C00"),)&""
```

Breakdown of the formula:

- $A$2:$A$8: selection of the whole column where the values you want to clean are
- $B$1:B1: header (or top cell) of the column where you want your unique value list to be returned
- $2:$8: first and last row number where all the values you want to clean are
- $A:$C: first and last column letter where all the values you want to clean are
- As this is an array formula, press CTRL+SHIFT+ENTER instead of ENTER to insert the curly braces

If you’d like a unique list of all values containing within a range of cells (across multiple columns and rows), then input this ** array formula**:

```
INDIRECT(TEXT(MIN(IF(($A$2:$C$8<>"")*(COUNTIF($D$1:D1,$A$2:$C$8)=0),ROW($2:$8)*100+COLUMN($A:$C),7^8)),"R0C00"),)&""
```

Here is a screenshot representing how the two formulas can be used with data only in column A on the left and data spread across column A to C on the right, including a mixture of blank cells and non-blank cells.

And again, remember to press CTRL+SHIFT+ENTER instead of ENTER to insert the curly braces, as in any array formulas.

## Calculating the number of characters and words in a cell

### Count the number of characters (not words!) in a cell

```
LEN(cell)
```

For instance, use this formula to calculate the length of a page title (or meta description):`=LEN(A2)`

### Count the number of words (not characters!) in a cell

```
IF(LEN(TRIM(cell))=0,0,LEN(TRIM(cell))-LEN(SUBSTITUTE(cell," ",""))+1)
```

For instance, use this formula to calculate the number of words in a keyword:`=IF(LEN(TRIM(A2))=0,0,LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1)`

## Categorising cell content

### See if a cell contains *any* question words

```
SUMPRODUCT(COUNTIFS(A2,"*"&
```**[name you defined]**&"*"))>0

[name you defined] is a name created via the “Defined Names” feature that refers to a list of question words from another tab (ie. what, how to, etc).

Step 1: Create another tab on your Excel file

Step 2: Add all the question words you can think of in column A.

Step 3: Select (or highlight) all your question words

Step 4: Name the range as you want (in the example, we used q.words)

Step 5: Go back to the tab where the data you want to categorise is

Step 6: Copy/paste the formula given above and make sure you change the [name you defined] bit

Step 7: Press Enter

Troubleshooting Defined Names on Excel:

Is the formula returning an error or an incorrect value? The first thing to check is the Name Manager from the Defined Names section (assuming you already made sure the cell and the Defined name you are referring to in your formula are correct).

Go to the ‘Formulas’ tab, ‘Defined Names’ section and click on ‘Name Manager’.

Common mistakes to watch out for:

- Typos: check in the Name column to spot any typos in the Defined Names
- Duplication: check in the Name column to spot multiple instances of the same Defined Names referring to different ranges
- Blank cells: ensure the range selection does not include any blank cells by clicking on ‘Edit’. A new Edit Name window will pop-up with the range reference in the cell next to ‘Refers to:’ – ensure this is correct and does not include blank cells
- Incorrect reference: check in the Value column to quickly spot if the range is referring to the correct values
- Incorrect scope: check in the Scope column to see if the scope is correctly set on Workbook instead of a single sheet (which will not allow cross-tab references)

### Classify your values (e.g. “Low”, “Medium” and “High”)

```
IFS(cell>=[low-number-threshold],"Low",AND(cell<=MIN([low-number-threshold]),cell >MAX([high-number-threshold])),"Medium",cell<=[high-number-threshold],"High")
```

For instance, use the following formula to assess your ranking data:

`IFS(B2>=40,"Low",AND(B2<=MIN(40),B2>MAX(10)),"Medium",B2<=10,"High")`

## Working with blank cells in your Excel table

### Count the number of keywords displayed in a row which are mapped to a page

If your cells are adjacent, input:

```
COUNTA(first_cell:last_cell)
```

For instance, use the following formula to count the number of keywords mapped to a single URL:`COUNTA(A2:C2)`

If your cells are non-adjacent cells, select each targeted cell separated by a comma:

```
COUNTA(first_cell,second_cell,third_cell,etc)
```

For instance, use the following formula to count the number of keywords mapped to a single URL, disregarding the ranking position columns in between:`COUNTA(A2,C2,E2)`

### Return the first non-blank cell in a row

```
INDEX(first_cell:last_cell,MATCH(TRUE,INDEX((first_cell:last_cell<>0),0),0))
```

For instance, use the following formula to return the first city that appears in each row:`INDEX(A2:E2,MATCH(TRUE,INDEX((A2:E2<>0),0),0))`

### Fix non-blank cells that appear as blank cells

Sometimes, and especially when exporting data from a third-party tool, some cells will visually look blank but contain hidden characters. In order to make these hidden characters appear, simply follow the process below:

- On your Excel file, click the ‘File’ tab (Excel ribbon)
- Click on ‘Option’
- Click on ‘Advanced’ in the category list
- In the ‘Lotus Compatibility’ section
- Check ‘Transition Navigation Keys’
- Click ‘Ok'”

Hopefully these formulas are useful to you and help you process your data in a much faster, clearer and easier way. You can also find all formulas from this post and the previous one on a Googlesheet available here, something that was requested from multiple readers.

And of course, if you use any formulas that are more advanced or better composed, please let me know in the comments. I’d love to hear them!

## Emirhan Yasdiman

I love the simplicity in REPT(” “,99)

Brilliant!

## Marcin

Amazing job! Thank you Fiona :)

## Michael

Great article, thanks for sharing with us.

## Alex Buraks

Nice list. I would like to add my favourite formula – URL to Domain:

=IF(ISNUMBER(FIND(“www.”;A2));MID(A2;FIND(“www.”;A2)+4;FIND(“/”;A2;9)-FIND(“www.”;A2)-4);MID(A2;FIND(“//”;A2)+2;FIND(“/”;A2;9)-FIND(“//”;A2)-2))

## natan

I have URLs like below:

https://www.example.com/xx/yy/bat/text-text-text-10550/

and i wish to exclude only the ID in the end without – and trailing slash i.e. 10550. didn’t come across any specific formula? Any suggestions or workarounds?