What do you do when you have two or more data sets that need merging, but don’t have a single set of data column values that match?
VLOOKUP becomes a bit useless, or at least you need to execute some wild Excel-fu to be able to consolidate your data.
I faced this problem when dealing with two large sets of product data, one from a manufacturer and one scraped from the web. The objective with the work was to consolidate the two sets of data ready to rewrite the product descriptions, perform a basic price-competitor analysis and upload the finalised dataset as individual product items to a Woocommerce retail site.
When VLOOKUP won’t work, some Excel users might resort to manually matching the data. Laborious, time-consuming and costly, as usually you’ll find yourself outsourcing with varied results.
Enter the Fuzzy Lookup Add-In for Excel
The Fuzzy Lookup add-in for Excel performs fuzzy matching of textual data in Excel.
You can use this add-in to cleanup difficult problems like weeding out (“fuzzy match”) duplicate rows within a single table where the duplicates *are* duplicates but don’t match exactly or to “fuzzy join” similar rows between two different tables.
The add-in is remarkably powerful, especially to a user who’s only used to matching functions like VLOOKUP.
Let’s take a look at my basic example to get you started. Really, this outcome could be achived by some simple find/replace editing but hopefully it’s enough for you to see the potential power in this process. Hopefully you’ll have a go yourself!
Firstly, install the add-in by following these instructions. When you’re done, head back to this post!
To prepare, you’ll need your two (or three, or four!) separate data sets in separate tabs. I make each one a table, by selecting the sheet and pressing CTRL-L on the data.
You’ll note a Fuzzy Lookup tab has appeared in the toolbar across the top of your Excel application. Clicking the tab opens this Fuzzy Lookup task pane:
The process to set up a match requires you to select one or more data points from each table to create a “fuzzy data binding”. In short, match rows by identifying similar matches between these columns.
Can you see why that’s so powerful now? None of the data items need to be exactly the same, just similar enough, or more similar than any other possible matches in the data set.
So, in my case something like this should be enough:
While some of the column headers have the same name, the actual data is not exactly the same!
When you’re happy with your selection, click the little button between the selected data points. The match is then added here:
In the Output Columns section, check the box next to each field you want included in your results. Open a new tab (or fuzzy match will dutifully scribble all over your data!)
Click Go when you’re ready.
In my simple example I’ve matched a list of similar products that all have a “unique SKU” because of the size variable added to the product data.
The results can be a bit fiddly to get right at first, and you should experiment with the match table, number of matches settings and similarity score to get to a point where you’re comfortable.
Fortunately, there’s an undo button in the tool, so you can rinse and repeat until you’re happy.