Ten Ways to Clean Data in Excel
This 10 Ways to Clean Data in Excel tutorial is suitable for users of Excel 2013/2016/2019 and Microsoft 365.
Objective
Use Excel’s functions and utilities to clean a raw dataset, removing errors and making it consistent ready for analysis.
Cleaning Data Explained
Data is the foundation of any analysis you do in Excel. If you receive a dataset from someone or maybe you download it from another system and import it in Excel, there are tons of things that can go wrong. Blank rows, inconsistent case, spelling errors, duplicates – all these need to be fixed before performing an analysis. If you do not take the time to clean data in Excel, you will achieve inconsistent and inaccurate results when you attempt to analyze it.
Excel has many tools, functions, and utilities that can help you clean your data efficiently. In this post, we are going to explore ten of them.
Video Tutorial
Removing Blank Rows and Populating Blank Cells
Removing Blank Rows
Blank cells and blank rows can cause havoc if left, particularly if you intend to analyze your dataset with a Pivot Table. If you have a large amount of data with blank rows and cells throughout, removing them can be tedious and time-consuming. Fortunately, Excel has a neat way of highlighting all blanks for you to remove them in one go.
- Select the data
- From the Home ribbon, in the Editing group, click Find & Select
- Select Go To Special from the menu
- Select Blanks
The blanks rows in the range selected will be highlighted. You can now remove them all.
- From the Home ribbon, in the Cells group, click the lower half of the Delete button
- Click Delete Sheet Rows
Populating Blank Cells
If your worksheet contains blank cells, you might want to populate them all at once with a value, as opposed to removing them. When you remove blank cells, Excel will shift the values to fill the space, which can throw off your dataset—populating the blanks with a value such as a ‘0’ can prove to be more beneficial.
- Select the data
- From the Home ribbon, in the Editing group, click Find & Select
- Select Go To Special from the menu
- Select Blanks
- Type ‘0’ and press CTRL+Enter
Removing Duplicate Entries
Duplicate entries can occur in large datasets. You will want to remove these before analysis.
- Select the data
- From the Data ribbon, in the Data Tools group, click Remove Duplicates
The checkboxes allow you to specify which fields you are searching for duplicates. If you want to remove complete duplicates, ensure all the checkboxes are selected.
Get Rid of Extra Spaces
Blank spaces in front of text or values in cells can cause issues when analyzing data. Sometimes, these erroneous spaces are hard to see, mainly if they are at the end of a text string or value. It is always worth checking for and removing extra spaces before analyzing. We can do this using a text function in Excel.
In this example, there are extra spaces in column B that we need to remove. To do this, I am going to insert a ‘Helper’ column to the right of column B.
- Select column C
- Right-click and select Insert
- Use the TRIM function to remove the extra spaces
- Use the Auto-fill handle to copy the formula down
We now need to remove column B. Deleting column B will produce an error because the TRIM formula in column C refers to column B. To get around this, we need to copy and paste the values only to ‘throw-away’ the underlying formulas.
- Select column C
- Press CTRL+C to Copy
- From the Home ribbon, in the Clipboard group, click the lower half of the Paste button
- Select Paste Values
You can now go ahead and safely delete column B.
Change the Case of Text
It is good to ensure that your case is consistent throughout your dataset. There are three functions in Excel, UPPER, LOWER, and PROPER that can help you switch the case to upper case, lower case, or proper case where the first letter of each word is capitalized.
In the example below we have inconsistent case in column B. To fix this, I am going to insert a ‘Helper’ column to the right of column B.
- Select column C
- Right-click and select Insert
- Use the PROPER, UPPER or LOWER function to change the case of the text.
- Use the Auto-fill handle to copy the formula down
Once again, you will need to copy and paste the values before deleting column B.
Find and Replace
You can utilize the Find and Replace feature in Excel to find specific values or text strings and replace them with something else.
In this example, I want to replace the word ‘United Kingdom’ in column A with the shortened form of ‘UK’.
- Select column A
- Press CTRL+F to show the Find and Replace dialog box
- On the Replace tab, type the word or value to replace in the ‘Find What’ field and the word of value to replace it within the ‘Replace With’ field.
- Click Replace All
Splitting Text
When you import data from a database or another file, it may be that the text is all in one cell. There are numerous functions and utilities available in Excel that can help you split your data into separate columns.
Text to Columns
The Text to Columns utility in Excel splits up text into columns using a delimiter. The delimiter is a special character such as space, tab, comma, semi-colon, etc. that denotes where you want the split to occur.
- Select the column you want to split
- From the Data tab, in the Data Tools group, select Text to Columns
- Step through the wizard, ensuring you select the correct delimiter for your data
Flash Fill
Flash Fill is another utility available in Excel 2013 and above that makes splitting data quick and easy. I am going to insert a ‘Helper’ column to the right of column B.
- Select column C
- Right-click and select Insert
- Type the text you want to separate out into the first cell
- From the Data tab, in the Data Tools group, click Flash Fill
- Repeat this process to separate the remaining text.
Joining Text Together
If you import data, you may find that text is in different columns that you’d like in one column.
In this example, I want to see the text ‘Kensington – Luxury’ in one cell. I can join the text in columns B and C by using the CONCAT function. This function takes the text in cell B2, the delimiter ( specified in quotations), and then the text in cell C2 and joins them together.
I could also use the Flash Fill method outlined previously to join values and text strings together.
Applying Number Formatting
It is essential to ensure that you have the correct formatting applied to each column in your dataset. For example, if you have a column that contains dates, and you do not have the column formatted as a short date or long date, the field will not display when adding a timeline slicer. That is just one example of how incorrect number formatting can cause issues at a future point in time.
- Select a column
- From the Home tab, in the Number group, click the Number Format drop-down menu.
- Select the correct format for the type of data in the column
Spell Check
Nothing lowers the credibility of your work more than spelling errors. Spelling errors can also cause problems when trying to analyze data. For example, if you have the word, ‘Kensington’ spelled correctly in one cell and incorrectly in another, Excel will see that as two separate items. This spelling issue is problematic when it comes to things like filtering data or analyzing with PivotTables.
- Press the keyboard shortcut F7 to run a spell check on your dataset, OR
- From the Review tab, in the Proofing group, select Spelling
Clear all Formatting
If you import data from an external source, it can come across with the formatting from the system or application it was created in. To start with a blank canvas so you can apply your formatting, you should clear any problematic formatting from the dataset.
- Select the data you want to remove the formatting from
- From the Home tab, in the Editing group, select the lower half of the Clear button
- Select Clear Formats
These are just ten of the utilities available in Excel to assist with cleaning data ready for analysis. For more tips related to cleaning data, please check out the following links:
Microsoft – Top Ten ways to clean your data
Software Advice – Techniques for Data Cleaning and Integration in Excel
Other Excel classes you might like:
- Excel Crash Course – Learn Pivot Tables in 1 Hour
- Protecting WorkBooks in Excel
- Logical Functions in Excel
- Designing Better Spreadsheets in Excel
- Introduction to Power Pivot & Power Query in Excel
To learn Excel with Simon Sez IT. Take a look at the Excel courses we have available.