How to Generate Datasets in Excel with Functions and Utilities
(Note: Suitable for users of Excel 2021 and Excel for Microsoft 365.)
Objective
Generate a ‘dummy’ dataset quickly for practice or training purposes using Excel functions and utilities.
How to Generate Datasets in Excel Explained
Finding good and, most importantly, free datasets to download from the web is not an easy task. Most datasets we find are either not relevant, are too complex, or require us to join a website or pay a fee before downloading them. We also need to be careful that we are not using data that someone else has painstakingly put together without their permission.
In my own experience as a trainer, my role requires me to constantly generate datasets to use in training sessions, demo videos, and blog posts. Indeed, you might need a dataset so that you can practice your Excel skills, create some documentation or run your own training sessions.
In this article, I will show you how I generate datasets in Excel quickly and efficiently using Excel functions: CHOOSE, RANDBETWEEN, RANDARRAY, IFS, DATE, and Excel utilities: Flash Fill and Paste Values. I will also show you a useful website you can use to generate random lists for free.
We are going to be generating a list of ‘dummy’ HR data that shows the Employee Name, Department, Hire Date, Job Title, and Salary.
NOTE: If you are an Excel for Microsoft 365 user, you have access to the new Data Types utility. This is very useful for generating datasets, but we won’t be focusing on this in this article as not everyone has access to it.
- Generating a List of Employee Names
- Generating a Department for Each Employee
- Generating Random Dates
- Generating a Job Title
- Generating a Salary
Related reads:
How to Split and Combine Data in Excel
Generating a List of Employee Names
In this example, we are going to generate a dataset for an HR Team. We will start by building a list of 50 Employees. If you are anything like me, you don’t to spend time thinking of and typing out a list of 50 random names.
Fortunately, there are many websites that offer an online Random Name Generator. We can choose how many names we want to generate, the gender and the name style.
- Select the Number of names from the drop-down.
- Choose the Gender from the drop-down.
- Choose the Name style from the drop-down.
- Click Generate random names.
- Select the names and press CTRL+C to copy.
- Go to your Excel spreadsheet and press CTRL+V to paste.
The formatting from the website will be pasted with the names, to remove this:
- Click the Paste Options button.
- Select Match Destination Formatting.
Now, we have our list of names but we need to remove the leading number. We can do this using Flash Fill.
- In column B, type the first name as you want it to appear.
- Press CTRL+Enter to stay in the same cell.
- From the Data tab, in the Data Tools group, click Flash Fill.
- Select column A.
- Right-click and choose Delete.
- Add the column heading ‘Employee Name’.
Generating a Department for Each Employee
Next, we need to generate a ‘Department’ for each employee in the list. I want to randomly assign 5 departments: Marketing, Sales, Finance, HR and IT. To do this, we can use a combination of two Excel functions: CHOOSE and RANDBETWEEN.
The CHOOSE function is a lookup function that allows us to choose a value from a list. The RANDBETWEEN function generates random numbers based on a top and bottom value that we specify. Combining these functions together will assign a number from 1-5 (the number of departments) randomly to each employee and then based on the number assigned will choose a department from the list.
This method ensures the departments are completely random and there are no obvious patterns.
- In the blank area of the spreadsheet, type the list of departments to assign.
- Click in the cell next to the first employee name.
- Type the following formula:
The RANDBETWEEN part of the formula will generate random numbers between 1 and 5. The top number should be equal to the number of items, in this case departments, in the list. The CHOOSE function will then assign a department based on the random number from 1 to 5 in the cell.
Marketing = 1, Sales = 2, Finance = 3, HR = 4 and IT = 5.
- Double-click to copy the formula down.
It’s always best with this type of formula, to copy and paste the values. This means the data won’t keep refreshing and changing and we can delete the list of departments.
- Select column B and press CTRL+C.
- Right-click and select Paste Values.
- Add the column heading ‘Department’.
Suggested reads:
How to Calculate Mean in Excel
How to Calculate Correlation Coefficient in Excel
How to Add Sparklines in Excel
Generating Random Dates
Next, we need to generate a column to show the ‘Hire Date’ of each employee. We are going to say that everyone was hired between 01/01/2000 and 01/01/2022. For this, we can use a combination of the RANDBETWEEN function and the DATE function.
- Click in the cell next to the first employee name.
- Type the following formula:
In this formula we set our date range 01/01/2000 – 01/01/2022 and then RANDBETWEEN assigns random dates that fall within that range to each employee.
- Double-click to copy the formula down.
- Select column C and press CTRL+C.
- Right-click and select Paste Values.
- Add the column heading ‘Hire Date.
- Apply Short Date Number Formatting to column C.
Generating a Job Title
Next, we need to generate a ‘Job Title’ for each Employee. We want the ‘Job Title’ to be reflective of the ‘Department’ the employee works in. To do this, we will use the IFS Function.
The IFS Function is a slightly quicker way of constructing a Nested IF’s formula. Both IF and IFS are logical functions in Excel that allow us to perform a test and get a result based on if the outcome of the test was TRUE or FALSE.
For example, if the ‘Department’ in cell B2 is equal to “HR” then return “HR Advisor”, etc. We need to add a logical test for each ‘Job Title’ we need to return.
- Double-click to copy the formula down.
- Select column C and press CTRL+C.
- Right-click and select Paste Values.
- Add the column heading ‘Job Title.
The ‘Job Titles’ should now accurately correspond to the ‘Department’.
Generating a Salary
Next, we need to generate a salary for each employee. The salaries will be between 20,000 and 50,000 and will be rounded to the nearest 100 multiple. We can use a combination of RANDARRAY, COUNTA and MROUND to do this.
- Click in the cell next to the first employee name.
- Type the following formula:
The RANDARRAY formula is similar to RANDBETWEEN but it allows us to specify how many rows and columns of number we need and if we want those numbers to be integers or decimals. We can use the COUNTA function to count the number of rows we need.
Notice that the Salaries are not formatted and not rounded to the nearest $100. Let’s fix that.
- Click on the first salary in column E.
- In the formula bar, add the MROUND function to round the values to multiples of 100.
- Select column E and press CTRL+C.
- Right-click and select Paste Values.
- Add the column heading ‘Salary’.
- Apply Currency Number Formatting.
So those are some of the techniques we can use to generate a completely unique dataset for use in training sessions, demonstrations and practice sessions.
Also read:
Ten Ways to Clean Data in Excel
Please visit our free resources center for more high-quality Excel and Microsoft Suite application guides.
Ready to dive deep into Excel? Click here for advanced Excel courses with in-depth training modules.
Simon Sez IT taught Excel and other business software for over ten years. You can access 150+ IT training courses for a low monthly fee.