Statistical Functions in Excel 2013 can address various statistical problems pertaining to Averages and Measures of Spread.
Excel 2013 retains some legacy Functions from prior versions but in many instances provides newer Statistical Functions which supersede the previous versions. Examples of this include Percentile, Percent Rank, Quartile and Rank Functions.
Excel features pertaining to Measure of Spread include the Interquartile Range, which is the difference between the 75th and 25th percentiles, and Standard Deviation Functions.
Excel 2013 includes two similar Functions for Standard Deviation, associated with Population and Sample Standard Deviation and also superseding prior Standard Deviation Functions.
Watch the free video here, transcripts for the entire video follow:
Do you need to learn Microsoft Excel 2013? Get 19 hours of Microsoft Excel 2013 training – click here.
Hello again and welcome back to our course on Excel 2013 Advanced. In the previous section we looked at averages for data using the average and average if function in Excel 2013. And in this section we’re going to start by looking at percentiles.
These changed in Excel 2010. If you’ve been using an older version of Excel you probably used the percentile function and very often people look at 25 percentiles, 50 percentiles, 75 percentiles and use these of ways of describing data sets. The straightforward use of the percentile function, the legacy percentile function if you like is still available in Excel 2013. Basically if you say percentile, define the data set, so in our case that will be our named Policy Sales Orlando data set, and then you define as a fraction in the range zero to one the percentile that you’re interested in. So let’s suppose you wanted the 25th percentile. You’d put in there 0.25 and what that tells you is the value below which 25% of the data occur. So in our particular case for the whole four year data set the 25th percentile is 86.
Now in Excel 2013 you are discouraged from using the percentile function and the function to use which has replaced the percentile function is either PERCENTILE.INC or PERCENTILE.EXC. Now largely speaking these two functions produce the same results but there is an important subtle difference between them. PERCENTILE.INC is an inclusive function whereby for any value of K in the range zero to one you will get a result. PERCENTILE.EXC actually throws an error if you use a value of K which is outside the valid range for the data set that you’re working with. Now this is actually a little bit tricky to explain but I’ll try to do so.
Let’s suppose that you’ve got a data set with just ten items of data in it. If you put them in order then the first data item will effectively be your tenth percentile because it will mean that 10% of your data, one out of ten, is equal to or below that value. If you try to find a five percentile it’s a bit of a ridiculous question really because you’ve only got ten data items so you can’t find a value below which 5% of your data exists. Now in that particular case PERCENTILE.INC will just do its best to find you a value, PERCENTILE.EXC will give you an error. The difference is really summarized in this little statement down here about PERCENTILE.EXC. PERCENTILE.EXC will interpolate when the value for the specified percentile lies between two values in the array. If it cannot interpolate for the percentile case specified Excel will return an error.
So what I’ve done here is to replace the old percentile function here with PERCENTILE.INC. I’ve also actually put in 50th percentile which is the median. Note again the PERCENTIL.INC function and the 75th percentile.
Now as I mentioned earlier there are other functions in Excel that have been superseded by newer versions. When you’re doing things like percent rank there’s now a pair of functions, PERCENTRANK.EXC and PERCENTRANK.INC. There are quartile functions that now represented by QUARTILE.EXC, QUARTILE.INC’s. And then other things such as ranking which have always been a little bit of a problem in Excel where you have equal scores have been replaced by RANK.AVG and RANK.EQ functions. Well worth you reading up on these if you use the statistical functions in Excel.
Let’s now look at measures of spread in Excel 2013. One of the most commonly used measures of spread is the interquartile range which is basically the difference between the upper quartile and the lower quartile; that is the 75th percentile and the 25th percentile. So a very straightforward calculation here. All I’d need to do is to say that that is C61 minus C59 and my interquartile range is 53.
Now when it comes to measuring the spread in a sample by far the most important measure is the standard deviation. And in Excel 2013 again we’ve got a newer version of function that you may have used in the past for calculating standard deviation and again there’s a little bit of a sort of technical complexity here. If you’re not very familiar with statistics this may also be a little bit of a difficult one to understand but again I’ll try to explain it. If you have a set of numbers and you calculate the standard deviation it is a measure of how much those numbers are spread out. A large standard deviation indicates that you have a very spread out set of numbers.
Now there are two ways of calculating standard deviation, two formulae that are slightly different. There is a population standard deviation calculation. That’s the function STDEV.P and there is a sample standard deviation calculation that’s STDEV.S. These replace the earlier functions STDEVP and STDEV. Now the principle of the P version is that the numbers you’re calculating the standard deviation of are all the numbers you’ve got there. They’re the whole population. That’s it. You’re not going to use that standard deviation for anything else. You just want to know what the standard deviation is. If you use the S version you’re calculating the standard deviation of those numbers but you’re going to use it as an estimate of the standard deviation of the population from which those numbers are taken. So if for example you were taking measurements say of the heights of people. Let’s suppose you were trying to find the heights of all the women in Orlando. If you stopped 100 women in the street and measured their heights and you got an average height and a standard deviation of heights and wanted to use those to estimate the overall distribution of heights of all the women that live in Orlando then you would use the S version because you’re going to use that standard deviation as an estimate of the population standard deviation. Now a statistician would probably need a slightly more rigorous explanation than that but that’s basically what it is.
Now when I do a standard deviation calculation here I need to be clear about which of those I’m doing but what I’m going to do in this case is just do this calculation as though this set of numbers of policies sold per month is my whole population. I’m not going to use it as an estimate. So the formula I put in will be Equals ST, it’s STDEV.P. Note the two functions at the bottom there, STDEV and STDEVP which are the old ones. There’s little warning signs on there are saying basically don’t use these, use the new versions. The old ones there are kept there for compatibility with older versions. We’re going to use STDEV.P and then all we have to do is to put in our range which as usual, Policy Sales Orlando, tick it, and the standard deviation is 34.
And then finally in this section let’s have a quick look at how we might use these kinds of descriptive statistics. I’ve now put on the figures for the equivalent sales and inquiries for the Tampa branch of the insurance company. If you look at the numbers there are massive numbers. You can generally see that the Tampa ones tend to be quite a bit lower than the Orlando ones but to get a better handle on exactly how much lower if you go down here I’ve basically done the equivalent calculations. If you look at the minimum and maximum you can see that the minimum for the Tampa figures of policies sales is 10 compared to 57 for Orlando, so very much less business in some cases at least in Tampa. And the maximum there is 116; again much lower than the maximum for Orlando. If you look at the mean you can see that the mean number of sales in Tampa is only roughly half, just over half of the mean sales for Orlando. And similarly if I look at the mean sales for 2012 they’re only just over half the values for Orlando.
So as a straightforward rule of thumb sort of calculation you can see how that easily tells you that the Tampa figures are just over half the Orlando figures. That’s a much better way of summarizing the situation then you might be able to achieve by looking at the long list of numbers that we started with. Similarly if you look at the spread figures you can see that the percentiles are much lower. The interquartile range is just over half, so just over half the spread but the numbers are only just over half as big and the standard deviation for the Tampa figures is significantly lower than the one for the Orlando figures. So without going into a lot of statistical analysis you can see how those summary figures give you a really good, clear snapshot of the differences between the performance at the two offices.
Okay so that’s it for statistical functions used for description. In the next section we’re going to look at statistical functions used for prediction. Please join me for that.