In this video we're going to learn about measures of variability, another form of descriptive statistics that people often want to know in addition to measures of central tendency. But before we get to any of the nitty-gritty details, I wanted to motivate why we need measures of variability with two examples.

Here are two different datasets, one on the top and one on the bottom. I'll just go ahead and tell you that the mean for both datasets is 87. Now, if I were to just tell you the mean of these data, it would mislead you a little bit, because in reality the situation in each dataset is quite different. If we were to plot it out, you would see this difference clearly: in the top dataset all the scores are clustered together, everything is close, but in the bottom dataset the scores are very spread out. So again, we need some way to quantify these differences, and a measure of central tendency like the mean simply can't capture that alone.

Here's another example. Let's say you're working for a pharmaceutical company, something like that, and you need to decide between two different medications for depression; we'll call them medication A and medication B. Let's say you did a study where you measured how much improvement happened when people took one over the other, and this is what you got. Say that higher scores mean more improvement and lower scores mean little to no improvement. Well, let's compare: the means in this case are the same; in both cases people improved by about 10-ish points or so. But the variability is very different. On the left, some people benefited greatly, whereas others really didn't benefit at all; on the right, everyone benefits a good amount. In this case I would personally pick medication B, because it's more consistent. So this is an example of why knowing the variability might help us make some real-life decisions. In general, in statistics, measures of variability are ways to describe these differences.
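To see the idea in a couple of lines of code, here is a small sketch with made-up scores (not the exact values from the video): two datasets sharing the same mean of 87 while spreading out very differently.

```python
# Two hypothetical datasets with the same mean but very different spread.
from statistics import mean

clustered = [85, 86, 87, 88, 89]        # scores bunched close together
spread_out = [25, 60, 87, 113, 150]     # scores all over the place

print(mean(clustered))                    # 87
print(mean(spread_out))                   # 87
print(max(clustered) - min(clustered))    # 4  -- tiny spread
print(max(spread_out) - min(spread_out))  # 125 -- huge spread
```

The mean alone reports "87" for both, which is exactly why we need a second number that captures spread.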
Statistically, they describe how scores in a given dataset differ from one another, and they capture things like how spread out or how clustered together the points are, things we've been looking at already. There are three we're going to talk about: the range, the standard deviation, and the variance.

Let's start with the range. The range is nice because it's a really simple measure of variability, of dispersion, of how spread out the points are; it can often be calculated in five or ten seconds. Here's the formula. The range is R. Don't get confused later on when we learn about correlations, which are also described by r; we'll use some different subscripts to make that difference clear when the time comes, but for now the range is R. R equals the highest score in the dataset minus the lowest score in the dataset, so you can see this is a very simple calculation.

If we go back to the example we were working with a minute ago, we can calculate the range very quickly. For the first dataset we have 95 minus 80, so the range is 15, and in the second dataset we have 150 minus 25, giving us a much larger range of 125. In this case I would do well to report both to you, the mean and this measure of variability, because that gives you a fuller picture of what's going on. A mean of 87 with a range of 15 describes a very different situation from a mean of 87 with a range of 125, so again, it's a great idea for me to report both, and this is what's often done.

A big limitation of the range, though, is that even though it's simple and pretty effective, you might miss a little bit of the information in your dataset. Let me show you an example to illustrate. Here's a dataset: although these bars are quite high, there's really just one value in each bar, so we have one person who scored 30, one person who scored 40, and so on. The range here is 120
(150 minus 30). But let's look at a second dataset. In this case the range is still 120, because our highest and lowest values are the same, but everybody's over here and there are just a couple of outliers beyond that. So again, if I were to just tell you the range, it might mislead you a little bit, because you're not sure whether the data looks like this on the left or like this on the right. This is where standard deviation and variance come into play.

Standard deviation, just like the name suggests, describes the standard or typical amount that scores deviate from the mean, hence "standard deviation." We'll get into exactly what this looks like once we learn to calculate the standard deviation, but I just want to show you some symbols for now. Like with means, we have different symbols for the population standard deviation versus the sample standard deviation. The population standard deviation is described by lowercase sigma, σ, this symbol with Elvis hair, as I like to think of it. It's not to be confused with capital sigma, Σ; unfortunately they're named the same thing. Capital sigma means "take the sum of," which we learned about previously; this is sigma with a little s shape. The sample standard deviation is simply described by s.

I want to take a step back and talk about why standard deviations are really useful. Whenever you have a normal curve, a normally distributed set of data, which is very common in the world (things like height, weight, and so on are all normally distributed), standard deviations have this really interesting property of telling you a lot about what's common and what's uncommon. So if we have zero, this is right at the mean of whatever we're talking about; zero standard deviations away from the mean is right here, you're right at the mean. We can look at one standard deviation above the mean and one standard deviation below, and we automatically know, just because of how standard deviations work, that 68% of people will fall within
this range. We can go beyond that: we know that between two standard deviations in either direction of the mean, 95% of people will be contained, and at three you're getting really extreme, really far out, really rare; 99.7% of the data will be contained within three standard deviations in either direction from the mean.

To illustrate this a little more, let's talk specifics. Let's say I'm looking at IQ scores. We know a lot about IQ scores; we know, for example, that the population mean of IQ is 100, and we know that the population standard deviation, sigma, is 15. So let's draw that same sort of normal curve. We know that intelligence is normally distributed, so let's take a look at what information we have just by knowing the standard deviation. The average IQ is right here at 100. One standard deviation above the mean would be 115, two standard deviations above the mean would be 130, and three standard deviations would be 145. We can do the same in the opposite direction: one standard deviation below the mean of intelligence is 85, two standard deviations below is 70, and three standard deviations below is 55. So again, we automatically know that 68% of people will fall between an IQ of 85 and 115; we also know that 95% of people will fall between an IQ of 70 and 130; and finally, 99.7% or so will fall between an IQ of 55 and 145. This is great to know, because if you tell me you have an IQ of 146, I'm really impressed; that's rare, that's very extreme. But if you tell me you have an IQ of, say, 106, something like that, you know, that's fine, good for you, but I'm not very impressed, right? So knowing standard deviations helps you get this extra information about a dataset.

Finally, we have variance. Variance is very simple: it's just the square of the standard deviation, so it's the average squared deviation from the mean.
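Before the variance formulas, the IQ bands above can be sketched in a couple of lines, using the figures from the video (population mean 100, standard deviation 15):

```python
# The 68-95-99.7 rule applied to IQ scores (mean 100, sd 15).
mu, sigma = 100, 15

for k, pct in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = mu - k * sigma, mu + k * sigma
    print(f"about {pct} of people fall between IQ {low} and IQ {high}")
```

Running this prints the 85-115, 70-130, and 55-145 bands discussed above; swapping in any other normally distributed mean and standard deviation gives the same kind of quick summary.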
Unfortunately, variance doesn't get its own symbols; we just take the symbols we already have for standard deviation and square them, because variance is just the squared standard deviation. So for a population we call the variance sigma squared, σ², and for a sample we call the sample variance s squared, s².

In the next video we'll learn how to calculate some of these things, but I want to at least highlight some of the formulas you're going to see. We have four different formulas, because we have standard deviation and variance, and we have the population versions and the statistic (sample) versions. For the standard deviation in the population, this is our formula: notice we have sigma on the left and all this mess on the right, which I'll get into next time. One thing I will mention is that for all of these formulas, the numerator is called the sum of squares, SS. We're going to learn what the sum of squares really means in the next video, but for now just keep that in mind. For our sample statistic we have this formula; you're going to see an s on the left here, and it has some similarities, but you'll notice a difference or two that we'll talk about in the next video. For the variance we have sigma squared, and for the sample statistic version of the variance we have s squared.
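As a rough preview of how those four formulas relate (a sketch with made-up data, ahead of the next video's full walkthrough, using Python's built-in statistics module as a cross-check):

```python
# Population vs. sample standard deviation and variance, sharing one numerator:
# the sum of squares (SS), the total squared deviation from the mean.
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

data = [80, 85, 87, 90, 95]  # hypothetical scores

m = sum(data) / len(data)
ss = sum((x - m) ** 2 for x in data)   # sum of squares, SS

pop_var = ss / len(data)               # sigma squared (divide by N)
pop_sd = sqrt(pop_var)                 # sigma
samp_var = ss / (len(data) - 1)        # s squared (divide by n - 1)
samp_sd = sqrt(samp_var)               # s

# The hand-rolled versions match the library's built-ins:
assert abs(pop_var - pvariance(data)) < 1e-9
assert abs(pop_sd - pstdev(data)) < 1e-9
assert abs(samp_var - variance(data)) < 1e-9
assert abs(samp_sd - stdev(data)) < 1e-9
```

The one difference between the population and sample versions is that denominator, N versus n − 1, which is the "difference or two" the next video explains.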