The 6 MUST KNOW Statistical Distributions MADE EASY 4 13

Full video transcript

If we were looking to understand the spread of player heights in the NBA, we may well use a normal distribution, where the continuous numerical height data from the population of NBA players is distributed symmetrically around the mean. In other words, 50% of the observations lie below the mean and the other 50% lie above the mean. The spread of the data in a normal distribution is represented by the standard deviation. And this is all well and good, but not all data exists in continuous numeric form like height does. Not all data falls symmetrically around a mean, and we do not always have the full population of data to work with. So, what do we do in those situations?

In this video, we are going to cover the six most common distributions that you will come across. So, the normal distribution, which we have talked about already, and the t-distribution, of which we can see several versions on screen here, and we will talk about exactly what those all mean very soon. We will look at the binomial distribution and the related but simpler Bernoulli distribution. We will discuss the uniform distribution, and we will take a look at the very interesting Poisson distribution. So, let's do this.

We have already discussed the key features of the normal distribution. So, what about this other one that looked kind of similar and that I said was called the t-distribution? Well, the t-distribution is actually at its core very similar to the normal distribution. Just like the normal distribution, it is symmetrical around the mean, and the breadth of the curve is based on the deviation or the variance that exists within the data. In fact, the orange curve on screen here is a normal distribution. So, what about the blue and green lines here? Let me explain. T-distributions are specifically designed to work with samples of data rather than full populations, and even more specifically, when the sample size is small. In other words, in situations where we don't have much data.
The shape of the t-distribution can become flatter and broader with these smaller sample sizes. And this is to take into account the extra uncertainty we have around our data in situations where we don't have a lot of it. On screen here, we have two versions of the t-distribution, one in blue and one in green. You can see that the green box has the text 1 DF and the blue box has the text 5 DF. DF here means the degrees of freedom. And in the context of the t-distribution, this is referring specifically to the sample size that we have minus one. As the sample size, and thus the degrees of freedom, gets larger, in other words, as we get our hands on a larger sample of data, the t-distribution tends more towards the normal distribution. And this is because with a larger sample, we're more certain when estimating the true population statistics. Conversely, when the sample size is small, the distribution becomes much flatter, essentially saying that it is less confident when representing the true population that it comes from. So, the key takeaway here is that the t-distribution is very similar to the normal distribution. It just has functionality built in that helps it adapt to smaller sample sizes and the subsequent uncertainty of how well that sample represents reality.

All right, let's keep moving, and the next distribution that we are going to talk about is the binomial distribution, which, looking at this, actually somewhat resembles the shape of a normal distribution. But the main difference is that instead of plotting continuous data, it plots the distribution of outcomes from repeated trials that each have only two possible results. As an example of this, let's say we flipped a coin 10 times. And for each set of 10 flips, we noted down how many heads we got. So, here on screen, we flipped 10 times and we got four heads.
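As a side note on the t-distribution described above, the flattening at low degrees of freedom can be checked numerically. The following is a minimal sketch using only the Python standard library: `t_pdf` is a hypothetical helper written for this illustration (it evaluates the standard Student's t density), the values 1 and 5 match the DF labels mentioned on screen, and 30 is an extra value assumed here to show the approach towards the normal curve.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


# Standard normal curve used as the reference (the "orange curve").
normal = NormalDist(mu=0, sigma=1)

# df = sample size - 1; 30 is an illustrative "larger sample" value.
for df in (1, 5, 30):
    print(f"df={df:>2}: peak={t_pdf(0, df):.3f}, density at x=3: {t_pdf(3, df):.4f}")
print(f"normal: peak={normal.pdf(0):.3f}, density at x=3: {normal.pdf(3):.4f}")
```

With fewer degrees of freedom the peak at zero is lower and the density far from the mean is higher, which is the "flatter and broader" shape the video refers to.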
If we repeated this task many, many, many more times, and here I've got four more examples, but in reality, let's say we did this something like 1,000 more times, we could plot the outcomes onto a chart like this, with the number of heads from our set of 10 flips along the x-axis at the bottom there, ranging from zero, or no heads, all the way up to 10, or all heads. And if we plotted the proportion of times we saw each of the possible outcomes, if our coin was indeed fair, it would end up looking like this. And as you can see, in most cases from our 10 flips, we would get either four, five, or six heads. Less frequently than that, we would get either three or seven heads from our 10 flips. And even rarer than that, we would see a total of two or fewer, or eight or more, heads from our sets of 10 coin flips. So, that is the binomial distribution, very useful to know as it can help us understand the probability of a binary outcome in an experiment that we run multiple times. Here, our experiment was measuring the probability of different outcomes of coin flips. And as an example, we could use this knowledge to assess whether a coin was fair or not, as a fair coin would stick to this distribution, and an unfair coin would not.

Anyway, let's keep moving. And the next type of distribution that we will discuss is what is known as the Bernoulli distribution, which can actually just be thought of as a special case of the binomial distribution that we just looked at, one with only a single trial. On the y-axis to the left, you can see that we're still measuring the probability of an outcome, but unlike the binomial distribution, here, instead of considering all possible outcomes across the x-axis, we're just considering two possible outcomes, so perhaps success or failure, yes or no, or true or false. Let's imagine a super simple experiment where we wanted to understand how likely we were to roll a six with a die.
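Looking back at the coin-flip experiment described above, the expected shape of the chart can be computed directly instead of flipping 1,000 sets of coins. This is a minimal sketch assuming a fair coin (probability of heads 0.5) and 10 flips per set; `binomial_pmf` is a hypothetical helper name, and the formula is the standard binomial probability mass function.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_flips, p_heads = 10, 0.5  # assumption: a fair coin, 10 flips per set
pmf = [binomial_pmf(k, n_flips, p_heads) for k in range(n_flips + 1)]

# Text "chart": one row per possible number of heads.
for k, prob in enumerate(pmf):
    print(f"{k:>2} heads: {prob:6.1%}  {'#' * round(prob * 100)}")

print(f"4, 5, or 6 heads combined: {sum(pmf[4:7]):.1%}")
```

Changing `p_heads` away from 0.5 shows how an unfair coin would pull the bars to one side, which is the basis of the fairness check mentioned in the video.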
To figure this out, if we went and rolled a die many, many, many times and measured how many times we did indeed roll a six, if we did this enough times, we should end up with a probability of rolling a six one out of every six times, or 16.7%, and we should get a probability of not rolling a six, in other words, rolling a one, two, three, four, or five, five times out of six, or 83.3% of the time. So, that is the Bernoulli distribution. Quite a simple one, but definitely worth knowing the name.

Next comes the uniform distribution, which is a distribution in which all outcomes are equally likely to occur. If we were to again think of our many, many die rolls, and if we were to count up the frequency of each possible outcome, so us rolling a one, two, three, four, five, or six, if this die is fair, we should end up with a uniform distribution where each of those outcomes has exactly the same probability of taking place.

Now, finally, the last distribution that we are going to cover is an extremely interesting one, and that is the Poisson distribution. Now, the thing that probably stands out about what you are seeing here, and what makes it different from the distributions that we have discussed so far, is that this distribution is not symmetrical. And this is actually because the Poisson distribution is bounded between zero and infinity. And while this might sound a little bit strange, it's actually very useful. The Poisson distribution describes the number of events or outcomes that occur during some fixed interval, most commonly a time interval. So, let's say that we ran a shop and we wanted to understand the distribution of the sales that we make per hour. Now, the spread of the data, or in other words, the shape of the distribution, is all based on the expected number of events per time unit. For this particular distribution, I have used an expected sales per hour value of 3.2. And this would result in these probability values for each potential number of sales per hour.
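The probability values for the shop example can be reproduced with the standard Poisson formula, P(k) = e^(-rate) * rate^k / k!. This is a minimal sketch: `poisson_pmf` and `expected_sales_per_hour` are hypothetical names chosen for this illustration, and the rate of 3.2 is the value quoted in the video.

```python
from math import exp, factorial


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events when `rate` events are expected per interval."""
    return exp(-rate) * rate**k / factorial(k)


expected_sales_per_hour = 3.2  # value quoted in the video

# One row per potential number of sales in an hour (0 through 10).
for k in range(11):
    print(f"{k:>2} sales: {poisson_pmf(k, expected_sales_per_hour):.1%}")
```

The loop stops at 10 only for display; the distribution itself has no upper limit, and the probabilities simply keep shrinking.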
So, reading from the left, this suggests that with an expected number of sales per hour of 3.2, there is a 4.1% chance of getting zero sales in a given hour period, a 13% chance of getting one sale, a 20.9% chance of two sales, and so on. These are also additive, so we could equally say that the likelihood of there being two or fewer sales in an hour is 38%, the sum of those three bars. In the same way, we could say there is a likelihood of only 10.5% that we will see six or more sales in that period. And something like this could come in really, really handy when managing how many staff we need to be in the store at any point in time. It's also worth quickly mentioning again that this type of distribution is bounded by zero and infinity, and this is pretty important to know. While the probability of our shop making 10 sales in an hour here is only 0.1%, the distribution does carry on to infinity, with those probabilities just getting smaller and smaller. Now, with this all in mind, the thing that really drives the Poisson distribution is our expected number of sales per hour, and currently, as you can see, this has a value of 3.2. If we were to change that value, so let's say based on historical data, we actually only expected two sales per hour, our distribution would shift accordingly. So, like I say, that expected number of events is really what underpins this entire distribution.

So, there you go, a whirlwind tour of the six most commonly used distributions in data science and analytics. This is extremely important foundational knowledge as we keep progressing forward. In the next video, we will be running through one of the most impressive and fascinating concepts within statistics, the central limit theorem. I cannot wait for this, so I will see you there.
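The additive reading of the Poisson bars, and the shift when the expected rate changes from 3.2 to 2, can be checked with a short calculation. This is a minimal sketch using the standard Poisson formula; `poisson_pmf` and `poisson_cdf` are hypothetical helper names, and the two rates are the ones mentioned in the video.

```python
from math import exp, factorial


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events when `rate` events are expected per interval."""
    return exp(-rate) * rate**k / factorial(k)


def poisson_cdf(k: int, rate: float) -> float:
    """Probability of k or fewer events: the sum of the bars from 0 up to k."""
    return sum(poisson_pmf(i, rate) for i in range(k + 1))


for rate in (3.2, 2.0):
    two_or_fewer = poisson_cdf(2, rate)
    six_or_more = 1 - poisson_cdf(5, rate)  # complement of "five or fewer"
    print(
        f"rate={rate}: P(2 or fewer)={two_or_fewer:.1%}, "
        f"P(6 or more)={six_or_more:.1%}, P(exactly 10)={poisson_pmf(10, rate):.2%}"
    )
```

With the lower rate of 2, more of the probability sits on the small counts, so "two or fewer sales" becomes more likely and "six or more" becomes rarer.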