Data Distributions: A Comprehensive Guide with Hands-On Python Code

bhupen · ⏱ 1:01:23 · 75 views · 2 years ago

Full video transcript

Hi everyone, welcome to this video series, an essential statistics primer for data science and machine learning. As usual, the focus will be on foundational concepts, along with examples, use cases, applications, dos and don'ts, and Python demos. Today we'll talk about data distributions, a very important concept in statistics. Let's get started.

Understanding data distributions is fundamental, and it means getting familiar with a number of core concepts. In this video we'll focus on two distributions: the normal distribution and the uniform distribution. The normal distribution, as you know, is bell-shaped and characterized by symmetry. We'll discuss whether we really need data to be symmetrical from a machine learning and data science perspective. The uniform distribution is, as the name says, uniform: all values in its range are equally probable. We'll also look at some useful and important ideas around the normal distribution, such as probability distributions, probability density, cumulative probability, and so on, and this video includes a significant number of Python examples.

So what is a data distribution? It refers to the way the values or outcomes are spread across a data column or data series; in other words, how the data is distributed in that column. It tells us which values a particular variable can take and how frequently those values occur. This is an important analysis that we conduct on any data column or data series. Understanding the distribution of data is fundamental to machine learning, deep learning, and in fact any statistical project, and it also helps us choose a proper, or better, method for the analysis. So let's look at some examples of data 
distributions that we hear about, or probably already know.

The first one is the normal distribution. It is characterized by symmetry and is commonly seen, at least approximately, in numerical columns like height, test scores, blood pressure, or weather-related data such as temperature, humidity, or atmospheric pressure. Another distribution we commonly hear about is the uniform distribution, where all values in a particular range are equally probable. If the range is, say, 10 to 20, then any value between 10 and 20 is equally probable, which means the probability density is a flat, straight line; we'll talk about the concept of density shortly.

Next is left-skewed data, where a few values sit far out on the left-hand side. This is also called negatively skewed data, and it has a tail extending to the left. An example is scores on an easy exam: most students score high and only a few score very low. The opposite is the right-skewed distribution, where the tail extends to the right. Income is a classic example: most people earn moderate amounts, and a small number of high earners stretch the tail to the right. Product lifetimes often show this pattern too: products with defects typically fail within a short time, so most of the data piles up there, while a few products last much longer, which is why the tail extends to the right. This is called positively skewed. Left and right skew are usually described relative to the symmetric normal distribution, which is very common in analytics.

Some more examples of data distributions: the exponential distribution, where 
the values decay exponentially; it is widely used in queuing theory. Another example is the binomial distribution, which counts the number of successes in a given number of trials, for example a series of Bernoulli trials. In this video, though, we will focus on the normal and uniform distributions.

So how is a data distribution different from the dispersion measures we talked about in the previous video? A data distribution describes how the data is spread, or distributed, in a data column. Dispersion also talks about spread, but in terms of quantifying the variability of the values around a center point. The focus of a data distribution is frequency: where most of the values occur. With dispersion measures, the focus is how far the values deviate from the central point. The characteristics we study for a data distribution are probability density, skewness, symmetry, and kurtosis, whereas for dispersion we talked about range, variance, standard deviation, IQR, and so on. So here we are focused on the overall shape of the data, and there on the variability of the values. A data distribution is usually described in terms of its shape, whereas the dispersion measures we saw earlier are single numbers such as the range, the standard deviation, or the IQR (Q3 − Q1). That's the perspective here.

Before we move on, we need a couple of topics from mathematics. The first is the integral. We'll keep the discussion at an intuitive level rather than working out integrals by hand. Say the function is x², a quadratic function, and we want the area under the curve between the points 2 and 3. We find it by computing the integral 
or definite integral, of the function x² between these two values, and that is called the area under the curve. So the definite integral is nothing but the area under the curve in the region we specify, and I'll show how to compute it in the next slide.

So what is the connection between the definite integral and the probability, or probability density, associated with a data distribution? It shows up in probability density functions for continuous variables: the area under the probability density curve over a particular interval is the probability that the random variable falls in that interval. Say this is your variable, the density function can be anything, and we compute the area between 2.1 and 2.2. The midpoint of that interval is (2.1 + 2.2) / 2 = 2.15, so this area is the probability that the variable lands in a small neighborhood around 2.15. That is the link between the integral, or area, and a particular value. For a continuous variable, which can take infinitely many values, the probability of any single exact value is zero. So the approximation we make is to take a small region around 2.15, slightly lower on the left-hand side and slightly higher on the right-hand side, and compute the area over it with a Python function. That is the connection between the definite integral and probability for a probability density function.

Let's take an example. Say we have a column, and the column has its mean and standard deviation as 
parameters. The objective is to compute the probability that the random variable lies between two points. The probability density function of the normal distribution is given by the famous formula

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where μ and σ are the mean and standard deviation of the column and x is an individual value in the column. For every value, the function returns the probability density at that value. Here is what that means: say this column has a minimum of 100 and a maximum of 200. If we take, say, 10,000 or 100,000 points in between and apply the function to each one, we get a curve like this, and that curve is what we call the probability density function. Now take a point on it. The density describes that point's neighborhood: if we take a very small region around the point, the area under the curve over that region is the probability that the variable falls in that region. So typically we take a very small region around the point and compute the integral; that integral is a probability, while the height of the curve is the probability density. We'll talk about the intuitive meaning of probability density shortly. So computing probabilities from a density is essentially integration. Don't worry about the formulas, because we're going to use a package. The key is to choose the right range and then call the Python function that computes the definite integral, which is nothing but the area between a and b.

Let's look at some more examples of probability density functions. We just talked about the normal one. The uniform distribution is very easy: all values between a and b are 
equally likely, so the probability density is simply 1 / (b − a) everywhere in that range. That's the famous uniform distribution. Similarly, we have the exponential distribution; we'll cover specific distributions like that in another video, but here we'll stick to the normal and uniform distributions and the use cases for each. Likewise, the binomial and Poisson distributions are very popular examples; because they are discrete, they are described by a probability mass function, the discrete counterpart of the density function. Each of these distributions has a differently shaped curve.

Let's look at example one. Say the function is x², a quadratic function, and we take the area under the curve between the points 2 and 3, the shaded region. This is only an illustration of area under a curve: the area here is 19/3, which is larger than 1, so x² by itself can't be a probability density on this range. For a genuine density function, this area gives the probability of landing in that region, in other words how likely values in that region are. We'll take more examples to illustrate the useful data analytics we can perform with such measures. From now on, the area under the curve will be our point of interest, along with why this is called probability density and what its use cases are.

Now let's do an exercise: a PDF calculation. We won't write the code line by line, but I'll show you the library to use. Given the probability density function f(x) = x²/4, we have to calculate the probability that X is between 1 and 2. Essentially, we need to find the area under the curve over the range 1 to 2. So this is the function, the range of interest is 1 to 2, and the area is the lightly shaded region. To compute that probability we use the SciPy library, and it is basically a one-liner: the function is called quad (in scipy.integrate), and in 
this we will pass the function. Let me illustrate what is happening: we pass the function and then the range, that is, the lower and upper limits of integration. When we are interested in a single point, I take a lower limit slightly below the point and an upper limit slightly above it. That is how we compute the area under the curve around a point.

Let's look at what this means with three points. This video is going to be slightly longer, and I'm going to use lots of examples to make things clear; I hope you'll appreciate the finer points.

The first point is the mean. The red line is the mean, which is equal to 1, so this is the mean of the data series. Around the mean, the area we get is 2.5 × 10⁻⁶. What does that mean? The number is small, of course, but only because the interval is very narrow: its limits are very close to 1 on either side, just a tiny difference. So what matters is the value in a relative sense. It is small, but it is not zero, and it reflects the likelihood that the column takes a value near 1.

Now let's look at another point, 0.8, which is this red line here. I got its value using the range 0.799 to 0.801, both very close to 0.8. This small value reflects the concentration of probability density in that small area, so 
this is representative of the area under the curve over that tiny region, and it is also a likelihood: the likelihood that the column takes a value near 0.8. Comparing the two, the value around 1 is higher, which means the column is more likely to take a value near 1 than near 0.8.

Now the third point, 2. Over a very small interval around 2, the value I got is 1.0 × 10⁻⁷, which is smaller than the previous two. So the probability density around 2 is very low; in fact, it's the lowest of the three. That means values near 2 are the least likely, and values near 1 are the most likely of the three. This is the kind of analysis you can perform as part of your data science work before you begin a machine learning exercise. I hope this is clear; let's move on to some more examples.

Exercise two. We talked about the PDF, and there is also something called the CDF, the cumulative distribution function. Why do we need it? Let me illustrate. Say we have a function f(x), this is my x-axis, and we compute the probability density at each point. I'm not drawing the regions, but you can assume every point is actually a very small region. At each point, the PDF gives a number tied to the area under the curve over that tiny region, but on its own that number is a likelihood, not a probability. Now say this point is 2. What's the probability that X will 
be less than or equal to 2? In that case we need to take all of the densities up to 2 and add them up; for a continuous variable, adding up means integrating the density from the lowest value up to 2. Whatever we get is the probability, say 0.74. This accumulation is called the cumulative distribution function. So we can take the PDF from the previous exercise and compute its CDF. For illustration, this is what we did in exercise one, where f(x) = x²/4, and each point is the PDF. When we accumulate all of these points, going on adding, we get an increasing probability curve, and that is the CDF. It is useful for determining the probability that the variable is at most a particular value, or, by taking differences, that it falls between two values. The blue curve is the PDF and the green one is the CDF. The CDF starts from zero and approaches one, because a total probability cannot be more than one. The steepness of the CDF indicates the rate at which probability is accumulating. In fact, this is what Python packages use: when we ask for a probability, it's the CDF that returns it.

So how do we go from PDF to CDF? I've already illustrated the finer points, but let's look at it one more time. This time the function is different: it is e^(−x), an exponential decay. Again, I'd like to highlight some of the main points. On the blue curve, the PDF is at its maximum here, and as x takes higher values, say 2, 2.1, 3, 3.1, 4, 4.5, 5, and so on, the PDF decreases. That means the likelihood of the variable taking higher values goes down. When you add them up cumulatively, this is how it will 
look, and toward the end it levels off, because the density out there is negligible and there is very little probability left to add; that's why the CDF is close to 1 at the top end.

The interesting thing is that the density is highest for the lowest values of x, as we just described. Look at the red line: it marks the maximum density, at x = 0, where e^0 = 1. But the probability there is low: P(X ≤ 0) is zero. So why do we say the density is high but the probability is low? Let's look at another conceptual point. Density means the concentration, or likelihood, at a specific point in a continuous distribution. In the context of a PDF, f(x) is a function, and a higher density at a point indicates a greater likelihood of the variable taking a value at or near that point. Probability, on the other hand, is cumulative: it depends not just on the density but also on the width of the interval. When you ask for the probability of 2, meaning P(X ≤ 2), the interval runs from the lowest value up to 2, and that is the width of the interval. So for a continuous distribution, probability is computed from both the density and the width. Looking at the cumulative curve, the probability at 2 is all of this area under the PDF, whereas the density is just the height of the curve at that specific point. That's why probability is not just the density but also the width.

For this particular case, the exponential e^(−x), the density decreases as x increases because of the exponential decay, but at the same time, for the CDF, the interval is 
getting wider, and that's how the probabilities increase. The overall probability is influenced by both the density and the width of the interval. So P(X ≤ 1) is lower than P(X ≤ 2), which is lower than P(X ≤ 3): as you move to the right, the cumulative probability keeps increasing.

Let's use these concepts for some more examples: finding probabilities from the CDF, which is what is used to return probabilities to us. Say we are interested in the probability that the value is less than or equal to 2. That is this point here, and it means the whole area under the PDF up to 2: you add it all together and that's the probability. Let me go over it one more time: we computed the PDF at each point, and this point on the CDF is nothing but the accumulation of all the PDF values up to here. Accumulating them returns 0.86, which matches the exact value for e^(−x), 1 − e^(−2) ≈ 0.8647. That's the use of the CDF: getting the probability. So the interpretation at x = 1 is the CDF value P(X ≤ 1); similarly at 2 it's P(X ≤ 2), and at 3 it's P(X ≤ 3). I hope this gives you a clear picture of what probability density is and what probability is.

Now let's talk about implementing what we've covered for the uniform and normal distributions. First, a recap of the probability density function: the probability density describes the likelihood of the random variable near a point, within a very small range; it does not describe the probability as such. The PDF provides a way to model the probability densities, or 
likelihoods, and when you convert that PDF into the cumulative CDF, that's where you get probabilities.

Let's look at the uniform distribution and how its density is computed; it's quite simple. Take a point a and a point b: the probability density at every point between a and b is 1 / (b − a). Notice that whichever point you take in that range, the density is the same, because that's what uniform means. We'll see examples of use cases for the uniform distribution.

So what exactly is a data distribution? Simply put, it is the set of values and how they are spread, and it is characterized by measures such as the mean and standard deviation. A data distribution has a central tendency, which we discussed earlier; a dispersion or variability, such as variance and standard deviation; and a shape, such as symmetry and skew. In this video we're talking about the probability of values.

Here is one more example, a practical use case for densities. Say this curve represents your data points, and we look at the probability density at 20.2, the green line here, and at 30.2; notice the small region I've drawn around 30.2. The PDF at 30.2 is 0.134, the PDF at 20.2 is 0.106, and the PDF at 60.2 is this one. The mean of my data series is 42, the red line here. Notice that 30.2 is closer to the mean, whereas 60.2, the green line over here, is farther away from it. If you compare them, the PDF at 30.2 is higher than the PDF at 60.2. That means the column is more likely to take a value close to 30.2 than a value close to 60.2. This is the insight behind how we use probability densities in analysis.
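
The ideas covered so far (area under a curve, probability over an interval, the CDF as accumulated density, and comparing densities at different points) can be checked numerically. Below is a minimal sketch, assuming SciPy's `scipy.integrate.quad`, `scipy.stats.expon`, and `scipy.stats.norm`. The functions x², x²/4, and e^(−x) come from the examples above. The normal column with mean 42 uses a standard deviation of 10 and an interval width of 0.002, which are purely illustrative assumptions (the video doesn't give them), so its printed densities will not match the values quoted above.

```python
import math

from scipy import integrate, stats

# Example one: area under f(x) = x^2 between 2 and 3.
# This is a plain definite integral (exact value 19/3), not a probability.
area_x2, _ = integrate.quad(lambda x: x**2, 2, 3)

# Exercise one: treat f(x) = x^2 / 4 as the density and integrate it over [1, 2].
# Exact value: (2**3 - 1**3) / 12 = 7/12.
p_1_to_2, _ = integrate.quad(lambda x: x**2 / 4, 1, 2)

# PDF -> CDF for f(x) = e^(-x): P(X <= 2) is the area under the PDF from 0 to 2.
p_le_2, _ = integrate.quad(lambda x: math.exp(-x), 0, 2)
# SciPy's exponential distribution with default loc=0, scale=1 has exactly this density.
p_le_2_cdf = stats.expon.cdf(2)

# Density vs. probability near a point, for a hypothetical normal column.
# Mean 42 comes from the example above; the standard deviation of 10 is an ASSUMPTION.
dist = stats.norm(loc=42, scale=10)
width = 0.002  # ASSUMED narrow interval, like the small regions drawn around each point
areas = {}
for point in (20.2, 30.2, 60.2):
    # Probability of landing in [point - width/2, point + width/2], via the CDF.
    areas[point] = dist.cdf(point + width / 2) - dist.cdf(point - width / 2)
    print(f"x = {point}: pdf = {dist.pdf(point):.5f}, "
          f"area over the interval = {areas[point]:.3e}")

print(f"area under x^2 on [2, 3]: {area_x2:.4f}")
print(f"P(1 <= X <= 2) for x^2/4: {p_1_to_2:.4f}")
print(f"P(X <= 2) for e^-x: {p_le_2:.4f} (quad) vs {p_le_2_cdf:.4f} (cdf)")
```

For a narrow interval, the area is approximately the density times the width, which is why comparing densities at two points ranks their likelihoods the same way as comparing those tiny probabilities would.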
So basically, when the PDF value is higher at a specific point in a continuous distribution, like 30.2 here, it indicates that the probability density (not the probability) around that point is relatively higher than at other values. That suggests values close to 30.2 are more likely to occur in the data series. For example, if you have two points, 25.0 and 30.2, and the PDF at 30.2 is higher than the PDF at 25.0, the column is more likely to take values near 30.2. For a lower PDF value, you draw the opposite conclusion.

In summary, the PDF provides a way to quantify how likely a variable is to take values near a specific point in a continuous distribution. A higher PDF implies a higher concentration, or likelihood; you can think of it in terms of the area under the curve, the definite integral, over a small region. A lower PDF suggests a lower concentration, or likelihood. The interpretation is relative: a density of 0.13 in one distribution doesn't mean the same thing as 0.13 in another. It's specific to the distribution and the column, and it's used to compare the likelihood of different values within a particular distribution.

So far, the curves we've seen are the normal distribution and the uniform distribution. How do we get the uniform distribution? Quite simply: take the range and compute 1 / (b − a). The probability density function of the uniform distribution is a constant inside the range, and outside the range the density is zero. So the density is simply a horizontal line at 1 / (b − a), each value is equally likely, and the total area under the curve adds up to one. Let's take a look at a Python example to show the uniform distribution, a simple 
example here. I'm going to take a lower bound of 2 and an upper bound of 6 (they could be any numbers), and say I want 1,000 points between them. We generate those numbers using NumPy's uniform function, which takes the range and generates the 1,000 points. If you plot a histogram of those points, you can see it gives an almost constant, rectangular shape. The PDF is a straight line, because it's one divided by a single scalar value, here 1 / (6 − 2) = 0.25. That's how we compute the PDF of the uniform distribution, and that was a quick demo of the uniform distribution and its PDF.

To summarize, the uniform distribution has a rectangular PDF, because every value is equally likely. Some examples of the uniform distribution: choosing a random number between two values, where every value between them is equally likely; lottery number selection, where the winning numbers are drawn so that every number is equally likely; drawing a random card from a deck, where every card is equally likely; selecting a random point in a square, where every point is equally likely; a random time of day; quality control in manufacturing, when you pick any product at random for quality control checks; and randomized experiments, where treatments are assigned at random in an experimental design.

Now let's talk about the normal distribution. The normal distribution is characterized by two parameters, the mean and the standard deviation. When the mean is zero and the standard deviation is one, it's a special normal distribution called the standard normal distribution. The probability density function is computed with the formula we talked about a while ago, also known as the Gaussian formula. It takes the mean, the standard deviation, and the range of values, and returns the PDF for each value; when you plot it, you get a bell-shaped curve. A true normal distribution is always symmetric; real data that is only approximately normal may be slightly asymmetric or skewed, depending on the data column.

A very common characteristic of the normal distribution is the 68-95-99.7 rule. In normally distributed data, about 68% of the data falls within plus or minus one standard deviation of the mean, about 95% within plus or minus two standard deviations, and about 99.7%, almost all of the data, within plus or minus three. For example, for the standard normal distribution, 68% of the data falls between −1 and +1, 95% between −2 and +2, and 99.7% between −3 and +3. This is a very important rule, and we're going to make use of it in many statistical experiments, including hypothesis testing. It is often called the empirical rule, or the three-sigma rule, and in practice it's used as a guideline. Strictly, though, for a true normal distribution these percentages follow mathematically from the distribution itself (more precisely 68.27%, 95.45%, and 99.73%); they are approximations only when applied to real data. So, in summary: the bulk of the data lies within plus or minus one standard deviation, approximately 95% within plus or minus two, and 99.7%, almost 100%, within plus or minus three. Data outside that 99.7% range would qualify to be 
considered an outlier, or extreme value.

Where is the normal distribution applied? In hypothesis testing, which we'll talk about later. In machine learning, we use it when scaling data, which we'll cover in our data processing topic. In quality control, normal distributions are typically used to monitor production quality. These are just a few; there are many other applications of the normal distribution.

Let's look at Python code that shows the three-sigma rule and how we generate the PDF of a normal distribution. Here's a simple example: I take a mean of zero and a standard deviation of one and generate some normally distributed data, and plotting its histogram gives bars like these. To compute the probability density function, we use norm from SciPy's stats module. It computes the PDF, which is basically the normal formula I showed you earlier on the slide, for every value in the range we just defined, here 100 points. The densities for those 100 points are plotted in red, and you can see the PDF matches the histogram.

Now one more example, to show the relevance of the three-sigma rule. In this case I'm using some widgets. The data series has a mean of 170.18, which is the green line in the plot; let me reduce the size. So the mean is in the center, in green, and this is the PDF. Now if I change the height to, say, 171, you can see the point move along the curve.
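
The uniform demo and the 68-95-99.7 rule described above can be reproduced with a short script. This is a minimal sketch, assuming NumPy's `default_rng` and `scipy.stats`; the bounds 2 and 6, the 1,000 uniform points, the mean of 0, the standard deviation of 1, and the 100 evaluation points come from the demos, while the seed, the 100,000 normal draws, the grid range of −4 to 4, and the evaluation points 2.5, 4.0, 5.5 are illustrative choices. Plotting is left out.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # fixed seed (an assumption) so runs are reproducible

# Uniform demo: lower bound 2, upper bound 6, 1,000 points.
low, high = 2, 6
uniform_samples = rng.uniform(low, high, size=1000)
# SciPy parameterizes the uniform distribution as loc=low, scale=high - low,
# so the density is 1 / (6 - 2) = 0.25 everywhere inside [2, 6].
uniform_density = stats.uniform(loc=low, scale=high - low).pdf([2.5, 4.0, 5.5])

# Normal demo: mean 0, standard deviation 1.
mu, sigma = 0.0, 1.0
normal_samples = rng.normal(mu, sigma, size=100_000)
x_grid = np.linspace(-4, 4, 100)  # 100 evaluation points for the PDF curve
normal_density = stats.norm.pdf(x_grid, loc=mu, scale=sigma)

# 68-95-99.7 rule: exact normal coverage vs. the fraction observed in the sample.
coverage = {}
for k in (1, 2, 3):
    theoretical = stats.norm.cdf(k) - stats.norm.cdf(-k)
    observed = np.mean(np.abs(normal_samples - mu) <= k * sigma)
    coverage[k] = (theoretical, observed)
    print(f"within ±{k} sd: theory {theoretical:.4f}, sample {observed:.4f}")

print("uniform density at 2.5, 4.0, 5.5:", uniform_density)
print("peak of the normal density on the grid:", normal_density.max())
```

The theoretical column is computed from the normal CDF, so it gives the exact coverages (about 0.6827, 0.9545, 0.9973); the sample column shows how closely simulated data follows them.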
The data here is the height of adults in a particular city, and the average was around 170 cm. If I move the point to a height of 178 cm, the plot tells us it's much less likely that someone has this height: the point is out in the tail of the PDF. The PDF is highest around the center, the mean, because of the shape of the normal curve, and as you keep going to the right the PDF values become very, very small. Likewise, if you go to the left, say someone is 156 cm, that is also very unlikely for an adult in this data. That shows that for a normal distribution, the probability density of the variable is highest around the mean. Let's set it back to 170, and you'll see that the PDF is highest toward the center. So if you pick someone at random, a height close to 170 cm is the most likely outcome. That's what it means, and this is an important concept that we're going to use in hypothesis testing as well. So that was a quick demo of the normal distribution, the PDF, and the 3 sigma rule.

What are some examples of the normal distribution? Quite a few. The heights of a population generally follow a normal distribution pattern. IQ scores of a population or a sample also roughly follow a normal distribution, as do test scores, for example SAT or ACT scores for a class or a college. Some more examples: errors in measurement are typically normally distributed. Response times, in psychology experiments or in service industries, are also often approximated by a normal distribution. Financial market returns are often modeled as roughly normal too, although real returns have heavier tails than a normal distribution. Other examples include blood
pressure measurements, measurement noise, and the weights of manufactured products; these all show the characteristics of a normal distribution. So what are those characteristics? Symmetry about the mean is the key one, and it gives the bell shape; real data may be only approximately bell-shaped or slightly skewed, depending on the data you have. The mean, median, and mode of normally distributed data are equal, because they all sit at the center. For the standard deviation, larger values of sigma give a wider distribution, while smaller values give a narrower one. Another popular characteristic is the 68-95-99.7 rule.

The standard normal distribution is a special case of the normal distribution, and the only difference is that the mean is zero and the standard deviation is one; it is often written as N(0, 1). So a normal distribution can have any mean and standard deviation, while the standard normal distribution has mean zero and standard deviation one. The use case for the standard normal distribution is standardizing the data, which we'll see in data preprocessing. Say we have a column: we subtract the mean from each value and then divide by the standard deviation, so every data point is now expressed as a number of standard deviations from the mean. This is called standardization, or z-score normalization, and we'll discuss it in our data preprocessing techniques. Outlier detection is another application of the standard normal distribution: about 99.7% of the data lies within 3 sigma of the mean, so anything outside that range can possibly be an outlier. Hypothesis testing builds on the same idea that values far out in the tails of the distribution are unlikely.

A note on normality: not all data in the world is normally distributed. The central limit theorem says that the distribution of sample means approaches a normal distribution as the sample size grows; it does not say that the raw data itself should be normal. Examples of non-normally distributed data: income distribution, because not everyone earns the same. There are people who earn quite a lot, while a large mass of
people earn a typical, more common amount. So income is an example of data that is not normally distributed, or skewed data. The age distribution of a population is also typically not normal: there are more people in the younger age groups than in the advanced age groups. Product defects also show non-normal patterns: the number of defects found in a product typically follows another distribution called the Poisson distribution, which is not normal. We'll talk about the Poisson distribution in the advanced statistics topics. Some more examples of non-normally distributed data: web page hits, where a small number of pages get most of the views while most pages get few, which again gives a skewed distribution. Customer transaction amounts: most customers make similar, small transactions, but some make large ones. Social media engagement: some posts go viral and most do not, which is again skewed data. Hospital stay duration: many patients stay for one or two days, but some need much longer hospitalization. Wait times also show right-skewed data, meaning that most of the time people wait only briefly, but sometimes they have to wait much longer.

Now, this is the most important point to keep in mind: should we worry too much about non-normally distributed data? Normality is a common assumption in many statistical methods, but it's also important to know that not all machine learning models require the data to be normally distributed. I've listed some of the considerations here, and we'll discuss them in our machine learning course anyway. First, many machine learning models, like decision trees, random forests, and nearest-neighbor-based methods, are nonparametric, which means they don't depend on estimating parameters such as a mean and standard deviation, and they
don't assume any specific distribution. So for those models we don't need to worry too much about whether the data is normally distributed. If we're using something like linear regression, however, the classical assumptions ask for normally distributed residuals (errors) rather than normally distributed input data, so that's worth checking. Second, consider robust models: we can choose machine learning models that are less sensitive to outliers and to distributional assumptions; examples like support vector machines or neural networks can handle non-normality in the data as well. We'll talk about each of the machine learning models in our machine learning class and see which ones require us to take note of non-normality in the data. Nonetheless, it's a good idea to experiment with, study, and understand the shape and distribution of your data, and then make a good decision about which model will probably work for it. That's pretty much it for this topic. This was a pretty long video; I thought I could wrap it up in 40 or 45 minutes, but it went over an hour. Thank you all for your patience, and we'll see you in the next video.
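As a minimal sketch of the kind of check suggested above (studying the shape of your data before choosing a model), the code below compares a roughly symmetric sample with skewed ones using scipy.stats.skew, the mean-minus-median gap, and scipy.stats.normaltest (D'Agostino and Pearson's test) as one possible normality test. The normal "heights", log-normal "incomes", and Poisson "defect counts" are simulated assumptions, not real data, and describe_shape is a hypothetical helper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)  # fixed seed so the run is reproducible

# Simulated samples (assumptions for illustration, not real data).
samples = {
    "heights (normal)": rng.normal(loc=170, scale=7, size=5_000),
    "incomes (log-normal, right-skewed)": rng.lognormal(mean=10, sigma=0.8, size=5_000),
    "defects per batch (Poisson)": rng.poisson(lam=2, size=5_000),
}

def describe_shape(x, alpha=0.05):
    """Hypothetical helper: skewness, mean vs median, and a normality test."""
    skewness = stats.skew(x)
    # Null hypothesis of normaltest: the sample comes from a normal distribution.
    _, p_value = stats.normaltest(x)
    return {
        "skew": skewness,
        "mean_minus_median": np.mean(x) - np.median(x),  # > 0 hints at a right skew
        "p_value": p_value,
        "looks_normal": p_value >= alpha,
    }

results = {name: describe_shape(x) for name, x in samples.items()}
for name, r in results.items():
    print(f"{name}: skew={r['skew']:.2f}, mean-median={r['mean_minus_median']:.2f}, "
          f"p={r['p_value']:.3g}, looks normal: {r['looks_normal']}")
```

A large p-value does not prove normality; it only means the test found no strong evidence against it, so it is worth looking at a histogram as well. For tree-based models the result matters little, while for linear regression the check is usually applied to the residuals rather than to the raw columns.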