STATISTICS FOR DATA SCIENCE
1. VARIABLE : It is a placeholder that stores values.
2. RANDOM VARIABLE : It is a variable whose value depends on the outcome of a random phenomenon or experiment.
It is of two types :
A. Numerical variable : A numerical variable is one that may take on any value within a finite or infinite interval (e.g., height, weight, temperature, blood glucose, …)
Numerical variable is further divided into two parts :
A.1. Continuous (floating point) : A continuous variable is one that can take any value in an interval, including decimal values. For example : 5.6, 7.8, 0.001, 846.245
A.2. Discrete (whole number) : A discrete variable takes only countable values, such as the basic counting numbers. For example : 0, 1, 2, 3, 4, 5, 6
B. Categorical Variable : A categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values (e.g. race, sex, age group)
Categorical Variable is further divided into two parts :
B.1. Nominal : A nominal variable is a categorical variable whose possible values have no inherent order (e.g. race, sex).
B.2. Ordinal : An ordinal variable is a categorical variable for which the possible values are ordered (e.g. education level (“high school”, “BS”, “MS”, “PhD”))
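As a quick illustration (a minimal sketch with pandas; the toy columns and values are made up), the variable types above map onto dataframe dtypes, and an ordinal variable can be stored as an ordered categorical:

```python
import pandas as pd

# Hypothetical toy data: numerical and categorical columns.
df = pd.DataFrame({
    "height_cm": [162.5, 178.0, 170.2],          # continuous numerical
    "num_children": [0, 2, 1],                   # discrete numerical
    "blood_type": ["A", "O", "B"],               # nominal categorical
    "education": ["high school", "PhD", "MS"],   # ordinal categorical
})

# Nominal: unordered categories.
df["blood_type"] = pd.Categorical(df["blood_type"])

# Ordinal: ordered categories, so comparisons like < and > make sense.
levels = ["high school", "BS", "MS", "PhD"]
df["education"] = pd.Categorical(df["education"], categories=levels, ordered=True)

print(df.dtypes)
print(df["education"] > "high school")
```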
3. MEASURE OF CENTRAL TENDENCIES :
A. MEAN : It is the sum of a collection of numbers divided by the count of numbers in the collection.
mean = (sum of the values in the collection) / (number of values)
B. MEDIAN : The “middle” of a sorted list of numbers (when there are two middle numbers, we average them).
C. MODE : The mode of a set of data values is the value that appears most often.
NOTE : the mean, median, and mode help in handling missing values (e.g. by imputing a missing entry with one of them).
4. RANGE : The Range is the difference between the lowest and highest values. Example: In {4, 6, 9, 3, 7} the lowest value is 3, and the highest is 9. So the range is 9 − 3 = 6.
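A minimal sketch in Python (standard library plus pandas; the sample data is made up) showing these measures and the note about missing values:

```python
import statistics as st
import pandas as pd

data = [4, 6, 9, 3, 7, 6]

print(st.mean(data))          # 5.833... : sum / count
print(st.median(data))        # 6.0 : middle of the sorted list (two middles averaged)
print(st.mode(data))          # 6 : most frequent value
print(max(data) - min(data))  # 6 : range = highest - lowest

# One common way to handle a missing value: impute with the mean.
s = pd.Series([4.0, None, 9.0, 3.0, 7.0])
print(s.fillna(s.mean()))
```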
5. POPULATION, SAMPLE, POPULATION MEAN, SAMPLE MEAN :
POPULATION : A population is the complete set of similar items or events that is of interest for some question or experiment.
SAMPLE : A sample is a small collection of items drawn from the population.
Every dataset that we get to train an ML model on is a sample of data.
Population vs sample use case : an exit poll surveys a sample of voters to estimate how the whole population voted in an election.
POPULATION MEAN : The population mean is the average of a group characteristic over the entire population (denoted μ).
SAMPLE MEAN : A sample mean refers to the average of the sample data (denoted x̄); it is used as an estimate of the population mean.
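A small sketch of the distinction (a synthetic population of heights and a fixed random seed are assumptions for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: heights of 1,000,000 people.
population = rng.normal(170, 10, size=1_000_000)
population_mean = population.mean()              # mu

# A sample of 100 people drawn from that population.
sample = rng.choice(population, size=100, replace=False)
sample_mean = sample.mean()                      # x-bar, an estimate of mu

print(population_mean, sample_mean)
```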
6. VARIANCE :
VARIANCE : It is the expected value of the squared deviation of a random variable from its mean. Informally, it measures how far a set of numbers is spread out from their average value.
variance = (sum of (x − mean)²) / N for a population (divide by n − 1 instead for a sample)

7. STANDARD DEVIATION AND MEASURE OF DISPERSION :
Standard deviation (SD) is the most commonly used measure of dispersion. It is a measure of the spread of data about the mean: the SD is the square root of the sum of squared deviations from the mean divided by the number of observations, i.e. the square root of the variance. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.
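A minimal sketch of both variance and SD with numpy (the data values are made up; ddof=1 switches to the sample versions):

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population variance and SD (divide by N).
print(np.var(data))          # 4.0
print(np.std(data))          # 2.0

# Sample variance and SD (divide by n - 1, ddof=1).
print(np.var(data, ddof=1))  # ~4.571
print(np.std(data, ddof=1))  # ~2.138
```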

8. NORMAL / GAUSSIAN DISTRIBUTION :
Normal distribution, also called the Gaussian distribution, is a probability distribution that is symmetric about the mean, meaning that data near the mean are more frequent in occurrence than data far from the mean. In graph form, a normal distribution appears as a bell curve. A Gaussian distribution is converted to the standard normal distribution (mean = 0 and standard deviation = 1) using the z-score: z = (x − mean) / standard deviation.

9. STANDARD NORMAL DISTRIBUTION :
The standard normal distribution is a normal distribution with a mean of zero and standard deviation of 1.
Empirical rule (68–95–99.7) :
about 68.3% of values lie within 1 standard deviation of the mean
about 95.4% lie within 2 standard deviations
about 99.7% lie within 3 standard deviations
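A quick check of these percentages (a minimal sketch using scipy.stats; the standard normal is assumed):

```python
from scipy.stats import norm

# Fraction of a normal distribution within k standard deviations of the mean.
for k in (1, 2, 3):
    frac = norm.cdf(k) - norm.cdf(-k)
    print(k, round(frac * 100, 1))  # 68.3, 95.4, 99.7
```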
10. Z-SCORE :
The value of the z-score tells you how many standard deviations you are away from the mean. If a z-score is equal to 0, it lies exactly on the mean. A positive z-score indicates the raw score is above the mean; for instance, a z-score of +1 is 1 standard deviation above the mean.
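A minimal sketch of standardization in numpy (the data values are made up):

```python
import numpy as np

data = np.array([60.0, 70.0, 80.0, 90.0, 100.0])

# Standardize: subtract the mean, divide by the standard deviation.
z = (data - data.mean()) / data.std()
print(z)                  # [-1.414 -0.707  0.     0.707  1.414]
print(z.mean(), z.std())  # ~0.0 and 1.0
```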
11. PROBABILITY DENSITY FUNCTION :
A probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample in the sample space can be interpreted as the relative likelihood that the value of the random variable would be close to that sample.

12. CUMULATIVE DISTRIBUTION FUNCTION :
The cumulative distribution function (CDF) of a real-valued random variable X, evaluated at x, is the probability that X will take a value less than or equal to x : F(x) = P(X ≤ x).
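A small sketch of both functions with scipy.stats, using the standard normal for concreteness:

```python
from scipy.stats import norm

# Standard normal PDF and CDF at a few points.
for x in (-1.0, 0.0, 1.0):
    print(x, norm.pdf(x), norm.cdf(x))

# The CDF at 0 is 0.5: half the probability mass lies below the mean.
```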
13. HYPOTHESIS TESTING :

14. KERNEL DENSITY ESTIMATION (KDE) :
Kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable.
Kernel density estimates are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel.
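A minimal sketch with scipy.stats.gaussian_kde (the sample and evaluation points are made up):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
sample = rng.normal(0, 1, size=500)

kde = gaussian_kde(sample)   # Gaussian kernel, bandwidth chosen automatically
xs = np.linspace(-4, 4, 9)
print(kde(xs))               # estimated density at each point in xs
```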

15. CENTRAL LIMIT THEOREM :
The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed. The central limit theorem tells us that no matter what the distribution of the population is, the shape of the sampling distribution will approach normality as the sample size (N) increases.
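A small simulation sketch (an exponential population and the sample size N = 50 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Population: exponential (heavily skewed, decidedly non-normal).
population = rng.exponential(scale=1.0, size=100_000)

# Distribution of means of many samples of size N.
N = 50
sample_means = np.array([
    rng.choice(population, size=N, replace=True).mean()
    for _ in range(2000)
])

# The sample means cluster near the population mean and look normal.
print(sample_means.mean())  # ~1.0
print(sample_means.std())   # ~ sigma / sqrt(N) = 1 / sqrt(50) ~ 0.141
```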

16. SKEWNESS :
Skewness is a measure of the asymmetry of a probability distribution about its mean. A distribution with a longer right tail has positive skew, one with a longer left tail has negative skew, and a symmetric distribution (such as the normal) has zero skewness.
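A minimal sketch with scipy.stats.skew (the generated samples are arbitrary):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)
print(skew(rng.normal(size=10_000)))       # ~0: symmetric
print(skew(rng.exponential(size=10_000)))  # ~2: long right tail, positive skew
```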
17. PEARSON CORRELATION COEFFICIENT :
Pearson’s correlation coefficient (r) is a measure of the strength of the linear relationship between two variables. The Pearson correlation coefficient helps in feature selection.
The Pearson correlation coefficient lies between −1 and +1.
The Pearson correlation coefficient tells both magnitude and direction.
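A minimal sketch with scipy.stats.pearsonr (the x and y arrays are made up):

```python
import numpy as np
from scipy.stats import pearsonr

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0                  # perfect positive linear relationship

r, p_value = pearsonr(x, y)
print(r)                           # 1.0

r_neg, _ = pearsonr(x, -y)
print(r_neg)                       # -1.0: same magnitude, opposite direction
```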

18. SPEARMAN RANK CORRELATION :
It assesses how well the relationship between two variables can be described using a monotonic function (a function between ordered sets that preserves or reverses the given order).
Spearman’s rank correlation coefficient tells magnitude and direction even for non-linear data and data with outliers.
Spearman’s coefficient is the Pearson correlation computed on the ranks of the values; for n distinct ranks it reduces to ρ = 1 − (6 Σ d²) / (n(n² − 1)), where d is the difference between the two ranks of each observation. When there is no outlier, Spearman and Pearson give much the same result; when outliers are present, Spearman gives the better (more robust) result. Like Pearson’s r, it runs from −1 (perfectly decreasing monotonic relationship) to +1 (perfectly increasing).
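A sketch contrasting the two coefficients (the cubic relationship and the outlier value are arbitrary):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1.0, 11.0)
y = x ** 3                          # monotonic but non-linear

print(pearsonr(x, y)[0])            # < 1: linearity is violated
print(spearmanr(x, y)[0])           # 1.0: the ranks increase together

# One extreme outlier drags Pearson around far more than Spearman.
y_out = y.copy()
y_out[-1] = 1e6
print(pearsonr(x, y_out)[0], spearmanr(x, y_out)[0])
```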
19. Q–Q PLOT :
A Q–Q (quantile–quantile) plot is a probability plot, a graphical method for comparing two probability distributions by plotting their quantiles against one another. A Q–Q plot is used to compare the shapes of distributions, providing a graphical view of how properties such as location, scale, and skewness are similar or different in the two distributions.
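A minimal sketch with scipy.stats.probplot (the sample size and parameters are made up); it returns the quantile pairs and the fitted line rather than drawing anything by itself:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
sample = rng.normal(10, 2, size=200)

# Quantiles of the sample against quantiles of a standard normal.
# If the sample is normal, the points fall on a straight line.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(slope, intercept, r)  # slope ~ 2 (scale), intercept ~ 10 (location), r ~ 1
```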

20. CHEBYSHEV’S INEQUALITY :
Chebyshev’s inequality guarantees that, for a wide class of probability distributions, no more than a certain fraction of values can be more than a certain distance from the mean.
In particular, no more than 1/k² of the distribution’s values can be more than k standard deviations from the mean (or equivalently, at least 1 − 1/k² of the distribution’s values are within k standard deviations of the mean)
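A quick empirical check (the exponential distribution here is an arbitrary choice; any distribution obeys the bound):

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.exponential(scale=1.0, size=100_000)

mu, sigma = data.mean(), data.std()
for k in (2, 3):
    observed = np.mean(np.abs(data - mu) > k * sigma)
    bound = 1 / k**2
    print(k, observed, "<=", bound)  # the observed fraction never exceeds 1/k^2
```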

21. BINOMIAL DISTRIBUTION :
A binomial distribution can be thought of as simply the probability of a SUCCESS or FAILURE outcome in an experiment or survey that is repeated multiple times. The binomial is a type of distribution that has two possible outcomes (the prefix “bi” means two, or twice). For instance, a coin toss has only two possible outcomes, heads or tails, and taking an exam could have two possible outcomes: pass or fail. Binomial distributions must also meet the following three criteria:
A. The number of observations or trials is fixed.
B. Every observation or trial is independent.
C. The probability of success is exactly the same from one trial to another.
Real Life Examples :
If a new drug is introduced to cure a disease, it either cures the disease (it’s a success) or it doesn’t cure the disease (it’s a failure). If you buy a lottery ticket, you’re either going to win money, or you aren’t. Essentially, anything you can think of that can only be a success or a failure can be represented by a binomial distribution.
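A minimal sketch with scipy.stats.binom (10 fair coin tosses are assumed):

```python
from scipy.stats import binom

n, p = 10, 0.5             # 10 coin tosses, fair coin

print(binom.pmf(5, n, p))  # P(exactly 5 heads) ~ 0.246
print(binom.cdf(3, n, p))  # P(at most 3 heads)  ~ 0.172
print(binom.mean(n, p))    # expected number of heads = n * p = 5.0
```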


23. LOG-NORMAL DISTRIBUTION :
A log-normal distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution.
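A small sketch (the parameters are arbitrary): taking the log of a log-normal sample removes the skew:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(6)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # log-normal sample

print(skew(x))          # strongly right-skewed
print(skew(np.log(x)))  # ~0: taking logs recovers a normal shape
```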

24. POWER LAW :
The power law (also called the scaling law) states that a relative change in one quantity results in a proportional relative change in another. The simplest example of the law in action is a square: if you double the length of a side (say, from 2 to 4 inches), the area will quadruple (from 4 to 16 square inches).
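A tiny sketch of the square example as a power law y = c · x^k with c = 1 and k = 2:

```python
# Power-law relationship: y = c * x**k. For the area of a square, c = 1 and
# k = 2, so doubling x always multiplies y by 2**2 = 4, whatever the size.
def area(side):
    return side ** 2

for side in (2.0, 10.0):
    print(area(2 * side) / area(side))  # 4.0 in both cases
```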
25. BOX-COX TRANSFORM :
A Box-Cox transformation is a transformation of non-normal dependent variables into a normal shape. Normality is an important assumption for many statistical methods; if your data isn’t normal, applying a Box-Cox transformation means that you are able to run a broader number of tests.
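A minimal sketch with scipy.stats.boxcox (the skewed input data is made up; note that Box-Cox requires positive values):

```python
import numpy as np
from scipy.stats import boxcox, skew

rng = np.random.default_rng(7)
x = rng.exponential(scale=2.0, size=5_000)  # positive, right-skewed data

transformed, lam = boxcox(x)                # lambda estimated by maximum likelihood
print(skew(x), skew(transformed), lam)      # skewness drops toward 0
```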
26. POISSON DISTRIBUTION :
The Poisson distribution is the discrete probability distribution of the number of events occurring in a given time interval, given the average number of times the event occurs over that interval.
Example : A certain fast-food restaurant gets an average of 3 visitors to the drive-through per minute. This is only an average, however; the actual count can vary.
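A small sketch with scipy.stats.poisson using the drive-through example (λ = 3 visitors per minute):

```python
from scipy.stats import poisson

lam = 3                     # average of 3 visitors per minute

print(poisson.pmf(0, lam))  # P(no visitors in a minute) ~ 0.050
print(poisson.pmf(3, lam))  # P(exactly 3 visitors)      ~ 0.224
print(poisson.cdf(5, lam))  # P(at most 5 visitors)      ~ 0.916
```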

27. NON-GAUSSIAN DISTRIBUTION :
Although the normal distribution takes center stage in statistics, many processes follow a non-normal distribution. This can be because the data naturally follows a particular type of non-normal distribution (for instance, bacteria growth naturally follows an exponential distribution). In other cases, your data collection methods or other procedures may be at fault.
Dealing with Non-Normal Distributions
You have several options for handling non-normal data. Many tests, including the one-sample Z-test, the t-test, and ANOVA, assume normality. You may still be able to run these tests if your sample size is large enough (usually over 20 items). You can also choose to transform the data with a function, forcing it to fit a normal model. However, if you have a tiny sample, a sample that is skewed, or one that naturally fits another distribution type, you may want to run a non-parametric test. A non-parametric test is one that doesn’t assume the data fits a specific distribution type. Non-parametric tests include the Wilcoxon signed-rank test, the Mann-Whitney U test, and the Kruskal-Wallis test.
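For example, a minimal sketch of one such non-parametric test, the Mann-Whitney U test, with scipy.stats (the two skewed samples are made up):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(8)

# Two skewed (exponential) samples: the test makes no normality assumption.
a = rng.exponential(scale=1.0, size=40)
b = rng.exponential(scale=1.5, size=40)

stat, p = mannwhitneyu(a, b)
print(stat, p)  # a small p-value suggests the two samples differ in location
```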