Unveiling the Power of Statistics: Mastering Data Science - Part 02


In the previous part, we talked about population and sample, types of data and more. Today we are going deeper: from mean, median and mode all the way up to covariance and the correlation coefficient.

Mean, Mode, Median👇

The three main measures of central tendency are the mean, the median and the mode.

Mean is known as the simple average.

Mean is denoted by ‘μ’ for the population and x̄ for a sample.

We can find the mean of a data set by adding up all of its values and then dividing by the number of values.

The mean is the most common measure of central tendency. But it has a huge downside. It’s easily affected by outliers.

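For example, compare these two data sets:

| New York | Los Angeles |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
| 7 | 7 |
| 8 | 8 |
| 9 | 9 |
| 11 | 10 |
| 66 | |
| Mean ≈ 11.1 | Mean = 5.5 |

A single outlier (66) pulls the New York mean up to roughly twice the Los Angeles mean, even though the remaining values are almost identical. The mean alone is not enough to draw a definite conclusion.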

The median is the middle number in an ordered data set.

The median is not affected by outliers.

The mode is the value that occurs most often.
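As a minimal sketch of these three measures, the snippet below uses Python's standard `statistics` module on the New York and Los Angeles values from the example. The `ratings` list is a hypothetical data set added only to illustrate the mode, since the example values contain no repeats.

```python
import statistics

# Values taken from the New York / Los Angeles example.
new_york = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 66]
los_angeles = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

ny_mean = statistics.mean(new_york)
ny_median = statistics.median(new_york)
la_mean = statistics.mean(los_angeles)
la_median = statistics.median(los_angeles)

# The outlier (66) moves the New York mean a lot but barely moves its median.
print(f"New York:    mean={ny_mean:.2f}, median={ny_median}")
print(f"Los Angeles: mean={la_mean:.2f}, median={la_median}")

# Hypothetical list with a repeated value, only to illustrate the mode.
ratings = [2, 3, 3, 5, 3, 7]
ratings_mode = statistics.mode(ratings)
print(f"Mode of ratings: {ratings_mode}")
```

The two medians stay close together while the means differ by a factor of about two, which is why the median is often reported alongside the mean when outliers are present.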

Skewness👇

Skewness is a measure of asymmetry.

A common form of the sample skewness formula is:

skewness = [ (1/n) Σ (xᵢ − x̄)³ ] / [ (1/(n − 1)) Σ (xᵢ − x̄)² ]^(3/2)

What is Skewness?👇

Skewness indicates whether the data is concentrated on one side.

If mean > median, there is a positive (right) skew: the tail of the distribution stretches out to the right.

If mean = median = mode, there is no skew: the distribution is symmetric.

If mean < median, there is a negative (left) skew: the tail of the distribution stretches out to the left.

Skewness tells us where the bulk of the data lies.

Measures of asymmetry such as skewness are the link between measures of central tendency and probability theory.
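To make the sign of the skew concrete, here is a small sketch that computes skewness by hand. It assumes one common convention (third central moment divided by the cubed sample standard deviation); other libraries may apply slightly different corrections, so treat the exact values as illustrative. The function name `sample_skewness` is my own.

```python
def sample_skewness(values):
    """Skewness as (1/n) * sum of cubed deviations, divided by the
    sample variance (n - 1 denominator) raised to the power 3/2."""
    n = len(values)
    mean = sum(values) / n
    third_moment = sum((x - mean) ** 3 for x in values) / n
    sample_variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return third_moment / sample_variance ** 1.5

# Values from the New York / Los Angeles example earlier in the post.
new_york = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 66]
los_angeles = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

ny_skew = sample_skewness(new_york)
la_skew = sample_skewness(los_angeles)

print(f"New York skewness:    {ny_skew:.3f}")  # positive: long right tail
print(f"Los Angeles skewness: {la_skew:.3f}")  # symmetric data, so zero
```

The New York data has mean > median and a positive skew, while the evenly spaced Los Angeles data has mean = median and no skew, matching the rules above.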

Variance👇

What is Variance?👇

Variance measures the dispersion of a set of data points around their mean value.

Population variance: σ² = Σ (xᵢ − μ)² / N

Sample variance: s² = Σ (xᵢ − x̄)² / (n − 1)

The closer a data point is to the mean, the smaller its squared difference from the mean.

The further away from the mean it lies, the larger this difference.

Because the differences are squared, dispersion is always a non-negative value.

Squaring amplifies the effect of large differences.
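The sketch below computes both variances by hand, assuming the standard definitions (divide by N for a population, by n − 1 for a sample), and cross-checks them against Python's `statistics` module. The `data` list is a hypothetical example.

```python
import statistics

def population_variance(values):
    """Average squared distance from the mean, dividing by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Same sum of squares, but dividing by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

data = [1, 2, 3, 4, 5]  # hypothetical data set

pop_var = population_variance(data)
samp_var = sample_variance(data)

print(f"Population variance: {pop_var}")
print(f"Sample variance:     {samp_var}")

# The standard library gives the same results.
print(statistics.pvariance(data), statistics.variance(data))
```

The sample variance is always a little larger than the population variance computed from the same numbers, because it divides the same sum of squares by n − 1 instead of N.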

Standard Deviation And Coefficient Of Variation👇

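Variance is a common measure of data dispersion, but it is expressed in squared units, which makes it hard to interpret. Taking its square root gives the standard deviation.

The population standard deviation is σ = √σ².

The sample standard deviation is s = √s².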

What is the Coefficient of Variation(CV)?👇

It is simply the standard deviation relative to the mean.

The sample coefficient of variation is CV = s / x̄.

Standard deviation is the most common measure of variability for a single dataset.

The coefficient of variation is useful when we want to compare the variability of two or more datasets, because it has no unit of measurement.

Standard deviation is the preferred measure of variability, as it is directly interpretable.
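Here is a rough sketch of why the coefficient of variation helps when comparing datasets. It uses a hypothetical set of lengths recorded once in metres and once in centimetres; the helper name `coefficient_of_variation` is my own.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical measurements: the same lengths in two different units.
lengths_m = [1.0, 2.0, 3.0, 4.0, 5.0]
lengths_cm = [x * 100 for x in lengths_m]

sd_m = statistics.stdev(lengths_m)
sd_cm = statistics.stdev(lengths_cm)
cv_m = coefficient_of_variation(lengths_m)
cv_cm = coefficient_of_variation(lengths_cm)

# The standard deviations differ by a factor of 100...
print(f"Standard deviation: {sd_m:.3f} m vs {sd_cm:.3f} cm")
# ...but the coefficient of variation is the same for both.
print(f"Coefficient of variation: {cv_m:.3f} vs {cv_cm:.3f}")
```

The standard deviation depends on the unit of measurement, while the coefficient of variation does not, so only the latter can be compared directly across datasets.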

Covariance👇

Measures that are used when we work with more than one variable -

  1. Covariance

  2. Linear correlation coefficient

What is Covariance?👇

When two variables may be related, the main statistic for measuring how they vary together is called covariance.

Covariance may be -

  1. >0

  2. =0

  3. <0

The population covariance formula is σ_xy = Σ (xᵢ − μ_x)(yᵢ − μ_y) / N.

The sample covariance formula is s_xy = Σ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1).

For example:

| Size (sq ft) | Price ($) |
|---|---|
| 650 | 772,000 |
| 785 | 998,000 |
| 1,200 | 1,200,000 |
| 720 | 800,000 |
| 975 | 895,000 |
| Mean = 866 | Mean = 933,000 |

Sample covariance = 33,491,250
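The covariance figure in this example can be reproduced with a few lines of Python. This is a minimal sketch that assumes the sample covariance (dividing by n − 1); the function name `sample_covariance` is my own.

```python
def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, over n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Size and price values from the example.
sizes = [650, 785, 1200, 720, 975]
prices = [772000, 998000, 1200000, 800000, 895000]

cov = sample_covariance(sizes, prices)
print(f"Sample covariance: {cov:,.0f}")
```

The result is positive, so size and price tend to move together, but the number itself is hard to interpret because its scale depends on the units of both variables.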

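The sign of the covariance gives a sense of direction:

  1. >0, the two variables move together.

  2. <0, the two variables move in opposite directions.

  3. =0, the two variables have no linear relationship. Independent variables have zero covariance, but zero covariance alone does not prove independence.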

Correlation Coefficient👇

Correlation adjusts covariance so that the relationship between the two variables becomes easy and intuitive to interpret.

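The formulas for the correlation coefficient are:

Population: ρ = σ_xy / (σ_x · σ_y)

Sample: r = s_xy / (s_x · s_y)

A correlation of 0 between two variables means that there is no linear relationship between them. It does not by itself prove that they are independent.

Correlation is symmetric with respect to both variables: the correlation of x with y equals the correlation of y with x.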

Correlation does not imply causation.
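As a closing sketch, the snippet below computes the sample correlation coefficient for the size and price data from the covariance example. It assumes the usual definition (sample covariance divided by the product of the two sample standard deviations); the function names are my own.

```python
import statistics

def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, over n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """Covariance rescaled by both standard deviations, so it lies in [-1, 1]."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

# Size and price values from the covariance example.
sizes = [650, 785, 1200, 720, 975]
prices = [772000, 998000, 1200000, 800000, 895000]

r_size_price = sample_correlation(sizes, prices)
r_price_size = sample_correlation(prices, sizes)

print(f"Correlation(size, price): {r_size_price:.2f}")
print(f"Correlation(price, size): {r_price_size:.2f}")  # same value: symmetric
```

Unlike the covariance of about 33 million, the correlation is a unitless number between −1 and 1, which makes the strength of the relationship much easier to read.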

Before we end…

Thank you for taking the time to read my posts and share your thoughts. If you like my blog, please like, comment and share it with your circle, and follow for more. I look forward to continuing this journey with you.

Let’s connect and grow together. I look forward to getting to know you better😉.

Here are my social links:

LinkedIn: linkedin.com/in/ai-naymul

Twitter: twitter.com/ai_naymul

GitHub: github.com/ai-naymul