In the previous part, we talked about inferential statistics, distributions, the normal distribution, sampling distributions, the Central Limit Theorem (CLT), the standard error, and estimators and estimates. In this blog post, we are going to learn about confidence intervals, the case where the population variance is known, the z-score, the Student’s T-distribution, and finally the t-score.

# Confidence Interval**👇**

Imagine visiting 5% of the restaurants in London and saying that the average meal is worth 22.50 pounds. You may be close, but chances are that the true value isn’t 22.50 but somewhere around it. It’s much safer to say that the average meal in London is somewhere between 20 and 25 pounds. Isn’t it? In this way, you have created a confidence interval around your point estimate of 22.50.

A confidence interval is a much more accurate representation of reality.

The level of confidence is denoted by $(1 - \alpha)$, and it is called the confidence level of the interval.

Alpha ($\alpha$) is a value between 0 and 1.

For example, if we want to be 95% confident that the parameter is inside the interval, alpha is 5%.

If we want a higher confidence level of, say, 99%, alpha is 1%.

The formula for all confidence intervals is:

$$\text{point estimate} \pm \text{critical value} \times \text{standard error}$$

## Confidence Interval: Population Variance Known; Z-score**👇**

**What is a Confidence Interval?👇**

A confidence interval is the range within which you expect the population parameter to be.

And its estimation is based on the data we have in our sample.

There are two main situations when we calculate a confidence interval for a population parameter:

- the population variance is known
- the population variance is unknown

Depending on which situation we are in, we use a different calculation method.

The whole field of statistics exists because we rarely have population data. Even if we do have data on the whole population, we may not be able to analyze it: there may be so much of it that it doesn’t make sense to use it all at once.

## Confidence interval for a population mean with a known variance**👇**

An important assumption in this calculation is that the population is normally distributed. Even if it is not, you should use a large sample and let the central limit theorem do the normalization magic for you.

| Dataset |
|---|
| 117313 |
| 104002 |
| 113038 |
| 101936 |
| 84560 |
| 113136 |
| 80740 |
| 100536 |
| 105052 |
| 87201 |
| 91986 |
| 94868 |
| 90745 |
| 102848 |
| 85927 |
| 112276 |
| 108637 |
| 96818 |
| 92307 |
| 114564 |

Sample Mean = 100200

Population std = 15000

Standard error = 2739
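As a quick check on the standard error above, here is the arithmetic written out. The sample size $n = 30$ is an assumption: the post does not state it, and it is simply the value that reproduces the stated 2739 from the population standard deviation of 15000.

$$\text{Standard error} = \frac{\sigma}{\sqrt{n}} = \frac{15000}{\sqrt{30}} \approx 2739$$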

Note: The sample mean is the point estimate in this case.

Common confidence levels are 90%, 95%, and 99%, with respective alphas of 10%, 5%, and 1%.
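To make the link between confidence level, alpha, and the z critical value concrete, here is a minimal sketch using only Python's standard library. The function name `alpha_and_critical_z` is made up for this illustration, and the sketch assumes a two-sided interval, so alpha is split evenly between the two tails.

```python
from statistics import NormalDist

def alpha_and_critical_z(confidence_level):
    """Return (alpha, z) for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence_level
    # Two-sided interval: alpha is split evenly between the two tails.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return alpha, z

for level in (0.90, 0.95, 0.99):
    alpha, z = alpha_and_critical_z(level)
    print(f"confidence = {level:.0%}, alpha = {alpha:.0%}, z = {z:.3f}")
```

The z values that come out are the familiar ones from a z-table: roughly 1.645, 1.960, and 2.576.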

A 95% confidence interval means that if we repeated the sampling many times and built an interval each time, the true population parameter would fall inside the interval in about 95% of the cases.
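One way to see what "95% of the cases" means is a small simulation. The sketch below is only an illustration under assumed values: the true mean of 100,000 and the sample size of 30 are hypothetical, and the population is simulated as normal with a standard deviation of 15,000. It draws many samples, builds a z-based interval around each sample mean, and counts how often the true mean lands inside.

```python
import random
from statistics import NormalDist, mean

def coverage(true_mean, sigma, n, confidence_level, trials, seed=0):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / n ** 0.5  # half-width of every interval
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / trials

# Hypothetical population: true mean 100,000 and sigma 15,000.
result = coverage(true_mean=100_000, sigma=15_000, n=30,
                  confidence_level=0.95, trials=2_000)
print(f"share of intervals containing the true mean: {result:.3f}")
```

The printed share should sit close to 0.95, and it will wobble a little from seed to seed.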

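With a known population variance, the interval is $\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$, where $\bar{x}$ is the sample mean, $\sigma$ is the population standard deviation, and $n$ is the sample size. A commonly used term for the z is ‘critical value’.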

From the dataset above, we are 95% confident that the average data scientist salary lies in the interval [94833, 105568].
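The interval above can be reproduced from the stated sample mean and standard error. This is a minimal sketch with a helper name (`z_interval`) chosen just for this example. It assumes a two-sided interval and takes the standard error of 2739 as given, so the result can differ from the quoted bounds by about one unit because of rounding.

```python
from statistics import NormalDist

def z_interval(point_estimate, standard_error, confidence_level=0.95):
    """Two-sided confidence interval when the population variance is known."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

# Sample mean and standard error as stated in the post.
low, high = z_interval(100_200, 2_739)
print(f"[{low:.0f}, {high:.0f}]")
```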

There is a tradeoff between the level of confidence we choose and the precision of the estimate: the higher the confidence level, the broader the interval we obtain.

The opposite is also true: a narrower confidence interval comes with a lower level of confidence.

If we are trying to estimate the population mean and we pick a larger interval, we increase our chances of having an interval that includes the mean.
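The tradeoff can be seen by computing the interval width at several confidence levels. The sketch below reuses the standard error of 2739 from the salary example. The function name `interval_width` is just for illustration.

```python
from statistics import NormalDist

standard_error = 2_739  # stated for the salary example

def interval_width(confidence_level, standard_error):
    """Total width of a two-sided z-based confidence interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return 2 * z * standard_error

widths = {level: interval_width(level, standard_error)
          for level in (0.90, 0.95, 0.99)}
for level, width in widths.items():
    print(f"{level:.0%} confidence -> width of about {width:.0f}")
```

Each step up in confidence buys a wider, less precise interval for the same data.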

**Confidence interval clarification👇**

The sample mean is located in the middle of the graph.

Now, if we know that a variable is normally distributed, we are making the statement that most of the observations lie around the mean and the rest are far away from it.

In the graph above, a 95% confidence interval would imply we are 95% confident the true population mean falls within this interval.

When our confidence is lower, the confidence interval itself is smaller ($1 - \alpha$ is lower, the CI is narrower).

When our confidence is higher, the confidence interval itself is larger.

95% is the accepted norm, as we don’t compromise on accuracy too much but still get a relatively narrow interval.

**Student’s T-Distribution👇**

Story of the Student’s T-Distribution:

William Gosset was an English statistician who worked for the Guinness brewery. He developed methods for selecting the best-yielding varieties of barley, an important ingredient in beer. Gosset found big samples tedious, so he was trying to develop a way to work with small samples and still come up with meaningful predictions. He was a curious and productive researcher and published several papers that are still relevant today. However, due to Guinness company policy, he was not allowed to sign the papers with his own name, so all of his work was published under the pen name “Student”. Later on, a friend of his, the famous statistician Ronald Fisher, built on Gosset’s findings and introduced the t-statistic, and the name that has stuck with the corresponding distribution to this day is Student’s T.

The student’s T distribution is one of the most significant breakthroughs in statistics.

Student’s T distribution allows inference from small samples with an unknown population variance. This setting applies to a big part of the statistical problems we face today and is an important part of this course.

Visually, the Student’s T distribution looks like the Normal Distribution but with fatter tails.

Fatter tails allow for a higher dispersion of variables, and there is more uncertainty.

In the same way that the z-statistic is related to the standard Normal Distribution, the t-statistic is related to the Student’s T distribution.

The formula that allows us to calculate it is:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $s$ is the sample standard deviation, and $n$ is the sample size. The statistic has $n - 1$ degrees of freedom.

After 30 degrees of freedom, the t-statistic table becomes almost the same as the z-statistic table.

The degrees of freedom depend on the sample size (a sample of $n$ observations has $n - 1$ degrees of freedom), so in essence, the bigger the sample, the closer we get to the actual numbers.

A common rule of thumb is that for a sample containing more than 50 observations, we use the z-table instead of the t-table.
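To see how the t critical values approach the z critical value as the degrees of freedom grow, here is a rough standard-library sketch. It is not how a statistics package does it: it integrates the t density numerically with Simpson's rule and finds the critical value by bisection. The function names are made up for this example, and the upper search bound of 100 is an assumption that is only sufficient for the cases shown here.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0, integrating the t density with Simpson's rule."""
    # Normalising constant of the density, computed on the log scale.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_critical(df, confidence_level=0.95):
    """Two-sided t critical value, found by bisection on the CDF."""
    target = 1 - (1 - confidence_level) / 2
    low, high = 0.0, 100.0  # assumed bracket, wide enough for these cases
    for _ in range(40):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

z = NormalDist().inv_cdf(0.975)
for df in (1, 5, 10, 30, 50, 120):
    t = t_critical(df)
    print(f"df = {df:>3}: t = {t:.3f}  (z = {z:.3f}, difference = {t - z:.3f})")
```

The difference shrinks steadily as the degrees of freedom grow, which is the idea behind switching to the z-table for larger samples. In practice, a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used instead of hand-rolled integration.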

**Confidence Intervals: Population Variances Unknown: T-Score👇**

Confidence intervals based on small samples from normally distributed populations are calculated with the t-statistic.

Ex:

| Dataset |
|---|
| 78000 |
| 90000 |
| 75000 |
| 117000 |
| 105000 |
| 96000 |
| 89500 |
| 102300 |
| 80000 |

Sample Mean = 92533

Sample Standard deviation = 13932

Standard error = s/√n
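The summary numbers for this small dataset can be checked with Python's `statistics` module. This is a minimal sketch. The variable names are chosen for this example, and `stdev` is the sample standard deviation (it divides by n - 1), which is the one that belongs with the t-statistic.

```python
from statistics import mean, stdev

data = [78000, 90000, 75000, 117000, 105000, 96000, 89500, 102300, 80000]

n = len(data)
sample_mean = mean(data)
sample_std = stdev(data)                 # sample standard deviation (n - 1)
standard_error = sample_std / n ** 0.5   # s / sqrt(n)

print(f"n = {n}, mean = {sample_mean:.0f}, "
      f"s = {sample_std:.0f}, SE = {standard_error:.0f}")
```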

Population variance is unknown. The sample size is small => Student’s T distribution.

The formulas for known and unknown variance are:

$$\text{Known variance:}\quad \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

$$\text{Unknown variance:}\quad \bar{x} \pm t_{n-1,\,\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

There are two key differences: first, instead of the z-statistic we have the t-statistic, and second, instead of the population standard deviation we have the sample standard deviation.

The logic behind constructing confidence intervals is the same in both cases: we use the statistic at hand together with the corresponding standard deviation.

When the population variance is known, the population standard deviation goes with the z-statistic.

When the population variance is unknown, the sample standard deviation goes with the t-statistic.

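From the dataset above, taking a 95% confidence level as in the earlier example, the t critical value for 8 degrees of freedom is about 2.306 and the standard error is 13932 / √9 = 4644, so the interval is roughly 92533 ± 2.306 × 4644, or about [81824, 103242].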
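Here is a small sketch of the t-based interval for this dataset, next to the interval a z critical value would give for the same standard error. Two things are assumptions rather than statements from the post: the 95% confidence level (carried over from the earlier example) and the t critical value of 2.306, which is the usual t-table entry for 8 degrees of freedom at 95%, two-sided.

```python
from statistics import NormalDist

sample_mean = 92_533   # stated in the post
sample_std = 13_932    # stated in the post
n = 9                  # number of values in the dataset
t_critical = 2.306     # assumption: t-table value, 8 df, 95% two-sided
z_critical = NormalDist().inv_cdf(0.975)

standard_error = sample_std / n ** 0.5

def interval(center, critical_value, standard_error):
    """Interval of the form center +/- critical_value * standard_error."""
    margin = critical_value * standard_error
    return center - margin, center + margin

t_low, t_high = interval(sample_mean, t_critical, standard_error)
z_low, z_high = interval(sample_mean, z_critical, standard_error)

print(f"t-based interval: [{t_low:.0f}, {t_high:.0f}]")
print(f"z-based interval: [{z_low:.0f}, {z_high:.0f}] (for comparison only)")
```

The t-based interval comes out wider than the z-based one, which is the extra uncertainty from not knowing the population variance.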

When we know the population variance, we get a narrower confidence interval. When we do not know the population variance, there is higher uncertainty, which is reflected by wider boundaries for our interval.

When we do not know the population variance, we can still make predictions but they will be less accurate.

Here, two effects contribute to the width of the interval:

- a smaller sample size
- an unknown population variance

The proper statistic for estimating the confidence interval when the population variance is unknown is the t-statistic, not the z-statistic.

**Before we end…**

Thank you for taking the time to read my posts and share your thoughts. If you like my blog, please like, comment, share it with your circle, and follow for more. I look forward to continuing this journey with you.

Let’s connect and grow together. I look forward to getting to know you better😉.

Here are my social links below-

**LinkedIn:** **linkedin.com/in/ai-naymul**

**Twitter:** **twitter.com/ai_naymul**

**GitHub:** **github.com/ai-naymul**