
Imagine this: You're standing at the entrance of a dense forest, armed with a map that promises to guide you through the uncharted territory of data. The air is buzzing with excitement as you set foot on a winding trail named "Unveiling the Power of Statistics: Mastering Data Science." With every step, you're about to unlock the secrets that transform raw numbers into captivating stories. Just like a trailblazer armed with a compass, we're here to equip you with the essential tools to navigate the world of data science, starting with the enigmatic realm of statistics. So grab your metaphorical hiking boots and join us on this exhilarating journey as we embark on a quest to unravel the mysteries of statistics and master the art of data science. Get ready to explore, discover, and conquer – because the wilderness of numbers awaits, and the adventure begins now!

**Statistics: Population and Sample👇**

The first step in every statistical analysis is to determine whether the data you are dealing with is a population or a sample.

**What is a Population?👇**

A population is the collection of all items of interest to our study and is usually denoted with an upper-case `N`.

The numbers we obtain when working with a population are called `parameters`.

**What is a Sample?👇**

A sample is a subset of the population and is denoted with a lower-case `n`.

The numbers we obtain when working with a sample are called `statistics`.
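To make the distinction concrete, here is a minimal Python sketch. It assumes a hypothetical population of N = 1,000 exam scores generated at random (not real data): the mean of the whole population is a parameter, while the mean of a random sample of n = 50 scores is a statistic that estimates it.

```python
import random
from statistics import mean

# Hypothetical population: exam scores of N = 1,000 students (randomly generated, illustrative only)
random.seed(42)
population = [random.randint(0, 100) for _ in range(1000)]
N = len(population)

# Parameter: computed from every member of the population
population_mean = mean(population)

# Statistic: computed from a random sample of n = 50 students
n = 50
sample = random.sample(population, n)
sample_mean = mean(sample)

print(f"N = {N}, population mean (parameter) = {population_mean:.2f}")
print(f"n = {n}, sample mean (statistic) = {sample_mean:.2f}")
```

In practice we rarely know the parameter; the point of drawing a sample is to estimate it from the statistic.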

The population is hard to define and hard to observe in real life.

A sample, however, is much easier to gather: it is less time-consuming and less costly. Time and resources are the main reasons we prefer drawing samples to analyzing an entire population, since samples are much easier to observe and to contact.

Since statistical tests are usually based on sample data, samples are key to accurate statistical insight.

A good sample has two key characteristics:

- Randomness
- Representativeness

A sample must be both random and representative for the resulting insights to be accurate.

**What is Randomness?👇**

A random sample is collected when each member of the sample is chosen from the population strictly by chance.

**What is Representativeness?👇**

A representative sample is a subset of the population that accurately reflects the members of the entire population.
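The following sketch shows why both properties matter. It assumes a hypothetical list of 1,000 customer ages stored in sorted order (randomly generated, not real data). Taking the first 100 records is not random, and because the list is sorted it picks only the youngest customers, so that sample is not representative either; a random sample gives every customer the same chance of being chosen.

```python
import random
from statistics import mean

# Hypothetical population: ages of 1,000 customers, stored sorted from youngest to oldest
random.seed(0)
population = sorted(random.randint(18, 80) for _ in range(1000))

n = 100
# Non-random sample: the first 100 records are simply the 100 youngest customers
convenience_sample = population[:n]
# Random sample: each customer is chosen strictly by chance
random_sample = random.sample(population, n)

print(f"population mean age:    {mean(population):.1f}")
print(f"first-100 sample mean:  {mean(convenience_sample):.1f}")
print(f"random sample mean:     {mean(random_sample):.1f}")
```

The first-100 sample systematically underestimates the average age, while the random sample's mean should land close to the population mean.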

**Types of Data👇**

We can classify data in two main ways: by its type and by its level of measurement.

By type, data is divided into two groups:

- Categorical
- Numerical

**What is Categorical Data?👇**

Categorical data describe categories or groups. One example is car brands such as Toyota, BMW and Audi: each value names a different category.

Another example is the answers to yes/no questions.

**What is Numerical data?👇**

Numerical data represents numbers. It is further divided into two subsets:

- Discrete
- Continuous

**What is discrete data?👇**

Discrete data can usually be counted in a finite manner. For example, take the number of children you want to have. Even if you don't know exactly how many, you are sure that the value will be an integer such as 0, 1 or 10.

**What is Continuous data?👇**

Continuous data is infinite and impossible to count: between any two possible values there is always another one.

Examples of discrete data: grades, number of objects, money.

Examples of continuous data: height, area, distance and time.

Time on a clock is discrete, but time in general is continuous.

**Levels of Measurement👇**

Levels of measurement can be split into two groups:

- Qualitative
- Quantitative

Qualitative data is split into two levels:

- Nominal
- Ordinal

**What is Nominal Data?👇**

Nominal variables are like the categories we discussed earlier, such as Toyota, BMW and Audi. They aren't numbers and cannot be ordered.

**What is Ordinal data?👇**

Ordinal data consists of groups and categories that follow a strict order. For example:

You have been asked to rate your lunch, and the options are disgusting, unappetizing, neutral, tasty and delicious. Although we have words rather than numbers, it's obvious that these preferences are ordered from negative to positive. Thus the level of measurement is qualitative, ordinal.
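A small Python sketch of the difference: the lunch scale below is taken from the example, while the survey responses are hypothetical. Ordinal values can be sorted by their position on the scale, whereas nominal values such as car brands can only be compared for equality.

```python
# The lunch ratings from the example, listed from most negative to most positive
LUNCH_SCALE = ["disgusting", "unappetizing", "neutral", "tasty", "delicious"]

# Hypothetical responses from a small survey
responses = ["tasty", "neutral", "delicious", "unappetizing", "tasty"]

# Ordinal data: sort responses by their position on the scale
ordered = sorted(responses, key=LUNCH_SCALE.index)
print(ordered)  # ['unappetizing', 'neutral', 'tasty', 'tasty', 'delicious']

# Order comparisons make sense ("tasty" ranks above "neutral"),
# even though the gap between the two has no numeric size
print(LUNCH_SCALE.index("tasty") > LUNCH_SCALE.index("neutral"))  # True

# Nominal data: car brands have no natural order, so we can only test membership/equality
brands = ["Toyota", "BMW", "Audi"]
print("BMW" in brands)  # True
```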

Quantitative variables are also split into two groups:

- Interval
- Ratio

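Interval and ratio variables are both represented by numbers but have one major difference: a ratio variable has a true zero, while an interval variable doesn't. Most quantities we observe in the real world are ratio variables. For example:

If I have 2 apples and you have 6, you have three times as many as I do. The comparison makes sense because 0 apples means no apples at all.

Other examples are the number of objects, distance and time.

Temperature is the most common example of an interval variable.

0°C and 0°F are not true zeros: they do not mean an absence of temperature, so a statement such as "20°C is twice as hot as 10°C" is not meaningful.

The Kelvin scale, by contrast, has a true zero, so temperature measured in kelvins is a ratio variable.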
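A quick check in Python shows why ratios only make sense on a scale with a true zero. The helper `celsius_to_kelvin` is a hypothetical name for illustration; it uses the standard conversion K = °C + 273.15.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius to kelvins (K = °C + 273.15)."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0

# On the Celsius (interval) scale the numbers suggest "twice as hot"...
celsius_ratio = warm_c / cool_c
# ...but on the Kelvin (ratio) scale, which has a true zero, the real ratio is much smaller
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"Celsius ratio: {celsius_ratio:.2f}")  # 2.00
print(f"Kelvin ratio:  {kelvin_ratio:.3f}")   # 1.035
```

Differences, on the other hand, are meaningful on both scales: a 10-degree gap in Celsius is also a 10 K gap.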

**Categorical Variables- Visualization Technique👇**

Visualizing data is the most intuitive way to interpret it, so it's a valuable skill. It is much easier to visualize data if you know its type and measurement level.

The most common ways to visualize categorical variables are frequency distribution tables, bar charts, pie charts and Pareto diagrams. For example, here is a frequency table for a German car shop:

| Car brand | Frequency |
|---|---|
| Audi | 124 |
| BMW | 98 |
| Mercedes | 113 |
| Total | 335 |

In this case, frequency is the number of units sold.

Bar charts are also known as column charts.


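Relative frequency is the share of the total frequency that belongs to each category, usually expressed as a percentage.

Pie charts show relative frequencies, which is why they are commonly used to represent market share.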
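Here is a minimal sketch that computes the relative frequencies for the car shop table; these percentages are exactly what a pie chart of the shop's sales would display.

```python
# Units sold per brand in the German car shop example
sales = {"Audi": 124, "BMW": 98, "Mercedes": 113}
total = sum(sales.values())  # 335

# Relative frequency = category frequency / total frequency
relative = {brand: count / total for brand, count in sales.items()}

for brand, share in relative.items():
    print(f"{brand}: {sales[brand]} units, {share:.2%}")
# Audi: 124 units, 37.01%
# BMW: 98 units, 29.25%
# Mercedes: 113 units, 33.73%
```

Because every unit sold falls into exactly one category, the relative frequencies always add up to 100%.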

**What is a Pareto Diagram?👇**

A Pareto diagram is a special type of bar chart, where categories are shown in descending order of frequency.

Frequency is the number of occurrences of each item.


A curve on the same graph shows the cumulative frequency.

The cumulative frequency is the running sum of the relative frequencies, so the curve ends at 100%.

The Pareto diagram combines the strong sides of the bar and pie charts.

The Pareto principle is also known as the 80-20 rule.

It states that 80% of the effects come from 20% of the causes.

A Pareto diagram shows how the cumulative total changes with each additional category, which gives us a better understanding of our data.
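The two ingredients of a Pareto diagram can be sketched in a few lines of Python, reusing the car shop data: the bars are the categories sorted by descending frequency, and the curve is the running sum of their relative frequencies.

```python
# Units sold per brand in the German car shop example
sales = {"Audi": 124, "BMW": 98, "Mercedes": 113}
total = sum(sales.values())

# 1. Bars: categories sorted in descending order of frequency
ordered = sorted(sales.items(), key=lambda item: item[1], reverse=True)

# 2. Curve: cumulative relative frequency after each category
cumulative = 0.0
rows = []
for brand, count in ordered:
    cumulative += count / total
    rows.append((brand, count, cumulative))
    print(f"{brand}: {count} units, cumulative {cumulative:.2%}")
# Audi: 124 units, cumulative 37.01%
# Mercedes: 113 units, cumulative 70.75%
# BMW: 98 units, cumulative 100.00%
```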

**Numerical Variables - Frequency Distribution Table👇**

When we deal with numerical variables it makes much more sense to group the data into intervals and then find the corresponding frequencies.

Generally, statisticians prefer 5 to 20 intervals.

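The number of intervals largely depends on the amount of data we are working with.

To build a frequency table with 5 desired intervals, we first work out the width of each interval:

$$\text{interval width} = \frac{\text{largest value} - \text{smallest value}}{\text{number of desired intervals}}$$

A number is included in an interval if that number:

- is greater than the lower bound, and
- is lower than or equal to the upper bound.

The first interval also includes its lower bound, so that the smallest value in the dataset is counted.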

Frequency table of a numerical variable:

| Value | Frequency |
|---|---|
| 1 | 1 |
| 9 | 1 |
| 22 | 1 |
| 24 | 1 |
| 32 | 1 |
| 41 | 1 |
| 44 | 1 |
| 48 | 1 |
| 57 | 1 |
| 66 | 1 |
| 70 | 1 |
| 73 | 1 |
| 75 | 1 |
| 76 | 1 |
| 79 | 1 |
| 82 | 1 |
| 87 | 1 |
| 89 | 1 |
| 95 | 1 |
| 100 | 1 |

Frequency table with intervals:

Desired number of intervals = 5

$$\text{interval width} = \frac{100 - 1}{5} = 19.8 \approx 20$$

| Interval start | Interval end | Frequency | Relative frequency |
|---|---|---|---|
| 1 | 21 | 2 | 0.10 |
| 21 | 41 | 4 | 0.20 |
| 41 | 61 | 3 | 0.15 |
| 61 | 81 | 6 | 0.30 |
| 81 | 101 | 5 | 0.25 |
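The steps above can be sketched in Python. This assumes the example's dataset consists of 20 values from 1 to 100, each occurring once; a range of 1 to 100 is what produces the 19.8 width, and 20 values is what the relative frequencies are divided by. The width is rounded up with `math.ceil`, and the membership test follows the rule stated earlier, with the first interval also keeping its lower bound.

```python
import math

# Example dataset (assumed): 20 values from 1 to 100, each occurring once
data = [1, 9, 22, 24, 32, 41, 44, 48, 57, 66,
        70, 73, 75, 76, 79, 82, 87, 89, 95, 100]
num_intervals = 5

# Interval width = (largest - smallest) / number of intervals = 19.8, rounded up to 20
width = math.ceil((max(data) - min(data)) / num_intervals)

frequencies = []
for i in range(num_intervals):
    lower = min(data) + i * width
    upper = lower + width
    # Count x if lower < x <= upper; the first interval also keeps its lower bound
    count = sum(1 for x in data if lower < x <= upper or (i == 0 and x == lower))
    frequencies.append(count)
    print(f"{lower}-{upper}: frequency {count}, relative {count / len(data):.2f}")
```

Under that assumption, the loop reproduces the frequencies 2, 4, 3, 6 and 5, which add up to the 20 values in the dataset.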

**The Histogram👇**

The most common graph used to represent numerical data is the histogram.


We may create a histogram with unequal intervals.
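Without a plotting library, a histogram can still be sketched as text: one bar per interval, with one `#` per observation. The function name `text_histogram` is hypothetical, and the dataset is the same assumed 20-value list (1 to 100) used for the frequency table. Because the interval bounds are passed in as a list, they do not have to be equally spaced.

```python
def text_histogram(data, bounds):
    """Return one text bar per interval (lower, upper]; the first interval also includes its lower bound."""
    lines = []
    for lower, upper in zip(bounds, bounds[1:]):
        first = lower == bounds[0]
        count = sum(1 for x in data if lower < x <= upper or (first and x == lower))
        lines.append(f"{lower:>3}-{upper:<3} | {'#' * count}")
    return lines

# Example dataset (assumed): 20 values from 1 to 100
data = [1, 9, 22, 24, 32, 41, 44, 48, 57, 66,
        70, 73, 75, 76, 79, 82, 87, 89, 95, 100]

# Equal-width intervals from the frequency table
lines = text_histogram(data, [1, 21, 41, 61, 81, 101])
print("\n".join(lines))

# Unequal intervals are also possible
uneven = text_histogram(data, [1, 41, 61, 101])
print("\n".join(uneven))
```

With unequal intervals, histogram bars are usually drawn with heights scaled by interval width (frequency density), so that wider intervals don't look more important than they are; this sketch only prints raw counts.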

**Cross Tables and Scatter Plots👇**

The most common way to represent the relationship between two categorical variables is a cross table, which some statisticians call a contingency table.

A very useful chart in such cases is a variation of the bar chart called the side-by-side bar chart.
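A cross table can be sketched in Python with `collections.Counter` by counting each (row category, column category) pair. The sales records below are hypothetical, invented purely to illustrate the layout.

```python
from collections import Counter

# Hypothetical sales records: (brand, country) pairs, invented for illustration
records = [
    ("Audi", "Germany"), ("BMW", "Germany"), ("Audi", "France"),
    ("Mercedes", "Germany"), ("BMW", "France"), ("Audi", "Germany"),
    ("Mercedes", "France"), ("BMW", "Germany"),
]

# Count how often each (brand, country) combination occurs
counts = Counter(records)
brands = sorted({brand for brand, _ in records})
countries = sorted({country for _, country in records})

# Print the cross table: one row per brand, one column per country, plus row totals
print("Brand".ljust(10) + "".join(c.ljust(9) for c in countries) + "Total")
for brand in brands:
    row = [counts[(brand, country)] for country in countries]
    print(brand.ljust(10) + "".join(str(n).ljust(9) for n in row) + str(sum(row)))
```

Each row of this table corresponds to one group of bars in a side-by-side bar chart, with one bar per country.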


All graphs are very easy to create and read, once you have identified the type of data you are dealing with and decided on the best way to visualize it.

Scatter plots are used to represent the relationship between two numerical variables.

In scatter plots, outliers are data points that go against the logic of the whole dataset, lying far away from the pattern formed by the other points.

**Before we end…**

Thank you for taking the time to read my posts and share your thoughts. If you like my blog, please like, comment, share it with your circle and follow for more. I look forward to continuing this journey with you.

Let’s connect and grow together. I look forward to getting to know you better😉.

Here are my social links:

**LinkedIn:** **linkedin.com/in/ai-naymul**

**Twitter:** **twitter.com/ai_naymul**

**Github:** **github.com/ai-naymul**