Table of Contents
Last modified on December 20th, 2024
While finding the standard deviation of a data set, we may encounter two different sets of data based on its scope and size. The two types are population standard deviation and sample standard deviation.
Let us now discuss them in detail.
The population standard deviation represents the entire population of an area under consideration, such as a national census or during a financial report. Thus, it includes all individuals in a population.
The population standard deviation is generally represented by the Greek letter ‘σ’
In statistics, data can be ungrouped (raw) or grouped data (well-organized). We can calculate the standard deviation for each type of data.
We calculate the population standard deviation of a dataset using the formula:
${\sigma =\sqrt{\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}}$
Here,
Let us find the population standard deviation for the dataset 5, 8, 10, 12, and 15
Finding the mean (μ)
As we know,
Mean = (Sum of data points) ÷ (Total data points)
= ${\dfrac{5+8+10+12+15}{5}}$ = 10
Calculating the Squared Differences ((xi – μ)2)
(5 – 10)2 = 25
(8 – 10)2 = 4
(10 – 10)2 = 0
(12 – 10)2 = 4
(15 – 10)2 = 25
Here, the sum of the squared differences is:
${\sum \left( x_{i}-\mu \right) ^{2}}$ = 25 + 4 + 0 + 4 + 25 = 58
Using the Formula
Thus, the population standard deviation is ${\sigma =\sqrt{\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}}$
= ${\sqrt{\dfrac{58}{5}}}$ = ${\sqrt{11.6}}$ ≈ 3.41
Thus, the population standard deviation of the ungrouped data is 3.41
The population standard deviation (σ) formula for grouped data is calculated as:
${\sigma =\sqrt{\dfrac{\sum f\left( x_{i}-\mu \right) ^{2}}{N}}}$
Here,
Let us calculate the population standard deviation of the survey that records the test scores of 50 students.
Score Interval | Frequency (f) |
---|---|
0 – 10 | 5 |
10 – 20 | 8 |
20 – 30 | 12 |
30 – 40 | 15 |
40 – 50 | 10 |
Finding the Midpoints
Score Interval | Frequency (f) | Midpoint (x) |
---|---|---|
0 – 10 | 5 | ${\dfrac{0+10}{2}}$ = 5 |
10 – 20 | 8 | ${\dfrac{10+20}{2}}$ = 15 |
20 – 30 | 12 | ${\dfrac{20+30}{2}}$ = 25 |
30 – 40 | 15 | ${\dfrac{30+40}{2}}$ = 35 |
40 – 50 | 10 | ${\dfrac{40+50}{2}}$ = 45 |
Finding the Mean
The mean for grouped data is calculated as: ${\mu =\dfrac{\sum fx}{\sum f}}$
${\sum fx}$
= (5 × 5) + (8 × 15) + (12 × 25) + (15 × 35) + (10 × 45)
= 25 + 120 + 300 + 525 + 450
= 1420
${\sum f}$
= 5 + 8 + 12 + 15 + 10
= 50
Thus, μ = ${\dfrac{1420}{50}}$ = 28.4
Finding (x – μ)2 and f(x – μ)2
x | f | x – μ | (x – μ)2 | f(x – μ)2 |
---|---|---|---|---|
5 | 5 | 5 – 28.4 = -23.4 | 547.56 | 5 × 547.56 = 2737.8 |
15 | 8 | 15 – 28.4 = -13.4 | 179.56 | 8 × 179.56 = 1436.48 |
25 | 12 | 25 – 28.4 = -3.4 | 11.56 | 12 × 11.56 = 138.72 |
35 | 15 | 35 – 28.4 = 6.6 | 43.56 | 15 × 43.56 = 653.4 |
45 | 10 | 45 – 28.4 = 16.6 | 275.56 | 10 × 275.56 = 2755.6 |
Thus, ${\sum f\left( x-\mu \right) ^{2}}$ = 2737.8 + 1436.48 + 138.72 + 653.4 + 2755.6 = 7721.6
Calculating Population Standard Deviation
${\sigma =\sqrt{\dfrac{\sum f\left( x_{i}-\mu \right) ^{2}}{N}}}$
= ${\sqrt{\dfrac{7721.6}{50}}}$
= ${\sqrt{154.43}}$ ≈ 12.43
Thus, the population standard deviation of the grouped data is 12.43
The sample standard deviation estimates the standard deviation of a dataset, which is a subset of the population. It accounts for the potential bias in smaller datasets by using n – 1 (Bessel’s correction) instead of the total number n.
It is commonly used in research and surveys, such as medical trials and market research, where only a subset of the population is analyzed.
The sample standard deviation is generally represented by the letter ‘s’
We calculate the sample standard deviation of a population data set using the formula:
${s=\sqrt{\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{n-1}}}$
Here,
Note: Bessel’s correction ensures the sample standard deviation is an unbiased approximation of the population standard deviation. Without this correction, the variability in smaller samples cannot be estimated properly.
Let us find the population standard deviation for the dataset 7, 9, 12, 13, and 16
Finding the mean (${\overline{x}}$)
As we know,
Mean = (Sum of data points) ÷ (Total data points)
= ${\dfrac{7+9+12+13+16}{5}}$ = 11.4
Calculating the Squared Differences (${\left( x_{i}-\overline{x}\right) ^{2}}$)
(7 – 11.4)2 = (-4.4)2 = 19.36
(9 – 11.4)2 = (-2.4)2 = 5.76
(12 – 11.4)2 = (0.6)2 = 0.36
(13 – 11.4)2 = (1.6)2 = 2.56
(16 – 11.4)2 = (4.6)2 = 21.16
Here, the sum of the squared differences is:
${\sum \left( x_{i}-\overline{x}\right) ^{2}}$ = 19.36 + 5.76 + 0.36 + 2.56 + 21.16 = 49.2
Using the Formula
As we know, the sample standard deviation is given by the formula:
${s=\sqrt{\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{n-1}}}$
= ${\sqrt{\dfrac{49.2}{5-1}}}$ = ${\sqrt{\dfrac{49.2}{4}}}$ = ${\sqrt{12.3}}$ ≈ 3.51
Thus, the sample standard deviation of the ungrouped data is 3.51
Similarly, for grouped data, it is calculated by using the formula:
${s=\sqrt{\dfrac{\sum f\left( x_{i}-\overline{x}\right) ^{2}}{n-1}}}$
Here,
Now, let us calculate the sample standard deviation from the survey recording the test scores of 50 students.
Score Interval | Frequency (f) |
---|---|
0 – 10 | 5 |
10 – 20 | 8 |
20 – 30 | 12 |
30 – 40 | 15 |
40 – 50 | 10 |
Finding the Midpoints
We have:
Score Interval | Frequency (f) | Midpoint (x) |
---|---|---|
0 – 10 | 5 | 5 |
10 – 20 | 8 | 15 |
20 – 30 | 12 | 25 |
30 – 40 | 15 | 35 |
40 – 50 | 10 | 45 |
Finding the Mean
${\sum fx}$
= (5 × 5) + (8 × 15) + (12 × 25) + (15 × 35) + (10 × 45)
= 1420
${\sum f}$
= 5 + 8 + 12 + 15 + 10
= 50
Thus, ${\overline{x}}$ = ${\dfrac{1420}{50}}$ = 28.4
Finding ${\left( x-\overline{x}\right) ^{2}}$ and ${f\left( x-\overline{x} \right) ^{2}}$
We have:
x | f | ${x-\overline{x}}$ | ${\left( x-\overline{x}\right) ^{2}}$ | ${f\left( x-\overline{x} \right) ^{2}}$ |
---|---|---|---|---|
5 | 5 | -23.4 | 547.56 | 2737.8 |
15 | 8 | -13.4 | 179.56 | 1436.48 |
25 | 12 | -3.4 | 11.56 | 138.72 |
35 | 15 | 6.6 | 43.56 | 653.4 |
45 | 10 | 16.6 | 275.56 | 2755.6 |
Thus, ${\sum f\left( x-\overline{x} \right) ^{2}}$ = 7721.6
Calculating Sample Standard Deviation
${s=\sqrt{\dfrac{\sum f\left( x_{i}-\overline{x}\right) ^{2}}{n-1}}}$
= ${\sqrt{\dfrac{7721.6}{50-1}}}$
= ${\sqrt{\dfrac{7721.6}{49}}}$
= ${\sqrt{157.58}}$ ≈ 12.55
Thus, the sample standard deviation of the grouped data is 12.55
Below is a summary of the formulas for finding the population and sample standard deviation of a dataset:
The monthly sales (in thousands of dollars) for a small business over 5 months are: 12, 15, 20, 25, 30. Find the sample standard deviation.
Here,
The sample mean is ${\overline{x}}$ = ${\dfrac{12+15+20+25+30}{5}}$ = 20.4
The squared differences are:
(12 – 20.4)2 = (-8.4)2 = 70.56
(15 – 20.4)2 = (-5.4)2 = 29.16
(20 – 20.4)2 = (-0.4)2 = 0.16
(25 – 20.4)2 = (4.6)2 = 21.16
(30 – 20.4)2 = (9.6)2 = 92.16
The sum of the squared differences is ${\sum \left( x_{i}-\overline{x}\right) ^{2}}$ = 70.56 + 29.16 + 0.16 + 21.16 + 92.16 = 213.2
Now, the sample standard deviation is ${s=\sqrt{\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{n-1}}}$
= ${\sqrt{\dfrac{213.2}{5-1}}}$ = ${\sqrt{\dfrac{213.2}{4}}}$ = ${\sqrt{53.3}}$ ≈ 7.3
Thus, the sample standard deviation is 7.3 thousand dollars.
The test scores for a group of students are: 85, 90, 88, 92, 87, 89, 91, 86. Calculate the population standard deviation.
Here,
The population mean is μ = ${\dfrac{85+90+88+92+87+89+91+86}{8}}$ = 88.5
The squared differences are:
(85 – 88.5)2 = (-3.5)2 = 12.25
(90 – 88.5)2 = (1.5)2 = 2.25
(88 – 88.5)2 = (-0.5)2 = 0.25
(92 – 88.5)2 = (3.5)2 = 12.25
(87 – 88.5)2 = (-1.5)2 = 2.25
(89 – 88.5)2 = (0.5)2 = 0.25
(91 – 88.5)2 = (2.5)2 = 6.25
(86 – 88.5)2 = (-2.5)2 = 6.25
The sum of the squared differences is ${\sum \left( x_{i}-\mu \right) ^{2}}$ = 12.25 + 2.25 + 0.25 + 12.25 + 2.25 + 0.25 + 6.25 + 6.25 = 42
Now, the sample standard deviation is ${\sigma =\sqrt{\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}}$
= ${\sqrt{\dfrac{42}{8}}}$ = ${\sqrt{5.25}}$ ≈ 2.29
Thus, the population standard deviation is 2.29.
Last modified on December 20th, 2024