Last modified on December 20th, 2024
Like standard deviation, variance can be computed in two ways, depending on whether the data covers an entire population or only a sample drawn from it. These are called population variance and sample variance.
We use population variance when the dataset includes every member of the population under consideration, whereas we use sample variance when the dataset contains only a subset of that population.
Let us now discuss them in detail.
Population variance measures the dispersion of data points across an entire population. It is represented by σ² (the Greek letter sigma, squared).
In statistics, data can be ungrouped (raw) or grouped (organized into classes with frequencies). We can calculate the variance for each type of data.
Mathematically, the formula to find the population variance is:
${\sigma ^{2}=\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}$
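Here, ${x_{i}}$ is each data point, μ is the population mean, and N is the number of data points in the population.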
Let us find the population variance for the data points X = {2, 4, 6, 8}
To find the population variance of this ungrouped data, we follow these steps:
Finding the Mean
${\mu =\dfrac{2+4+6+8}{4}}$ = 5
Calculating ${\left( x_{i}-\mu \right) ^{2}}$
(2 – 5)² = 9
(4 – 5)² = 1
(6 – 5)² = 1
(8 – 5)² = 9
Finding the Sum of Squares
${\sum \left( x_{i}-\mu \right) ^{2}}$ = 9 + 1 + 1 + 9 = 20
Dividing by N = 4
${\sigma ^{2}=\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}$ = ${\dfrac{20}{4}=5}$
Thus, the population variance is 5.
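The four steps above can be sketched in Python. This is a minimal illustration rather than a reference implementation; the function name `population_variance` is an assumption made for this example.

```python
def population_variance(data):
    """Population variance: the mean of squared deviations from the mean."""
    n = len(data)
    mu = sum(data) / n                              # step 1: mean
    squared_devs = [(x - mu) ** 2 for x in data]    # step 2: (x_i - mu)^2
    sum_of_squares = sum(squared_devs)              # step 3: sum of squares
    return sum_of_squares / n                       # step 4: divide by N


X = [2, 4, 6, 8]
print(population_variance(X))  # 5.0
```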
For grouped data, the population variance is calculated using the formula:
${\sigma ^{2}=\dfrac{\sum f\left( x_{i}-\mu \right) ^{2}}{n}}$
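Here, f is the frequency of each class, ${x_{i}}$ is the class midpoint, μ is the mean of the grouped data, and n = ${\sum f}$ is the total number of observations.
Let us calculate the population variance for a survey that records the ages of 35 individuals in a community.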
Age Group (Years) | Frequency (f) |
---|---|
10 – 20 | 5 |
20 – 30 | 8 |
30 – 40 | 12 |
40 – 50 | 7 |
50 – 60 | 3 |
Finding the Midpoints
Age Group (Years) | Frequency (f) | Midpoint (x) |
---|---|---|
10 – 20 | 5 | ${\dfrac{10+20}{2}}$ = 15 |
20 – 30 | 8 | ${\dfrac{20+30}{2}}$ = 25 |
30 – 40 | 12 | ${\dfrac{30+40}{2}}$ = 35 |
40 – 50 | 7 | ${\dfrac{40+50}{2}}$ = 45 |
50 – 60 | 3 | ${\dfrac{50+60}{2}}$ = 55 |
Finding the Mean
The mean for grouped data is calculated as: ${\mu =\dfrac{\sum fx}{\sum f}}$
${\sum fx}$
= (5 × 15) + (8 × 25) + (12 × 35) + (7 × 45) + (3 × 55)
= 75 + 200 + 420 + 315 + 165
= 1175
${\sum f}$
= 5 + 8 + 12 + 7 + 3
= 35
Thus, μ = ${\dfrac{1175}{35}}$ ≈ 33.57
Finding ${\left( x-\mu \right) ^{2}}$ and ${f\left( x-\mu \right) ^{2}}$
Using μ ≈ 33.57 and rounding each square to two decimal places:
x | f | x – μ | (x – μ)² | f(x – μ)² |
---|---|---|---|---|
15 | 5 | 15 – 33.57 = -18.57 | 344.84 | 5 × 344.84 = 1724.20 |
25 | 8 | 25 – 33.57 = -8.57 | 73.44 | 8 × 73.44 = 587.52 |
35 | 12 | 35 – 33.57 = 1.43 | 2.04 | 12 × 2.04 = 24.48 |
45 | 7 | 45 – 33.57 = 11.43 | 130.64 | 7 × 130.64 = 914.48 |
55 | 3 | 55 – 33.57 = 21.43 | 459.24 | 3 × 459.24 = 1377.72 |
Thus, ${\sum f\left( x-\mu \right) ^{2}}$ = 1724.20 + 587.52 + 24.48 + 914.48 + 1377.72 = 4628.40
Calculating Population Variance
${\sigma ^{2}=\dfrac{\sum f\left( x_{i}-\mu \right) ^{2}}{n}}$
= ${\dfrac{4628.40}{35}}$ ≈ 132.24
Thus, the population variance of the grouped data is approximately 132.24.
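The grouped-data procedure (midpoints, weighted mean, weighted sum of squares) can be sketched as follows. The function name `grouped_population_variance` and the representation of each class as a `(lower, upper)` tuple are assumptions made for this example. Because the code keeps the mean unrounded, its intermediate values can differ slightly from a hand calculation that rounds at each step.

```python
def grouped_population_variance(classes, freqs):
    """Population variance of grouped data, using class midpoints."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)                                          # total frequency
    mu = sum(f * x for f, x in zip(freqs, midpoints)) / n   # weighted mean
    weighted_ss = sum(f * (x - mu) ** 2 for f, x in zip(freqs, midpoints))
    return weighted_ss / n


age_groups = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [5, 8, 12, 7, 3]
print(round(grouped_population_variance(age_groups, frequencies), 2))  # 132.24
```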
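Sample variance measures variability when the data represents only a subset (sample) of the total population. Dividing by n – 1 instead of n, an adjustment known as Bessel’s correction, removes the bias that would otherwise make the estimate too small.
The sample variance is represented by s².
Note: Deviations are measured from the sample mean, which sits closer to the sample points than the true population mean does, so the raw sum of squares tends to underestimate the spread. Dividing by n – 1 compensates for this and makes the sample variance an unbiased estimate of the population variance. The effect is most noticeable in small samples.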
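One way to see the effect of the n – 1 divisor is to enumerate every possible sample from a tiny population and average the two candidate estimates. The sketch below assumes samples of size 2 drawn with replacement from the illustrative population {2, 4, 6, 8}; these choices and the variable names are assumptions made for this example, and it relies on Python's standard `statistics` module.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [2, 4, 6, 8]
sigma2 = pvariance(population)   # true population variance, equal to 5

# Every ordered sample of size 2 drawn with replacement (16 samples in total).
samples = list(product(population, repeat=2))

# Average of the "divide by n" estimate over all samples: comes out to 2.5.
avg_divide_by_n = mean(pvariance(s) for s in samples)

# Average of the "divide by n - 1" estimate over all samples: comes out to 5.
avg_divide_by_n_minus_1 = mean(variance(s) for s in samples)

print(sigma2, avg_divide_by_n, avg_divide_by_n_minus_1)
```

Averaged over all samples, the divide-by-n estimate recovers only half of the true variance here, while the n – 1 version matches it exactly.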
Mathematically, sample variance can be obtained using the formula:
${s^{2}=\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{n-1}}$
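Here, ${x_{i}}$ is each data point in the sample, ${\overline{x}}$ is the sample mean, and n is the sample size.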
Let us consider the data points X = {2, 4, 6, 8}
To find the sample variance, we follow these steps:
Finding the Mean
${\overline{x}}$ = ${\dfrac{2+4+6+8}{4}}$ = 5
Calculating ${\left( x_{i}-\overline{x}\right) ^{2}}$
(2 – 5)² = 9
(4 – 5)² = 1
(6 – 5)² = 1
(8 – 5)² = 9
Finding the Sum of Squares
${\sum \left( x_{i}-\overline{x}\right) ^{2}}$ = 9 + 1 + 1 + 9 = 20
Dividing by n – 1
Here, n = 4 ⇒ n – 1 = 4 – 1 = 3
Now,
${s^{2}=\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{n-1}}$ = ${\dfrac{20}{3}}$ ≈ 6.67
Thus, the sample variance is approximately 6.67.
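Since the only difference between the two formulas is the divisor, both can be expressed with one function. In the sketch below, the parameter name `ddof` ("delta degrees of freedom") is borrowed from common numerical-library convention and is an assumption of this example, as is the function name `variance`.

```python
def variance(data, ddof=0):
    """Sum of squared deviations divided by (n - ddof).

    ddof=0 gives the population variance, ddof=1 the sample variance.
    """
    n = len(data)
    if n - ddof <= 0:
        raise ValueError("need more data points than ddof")
    mean = sum(data) / n
    sum_of_squares = sum((x - mean) ** 2 for x in data)
    return sum_of_squares / (n - ddof)


X = [2, 4, 6, 8]
print(variance(X, ddof=0))            # 5.0
print(round(variance(X, ddof=1), 2))  # 6.67
```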
Similarly, for grouped data, it is calculated by the formula:
${s^{2}=\dfrac{\sum f\left( x_{i}-\overline{x}\right) ^{2}}{n-1}}$
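Here, f is the frequency of each class, ${x_{i}}$ is the class midpoint, ${\overline{x}}$ is the sample mean, and n = ${\sum f}$ is the total number of observations.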
Now, let us calculate the sample variance from the survey of the ages of 35 individuals in a community.
Age Group (Years) | Frequency (f) |
---|---|
10 – 20 | 5 |
20 – 30 | 8 |
30 – 40 | 12 |
40 – 50 | 7 |
50 – 60 | 3 |
Finding the Midpoints
We have:
Age Group (Years) | Frequency (f) | Midpoint (x) |
---|---|---|
10 – 20 | 5 | 15 |
20 – 30 | 8 | 25 |
30 – 40 | 12 | 35 |
40 – 50 | 7 | 45 |
50 – 60 | 3 | 55 |
Finding the Mean
${\sum fx}$
= (5 × 15) + (8 × 25) + (12 × 35) + (7 × 45) + (3 × 55)
= 1175
${\sum f}$
= 5 + 8 + 12 + 7 + 3
= 35
Thus, ${\overline{x}}$ = ${\dfrac{1175}{35}}$ ≈ 33.57
Finding ${\left( x-\overline{x}\right) ^{2}}$ and ${f\left( x-\overline{x} \right) ^{2}}$
We have:
x | f | ${x-\overline{x}}$ | ${\left( x-\overline{x}\right) ^{2}}$ | ${f\left( x-\overline{x} \right) ^{2}}$ |
---|---|---|---|---|
15 | 5 | -18.57 | 344.84 | 1724.20 |
25 | 8 | -8.57 | 73.44 | 587.52 |
35 | 12 | 1.43 | 2.04 | 24.48 |
45 | 7 | 11.43 | 130.64 | 914.48 |
55 | 3 | 21.43 | 459.24 | 1377.72 |
Thus, ${\sum f\left( x-\overline{x} \right) ^{2}}$ = 1724.20 + 587.52 + 24.48 + 914.48 + 1377.72 = 4628.40
Calculating Sample Variance
${s^{2}=\dfrac{\sum f\left( x_{i}-\overline{x}\right) ^{2}}{n-1}}$
= ${\dfrac{4628.40}{35-1}}$
= ${\dfrac{4628.40}{34}}$ ≈ 136.13
Thus, the sample variance of the grouped data is approximately 136.13.
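A grouped calculation can be cross-checked by expanding the table back into a flat list, repeating each midpoint as many times as its frequency, and then applying the ordinary sample variance. The sketch below uses `statistics.variance` from Python's standard library, which divides by n – 1; the variable names are assumptions made for this example.

```python
from statistics import variance

midpoints = [15, 25, 35, 45, 55]
frequencies = [5, 8, 12, 7, 3]

# Repeat each midpoint f times so the grouped table becomes 35 raw values.
expanded = [x for x, f in zip(midpoints, frequencies) for _ in range(f)]

s2 = variance(expanded)             # divides by n - 1 = 34
print(len(expanded), round(s2, 2))  # 35 136.13
```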
Note: For both the grouped and the ungrouped data, the sample variance is greater than the population variance, because the same sum of squared deviations is divided by n – 1 rather than n.
Variance is always non-negative because it is built from squared deviations, which cannot be negative. Mathematically:
σ² ≥ 0 and s² ≥ 0
If all data points in a population or a sample are identical, the variance equals 0. It means:
σ² = 0 when ${x_{i}}$ = μ, ∀ i
s² = 0 when ${x_{i}}$ = ${\overline{x}}$, ∀ i
Variance is measured in squared units of the data. For example, if data is measured in meters, the population and sample variances are in square meters.
If a constant c is added to all data points, the population and sample variances remain unchanged:
Var(${x_{i}}$ + c) = Var(${x_{i}}$)
If all data points are multiplied by a constant c, the population and sample variances are scaled by c²:
Var(c ⋅ ${x_{i}}$) = c² ⋅ Var(${x_{i}}$)
For independent random variables X and Y, the variance of their sum is:
Var(X + Y) = Var(X) + Var(Y)
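These properties can be spot-checked numerically. The sketch below uses `statistics.pvariance` from Python's standard library; the datasets `X` and `Y` and the constant `c = 10` are illustrative choices, and independence is modeled by treating every (x, y) pair as equally likely.

```python
from itertools import product
from statistics import pvariance

X = [2, 4, 6, 8]
Y = [1, 3]
c = 10

# Non-negativity, and zero variance for identical data points.
assert pvariance(X) >= 0
assert pvariance([7, 7, 7]) == 0

# Adding a constant leaves the variance unchanged.
assert pvariance([x + c for x in X]) == pvariance(X)

# Multiplying by a constant scales the variance by c squared.
assert pvariance([c * x for x in X]) == c**2 * pvariance(X)

# Independent X and Y: all (x, y) pairs are equally likely,
# so the variance of the sum is the sum of the variances.
sums = [x + y for x, y in product(X, Y)]
assert pvariance(sums) == pvariance(X) + pvariance(Y)
```

The checks pass silently. A failed property would raise an `AssertionError`.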
A company measures the heights (in cm) of 5 employees in a department. The data is as follows: 150, 160, 170, 180, 190. Find the population variance.
Here,
Mean = μ = ${\dfrac{150+160+170+180+190}{5}}$ = 170
The squared differences ${\left( x_{i}-\mu \right) ^{2}}$ are:
(150 – 170)² = (-20)² = 400
(160 – 170)² = (-10)² = 100
(170 – 170)² = (0)² = 0
(180 – 170)² = (10)² = 100
(190 – 170)² = (20)² = 400
The sum of the squared differences = ${\sum \left( x_{i}-\mu \right) ^{2}}$ = 400 + 100 + 0 + 100 + 400 = 1000
As we know, the population variance is
${\sigma ^{2}=\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}$
Now, by using the formula with N = 5, we get
σ² = ${\dfrac{1000}{5}}$ = 200
Thus, the population variance is 200 cm².
A researcher randomly selects 4 students’ scores from a class: 12, 14, 16, 18. Find the sample variance.
Here,
Mean = ${\overline{x}}$ = ${\dfrac{12+14+16+18}{4}}$ = 15
The squared differences ${\left( x_{i}-\overline{x}\right) ^{2}}$ are:
(12 – 15)² = (-3)² = 9
(14 – 15)² = (-1)² = 1
(16 – 15)² = (1)² = 1
(18 – 15)² = (3)² = 9
The sum of the squared differences = ${\sum \left( x_{i}-\overline{x}\right) ^{2}}$ = 9 + 1 + 1 + 9 = 20
As we know, the sample variance is
${s^{2}=\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{n-1}}$
Now, by using the formula with n = 4, we get
s² = ${\dfrac{20}{4-1}}$ = ${\dfrac{20}{3}}$ ≈ 6.67
Thus, the sample variance is approximately 6.67.
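Both solved examples can be checked with Python's standard `statistics` module, where `pvariance` divides by N and `variance` divides by n – 1. The variable names below are assumptions made for this sketch.

```python
from statistics import pvariance, variance

heights_cm = [150, 160, 170, 180, 190]   # whole department: a population
scores = [12, 14, 16, 18]                # randomly selected: a sample

print(float(pvariance(heights_cm)))      # 200.0 (in cm²)
print(round(variance(scores), 2))        # 6.67
```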