Last modified on December 20th, 2024


Population and Sample Variance

As with standard deviation, variance is calculated in two ways, depending on whether the data cover an entire population or only a sample drawn from it. These are called population variance and sample variance.

We use population variance when the dataset includes every member of the population under consideration, whereas we use sample variance when the dataset is only a subset of that population.

Let us now discuss them in detail.

Population Variance

Population variance measures the dispersion of data points across an entire population. It is represented by the Greek letter sigma squared (σ²).

In statistics, data can be ungrouped (raw observations) or grouped (organized into class intervals with frequencies). We can calculate the variance for each type of data.

For Ungrouped Data

Mathematically, the formula to find the population variance is:

${\sigma ^{2}=\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}$

Here,

  • N = Total number of observations
  • ${x_{i}}$ = Individual data point
  • μ = Population mean

Steps To Find

Let us find the population variance for the data points X = {2, 4, 6, 8}

To find the population variance of this ungrouped data, we use the following steps:

Finding the Mean

${\mu =\dfrac{2+4+6+8}{4}}$ = 5

Calculating ${\left( x_{i}-\mu \right) ^{2}}$

(2 – 5)² = 9

(4 – 5)² = 1

(6 – 5)² = 1

(8 – 5)² = 9

Finding the Sum of Squares

${\sum \left( x_{i}-\mu \right) ^{2}}$ = 9 + 1 + 1 + 9 = 20

Dividing by N = 4

${\sigma ^{2}=\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}$ = ${\dfrac{20}{4}=5}$

Thus, the population variance is 5.
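The steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `population_variance` is our own, and the result is cross-checked against `statistics.pvariance` from the Python standard library.

```python
import statistics


def population_variance(data):
    """Average squared deviation from the mean, dividing by N."""
    n = len(data)
    mean = sum(data) / n                                   # step 1: the mean
    squared_deviations = [(x - mean) ** 2 for x in data]   # step 2: (x_i - mu)^2
    return sum(squared_deviations) / n                     # steps 3 and 4: sum, divide by N


data = [2, 4, 6, 8]
sigma_squared = population_variance(data)
print(sigma_squared)  # 5.0

# The standard library gives the same population variance.
library_value = statistics.pvariance(data)
print(library_value == sigma_squared)  # True
```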

For Grouped Data

For grouped data, the population variance is calculated using the formula:

${\sigma ^{2}=\dfrac{\sum f\left( x_{i}-\mu \right) ^{2}}{n}}$

Here,

  • f = frequency of each interval
  • ${x_{i}}$ = midpoint of the ith interval
  • μ = population mean of the grouped data
  • n = ${\sum f}$ = total number of observations

Steps To Find

Let us calculate the population variance for a survey that records the ages of 35 individuals in a community.

| Age Group (Years) | Frequency (f) |
|---|---|
| 10 – 20 | 5 |
| 20 – 30 | 8 |
| 30 – 40 | 12 |
| 40 – 50 | 7 |
| 50 – 60 | 3 |

Finding the Midpoints

| Age Group (Years) | Frequency (f) | Midpoint (x) |
|---|---|---|
| 10 – 20 | 5 | ${\dfrac{10+20}{2}}$ = 15 |
| 20 – 30 | 8 | ${\dfrac{20+30}{2}}$ = 25 |
| 30 – 40 | 12 | ${\dfrac{30+40}{2}}$ = 35 |
| 40 – 50 | 7 | ${\dfrac{40+50}{2}}$ = 45 |
| 50 – 60 | 3 | ${\dfrac{50+60}{2}}$ = 55 |

Finding the Mean

The mean for grouped data is calculated as: ${\mu =\dfrac{\sum fx}{\sum f}}$

${\sum fx}$ 

= (5 × 15) + (8 × 25) + (12 × 35) + (7 × 45) + (3 × 55) 

= 75 + 200 + 420 + 315 + 165 

= 1175

${\sum f}$

= 5 + 8 + 12 + 7 + 3

= 35

Thus, μ = ${\dfrac{1175}{35}}$ ≈ 33.57

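Finding (x – μ)² and f(x – μ)²

| x | f | x – μ | (x – μ)² | f(x – μ)² |
|---|---|---|---|---|
| 15 | 5 | 15 – 33.57 = -18.57 | 344.84 | 5 × 344.84 = 1724.20 |
| 25 | 8 | 25 – 33.57 = -8.57 | 73.44 | 8 × 73.44 = 587.52 |
| 35 | 12 | 35 – 33.57 = 1.43 | 2.04 | 12 × 2.04 = 24.48 |
| 45 | 7 | 45 – 33.57 = 11.43 | 130.64 | 7 × 130.64 = 914.48 |
| 55 | 3 | 55 – 33.57 = 21.43 | 459.24 | 3 × 459.24 = 1377.72 |

Each squared deviation is rounded to two decimal places.

Thus, ${\sum f\left( x-\mu \right) ^{2}}$ = 1724.20 + 587.52 + 24.48 + 914.48 + 1377.72 = 4628.40

Without intermediate rounding, the sum is approximately 4628.57, which gives the same variance to two decimal places.

Calculating Population Variance

${\sigma ^{2}=\dfrac{\sum f\left( x_{i}-\mu \right) ^{2}}{n}}$

= ${\dfrac{4628.40}{35}}$ ≈ 132.24

Thus, the population variance of the grouped data is approximately 132.24.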
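The grouped-data procedure can be sketched in Python as follows. The function name `grouped_population_variance` and the representation of intervals as `(lower, upper)` pairs are our own illustrative choices. The code keeps full precision at every step, so it avoids the small rounding differences of a hand calculation.

```python
def grouped_population_variance(intervals, frequencies):
    """Population variance of grouped data, using interval midpoints."""
    # Midpoint of each class interval.
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]
    n = sum(frequencies)  # total number of observations
    # Grouped mean: sum of f * x divided by sum of f.
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    # Frequency-weighted sum of squared deviations.
    weighted_ss = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))
    return weighted_ss / n


age_intervals = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
age_frequencies = [5, 8, 12, 7, 3]

grouped_sigma_squared = grouped_population_variance(age_intervals, age_frequencies)
print(round(grouped_sigma_squared, 2))  # 132.24
```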

Sample Variance

Sample variance measures variability when the data represent a subset (sample) of the total population. Because deviations are measured from the sample mean rather than the true population mean, dividing the sum of squared deviations by n tends to underestimate the population variance. To correct this bias, we divide by n – 1 instead, an adjustment known as Bessel’s correction.

The sample variance is represented by s².

Note: For a random sample, this adjustment makes the sample variance an unbiased estimate of the population variance. It matters most for small samples, where n and n – 1 differ noticeably.
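The effect of Bessel's correction can be illustrated with a small exhaustive check. As a sketch, we assume the population is {2, 4, 6, 8} (whose population variance is 5) and list every equally likely sample of size 2 drawn with replacement. These assumptions are ours, chosen only to keep the enumeration short. Averaged over all such samples, dividing by n underestimates the population variance, while dividing by n – 1 recovers it.

```python
from itertools import product
import statistics

population = [2, 4, 6, 8]   # population variance is 5
sample_size = 2

# Every equally likely ordered sample of size 2, drawn with replacement.
samples = list(product(population, repeat=sample_size))

# Average of the "divide by n" variance over all samples.
avg_divide_by_n = sum(statistics.pvariance(s) for s in samples) / len(samples)

# Average of the "divide by n - 1" variance over all samples.
avg_divide_by_n_minus_1 = sum(statistics.variance(s) for s in samples) / len(samples)

print(len(samples))              # 16
print(avg_divide_by_n)           # 2.5 (too small)
print(avg_divide_by_n_minus_1)   # 5.0 (matches the population variance)
```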

For Ungrouped Data

Mathematically, sample variance can be obtained using the formula:

${s^{2}=\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{n-1}}$

Here,

  • n = Total number of observations
  • ${x_{i}}$ = Individual data point
  • ${\overline{x}}$ = Sample mean

Steps To Find

Let us consider the data points X = {2, 4, 6, 8}

To find the sample variance, we use the following steps:

Finding the Mean

${\overline{x}}$ = ${\dfrac{2+4+6+8}{4}}$ = 5

Calculating ${\left( x_{i}-\overline{x}\right) ^{2}}$

(2 – 5)² = 9

(4 – 5)² = 1

(6 – 5)² = 1

(8 – 5)² = 9

Finding the Sum of Squares

${\sum \left( x_{i}-\overline{x}\right) ^{2}}$ = 9 + 1 + 1 + 9 = 20

Dividing by n – 1

Here, n = 4 ⇒ n – 1 = 4 – 1 = 3

Now, 

${s^{2}=\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{n-1}}$ = ${\dfrac{20}{3}}$ ≈ 6.67

Thus, the sample variance is approximately 6.67.
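The same steps can be sketched in Python. The function name `sample_variance` is our own. The only change from the population calculation is the divisor n – 1, and the result is compared with `statistics.variance` from the standard library.

```python
import math
import statistics


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(data) / n
    sum_of_squares = sum((x - mean) ** 2 for x in data)
    return sum_of_squares / (n - 1)   # Bessel's correction


data = [2, 4, 6, 8]
s_squared = sample_variance(data)
print(round(s_squared, 2))  # 6.67

# The standard library's sample variance agrees.
print(math.isclose(s_squared, statistics.variance(data)))  # True
```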

For Grouped Data

Similarly, for grouped data, it is calculated by the formula:

${s^{2}=\dfrac{\sum f\left( x_{i}-\overline{x}\right) ^{2}}{n-1}}$

Here,

  • f = frequency of each interval
  • ${x_{i}}$ = midpoint of the ith interval
  • ${\overline{x}}$ = sample mean of the grouped data
  • n = ${\sum f}$ = total number of observations

Steps To Find

Now, let us calculate the sample variance from the survey of the ages of 35 individuals in a community.

| Age Group (Years) | Frequency (f) |
|---|---|
| 10 – 20 | 5 |
| 20 – 30 | 8 |
| 30 – 40 | 12 |
| 40 – 50 | 7 |
| 50 – 60 | 3 |

Finding the Midpoints

We have:

| Age Group (Years) | Frequency (f) | Midpoint (x) |
|---|---|---|
| 10 – 20 | 5 | 15 |
| 20 – 30 | 8 | 25 |
| 30 – 40 | 12 | 35 |
| 40 – 50 | 7 | 45 |
| 50 – 60 | 3 | 55 |

Finding the Mean

${\sum fx}$ 

= (5 × 15) + (8 × 25) + (12 × 35) + (7 × 45) + (3 × 55) 

= 1175

${\sum f}$

= 5 + 8 + 12 + 7 + 3

= 35

Thus, ${\overline{x}}$ = ${\dfrac{1175}{35}}$ ≈ 33.57

Finding ${\left( x-\overline{x}\right) ^{2}}$ and ${f\left( x-\overline{x} \right) ^{2}}$

We have:

| x | f | ${x-\overline{x}}$ | ${\left( x-\overline{x}\right) ^{2}}$ | ${f\left( x-\overline{x} \right) ^{2}}$ |
|---|---|---|---|---|
| 15 | 5 | -18.57 | 344.84 | 1724.20 |
| 25 | 8 | -8.57 | 73.44 | 587.52 |
| 35 | 12 | 1.43 | 2.04 | 24.48 |
| 45 | 7 | 11.43 | 130.64 | 914.48 |
| 55 | 3 | 21.43 | 459.24 | 1377.72 |

Each squared deviation is rounded to two decimal places.

Thus, ${\sum f\left( x-\overline{x} \right) ^{2}}$ = 1724.20 + 587.52 + 24.48 + 914.48 + 1377.72 = 4628.40

Calculating Sample Variance

${s^{2}=\dfrac{\sum f\left( x_{i}-\overline{x}\right) ^{2}}{n-1}}$

= ${\dfrac{4628.40}{35-1}}$

= ${\dfrac{4628.40}{34}}$ ≈ 136.13

Thus, the sample variance of the grouped data is approximately 136.13.
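One way to cross-check a grouped calculation is to expand the frequency table into a flat list, repeating each midpoint as many times as its frequency, and pass that list to the standard library. This is a sketch under the usual grouped-data assumption that every observation in an interval sits at the midpoint. The variable names are our own.

```python
import math
import statistics

midpoints = [15, 25, 35, 45, 55]
frequencies = [5, 8, 12, 7, 3]

# Repeat each midpoint f times: the table becomes a flat list of 35 values.
expanded = [x for x, f in zip(midpoints, frequencies) for _ in range(f)]
n = len(expanded)

grouped_s_squared = statistics.variance(expanded)        # divides by n - 1
grouped_sigma_squared = statistics.pvariance(expanded)   # divides by n

print(n)
print(round(grouped_s_squared, 2))
print(round(grouped_sigma_squared, 2))

# Both use the same sum of squared deviations, so they differ by the factor n / (n - 1).
print(math.isclose(grouped_s_squared, grouped_sigma_squared * n / (n - 1)))
```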

Note: For both the ungrouped and the grouped data, the sample variance is greater than the population variance computed from the same values, because the same sum of squared deviations is divided by n – 1 instead of n.

Properties

Non-Negativity

Variance is always non-negative because it is the average of squared deviations. Mathematically:  

  • σ² ≥ 0
  • s² ≥ 0

Zero Variance 

If all data points in a population or a sample are identical, every deviation from the mean is 0, so the variance equals 0. That is:

σ² = 0 when ${x_{i}}$ = μ, ∀ i

s² = 0 when ${x_{i}}$ = ${\overline{x}}$, ∀ i

Units 

Variance is measured in squared units of the data. For example, if data is measured in meters, the population and sample variances are in square meters.

Adding a Constant 

If a constant c is added to all data points, the population and sample variances remain unchanged. That is:

${\mathrm{Var}(x_{i}+c)=\mathrm{Var}(x_{i})}$

Multiplying by a constant 

If all data points are multiplied by a constant c, the population and sample variances are scaled by c²:

${\mathrm{Var}(c\cdot x_{i})=c^{2}\cdot \mathrm{Var}(x_{i})}$
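The shift and scaling rules can be checked numerically. The sketch below reuses the data points {2, 4, 6, 8} with an arbitrary constant c = 3 (our own choice) and applies both the population and the sample variance from Python's `statistics` module.

```python
import math
import statistics

data = [2, 4, 6, 8]
c = 3  # arbitrary constant chosen for illustration

shifted = [x + c for x in data]   # adding a constant
scaled = [c * x for x in data]    # multiplying by a constant

# Adding c leaves both variances unchanged.
shift_ok = (
    statistics.pvariance(shifted) == statistics.pvariance(data)
    and math.isclose(statistics.variance(shifted), statistics.variance(data))
)

# Multiplying by c scales both variances by c squared.
scale_ok = (
    statistics.pvariance(scaled) == c**2 * statistics.pvariance(data)
    and math.isclose(statistics.variance(scaled), c**2 * statistics.variance(data))
)

print(shift_ok)                      # True
print(scale_ok)                      # True
print(statistics.pvariance(scaled))  # 45, which is 3² × 5
```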

Additivity

For independent random variables X and Y, the variance of their sum is: 

Var(X + Y) = Var(X) + Var(Y)
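As an illustration of additivity, consider two independent fair dice (an example of our own, not taken from the text above). Independence means every pair of faces is equally likely, so the 36 possible sums can be treated as a complete population. The sketch below compares the variance of those sums with twice the variance of a single die.

```python
from itertools import product
import math
import statistics

die = [1, 2, 3, 4, 5, 6]

# Independence: each of the 36 (x, y) pairs is equally likely.
sums = [x + y for x, y in product(die, die)]

var_single = statistics.pvariance(die)   # Var(X) = Var(Y) = 35/12
var_sum = statistics.pvariance(sums)     # Var(X + Y)

print(len(sums))                                   # 36
print(math.isclose(var_sum, var_single + var_single))  # True
```

The rule relies on independence. For example, if Y were always equal to X, the sum would be 2X and its variance would be 4 ⋅ Var(X), not 2 ⋅ Var(X).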

Solved Examples

A company measures the heights (in cm) of 5 employees in a department. The data is as follows: 150, 160, 170, 180, 190. Find the population variance.

Solution:

Here,
Mean = μ = ${\dfrac{150+160+170+180+190}{5}}$ = 170
The squared differences ${\left( x_{i}-\mu \right) ^{2}}$ are:
(150 – 170)² = (-20)² = 400
(160 – 170)² = (-10)² = 100
(170 – 170)² = (0)² = 0
(180 – 170)² = (10)² = 100
(190 – 170)² = (20)² = 400
The sum of the squared differences = ${\sum \left( x_{i}-\mu \right) ^{2}}$ = 400 + 100 + 0 + 100 + 400 = 1000
As we know, the population variance is
${\sigma ^{2}=\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}$
Now, by using the formula, we get
σ² = ${\dfrac{1000}{5}}$ = 200
Thus, the population variance is 200 cm².

A researcher randomly selects 4 students’ scores from a class: 12, 14, 16, 18. Find the sample variance.

Solution:

Here,
Mean = ${\overline{x}}$ = ${\dfrac{12+14+16+18}{4}}$ = 15
The squared differences ${\left( x_{i}-\overline{x}\right) ^{2}}$ are:
(12 – 15)² = (-3)² = 9
(14 – 15)² = (-1)² = 1
(16 – 15)² = (1)² = 1
(18 – 15)² = (3)² = 9
The sum of the squared differences = ${\sum \left( x_{i}-\overline{x}\right) ^{2}}$ = 9 + 1 + 1 + 9 = 20
As we know, the sample variance is
${s^{2}=\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{n-1}}$
Now, by using the formula, we get
s² = ${\dfrac{20}{4-1}}$ ≈ 6.67
Thus, the sample variance is approximately 6.67.
