Last modified on January 2nd, 2025
Variance measures the spread, or dispersion, of a dataset around its mean. A lower variance means the data points lie close to the mean, whereas a higher variance indicates that they are more widely spread.
Mathematically, it is expressed as the average of the squared differences between each data point and the mean of the dataset.
It is generally represented by the symbol σ².
Similar to standard deviation, variance can be analyzed for ungrouped data (individual data points) and grouped data (data organized in intervals with frequencies).
There are two types of variance based on the type of data set being analyzed. They are population variance and sample variance.
Population variance refers to the dispersion of an entire dataset, one that includes every member of the group or every possible observation. It is denoted by σ² (sigma squared).
Mathematically, the formula for finding the population variance of a given dataset is:
${\sigma ^{2}=\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}$
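Here, σ² is the population variance, ${x_{i}}$ is each data point, μ is the population mean, and N is the number of data points in the population.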
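As a minimal sketch, the population formula can be written directly in Python. The function name `population_variance` and the sample values are illustrative only and do not come from any library or from the text above.

```python
def population_variance(data):
    """Average of the squared deviations from the mean, dividing by N."""
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one value")
    mu = sum(data) / n  # population mean
    return sum((x - mu) ** 2 for x in data) / n


# Hypothetical values: mean is 5, squared deviations sum to 32, and 32 / 8 = 4.
print(population_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
```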
When the population data is very large, calculating the variance directly becomes difficult. In such cases, a sample is taken from the dataset, and the variance calculated from this sample is called the sample variance. It represents only a part of the population and helps estimate the overall variance.
Sample variance is often represented by the symbol s².
It can be obtained using the formula:
${s^{2}=\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{n-1}}$
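Here, s² is the sample variance, ${x_{i}}$ is each data point in the sample, ${\overline{x}}$ is the sample mean, and n is the number of data points in the sample.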
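The only change from the population formula is the divisor, n − 1 instead of the number of data points. A short Python sketch, using the hypothetical name `sample_variance` and made-up values, could look like this:

```python
def sample_variance(sample):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(sample)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(sample) / n  # sample mean
    return sum((x - x_bar) ** 2 for x in sample) / (n - 1)


# Hypothetical values: squared deviations sum to 32, so the result is 32 / 7.
print(sample_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # about 4.57
```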
Finding the Variances for the Ungrouped Dataset X = {3, 5, 7, 9, 11}
Calculating the Mean
Mean = ${\dfrac{3+5+7+9+11}{5}}$ = 7
Finding the Squared Differences
(3 – 7)² = 16
(5 – 7)² = 4
(7 – 7)² = 0
(9 – 7)² = 4
(11 – 7)² = 16
Computing the Variances
${\sigma ^{2}=\dfrac{16+4+0+4+16}{5}}$ = 8
${s^{2}=\dfrac{16+4+0+4+16}{5-1}}$ = 10
Thus, the population variance is 8, and the sample variance is 10.
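These two results can be cross-checked with Python's standard `statistics` module, where `pvariance` divides by the number of data points and `variance` divides by one less than that. This is only a quick verification of the arithmetic above.

```python
import statistics

X = [3, 5, 7, 9, 11]

pop_var = statistics.pvariance(X)   # population variance (divide by N)
samp_var = statistics.variance(X)   # sample variance (divide by n - 1)

print(pop_var == 8, samp_var == 10)  # True True
```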
For grouped data, the variance is determined by weighting each data point or class midpoint by its frequency.
Here, the formula for calculating the population variance is:
${\sigma ^{2}=\dfrac{\sum f\left( m_{i}-\mu \right) ^{2}}{n}}$
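Here, f is the frequency of each class, ${m_{i}}$ is the midpoint of the i-th class, μ is the population mean, and n is the total frequency (${\sum f}$).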
The formula for calculating the sample variance of grouped data is:
${s^{2}=\dfrac{\sum f\left( m_{i}-\overline{x}\right) ^{2}}{n-1}}$
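Here, f is the frequency of each class, ${m_{i}}$ is the midpoint of the i-th class, ${\overline{x}}$ is the sample mean, and n is the total frequency (${\sum f}$).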
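Both grouped formulas can be sketched in a single Python function. The name `grouped_variances`, the `(lower, upper, frequency)` tuple layout, and the small dataset are assumptions made for illustration.

```python
def grouped_variances(classes):
    """Return (population variance, sample variance) for grouped data.

    classes: list of (lower, upper, frequency) tuples; each class is
    represented by its midpoint.
    """
    midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
    freqs = [f for _, _, f in classes]
    n = sum(freqs)  # total frequency
    if n < 2:
        raise ValueError("total frequency must be at least 2")
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    # Frequency-weighted sum of squared deviations from the mean
    ss = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))
    return ss / n, ss / (n - 1)


# Hypothetical classes: midpoints 5, 15, 25 with frequencies 2, 6, 2.
pop_var, samp_var = grouped_variances([(0, 10, 2), (10, 20, 6), (20, 30, 2)])
print(pop_var)  # 40.0
```

With these made-up classes the mean is 15 and the weighted sum of squared deviations is 400, so the population variance is 400 / 10 and the sample variance is 400 / 9.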
Finding the Variances for the Grouped Dataset
Working Hours (Hours) | Frequency (f) |
---|---|
30 – 35 | 4 |
35 – 40 | 6 |
40 – 45 | 8 |
45 – 50 | 10 |
50 – 55 | 5 |
Finding the Midpoints
Working Hours (Hours) | Frequency (f) | Midpoint (x) |
---|---|---|
30 – 35 | 4 | ${\dfrac{30+35}{2}}$ = 32.5 |
35 – 40 | 6 | ${\dfrac{35+40}{2}}$ = 37.5 |
40 – 45 | 8 | ${\dfrac{40+45}{2}}$ = 42.5 |
45 – 50 | 10 | ${\dfrac{45+50}{2}}$ = 47.5 |
50 – 55 | 5 | ${\dfrac{50+55}{2}}$ = 52.5 |
Finding the Mean
${\sum fx}$ = (4 × 32.5) + (6 × 37.5) + (8 × 42.5) + (10 × 47.5) + (5 × 52.5) = 1432.5
${\sum f}$ = 4 + 6 + 8 + 10 + 5 = 33
Thus, μ or ${\overline{x}}$ = ${\dfrac{1432.5}{33}}$ ≈ 43.41
Finding ${\sum f\left( x-\mu \right) ^{2}}$ or ${\sum f\left( x-\overline{x} \right) ^{2}}$
Working Hours (Hours) | Frequency (f) | Midpoint (x) | x – μ or ${x-\overline{x}}$ | (x – μ)² or ${\left( x-\overline{x}\right) ^{2}}$ | f(x – μ)² or ${f\left( x-\overline{x} \right) ^{2}}$ |
---|---|---|---|---|---|
30 – 35 | 4 | 32.5 | 32.5 – 43.41 = -10.91 | 119.03 | 4 × 119.03 = 476.12 |
35 – 40 | 6 | 37.5 | 37.5 – 43.41 = -5.91 | 34.93 | 6 × 34.93 = 209.58 |
40 – 45 | 8 | 42.5 | 42.5 – 43.41 = -0.91 | 0.83 | 8 × 0.83 = 6.64 |
45 – 50 | 10 | 47.5 | 47.5 – 43.41 = 4.09 | 16.73 | 10 × 16.73 = 167.30 |
50 – 55 | 5 | 52.5 | 52.5 – 43.41 = 9.09 | 82.63 | 5 × 82.63 = 413.15 |
Thus, ${\sum f\left( x-\mu \right) ^{2}}$ or ${\sum f\left( x-\overline{x} \right) ^{2}}$ = 476.12 + 209.58 + 6.64 + 167.30 + 413.15 = 1272.79
Computing the Variances
${\sigma ^{2}}$ = ${\dfrac{1272.79}{33}}$ ≈ 38.57
${s^{2}}$ = ${\dfrac{1272.79}{33-1}}$ = ${\dfrac{1272.79}{32}}$ ≈ 39.77
Thus, the population variance is approximately 38.57, and the sample variance is approximately 39.77.
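The grouped calculation can be re-run in Python without rounding the mean at each step. Carrying the mean at full precision gives a sum of squared deviations of about 1272.73, so the last decimal place of a result can differ slightly from a hand calculation that rounds intermediate values. The variable names below are illustrative.

```python
midpoints = [32.5, 37.5, 42.5, 47.5, 52.5]
freqs = [4, 6, 8, 10, 5]

n = sum(freqs)  # total frequency, 33
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
# Frequency-weighted sum of squared deviations, using the unrounded mean
ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))

print(round(mean, 2))          # 43.41
print(round(ss / n, 2))        # 38.57 (population variance)
print(round(ss / (n - 1), 2))  # 39.77 (sample variance)
```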
Note: Since the variance involves squared differences, the result cannot be negative.
The variance of a random variable X has the following properties.
The value of variance changes based on the type of probability distribution:
The variance of a normal distribution, as of any continuous distribution with probability density function f(x), is:
${\sigma ^{2}=\int ^{\infty }_{-\infty }\left( x-\mu \right) ^{2}\cdot f\left( x\right) dx}$
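Here, x is a value of the random variable, μ is the mean of the distribution, and f(x) is the probability density function.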
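To see the integral at work, the sketch below approximates it numerically for a normal density using a simple midpoint rule. The helper names, the integration range of ten standard deviations on each side, and the parameter values are all assumptions chosen for illustration; the result should be close to the square of the standard deviation passed in.

```python
import math


def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def variance_by_integration(mu, sigma, half_width=10.0, steps=20_000):
    """Midpoint-rule approximation of the integral of (x - mu)^2 * f(x) dx.

    The infinite range is truncated to mu +/- half_width * sigma, where
    the density outside is negligibly small.
    """
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    dx = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * dx  # midpoint of the i-th strip
        total += (x - mu) ** 2 * normal_pdf(x, mu, sigma) * dx
    return total


approx = variance_by_integration(mu=5.0, sigma=2.0)
print(round(approx, 6))  # 4.0
```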
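The variance of a binomial distribution is given by:
σ² = np(1 – p)
Here, n is the number of trials, and p is the probability of success in each trial.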
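As a rough check, the variance can be computed from the binomial probabilities themselves and compared with np(1 – p). The function name and the values n = 10, p = 0.3 are hypothetical.

```python
import math


def binomial_variance_from_pmf(n, p):
    """Variance computed from the definition, summing over k = 0..n."""
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    return sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))


n, p = 10, 0.3
print(round(binomial_variance_from_pmf(n, p), 6))  # 2.1
print(round(n * p * (1 - p), 6))                   # 2.1
```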
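The variance of a Poisson distribution is:
σ² = λ
Here, λ is the average number of events in the given interval, which is also the mean of the distribution.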
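The same kind of check works for λ. The sketch below sums the Poisson probabilities up to a cutoff and computes the variance from the definition. The function name, the cutoff `max_k`, and λ = 4 are assumptions; the cutoff simply needs to be large enough that the remaining probability is negligible.

```python
import math


def poisson_variance_from_pmf(lam, max_k=100):
    """Variance from the definition, truncating the sum at max_k."""
    pmf = []
    pk = math.exp(-lam)  # P(X = 0)
    for k in range(max_k + 1):
        pmf.append(pk)
        pk *= lam / (k + 1)  # P(X = k + 1) from P(X = k)
    mean = sum(k * p for k, p in enumerate(pmf))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf))


print(round(poisson_variance_from_pmf(4.0), 6))  # 4.0
```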
Both variance and standard deviation indicate the dispersion of data points in a dataset by measuring their deviation from the mean.
Mathematically,
Variance = (Standard Deviation)²
Standard deviation (σ) is expressed in the same units as the original data. Since variance (σ²) is the square of the standard deviation, it is expressed in squared units.
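A brief illustration with Python's `statistics` module, using hypothetical heights in centimetres: the variance comes out in cm², the standard deviation in cm, and squaring the latter recovers the former.

```python
import math
import statistics

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # hypothetical data

variance = statistics.pvariance(heights_cm)  # in cm²
std_dev = statistics.pstdev(heights_cm)      # in cm

print(variance)                              # 200.0
print(math.isclose(std_dev ** 2, variance))  # True
```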
While variance measures the spread of a single variable around its mean, covariance extends this concept to measure how two random variables change together.
Notably, variance is a special case of covariance where both variables are the same (x = y).
Mathematically,
Cov(x, x) = Var(x)
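This identity is easy to confirm with a small hand-written population covariance function. The name `covariance` below is illustrative and not tied to any library; the data are the ungrouped values used earlier, whose population variance is 8.

```python
def covariance(xs, ys):
    """Population covariance: mean product of paired deviations."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


x = [3, 5, 7, 9, 11]
print(covariance(x, x))  # 8.0, the population variance of x
```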
A company tracks the number of hours five employees spend on a specific project during a week. If the recorded data is H = {4, 6, 8, 10, 12}, find the variance of the number of hours worked.
Given,
H = {4, 6, 8, 10, 12}
Thus,
Mean = ${\dfrac{4+6+8+10+12}{5}}$ = 8
The squared differences are:
(4 – 8)² = 16
(6 – 8)² = 4
(8 – 8)² = 0
(10 – 8)² = 4
(12 – 8)² = 16
As we know,
${\sigma ^{2}=\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}$
Here, N = 5
Now, the variance is ${\dfrac{16+4+0+4+16}{5}}$ = 8
Thus, the variance of the number of hours worked is 8.