Last modified on January 2nd, 2025

Variance

‘Variance’ refers to the spread or dispersion of a dataset relative to its mean. A lower variance means the data points lie close to the mean, whereas a higher variance indicates that they are more widely spread out.

Mathematically, it is expressed as the average of the squared differences between each data point and the mean of the dataset.

It is generally represented by the symbol σ².

Similar to standard deviation, variance can be analyzed for ungrouped data (individual data points) and grouped data (data organized in intervals with frequencies).

For Ungrouped Data

There are two types of variance, depending on whether the dataset covers an entire population or only a sample of it: population variance and sample variance.

Population Variance

Population variance measures the dispersion of an entire population. It includes every member of the group or every possible observation. It is denoted by σ² (sigma squared).

Mathematically, the formula for finding the population variance of a given dataset is:

${\sigma ^{2}=\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}$

Here,

x_i = each data point
μ = population mean
N = total number of observations in the population
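The population formula translates directly into code. The following is a minimal Python sketch; the function name `population_variance` is our own illustrative choice, not part of any library.

```python
def population_variance(data: list[float]) -> float:
    """Population variance: the mean of squared deviations from the mean (divide by N)."""
    n = len(data)
    if n == 0:
        raise ValueError("population variance requires at least one data point")
    mu = sum(data) / n  # population mean
    return sum((x - mu) ** 2 for x in data) / n


print(population_variance([3, 5, 7, 9, 11]))  # 8.0
```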

Sample Variance

When the population data is very large, calculating the variance directly becomes difficult. In such cases, a sample is taken from the dataset, and the variance calculated from this sample is called the sample variance. It represents only a part of the population and helps estimate the overall variance.

Sample variance is represented by the symbol s².

It can be obtained using the formula:

${s^{2}=\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{n-1}}$

Here,

x_i = each data point in the sample
${\overline{x}}$ = sample mean
n = number of observations in the sample
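The only change from the population formula is the divisor n − 1. A minimal Python sketch (the name `sample_variance` is hypothetical) can be cross-checked against the standard library's `statistics.variance`, which also divides by n − 1:

```python
import statistics


def sample_variance(data: list[float]) -> float:
    """Sample variance: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance requires at least two data points")
    x_bar = sum(data) / n  # sample mean
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


sample = [3, 5, 7, 9, 11]
print(sample_variance(sample))      # 10.0
print(statistics.variance(sample))  # same value, computed by the standard library
```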

Finding the Variances for the Ungrouped Dataset X = {3, 5, 7, 9, 11}

Calculating the Mean

Mean = ${\dfrac{3+5+7+9+11}{5}}$ = 7

Finding the Squared Differences

(3 – 7)² = 16

(5 – 7)² = 4

(7 – 7)² = 0

(9 – 7)² = 4

(11 – 7)² = 16

Computing the Variances

${\sigma ^{2}=\dfrac{16+4+0+4+16}{5}}$ = 8

${s^{2}=\dfrac{16+4+0+4+16}{5-1}}$ = 10

Thus, the population variance is 8, and the sample variance is 10.
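Both results can be reproduced with Python's `statistics` module, where `pvariance` uses the population divisor N and `variance` uses the sample divisor n − 1. The sketch also shows that the two values differ only in that divisor, so s² = σ² · n / (n − 1).

```python
import statistics

X = [3, 5, 7, 9, 11]
n = len(X)

sigma2 = statistics.pvariance(X)  # sum of squared differences (40) divided by n = 5
s2 = statistics.variance(X)       # the same sum divided by n - 1 = 4

print(sigma2, s2)            # 8 10
print(sigma2 * n / (n - 1))  # 10.0
```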

For Grouped Data

For grouped data, the variance is computed by weighting each interval by its frequency, with the interval's midpoint standing in for every value in that interval.

Population Variance

Here, the formula for calculating the population variance is:

${\sigma ^{2}=\dfrac{\sum f\left( m_{i}-\mu \right) ^{2}}{N}}$

Here,

f = frequency of each interval
m_i = midpoint of the i^th interval
μ = population mean of the grouped data
N = ${\sum f}$, the total frequency

Sample Variance

The formula for calculating the sample variance of grouped data is:

${s^{2}=\dfrac{\sum f\left( m_{i}-\overline{x}\right) ^{2}}{n-1}}$

Here,

f = frequency of each interval
m_i = midpoint of the i^th interval
${\overline{x}}$ = sample mean of the grouped data
n = ${\sum f}$, the total frequency (sample size)
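Because every value in an interval is treated as if it sat at the midpoint, the grouped formulas equal the ungrouped ones applied to a list in which each midpoint is repeated f times. The Python sketch below checks this on a small, purely hypothetical frequency table; `grouped_variance` is an illustrative name, not a library function.

```python
import statistics


def grouped_variance(midpoints: list[float], freqs: list[int], sample: bool = False) -> float:
    """Variance of grouped data, weighting each midpoint m_i by its frequency f."""
    if len(midpoints) != len(freqs):
        raise ValueError("midpoints and freqs must have the same length")
    total = sum(freqs)                                            # Σf
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / total   # Σfm / Σf
    ss = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))  # Σf(m - mean)²
    return ss / (total - 1) if sample else ss / total


mids, freqs = [1.0, 2.0, 3.0], [2, 3, 1]  # hypothetical frequency table
expanded = [m for m, f in zip(mids, freqs) for _ in range(f)]  # [1.0, 1.0, 2.0, 2.0, 2.0, 3.0]
print(abs(grouped_variance(mids, freqs) - statistics.pvariance(expanded)) < 1e-12)  # True
print(abs(grouped_variance(mids, freqs, sample=True) - statistics.variance(expanded)) < 1e-12)  # True
```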

Finding the Variances for the Grouped Dataset

Working Hours (Hours)	Frequency (f)
30 – 35	4
35 – 40	6
40 – 45	8
45 – 50	10
50 – 55	5

Finding the Midpoints

Working Hours (Hours)	Frequency (f)	Midpoint (x)
30 – 35	4	${\dfrac{30+35}{2}}$ = 32.5
35 – 40	6	${\dfrac{35+40}{2}}$ = 37.5
40 – 45	8	${\dfrac{40+45}{2}}$ = 42.5
45 – 50	10	${\dfrac{45+50}{2}}$ = 47.5
50 – 55	5	${\dfrac{50+55}{2}}$ = 52.5

Finding the Mean

${\sum fx}$

= (4 × 32.5) + (6 × 37.5) + (8 × 42.5) + (10 × 47.5) + (5 × 52.5) = 1432.5

${\sum f}$

= 4 + 6 + 8 + 10 + 5 = 33

Thus, μ or ${\overline{x}}$ = ${\dfrac{1432.5}{33}}$ ≈ 43.41
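The midpoint and mean steps can be sketched in a few lines of Python; the lists below restate the working-hours table.

```python
intervals = [(30, 35), (35, 40), (40, 45), (45, 50), (50, 55)]  # working hours
freqs = [4, 6, 8, 10, 5]

midpoints = [(lo + hi) / 2 for lo, hi in intervals]    # midpoint of each interval
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))  # Σfx
sum_f = sum(freqs)                                     # Σf
mean = sum_fx / sum_f

print(midpoints)       # [32.5, 37.5, 42.5, 47.5, 52.5]
print(sum_fx, sum_f)   # 1432.5 33
print(round(mean, 2))  # 43.41
```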

Finding ${\sum f\left( x-\mu \right) ^{2}}$ or ${\sum f\left( x-\overline{x} \right) ^{2}}$

Working Hours (Hours)	Frequency (f)	Midpoint (x)	x – μ or ${x-\overline{x}}$	(x – μ)² or ${\left( x-\overline{x}\right) ^{2}}$	f(x – μ)² or ${f\left( x-\overline{x} \right) ^{2}}$
30 – 35	4	32.5	32.5 – 43.41 = -10.91	119.03	4 × 119.03 = 476.12
35 – 40	6	37.5	37.5 – 43.41 = -5.91	34.93	6 × 34.93 = 209.58
40 – 45	8	42.5	42.5 – 43.41 = -0.91	0.83	8 × 0.83 = 6.64
45 – 50	10	47.5	47.5 – 43.41 = 4.09	16.73	10 × 16.73 = 167.30
50 – 55	5	52.5	52.5 – 43.41 = 9.09	82.63	5 × 82.63 = 413.15

Thus, ${\sum f\left( x-\mu \right) ^{2}}$ or ${\sum f\left( x-\overline{x} \right) ^{2}}$ = 476.12 + 209.58 + 6.64 + 167.30 + 413.15 = 1272.79

Computing the Variances

${\sigma ^{2}}$ = ${\dfrac{1272.79}{33}}$ ≈ 38.57

${s^{2}}$ = ${\dfrac{1272.79}{33-1}}$ = ${\dfrac{1272.79}{32}}$ ≈ 39.77

Thus, the population variance is approximately 38.57, and the sample variance is approximately 39.77.
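A hand calculation that rounds the mean and each squared deviation to two decimal places can drift in the last digit. The minimal Python sketch below keeps full precision throughout; the lists simply restate the working-hours table.

```python
intervals = [(30, 35), (35, 40), (40, 45), (45, 50), (50, 55)]
freqs = [4, 6, 8, 10, 5]

midpoints = [(lo + hi) / 2 for lo, hi in intervals]
n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Σf(x - mean)², with no intermediate rounding
ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))

print(round(ss, 2))            # 1272.73
print(round(ss / n, 2))        # population variance: 38.57
print(round(ss / (n - 1), 2))  # sample variance: 39.77
```

At full precision, the sum of weighted squared deviations is about 1272.73, giving σ² ≈ 38.57 and s² ≈ 39.77.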

Note: Since the variance involves squared differences, the result cannot be negative.

Properties

The variance of a random variable X has the following properties:

Var(X + c) = Var(X), where c is a constant.
Var(c) = 0, where c is a constant.
Var(cX) = c² ⋅ Var(X), where c is a constant.
Var(aX + b) = a² ⋅ Var(X), where a and b are constants.
If X₁, X₂, …, X_n are n independent random variables, then Var(X₁ + X₂ + … + X_n) = Var(X₁) + Var(X₂) + … + Var(X_n).
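These identities hold for any random variable. The Python sketch below checks them exactly, using `fractions.Fraction` to avoid floating-point noise, for one concrete case: a fair six-sided die, chosen here purely for illustration. The helpers `variance` and `transform` are hypothetical names, and checking one example illustrates the properties rather than proving them.

```python
from fractions import Fraction
from itertools import product


def variance(dist: dict) -> Fraction:
    """Variance of a discrete random variable given as {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * (x - mean) ** 2 for x, p in dist.items())


def transform(dist: dict, g) -> dict:
    """Distribution of g(X); probabilities of values mapping to the same output are added."""
    out: dict = {}
    for x, p in dist.items():
        out[g(x)] = out.get(g(x), 0) + p
    return out


die = {face: Fraction(1, 6) for face in range(1, 7)}  # a fair six-sided die
a, b, c = 3, 5, 4

assert variance(transform(die, lambda x: x + c)) == variance(die)             # Var(X + c) = Var(X)
assert variance({c: Fraction(1)}) == 0                                         # Var(c) = 0
assert variance(transform(die, lambda x: c * x)) == c**2 * variance(die)       # Var(cX) = c²Var(X)
assert variance(transform(die, lambda x: a * x + b)) == a**2 * variance(die)   # Var(aX + b) = a²Var(X)

# Sum of two independent dice: each joint probability is the product of the marginals
two_dice: dict = {}
for (x1, p1), (x2, p2) in product(die.items(), die.items()):
    two_dice[x1 + x2] = two_dice.get(x1 + x2, 0) + p1 * p2
assert variance(two_dice) == variance(die) + variance(die)

print(variance(die), variance(two_dice))  # 35/12 35/6
```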

For Common Probability Distributions

The variance takes a specific form for each common probability distribution:

Normal Distribution

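For a normal distribution with mean μ and standard deviation σ, the variance is the parameter σ² itself. It follows from the general definition of variance for a continuous random variable:
${\sigma ^{2}=\int ^{\infty }_{-\infty }\left( x-\mu \right) ^{2}\cdot f\left( x\right) dx}$

Here,

f(x) is the probability density function (PDF) of the normal distribution, ${f\left( x\right) =\dfrac{1}{\sigma \sqrt{2\pi }}e^{-\dfrac{\left( x-\mu \right) ^{2}}{2\sigma ^{2}}}}$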
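The integral can be checked numerically. The Python sketch below approximates it with the midpoint rule over μ ± 12σ, where the neglected tails are vanishingly small; the values μ = 10 and σ = 3 are arbitrary choices for illustration, and `variance_by_integration` is a hypothetical helper name.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Probability density function of the normal distribution N(mu, sigma²)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def variance_by_integration(mu: float, sigma: float, steps: int = 200_000) -> float:
    """Approximate ∫ (x - mu)² f(x) dx with the midpoint rule over mu ± 12·sigma."""
    lo, hi = mu - 12 * sigma, mu + 12 * sigma  # tails beyond ±12σ contribute almost nothing
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += (x - mu) ** 2 * normal_pdf(x, mu, sigma)
    return total * h


print(round(variance_by_integration(mu=10.0, sigma=3.0), 6))  # 9.0
```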

Binomial Distribution

The variance is given by:

σ² = np(1 – p)

Here,

n is the number of trials
p is the probability of success
1 – p is the probability of failure
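The shortcut np(1 – p) can be compared with a direct computation from the binomial probability mass function. This Python sketch uses exact fractions; n = 10 and p = 3/10 are illustrative values, and `binomial_variance_from_pmf` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb


def binomial_variance_from_pmf(n: int, p: Fraction) -> Fraction:
    """Compute Var(X) = Σ (k - mean)² P(X = k) directly from the binomial PMF."""
    pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
    mean = sum(k * q for k, q in pmf.items())
    return sum((k - mean) ** 2 * q for k, q in pmf.items())


n, p = 10, Fraction(3, 10)  # illustrative values
print(binomial_variance_from_pmf(n, p))  # 21/10
print(n * p * (1 - p))                   # 21/10
```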

Poisson Distribution

The variance is:

σ² = λ

Here,

λ is the average number of successes in a given time or space interval
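A quick numerical check that the Poisson variance equals λ: the Python sketch below sums the probability mass function up to a cutoff, building each probability from the previous one to avoid large factorials. The cutoff of 200 and λ = 4 are assumptions suited to moderate λ, and `poisson_variance_from_pmf` is a hypothetical name.

```python
import math


def poisson_variance_from_pmf(lam: float, k_max: int = 200) -> float:
    """Approximate Var(X) for X ~ Poisson(lam) by summing the PMF up to k_max (moderate lam)."""
    probs = []
    p = math.exp(-lam)  # P(X = 0)
    for k in range(k_max + 1):
        probs.append(p)
        p *= lam / (k + 1)  # P(X = k + 1) = P(X = k) · lam / (k + 1)
    mean = sum(k * q for k, q in enumerate(probs))
    return sum((k - mean) ** 2 * q for k, q in enumerate(probs))


print(round(poisson_variance_from_pmf(4.0), 9))  # 4.0
```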

Relation to Standard Deviation

Both variance and standard deviation indicate the dispersion of data points in a dataset by measuring their deviation from the mean.

Mathematically,

Variance = (Standard Deviation)²

Standard deviation (σ) is expressed in the same units as the original data. Since variance (σ²) is the square of the standard deviation, it is expressed in squared units; for data measured in hours, for example, the variance is in hours².
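Using the ungrouped dataset X = {3, 5, 7, 9, 11} from earlier, Python's `statistics.pvariance` and `statistics.pstdev` confirm that one is the square of the other:

```python
import math
import statistics

data = [3, 5, 7, 9, 11]

var = statistics.pvariance(data)  # population variance, in squared units
sd = statistics.pstdev(data)      # population standard deviation, in the data's own units

print(var)                               # 8
print(round(sd, 4))                      # 2.8284
print(math.isclose(sd**2, var))          # True
print(math.isclose(math.sqrt(var), sd))  # True
```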

Relation to Covariance

While variance measures the spread of a single variable around its mean, covariance extends this concept to measure how two random variables change together.

Notably, variance is a special case of covariance where both variables are the same (x = y).

Mathematically,

Cov(x, x) = Var(x)
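The identity can be illustrated with a small population-covariance function in Python (the name `population_covariance` is hypothetical). Python 3.10 and later also provide `statistics.covariance`, which uses the sample divisor n − 1 instead.

```python
import statistics


def population_covariance(xs: list[float], ys: list[float]) -> float:
    """Population covariance: the average product of paired deviations from each mean."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


x = [3, 5, 7, 9, 11]
print(population_covariance(x, x))  # 8.0
print(statistics.pvariance(x))      # 8
```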

Solved Example

A company tracks the number of hours five employees spend on a specific project during a week. If the recorded data is H = {4, 6, 8, 10, 12}, find the variance of the number of hours worked.

Solution:

Given,
H = {4, 6, 8, 10, 12}
Thus,
Mean = ${\dfrac{4+6+8+10+12}{5}}$ = 8
The squared differences are:
(4 – 8)² = 16
(6 – 8)² = 4
(8 – 8)² = 0
(10 – 8)² = 4
(12 – 8)² = 16
As we know,
${\sigma ^{2}=\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}$
Here, N = 5, since the five employees make up the entire group being studied.
Now, the variance is ${\dfrac{16+4+0+4+16}{5}}$ = 8
Thus, the variance of the number of hours worked is 8 hours².
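The solution steps map one-to-one onto a short Python sketch:

```python
H = [4, 6, 8, 10, 12]  # hours worked by the five employees

mean = sum(H) / len(H)                        # 8.0
squared_diffs = [(h - mean) ** 2 for h in H]  # [16.0, 4.0, 0.0, 4.0, 16.0]
variance = sum(squared_diffs) / len(H)        # divide by N = 5 (population variance)

print(variance)  # 8.0
```

Each value in H is the corresponding value of the earlier dataset X = {3, 5, 7, 9, 11} plus 1, so the property Var(X + c) = Var(X) predicts the same variance of 8.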
