Last modified on December 20th, 2024

Variance

‘Variance’ measures the spread or dispersion of a dataset around its mean value. A low variance means the data points lie close to the mean, whereas a high variance indicates that they are spread farther from it.

Mathematically, it is expressed as the average of the squared differences between each data point and the mean of the dataset. 

It is generally represented by the symbol σ².

Similar to standard deviation, variance can be analyzed for ungrouped data (individual data points) and grouped data (data organized in intervals with frequencies).

For Ungrouped Data

There are two types of variance based on the type of data set being analyzed. They are population variance and sample variance.

Population Variance

Population variance refers to the dispersion of an entire dataset. It includes every member of the group or every possible observation. It is denoted by σ² (sigma squared).

Mathematically, the formula for finding the population variance of a given dataset is:

${\sigma ^{2}=\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}$

Here,

  • μ = population mean
  • N = total number of observations in the population

Sample Variance

When the population data is very large, calculating the variance directly becomes difficult. In such cases, a sample is taken from the dataset, and the variance calculated from this sample is called the sample variance. It represents only a part of the population and helps estimate the overall variance.

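Sample variance is usually represented by the symbol s².

It is obtained using the formula below, which divides by n – 1 rather than n to correct the bias introduced by estimating the mean from the same sample: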

${s^{2}=\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{n-1}}$

Here,

  • ${\overline{x}}$ = sample mean
  • n = number of observations in the sample
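The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the `scores` data are hypothetical and chosen only for the example.

```python
def population_variance(values):
    """Average squared deviation from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical data, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(scores))
print(sample_variance(scores))
```

For the same data, the sample variance is always the larger of the two, because the same sum of squares is divided by a smaller number.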

Finding the Variances for the Ungrouped Dataset X = {3, 5, 7, 9, 11}

Calculating the Mean 

Mean = ${\dfrac{3+5+7+9+11}{5}}$ = 7

Finding the Squared Differences

(3 – 7)² = 16

(5 – 7)² = 4

(7 – 7)² = 0

(9 – 7)² = 4

(11 – 7)² = 16

Computing the Variances

${\sigma ^{2}=\dfrac{16+4+0+4+16}{5}}$ = 8

${s^{2}=\dfrac{16+4+0+4+16}{5-1}}$ = 10

Thus, the population variance is 8, and the sample variance is 10.
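The result above can be cross-checked with Python's standard `statistics` module, where `pvariance` divides by N and `variance` divides by n – 1. The variable names are illustrative.

```python
import statistics

X = [3, 5, 7, 9, 11]

population_var = statistics.pvariance(X)  # divides by N
sample_var = statistics.variance(X)       # divides by n - 1

print(population_var, sample_var)
```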

For Grouped Data

For grouped data, the variance is computed from the midpoint of each class interval, weighted by the frequency of that interval.

Population Variance

Here, the formula for calculating the population variance is:

${\sigma ^{2}=\dfrac{\sum f\left( m_{i}-\mu \right) ^{2}}{N}}$

Here,

  • f = frequency of each interval
  • ${m_{i}}$ = midpoint of the ith interval
  • μ = population mean of the grouped data
  • N = total number of observations, ${\sum f}$

Sample Variance

The formula for calculating the sample variance of grouped data is:

${s^{2}=\dfrac{\sum f\left( m_{i}-\overline{x}\right) ^{2}}{n-1}}$

Here,

  • f = frequency of each interval
  • ${m_{i}}$ = midpoint of the ith interval
  • ${\overline{x}}$ = sample mean of the grouped data
  • n = total number of observations in the sample, ${\sum f}$

Finding the Variances for the Grouped Dataset

| Working Hours (Hours) | Frequency (f) |
| --- | --- |
| 30 – 35 | 4 |
| 35 – 40 | 6 |
| 40 – 45 | 8 |
| 45 – 50 | 10 |
| 50 – 55 | 5 |

Finding the Midpoints

| Working Hours (Hours) | Frequency (f) | Midpoint (x) |
| --- | --- | --- |
| 30 – 35 | 4 | ${\dfrac{30+35}{2}}$ = 32.5 |
| 35 – 40 | 6 | ${\dfrac{35+40}{2}}$ = 37.5 |
| 40 – 45 | 8 | ${\dfrac{40+45}{2}}$ = 42.5 |
| 45 – 50 | 10 | ${\dfrac{45+50}{2}}$ = 47.5 |
| 50 – 55 | 5 | ${\dfrac{50+55}{2}}$ = 52.5 |

Finding the Mean

${\sum fx}$ 

= (4 × 32.5) + (6 × 37.5) + (8 × 42.5) + (10 × 47.5) + (5 × 52.5)  = 1432.5

${\sum f}$

= 4 + 6 + 8 + 10 + 5 = 33

Thus, μ or ${\overline{x}}$ = ${\dfrac{1432.5}{33}}$ = 43.41

Finding ${\sum f\left( x-\mu \right) ^{2}}$ or ${\sum f\left( x-\overline{x} \right) ^{2}}$ 

| Working Hours (Hours) | Frequency (f) | Midpoint (x) | x – μ or ${x-\overline{x}}$ | (x – μ)² or ${\left( x-\overline{x}\right) ^{2}}$ | f(x – μ)² or ${f\left( x-\overline{x} \right) ^{2}}$ |
| --- | --- | --- | --- | --- | --- |
| 30 – 35 | 4 | 32.5 | 32.5 – 43.41 = -10.91 | 119.03 | 4 × 119.03 = 476.12 |
| 35 – 40 | 6 | 37.5 | 37.5 – 43.41 = -5.91 | 34.93 | 6 × 34.93 = 209.58 |
| 40 – 45 | 8 | 42.5 | 42.5 – 43.41 = -0.91 | 0.83 | 8 × 0.83 = 6.64 |
| 45 – 50 | 10 | 47.5 | 47.5 – 43.41 = 4.09 | 16.73 | 10 × 16.73 = 167.30 |
| 50 – 55 | 5 | 52.5 | 52.5 – 43.41 = 9.09 | 82.63 | 5 × 82.63 = 413.15 |

Thus, ${\sum f\left( x-\mu \right) ^{2}}$ or ${\sum f\left( x-\overline{x} \right) ^{2}}$ = 1272.79 (with intermediate values rounded to two decimal places)

Computing the Variances

${\sigma ^{2}}$ = ${\dfrac{1272.79}{33}}$ = 38.57

${s^{2}}$ = ${\dfrac{1272.79}{33-1}}$ = ${\dfrac{1272.79}{32}}$ = 39.77

Thus, the population variance is 38.57, and the sample variance is 39.77.

Note: Since the variance involves squared differences, the result cannot be negative.
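The grouped-data procedure (midpoints, weighted mean, weighted squared deviations) can be sketched as follows. The function name `grouped_variances` is hypothetical. Because the code keeps the mean unrounded, its results can differ from a hand calculation in the last decimal place.

```python
def grouped_variances(intervals, frequencies):
    """Return (population variance, sample variance) for grouped data."""
    # Represent each class interval by its midpoint.
    midpoints = [(low + high) / 2 for low, high in intervals]
    total = sum(frequencies)
    # Frequency-weighted mean of the midpoints.
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / total
    # Frequency-weighted sum of squared deviations from the mean.
    sum_sq = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))
    return sum_sq / total, sum_sq / (total - 1)


# Working-hours table from the example above.
intervals = [(30, 35), (35, 40), (40, 45), (45, 50), (50, 55)]
frequencies = [4, 6, 8, 10, 5]

pop_var, samp_var = grouped_variances(intervals, frequencies)
print(round(pop_var, 2), round(samp_var, 2))
```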

Properties

The variance of a random variable X has the following properties.

  1. Var(X + c) = Var(X), where c is a constant.
  2. Var(c) = 0, where c is a constant.
  3. Var(cX) = c² ⋅ Var(X), where c is a constant.
  4. Var(aX + b) = a² ⋅ Var(X), where a and b are constants.
  5. If X₁, X₂, …, Xₙ are n independent random variables, then Var(X₁ + X₂ + … + Xₙ) = Var(X₁) + Var(X₂) + … + Var(Xₙ).
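These properties can be checked numerically on a small example. The sketch below treats a hypothetical list of equally likely values as the distribution of X, and uses two independent fair dice for property 5. The constants a, b, and c are arbitrary choices. It is a spot check, not a proof.

```python
import math
import statistics

X = [3, 5, 7, 9, 11]  # hypothetical, equally likely values of X
a, b, c = 3, 4, 6     # arbitrary constants
var_x = statistics.pvariance(X)

shifted = statistics.pvariance([x + c for x in X])     # property 1
constant = statistics.pvariance([c] * len(X))          # property 2
scaled = statistics.pvariance([c * x for x in X])      # property 3
linear = statistics.pvariance([a * x + b for x in X])  # property 4

# Property 5: two independent fair dice, all 36 pairs equally likely.
faces = [1, 2, 3, 4, 5, 6]
var_die = statistics.pvariance(faces)
var_sum = statistics.pvariance([d1 + d2 for d1 in faces for d2 in faces])

print(shifted == var_x)
print(constant == 0)
print(scaled == c**2 * var_x)
print(linear == a**2 * var_x)
print(math.isclose(var_sum, var_die + var_die))
```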

For Common Probability Distributions

The formula for the variance depends on the type of probability distribution:

Normal Distribution

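For a normal distribution with mean μ, the variance is obtained by integrating the squared deviation from μ against the density, and the result equals the distribution's parameter σ²:
${\sigma ^{2}=\int ^{\infty }_{-\infty }\left( x-\mu \right) ^{2}\cdot f\left( x\right) dx}$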

Here

  • f(x) is the probability density function (PDF) of the normal distribution
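The integral can be approximated numerically to see that it recovers σ². The sketch below uses `NormalDist` from Python's standard `statistics` module with a simple midpoint rule. The parameters μ = 10 and σ = 2, the integration range of ±10σ, and the step count are assumptions made for illustration.

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.0  # hypothetical parameters
dist = NormalDist(mu, sigma)

# Midpoint rule on [mu - 10*sigma, mu + 10*sigma]; the tails beyond
# this range contribute a negligible amount to the integral.
steps = 20_000
lower = mu - 10 * sigma
width = 20 * sigma / steps

approx = 0.0
for i in range(steps):
    x = lower + (i + 0.5) * width
    approx += (x - mu) ** 2 * dist.pdf(x) * width

print(approx)         # numerical value of the integral
print(dist.variance)  # sigma squared, as reported by NormalDist
```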

Binomial Distribution

The variance is given by: 

σ² = np(1 – p)

Here

  • n is the number of trials
  • p is the probability of success
  • 1 – p is the probability of failure
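As a check on the closed form, the variance can also be computed directly from the binomial probabilities P(K = k) = C(n, k) pᵏ (1 – p)ⁿ⁻ᵏ. The function name and the values n = 10 and p = 0.3 are hypothetical.

```python
import math


def binomial_variance_from_pmf(n, p):
    """Variance computed as the probability-weighted squared deviation."""
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * prob for k, prob in enumerate(pmf))
    return sum((k - mean) ** 2 * prob for k, prob in enumerate(pmf))


n, p = 10, 0.3  # hypothetical number of trials and success probability
from_pmf = binomial_variance_from_pmf(n, p)
closed_form = n * p * (1 - p)

print(math.isclose(from_pmf, closed_form))
```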

Poisson Distribution

The variance is: 

σ² = λ

Here

  • λ is the average number of events occurring in a given time or space interval
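The same kind of check works for the Poisson distribution. The sketch below sums over the probabilities P(K = k) = e^(–λ) λᵏ / k!, building each term from the previous one, and uses the equivalent identity Var(K) = E[K²] – (E[K])². The function name, the cut-off `max_k`, and the value λ = 4 are assumptions for illustration.

```python
import math


def poisson_variance_from_pmf(lam, max_k=200):
    """Approximate the Poisson variance by summing over k = 0..max_k."""
    prob = math.exp(-lam)  # P(K = 0)
    mean = 0.0
    second_moment = 0.0
    for k in range(max_k + 1):
        mean += k * prob
        second_moment += k * k * prob
        prob *= lam / (k + 1)  # P(K = k + 1) obtained from P(K = k)
    return second_moment - mean**2


lam = 4.0  # hypothetical rate
poisson_var = poisson_variance_from_pmf(lam)
print(math.isclose(poisson_var, lam, rel_tol=1e-9))
```

The truncation at `max_k` is safe only when λ is small relative to the cut-off, so that the omitted tail is negligible.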

Relation to Standard Deviation

Both variance and standard deviation indicate the dispersion of data points in a dataset by measuring their deviation from the mean.

Mathematically,

Variance = (Standard Deviation)²

Standard deviation (σ) is expressed with the same units as the original data. Since variance (σ²) is the square of the standard deviation, it is thus expressed in squared units.
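A quick numerical illustration of this relation, using the ungrouped dataset X = {3, 5, 7, 9, 11} and the population versions of both measures from Python's `statistics` module:

```python
import math
import statistics

X = [3, 5, 7, 9, 11]

std_dev = statistics.pstdev(X)      # population standard deviation
variance = statistics.pvariance(X)  # population variance

# Squaring the standard deviation recovers the variance
# (up to floating-point rounding).
print(math.isclose(std_dev**2, variance))
```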

Relation to Covariance

While variance measures the spread of a single variable around its mean, covariance extends this concept to measure how two random variables change together. 

Notably, variance is a special case of covariance where both variables are the same (x = y).

Mathematically, 

Cov(x, x) = Var(x)
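The identity can be illustrated with a small sketch. It assumes the population form of covariance (dividing by N), which matches the population variance formula. The function names and the `y` values are hypothetical.

```python
def covariance(xs, ys):
    """Population covariance: mean product of paired deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def variance(xs):
    """Population variance: mean squared deviation."""
    n = len(xs)
    mean_x = sum(xs) / n
    return sum((x - mean_x) ** 2 for x in xs) / n


x = [3, 5, 7, 9, 11]
y = [1, 4, 2, 8, 5]  # hypothetical second variable

print(covariance(x, x) == variance(x))  # covariance of x with itself
print(covariance(x, y))                 # covariance of two different variables
```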

Solved Example

A company tracks the number of hours five employees spend on a specific project during a week. If the recorded data is H = {4, 6, 8, 10, 12}, find the variance of the number of hours worked.

Solution:

Given, 
H = {4, 6, 8, 10, 12}
Thus, 
Mean = ${\dfrac{4+6+8+10+12}{5}}$ = 8
The squared differences are:
(4 – 8)² = 16
(6 – 8)² = 4
(8 – 8)² = 0
(10 – 8)² = 4
(12 – 8)² = 16
Treating the five employees as the entire population, we use the population variance formula:
${\sigma ^{2}=\dfrac{\sum \left( x_{i}-\mu \right) ^{2}}{N}}$
Here, N = 5
Now, the variance is ${\dfrac{16+4+0+4+16}{5}}$ = 8
Thus, the variance of the number of hours worked is 8 (in squared hours).
