Last modified on December 20th, 2024


Covariance

Covariance is a statistical measure of the relationship between two random variables, showing how they vary together: whether one tends to rise or fall as the other changes. Accordingly, covariance can be positive, negative, or zero.

It is used to analyze stock market trends, detect patterns in machine data, and quantify the relationship between income and expenditure.

Types

Positive Covariance

When two variables have positive covariance, they move in the same direction. This means if one variable increases, the other variable also increases, and vice versa. 

Let us consider the relationship between weather temperature and ice cream sales. As the temperature rises, ice cream sales also increase, showing a positive covariance.

Negative Covariance

When two variables have negative covariance, they move in opposite directions. This means if one variable increases, the other tends to decrease, and vice versa.

For example, we can imagine the relationship between the speed of a car and the time taken to reach a destination. As the speed increases, the travel time decreases, showing a negative covariance.

Zero Covariance

A covariance close to zero means the variables are not linearly related.

Now, let us observe the number of books read by a student and their shoe size. There is likely no linear relationship between these variables, resulting in a covariance close to zero.
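The three cases above can be illustrated with a short Python sketch. All numbers are made up for illustration, and the helper name `population_covariance` is a hypothetical choice, not a library function. The third dataset is built so that the covariance is exactly zero even though the variables are clearly related, which shows that zero covariance only rules out a linear relationship.

```python
def population_covariance(xs, ys):
    """Average product of deviations from the means (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Made-up data: temperature (°C) and ice cream sales move together
temperature = [20, 25, 30, 35]
sales = [100, 150, 210, 260]
positive = population_covariance(temperature, sales)

# Made-up data: speed (km/h) and travel time (h) over a 240 km trip
speed = [40, 60, 80, 120]
travel_time = [6, 4, 3, 2]
negative = population_covariance(speed, travel_time)

# y = (x - 3)^2: related, but not linearly, so the covariance is zero
x = [1, 2, 3, 4, 5]
y = [4, 1, 0, 1, 4]
zero = population_covariance(x, y)

print(positive > 0, negative < 0, zero == 0)  # True True True
```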

Formulas

Covariance is classified as population covariance or sample covariance, depending on whether the data cover the entire population or only a sample (a subset) drawn from it.

If X and Y are two random variables, their covariance can be calculated using the following formulas:

Population Covariance

Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{N}}$

Here

  • ${x_{i}}$ = Individual value of X 
  • ${y_{i}}$ = Individual value of Y 
  • ${\overline{x}}$ = Mean of X 
  • ${\overline{y}}$ = Mean of Y 
  • N = Total number of observations in the population

Sample Covariance

Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{n-1}}$

Here

  • ${x_{i}}$ = Individual value of X 
  • ${y_{i}}$ = Individual value of Y 
  • ${\overline{x}}$ = Mean of X 
  • ${\overline{y}}$ = Mean of Y 
  • n = Total number of observations in the sample
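As a minimal sketch, the two formulas differ only in the divisor, so a single Python function can cover both. The function name `covariance` and the `sample` flag are illustrative assumptions, not part of any library.

```python
def covariance(xs, ys, sample=False):
    """Covariance of paired observations.

    Divides by N (population) by default, or by n - 1 when sample=True.
    The name and the `sample` flag are illustrative, not a library API.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Made-up data: the products of deviations sum to 4
print(covariance([1, 2, 3], [2, 4, 6]))               # 4 / 3
print(covariance([1, 2, 3], [2, 4, 6], sample=True))  # 4 / 2 = 2.0
```

The sample version is undefined for a single observation (the divisor would be zero), which is why the sketch raises an error in that case.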

Steps To Find

Let us find the population and sample covariance of the given dataset:

| X | Y |
|---|---|
| 2 | 3 |
| 4 | 5 |
| 6 | 7 |

Calculating the Means 

${\overline{x}}$ = ${\dfrac{2+4+6}{3}}$ = 4

${\overline{y}}$ = ${\dfrac{3+5+7}{3}}$ = 5

Calculating Deviations and Their Products

| ${x_{i}-\overline{x}}$ | ${y_{i}-\overline{y}}$ | ${\left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right)}$ |
|---|---|---|
| -2 | -2 | 4 |
| 0 | 0 | 0 |
| 2 | 2 | 4 |

Calculating the Sum of Products

${\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right)}$ = 4 + 0 + 4 = 8

Using the Formulas

Now, the population covariance is 

Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{N}}$

= ${\dfrac{8}{N}}$ 

= ${\dfrac{8}{3}}$ = 2.67

The sample covariance is

Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{n-1}}$

= ${\dfrac{8}{n-1}}$

= ${\dfrac{8}{3-1}}$ = ${\dfrac{8}{2}}$ = 4

Thus, the population covariance is approximately 2.67, and the sample covariance is 4.

Here, the positive covariance values indicate a direct relationship between X and Y.
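The same steps can be traced in a few lines of Python. This is only a sketch that mirrors the hand calculation for the dataset (2, 3), (4, 5), (6, 7); the variable names are illustrative.

```python
xs = [2, 4, 6]
ys = [3, 5, 7]
n = len(xs)

# Step 1: means
mean_x = sum(xs) / n  # 4.0
mean_y = sum(ys) / n  # 5.0

# Step 2: products of the deviations from the means
products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]  # [4.0, 0.0, 4.0]

# Step 3: sum of the products
total = sum(products)  # 8.0

# Step 4: divide by N (population) or n - 1 (sample)
population_cov = total / n
sample_cov = total / (n - 1)

print(round(population_cov, 2))  # 2.67
print(sample_cov)                # 4.0
```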

Properties

If X, Y, and Z are random variables and c is a constant, then covariance satisfies the following properties:

  1. Cov(X, X) = Var(X) ≥ 0
  2. Cov(X, Y) = Cov(Y, X)
  3. Cov(cX, Y) = c ⋅ Cov(X, Y)
  4. Cov(X, cY) = c ⋅ Cov(X, Y)
  5. Cov(X, c) = 0
  6. Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z)
  7. Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)
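These properties can be spot-checked numerically. The sketch below uses made-up data and a hypothetical helper `cov` (population covariance). A numeric check on one dataset is an illustration, not a proof.

```python
import math

def cov(a, b):
    """Population covariance of two equal-length lists (illustrative helper)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n

# Made-up data for illustration
X = [1.0, 4.0, 2.0, 7.0]
Y = [3.0, 1.0, 5.0, 2.0]
Z = [2.0, 2.5, 0.5, 4.0]
c = 3.0

checks = {
    "Cov(X, X) >= 0": cov(X, X) >= 0,
    "Cov(X, Y) = Cov(Y, X)": math.isclose(cov(X, Y), cov(Y, X)),
    "Cov(cX, Y) = c Cov(X, Y)": math.isclose(cov([c * x for x in X], Y), c * cov(X, Y)),
    "Cov(X, c) = 0": math.isclose(cov(X, [c] * len(X)), 0.0, abs_tol=1e-12),
    "Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z)": math.isclose(
        cov([x + y for x, y in zip(X, Y)], Z), cov(X, Z) + cov(Y, Z)
    ),
}

for name, holds in checks.items():
    print(name, holds)
```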

Comparing

With the basics of covariance covered, the tables below compare it with two related statistical measures: correlation and variance.

Covariance vs Correlation

| Basis | Covariance | Correlation |
|---|---|---|
| Measures Direction or Strength | Measures the direction of the linear relationship between two variables | Measures both the strength and the direction of the linear relationship between two variables |
| Range | Any real value (positive, negative, or zero) | Between -1 and 1 |
| Unit | Product of the units of the variables | Unitless |
| Denotation | Cov(x, y) | Corr(x, y) = ${\dfrac{Cov\left( x,y\right) }{\sigma _{x}\sigma _{y}}}$ |
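The unit and range rows can be illustrated with a short sketch: changing the unit of one variable rescales the covariance but leaves the correlation unchanged. The data are made up, and `cov` and `corr` are hypothetical helper names that apply the population formulas.

```python
import math

def cov(a, b):
    """Population covariance of two equal-length lists (illustrative helper)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n

def corr(a, b):
    """Correlation: covariance divided by the product of standard deviations."""
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

# Made-up data for illustration only
height_m = [1.50, 1.60, 1.70, 1.80]
weight_kg = [50.0, 58.0, 63.0, 75.0]
height_cm = [100 * h for h in height_m]  # same heights, different unit

# Covariance grows by a factor of 100 when metres become centimetres
print(cov(height_m, weight_kg), cov(height_cm, weight_kg))

# Correlation is the same in both units (up to floating-point rounding)
print(corr(height_m, weight_kg), corr(height_cm, weight_kg))
```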

Covariance vs Variance

| Basis | Covariance | Variance |
|---|---|---|
| Measures | Measures the joint variability of two variables. It determines how two variables change together. | Measures the variability of a single variable. It determines how far a variable deviates from its mean value. |
| Range | Can be positive, negative, or zero | Always non-negative |
| Unit | Product of the units of the variables | Square of the unit of the variable |
| Denotation | Cov(x, y) | ${\sigma^{2}}$, ${s^{2}}$, or Var(x) |

Covariance Matrix

A square matrix that summarizes the covariances between multiple variables is called a covariance matrix. 

For n variables, the matrix is:

${\Sigma =\begin{bmatrix} Cov\left( x_{1},x_{1}\right) & Cov\left( x_{1},x_{2}\right) & \ldots & Cov\left( x_{1},x_{n}\right) \\ Cov\left( x_{2},x_{1}\right) & Cov\left( x_{2},x_{2}\right) & \ldots & Cov\left( x_{2},x_{n}\right) \\ \vdots & \vdots & \ddots & \vdots \\ Cov\left( x_{n},x_{1}\right) & Cov\left( x_{n},x_{2}\right) & \ldots & Cov\left( x_{n},x_{n}\right) \end{bmatrix}}$

The covariance matrix has the following properties:

  • It is symmetric and positive semi-definite
  • Its main diagonal (sometimes called the primary diagonal) contains the variances of the individual variables
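A covariance matrix can be assembled by computing the covariance of every pair of variables. The sketch below uses made-up data and the population formula. The names `cov` and `covariance_matrix` are illustrative, and only the symmetry and diagonal properties are checked here (not positive semi-definiteness).

```python
import math
import statistics

def cov(a, b):
    """Population covariance of two equal-length lists (illustrative helper)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n

def covariance_matrix(variables):
    """Matrix whose (i, j) entry is Cov(variables[i], variables[j])."""
    return [[cov(a, b) for b in variables] for a in variables]

# Made-up data: each inner list holds the observations of one variable
data = [
    [2.0, 4.0, 6.0],  # x1
    [3.0, 5.0, 7.0],  # x2
    [9.0, 4.0, 2.0],  # x3
]
sigma = covariance_matrix(data)
size = len(sigma)

# Symmetry: entry (i, j) equals entry (j, i)
is_symmetric = all(
    sigma[i][j] == sigma[j][i] for i in range(size) for j in range(size)
)

# Diagonal: entry (i, i) matches the population variance of variable i
diagonal_is_variance = all(
    math.isclose(sigma[i][i], statistics.pvariance(data[i])) for i in range(size)
)

print(is_symmetric, diagonal_is_variance)  # True True
```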

Solved Examples

Example 1: For the following data with two variables, X and Y, find their covariance.

| X | Y |
|---|---|
| 1 | 2 |
| 3 | 4 |
| 5 | 6 |
| 7 | 8 |

Here,

Mean of X is ${\dfrac{1+3+5+7}{4}}$ = ${\dfrac{16}{4}}$ = 4

Mean of Y is ${\dfrac{2+4+6+8}{4}}$ = ${\dfrac{20}{4}}$ = 5

Now,

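| X | ${X-\overline{X}}$ | Y | ${Y-\overline{Y}}$ | ${\left( X-\overline{X}\right) \left( Y-\overline{Y}\right)}$ |
|---|---|---|---|---|
| 1 | -3 | 2 | -3 | 9 |
| 3 | -1 | 4 | -1 | 1 |
| 5 | 1 | 6 | 1 | 1 |
| 7 | 3 | 8 | 3 | 9 |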

The sum of the product of deviations is 9 + 1 + 1 + 9 = 20

Treating the four observations as the entire population, the covariance is Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{N}}$

Here, N = 4, so

Cov(X, Y) = ${\dfrac{20}{4}}$ = 5

Thus, the population covariance is 5.

Example 2: Consider two identical variables, X and Y:

| X | Y |
|---|---|
| 1 | 1 |
| 3 | 3 |
| 5 | 5 |

Show that covariance equals variance when X and Y are identical.

Since X = Y,

Means of X and Y are ${\dfrac{1+3+5}{3}}$ = ${\dfrac{9}{3}}$ = 3

Now,

| X | ${X-\overline{X}}$ | Y | ${Y-\overline{Y}}$ | ${\left( X-\overline{X}\right) \left( Y-\overline{Y}\right)}$ |
|---|---|---|---|---|
| 1 | -2 | 1 | -2 | 4 |
| 3 | 0 | 3 | 0 | 0 |
| 5 | 2 | 5 | 2 | 4 |

The sum of the product of deviations is 4 + 0 + 4 = 8

The sum of squared deviations of X is ${(-2)^{2} + (0)^{2} + (2)^{2}}$ = 8

As we know, 

The covariance is Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{N}}$

The variance is Var(x) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{N}}$

Here, 

Cov(X, Y) = ${\dfrac{8}{3}}$ = 2.67

Var(X) = ${\dfrac{8}{3}}$ = 2.67

Thus, covariance equals variance, which is 2.67.
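This result can be cross-checked in Python. The sketch computes the population covariance by hand and compares it with `statistics.pvariance` from the standard library, which returns the population variance.

```python
import math
import statistics

xs = [1, 3, 5]
ys = [1, 3, 5]  # identical to xs

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Population covariance: sum of products of deviations divided by N
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Population variance of X from the standard library (divides by N)
var_x = statistics.pvariance(xs)

print(round(cov_xy, 2), round(var_x, 2))  # 2.67 2.67
print(math.isclose(cov_xy, var_x))        # True
```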

Example 3: A financial analyst is studying the relationship between hours of study and exam scores for a group of students to determine if there is a positive association between them. The following data was collected from a sample of 5 students:

| Hours of Study (X) | Exam Scores (Y) |
|---|---|
| 2 | 60 |
| 4 | 65 |
| 6 | 70 |
| 8 | 75 |
| 10 | 80 |

Calculate the sample covariance between hours of study (X) and exam scores (Y).

As we know, the formula for sample covariance is:

Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{n-1}}$

Here,

${\overline{X}}$ = ${\dfrac{2+4+6+8+10}{5}}$ = ${\dfrac{30}{5}}$ = 6

${\overline{Y}}$ = ${\dfrac{60+65+70+75+80}{5}}$ = ${\dfrac{350}{5}}$ = 70

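| X | ${X-\overline{X}}$ | Y | ${Y-\overline{Y}}$ | ${\left( X-\overline{X}\right) \left( Y-\overline{Y}\right)}$ |
|---|---|---|---|---|
| 2 | -4 | 60 | -10 | 40 |
| 4 | -2 | 65 | -5 | 10 |
| 6 | 0 | 70 | 0 | 0 |
| 8 | 2 | 75 | 5 | 10 |
| 10 | 4 | 80 | 10 | 40 |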

The sum of the products of deviations is 40 + 10 + 0 + 10 + 40 = 100

The sample size n = 5

Thus, the sample covariance is Cov(X, Y) = ${\dfrac{100}{5-1}}$ = ${\dfrac{100}{4}}$ = 25

The sample covariance between hours of study and exam scores is 25, which means if the hours of study increase, the exam scores tend to increase as well.
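The arithmetic in this example can be confirmed with a short Python sketch. The variable names are illustrative. The last line shows, for comparison, what the population formula (dividing by n instead of n - 1) would give for the same data.

```python
hours = [2, 4, 6, 8, 10]
scores = [60, 65, 70, 75, 80]
n = len(hours)

mean_hours = sum(hours) / n    # 6.0
mean_scores = sum(scores) / n  # 70.0

# Sum of products of deviations from the means
total = sum((h - mean_hours) * (s - mean_scores) for h, s in zip(hours, scores))

sample_cov = total / (n - 1)  # data is a sample, so divide by n - 1
population_cov = total / n    # shown only for comparison

print(total)           # 100.0
print(sample_cov)      # 25.0
print(population_cov)  # 20.0
```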
