Last modified on January 2nd, 2025
Covariance is a statistical measure of how two random variables change relative to each other. Depending on its sign, a covariance is described as positive or negative.
It is used to analyze stock market trends, detect patterns in machines, and quantify the relationship between income and expenditure.
When two variables have positive covariance, they tend to move in the same direction. This means if one variable increases, the other tends to increase as well, and vice versa.
Let us consider the relationship between weather temperature and ice cream sales. As the temperature rises, ice cream sales also increase, showing a positive covariance.
When two variables have negative covariance, they move in opposite directions. This means if one variable increases, the other tends to decrease, and vice versa.
For example, we can imagine the relationship between the speed of a car and the time taken to reach a destination. As the speed increases, the travel time decreases, showing a negative covariance.
A covariance close to zero means the variables are not linearly related.
Now, let us observe the number of books read by a student and their shoe size. There is likely no linear relationship between these variables, resulting in a covariance close to zero.
Covariance is classified as population covariance or sample covariance, depending on whether the data cover the entire population or only a sample drawn from it.
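If X and Y are two random variables, their covariance can be calculated using the following formulas:
Population covariance:
Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{N}}$
Here, ${x_{i}}$ and ${y_{i}}$ are the individual data values, ${\overline{x}}$ and ${\overline{y}}$ are the means of X and Y, and N is the number of data pairs in the population.
Sample covariance:
Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{n-1}}$
Here, the symbols have the same meaning, and n is the number of data pairs in the sample.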
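The two formulas translate directly into code. The following is a minimal Python sketch rather than a reference implementation; the function names `population_covariance` and `sample_covariance` are illustrative choices and do not come from any library.

```python
def _sum_of_products(xs, ys):
    """Sum of (x_i - mean_x) * (y_i - mean_y) over paired observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if not xs:
        raise ValueError("at least one observation is required")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def population_covariance(xs, ys):
    """Divide by N: the data are treated as the whole population."""
    return _sum_of_products(xs, ys) / len(xs)


def sample_covariance(xs, ys):
    """Divide by n - 1: the data are treated as a sample."""
    if len(xs) < 2:
        raise ValueError("sample covariance needs at least two observations")
    return _sum_of_products(xs, ys) / (len(xs) - 1)
```

The only difference between the two functions is the divisor, which mirrors the only difference between the two formulas.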
Let us find the population and sample covariance of the given dataset:
X | Y |
---|---|
2 | 3 |
4 | 5 |
6 | 7 |
Calculating the Means
${\overline{x}}$ = ${\dfrac{2+4+6}{3}}$ = 4
${\overline{y}}$ = ${\dfrac{3+5+7}{3}}$ = 5
Calculating Deviations and Their Products
${x_{i}-\overline{x}}$ | ${y_{i}-\overline{y}}$ | ${\left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right)}$ |
---|---|---|
-2 | -2 | 4 |
0 | 0 | 0 |
2 | 2 | 4 |
Calculating the Sum of Products
${\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right)}$ = 4 + 0 + 4 = 8
Using the Formulas
Now, the population covariance is
Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{N}}$
= ${\dfrac{8}{N}}$
= ${\dfrac{8}{3}}$ ≈ 2.67
The sample covariance is
Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{n-1}}$
= ${\dfrac{8}{n-1}}$
= ${\dfrac{8}{3-1}}$ = ${\dfrac{8}{2}}$ = 4
Thus, the population covariance is approximately 2.67, and the sample covariance is 4.
Here, the positive covariance values indicate a direct relationship between X and Y.
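Because both formulas share the same numerator, the two results differ only by the factor N/(N - 1) when the same three data pairs are used. As a quick consistency check on the numbers above:

${Cov_{sample}\left( x,y\right) =\dfrac{N}{N-1}\times Cov_{population}\left( x,y\right) =\dfrac{3}{2}\times \dfrac{8}{3}=4}$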
If X, Y, and Z are the random variables, and c is a constant, then the covariance follows the following properties:
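The identities usually listed for covariance are Cov(X, X) = Var(X), Cov(X, Y) = Cov(Y, X), Cov(cX, Y) = c Cov(X, Y), Cov(X + c, Y) = Cov(X, Y), and Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z). This is the standard set, assumed here rather than taken from the text. The Python sketch below checks each one numerically on made-up data; the helpers `cov` and `var` and the sample values are illustrative.

```python
import math


def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def var(xs):
    """Population variance, written out independently of cov()."""
    n = len(xs)
    mean_x = sum(xs) / n
    return sum((x - mean_x) ** 2 for x in xs) / n


# Made-up data, used only to exercise the identities.
X = [2.0, 4.0, 6.0, 9.0]
Y = [3.0, 5.0, 4.0, 8.0]
Z = [1.0, 0.0, 2.0, 5.0]
c = 3.0

# Each entry pairs the left-hand side with the right-hand side.
checks = {
    "Cov(X, X) = Var(X)": (cov(X, X), var(X)),
    "Cov(X, Y) = Cov(Y, X)": (cov(X, Y), cov(Y, X)),
    "Cov(cX, Y) = c Cov(X, Y)": (cov([c * x for x in X], Y), c * cov(X, Y)),
    "Cov(X + c, Y) = Cov(X, Y)": (cov([x + c for x in X], Y), cov(X, Y)),
    "Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z)": (
        cov([x + y for x, y in zip(X, Y)], Z),
        cov(X, Z) + cov(Y, Z),
    ),
}

for name, (lhs, rhs) in checks.items():
    assert math.isclose(lhs, rhs, abs_tol=1e-9), name
    print(f"{name}: {lhs:.4f} = {rhs:.4f}")
```

A numerical check on one dataset is not a proof, but it is a convenient way to catch a misremembered identity.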
With the basics of covariance in place, the tables below compare it with two related statistical measures: correlation and variance.
Basis | Covariance | Correlation |
---|---|---|
Measures Direction or Strength | Measures only the direction of the linear relationship between variables | Measures both the strength and the direction of the linear relationship between variables |
Range | Any value (positive, negative, or zero) | Between -1 and 1 |
Unit | Product of units of the variables | Unitless |
Denotation | Cov(x, y) | Corr(x, y) = ${\dfrac{Cov\left( x,y\right) }{\sigma _{x}\sigma _{y}}}$ |
Basis | Covariance | Variance |
---|---|---|
Measures | Measures the joint variability of two variables. It determines how two variables change together. | Measures the variability of a single variable. It determines how far a variable deviates from its mean value. |
Range | Can be either positive, negative, or zero | Always non-negative |
Unit | Product of the units of the variables | Square of the unit of the variable |
Denotation | Cov(x, y) | ${\sigma ^{2}}$, ${s^{2}}$, or Var(x) |
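The contrast in units and range can be seen numerically. In this Python sketch (the helper names and the height and weight figures are made up for illustration), converting X from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged, and the variance stays non-negative throughout.

```python
import math


def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def var(xs):
    """Variance is the covariance of a variable with itself."""
    return cov(xs, xs)


def corr(xs, ys):
    """Correlation: covariance divided by the product of standard deviations."""
    return cov(xs, ys) / math.sqrt(var(xs) * var(ys))


# Made-up measurements for four people.
heights_m = [1.50, 1.60, 1.70, 1.80]
weights_kg = [52.0, 58.0, 69.0, 75.0]
heights_cm = [h * 100 for h in heights_m]  # same data, different unit

print("cov (m, kg): ", cov(heights_m, weights_kg))
print("cov (cm, kg):", cov(heights_cm, weights_kg))
print("corr (m, kg): ", corr(heights_m, weights_kg))
print("corr (cm, kg):", corr(heights_cm, weights_kg))
```

This is why correlation is usually preferred when comparing relationships measured in different units.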
A square matrix that summarizes the covariances between multiple variables is called a covariance matrix.
For n variables, the matrix is:
${\Sigma =\begin{bmatrix} Cov\left( x_{1},x_{1}\right) & Cov\left( x_{1},x_{2}\right) & \ldots & Cov\left( x_{1},x_{n}\right) \\ Cov\left( x_{2},x_{1}\right) & Cov\left( x_{2},x_{2}\right) & \ldots & Cov\left( x_{2},x_{n}\right) \\ \vdots & \vdots & \ddots & \vdots \\ Cov\left( x_{n},x_{1}\right) & Cov\left( x_{n},x_{2}\right) & \ldots & Cov\left( x_{n},x_{n}\right) \end{bmatrix}}$
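Here, each diagonal entry ${Cov\left( x_{i},x_{i}\right)}$ is the variance of ${x_{i}}$, and the matrix is symmetric because ${Cov\left( x_{i},x_{j}\right)}$ = ${Cov\left( x_{j},x_{i}\right)}$.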
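A covariance matrix can be built by applying the pairwise formula to every pair of variables. The Python sketch below uses the population formula and plain lists; `covariance_matrix` is an illustrative name, the third variable is made up, and statistical libraries provide their own routines with their own default divisors.

```python
def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def covariance_matrix(variables):
    """Return the matrix whose (i, j) entry is cov(variables[i], variables[j]).

    `variables` is a list of equal-length sequences, one per variable.
    """
    return [[cov(a, b) for b in variables] for a in variables]


data = [
    [2, 4, 6],  # x1
    [3, 5, 7],  # x2
    [9, 4, 2],  # x3 (made-up values)
]

sigma = covariance_matrix(data)
for row in sigma:
    print(row)
```

The diagonal of `sigma` holds the variances of the individual variables, and the entry for any pair appears twice, once on each side of the diagonal.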
Example 1: For the following data with two variables, X and Y, find their covariance.
X | Y |
---|---|
1 | 2 |
3 | 4 |
5 | 6 |
7 | 8 |
Here,
Mean of X is ${\dfrac{1+3+5+7}{4}}$ = ${\dfrac{16}{4}}$ = 4
Mean of Y is ${\dfrac{2+4+6+8}{4}}$ = ${\dfrac{20}{4}}$ = 5
Now,
X | ${X-\overline{X}}$ | Y | ${Y-\overline{Y}}$ | ${\left( X-\overline{X}\right) \left( Y-\overline{Y}\right)}$ |
---|---|---|---|---|
1 | -3 | 2 | -3 | 9 |
3 | -1 | 4 | -1 | 1 |
5 | 1 | 6 | 1 | 1 |
7 | 3 | 8 | 3 | 9 |
The sum of the product of deviations is 9 + 1 + 1 + 9 = 20
Treating the four data pairs as the whole population, the covariance is Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{N}}$
Here,
Cov(X, Y) = ${\dfrac{20}{4}}$ = 5
Thus, the covariance is 5.
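As a cross-check, every Y value in this dataset equals the corresponding X value plus 1, and adding a constant does not change covariance, so the result should match the population variance of X:

${Cov\left( X,Y\right) =Cov\left( X,X+1\right) =Var\left( X\right) =\dfrac{\left( -3\right) ^{2}+\left( -1\right) ^{2}+1^{2}+3^{2}}{4}=\dfrac{20}{4}=5}$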
Example 2: Consider two identical variables, X and Y:
X | Y |
---|---|
1 | 1 |
3 | 3 |
5 | 5 |
Show that covariance equals variance when X and Y are identical.
Since X = Y,
The mean of both X and Y is ${\dfrac{1+3+5}{3}}$ = ${\dfrac{9}{3}}$ = 3
Now,
X | ${X-\overline{X}}$ | Y | ${Y-\overline{Y}}$ | ${\left( X-\overline{X}\right) \left( Y-\overline{Y}\right)}$ |
---|---|---|---|---|
1 | -2 | 1 | -2 | 4 |
3 | 0 | 3 | 0 | 0 |
5 | 2 | 5 | 2 | 4 |
The sum of the product of deviations is 4 + 0 + 4 = 8
The sum of squared deviations of X is ${\left( -2\right) ^{2}+\left( 0\right) ^{2}+\left( 2\right) ^{2}}$ = 4 + 0 + 4 = 8
As we know,
The covariance is Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{N}}$
The variance is Var(x) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) ^{2}}{N}}$
Here,
Cov(X, Y) = ${\dfrac{8}{3}}$ ≈ 2.67
Var(X) = ${\dfrac{8}{3}}$ ≈ 2.67
Thus, the covariance equals the variance; both are ${\dfrac{8}{3}}$, or approximately 2.67.
Example 3: An analyst is studying the relationship between hours of study and exam scores for a group of students to determine whether there is a positive association between them. The following data were collected from a sample of 5 students:
Hours of Study (X) | Exam Scores (Y) |
---|---|
2 | 60 |
4 | 65 |
6 | 70 |
8 | 75 |
10 | 80 |
Calculate the sample covariance between hours of study (X) and exam scores (Y).
As we know, the formula for sample covariance is:
Cov(x, y) = ${\dfrac{\sum \left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{n-1}}$
Here,
${\overline{X}}$ = ${\dfrac{2+4+6+8+10}{5}}$ = ${\dfrac{30}{5}}$ = 6
${\overline{Y}}$ = ${\dfrac{60+65+70+75+80}{5}}$ = ${\dfrac{350}{5}}$ = 70
X | ${X-\overline{X}}$ | Y | ${Y-\overline{Y}}$ | ${\left( X-\overline{X}\right) \left( Y-\overline{Y}\right)}$ |
---|---|---|---|---|
2 | -4 | 60 | -10 | 40 |
4 | -2 | 65 | -5 | 10 |
6 | 0 | 70 | 0 | 0 |
8 | 2 | 75 | 5 | 10 |
10 | 4 | 80 | 10 | 40 |
The sum of the product of deviations is 40 + 10 + 0 + 10 + 40 = 100
The sample size n = 5
Thus, the sample covariance is Cov(X, Y) = ${\dfrac{100}{5-1}}$ = ${\dfrac{100}{4}}$ = 25
The sample covariance between hours of study and exam scores is 25, which means if the hours of study increase, the exam scores tend to increase as well.
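The calculation can be reproduced in a few lines of Python. This sketch writes the sample formula out by hand using the data from the table; the variable names are illustrative.

```python
hours = [2, 4, 6, 8, 10]
scores = [60, 65, 70, 75, 80]

n = len(hours)
mean_hours = sum(hours) / n    # 6.0
mean_scores = sum(scores) / n  # 70.0

# Product of deviations for each student, matching the last table column.
products = [(x - mean_hours) * (y - mean_scores) for x, y in zip(hours, scores)]

# Sample covariance: divide the sum of products by n - 1.
sample_cov = sum(products) / (n - 1)

print(products)    # [40.0, 10.0, 0.0, 10.0, 40.0]
print(sample_cov)  # 25.0
```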