# ML – Covariance and Correlation

## Covariance

Mean of a random variable X, also called the expectation of X:

\mu = E(X) = \frac{1}{N} \sum_{i=0}^{i=N-1}x_{i}

Variance of the population:

\sigma^2(X) = E((X - E(X))^2) = \frac{1}{N} \sum_{i=0}^{i=N-1}(x_{i} - \mu)^2

This can also be written in matrix notation, treating X as a column vector of the N samples:

\sigma^2(X) = \frac{1}{N} (X - \mu)^T (X - \mu)
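As a quick check, the sum form and the inner product (X - \mu)^T (X - \mu), once divided by N, should give the same number. The sketch below assumes X is stored as a 1-D NumPy array and uses the population (1/N) normalisation; the sample values are made up for illustration.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
N = x.shape[0]
mu = x.sum() / N

# Sum form: average squared deviation from the mean
var_sum = ((x - mu) ** 2).sum() / N

# Inner-product form: (X - mu)^T (X - mu) / N
d = x - mu
var_dot = d @ d / N

# np.var uses the 1/N normalisation by default (ddof=0)
print(var_sum, var_dot, np.var(x))  # all three values are 4.0
```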

Covariance between random variables X and Y:

cov(X, Y) = E((X-\mu_x)(Y-\mu_y ))

For N paired samples:

cov(X, Y) = \frac{1}{N}\sum_{i=0}^{i=N-1}(x_i-\mu_x)(y_i-\mu_y)

This can also be written in matrix notation, with X and Y as column vectors of the N samples:

cov(X, Y) = \frac{1}{N} (X - \mu_x)^T (Y - \mu_y)
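The same comparison can be made for covariance. This is a minimal sketch with made-up paired samples, assuming 1-D arrays and the 1/N normalisation. Note that `np.cov` divides by N - 1 unless `bias=True` is passed.

```python
import numpy as np

# Hypothetical paired samples, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 5.0])
N = x.shape[0]

dx = x - x.mean()
dy = y - y.mean()

cov_sum = (dx * dy).sum() / N   # element-wise sum form
cov_dot = dx @ dy / N           # inner-product form

# np.cov returns a 2x2 matrix; entry [0, 1] is cov(X, Y)
cov_np = np.cov(x, y, bias=True)[0, 1]
print(cov_sum, cov_dot, cov_np)
```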

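In a sense, covariance indicates how X and Y vary together: it is positive when they tend to fall on the same side of their means, negative when they tend to fall on opposite sides, and close to zero when there is no linear relationship. Its magnitude also depends on how much each variable varies about its own mean.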
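To make the sign and scale of covariance concrete, here is a small sketch. The helper name `pop_cov` and the data are assumptions for illustration; it uses the 1/N form of the formula.

```python
import numpy as np

def pop_cov(x, y):
    """Population covariance (1/N) of two 1-D arrays; hypothetical helper."""
    return np.mean((x - x.mean()) * (y - y.mean()))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

print(pop_cov(x, 2 * x))        # positive: y rises with x
print(pop_cov(x, -2 * x))       # negative: y falls as x rises
print(pop_cov(x, x))            # equals the variance of x
print(pop_cov(100 * x, 2 * x))  # 100 times the first value: depends on units
```

Rescaling x changes the covariance even though the relationship between the variables is the same, so the raw value is hard to compare across datasets.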

## Correlation

Covariance is unbounded and depends on the units of X and Y. To bound it to the range [-1, 1], normalize it by the product of the two standard deviations.

Correlation between X and Y:

corr(X, Y) = \frac{cov(X, Y)}{\sigma(X) \sigma(Y)}

corr(X, Y) = \frac{1}{N \sigma(X) \sigma(Y)} \sum_{i=0}^{i=N-1}(x_i - \mu_x)(y_i - \mu_y)
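The following sketch illustrates two properties of this formula: the result stays within [-1, 1], and rescaling one variable does not change it. The helper name `pop_corr`, the seed, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pop_corr(x, y):
    """Pearson correlation of two 1-D arrays; hypothetical helper."""
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    # np.std uses the 1/N normalisation by default, matching cov_xy
    return cov_xy / (np.std(x) * np.std(y))

rng = np.random.default_rng(0)            # fixed seed, arbitrary choice
x = rng.random(50)
y = 3.0 * x + rng.normal(0.0, 0.1, 50)    # roughly linear in x, plus noise

r = pop_corr(x, y)
r_scaled = pop_corr(100 * x, y)           # rescaling x leaves r unchanged
r_numpy = np.corrcoef(x, y)[0, 1]         # NumPy's value for comparison

print(r, r_scaled, r_numpy)
```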

Implementation:

import numpy as np

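def cov (x, y):
  """
  Calculate the population covariance of two row vectors of shape (1, N)
  """
  # Center each variable on its own mean
  x = x - np.mean (x)
  y = y - np.mean (y)
  N = x.shape[1]
  # (1, N) . (N, 1) gives a (1, 1) array holding the averaged sum of products
  z = np.dot (x, y.T) / N

  return z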

def corr (x, y):
  """
  Calculate correlation
  """
  corr_X_Y = cov (x, y) / (np.std(x)*np.std(y))  

  return corr_X_Y  

def main ():
  x = np.random.rand (1, 10)
  y = np.random.rand (1, 10)

  # My implementation
  cov_X_Y = cov (x, y)
  corr_X_Y = corr (x, y)

  print ("My implementation")
  print ("Cov: {}\nCorr:{}".format (cov_X_Y, corr_X_Y))
  print ()

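  # Numpy implementation
  # np.cov divides by N - 1 by default; bias=True divides by N, as cov () does.
  # Both calls return 2x2 matrices; the off-diagonal entries are the X, Y values.
  x_corr = np.corrcoef (x, y)
  x_cov = np.cov (x, y, bias=True)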

  print ("Numpy output")
  print ("Cov: {}\n\nCorr:{}".format (x_cov, x_corr))

if __name__ == '__main__':
  main ()
