# Linear Regression

October 13, 2018 admin

Consider a situation where you need to predict a value from previously observed data, for example the rent of a house from its number of rooms, its area, and so on. Given a new input, the model should be able to predict the corresponding rent.

Let X hold the independent variables (for example, the number of rooms and the area), with one row per observation, and let Y hold the corresponding values of the dependent variable (the rent). The linear model is

$Y = X W$

W holds the coefficients (also called weights), which need to be learned or determined.
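As a small illustration of the model $Y = XW$, the sketch below builds X and computes Y with NumPy for the rent example. The numbers and the weights are made up for illustration, and the extra column of ones (so that W can include an intercept) is an assumption that the text above does not state.

```python
import numpy as np

# Made-up toy data: each row is one house (rooms, area in square metres).
features = np.array([
    [1.0, 30.0],
    [2.0, 45.0],
    [3.0, 70.0],
    [4.0, 90.0],
])

# Assumption: prepend a column of ones so the first weight acts as an intercept.
X = np.hstack([np.ones((features.shape[0], 1)), features])

# Hypothetical weights: base rent, rent per room, rent per square metre.
W = np.array([100.0, 50.0, 10.0])

# Predicted rent for every house at once: Y = X W.
Y = X @ W
print(Y)
```

In practice W is unknown and has to be recovered from observed pairs of X and Y, which is what the rest of the post is about.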

## Analytical Method

If X is a square, full-rank matrix, you can simply invert it to find the weights.

$W = X^{-1}Y$
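A minimal sketch of the square, full-rank case, using made-up numbers (three houses, three weights). It computes the literal $X^{-1}Y$ and also uses `np.linalg.solve`, which solves the same system without forming the inverse explicitly and is usually the numerically safer choice.

```python
import numpy as np

# Made-up square system: 3 samples, 3 weights (intercept, rooms, area).
X = np.array([
    [1.0, 1.0, 30.0],
    [1.0, 2.0, 45.0],
    [1.0, 3.0, 70.0],
])
Y = np.array([450.0, 650.0, 950.0])

# Full rank (rank equal to the number of rows) means the inverse exists.
print(np.linalg.matrix_rank(X) == X.shape[0])

W_inv = np.linalg.inv(X) @ Y      # literal W = X^{-1} Y
W_solve = np.linalg.solve(X, Y)   # same system, solved without forming X^{-1}
print(W_solve)
```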

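If $X^{-1}$ doesn’t exist (for example, X is not square because there are more observations than weights), use the projection theorem, which gives the pseudo-inverse (least-squares) solution. The formula below requires $X^{T}X$ to be invertible, that is, X must have full column rank.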

$W = (X^{T}X)^{-1}X^{T}Y$

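In MATLAB or Octave notation, the same least-squares solution is written with the backslash operator:

$W = X \setminus Y$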
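The sketch below compares three ways of computing the least-squares weights in NumPy on a made-up overdetermined problem: the normal equations, the Moore-Penrose pseudo-inverse, and `np.linalg.lstsq`, which is the closest NumPy analogue of the backslash form. The data, the noise level, and the name `W_true` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up overdetermined problem: 50 samples, 3 weights (first column is an intercept).
X = np.hstack([np.ones((50, 1)), rng.uniform(0.0, 5.0, size=(50, 2))])
W_true = np.array([2.0, -1.0, 0.5])
Y = X @ W_true + 0.1 * rng.standard_normal(50)

# Normal equations: solve (X^T X) W = X^T Y rather than inverting X^T X.
W_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# Moore-Penrose pseudo-inverse applied to Y.
W_pinv = np.linalg.pinv(X) @ Y

# Dedicated least-squares solver.
W_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(W_normal, W_pinv, W_lstsq)
```

All three agree when X has full column rank. When it does not, $X^{T}X$ is singular and the normal-equations route fails, while `pinv` and `lstsq` still return the minimum-norm least-squares solution.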

## Iterative Method

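Let $Y_{i}^{'} = W^TX_{i}$ be the predicted output for the i-th sample based on W, where $X_{i}$ is the i-th row of X written as a column vector (in matrix form, $Y^{'} = XW$). Let the loss/error function be defined by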

$L = \frac{1}{2} \sum_{i=0}^{i=n-1} (Y_i - Y_i^{'})^2$

$L = \frac{1}{2} (Y - Y^{'})^{T} (Y - Y^{'})$

$L = \frac{1}{2} \sum_{i=0}^{i=n-1} (Y_{i} - W^TX_{i})^2$
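As a quick check that the sum form and the matrix form of the loss describe the same quantity, the sketch below evaluates both on made-up random data. The array shapes and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 3))   # made-up inputs, one row per sample
Y = rng.standard_normal(6)        # made-up targets
W = rng.standard_normal(3)        # arbitrary weights

# Sum form: one squared residual per sample, using W^T X_i.
loss_sum = 0.5 * sum((Y[i] - W @ X[i]) ** 2 for i in range(len(Y)))

# Matrix form: 1/2 (Y - Y')^T (Y - Y') with Y' = X W.
residual = Y - X @ W
loss_matrix = 0.5 * residual @ residual

print(loss_sum, loss_matrix)
```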

$\frac{dL}{dW} = \frac{1}{2} \sum_{i=0}^{i=n-1} 2 (Y_{i} - Y_{i}^{'}) (- X_{i})$

$\frac{dL}{dW} = - \sum_{i=0}^{i=n-1} X_{i}(Y_i - Y_{i}^{'})$
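The derivative, summed over all samples, can be written in matrix form as $-X^{T}(Y - XW)$. The sketch below compares that expression with a central finite-difference approximation of the loss on made-up random data. The helper names `loss` and `gradient` and the step `eps` are illustrative choices, not part of the original text.

```python
import numpy as np

def loss(W, X, Y):
    """Half the sum of squared residuals."""
    residual = Y - X @ W
    return 0.5 * residual @ residual

def gradient(W, X, Y):
    """Derivative summed over all samples: -sum_i X_i (Y_i - Y'_i) = -X^T (Y - X W)."""
    return -X.T @ (Y - X @ W)

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 3))   # made-up inputs
Y = rng.standard_normal(8)        # made-up targets
W = rng.standard_normal(3)        # arbitrary point at which to check the derivative

# Central finite differences, one weight at a time.
eps = 1e-6
numeric = np.zeros_like(W)
for j in range(W.size):
    step = np.zeros_like(W)
    step[j] = eps
    numeric[j] = (loss(W + step, X, Y) - loss(W - step, X, Y)) / (2 * eps)

analytic = gradient(W, X, Y)
print(numeric, analytic)
```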

The objective is to minimize the loss function. Because L is convex in W, its minimum is at the point where $dL/dW = 0$.

If you repeatedly move W a small step in the direction opposite to the derivative $\frac{dL}{dW}$, the weights converge to the pseudo-inverse solution, provided the step size $\alpha$ (the learning rate) is small enough.

$W = W - \alpha \frac{dL}{dW}$
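A minimal sketch of this update rule on made-up data, using the full-data derivative $-X^{T}(Y - XW)$ at every step. The choice of $\alpha$ as one over the squared largest singular value of X is an assumption made here so that the iteration is guaranteed to be stable. The post itself does not say how to pick $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up data: 100 samples, intercept column plus two features.
X = np.hstack([np.ones((100, 1)), rng.standard_normal((100, 2))])
Y = X @ np.array([1.0, 2.0, -3.0]) + 0.05 * rng.standard_normal(100)

# Assumed step size: 1 / (largest eigenvalue of X^T X), which keeps the update stable.
alpha = 1.0 / np.linalg.norm(X, 2) ** 2

W = np.zeros(X.shape[1])
for _ in range(500):
    grad = -X.T @ (Y - X @ W)   # dL/dW over all samples
    W = W - alpha * grad        # step against the derivative

# Reference solution from the analytical method.
W_pinv = np.linalg.pinv(X) @ Y
print(W, W_pinv)
```

With a step size that is too large the iterates diverge instead, so in practice $\alpha$ is usually tuned or scaled to the data.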

There are multiple update strategies: online/stochastic update (one sample per update), batch update (all samples per update), and mini-batch update (a small subset of samples per update).
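The three strategies differ only in how many samples feed each update, so they can be sketched with a single hypothetical helper, `fit`, that takes a `batch_size`. The data is made up and noise-free so that every strategy can reach the same weights. The learning rates and epoch counts are assumptions picked by hand for this toy problem.

```python
import numpy as np

def fit(X, Y, batch_size, alpha, epochs, seed=0):
    """Gradient descent where each update uses `batch_size` samples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(n)             # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Derivative summed over the samples in this batch only.
            grad = -X[idx].T @ (Y[idx] - X[idx] @ W)
            W = W - alpha * grad
    return W

rng = np.random.default_rng(4)
X = np.hstack([np.ones((200, 1)), rng.standard_normal((200, 2))])
W_true = np.array([1.0, 2.0, -3.0])
Y = X @ W_true                                  # noise-free, made-up targets

# The summed derivative grows with the batch size, so larger batches use a smaller alpha.
W_online = fit(X, Y, batch_size=1, alpha=0.01, epochs=50)      # online/stochastic
W_mini = fit(X, Y, batch_size=20, alpha=0.01, epochs=200)      # mini-batch
W_batch = fit(X, Y, batch_size=200, alpha=0.002, epochs=500)   # batch
print(W_online, W_mini, W_batch)
```

With noisy targets the stochastic and mini-batch variants typically hover around the least-squares solution rather than landing on it exactly, unless the learning rate is decreased over time.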


Copyright © 2026 Ravi's blog