Let’s first consider a situation where you need to predict something based on previous data, for example the rent of a house based on the number of rooms, the area, etc. When new data is provided, the model should be able to predict the rent for that input.
Let X represent the independent variables (for example, the number of rooms) and let Y represent the dependent variable (the rent).
\[Y = XW\]
W represents the coefficients (also called weights), which need to be learned or determined.
Analytical Method
If X is a square, full-rank matrix, you can simply take its inverse to find the weights.
\[W = X^{-1} Y\]
If $X^{-1}$ doesn’t exist (for example, when X is not square), use the projection theorem (pseudo-inverse).
\[X^T X W = X^T Y\]
\[W = (X^T X)^{-1} X^T Y\]
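The analytical method can be sketched in a few lines of NumPy. This is only an illustration: the toy data, the added bias column and the names `true_W`, `W_inverse`, `W_normal` and `W_pinv` are assumptions made for the example, with each row of X holding one house.

```python
import numpy as np

# Hypothetical toy data: each row is one house (rooms, area, constant 1 for a bias term).
X = np.array([
    [1.0, 30.0, 1.0],
    [2.0, 55.0, 1.0],
    [3.0, 70.0, 1.0],
    [3.0, 90.0, 1.0],
    [4.0, 120.0, 1.0],
])
# Rents generated from known weights so the recovered weights can be checked.
true_W = np.array([100.0, 10.0, 200.0])
Y = X @ true_W

# Square, full-rank case: with exactly 3 samples and 3 unknowns, X can be inverted.
X_square, Y_square = X[:3], Y[:3]
W_inverse = np.linalg.inv(X_square) @ Y_square

# General case: X is 5x3, so X^{-1} does not exist.
# Solve the normal equations (X^T X) W = X^T Y.
W_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# The same solution via the Moore-Penrose pseudo-inverse.
W_pinv = np.linalg.pinv(X) @ Y

print(W_inverse)
print(W_normal)
print(W_pinv)
```

All three should recover the weights used to generate the data. In practice `np.linalg.solve` or `np.linalg.lstsq` is usually preferred over forming an explicit inverse, since it tends to be numerically more stable.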
Iterative Method
Let $Y'$ be the predicted output based on W. Let the loss/error function be defined by
\[L = \frac{1}{2} \sum_{i=0}^{n-1} (Y_i - Y_i')^2\]
\[Y_i' = W^T X_i\]
\[L = \frac{1}{2} \sum_{i=0}^{n-1} (Y_i - W^T X_i)^2\]
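\[\frac{\partial L}{\partial W} = \sum_{i=0}^{n-1} (Y_i - W^T X_i) \frac{\partial}{\partial W} (Y_i - W^T X_i)\]
\[\frac{\partial L}{\partial W} = -\sum_{i=0}^{n-1} (Y_i - W^T X_i) X_i\]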
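As a rough sketch, the loss and its derivative with respect to W, which works out to $-\sum_i (Y_i - W^T X_i) X_i$, can be written in NumPy and checked against a finite-difference approximation. The function names `loss` and `gradient` and the toy numbers are assumptions for illustration; each row of X is taken to be one sample $X_i$.

```python
import numpy as np

def loss(W, X, Y):
    """L = 1/2 * sum_i (Y_i - W^T X_i)^2, with one sample per row of X."""
    residual = Y - X @ W
    return 0.5 * np.sum(residual ** 2)

def gradient(W, X, Y):
    """dL/dW = -sum_i (Y_i - W^T X_i) X_i, written in matrix form."""
    return -X.T @ (Y - X @ W)

# Hypothetical toy data: one feature plus a constant bias column.
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
Y = np.array([2.0, 3.0, 5.0])
W = np.array([0.5, -0.5])

# Central finite differences along each coordinate of W as an independent check.
eps = 1e-6
numeric = np.array([
    (loss(W + eps * e, X, Y) - loss(W - eps * e, X, Y)) / (2 * eps)
    for e in np.eye(len(W))
])

analytic = gradient(W, X, Y)
print(analytic)
print(numeric)
```

The two printed vectors should agree closely, which is a quick way to catch sign or indexing mistakes in a hand-derived gradient.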
The objective is to minimize the loss function. Because L is convex, its minimum is the point where $\frac{\partial L}{\partial W} = 0$.
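Iteratively, if you keep moving W in the direction opposite to the derivative $\frac{\partial L}{\partial W}$, scaled by a sufficiently small step size $\eta$ (the learning rate), the weights converge to the pseudo-inverse value.
\[W \leftarrow W - \eta \frac{\partial L}{\partial W}\]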
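To illustrate the convergence claim, here is a minimal sketch of the iterative method using all samples for every update. The toy data, the learning rate of 0.01 and the iteration count are assumptions chosen so that this small problem converges; they are not tuned values from the original.

```python
import numpy as np

# Hypothetical toy data: one feature plus a constant bias column, with noisy targets.
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]])
Y = np.array([3.1, 4.9, 7.2, 8.8])

W = np.zeros(X.shape[1])
learning_rate = 0.01  # assumed step size, small enough for this data

for _ in range(5000):
    # Derivative of L = 1/2 * sum_i (Y_i - W^T X_i)^2 with respect to W.
    grad = -X.T @ (Y - X @ W)
    # Step in the direction opposite to the derivative.
    W = W - learning_rate * grad

# Reference solution from the analytical (pseudo-inverse) method.
W_pinv = np.linalg.pinv(X) @ Y

print(W)
print(W_pinv)
```

If the learning rate is too large, the updates overshoot and diverge instead of converging, so the step size has to be chosen relative to the scale of the data.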
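There are multiple update strategies, which differ in how many samples are used to compute each update: online/stochastic update (one sample at a time), batch update (all samples) and mini-batch update (a small subset of samples).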
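The three strategies can be sketched with a single function whose `batch_size` parameter selects the behaviour. The function name `fit`, its defaults and the noise-free toy data are assumptions for illustration only.

```python
import numpy as np

def fit(X, Y, batch_size, learning_rate=0.01, epochs=5000, seed=0):
    """Gradient descent on L = 1/2 * sum_i (Y_i - W^T X_i)^2.

    batch_size = 1           -> online/stochastic update
    batch_size = len(X)      -> batch update
    anything in between      -> mini-batch update
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(n)  # visit samples in a new random order each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, Yb = X[idx], Y[idx]
            grad = -Xb.T @ (Yb - Xb @ W)  # derivative using only the selected samples
            W = W - learning_rate * grad
    return W

# Hypothetical noise-free data generated from the weights [2, 1].
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]])
Y = X @ np.array([2.0, 1.0])

W_stochastic = fit(X, Y, batch_size=1)
W_minibatch = fit(X, Y, batch_size=2)
W_batch = fit(X, Y, batch_size=len(X))

print(W_stochastic, W_minibatch, W_batch)
```

On this noise-free data all three variants should end up near the same weights. With noisy data, stochastic and mini-batch updates using a constant learning rate typically hover around the minimum rather than settling on it exactly.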