Classification Based Problems
The confusion matrix, shown below in a table adapted from Wikipedia, is largely self-explanatory.
Confusion Matrix
| Confusion Matrix | Really Positive | Really Negative |
|---|---|---|
| Predicted to be positive (P) | TP | FP |
| Predicted to be negative (N) | FN | TN |
If you consider classifying images into Cat and Not a cat classes, then the performance of a given model can be measured as follows:
- Recall, a.k.a. True Positive Rate

  Of all the Cat images, how many were we able to detect successfully?

  \[\text{TPR or Recall} = \frac{TP}{TP + FN}\]
- False Positive Rate

  Of all the Non-cat images, how many were wrongly classified as Cat?

  \[\text{FPR} = \frac{FP}{FP + TN}\]
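To make the two rates concrete, here is a minimal sketch in plain Python that counts the four confusion-matrix cells and derives TPR and FPR from them. The labels are made up for illustration (1 = Cat, 0 = Not a cat), and the function names `confusion_counts` and `tpr_fpr` are hypothetical helpers, not part of any library.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = Cat, 0 = Not a cat)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR); a rate is reported as 0.0 if its denominator is zero."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Made-up example: 4 cat images and 6 non-cat images.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
tpr, fpr = tpr_fpr(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"TPR={tpr:.3f} FPR={fpr:.3f}")
```

In this toy data, three of the four cat images are detected and one of the six non-cat images is flagged as a cat.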
ROC – Receiver Operating Characteristic
A plot of TPR vs. FPR as the classification threshold is varied.
Why is it important?
Example 1: Consider the following scenario: a radar is receiving a signal, and our job is to classify it into Enemy Plane and Noise classes. For that, we have to set an appropriate threshold.

1. We need to set the threshold such that TPs are high, and we certainly don't want False Negatives (classifying an enemy plane as Noise). Hence we are more concerned about False Negatives and must keep FNs low.
2. We can compromise on FPs. Hence, even if the FPR is high, it is okay!

Example 2: Consider another scenario where you are detecting tumor cells.

1. In this case, even if the model fails to detect a tumor cell, it is okay! Doctors can do additional tests to detect the tumors. So, FNs are not that critical.
2. But if the model says that a patient is ill when he isn't, that is serious. So, FPs should be as low as possible, and hence FPR becomes the important factor.
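The threshold trade-off in the two examples can be sketched as follows. This is a minimal illustration, assuming the model outputs a score per sample and predicts positive when the score is at or above the threshold. The scores, labels, and helper names (`roc_points`, `highest_threshold_with_min_tpr`, `lowest_threshold_with_max_fpr`) are made up for this example, and the data is assumed to contain at least one positive and one negative.

```python
def roc_points(y_true, scores):
    """Return (threshold, TPR, FPR) tuples, one per distinct score,
    ordered from the highest threshold to the lowest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        points.append((thr, tp / positives, fp / negatives))
    return points


def highest_threshold_with_min_tpr(points, min_tpr):
    """Radar-style choice: the first (highest) threshold reaching min_tpr."""
    for thr, tpr, fpr in points:
        if tpr >= min_tpr:
            return thr, tpr, fpr
    return None


def lowest_threshold_with_max_fpr(points, max_fpr):
    """Low-FPR choice: the lowest threshold whose FPR stays within max_fpr."""
    best = None
    for thr, tpr, fpr in points:
        if fpr <= max_fpr:
            best = (thr, tpr, fpr)
    return best


# Made-up scores: higher means "more likely positive".
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

points = roc_points(y_true, scores)
radar = highest_threshold_with_min_tpr(points, min_tpr=1.0)
strict = lowest_threshold_with_max_fpr(points, max_fpr=0.0)
print("ROC points:", points)
print("Miss nothing (Example 1):", radar)
print("No false alarms (Example 2):", strict)
```

Plotting the TPR and FPR values in `points` against each other gives the ROC curve. The two helper functions simply pick different operating points on that curve, depending on which kind of error matters more.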
- Precision

  Of the images that are predicted to contain a cat, how many really contain a cat?

  \[\text{Precision} = \frac{TP}{TP + FP}\]
F1 Score
Harmonic mean of Recall and Precision.

\[F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}\]
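As a quick worked example, the sketch below computes precision, recall, and their harmonic mean (F1) from confusion-matrix counts. The counts are made up, and `precision_recall_f1` is a hypothetical helper written for this illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts.
    Any ratio with a zero denominator is reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts: 10 images predicted as cat (8 correct), 16 real cat images.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here precision is 0.8 and recall is 0.5, and F1 comes out at about 0.615, below the arithmetic mean of 0.65. The harmonic mean is pulled toward the smaller of the two values, so a model cannot score well on F1 by doing well on only one of them.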
Regression Based Problems
R-squared value
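The simplest baseline prediction is to just predict the average of all the expected outputs (\(\bar{y}\)) for any input. The error due to this baseline model is called \(SS_{tot}\).

\[SS_{tot} = \sum_{i} (y_i - \bar{y})^2\]

Now, let the prediction of a given model be \(\hat{y}_i\); the error incurred is given by

\[SS_{res} = \sum_{i} (y_i - \hat{y}_i)^2\]

Now, the R-squared metric is defined as follows:

\[R^2 = 1 - \frac{SS_{res}}{SS_{tot}}\]

If \(\hat{y}_i = \bar{y}\), then \(SS_{res} = SS_{tot}\) and \(R^2 = 0\), which gives a baseline condition. The lower the model's prediction error, the higher the \(R^2\) value.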
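The definition can be checked with a few lines of plain Python. This is a minimal sketch, assuming the usual sum-of-squares form: `ss_tot` is the squared error of always predicting the mean, and `ss_res` is the squared error of the model's predictions. The data and the function name `r_squared` are made up for illustration, and the targets are assumed not to be all identical (otherwise `ss_tot` would be zero).

```python
def r_squared(y_true, y_pred):
    """R-squared = 1 - ss_res / ss_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # baseline error
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # model error
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

baseline_pred = [3.0] * 5                 # always predict the mean
model_pred = [1.1, 1.9, 3.2, 3.8, 5.0]    # a made-up, fairly accurate model

r2_baseline = r_squared(y_true, baseline_pred)
r2_model = r_squared(y_true, model_pred)
r2_perfect = r_squared(y_true, y_true)
print(f"baseline: {r2_baseline:.2f}, model: {r2_model:.2f}, perfect: {r2_perfect:.2f}")
```

Predicting the mean gives an R-squared of 0 and a perfect model gives 1. A model whose error is larger than the baseline's would produce a negative value.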