Evaluation of Machine Learning Models

Type I and Type II errors in Statistical Testing

Source: https://en.wikipedia.org/wiki/Type_I_and_type_II_errors

Supervised ML Models

Supervised ML models are of two types: regression models and classification models. Regression model performance is commonly evaluated with Root Mean Square Error (RMSE). Classification model performance is evaluated with the confusion matrix and the measures derived from it.

RMSE

RMSE is the square root of the Mean Square Error (MSE).
MSE is the mean of the squared errors, where each error is the difference between an actual value and the corresponding value predicted by the regression model.
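
The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the sample values are assumptions made for the example.

```python
import math

def mse(actual, predicted):
    """Mean of the squared errors between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mse(actual, predicted))

# Illustrative values only (not from any real dataset).
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

print(mse(actual, predicted))   # squared errors 1, 0, 4, 1 -> mean 1.5
print(rmse(actual, predicted))  # square root of 1.5
```

Because RMSE is in the same units as the predicted quantity, it is usually easier to interpret than MSE.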

Confusion Matrix

TP (True Positive): Predicts positive when actually positive (Correct). The model correctly predicts that an item belongs to the selected class.

FP (False Positive): Predicts positive when actually negative (Incorrect). The model incorrectly predicts that an item belongs to the selected class.

TN (True Negative): Predicts negative when actually negative (Correct). The model correctly predicts that an item does not belong to the selected class.

FN (False Negative): Predicts negative when actually positive (Incorrect). The model incorrectly predicts that an item does not belong to the selected class.
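
As a rough sketch, the four cells can be counted directly from paired actual and predicted labels. The function name `confusion_counts`, the label encoding (1 for the selected class, 0 otherwise), and the sample labels are assumptions made for this example.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, TN, FN) for a binary problem.

    `positive` is the label of the selected class; every other label
    is treated as negative.
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif a == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, tn, fn

# Illustrative labels only.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

print(confusion_counts(actual, predicted))  # (3, 1, 3, 1)
```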

Precision
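
In its standard definition, precision is the fraction of positive predictions that are correct: TP / (TP + FP). A minimal sketch follows; the function name and the counts are hypothetical, and the code assumes at least one positive prediction so that the denominator is nonzero.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)

# Hypothetical counts: 8 true positives, 2 false positives.
print(precision(tp=8, fp=2))  # 8 / 10
```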

Sensitivity/Recall/True Positive Rate (TPR)
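
The three names refer to the same standard quantity: the fraction of actual positives that the model identifies, TP / (TP + FN). A minimal sketch with a hypothetical function name and counts, assuming the data contains at least one actual positive:

```python
def recall(tp, fn):
    """Fraction of actual positives that are predicted positive (TPR)."""
    return tp / (tp + fn)

# Hypothetical counts: 8 true positives, 4 false negatives.
print(recall(tp=8, fn=4))  # 8 / 12
```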

F1 Score
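
The F1 score is conventionally defined as the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). The sketch below uses hypothetical names and counts and assumes TP is greater than zero so that no denominator vanishes.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
print(f1_score(tp=8, fp=2, fn=4))  # same value as 2*8 / (2*8 + 2 + 4)
```

The harmonic mean stays low unless both precision and recall are high, which is why F1 is often preferred over a simple average of the two.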

Specificity
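
Specificity, also called the True Negative Rate, is conventionally defined as TN / (TN + FP). The False Positive Rate (FPR) used in ROC analysis is its complement: FPR = FP / (FP + TN) = 1 - specificity. A minimal sketch with hypothetical names and counts, assuming the data contains at least one actual negative:

```python
def specificity(tn, fp):
    """Fraction of actual negatives that are predicted negative."""
    return tn / (tn + fp)

def false_positive_rate(tn, fp):
    """Fraction of actual negatives that are wrongly predicted positive."""
    return fp / (fp + tn)

# Hypothetical counts: 6 true negatives, 2 false positives.
print(specificity(tn=6, fp=2))          # 6 / 8
print(false_positive_rate(tn=6, fp=2))  # 2 / 8
```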

ROC & AUC

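Tuning certain parameters of an ML model changes its TPR and FPR. For example, in logistic regression, the threshold on the predicted probability that separates the two classes can be adjusted: lowering the threshold tends to raise both TPR and FPR, and raising it tends to lower both. Different models can also be tried for the same classification problem.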
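
The effect of the decision threshold can be illustrated with a small sketch. The labels and scores below are made up for the example (the scores stand in for predicted probabilities from a classifier such as logistic regression), and the function name `tpr_fpr` is hypothetical.

```python
def tpr_fpr(labels, scores, threshold):
    """Classify score >= threshold as positive and return (TPR, FPR)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)

# Illustrative data: 4 actual positives and 4 actual negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.1]

for threshold in (0.3, 0.5, 0.7):
    tpr, fpr = tpr_fpr(labels, scores, threshold)
    print(f"threshold={threshold}: TPR={tpr}, FPR={fpr}")
# threshold=0.3: TPR=1.0, FPR=0.75
# threshold=0.5: TPR=0.75, FPR=0.5
# threshold=0.7: TPR=0.5, FPR=0.25
```

On this toy data, each increase in the threshold removes some false positives but also loses some true positives, so neither rate can be improved in isolation.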

The Receiver Operating Characteristic (ROC) curve plots TPR against FPR. It is used to visualize the balance between the following in a single summarized plot, and to choose the appropriate model or the appropriate parameter value for a given model:

  • the benefits (the correctly identified positive records: TPR or recall) and
  • the costs (the negative records mistakenly flagged as positive: FPR)

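Area Under the Curve (AUC) summarizes the performance of the model in a single number: the larger the area under the ROC curve, the better the model separates the two classes. An AUC of 1 corresponds to a perfect classifier and an AUC of 0.5 to random guessing.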
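
One common way to obtain the AUC is to sweep the threshold over the observed scores, collect the resulting (FPR, TPR) points, and integrate with the trapezoidal rule. The sketch below follows that approach under stated assumptions: the function names and data are illustrative, labels are encoded as 1 and 0, and both classes are present.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, sweeping the threshold from high to low."""
    pos = sum(labels)            # number of actual positives (labels are 0/1)
    neg = len(labels) - pos      # number of actual negatives
    points = [(0.0, 0.0)]        # threshold above every score: nothing flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the piecewise-linear curve through the points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2   # trapezoid between two points
    return area

# Illustrative data: 4 actual positives and 4 actual negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.1]

print(auc(roc_points(labels, scores)))  # 0.8125
```

The result matches the rank interpretation of AUC: in 13 of the 16 possible positive/negative pairs in this toy data, the positive item receives the higher score.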

Source: ROC and AUC, Clearly Explained!
https://youtu.be/4jRBRDbJemM

Further Reading

Refer to papers that compare models for a given classification problem. For example:
Chakraborti, S. (2014). A comparative study of performances of various classification algorithms for predicting salary classes of employees. International Journal of Computer Science and Information Technologies, 5(2), 1964-1972.