Understanding the Confusion Matrix, Precision, Recall, F1 Score, and Accuracy (A Beginner’s Guide part 6)


In machine learning, evaluating the performance of your models is crucial. Several metrics describe how well a classifier performs, and the confusion matrix, precision, recall, F1 score, and accuracy are among the most fundamental. This guide walks through each concept with its formula and a worked example.

What is a Confusion Matrix?

A confusion matrix is a table used to evaluate the performance of a classification model. It helps in understanding the types of errors made by the model. The matrix contrasts the actual target values with those predicted by the model.

Structure of a Confusion Matrix

For a binary classification problem, the confusion matrix is a 2×2 table whose four cells are:

True Positive (TP): The model predicts the positive class, and the actual class is positive.
True Negative (TN): The model predicts the negative class, and the actual class is negative.
False Positive (FP): The model predicts the positive class, but the actual class is negative.
False Negative (FN): The model predicts the negative class, but the actual class is positive.
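
The four cells can be counted directly from a list of actual labels and a list of predicted labels. The snippet below is a minimal sketch in plain Python; the function name `confusion_counts`, the `positive` parameter, and the sample labels are made up for illustration and are not part of any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem (hypothetical helper)."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


# Made-up labels for illustration (1 = positive, 0 = negative).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 4 4 1 1
```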

Precision

Precision is the ratio of correctly predicted positive observations to the total predicted positives. It answers the question: What proportion of positive identifications was actually correct?

$\text{Precision} = \frac{TP}{TP + FP}$

High precision means that few of the model's positive predictions are false positives.

Example Calculation

Let's say you have a confusion matrix with the following counts: TP = 4, FP = 1, FN = 1, and TN = 4.

Using these counts:

$\text{Precision} = \frac{4}{4 + 1} = \frac{4}{5} = 0.80$
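
The same calculation can be sketched in Python. The function name `precision` is an illustrative choice, and returning 0.0 when the model makes no positive predictions is an assumed convention (the ratio is undefined in that case).

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        # Assumed convention: no positive predictions, so report 0.0.
        return 0.0
    return tp / predicted_positives


print(precision(tp=4, fp=1))  # 0.8
```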

Recall (Sensitivity)

Recall, or sensitivity, is the ratio of correctly predicted positive observations to all observations in the actual positive class. It answers the question: What proportion of actual positives was identified correctly?

$\text{Recall} = \frac{TP}{TP + FN}$

High recall indicates a low false negative rate.

Example Calculation

Using the same confusion matrix:

$\text{Recall} = \frac{4}{4 + 1} = \frac{4}{5} = 0.80$
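
A matching Python sketch for recall follows. As before, the function name `recall` is illustrative, and returning 0.0 when there are no actual positives is an assumed convention. The second call uses made-up counts to show how recall drops as false negatives grow.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    actual_positives = tp + fn
    if actual_positives == 0:
        # Assumed convention: no actual positives, so report 0.0.
        return 0.0
    return tp / actual_positives


print(recall(tp=4, fn=1))  # 0.8
print(recall(tp=4, fn=6))  # 0.4 (more missed positives, lower recall)
```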

F1 Score

The F1 Score is the harmonic mean of precision and recall, providing a balance between the two metrics. It is particularly useful when you need to account for both false positives and false negatives.

$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

Example Calculation

Using our previous precision and recall values:

$\text{F1 Score} = 2 \times \frac{0.80 \times 0.80}{0.80 + 0.80} = 2 \times \frac{0.64}{1.60} = 0.80$
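
The sketch below computes the F1 score in two equivalent ways: from precision and recall, and directly from the counts using the identity F1 = 2TP / (2TP + FP + FN). The function names are illustrative, and returning 0.0 for a zero denominator is an assumed convention. The last call uses made-up values to show that the harmonic mean stays close to the smaller of the two inputs.

```python
def f1_score(precision, recall):
    """F1 = 2 * P * R / (P + R), the harmonic mean of P and R."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the undefined case
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp, fp, fn):
    """Equivalent form: F1 = 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0  # assumed convention for the undefined case
    return 2 * tp / denominator


print(round(f1_score(0.80, 0.80), 2))      # 0.8
print(round(f1_from_counts(4, 1, 1), 2))   # 0.8
# Perfect precision cannot compensate for poor recall:
print(round(f1_score(1.0, 0.1), 2))        # 0.18
```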

Accuracy

Accuracy is the ratio of correctly predicted observations to the total observations. It answers the question: What proportion of the total predictions were correct?

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

Accuracy is a useful measure when the classes are roughly balanced, but it can be misleading on imbalanced data, because a model can score highly simply by predicting the majority class.

Example Calculation

Using the same confusion matrix:

$\text{Accuracy} = \frac{4 + 4}{4 + 4 + 1 + 1} = \frac{8}{10} = 0.80$
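
Here is a small Python sketch of the accuracy formula. The function name `accuracy` is illustrative. The second call uses a made-up imbalanced dataset (95 negatives, 5 positives) and a model that always predicts the negative class, to show how accuracy can look good while every positive is missed.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        return 0.0  # assumed convention for an empty dataset
    return (tp + tn) / total


print(accuracy(tp=4, tn=4, fp=1, fn=1))   # 0.8

# Hypothetical imbalanced case: the model always predicts "negative".
# All 95 negatives are correct (TN), all 5 positives are missed (FN).
print(accuracy(tp=0, tn=95, fp=0, fn=5))  # 0.95
```

In the second case the recall is 0 / (0 + 5) = 0, even though the accuracy is 0.95, which is why accuracy alone is not enough on imbalanced data.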

Equations

Accuracy:

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

Precision:

$\text{Precision} = \frac{TP}{TP + FP}$

Recall:

$\text{Recall} = \frac{TP}{TP + FN}$

F1 Score:

$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

Sithija Theekshana

(BSc in Computer Science and Information Technology)

(BSc in Applied Physics and Electronics)

LinkedIn: www.linkedin.com/in/sithija-theekshana-008563229

Innovate IT Insights
