Understanding the Confusion Matrix, Precision, Recall, F1 Score, and Accuracy (A Beginner’s Guide part 6)

Evaluating a machine learning model is as important as training it. For classification models, the fundamental evaluation tools are the confusion matrix and the metrics derived from it: precision, recall, F1 score, and accuracy. This guide walks through each concept and applies it to a worked example.

What is a Confusion Matrix?

A confusion matrix is a table used to evaluate the performance of a classification model. It helps in understanding the types of errors made by the model. The matrix contrasts the actual target values with those predicted by the model.

Structure of a Confusion Matrix

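For a binary classification problem, the confusion matrix is a 2×2 table. In one common layout, rows are the actual classes and columns are the predicted classes:

                    Predicted Positive     Predicted Negative
Actual Positive     True Positive (TP)     False Negative (FN)
Actual Negative     False Positive (FP)    True Negative (TN)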

  • True Positive (TP): The model predicts the positive class, and the actual class is positive.
  • True Negative (TN): The model predicts the negative class, and the actual class is negative.
  • False Positive (FP): The model predicts the positive class, but the actual class is negative.
  • False Negative (FN): The model predicts the negative class, but the actual class is positive.
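
The four outcomes above can be counted directly from a list of actual labels and a list of predicted labels. The following is a minimal sketch in plain Python; the function name `confusion_counts` and the sample labels are illustrative assumptions, chosen so the counts match the worked example used later in this guide.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 4 4 1 1
```

Libraries such as scikit-learn provide their own confusion matrix utilities; the loop here is only meant to make the definitions concrete.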

Precision

Precision is the ratio of correctly predicted positive observations to the total predicted positives. It answers the question: What proportion of positive identifications was actually correct?

$$\text{Precision} = \frac{TP}{TP + FP}$$

High precision means that few of the model's positive predictions are false positives.

Example Calculation

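Let's say a model evaluated on 10 samples produces the following confusion matrix:

                    Predicted Positive    Predicted Negative
Actual Positive     TP = 4                FN = 1
Actual Negative     FP = 1                TN = 4

Using this confusion matrix:

$$\text{Precision} = \frac{4}{4 + 1} = \frac{4}{5} = 0.80$$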
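
The same calculation can be sketched in Python. The helper name `precision` is an assumption for illustration, and returning 0.0 when the model makes no positive predictions is one possible convention, not the only one.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        # Assumed convention: no positive predictions gives precision 0.0.
        return 0.0
    return tp / predicted_positives


# Counts from the example calculation: TP = 4, FP = 1.
print(precision(tp=4, fp=1))  # 0.8
```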

Recall (Sensitivity)

Recall, or sensitivity, is the ratio of correctly predicted positive observations to all observations in the actual positive class. It answers the question: What proportion of actual positives was identified correctly?

$$\text{Recall} = \frac{TP}{TP + FN}$$

High recall indicates a low false negative rate.

Example Calculation

Using the same confusion matrix:

$$\text{Recall} = \frac{4}{4 + 1} = \frac{4}{5} = 0.80$$
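
A short Python sketch shows the recall calculation and how it differs from precision. The helper names and the "predict everything as positive" scenario are assumptions added for illustration: such a model misses no positives, so its recall is perfect, but half of its positive predictions are wrong.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN); 0.0 if there are no actual positives (assumed convention)."""
    actual_positives = tp + fn
    if actual_positives == 0:
        return 0.0
    return tp / actual_positives


def precision(tp, fp):
    """Precision = TP / (TP + FP); 0.0 if nothing was predicted positive (assumed convention)."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        return 0.0
    return tp / predicted_positives


# Counts from the example calculation: TP = 4, FN = 1.
print(recall(tp=4, fn=1))  # 0.8

# Hypothetical model that labels all 10 samples (5 positive, 5 negative) as positive.
tp, fp, fn = 5, 5, 0
print(recall(tp, fn))     # 1.0
print(precision(tp, fp))  # 0.5
```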


F1 Score

The F1 Score is the harmonic mean of precision and recall, providing a balance between the two metrics. It is particularly useful when you need to account for both false positives and false negatives.

$$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Example Calculation

Using our previous precision and recall values:

$$\text{F1 Score} = 2 \times \frac{0.80 \times 0.80}{0.80 + 0.80} = 2 \times \frac{0.64}{1.60} = 0.80$$
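
The sketch below computes the F1 score in two equivalent ways: from precision and recall, and directly from the counts using the identity F1 = 2TP / (2TP + FP + FN). The function names and the lopsided precision and recall values at the end are illustrative assumptions; they show that the harmonic mean stays low when one of the two metrics is low.

```python
def f1_from_rates(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp, fp, fn):
    """Equivalent form computed from counts: 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0
    return 2 * tp / denominator


# Values from the example calculation.
print(round(f1_from_rates(0.80, 0.80), 2))  # 0.8
print(f1_from_counts(tp=4, fp=1, fn=1))     # 0.8

# Hypothetical lopsided model: precision 1.0, recall 0.1.
# The arithmetic mean would be 0.55, but the F1 score is much lower.
print(round(f1_from_rates(1.0, 0.1), 2))    # 0.18
```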


Accuracy

Accuracy is the ratio of correctly predicted observations to the total observations. It answers the question: What proportion of the total predictions were correct?

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Accuracy is a great measure when the classes are balanced, but it can be misleading when there is an imbalance.

Example Calculation

Using the same confusion matrix:

$$\text{Accuracy} = \frac{4 + 4}{4 + 4 + 1 + 1} = \frac{8}{10} = 0.80$$
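
The caveat about imbalanced classes is easier to see with numbers. The sketch below first reproduces the example calculation, then evaluates a hypothetical dataset of 95 negatives and 5 positives on which the model always predicts the negative class. The dataset and the helper names are assumptions made for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        return 0.0  # assumed convention for an empty evaluation set
    return (tp + tn) / total


def recall(tp, fn):
    """Recall = TP / (TP + FN); 0.0 if there are no actual positives (assumed convention)."""
    actual_positives = tp + fn
    if actual_positives == 0:
        return 0.0
    return tp / actual_positives


# Counts from the example calculation.
print(accuracy(tp=4, tn=4, fp=1, fn=1))  # 0.8

# Hypothetical imbalanced data: 95 negatives, 5 positives,
# and a model that always predicts the negative class.
tp, tn, fp, fn = 0, 95, 0, 5
print(accuracy(tp, tn, fp, fn))  # 0.95
print(recall(tp, fn))            # 0.0
```

The accuracy looks strong at 0.95, yet the model never identifies a single positive sample, which is why precision, recall, and the F1 score are worth checking alongside accuracy.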



Summary of Equations

  • Accuracy:

    $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
  • Precision:

    $$\text{Precision} = \frac{TP}{TP + FP}$$
  • Recall:

    $$\text{Recall} = \frac{TP}{TP + FN}$$
  • F1 Score:

    $$\text{F1 Score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}$$



  • Sithija Theekshana 

    (BSc in Computer Science and Information Technology)

    (BSc in Applied Physics and Electronics)


    LinkedIn: www.linkedin.com/in/sithija-theekshana-008563229

