
Introduction to Logistic Regression (A Beginner’s Guide part 5)

Logistic regression is a fundamental statistical technique used in machine learning for binary classification problems. Unlike linear regression, which predicts continuous outcomes, logistic regression predicts the probability of a binary outcome. This makes it an ideal tool for tasks where the output is categorical, such as determining whether an email is spam or not, or predicting whether a patient has a certain disease.

Understanding Logistic Regression

Concept

Logistic regression is a statistical model that is primarily used for binary classification problems. The core idea is to model the probability of a binary outcome (1 or 0, true or false, success or failure) based on one or more predictor variables.

For instance, suppose you want to predict whether a student will pass or fail an exam based on their hours of study and previous grades. Logistic regression helps in estimating the probability that the student will pass, given their study hours and grades.

The key difference between logistic regression and linear regression is that logistic regression predicts probabilities that are bounded between 0 and 1, while linear regression predicts continuous values. Logistic regression achieves this by using the logistic (sigmoid) function.

Sigmoid Function

The sigmoid function is the mathematical function that logistic regression uses to map predicted values to probabilities. It takes any real-valued number and maps it to a value between 0 and 1. The sigmoid function is defined as:

\sigma(z) = \frac{1}{1 + e^{-z}}

Here, z is a linear combination of the input features (predictor variables) and their corresponding weights (parameters). The sigmoid function ensures that the output of the logistic regression model is always a probability between 0 and 1.
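The sigmoid mapping can be sketched in a few lines of Python (using NumPy, which is an assumption of this sketch rather than anything prescribed above):

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued input z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# z = 0 sits exactly at the midpoint of the curve:
print(sigmoid(0.0))    # 0.5
# Large positive z approaches 1, large negative z approaches 0:
print(sigmoid(10.0))   # ~0.99995
print(sigmoid(-10.0))  # ~0.00005
```

Because `np.exp` broadcasts, the same function works unchanged on a whole array of z values at once.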

Mathematical Background

Logistic Regression Equation



The logistic regression model equation is used to predict the probability P(Y=1|X) that the dependent variable Y is 1 given the independent variables X. The equation is as follows:

P(Y=1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n)}}

In this equation:

  • \beta_0 is the intercept.
  • \beta_1, \beta_2, ..., \beta_n are the coefficients corresponding to the predictor variables X_1, X_2, ..., X_n.
  • The term \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n is called the linear predictor.

The logistic regression model transforms this linear predictor using the sigmoid function to produce a probability.
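Returning to the earlier student example, this transformation can be sketched directly. The coefficient values below are invented purely for illustration; in practice they would be learned from data:

```python
import numpy as np

def predict_proba(x, beta0, beta):
    """P(Y=1|X) = sigmoid of the linear predictor beta0 + x . beta."""
    z = beta0 + x @ beta
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients for two features: [hours_studied, previous_grade]
beta0 = -4.0
beta = np.array([0.6, 0.03])

# A student who studied 5 hours with a previous grade of 70:
x = np.array([5.0, 70.0])
p = predict_proba(x, beta0, beta)  # z = -4 + 3.0 + 2.1 = 1.1, so p ~ 0.75
```

Note that the linear predictor z = 1.1 is unbounded, but the sigmoid squashes it into a valid probability.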

Odds and Log-Odds

Logistic regression is based on the concept of odds and log-odds.

  • Odds: The odds of an event occurring is the ratio of the probability that the event will occur to the probability that it will not occur.

    \text{Odds} = \frac{P(Y=1|X)}{1 - P(Y=1|X)}
  • Log-Odds (Logit): The log-odds is the natural logarithm of the odds. Logistic regression models the log-odds as a linear combination of the predictor variables.

    \text{Logit}(P(Y=1|X)) = \log\left(\frac{P(Y=1|X)}{1 - P(Y=1|X)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n

The logit transformation makes the model linear in the predictors on the log-odds scale, while the actual predicted probability remains bounded between 0 and 1.
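The round trip between probability, odds, and log-odds can be verified numerically; the probability value here is arbitrary:

```python
import numpy as np

p = 0.8                      # P(Y=1|X), chosen arbitrarily
odds = p / (1 - p)           # 4.0: the event is 4x as likely to occur as not
logit = np.log(odds)         # log-odds, the quantity that is linear in X

# Applying the sigmoid to the log-odds recovers the original probability,
# which is exactly how the model equation above inverts the logit:
p_back = 1.0 / (1.0 + np.exp(-logit))
```

This inverse relationship between the logit and the sigmoid is why the same model can be written either as a probability equation or as a linear log-odds equation.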

Cost Function and Optimization

Cost Function

The cost function in logistic regression, also known as the binary cross-entropy or log loss, measures how well the model's predicted probabilities match the actual class labels. The cost function is defined as:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(h_\theta(x_i)) + (1 - y_i) \log(1 - h_\theta(x_i)) \right]

In this equation:

  • m is the number of training examples.
  • y_i is the actual label of the i-th training example.
  • h_\theta(x_i) is the predicted probability for the i-th training example.

The cost function penalizes incorrect predictions more heavily. When the predicted probability diverges significantly from the actual label, the log loss increases, thereby increasing the cost. The goal is to minimize this cost function during training.
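A small sketch of this log-loss computation makes the penalty visible; the labels and predicted probabilities below are made up for illustration:

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    """Binary cross-entropy averaged over m examples."""
    p = np.clip(p, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1])
good = np.array([0.9, 0.1, 0.8, 0.95])  # confident and correct
bad  = np.array([0.2, 0.9, 0.3, 0.10])  # confident but wrong

# Confidently wrong predictions are punished far more heavily:
print(log_loss(y, good))  # small loss
print(log_loss(y, bad))   # much larger loss
```

The `np.clip` call is a standard numerical guard: without it, a predicted probability of exactly 0 or 1 on a misclassified example would make the loss infinite.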

Gradient Descent

Gradient descent is an optimization algorithm used to minimize the cost function in logistic regression. The basic idea is to iteratively update the model parameters (coefficients) in the direction that reduces the cost function.

The gradient descent update rule for each parameter θj\theta_j is given by:

\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}

Here:

  • \alpha is the learning rate, which controls the size of the steps taken towards the minimum.
  • \frac{\partial J(\theta)}{\partial \theta_j} is the partial derivative of the cost function with respect to the parameter \theta_j.

By iteratively applying this update rule, gradient descent converges to the set of parameters that minimize the cost function, thereby finding the best fit for the logistic regression model.
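The update rule can be turned into a minimal batch gradient-descent trainer. For the log-loss above, the gradient works out to X^T(h - y)/m, which the sketch below uses; the toy dataset and hyperparameters are illustrative, not prescriptive:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=5000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])  # prepend a column of 1s for the intercept
    theta = np.zeros(n + 1)
    for _ in range(n_iters):
        h = sigmoid(Xb @ theta)          # current predicted probabilities
        grad = Xb.T @ (h - y) / m        # gradient of J(theta)
        theta -= alpha * grad            # step against the gradient
    return theta

# Toy, linearly separable data: the label is 1 when the feature exceeds 2.5
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
theta = fit_logistic(X, y)

Xb = np.hstack([np.ones((4, 1)), X])
preds = (sigmoid(Xb @ theta) >= 0.5).astype(int)  # matches y on this toy set
```

In practice, libraries use more sophisticated optimizers (and regularization), but the loop above is the essence of what they compute.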

Examples and Applications

  • Example 1: Spam Detection: Logistic regression can classify emails as spam or not spam using features such as the frequency of particular words, the presence of suspicious links, and sender information. The model outputs the probability that a message is spam, and a threshold (commonly 0.5) converts that probability into a label. A confusion matrix, which counts true positives, false positives, true negatives, and false negatives, is the standard way to summarize such a model's performance.
  • Example 2: Disease Diagnosis: Logistic regression can predict the presence of a disease from patient data such as age, weight, and blood pressure. Because the model outputs a probability rather than a hard label, the decision threshold can be tuned to trade sensitivity against specificity, and an ROC curve visualizes this trade-off across all possible thresholds.
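As a concrete illustration of the confusion matrix mentioned in the spam example, here is a sketch with invented filter predictions (1 = spam, 0 = not spam):

```python
import numpy as np

# Hypothetical true labels and a spam filter's predictions for 8 emails
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # spam correctly caught
tn = np.sum((y_true == 0) & (y_pred == 0))  # legitimate mail passed through
fp = np.sum((y_true == 0) & (y_pred == 1))  # legitimate mail wrongly flagged
fn = np.sum((y_true == 1) & (y_pred == 0))  # spam that slipped through

precision = tp / (tp + fp)  # of flagged emails, how many were really spam
recall = tp / (tp + fn)     # of real spam, how much was caught
```

For spam filtering, false positives (losing real mail) are usually costlier than false negatives, which is exactly the kind of trade-off the probability threshold lets you tune.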

Advantages and Limitations

  • Advantages: Logistic regression is simple to implement, fast to train, and highly interpretable: each coefficient directly describes how a predictor shifts the log-odds of the outcome. It also produces calibrated probabilities rather than just class labels.
  • Limitations: It assumes a linear relationship between the independent variables and the log-odds, so it can underfit problems with strongly non-linear decision boundaries. It is also sensitive to outliers and to highly correlated predictors, and it performs best with reasonably large samples.


Sithija Theekshana

(BSc in Computer Science and Information Technology)

(BSc in Applied Physics and Electronics)

LinkedIn: www.linkedin.com/in/sithija-theekshana-008563229

