
Introduction to Logistic Regression (A Beginner’s Guide part 5)

Logistic regression is a fundamental statistical technique used in machine learning for binary classification problems. Unlike linear regression, which predicts continuous outcomes, logistic regression predicts the probability of a binary outcome. This makes it an ideal tool for tasks where the output is categorical, such as determining whether an email is spam or not, or predicting whether a patient has a certain disease.

Understanding Logistic Regression

Concept

Logistic regression is a statistical model that is primarily used for binary classification problems. The core idea is to model the probability of a binary outcome (1 or 0, true or false, success or failure) based on one or more predictor variables.

For instance, suppose you want to predict whether a student will pass or fail an exam based on their hours of study and previous grades. Logistic regression helps in estimating the probability that the student will pass, given their study hours and grades.

The key difference between logistic regression and linear regression is that logistic regression predicts probabilities that are bounded between 0 and 1, while linear regression predicts continuous values. Logistic regression achieves this by using the logistic (sigmoid) function.

Sigmoid Function

The sigmoid function is the mathematical function that logistic regression uses to map predicted values to probabilities. It takes any real-valued number and maps it to a value between 0 and 1. The sigmoid function is defined as:

\sigma(z) = \frac{1}{1 + e^{-z}}

Here, z is a linear combination of the input features (predictor variables) and their corresponding weights (parameters). The sigmoid function ensures that the output of the logistic regression model is always a probability between 0 and 1.
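The sigmoid mapping can be sketched in a few lines of Python (using NumPy, which is an assumption of this sketch rather than anything prescribed above):

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued input z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# z = 0 sits exactly at the midpoint of the curve:
print(sigmoid(0.0))    # 0.5
# Large positive z approaches 1, large negative z approaches 0:
print(sigmoid(10.0))   # ~0.99995
print(sigmoid(-10.0))  # ~0.00005
```

Because `np.exp` broadcasts, the same function works unchanged on a whole array of z values at once.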

Mathematical Background

Logistic Regression Equation



The logistic regression model equation is used to predict the probability P(Y=1|X) that the dependent variable Y is 1 given the independent variables X. The equation is as follows:

P(Y=1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n)}}

In this equation:

  • \beta_0 is the intercept.
  • \beta_1, \beta_2, ..., \beta_n are the coefficients corresponding to the predictor variables X_1, X_2, ..., X_n.
  • The term \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n is called the linear predictor.

The logistic regression model transforms this linear predictor using the sigmoid function to produce a probability.
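Returning to the earlier student example, this transformation can be sketched directly. The coefficient values below are invented purely for illustration; in practice they would be learned from data:

```python
import numpy as np

def predict_proba(x, beta0, beta):
    """P(Y=1|X) = sigmoid of the linear predictor beta0 + x . beta."""
    z = beta0 + x @ beta
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients for two features: [hours_studied, previous_grade]
beta0 = -4.0
beta = np.array([0.6, 0.03])

# A student who studied 5 hours with a previous grade of 70:
x = np.array([5.0, 70.0])
p = predict_proba(x, beta0, beta)  # z = -4 + 3.0 + 2.1 = 1.1, so p ~ 0.75
```

Note that the linear predictor z = 1.1 is unbounded, but the sigmoid squashes it into a valid probability.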

Odds and Log-Odds

Logistic regression is based on the concept of odds and log-odds.

  • Odds: The odds of an event occurring is the ratio of the probability that the event will occur to the probability that it will not occur.

    \text{Odds} = \frac{P(Y=1|X)}{1 - P(Y=1|X)}
  • Log-Odds (Logit): The log-odds is the natural logarithm of the odds. Logistic regression models the log-odds as a linear combination of the predictor variables.

    \text{Logit}(P(Y=1|X)) = \log\left(\frac{P(Y=1|X)}{1 - P(Y=1|X)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n

The logit transformation makes the model linear in the predictors on the log-odds scale, while the actual predicted probability remains bounded between 0 and 1.
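The round trip between probability, odds, and log-odds can be verified numerically; the probability value here is arbitrary:

```python
import numpy as np

p = 0.8                      # P(Y=1|X), chosen arbitrarily
odds = p / (1 - p)           # 4.0: the event is 4x as likely to occur as not
logit = np.log(odds)         # log-odds, the quantity that is linear in X

# Applying the sigmoid to the log-odds recovers the original probability,
# which is exactly how the model equation above inverts the logit:
p_back = 1.0 / (1.0 + np.exp(-logit))
```

This inverse relationship between the logit and the sigmoid is why the same model can be written either as a probability equation or as a linear log-odds equation.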

Cost Function and Optimization

Cost Function

The cost function in logistic regression, also known as the binary cross-entropy or log loss, measures how well the model's predicted probabilities match the actual class labels. The cost function is defined as:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(h_\theta(x_i)) + (1 - y_i) \log(1 - h_\theta(x_i)) \right]

In this equation:

  • m is the number of training examples.
  • y_i is the actual label of the i-th training example.
  • h_\theta(x_i) is the predicted probability for the i-th training example.

The cost function penalizes incorrect predictions more heavily. When the predicted probability diverges significantly from the actual label, the log loss increases, thereby increasing the cost. The goal is to minimize this cost function during training.
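A small sketch of this log-loss computation makes the penalty visible; the labels and predicted probabilities below are made up for illustration:

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    """Binary cross-entropy averaged over m examples."""
    p = np.clip(p, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1])
good = np.array([0.9, 0.1, 0.8, 0.95])  # confident and correct
bad  = np.array([0.2, 0.9, 0.3, 0.10])  # confident but wrong

# Confidently wrong predictions are punished far more heavily:
print(log_loss(y, good))  # small loss
print(log_loss(y, bad))   # much larger loss
```

The `np.clip` call is a standard numerical guard: without it, a predicted probability of exactly 0 or 1 on a misclassified example would make the loss infinite.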

Gradient Descent

Gradient descent is an optimization algorithm used to minimize the cost function in logistic regression. The basic idea is to iteratively update the model parameters (coefficients) in the direction that reduces the cost function.

The gradient descent update rule for each parameter θj\theta_j is given by:

\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}

Here:

  • \alpha is the learning rate, which controls the size of the steps taken towards the minimum.
  • \frac{\partial J(\theta)}{\partial \theta_j} is the partial derivative of the cost function with respect to the parameter \theta_j.

By iteratively applying this update rule, gradient descent converges to the set of parameters that minimize the cost function, thereby finding the best fit for the logistic regression model.
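The update rule can be turned into a minimal batch gradient-descent trainer. For the log-loss above, the gradient works out to X^T(h - y)/m, which the sketch below uses; the toy dataset and hyperparameters are illustrative, not prescriptive:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=5000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])  # prepend a column of 1s for the intercept
    theta = np.zeros(n + 1)
    for _ in range(n_iters):
        h = sigmoid(Xb @ theta)          # current predicted probabilities
        grad = Xb.T @ (h - y) / m        # gradient of J(theta)
        theta -= alpha * grad            # step against the gradient
    return theta

# Toy, linearly separable data: the label is 1 when the feature exceeds 2.5
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
theta = fit_logistic(X, y)

Xb = np.hstack([np.ones((4, 1)), X])
preds = (sigmoid(Xb @ theta) >= 0.5).astype(int)  # matches y on this toy set
```

In practice, libraries use more sophisticated optimizers (and regularization), but the loop above is the essence of what they compute.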

Examples and Applications

  • Example 1: Spam Detection: Logistic regression can classify emails as spam or not spam using features such as the frequency of particular words, the presence of suspicious links, and sender information. The model outputs the probability that a message is spam, and a threshold (commonly 0.5) converts that probability into a label. A confusion matrix, which counts true positives, false positives, true negatives, and false negatives, is the standard way to summarize such a model's performance.
  • Example 2: Disease Diagnosis: Logistic regression can predict the presence of a disease from patient data such as age, weight, and blood pressure. Because the model outputs a probability rather than a hard label, the decision threshold can be tuned to trade sensitivity against specificity, and an ROC curve visualizes this trade-off across all possible thresholds.
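As a concrete illustration of the confusion matrix mentioned in the spam example, here is a sketch with invented filter predictions (1 = spam, 0 = not spam):

```python
import numpy as np

# Hypothetical true labels and a spam filter's predictions for 8 emails
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # spam correctly caught
tn = np.sum((y_true == 0) & (y_pred == 0))  # legitimate mail passed through
fp = np.sum((y_true == 0) & (y_pred == 1))  # legitimate mail wrongly flagged
fn = np.sum((y_true == 1) & (y_pred == 0))  # spam that slipped through

precision = tp / (tp + fp)  # of flagged emails, how many were really spam
recall = tp / (tp + fn)     # of real spam, how much was caught
```

For spam filtering, false positives (losing real mail) are usually costlier than false negatives, which is exactly the kind of trade-off the probability threshold lets you tune.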

Advantages and Limitations

  • Advantages: Logistic regression is simple to implement, fast to train, and highly interpretable: each coefficient directly describes how a predictor shifts the log-odds of the outcome. It also produces calibrated probabilities rather than just class labels.
  • Limitations: It assumes a linear relationship between the independent variables and the log-odds, so it can underfit problems with strongly non-linear decision boundaries. It is also sensitive to outliers and to highly correlated predictors, and it performs best with reasonably large samples.


Sithija Theekshana

(BSc in Computer Science and Information Technology)

(BSc in Applied Physics and Electronics)

LinkedIn: www.linkedin.com/in/sithija-theekshana-008563229

