What is Linear and Logistic Regression?
Linear regression predicts a continuous value by fitting a weighted sum of input features to minimise prediction error. Logistic regression predicts the probability of a binary class by passing a linear combination of features through a sigmoid function. Both are foundational supervised learning models: fast, interpretable, and still widely used as baselines and components in larger systems.
Why do regression models still matter?
Despite the dominance of neural networks and tree-based models, linear and logistic regression remain valuable because:
- Interpretability: weights directly show each feature’s contribution to the prediction
- Speed: near-instantaneous training and inference
- Baselines: always test a linear model first; if neural approaches don’t beat it significantly, complexity isn’t justified
- Components in pipelines: logistic regression is commonly used as a classification head on top of frozen embeddings
How does linear regression work?
Linear regression models the target as a weighted sum of features plus a bias:
ŷ = w₁x₁ + w₂x₂ + ... + wₙxₙ + b

Weights are learned by minimising Mean Squared Error (MSE), the average squared difference between predictions and true values over the m training examples:

MSE = (1/m) Σ (ŷᵢ - yᵢ)²

Either the closed-form solution (the normal equations) or gradient descent can find the optimal weights.
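As a minimal sketch of both fitting routes, the example below solves the normal equations and runs plain gradient descent on the same synthetic data. The data, the chosen weights, the learning rate, and the iteration count are all illustrative assumptions, not values from this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): 200 examples, 3 features.
true_w = np.array([2.0, -1.0, 0.5])
true_b = 4.0
X = rng.normal(size=(200, 3))
y = X @ true_w + true_b + rng.normal(scale=0.1, size=200)

# Append a column of ones so the bias is learned as one extra weight.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Closed form: solve the normal equations (X^T X) theta = X^T y.
theta_closed = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)

# Gradient descent on MSE; the gradient is (2/m) X^T (X theta - y).
theta_gd = np.zeros(X_aug.shape[1])
learning_rate = 0.1
for _ in range(1000):
    residual = X_aug @ theta_gd - y
    gradient = (2.0 / len(y)) * X_aug.T @ residual
    theta_gd -= learning_rate * gradient

mse = np.mean((X_aug @ theta_closed - y) ** 2)
print("closed form:     ", theta_closed)
print("gradient descent:", theta_gd)
print("training MSE:    ", mse)
```

On well-conditioned data like this the two routes agree closely. The closed form is convenient for small feature counts, while gradient descent avoids building and solving the XᵀX system when there are many features.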
How does logistic regression work?
Logistic regression extends linear regression to binary classification by wrapping the linear output in a sigmoid function:
p = σ(w·x + b) = 1 / (1 + e^(-(w·x + b)))

This maps any real number to a probability between 0 and 1. A threshold (typically 0.5) converts the probability to a class prediction.
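The mapping from linear score to probability to class can be sketched in a few lines. The weights, bias, and inputs below are hypothetical values chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias for a two-feature model.
w = np.array([1.5, -2.0])
b = 0.25

X = np.array([
    [2.0, 0.5],   # score = 3.0 - 1.0 + 0.25 = 2.25
    [0.0, 1.0],   # score = 0.0 - 2.0 + 0.25 = -1.75
    [0.5, 0.5],   # score = 0.75 - 1.0 + 0.25 = 0.0
])

scores = X @ w + b
probabilities = sigmoid(scores)

# Threshold at 0.5: a score of exactly 0 gives p = 0.5 and is assigned class 1 here.
predictions = (probabilities >= 0.5).astype(int)

print(probabilities)
print(predictions)
```

Raising the threshold above 0.5 trades recall for precision, which is a common adjustment when false positives are costly.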
Training minimises binary cross-entropy loss:
Loss = -[y·log(p) + (1-y)·log(1-p)]

Linear vs logistic regression
| | Linear Regression | Logistic Regression |
|---|---|---|
| Output | Continuous value | Probability (0-1) |
| Task | Regression | Binary classification |
| Loss function | MSE | Binary cross-entropy |
| Output activation | None | Sigmoid |
| Interpretability | High | High |
For multi-class problems, softmax regression (multinomial logistic regression) generalises logistic regression with a softmax output layer.
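To illustrate how softmax generalises the sigmoid, the sketch below implements a softmax over class scores and checks that the two-class case, with one score pinned to 0, gives the same probability as the sigmoid. The scores are arbitrary example values.

```python
import numpy as np

def softmax(logits):
    """Convert a vector of class scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three classes: each score becomes a probability, and the largest score wins.
three_class = softmax(np.array([2.0, 1.0, 0.1]))

# Two classes with the second score fixed at 0: softmax reduces to the sigmoid.
z = 1.3
two_class = softmax(np.array([z, 0.0]))

print(three_class, three_class.sum())
print(two_class[0], sigmoid(z))
```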
Regularisation: Ridge, Lasso, and Elastic Net
Without regularisation, regression models tend to overfit noisy or high-dimensional data. Each method below adds a penalty on the weights to the loss, scaled by a strength λ.
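As a rough illustration, the sketch below fits the same small, noisy dataset with and without an L2 penalty (the Ridge loss defined just below), using the closed-form solution. The data sizes, the noise level, and the penalty strength `lam=0.1` are assumptions made for the example, and the bias term is omitted because the synthetic data has no intercept.

```python
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test, n_features = 30, 200, 20
true_w = np.zeros(n_features)
true_w[:3] = [3.0, -2.0, 1.0]   # only three features carry signal

def make_data(n_rows):
    X = rng.normal(size=(n_rows, n_features))
    y = X @ true_w + rng.normal(scale=1.0, size=n_rows)
    return X, y

X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)

def fit_l2(X, y, lam):
    """Minimise MSE + lam * sum(w**2); lam = 0 gives ordinary least squares."""
    m, d = X.shape
    # Setting the gradient to zero gives (X^T X + m * lam * I) w = X^T y.
    return np.linalg.solve(X.T @ X + m * lam * np.eye(d), X.T @ y)

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

w_ols = fit_l2(X_train, y_train, lam=0.0)
w_ridge = fit_l2(X_train, y_train, lam=0.1)

print("weight norm  OLS:", np.linalg.norm(w_ols), " ridge:", np.linalg.norm(w_ridge))
print("train MSE    OLS:", mse(X_train, y_train, w_ols), " ridge:", mse(X_train, y_train, w_ridge))
print("test MSE     OLS:", mse(X_test, y_test, w_ols), " ridge:", mse(X_test, y_test, w_ridge))
```

The penalised fit always has a smaller weight norm and a training error at least as large as the unpenalised one. Whether it also improves test error depends on the data and on the penalty strength, which is normally tuned on a validation set.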
Ridge (L2): adds a penalty on the sum of squared weights, shrinking them towards zero:
Loss = MSE + λ·Σwᵢ²

Lasso (L1): adds a penalty on the sum of absolute weights, driving some to exactly zero (automatic feature selection):

Loss = MSE + λ·Σ|wᵢ|

Elastic Net: combines L1 and L2:

Loss = MSE + λ₁·Σ|wᵢ| + λ₂·Σwᵢ²

Using logistic regression on top of embeddings
A practical and efficient pattern for text classification:
- Encode documents with a frozen embedding model via SIE
- Train a logistic regression classifier on the resulting vectors
```python
import numpy as np
from sie_sdk import SIEClient
from sie_sdk.types import Item
from sklearn.linear_model import LogisticRegression

# train_texts, y_train, and test_texts are assumed to be defined already
client = SIEClient("http://localhost:8080")

# Encode training and test documents
X_train = np.stack(
    [r["dense"] for r in client.encode("BAAI/bge-m3", [Item(text=t) for t in train_texts])]
)
X_test = np.stack(
    [r["dense"] for r in client.encode("BAAI/bge-m3", [Item(text=t) for t in test_texts])]
)

# Train classifier on top of embeddings
clf = LogisticRegression(C=1.0, max_iter=1000)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
```

This approach is fast to train, interpretable, and often competitive with fully fine-tuned classifiers, especially when labelled data is limited.
Frequently asked questions
When should I use logistic regression instead of a neural network?
When you have limited data, need interpretability, need fast iteration, or want a strong baseline. Neural networks typically outperform only with sufficient data and tuning budget.
What is the difference between regression and classification?
Regression predicts continuous values (e.g. price, relevance score). Classification predicts discrete categories (e.g. document type, spam/not-spam).
Can logistic regression handle multi-class problems?
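Yes. Multinomial (softmax) logistic regression handles N classes directly. In scikit-learn, LogisticRegression fits this multinomial model by default for multi-class targets; the older multi_class='multinomial' option has been deprecated since version 1.5. For large N (e.g. thousands of categories), neural classifiers tend to scale better.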
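A minimal multi-class sketch with scikit-learn, relying on its default settings, might look like the following. The synthetic four-class data from `make_blobs` is an assumption standing in for real features or embeddings.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic 4-class data (assumed for illustration only).
X, y = make_blobs(n_samples=400, centers=4, n_features=5, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# One probability per class for each row; each row sums to 1.
probabilities = clf.predict_proba(X[:3])
predicted = clf.predict(X[:3])

print(probabilities.shape)   # (rows, classes)
print(clf.coef_.shape)       # one weight vector per class
print(predicted)
```

Each class gets its own weight vector, so the number of parameters grows linearly with the number of classes.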