Supervised Learning

What are linear and logistic regression?

Linear regression predicts a continuous value by fitting a weighted sum of input features to minimise prediction error. Logistic regression predicts the probability of a binary class by passing a linear combination of features through a sigmoid function. Both are foundational supervised learning models: fast, interpretable, and still widely used as baselines and components in larger systems.

Why do regression models still matter?

Despite the dominance of neural networks and tree-based models, linear and logistic regression remain valuable because of their:

Interpretability: weights directly show each feature’s contribution to the prediction
Speed: near-instantaneous training and inference
Baselines: test a linear model first; if a neural approach doesn't beat it by a clear margin, the extra complexity is hard to justify
Components in pipelines: logistic regression is commonly used as a classification head on top of frozen embeddings

How does linear regression work?

Linear regression models the target as a weighted sum of features plus a bias:

ŷ = w₁x₁ + w₂x₂ + ... + wₙxₙ + b

Weights are learned by minimising Mean Squared Error (MSE), the average squared difference between predictions and true values:

MSE = (1/n) Σ (ŷᵢ - yᵢ)²
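As a small worked example of the formula above, the following sketch computes MSE with NumPy. The function name mse and the three data points are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error: the average of the squared residuals."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

# Residuals are 0.5, 0.0 and -1.0, so the squared errors are 0.25, 0.0 and 1.0.
example_mse = mse([2.5, 4.0, 5.0], [2.0, 4.0, 6.0])
print(example_mse)  # (0.25 + 0.0 + 1.0) / 3
```

Because the errors are squared, the single residual of -1.0 contributes four times as much as the residual of 0.5, which is why MSE is sensitive to outliers.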

The optimal weights can be found either with the closed-form solution (the normal equations) or iteratively with gradient descent.
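The two fitting routes can be compared on synthetic data. This is a minimal sketch, assuming NumPy only: the data, the learning rate of 0.1, and the 500 iterations are illustrative choices, and np.linalg.lstsq is used for the closed-form route because it solves the same least-squares problem as the normal equations without explicitly inverting a matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3*x1 - 2*x2 + 1 plus a little noise (illustrative values).
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + 0.01 * rng.normal(size=200)

# Append a column of ones so the bias b is learned as the last weight.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

# Closed form: least-squares solution of Xb @ theta ≈ y.
theta_closed, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Gradient descent on MSE: the gradient is (2/n) * Xb.T @ (Xb @ theta - y).
theta_gd = np.zeros(Xb.shape[1])
learning_rate = 0.1
for _ in range(500):
    residual = Xb @ theta_gd - y
    theta_gd -= learning_rate * (2.0 / len(y)) * (Xb.T @ residual)

print(theta_closed)  # close to [3, -2, 1]
print(theta_gd)      # should agree with the closed-form result
```

On a small, well-conditioned problem like this one both routes land on essentially the same weights. Gradient descent becomes the practical choice when the feature count is large enough that solving the linear system directly is too expensive.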

How does logistic regression work?

Logistic regression extends linear regression to binary classification by wrapping the linear output in a sigmoid function:

p = σ(w·x + b) = 1 / (1 + e^(-(w·x + b)))

This maps any real number to a probability between 0 and 1. A threshold (typically 0.5) converts the probability to a class prediction.
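The sigmoid and threshold steps can be sketched in a few lines of NumPy. The weights, bias, and inputs below are hypothetical values chosen so the scores are easy to check by hand.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps real-valued scores into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias and inputs, chosen only for illustration.
w = np.array([1.5, -0.5])
b = 0.25
X = np.array([[2.0, 1.0],    # score  2.75
              [0.0, 0.5],    # score  0.00
              [-1.0, 2.0]])  # score -2.25

probs = sigmoid(X @ w + b)
labels = (probs >= 0.5).astype(int)  # apply the 0.5 threshold

print(probs)
print(labels)
```

The middle row has a score of exactly zero, so its probability is 0.5 and it sits on the decision boundary. Whether such a point counts as positive depends on whether the threshold comparison is inclusive, as it is here.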

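Training minimises binary cross-entropy loss, averaged over the training examples. For a single example with true label y (0 or 1) and predicted probability p: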

Loss = -[y·log(p) + (1-y)·log(1-p)]
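To see how the loss behaves, this sketch averages the per-example formula over a small batch. The function name and the eps clipping constant are assumptions added for numerical safety; they are not part of the formula itself.

```python
import numpy as np

def binary_cross_entropy(y_true, p, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

y = [1, 0, 1]
confident_right = binary_cross_entropy(y, [0.9, 0.1, 0.9])  # equals -log(0.9)
confident_wrong = binary_cross_entropy(y, [0.1, 0.9, 0.1])  # equals -log(0.1)

print(confident_right, confident_wrong)
```

Confident wrong predictions are penalised far more heavily than confident correct ones are rewarded, which pushes the model towards calibrated probabilities rather than just correct labels.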

Linear vs logistic regression

	Linear Regression	Logistic Regression
Output	Continuous value	Probability (0-1)
Task	Regression	Binary classification
Loss function	MSE	Binary cross-entropy
Output activation	None	Sigmoid
Interpretability	High	High

For multi-class problems, softmax regression (multinomial logistic regression) generalises logistic regression with a softmax output layer.
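The softmax step can be sketched as follows, assuming each row holds one example's per-class linear scores. The scores are made up for illustration, and subtracting the row maximum is a common numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=-1, keepdims=True)

class_scores = np.array([[2.0, 1.0, 0.1],   # class 0 has the highest score
                         [0.0, 0.0, 0.0]])  # all classes tied
class_probs = softmax(class_scores)
predicted = class_probs.argmax(axis=-1)

print(class_probs)  # each row sums to 1
print(predicted)
```

With two classes, softmax reduces to the sigmoid of the difference between the two scores, which is why it is described as a generalisation of binary logistic regression.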

Regularisation: Ridge, Lasso, and Elastic Net

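Without regularisation, regression models can overfit, particularly on noisy or high-dimensional data. Regularisation adds a penalty on the weights to the loss, with a strength λ that trades fit against weight size.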

Ridge (L2): adds a penalty on the sum of squared weights, shrinking them towards zero:

Loss = MSE + λ·Σwᵢ²

Lasso (L1): adds a penalty on the sum of absolute weights, driving some to exactly zero (automatic feature selection):

Loss = MSE + λ·Σ|wᵢ|

Elastic Net: combines L1 and L2:

Loss = MSE + λ₁·Σ|wᵢ| + λ₂·Σwᵢ²
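The difference between the three penalties shows up clearly in how many weights end up exactly zero. This is a minimal sketch using scikit-learn's Ridge, Lasso, and ElasticNet on synthetic data; the alpha and l1_ratio values are illustrative rather than tuned, and scikit-learn scales the loss terms slightly differently from the formulas above, so alpha corresponds to λ only up to a constant factor.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)

# Synthetic data: 20 features, but only the first 3 influence the target.
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:3] = [4.0, -3.0, 2.0]
y = X @ true_w + 0.5 * rng.normal(size=100)

# alpha plays the role of the penalty strength; values are not tuned.
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)

# Count coefficients that are exactly zero under each penalty.
ridge_zeros = int(np.sum(ridge.coef_ == 0.0))
lasso_zeros = int(np.sum(lasso.coef_ == 0.0))
enet_zeros = int(np.sum(enet.coef_ == 0.0))

print("ridge:", ridge_zeros, "lasso:", lasso_zeros, "elastic net:", enet_zeros)
```

Ridge shrinks the irrelevant weights but leaves them non-zero, while Lasso typically removes most of them outright. In practice the penalty strength is usually chosen by cross-validation rather than fixed by hand.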

Using logistic regression on top of embeddings

A practical and efficient pattern for text classification:

Encode documents with a frozen embedding model via SIE
Train a logistic regression classifier on the resulting vectors

import numpy as np
from sie_sdk import SIEClient
from sie_sdk.types import Item
from sklearn.linear_model import LogisticRegression

client = SIEClient("http://localhost:8080")

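# Encode training and test documents.
# train_texts and test_texts are lists of strings and y_train holds the
# matching training labels; all three are assumed to be defined already.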
X_train = np.stack(
    [r["dense"] for r in client.encode("BAAI/bge-m3", [Item(text=t) for t in train_texts])]
)
X_test = np.stack(
    [r["dense"] for r in client.encode("BAAI/bge-m3", [Item(text=t) for t in test_texts])]
)

# Train classifier on top of embeddings
clf = LogisticRegression(C=1.0, max_iter=1000)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

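This approach is fast to train and often competitive with fully fine-tuned classifiers, especially when labelled data is limited. The classifier itself stays easy to inspect, although weights over embedding dimensions are less directly interpretable than weights over named features.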
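The classifier stage can be tried without an embedding server. This sketch substitutes synthetic Gaussian vectors for real embeddings, which is an assumption made purely so the example runs on its own; the dimensions, cluster positions, and split ratio are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-in for embeddings: two Gaussian clusters in 32 dimensions.
# Real vectors would come from the encoder; these are synthetic.
dim = 32
X_pos = rng.normal(loc=0.5, size=(150, dim))
X_neg = rng.normal(loc=-0.5, size=(150, dim))
X_all = np.vstack([X_pos, X_neg])
y_all = np.array([1] * 150 + [0] * 150)

# Shuffle, then hold out the last 25% for testing.
order = rng.permutation(len(y_all))
X_all, y_all = X_all[order], y_all[order]
split = int(0.75 * len(y_all))
X_train, X_test = X_all[:split], X_all[split:]
y_train, y_test = y_all[:split], y_all[split:]

clf = LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
test_probs = clf.predict_proba(X_test)[:, 1]  # probability of class 1

print(test_accuracy)
```

Keeping predict_proba output alongside the hard predictions makes it possible to move the decision threshold later without retraining.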

Frequently asked questions

When should I use logistic regression instead of a neural network? When you have limited data, need interpretability, need fast iteration, or want a strong baseline. Neural networks typically outperform only with sufficient data and tuning budget.

What is the difference between regression and classification? Regression predicts continuous values (e.g. price, relevance score). Classification predicts discrete categories (e.g. document type, spam/not-spam).

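Can logistic regression handle multi-class problems? Yes. With a softmax output (multinomial logistic regression), it handles N classes directly. In scikit-learn, LogisticRegression fits this multinomial model by default for multi-class targets with the default solver; the explicit multi_class='multinomial' option is deprecated in recent releases. For large N (e.g. thousands of categories), neural classifiers tend to scale better.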
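A minimal multi-class sketch with scikit-learn, assuming three synthetic Gaussian blobs as the classes. The centres and sample counts are illustrative, and no multi_class argument is passed, so the library's default behaviour applies.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Three synthetic classes, each a Gaussian blob around a different centre.
centres = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
X = np.vstack([c + rng.normal(size=(50, 2)) for c in centres])
y = np.repeat([0, 1, 2], 50)

# The same estimator as in the binary case handles three classes.
clf = LogisticRegression(max_iter=1000).fit(X, y)

probs = clf.predict_proba(X[:5])   # shape (5, 3); each row sums to 1
train_accuracy = clf.score(X, y)

print(probs.round(3))
print(train_accuracy)
```

Each row of predict_proba gives one probability per class, and predict returns the class with the highest probability.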