Supervised Learning

What is Classification in Machine Learning?

Classification is a supervised learning task where a model learns to assign inputs to one of a fixed set of categories. Binary classification predicts one of two outcomes (e.g. spam or not spam); multi-class classification predicts one of three or more categories (e.g. document type). The model learns from labelled examples and generalises to unseen inputs.

Why does classification matter?

Classification is one of the most common tasks in applied machine learning. It underlies document routing, intent detection, content moderation, medical diagnosis, fraud detection, and many retrieval-augmented generation (RAG) pre-processing pipelines where documents must be labelled or filtered before indexing.

How does binary classification work?

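A binary classifier such as logistic regression or a neural network with a single output unit computes a raw score x (the logit) and maps it to a probability between 0 and 1 using a sigmoid activation on the output: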

σ(x) = 1 / (1 + e^(-x))

A threshold (typically 0.5) converts the score to a class label. The model is trained by minimising binary cross-entropy loss:

Loss = -[y·log(p) + (1-y)·log(1-p)]

Where y is the true label (0 or 1) and p is the predicted probability.
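
The sigmoid, threshold, and loss above can be sketched in a few lines of plain Python. This is a minimal illustration, not a training loop: the function names (sigmoid, binary_cross_entropy, predict_label) and the eps clamp that keeps log() away from zero are choices made for this example.

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a value in (0, 1)."""
    # Branch on the sign so exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def binary_cross_entropy(y: int, p: float, eps: float = 1e-12) -> float:
    """Loss for one example: -[y*log(p) + (1-y)*log(1-p)]."""
    p = min(max(p, eps), 1.0 - eps)  # assumption: clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def predict_label(x: float, threshold: float = 0.5) -> int:
    """Convert a logit to a class label using a probability threshold."""
    return 1 if sigmoid(x) >= threshold else 0

logit = 2.0
p = sigmoid(logit)
print(round(p, 4))                           # 0.8808
print(predict_label(logit))                  # 1
print(round(binary_cross_entropy(1, p), 4))  # 0.1269 (confident and correct)
print(round(binary_cross_entropy(0, p), 4))  # 2.1269 (confident and wrong)
```

The last two lines show why the loss is useful for training: a confident wrong prediction is penalised far more heavily than a confident correct one.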

How does multi-class classification work?

Multi-class classification outputs a probability distribution over all classes using a softmax activation:

softmax(xᵢ) = e^(xᵢ) / Σⱼ e^(xⱼ)

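All probabilities sum to 1, and the predicted class is the one with the highest probability. The model is trained with categorical cross-entropy loss, which for a single example reduces to Loss = -log(p_c), where p_c is the probability assigned to the true class c.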
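
As a rough sketch of the multi-class case, the snippet below computes a softmax over three logits and the cross-entropy loss for one example. Subtracting the maximum logit before exponentiating is a standard numerical-stability trick and does not change the result; the function names and the example logits are illustrative.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(true_index: int, probs: List[float],
                              eps: float = 1e-12) -> float:
    """Loss for one example: -log of the probability of the true class."""
    return -math.log(max(probs[true_index], eps))

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print([round(p, 3) for p in probs])    # [0.659, 0.242, 0.099]
print(probs.index(max(probs)))         # 0 (predicted class)
print(round(categorical_cross_entropy(0, probs), 3))  # 0.417
```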

What metrics should you use to evaluate a classifier?

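Raw accuracy is misleading on imbalanced datasets: if 99% of examples are negative, a model that always predicts the negative class reaches 99% accuracy while finding no positives at all. Use these metrics together: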

Metric	What it measures	When to prioritise
Accuracy	Overall correctness	Balanced classes
Precision	Of predicted positives, how many are correct	When false positives are costly
Recall	Of actual positives, how many were found	When false negatives are costly
F1 Score	Harmonic mean of precision and recall	Imbalanced datasets
AUC-ROC	Discrimination ability across thresholds	Comparing classifiers

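For document classification in search pipelines, F1 and precision are often the most useful metrics, though the right choice depends on whether false positives or false negatives cost more in your pipeline.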
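
To make the accuracy caveat concrete, here is a small sketch that computes accuracy, precision, recall, and F1 from raw binary predictions. The toy labels are invented for illustration, and returning 0.0 when a denominator is zero is a convention chosen for this example. AUC-ROC is left out because it needs scores rather than hard labels.

```python
from typing import Dict, List

def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # Assumption: undefined ratios (zero denominator) are reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]       # 2 positives out of 10
model = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]        # finds one positive
always_negative = [0] * 10                    # never predicts positive

print(binary_metrics(y_true, model))
# {'accuracy': 0.8, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
print(binary_metrics(y_true, always_negative))
# {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

Both predictors score 0.8 accuracy, but only precision, recall, and F1 reveal that the second one never finds a positive.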

How does classification relate to semantic search and RAG?

Classification often appears as a pre- or post-processing step in inference pipelines:

Query intent classification: route queries to different retrieval strategies based on detected intent
Document type classification: tag documents before indexing so they can be filtered at retrieval time
Answer relevance classification: judge whether a retrieved chunk is relevant before passing it to an LLM
Reranker output: cross-encoder rerankers can be framed as binary classifiers predicting (query, document) relevance
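
As one possible shape for the answer relevance step above, the sketch below filters retrieved chunks with a binary relevance decision before they would be passed to an LLM. Everything here is hypothetical: filter_relevant and toy_overlap_score are names invented for this example, and the word-overlap scorer is only a stand-in for a trained classifier or cross-encoder.

```python
from typing import Callable, List, Tuple

def filter_relevant(query: str, chunks: List[str],
                    score_fn: Callable[[str, str], float],
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Keep chunks whose relevance score clears the threshold, best first."""
    kept = []
    for chunk in chunks:
        score = score_fn(query, chunk)
        if score >= threshold:
            kept.append((chunk, score))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

def toy_overlap_score(query: str, chunk: str) -> float:
    """Hypothetical stand-in scorer: fraction of query words in the chunk."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0

chunks = ["how to reset your password", "billing and invoices"]
print(filter_relevant("reset password", chunks, toy_overlap_score))
# [('how to reset your password', 1.0)]
```

In a real pipeline score_fn would wrap a model call; keeping it as a parameter lets the threshold logic be tested independently of the model.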

Embedding models hosted on SIE can be fine-tuned for classification tasks using a classification head on top of the encoder.

Binary vs multi-class vs multi-label classification

Type	Output	Example
Binary	One of two classes	Spam / not spam
Multi-class	One of N classes	Document category
Multi-label	Multiple classes simultaneously	Document topics

Multi-label classification is common for document tagging in knowledge bases and RAG pipelines.
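
A common way to implement multi-label classification is to apply an independent sigmoid to each label's logit and threshold each one separately, rather than using a softmax across labels. The sketch below assumes that approach; the label names and logits are made up for illustration.

```python
import math
from typing import Dict, List

def sigmoid(x: float) -> float:
    """Numerically stable sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def predict_labels(logits: Dict[str, float],
                   threshold: float = 0.5) -> List[str]:
    """Return every label whose independent probability clears the threshold."""
    return [label for label, x in logits.items() if sigmoid(x) >= threshold]

# Hypothetical per-label logits for one document.
doc_logits = {"finance": 1.5, "legal": 0.3, "hr": -2.0}
print(predict_labels(doc_logits))  # ['finance', 'legal']
```

Unlike softmax outputs, these per-label probabilities do not have to sum to 1, so a document can receive several tags or none.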

Frequently asked questions

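What’s the difference between classification and regression? Classification predicts a discrete category. Regression predicts a continuous value. The model architectures are often similar, but the output layer and loss function differ: for example, a softmax output with cross-entropy loss for classification versus a linear output with mean squared error for regression.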

Can I use an embedding model for classification? Yes. A common pattern is to encode text with an embedding model (e.g. BGE-M3 via SIE) and then train a lightweight classification head on top of the frozen embeddings. Because only the head's parameters are updated, this is far cheaper than fine-tuning the full model.
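
The frozen-embedding pattern can be sketched as a logistic-regression head trained with gradient descent on precomputed vectors. This is a minimal sketch under stated assumptions: the 2-dimensional vectors below are toy stand-ins for real embeddings (which would come from the embedding model and have hundreds of dimensions), no SIE or BGE-M3 API is called, and train_head and predict are names invented for this example.

```python
import math
from typing import List, Tuple

def sigmoid(x: float) -> float:
    """Numerically stable sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def train_head(embeddings: List[List[float]], labels: List[int],
               lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit weights w and bias b by batch gradient descent on BCE loss."""
    dim, n = len(embeddings[0]), len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of BCE with respect to the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w: List[float], b: float, x: List[float]) -> int:
    """Classify one embedding with the trained head."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Toy stand-in "embeddings"; the encoder itself is never updated.
X = [[1.0, 0.8], [0.9, 1.2], [1.1, 0.7],
     [-1.0, -0.9], [-0.8, -1.1], [-1.2, -0.7]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_head(X, y)
print(predict(w, b, [1.0, 1.0]))    # 1
print(predict(w, b, [-1.0, -1.0]))  # 0
```

Only w and b are learned here, which is why this approach needs far less compute and data than updating the encoder's weights.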

What is class imbalance and how do you handle it? Class imbalance occurs when one class has far more examples than others. Techniques include oversampling the minority class (by duplicating examples at random or by synthesising new ones with SMOTE), undersampling the majority class, and using class-weighted loss functions that penalise errors on rare classes more heavily.
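
Two of these techniques are easy to sketch in plain Python. The first function computes inverse-frequency class weights using n_samples / (n_classes * class_count), the same heuristic scikit-learn applies for class_weight="balanced"; the second performs simple random oversampling (not SMOTE). Function names and the 90/10 toy dataset are illustrative.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}

def random_oversample(examples: Sequence, labels: Sequence[int],
                      seed: int = 0) -> Tuple[List, List[int]]:
    """Duplicate random minority examples until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(examples), list(labels)
    for cls, cnt in counts.items():
        pool = [x for x, y in zip(examples, labels) if y == cls]
        for _ in range(target - cnt):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

labels = [0] * 90 + [1] * 10
examples = list(range(100))  # placeholder features

weights = balanced_class_weights(labels)
print(weights[1])            # 5.0
print(round(weights[0], 4))  # 0.5556

_, new_labels = random_oversample(examples, labels)
print(Counter(new_labels))   # Counter({0: 90, 1: 90})
```

With a class-weighted loss, each example's loss is multiplied by the weight of its true class, so the ten positives here contribute as much in total as the ninety negatives.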