What is Classification in Machine Learning?
Classification is a supervised learning task where a model learns to assign inputs to one of a fixed set of categories. Binary classification predicts one of two outcomes (e.g. spam or not spam); multi-class classification predicts one of three or more categories (e.g. document type). The model learns from labelled examples and generalises to unseen inputs.
Why does classification matter?
Classification is one of the most common tasks in applied machine learning. It underlies document routing, intent detection, content moderation, medical diagnosis, fraud detection, and many RAG pre-processing pipelines where documents must be labelled or filtered before indexing.
How does binary classification work?
A binary classifier such as logistic regression or a neural network typically produces a score between 0 and 1 by applying a sigmoid activation to its raw output x:
σ(x) = 1 / (1 + e^(-x))
A threshold (typically 0.5) converts the score to a class label. The model is trained by minimising binary cross-entropy loss:
Loss = -[y·log(p) + (1-y)·log(1-p)]
where y is the true label (0 or 1) and p is the predicted probability of class 1.
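The sigmoid, threshold, and loss above can be sketched in a few lines of plain Python. This is a minimal illustration for a single example, not a training loop; the function names and the `eps` clipping value are assumptions made for the sketch.

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a value between 0 and 1, avoiding overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def predict_label(logit: float, threshold: float = 0.5) -> int:
    """Convert a logit to a class label by thresholding the sigmoid output."""
    return 1 if sigmoid(logit) >= threshold else 0

def binary_cross_entropy(y: int, p: float, eps: float = 1e-12) -> float:
    """Loss for one example; eps (an assumed value) keeps log() away from zero."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

p = sigmoid(2.0)                            # a fairly confident positive prediction
loss_if_positive = binary_cross_entropy(1, p)  # small: the prediction agrees with the label
loss_if_negative = binary_cross_entropy(0, p)  # large: the prediction contradicts the label
```

The loss grows as the predicted probability moves away from the true label, which is what pushes the model towards confident, correct scores during training.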
How does multi-class classification work?
Multi-class classification outputs a probability distribution over all classes using a softmax activation:
softmax(xᵢ) = e^xᵢ / Σⱼ e^xⱼ
All probabilities sum to 1. The model is trained with categorical cross-entropy loss.
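As a rough sketch, softmax and the per-example categorical cross-entropy can be written as follows. With a one-hot target, the loss reduces to the negative log of the probability assigned to the true class. The function names, the example logits, and the `eps` value are illustrative assumptions.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(probs, true_index, eps=1e-12):
    """Loss for one example with a one-hot target: -log(p of the true class)."""
    return -math.log(max(probs[true_index], eps))

probs = softmax([2.0, 1.0, 0.1])                       # three-class example
predicted_class = max(range(len(probs)), key=probs.__getitem__)
loss = categorical_cross_entropy(probs, true_index=0)
```

The predicted class is simply the index with the highest probability; no threshold is needed because exactly one class is chosen.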
What metrics should you use to evaluate a classifier?
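Raw accuracy can be misleading on imbalanced datasets: a model that always predicts the majority class scores high accuracy while finding none of the minority class. Use these metrics together: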
| Metric | What it measures | When to prioritise |
|---|---|---|
| Accuracy | Overall correctness | Balanced classes |
| Precision | Of predicted positives, how many are correct | When false positives are costly |
| Recall | Of actual positives, how many were found | When false negatives are costly |
| F1 Score | Harmonic mean of precision and recall | Imbalanced datasets |
| AUC-ROC | Discrimination ability across thresholds | Comparing classifiers |
For document classification in search pipelines, F1 and precision are usually the most important metrics.
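To make the definitions in the table concrete, here is a minimal sketch that computes accuracy, precision, recall, and F1 from binary labels. The helper names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch. AUC-ROC is left out because it needs scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, and true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Imbalanced toy data: 9 negatives, 1 positive, and a model that always predicts negative.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
metrics = classification_metrics(y_true, y_pred)
# accuracy is 0.9, yet recall and F1 are 0.0 because the single positive was missed
```

The toy data shows why accuracy alone is not enough: the always-negative model looks 90% correct but never finds a positive.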
How does classification relate to semantic search and RAG?
Classification often appears as a pre- or post-processing step in inference pipelines:
- Query intent classification: route queries to different retrieval strategies based on detected intent
- Document type classification: tag documents before indexing so they can be filtered at retrieval time
- Answer relevance classification: judge whether a retrieved chunk is relevant before passing it to an LLM
- Reranker output: cross-encoder rerankers can be framed as binary classifiers predicting (query, document) relevance
Embedding models hosted on SIE can be fine-tuned for classification tasks using a classification head on top of the encoder.
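The answer-relevance step in the list above might look something like the sketch below. The scorer here is a deliberately crude word-overlap placeholder, not a real relevance model; in practice `score_fn` would wrap a trained classifier or cross-encoder. All names and the 0.5 threshold are assumptions for illustration.

```python
from typing import Callable, List

def filter_relevant_chunks(
    query: str,
    chunks: List[str],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep only the chunks whose relevance score meets the threshold."""
    return [chunk for chunk in chunks if score_fn(query, chunk) >= threshold]

def toy_relevance_score(query: str, chunk: str) -> float:
    """Placeholder scorer: fraction of query words that also appear in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & chunk_words) / len(query_words)

chunks = [
    "softmax turns logits into class probabilities",
    "the office is closed on public holidays",
]
kept = filter_relevant_chunks("softmax class probabilities", chunks, toy_relevance_score)
```

Passing the scorer in as a function keeps the filtering logic independent of whichever relevance model is used.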
Binary vs multi-class vs multi-label classification
| Type | Output | Example |
|---|---|---|
| Binary | One of two classes | Spam / not spam |
| Multi-class | One of N classes | Document category |
| Multi-label | Multiple classes simultaneously | Document topics |
Multi-label classification is common for document tagging in knowledge bases and RAG pipelines.
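One common way to implement multi-label prediction, assumed here rather than taken from the text above, is to apply an independent sigmoid to each label's logit and keep every label that clears a threshold. The label names and logits below are made up for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def predict_labels(logits, label_names, threshold=0.5):
    """Return every label whose independent sigmoid score meets the threshold."""
    return [name for name, z in zip(label_names, logits) if sigmoid(z) >= threshold]

label_names = ["billing", "security", "onboarding"]   # hypothetical document topics
predicted = predict_labels([1.2, 0.3, -2.0], label_names)
```

Unlike softmax, the per-label scores do not need to sum to 1, so a document can receive several labels or none at all.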
Frequently asked questions
What’s the difference between classification and regression? Classification predicts a discrete category. Regression predicts a continuous value. The model architectures can be very similar; the main differences are the output layer and the loss function.
Can I use an embedding model for classification? Yes. A common pattern is to encode text with an embedding model (e.g. BGE-M3 via SIE) and then train a lightweight classification head on top of the frozen embeddings. This is usually far cheaper than fine-tuning the full model, because only the head's parameters are updated.
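A minimal sketch of the frozen-embeddings pattern, under stated assumptions: the three-dimensional vectors below are toy stand-ins for real embeddings (which would come from the embedding model), and the head is a plain logistic regression trained with batch gradient descent. `train_head`, `predict`, and the hyperparameters are illustrative choices.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def train_head(embeddings, labels, lr=0.5, epochs=200):
    """Fit logistic-regression weights on fixed vectors; the encoder is never touched."""
    dim, n = len(embeddings[0]), len(embeddings)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of binary cross-entropy with respect to the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x, threshold=0.5):
    """Classify one embedding with the trained head."""
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if score >= threshold else 0

# Toy "embeddings": class 1 leans on the first dimension, class 0 on the second.
embeddings = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]
labels = [1, 1, 0, 0]
weights, bias = train_head(embeddings, labels)
```

Because the embeddings are computed once and reused, each training step only touches as many parameters as the embedding has dimensions, plus a bias.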
What is class imbalance and how do you handle it? Class imbalance occurs when one class has far more examples than the others. Techniques include oversampling the minority class (for example with SMOTE, which synthesises new minority examples), undersampling the majority class, and using class-weighted loss functions.
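As an illustration of the class-weighted loss option, the sketch below derives weights from a common inverse-frequency heuristic, n_samples / (n_classes × class count), and scales the binary cross-entropy by the weight of the true class. The heuristic, function names, and the 90/10 split are assumptions chosen for the example.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: rarer classes receive larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

def weighted_bce(y, p, weights, eps=1e-12):
    """Binary cross-entropy for one example, scaled by the weight of its true class."""
    p = min(max(p, eps), 1.0 - eps)
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return weights[y] * loss

labels = [0] * 90 + [1] * 10          # 90% majority class, 10% minority class
weights = balanced_class_weights(labels)
minority_loss = weighted_bce(1, 0.5, weights)
majority_loss = weighted_bce(0, 0.5, weights)
```

With these weights, a mistake on a minority example costs nine times as much as the same mistake on a majority example, which counteracts the 9:1 imbalance in the data.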