---
title: Classification
description: A comprehensive guide to binary and multi-class classification in machine learning. Learn about sigmoid and softmax functions, essential evaluation metrics like accuracy, precision, recall, and F1 score, and practical considerations for model performance. Discover how to handle challenges such as class imbalance, feature engineering, and model selection. Perfect for data scientists and ML practitioners looking to master classification techniques for real-world applications.
canonical_url: https://superlinked.com/glossary/binary-and-multi-class-classification
last_updated: 2026-06-11
---

# What is Classification in Machine Learning?

Classification is a supervised learning task where a model learns to assign inputs to one of a fixed set of categories. Binary classification predicts one of two outcomes (e.g. spam or not spam); multi-class classification predicts one of three or more categories (e.g. document type). The model learns from labelled examples and generalises to unseen inputs.

---

## Why does classification matter?

Classification is one of the most common tasks in applied machine learning. It underlies document routing, intent detection, content moderation, medical diagnosis, fraud detection, and many RAG pre-processing pipelines where documents must be labelled or filtered before indexing.

---

## How does binary classification work?

A binary classifier typically produces a raw score `x` (a logit) and passes it through a sigmoid activation, which maps it to a value between 0 and 1:

```
σ(x) = 1 / (1 + e^(-x))
```

A threshold (0.5 by default, though it can be tuned to trade precision against recall) converts the score to a class label. The model is trained by minimising binary cross-entropy loss, shown here for a single example:

```
Loss = -[y·log(p) + (1-y)·log(1-p)]
```

Where `y` is the true label (0 or 1) and `p` is the predicted probability.
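
As an illustration, the two formulas above can be sketched in plain Python. The function names (`sigmoid`, `binary_cross_entropy`, `predict_label`) and the `eps` clamp are choices made for this example and do not come from any particular library.

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a value in (0, 1), avoiding overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def binary_cross_entropy(y: int, p: float, eps: float = 1e-12) -> float:
    """Loss for a single example; eps (an assumption) keeps log() away from 0."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def predict_label(logit: float, threshold: float = 0.5) -> int:
    """Convert a logit to a 0/1 label by thresholding the sigmoid score."""
    return int(sigmoid(logit) >= threshold)

# A confident correct prediction is penalised far less than a confident wrong one.
loss_good = binary_cross_entropy(1, 0.9)
loss_bad = binary_cross_entropy(1, 0.1)
```

In practice, frameworks usually compute the loss directly from logits for numerical stability, and the loss is averaged over a batch rather than evaluated per example.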

---

## How does multi-class classification work?

Multi-class classification outputs a probability distribution over all classes using a softmax activation:

```
softmax(xᵢ) = e^xᵢ / Σ e^xⱼ
```

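Here `xᵢ` is the raw score (logit) for class `i`, and the sum runs over all classes `j`, so every output is positive and the outputs sum to 1. The predicted class is the one with the highest probability. The model is trained with categorical cross-entropy loss.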
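
A minimal sketch of softmax and the categorical cross-entropy for one example follows. Subtracting the maximum logit before exponentiating is a standard stability trick; the function names, the `eps` clamp, and the example logits are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(logits)  # shifting by the max leaves the result unchanged but avoids overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(probs, true_class, eps=1e-12):
    """Negative log-probability assigned to the true class index."""
    return -math.log(max(probs[true_class], eps))

logits = [2.0, 1.0, 0.1]  # illustrative scores for three classes
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)  # index of the highest probability
```

The loss is small when the true class receives most of the probability mass and grows without bound as that probability approaches zero.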

---

## What metrics should you use to evaluate a classifier?

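Raw accuracy can be misleading on imbalanced datasets: a model that always predicts the majority class scores high accuracy while finding no positives at all. Use these metrics together: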

| Metric | What it measures | When to prioritise |
|---|---|---|
| Accuracy | Overall correctness | Balanced classes |
| Precision | Of predicted positives, how many are correct | When false positives are costly |
| Recall | Of actual positives, how many were found | When false negatives are costly |
| F1 Score | Harmonic mean of precision and recall | Imbalanced datasets |
| AUC-ROC | Discrimination ability across thresholds | Comparing classifiers |

For document classification in search pipelines, F1 and precision are often the most useful metrics, because a false positive puts a wrongly labelled document in front of the retriever.
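
To make the definitions in the table concrete, here is a small sketch that derives accuracy, precision, recall, and F1 from confusion counts. The helper names and the toy labels are hypothetical. Returning 0.0 when a denominator is zero is one common convention, not the only one.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def binary_metrics(y_true, y_pred, positive=1):
    """Compute the four table metrics for one positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Imbalanced toy data: 2 positives, 8 negatives, and a model that always says "negative".
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
report = binary_metrics(y_true, always_negative)  # accuracy 0.8, recall 0.0, f1 0.0
```

The always-negative model looks respectable on accuracy alone, which is why recall and F1 are reported alongside it.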

---

## How does classification relate to semantic search and RAG?

Classification often appears as a pre- or post-processing step in inference pipelines:

- **Query intent classification**: route queries to different retrieval strategies based on detected intent
- **Document type classification**: tag documents before indexing so they can be filtered at retrieval time
- **Answer relevance classification**: judge whether a retrieved chunk is relevant before passing it to an LLM
- **Reranker output**: cross-encoder rerankers can be framed as binary classifiers predicting (query, document) relevance

Embedding models hosted on SIE can be fine-tuned for classification tasks using a classification head on top of the encoder.

---

## Binary vs multi-class vs multi-label classification

| Type | Output | Example |
|---|---|---|
| Binary | One of two classes | Spam / not spam |
| Multi-class | One of N classes | Document category |
| Multi-label | Any subset of N classes | Document topics |

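Multi-label classification is common for document tagging in knowledge bases and RAG pipelines. Multi-label models typically apply an independent sigmoid to each label rather than a softmax across labels, so the labels do not compete for probability mass.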
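
One common way to produce multi-label output is to score each label with its own sigmoid and keep every label whose score clears a threshold. The sketch below assumes that approach. The label names, logits, and function names are made up for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def multi_label_predict(logits_by_label, threshold=0.5):
    """Return every label whose independent sigmoid score meets the threshold."""
    return [
        label
        for label, logit in logits_by_label.items()
        if sigmoid(logit) >= threshold
    ]

# Hypothetical per-topic logits for one document.
doc_logits = {"finance": 2.1, "legal": 0.3, "hr": -1.7}
tags = multi_label_predict(doc_logits)  # ["finance", "legal"]
```

Unlike softmax, the scores here need not sum to 1, so a document can receive several tags or none.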

---

## Frequently asked questions

**What's the difference between classification and regression?**
Classification predicts a discrete category. Regression predicts a continuous value. The model architectures are often similar, but the output layer and loss function differ.

**Can I use an embedding model for classification?**
Yes. A common pattern is to encode text with an embedding model (e.g. BGE-M3 via SIE) and then train a lightweight classification head on top of the frozen embeddings. This is usually far cheaper than fine-tuning the full model, because only the head's parameters are updated.
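
As a rough sketch of this pattern, the code below trains a logistic-regression head with plain gradient descent on precomputed vectors. The three-dimensional `embeddings` are placeholders standing in for real encoder output; no encoder or SIE API is called here. The hyperparameters `lr` and `epochs` are arbitrary choices for the toy data.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_head(embeddings, labels, lr=0.5, epochs=200):
    """Fit weights w and bias b by batch gradient descent on binary cross-entropy."""
    dim, n = len(embeddings[0]), len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the loss with respect to the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Classify one embedding with the trained head."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold)

# Placeholder vectors: assume these were produced once by a frozen encoder.
embeddings = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]
labels = [1, 1, 0, 0]
w, b = train_head(embeddings, labels)
```

Because the encoder is frozen, embeddings can be computed once and cached, and only the small head is retrained when labels change.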

**What is class imbalance and how do you handle it?**
Class imbalance occurs when one class has far more examples than others. Techniques include oversampling the minority class (for example with SMOTE, which synthesises new minority examples), undersampling the majority class, and using class-weighted loss functions.
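
A class-weighted loss can be sketched as follows. The weights use the inverse-frequency heuristic `n_samples / (n_classes * class_count)`, which is one common choice among several. The function names and the 90/10 label split are assumptions for the example.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}

def weighted_bce(y, p, weights, eps=1e-12):
    """Binary cross-entropy for one example, scaled by the weight of its true class."""
    p = min(max(p, eps), 1.0 - eps)
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return weights[y] * loss

# Hypothetical imbalanced dataset: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)  # minority class gets weight 5.0
```

With these weights, an error on a minority example contributes roughly nine times as much to the loss as the same error on a majority example, which counteracts the 9:1 imbalance.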

---

## Related resources

- [Browse encoder models for classification on SIE](/models)
- [What is a neural network?](/glossary/neural-networks)
- [What is a loss function?](/glossary/loss-function)
- [What is feature engineering?](/glossary/feature-engineering)
- [What is RAG?](/glossary/what-is-rag)