---
title: Feature Scaling
description: Master feature scaling and normalization techniques for ML. Learn min-max scaling vs standardization with real-world applications and implementation tips.
canonical_url: https://superlinked.com/glossary/feature-scaling-and-normalization
last_updated: 2026-06-11
---

# What is Feature Scaling and Normalisation?

Feature scaling transforms numerical features to a comparable range so that no single feature dominates a model due to its magnitude. The two most common techniques are min-max scaling (normalisation), which maps values to [0, 1], and standardisation (z-score scaling), which centres data at zero with unit variance. Most distance-based and gradient-based models require feature scaling to perform well.

---

## Why does feature scaling matter?

Consider a dataset with document length (range: 10-50,000 words) and number of images (range: 0-20). Without scaling, the document length feature dominates Euclidean distance calculations simply because its values are larger, not because it's more informative.
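
To make this concrete, here is a minimal sketch with three made-up documents (the values are illustrative, not from a real dataset). It assumes the feature ranges quoted above are known in advance and uses them for min-max scaling. Before scaling, document B looks closer to A than C does, purely because of word counts. After scaling, the large difference in image count becomes visible.

```python
import numpy as np

# Columns: document length (words), number of images. Values are made up.
docs = np.array([
    [10_000.0, 2.0],   # A
    [10_500.0, 20.0],  # B: similar length to A, very different image count
    [12_000.0, 2.0],   # C: same image count as A, somewhat longer
])


def euclidean(a, b):
    """Euclidean distance between two 1-D feature vectors."""
    return float(np.linalg.norm(a - b))


# Raw distances: the word-count column decides almost everything.
raw_ab = euclidean(docs[0], docs[1])
raw_ac = euclidean(docs[0], docs[2])

# Min-max scale each column with the ranges from the text (assumed known).
col_min = np.array([10.0, 0.0])
col_max = np.array([50_000.0, 20.0])
scaled = (docs - col_min) / (col_max - col_min)

scaled_ab = euclidean(scaled[0], scaled[1])
scaled_ac = euclidean(scaled[0], scaled[2])

print(f"raw:    d(A,B)={raw_ab:.1f}  d(A,C)={raw_ac:.1f}")
print(f"scaled: d(A,B)={scaled_ab:.3f}  d(A,C)={scaled_ac:.3f}")
```

The nearest neighbour of A flips from B to C once both features are on the same footing.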

Models particularly sensitive to feature scale:
- **K-Nearest Neighbours**: distance-based, heavily affected
- **Support Vector Machines**: margin maximisation is scale-sensitive
- **Neural networks / gradient descent**: large or mismatched feature ranges can slow convergence or destabilise training
- **PCA**: maximises variance, so large-scale features dominate components

Models that don't require scaling:
- **Tree-based models** (decision trees, random forests, gradient boosting): split on thresholds, scale-invariant
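
The PCA entry in the list of scale-sensitive models can be checked with a small experiment. This is a sketch on synthetic data: two independent random features, one with a standard deviation of about 1000 and one of about 1. The variable names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two independent synthetic features on very different scales.
X = np.column_stack([
    rng.normal(0.0, 1000.0, size=500),
    rng.normal(0.0, 1.0, size=500),
])

# Share of variance captured by each principal component, without scaling.
ratio_raw = PCA(n_components=2).fit(X).explained_variance_ratio_

# Same analysis after standardising both features.
X_std = StandardScaler().fit_transform(X)
ratio_scaled = PCA(n_components=2).fit(X_std).explained_variance_ratio_

print("unscaled:    ", ratio_raw)
print("standardised:", ratio_scaled)
```

Without scaling, the first component is almost entirely the large-scale feature. After standardisation, the two independent features share the variance roughly equally.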

---

## Min-max scaling (normalisation)

Maps all values to the range [0, 1]:

```
x_scaled = (x - x_min) / (x_max - x_min)
```

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)
```

**Best for:** neural network inputs, when you know the approximate min/max of the data.
**Weakness:** sensitive to outliers, since a single extreme value stretches the range and compresses all other values into a narrow band (near 0 if the outlier is large, near 1 if it is small).
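
The outlier sensitivity is easy to reproduce. The following sketch uses a toy single-column array (values chosen for illustration) and compares `MinMaxScaler` with and without one extreme value. It also checks the formula above against the library result.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data: one feature, five well-behaved values.
X_clean = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])
# Same data plus a single extreme value.
X_outlier = np.vstack([X_clean, [[1000.0]]])

scaled_clean = MinMaxScaler().fit_transform(X_clean)
scaled_outlier = MinMaxScaler().fit_transform(X_outlier)

# The formula from the text, applied by hand to the clean data.
x_min = X_clean.min(axis=0)
x_max = X_clean.max(axis=0)
manual = (X_clean - x_min) / (x_max - x_min)

print("clean:       ", scaled_clean.ravel())
print("with outlier:", scaled_outlier.ravel())
```

With the outlier present, the five original values are squeezed into a small fraction of the [0, 1] interval while the outlier alone sits at 1.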

---

## Standardisation (Z-score scaling)

Centres data at zero and scales to unit variance:

```
x_scaled = (x - mean) / std
```

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)
```

**Best for:** most general cases, especially when you don't know the range.
**Weakness:** doesn't bound values, so outliers remain outliers, and they also skew the mean and standard deviation used for scaling.
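
As a sanity check, the formula can be compared with `StandardScaler` on a toy array (the values are illustrative). One detail worth knowing: `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training data: two features on different scales.
X_train = np.array([
    [1.0, 100.0],
    [2.0, 300.0],
    [3.0, 500.0],
    [4.0, 700.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)

# Manual z-score per column, using the population std (ddof=0).
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)

print(X_scaled.mean(axis=0))  # per-column mean after scaling
print(X_scaled.std(axis=0))   # per-column std after scaling
```

If you reproduce this with pandas, note that `DataFrame.std()` defaults to `ddof=1`, so the results differ slightly unless you pass `ddof=0`.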

---

## Min-max vs standardisation: which to use?

| Scenario | Recommended |
|---|---|
| Known bounded range (e.g. pixel values 0-255) | Min-max |
| Unknown range, possible outliers | Standardisation |
| Neural network training | Either (standardisation more common) |
| SVM | Standardisation |
| PCA | Standardisation |
| Data has extreme outliers | RobustScaler (uses median/IQR) |

---

## Feature scaling and embedding vectors

Dense embedding vectors produced by SIE's encoding models are already unit-normalised (L2 norm = 1) for cosine similarity search. You don't need to apply additional scaling to embedding vectors.

However, when combining embedding similarity scores with other tabular features (e.g. document recency, click-through rate) in a re-ranking model, you'll need to scale the non-embedding features to be comparable with the similarity scores.
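
A minimal sketch of that combination follows. Everything in it is an assumption for illustration: the vectors are random stand-ins rather than output from any real encoding model, and the tabular features (document age in days, click-through rate) and their values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)

# Random stand-ins for embeddings; not produced by a real model.
query = rng.normal(size=8)
docs = rng.normal(size=(4, 8))

# L2-normalise so every vector has unit length.
query /= np.linalg.norm(query)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

# For unit-length vectors, the dot product equals cosine similarity.
similarity = docs @ query

# Hypothetical tabular features per document: age in days, click-through rate.
tabular = np.array([
    [3.0, 0.12],
    [400.0, 0.02],
    [45.0, 0.30],
    [900.0, 0.07],
])

# Scale only the tabular columns. In a real pipeline the scaler would be
# fitted on the re-ranker's training data, not on the candidates being scored.
tabular_scaled = MinMaxScaler().fit_transform(tabular)

# Feature matrix for a re-ranking model: similarity plus scaled features.
rerank_features = np.column_stack([similarity, tabular_scaled])
print(rerank_features)
```

Min-max scaling is used here only because it puts the tabular columns on a bounded range comparable to the similarity score. Standardisation would be an equally reasonable choice.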

---

## A critical rule: fit on training data, transform test data

Always fit the scaler on training data only, then apply the same transformation to validation and test data:

```python
from sklearn.preprocessing import StandardScaler

# Correct: fit on the training split only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # fit + transform
X_test_scaled = scaler.transform(X_test)          # transform only

# Wrong: fitting on all rows leaks test statistics into training
X_all_scaled = StandardScaler().fit_transform(X_all)
```

Fitting on the full dataset causes data leakage and produces optimistically biased evaluation metrics.
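
The same rule applies inside cross-validation, where it is easy to break by scaling once up front. One common way to stay safe is to put the scaler in a scikit-learn pipeline, so it is refitted on the training portion of each fold. The sketch below uses synthetic data and a k-nearest-neighbours classifier as an arbitrary example model.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic classification data, for illustration only.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# The pipeline fits the scaler on each fold's training rows only, then
# applies that fitted scaler to the fold's held-out rows.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy over 5 folds: {scores.mean():.3f}")
```

Because the scaler is part of the estimator, there is no separate scaled copy of the full dataset that could leak held-out statistics.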

---

## Frequently asked questions

**Do I need to scale features for tree-based models?**
No. Decision trees, random forests, and gradient boosted trees split on thresholds, which are scale-invariant. Scaling doesn't hurt, but it doesn't help either.
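
A quick way to see this for yourself is to train the same tree on raw and standardised copies of the data and compare predictions. This is a sketch on synthetic data. The two trees are expected to agree, although floating-point rounding could in principle flip a borderline comparison.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on the training split only.
scaler = StandardScaler().fit(X_train)

# Same model and seed, trained once on raw and once on scaled features.
tree_raw = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(
    scaler.transform(X_train), y_train
)

pred_raw = tree_raw.predict(X_test)
pred_scaled = tree_scaled.predict(scaler.transform(X_test))

# Fraction of test rows on which the two trees give the same label.
agreement = float(np.mean(pred_raw == pred_scaled))
print(f"agreement between raw and scaled trees: {agreement:.2%}")
```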

**What is RobustScaler?**
RobustScaler uses the median and interquartile range instead of mean and standard deviation, making it resistant to outliers:
`x_scaled = (x - median) / IQR`
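
A short sketch comparing scikit-learn's `RobustScaler` (with its default 25th to 75th percentile range) against the formula, on toy data containing one outlier:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy data: five ordinary values and one extreme outlier.
X = np.array([[10.0], [20.0], [30.0], [40.0], [50.0], [1000.0]])

scaler = RobustScaler()  # default quantile_range=(25.0, 75.0)
X_robust = scaler.fit_transform(X)

# The same computation by hand: subtract the median, divide by the IQR.
median = np.median(X, axis=0)
q1, q3 = np.percentile(X, [25, 75], axis=0)
manual = (X - median) / (q3 - q1)

print("median:", scaler.center_, "IQR:", scaler.scale_)
print(X_robust.ravel())
```

The outlier still maps to a large value, but it barely affects the median and IQR, so the ordinary values keep a usable spread.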

**Does feature scaling change the information content?**
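Not within a single feature. Min-max scaling and standardisation are invertible linear transformations, so they preserve rank order and the ratios between differences, and the original values can be recovered exactly. What changes is how much each feature contributes relative to the others in distance and gradient calculations, which is the point of scaling.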
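
As a small illustration, the sketch below standardises a toy column, checks that the ordering of values is unchanged, and recovers the original values with `inverse_transform`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature data, for illustration only.
X = np.array([[3.0], [50.0], [7.0], [120.0], [18.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Rank order is unchanged by the linear rescaling.
same_order = np.array_equal(np.argsort(X[:, 0]), np.argsort(X_scaled[:, 0]))

# The fitted scaler can map the scaled values back to the originals.
X_recovered = scaler.inverse_transform(X_scaled)

print("same rank order:", same_order)
print("recovered original values:", np.allclose(X_recovered, X))
```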

---

## Related resources

- [What is feature engineering?](/glossary/feature-engineering)
- [What is feature selection?](/glossary/feature-selection)
- [What is dimensionality reduction?](/glossary/dimensionality-reduction-taming-the-curse-of-high-dimensional-data)
- [What is a neural network?](/glossary/neural-networks)
- [Browse models on SIE](/models)