What is Dimensionality Reduction?
Dimensionality reduction is the process of representing high-dimensional data in fewer dimensions while preserving as much meaningful structure as possible. It reduces computational cost, mitigates the curse of dimensionality, and makes data easier to visualise and cluster. Common techniques include PCA (linear) and UMAP/t-SNE (non-linear).
Why does dimensionality reduction matter for search?
Dense embedding vectors typically have 768 to 4096 dimensions. Storing and searching millions of such vectors is expensive. Dimensionality reduction can:
- Reduce storage cost: a 256-dim vector uses one third of the memory of a 768-dim one
- Speed up ANN search: lower dimensions mean faster distance computations
- Improve clustering quality: high-dimensional spaces suffer from the curse of dimensionality, where all points become roughly equidistant
- Enable visualisation: reduce to 2D or 3D for exploring embedding space structure
The trade-off is accuracy: compression loses some information, so reduced vectors are slightly less discriminative than full-dimensional ones.
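The storage saving listed above is simple arithmetic. The sketch below assumes float32 components (4 bytes each) and ignores any index overhead; the helper name `vector_storage_bytes` and the corpus size are illustrative, not taken from any library or benchmark.

```python
def vector_storage_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for dense vectors, assuming float32 values and no index overhead."""
    return num_vectors * dims * bytes_per_value


num_vectors = 1_000_000  # hypothetical corpus size
full = vector_storage_bytes(num_vectors, 768)
reduced = vector_storage_bytes(num_vectors, 256)

print(f"768-dim: {full / 1e9:.2f} GB")     # 3.07 GB
print(f"256-dim: {reduced / 1e9:.2f} GB")  # 1.02 GB
print(f"Ratio:   {full / reduced:.0f}x")   # 3x
```

Real vector indexes add graph or quantisation structures on top of the raw vectors, so treat these figures as a lower bound.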
What is the curse of dimensionality?
In high-dimensional spaces, counterintuitive things happen:
- The volume of space grows exponentially with dimensions, so data becomes increasingly sparse
- Distance metrics (cosine, Euclidean) lose discriminative power, and all pairwise distances become similar
- Models need exponentially more data to cover the space adequately
This is why techniques like PCA are applied before clustering high-dimensional embeddings, and why embedding models don’t use unnecessarily large dimensions.
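The loss of discriminative power can be observed directly. The following is a minimal sketch that assumes uniformly random points, which is a worst case: real embeddings have structure and usually concentrate less severely. The function name `relative_contrast` is illustrative. It measures how much farther the farthest point is than the nearest one, relative to the nearest.

```python
import numpy as np


def relative_contrast(dims: int, num_points: int = 500, seed: int = 0) -> float:
    """(max - min) / min over Euclidean distances from one random query to random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((num_points, dims))
    query = rng.random(dims)
    distances = np.linalg.norm(points - query, axis=1)
    return float((distances.max() - distances.min()) / distances.min())


for dims in (2, 32, 768):
    print(f"{dims:>4} dims: relative contrast = {relative_contrast(dims):.2f}")
```

As the dimension grows, the contrast shrinks towards zero, meaning the nearest and farthest neighbours sit at nearly the same distance from the query.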
How does PCA work?
Principal Component Analysis (PCA) finds the directions (principal components) of maximum variance in the data and projects onto the top-k components:

    from sklearn.decomposition import PCA

    pca = PCA(n_components=256)
    reduced_vectors = pca.fit_transform(embedding_matrix)  # e.g. (10000, 768) → (10000, 256)

    # Explained variance tells you how much information is retained
    print(f"Variance retained: {pca.explained_variance_ratio_.sum():.2%}")

PCA is linear: it preserves global structure well but may not capture complex non-linear relationships.
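To make the mechanics concrete, here is a sketch of the same idea written with NumPy's SVD: centre the data, take the top-k right singular vectors as components, and project. It is an illustration under stated assumptions, not scikit-learn's implementation. The helper names `fit_pca` and `transform` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_pca(X: np.ndarray, k: int):
    """Return (mean, components, explained_variance_ratio) for the top-k components."""
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values**2 / (X.shape[0] - 1)
    ratio = variances[:k] / variances.sum()
    return mean, Vt[:k], ratio


def transform(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project rows of X onto the fitted components."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)
# Synthetic stand-in for embeddings: 64 dims, most variance in 8 latent directions.
latent = rng.normal(size=(1000, 8))
mixing = rng.normal(size=(8, 64))
embedding_matrix = latent @ mixing + 0.05 * rng.normal(size=(1000, 64))

mean, components, ratio = fit_pca(embedding_matrix, k=8)
reduced_vectors = transform(embedding_matrix, mean, components)
print(reduced_vectors.shape)
print(f"Variance retained: {ratio.sum():.2%}")

# A new vector is projected with the same mean and components; no refit is needed.
new_vector = rng.normal(size=(1, 8)) @ mixing
print(transform(new_vector, mean, components).shape)
```

Because the fitted mean and components are reused for new vectors, queries and documents end up in the same reduced space.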
PCA vs t-SNE vs UMAP
| | PCA | t-SNE | UMAP |
|---|---|---|---|
| Type | Linear | Non-linear | Non-linear |
| Preserves | Global variance | Local structure | Local + some global |
| Speed | Fast | Slow | Medium |
| Scalable | ✓ | ✗ | ✓ |
| Out-of-sample | ✓ | ✗ | ✓ |
| Best for | Compression, preprocessing | 2D visualisation | Visualisation + compression |
For production dimensionality reduction of embedding vectors, UMAP or PCA is preferred over t-SNE (which doesn’t support new data points without refitting).
Matryoshka embedding models and built-in dimensionality reduction
Modern embedding models like BGE-M3 use Matryoshka Representation Learning (MRL), a training technique that makes the early dimensions of the vector more information-dense than later ones. This lets you truncate vectors to a smaller size at inference time without a separate PCA step. Truncation is only reliable for models trained this way, so confirm MRL support on the model card before relying on it:

    from sie_sdk import SIEClient
    from sie_sdk.types import Item
    import numpy as np

    client = SIEClient("http://localhost:8080")

    # Full 1024-dim vectors (batch); `texts` is a list of input strings defined elsewhere
    results = client.encode("BAAI/bge-m3", [Item(text=t) for t in texts])
    full_vectors = np.stack([r["dense"] for r in results])

    # Truncate to 256 dims (works well due to MRL training)
    reduced_vectors = full_vectors[:, :256]

SIE supports Matryoshka-capable models, giving you built-in flexible dimensionality without post-processing.
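One detail worth handling after the truncation step above: cutting a unit-length vector leaves it shorter than unit length. Cosine similarity is unaffected, but an index that scores by dot product will need the vectors rescaled. The sketch below uses random unit vectors as a stand-in for model output, and the helper name `truncate_and_normalize` is hypothetical.

```python
import numpy as np


def truncate_and_normalize(vectors: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components of each row and rescale rows to unit length."""
    truncated = vectors[:, :dims]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return truncated / np.clip(norms, 1e-12, None)


rng = np.random.default_rng(0)
# Random unit vectors standing in for model output; real MRL embeddings are not random.
full_vectors = rng.normal(size=(4, 1024))
full_vectors /= np.linalg.norm(full_vectors, axis=1, keepdims=True)

reduced_vectors = truncate_and_normalize(full_vectors, 256)
print(reduced_vectors.shape)
print(np.linalg.norm(full_vectors[:, :256], axis=1))  # plain truncation: norms below 1
print(np.linalg.norm(reduced_vectors, axis=1))        # after rescaling: unit norms
```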
Frequently asked questions
How many dimensions should I reduce to? It depends on the accuracy-cost trade-off. For many retrieval tasks, 256-512 dimensions retain 95% or more of full-dimension retrieval quality, but the figure varies by model and dataset. Test on your own data using recall@k metrics.
Does dimensionality reduction affect retrieval accuracy? Yes, there’s always some loss. The question is whether it’s acceptable. Modern Matryoshka models minimise this loss by design.
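Can I apply PCA to vectors already stored in a vector database? Yes, but not in place. You'd need to fit PCA on the stored vectors (or a sample of them), transform every vector, and re-index, then apply the same fitted transform to every query. It’s usually easier to apply reduction at encoding time and re-index.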
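The recall@k test recommended above for choosing a target dimension can be sketched as follows. This version treats exact search over the full vectors as the reference, so it measures agreement with full-dimension results rather than relevance. If you have labelled relevance judgements, use those as the reference instead. The data is synthetic and truncation stands in for whichever reduction method you use, so the printed numbers say nothing about real models. The helper names are illustrative.

```python
import numpy as np


def top_k_indices(queries: np.ndarray, corpus: np.ndarray, k: int) -> np.ndarray:
    """Exact top-k corpus indices per query by cosine similarity."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = q @ c.T
    return np.argsort(-scores, axis=1)[:, :k]


def recall_at_k(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Mean fraction of reference neighbours that also appear in the candidate top-k."""
    hits = [len(set(r) & set(c)) / len(r) for r, c in zip(reference, candidate)]
    return float(np.mean(hits))


rng = np.random.default_rng(0)
corpus = rng.normal(size=(2000, 128))  # synthetic stand-in for full-size embeddings
queries = rng.normal(size=(50, 128))

reference = top_k_indices(queries, corpus, k=10)
for dims in (128, 64, 32):
    candidate = top_k_indices(queries[:, :dims], corpus[:, :dims], k=10)
    print(f"{dims:>3} dims: recall@10 = {recall_at_k(reference, candidate):.2f}")
```

Running the same loop over your own query set and candidate dimensions gives a direct view of where quality starts to drop.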