---
title: Feature Selection
description: Learn essential feature selection techniques for machine learning. Discover filter, wrapper, and embedded methods to improve model performance and reduce overfitting.
canonical_url: https://superlinked.com/glossary/feature-selection
last_updated: 2026-06-11
---

# What is Feature Selection?

Feature selection is the process of choosing the most informative subset of features from a dataset for use in a model and discarding the rest. By removing noisy or redundant signals, it can reduce overfitting, lower computational cost, and make the model easier to interpret. The three main approaches are filter methods, wrapper methods, and embedded methods.

---

## Why does feature selection matter?

More features are not always better. Irrelevant or redundant features add noise, increase training time, and can cause a model to overfit. Feature selection focuses the model on the signals that actually matter, improving accuracy, speed, and explainability.

In document processing and retrieval pipelines, feature selection determines which metadata fields, text statistics, or derived features are worth including in ranking models alongside embedding similarity.

---

## What are the three main approaches?

### Filter methods
Rank features independently of any model using statistical measures:

| Method | What it measures | Feature type |
|---|---|---|
| Correlation | Linear relationship with target | Numerical |
| Chi-squared | Dependency between categorical feature and target | Categorical (non-negative counts or one-hot values) |
| Mutual information | Any statistical dependency | Both |
| Variance threshold | Variance of the feature itself, ignoring the target; removes near-constant features | Numerical, or categorical after encoding |

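```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Assumes X_train (n_samples, n_features) and y_train (class labels) already exist,
# and that X_train has at least 20 features.
selector = SelectKBest(score_func=mutual_info_classif, k=20)
# Keep only the 20 columns with the highest mutual information scores.
X_selected = selector.fit_transform(X_train, y_train)
```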

Filter methods are fast, but univariate scores evaluate each feature in isolation, so they miss features that are only informative in combination with others.
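
To make the interaction caveat concrete, here is a minimal sketch on a toy XOR dataset. The data is constructed purely for illustration, and the variable names are assumptions rather than part of any real pipeline. Each feature on its own carries no information about the target, so a univariate filter scores both at roughly zero, even though the two features together determine the target exactly.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Toy data (illustrative): all four combinations of two binary features,
# repeated 50 times so every combination is equally frequent.
X = np.tile(np.array([[0, 0], [0, 1], [1, 0], [1, 1]]), (50, 1))
y = X[:, 0] ^ X[:, 1]  # target is the XOR of the two features

# Univariate scores: each feature is scored against y on its own.
single_scores = mutual_info_classif(X, y, discrete_features=True, random_state=0)

# Score the two features jointly by encoding each pair as one categorical value.
joint = (2 * X[:, 0] + X[:, 1]).reshape(-1, 1)
joint_score = mutual_info_classif(joint, y, discrete_features=True, random_state=0)[0]

print("per-feature scores:", single_scores)  # both approximately 0
print("joint score:", joint_score)           # approximately ln(2), the full entropy of y
```

A filter such as `SelectKBest` would have no basis for keeping either feature here, which is one reason filtering is often followed by a model-aware method.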

### Wrapper methods
Evaluate subsets of features by training a model and measuring performance:

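- **Forward selection**: start empty, add features one at a time, keeping each addition that improves the score most
- **Backward elimination**: start with all features, remove the least useful one at a time
- **Recursive Feature Elimination (RFE)**: repeatedly trains a model and prunes the weakest feature (ranked by coefficients or importances) until the target number remains

Wrapper methods often find better-performing subsets than filter methods because they evaluate features in the context of the model, but they are computationally expensive, since each candidate subset requires retraining the model.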
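
The wrapper strategies above can be sketched with scikit-learn's `RFE` and `SequentialFeatureSelector`. This is a minimal example on synthetic data from `make_classification`. The logistic regression estimator, the dataset sizes, and the choice of 5 features are assumptions made for illustration, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic data (illustrative): 15 features, 5 of them informative.
X, y = make_classification(
    n_samples=300, n_features=15, n_informative=5, random_state=0
)

# Recursive Feature Elimination: refit the model and drop one feature per round.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5, step=1)
rfe.fit(X, y)
rfe_selected = np.flatnonzero(rfe.support_)

# Forward selection: start empty, add the feature that most improves the CV score.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=5,
    direction="forward",
    cv=3,
)
sfs.fit(X, y)
sfs_selected = np.flatnonzero(sfs.get_support())

print("RFE kept columns:", rfe_selected)
print("Forward selection kept columns:", sfs_selected)
```

Setting `direction="backward"` on `SequentialFeatureSelector` gives backward elimination. The two selectors may not agree on the same subset, since RFE ranks by model coefficients while sequential selection ranks by cross-validated score.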

### Embedded methods
Feature selection happens as part of model training:

- **L1 regularisation (Lasso)**: drives the weights of uninformative features to exactly zero
- **Tree-based importance**: random forests and gradient boosted trees rank features by their contribution to splits
- **Elastic net**: combines L1 and L2 regularisation, keeping L1's sparsity while handling groups of correlated features more stably

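```python
from sklearn.ensemble import RandomForestClassifier

# Assumes X_train and y_train already exist.
# A fixed random_state keeps the importance ranking reproducible between runs.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
# Impurity-based importances: one value per feature, normalised to sum to 1.
importances = rf.feature_importances_
```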

Embedded methods are efficient because selection comes from a single training run, and they account for feature interactions to the extent that the underlying model does.
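
As a sketch of the L1 case, the example below fits scikit-learn's `Lasso` on synthetic regression data and reads the selected features off the non-zero coefficients. The dataset and the value `alpha=1.0` are assumptions chosen for illustration. In practice the regularisation strength is usually tuned, for example with `LassoCV`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (illustrative): 20 features, only 5 of which affect the target.
# make_regression draws features on a comparable scale; real data usually needs
# standardising before applying L1 regularisation.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=5.0, random_state=0
)

# alpha controls the strength of the L1 penalty (value chosen for illustration).
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# Features whose coefficients were not shrunk to zero are the selected ones.
kept = np.flatnonzero(lasso.coef_)
print(f"kept {kept.size} of {X.shape[1]} features:", kept)
```

Raising `alpha` zeroes out more coefficients, and lowering it keeps more features.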

---

## Filter vs wrapper vs embedded methods

| | Filter | Wrapper | Embedded |
|---|---|---|---|
| Model-dependent | ✗ | ✓ | ✓ |
| Computational cost | Low | High | Medium |
| Accounts for interactions | ✗ | ✓ | ✓ |
| Best for | Initial exploration | Small feature sets | Large datasets |

---

## Feature selection in retrieval and re-ranking

When building a re-ranking model on top of embedding retrieval, feature selection determines which signals to include alongside cosine similarity:

**Likely informative features:**
- Embedding similarity score
- Document recency (days since publication)
- Source authority (domain trust score)
- Query-document metadata match (same category, same language)
- Document length (as a quality signal)

**Likely uninformative:**
- Raw document ID
- File creation timestamp (vs publication date)
- Formatting metadata irrelevant to content
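
One way to check such intuitions is to score candidate re-ranking features against relevance labels. The sketch below does this with mutual information on fully synthetic data. The feature names, the way relevance is simulated, and every numeric constant are assumptions made for illustration. With real data the labels would come from click logs or human judgements.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 2000

# Synthetic candidate features (all hypothetical).
similarity = rng.uniform(0.0, 1.0, n)                 # embedding similarity score
recency_days = rng.integers(0, 365, n).astype(float)  # days since publication
doc_id = rng.permutation(n).astype(float)             # raw document ID

# Simulated relevance: driven mostly by similarity, weakly by recency,
# and not at all by the document ID. This is an assumption, not a finding.
logit = 6.0 * (similarity - 0.5) - 0.005 * (recency_days - 180.0)
relevant = (rng.uniform(0.0, 1.0, n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X = np.column_stack([similarity, recency_days, doc_id])
names = ["similarity", "recency_days", "doc_id"]

# Estimate the dependency between each feature and the relevance label.
scores = mutual_info_classif(X, relevant, random_state=0)
for name, score in sorted(zip(names, scores), key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```

Because mutual information is a filter score, it only reflects each signal on its own. Features that matter only in combination are better assessed with the re-ranking model itself.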

---

## Frequently asked questions

**What's the difference between feature selection and feature extraction?**
Feature selection picks from existing features. Feature extraction creates new representations (e.g. PCA components, embedding vectors). Embedding models perform feature extraction.

**Can I use multiple selection methods together?**
Yes. A common pipeline is: variance threshold (remove constants) → correlation filter (remove redundant) → RFE or embedded selection (final selection). Each step targets a different problem: constant features, redundant features, and finally features that do not help the model.
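
A minimal sketch of that pipeline with scikit-learn follows. `VarianceThreshold` and `RFE` are library classes. `CorrelationFilter` is a hypothetical helper written for this example, since scikit-learn does not ship a correlation filter. The synthetic data, the 0.95 threshold, and the final count of 4 features are all assumptions.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class CorrelationFilter(TransformerMixin, BaseEstimator):
    """Hypothetical helper: drop any feature highly correlated with an earlier kept one."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold

    def fit(self, X, y=None):
        X = np.asarray(X)
        corr = np.abs(np.corrcoef(X, rowvar=False))
        keep = np.ones(X.shape[1], dtype=bool)
        for j in range(X.shape[1]):
            # Compare column j only against earlier columns that are still kept.
            if np.any(corr[j, :j][keep[:j]] > self.threshold):
                keep[j] = False
        self.keep_ = keep
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.keep_]


# Synthetic data (illustrative), plus one constant column and one scaled duplicate.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, n_redundant=0, random_state=0
)
X = np.hstack([X, np.ones((300, 1)), 2.0 * X[:, [0]]])

selector = Pipeline([
    ("variance", VarianceThreshold(threshold=0.0)),     # removes the constant column
    ("correlation", CorrelationFilter(threshold=0.95)), # removes the duplicate
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)),
])
X_selected = selector.fit_transform(X, y)
print(X.shape, "->", X_selected.shape)
```

Wrapping the steps in a `Pipeline` means every stage is fitted on the same training data and can be reused unchanged at inference time.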

**How do you evaluate whether feature selection improved the model?**
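Compare cross-validated accuracy, F1, or recall on the selected feature set against the full feature set. Run the selection step inside each cross-validation fold rather than once on the whole dataset, otherwise the selected features have already seen the validation data and the scores will be optimistic. Also compare inference speed and model size.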
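
A minimal sketch of such a comparison is shown here, assuming a synthetic dataset, a logistic regression model, and `SelectKBest` with `k=10` as the selection step (all illustrative choices). Putting the selector inside a pipeline means `cross_val_score` refits it on the training portion of each fold.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (illustrative): 50 features, 5 of them informative.
X, y = make_classification(
    n_samples=500, n_features=50, n_informative=5, n_redundant=0, random_state=0
)

# Baseline: the model trained on every feature.
full_model = LogisticRegression(max_iter=1000)

# Candidate: selection and model in one pipeline, so selection is refit per fold.
selected_model = make_pipeline(
    SelectKBest(score_func=f_classif, k=10),
    LogisticRegression(max_iter=1000),
)

full_scores = cross_val_score(full_model, X, y, cv=5, scoring="f1")
selected_scores = cross_val_score(selected_model, X, y, cv=5, scoring="f1")

print(f"all features: F1 = {full_scores.mean():.3f} +/- {full_scores.std():.3f}")
print(f"top 10 only:  F1 = {selected_scores.mean():.3f} +/- {selected_scores.std():.3f}")
```

Whether the selected set wins depends on the data and the model, so the spread across folds matters as much as the mean when deciding if a difference is real.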

---

## Related resources

- [What is feature engineering?](/glossary/feature-engineering)
- [What is feature scaling?](/glossary/feature-scaling-and-normalization)
- [What is dimensionality reduction?](/glossary/dimensionality-reduction-taming-the-curse-of-high-dimensional-data)
- [What is a reranker?](/glossary/what-is-a-reranker)
- [What is gradient boosting?](/glossary/gradient-boosting-and-adaptive-boosting)
