Recommendation System

What is a Recommendation System?

A recommendation system is a machine learning system that predicts which items a user is most likely to find relevant or useful, and surfaces them proactively. It learns from user behaviour (clicks, purchases, ratings) and item characteristics to personalise suggestions. The three main approaches are collaborative filtering, content-based filtering, and hybrid systems that combine both.

Why do recommendation systems matter?

Recommendation systems drive a significant share of engagement and revenue in consumer products. Netflix has reported that about 80% of hours streamed come from recommendations, and a widely cited McKinsey estimate attributes about 35% of Amazon purchases to them. They are also increasingly used in enterprise contexts: surfacing relevant documents, prioritising support tickets, and recommending knowledge base articles.

The core technical challenge, finding semantically similar items and personalising to user history, overlaps heavily with semantic search and RAG infrastructure.

What are the main types of recommendation systems?

Collaborative filtering

Recommends items based on what similar users liked, without using item content:

User-based CF: find users with similar history to the target user, recommend what they liked
Item-based CF: find items with similar interaction patterns, recommend similar ones
Matrix factorisation: decompose the user-item interaction matrix into latent factor vectors (SVD, ALS, BPR)

Strengths: captures taste patterns beyond content. Weaknesses: cold start problem (new users/items), sparse interaction data.
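
As an illustration of item-based CF, the sketch below computes cosine similarity between the columns of a small user-item matrix and scores unseen items by their summed similarity to the items a user already has. The matrix values and the function names (`item_similarity`, `recommend_item_based`) are made up for this example, and a production system would use sparse matrices rather than dense NumPy arrays.

```python
import numpy as np

# Toy user-item interaction matrix (rows: users, columns: items); 1 = interacted.
# The values are invented for illustration.
interactions = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
], dtype=float)


def item_similarity(matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between the item columns of a user-item matrix."""
    norms = np.linalg.norm(matrix, axis=0, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody touched
    normalised = matrix / norms
    sim = normalised.T @ normalised
    np.fill_diagonal(sim, 0.0)  # an item should not recommend itself
    return sim


def recommend_item_based(user_index: int, matrix: np.ndarray, top_k: int = 2) -> list[int]:
    """Score unseen items by summed similarity to the user's existing items."""
    sim = item_similarity(matrix)
    user_row = matrix[user_index]
    scores = sim @ user_row
    scores[user_row > 0] = -np.inf  # exclude items the user already has
    ranked = np.argsort(-scores)
    return [int(i) for i in ranked[:top_k] if np.isfinite(scores[i])]


recs = recommend_item_based(user_index=0, matrix=interactions)
print(recs)
```

User 0 has interacted with items 0 and 1, so only items 2 and 3 are candidates; item 2 ranks first because it co-occurs with item 1 more often.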

Content-based filtering

Recommends items similar to ones the user has previously engaged with, based on item features:

Compare item embeddings (text descriptions, images, metadata)
Return the items whose vectors are closest to the vectors of items in the user’s interaction history

Strengths: works for new items (no interaction data needed). Weaknesses: limited to “more of the same”; doesn’t discover new tastes.
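
A minimal sketch of content-based filtering, assuming each item already has a feature vector. The three-dimensional vectors and the function name `recommend_content_based` are hypothetical; in practice the vectors would come from an embedding model or from encoded metadata.

```python
import numpy as np

# Hypothetical item feature vectors; rows are items.
item_features = np.array([
    [1.0, 0.0, 0.0],  # item 0
    [0.9, 0.1, 0.0],  # item 1: very close to item 0
    [0.0, 1.0, 0.0],  # item 2
    [0.1, 0.2, 0.8],  # item 3
])


def recommend_content_based(history: list[int], features: np.ndarray, top_k: int = 2) -> list[int]:
    """Rank items by similarity to the mean of the user's history vectors."""
    # Normalise rows so the dot product ranks items by cosine similarity
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    profile = unit[history].mean(axis=0)
    scores = unit @ profile
    scores[history] = -np.inf  # do not recommend items already seen
    order = np.argsort(-scores)
    return [int(i) for i in order[:top_k]]


content_recs = recommend_content_based(history=[0], features=item_features)
print(content_recs)
```

The result also shows the "more of the same" weakness: the top recommendation is the item that is nearly identical to what the user already engaged with.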

Hybrid systems

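Most production recommendation systems combine both. A simple form is a weighted blend of the two scores, where α in [0, 1] sets how much weight the collaborative signal gets: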

final_score = α × collaborative_score + (1-α) × content_score

Neural approaches (Two-Tower models, DLRM) learn to combine signals end-to-end.
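
The weighted blend can be sketched as follows. The min-max rescaling step is an assumption added here, not part of the formula above: without some normalisation, whichever score has the larger numeric range would dominate regardless of α. The function names are illustrative.

```python
import numpy as np


def min_max(scores: np.ndarray) -> np.ndarray:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores, dtype=float)
    return (scores - scores.min()) / span


def hybrid_scores(collab: np.ndarray, content: np.ndarray, alpha: float = 0.7) -> np.ndarray:
    """final_score = alpha * collaborative_score + (1 - alpha) * content_score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * min_max(collab) + (1 - alpha) * min_max(content)


# Made-up scores for three candidate items
collab = np.array([0.0, 5.0, 10.0])
content = np.array([1.0, 0.0, 0.5])
blended = hybrid_scores(collab, content, alpha=0.5)
print(blended)
```

In practice α is usually tuned on held-out interaction data, and it can be lowered for users or items with little history so that the content score carries more weight.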

How do embeddings power modern recommendation?

The modern approach represents both users and items as dense vectors in a shared embedding space. Recommendation becomes a nearest-neighbour search:

Encode item content (text, images, metadata) into item embeddings
Build user embeddings from their interaction history (average of interacted item embeddings, or a learned aggregation)
Retrieve nearest item vectors to the user’s vector via ANN search

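import numpy as np
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Example inputs; replace with your own catalogue and interaction log
item_descriptions = [
    "Wireless noise-cancelling headphones",
    "Trail running shoes",
    "Stovetop espresso maker",
]
user_interactions = [0, 2]  # row indices of items this user engaged with

# Encode item descriptions into dense vectors, one row per item
item_embeddings = np.stack(
    [r["dense"] for r in client.encode("BAAI/bge-m3", [Item(text=d) for d in item_descriptions])]
)

# User embedding = average of their interaction history
user_embedding = item_embeddings[user_interactions].mean(axis=0)

# ANN search for nearest items. `vector_db` is a placeholder for your vector
# database client; the `search` call is illustrative, not a specific client API.
recommendations = vector_db.search(user_embedding, top_k=20)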

SIE provides the encoding step; a vector database (Qdrant, Weaviate) handles the retrieval.
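
A plain average treats every past interaction equally. One simple alternative, shown here as a sketch rather than a recommended default, is to weight recent interactions more heavily with an exponential decay. The function name and the `decay` parameter are assumptions for illustration, and the history vectors stand in for real item embeddings.

```python
import numpy as np


def recency_weighted_embedding(history_vectors: np.ndarray, decay: float = 0.8) -> np.ndarray:
    """Weighted mean of item vectors, ordered oldest to newest, shape (n, dim)."""
    n = history_vectors.shape[0]
    # Newest interaction gets weight 1, the one before it `decay`, then decay**2, ...
    weights = decay ** np.arange(n - 1, -1, -1)
    weights = weights / weights.sum()
    return weights @ history_vectors


# Two made-up item vectors: the older one first, the newer one second
history = np.array([[1.0, 0.0], [0.0, 1.0]])
user_vec = recency_weighted_embedding(history, decay=0.5)
print(user_vec)
```

With `decay=1.0` this reduces to the plain average used above, so the parameter can be tuned without changing the rest of the pipeline.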

The cold start problem

Cold start occurs when there’s insufficient interaction data to make good recommendations:

Scenario	Type	Solution
New user, no history	User cold start	Content-based from onboarding, popular items
New item, no interactions	Item cold start	Content embeddings, metadata-based
New system	System cold start	Content-based until data accumulates

Content embeddings (from SIE) are the standard solution to item cold start. You can recommend new items based on their semantic similarity to items the user has engaged with.
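
The table can be read as a fallback rule. The sketch below picks a signal for one user and one candidate item; the thresholds and the returned labels are hypothetical and would need tuning against real data.

```python
def pick_signal(
    user_history_len: int,
    item_interaction_count: int,
    min_user_history: int = 3,
    min_item_interactions: int = 10,
) -> str:
    """Choose which signal to rely on for one (user, candidate item) pair."""
    if user_history_len < min_user_history:
        # User cold start: too little history to personalise on
        return "onboarding_or_popular"
    if item_interaction_count < min_item_interactions:
        # Item cold start: fall back to content embeddings
        return "content_similarity"
    # Enough data on both sides for collaborative signals
    return "collaborative"


print(pick_signal(user_history_len=0, item_interaction_count=500))
print(pick_signal(user_history_len=40, item_interaction_count=2))
print(pick_signal(user_history_len=40, item_interaction_count=500))
```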

Evaluation metrics for recommendation systems

Metric	What it measures
Precision@K	Of top-K recommendations, fraction that are relevant
Recall@K	Of all relevant items, fraction retrieved in top-K
NDCG@K	Ranking quality: rewards placing relevant items nearer the top, normalised against the ideal ordering
Hit Rate@K	Fraction of users for whom at least one relevant item appears in the top-K
MRR	Mean Reciprocal Rank: average of 1/rank of the first relevant result
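
The metrics above can be sketched for a single user with binary relevance as follows. Averaging each value over all users gives the reported metric; averaging `reciprocal_rank` gives MRR. The function names and the example data are illustrative.

```python
import math


def precision_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


def hit_at_k(ranked: list, relevant: set, k: int) -> float:
    """1.0 if any relevant item appears in the top-k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def reciprocal_rank(ranked: list, relevant: set) -> float:
    """1 / position of the first relevant item, or 0.0 if there is none."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranked: list, relevant: set, k: int) -> float:
    """DCG of the top-k divided by the DCG of an ideal ordering."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]   # recommendations, best first
relevant = {"b", "d", "e"}      # items the user actually engaged with

print(precision_at_k(ranked, relevant, 3))
print(recall_at_k(ranked, relevant, 3))
print(hit_at_k(ranked, relevant, 3))
print(reciprocal_rank(ranked, relevant))
print(ndcg_at_k(ranked, relevant, 3))
```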

Recommendation systems vs semantic search

	Semantic search	Recommendation
Input	Explicit query	Implicit behaviour / history
Personalisation	Usually none	Core goal
Cold start	Less of a problem	Major challenge
Infrastructure	Embedding model + vector DB	Same + user model

The infrastructure overlaps significantly. Systems built on SIE for semantic search can be extended to support recommendation by adding a user history aggregation layer.

Frequently asked questions

What is a Two-Tower model? A Two-Tower (or dual-encoder) model uses separate encoders for users and items and trains them end-to-end, typically with a contrastive loss, so that relevant user-item pairs end up close in embedding space. Because each tower runs independently, item embeddings can be precomputed and indexed for ANN search. This is essentially the same architecture as bi-encoder text embedding models.
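
To make the contrastive objective concrete, the sketch below computes an in-batch softmax loss of the kind commonly used for Two-Tower training: row i of the user batch is paired with row i of the item batch, and the other items in the batch act as negatives. It covers only the forward loss in NumPy; the random vectors stand in for tower outputs, the `temperature` value is an assumption, and real training would use an autodiff framework to update the encoders.

```python
import numpy as np


def l2_normalise(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def in_batch_contrastive_loss(
    user_vecs: np.ndarray, item_vecs: np.ndarray, temperature: float = 0.1
) -> float:
    """Mean cross-entropy where the matching item for user i is item i."""
    u = l2_normalise(user_vecs)
    v = l2_normalise(item_vecs)
    logits = (u @ v.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
users = rng.normal(size=(4, 8))  # stand-in for user tower outputs

# Perfectly aligned pairs versus deliberately mismatched pairs
aligned_loss = in_batch_contrastive_loss(users, users)
shuffled_loss = in_batch_contrastive_loss(users, users[::-1])
print(aligned_loss, shuffled_loss)
```

The loss is lower when each user vector sits next to its paired item vector, which is the property that training pushes the two towers towards.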

How is recommendation different from personalised search? Personalised search takes an explicit query and re-ranks results based on user history. Recommendation has no explicit query; it surfaces items proactively based on predicted interest.

What role does a reranker play in recommendation? After fast ANN retrieval of candidates, a reranker (a cross-encoder or a gradient-boosted model) scores each candidate more precisely using richer features. This is the same retrieve-then-rerank pattern used in search pipelines.
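
The two-stage pattern can be sketched as below. Stage one uses brute-force dot products as a stand-in for ANN search, and stage two uses a lookup table as a stand-in for a real reranker model. The vectors, the `item_quality` values, and the function names are all hypothetical.

```python
import numpy as np


def retrieve(user_vec: np.ndarray, item_vecs: np.ndarray, n_candidates: int) -> list[int]:
    """Stage 1: cheap similarity search over all items (ANN in production)."""
    scores = item_vecs @ user_vec
    return [int(i) for i in np.argsort(-scores)[:n_candidates]]


def rerank(candidates: list[int], score_fn, top_k: int) -> list[int]:
    """Stage 2: re-score only the candidates with a more expensive function."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_k]


item_vecs = np.array([[1.0, 0.0], [0.8, 0.2], [0.1, 0.9], [0.6, 0.4]])
user_vec = np.array([1.0, 0.0])

# Stand-in for a richer model score, such as a predicted click-through rate
item_quality = {0: 0.1, 1: 0.9, 2: 0.8, 3: 0.5}

candidates = retrieve(user_vec, item_vecs, n_candidates=3)
final = rerank(candidates, score_fn=lambda i: item_quality[i], top_k=2)
print(candidates, final)
```

Item 2 has a high reranker score but never reaches stage two because retrieval filtered it out, which is why the candidate set is usually kept much larger than the final list.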