What is a Recommendation System?

A recommendation system is a machine learning system that predicts which items a user is most likely to find relevant or useful, and surfaces them proactively. It learns from user behaviour (clicks, purchases, ratings) and item characteristics to personalise suggestions. The three main approaches are collaborative filtering, content-based filtering, and hybrid systems that combine both.


Why do recommendation systems matter?

Recommendation systems drive a significant share of engagement and revenue in consumer products. Netflix has reported that ~80% of hours streamed are driven by recommendations, and a widely cited industry estimate attributes ~35% of Amazon's sales to them. They are also increasingly used in enterprise contexts: surfacing relevant documents, prioritising support tickets, and recommending knowledge base articles.

The core technical challenge, finding semantically similar items and personalising to user history, overlaps heavily with semantic search and RAG infrastructure.


What are the main types of recommendation systems?

Collaborative filtering

Recommends items based on what similar users liked, without using item content:

  • User-based CF: find users with similar history to the target user, recommend what they liked
  • Item-based CF: find items with similar interaction patterns, recommend similar ones
  • Matrix factorisation: decompose the user-item interaction matrix into latent user and item factor vectors, learned with methods such as SVD, ALS, or BPR
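To make item-based CF concrete, here is a minimal sketch over a tiny binary user-item matrix. It assumes cosine similarity between item columns as the similarity measure and scores each unseen item by summing its similarity to the items in the user's history; the function name and data are illustrative, and real systems typically use sparse matrices and precomputed neighbour lists.

```python
import numpy as np

def item_based_cf(interactions: np.ndarray, user: int, top_k: int = 5) -> list[int]:
    """Recommend items for `user` via item-item cosine similarity.

    interactions: binary user x item matrix (1 = user interacted with item).
    """
    # Each item is represented by its column: which users interacted with it.
    norms = np.linalg.norm(interactions, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items with no interactions
    normalized = interactions / norms
    item_sim = normalized.T @ normalized  # (items x items) cosine similarity
    np.fill_diagonal(item_sim, 0.0)       # an item should not vote for itself

    history = interactions[user]          # items this user engaged with
    scores = item_sim @ history           # summed similarity to the user's history
    scores[history > 0] = -np.inf         # don't re-recommend items already seen
    return [int(i) for i in np.argsort(-scores)[:top_k]]

# Usage: user 0 liked items 0 and 1; user 1 also liked item 2, so item 2 ranks first.
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])
print(item_based_cf(R, user=0, top_k=2))
```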

Strengths: captures taste patterns beyond content. Weaknesses: cold start problem (new users/items), sparse interaction data.
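Matrix factorisation can be sketched with alternating least squares (ALS). This minimal version assumes explicit ratings and a `mask` marking which entries are observed; implicit-feedback variants (for clicks or views) instead weight every entry by a confidence value. The function name, rank `k`, and regularisation strength are illustrative choices, not tuned values.

```python
import numpy as np

def als(R, mask, k=2, reg=0.1, iters=50, seed=0):
    """Factorise R ≈ U @ V.T, fitting only entries where mask == 1."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    ridge = reg * np.eye(k)
    for _ in range(iters):
        # Fix item factors V and solve a ridge regression for each user.
        for u in range(n_users):
            obs = mask[u] > 0
            Vo = V[obs]
            U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[u, obs])
        # Fix user factors U and solve a ridge regression for each item.
        for i in range(n_items):
            obs = mask[:, i] > 0
            Uo = U[obs]
            V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[obs, i])
    return U, V

# Usage: 0 means "not rated"; two groups of users prefer two groups of items.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
mask = (R > 0).astype(int)
U, V = als(R, mask)
pred = U @ V.T  # predicted score for every user-item pair
```

Once fitted, `U @ V.T` gives a predicted score for every user-item pair, including the unobserved ones, and the highest-scoring unseen items become recommendations.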

Content-based filtering

Recommends items similar to ones the user has previously engaged with, based on item features:

  • Compare item embeddings (text descriptions, images, metadata)
  • Return items whose vectors are closest to the user’s interaction history

Strengths: works for new items (no interaction data needed). Weaknesses: limited to “more of the same”; doesn’t discover new tastes.

Hybrid systems

Most production recommendation systems combine both:

final_score = α × collaborative_score + (1 - α) × content_score

where α, between 0 and 1, sets the relative weight of the collaborative signal.
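The weighted sum only behaves sensibly when both scores are on comparable scales. A minimal sketch, assuming min-max normalisation of each score vector over the candidate set (one of several reasonable choices) and an illustrative α of 0.7:

```python
import numpy as np

def minmax(x: np.ndarray) -> np.ndarray:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    span = x.max() - x.min()
    return np.zeros_like(x, dtype=float) if span == 0 else (x - x.min()) / span

def hybrid_scores(collab: np.ndarray, content: np.ndarray, alpha: float = 0.7) -> np.ndarray:
    """final_score = alpha * collaborative_score + (1 - alpha) * content_score."""
    return alpha * minmax(collab) + (1 - alpha) * minmax(content)

# Usage: scores for the same three candidate items from each approach.
collab = np.array([2.0, 0.0, 1.0])
content = np.array([0.2, 0.9, 0.5])
print(hybrid_scores(collab, content, alpha=0.7))
```

In practice α is tuned on held-out interactions, or replaced altogether by a model that learns the combination.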

Neural approaches (Two-Tower models, DLRM) learn to combine signals end-to-end.


How do embeddings power modern recommendation?

The modern approach represents both users and items as dense vectors in a shared embedding space. Recommendation becomes a nearest-neighbour search:

  1. Encode item content (text, images, metadata) into item embeddings
  2. Build user embeddings from their interaction history (average of interacted item embeddings, or a learned aggregation)
  3. Retrieve nearest item vectors to the user’s vector via ANN search
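```python
import numpy as np
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Illustrative placeholder data: item catalogue and one user's history
item_descriptions = ["Noise-cancelling headphones", "Portable Bluetooth speaker", "Trail running shoes"]
user_interactions = [0]  # indices of items the user has engaged with

# Encode item descriptions and L2-normalise so dot product = cosine similarity
results = client.encode("BAAI/bge-m3", [Item(text=d) for d in item_descriptions])
item_embeddings = np.stack([np.asarray(r["dense"], dtype=np.float32) for r in results])
item_embeddings /= np.linalg.norm(item_embeddings, axis=1, keepdims=True)

# User embedding = average of their interaction history
user_embedding = item_embeddings[user_interactions].mean(axis=0)

# Exact nearest-neighbour search for illustration; at scale, run an ANN query
# against a vector database instead
scores = item_embeddings @ user_embedding
scores[user_interactions] = -np.inf  # don't re-recommend items already seen
recommendations = np.argsort(-scores)[:20]
```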

SIE provides the encoding step; a vector database (Qdrant, Weaviate) handles the retrieval.


The cold start problem

Cold start occurs when there’s insufficient interaction data to make good recommendations:

| Scenario | Type | Solution |
|---|---|---|
| New user, no history | User cold start | Content-based from onboarding; popular items |
| New item, no interactions | Item cold start | Content embeddings; metadata-based |
| New system | System cold start | Content-based until data accumulates |

Content embeddings (from SIE) are a standard solution to item cold start: a new item can be recommended as soon as it is embedded, based on its semantic similarity to items the user has engaged with.
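A sketch of how the strategies in the table fit together, assuming L2-normalised content embeddings and per-item interaction counts are available (the function and argument names are hypothetical). A user with no history falls back to popular items, while a brand-new item with zero interactions can still be recommended because it has an embedding:

```python
import numpy as np

def recommend(item_embeddings, history, popularity, top_k=3):
    """Content-based recommendation with a popularity fallback for new users.

    item_embeddings: (n_items, d) L2-normalised content embeddings. New items
        with no interactions still have an embedding, so they can be scored.
    history: indices of items the user engaged with (may be empty).
    popularity: (n_items,) interaction counts, used when history is empty.
    """
    if len(history) == 0:
        # User cold start: no history yet, so fall back to popular items.
        return [int(i) for i in np.argsort(-popularity)[:top_k]]
    user_vec = item_embeddings[history].mean(axis=0)
    scores = item_embeddings @ user_vec
    scores[history] = -np.inf  # skip items already seen
    return [int(i) for i in np.argsort(-scores)[:top_k]]

# Usage: item 3 is new (zero interactions) but close to item 2 in content.
E = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]])
popularity = np.array([50, 30, 20, 0])
print(recommend(E, [], popularity, top_k=2))   # new user: most popular items
print(recommend(E, [2], popularity, top_k=1))  # new item 3 surfaces via content
```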


Evaluation metrics for recommendation systems

| Metric | What it measures |
|---|---|
| Precision@K | Fraction of the top-K recommendations that are relevant |
| Recall@K | Fraction of all relevant items that appear in the top-K |
| NDCG@K | Normalised Discounted Cumulative Gain: ranking quality, rewarding relevant items placed nearer the top |
| Hit Rate | Whether a relevant item appears in the top-K at all (averaged over users) |
| MRR | Mean Reciprocal Rank: average of 1/rank of the first relevant result |
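For a single user's ranked list, the metrics in the table can be computed as below. This sketch assumes binary relevance (an item is either relevant or not); dataset-level values are the averages of these per-user numbers, which is where the "mean" in MRR comes from.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def hit_rate_at_k(ranked, relevant, k):
    """1.0 if any relevant item appears in the top-k, else 0.0."""
    return float(any(item in relevant for item in ranked[:k]))

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of this ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

# Usage: one relevant item ("b") lands at rank 2 of the top 3.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "z"}
print(precision_at_k(ranked, relevant, 3), reciprocal_rank(ranked, relevant))
```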

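
How does recommendation compare to semantic search?

| | Semantic search | Recommendation |
|---|---|---|
| Input | Explicit query | Implicit behaviour / history |
| Personalisation | Usually none | Core goal |
| Cold start | Less of a problem | Major challenge |
| Infrastructure | Embedding model + vector DB | Same + user model |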

The infrastructure overlaps significantly. Systems built on SIE for semantic search can be extended to support recommendation by adding a user history aggregation layer.
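One possible user history aggregation layer is a recency-weighted average of interacted item embeddings. Exponential time decay is an assumption here, not something SIE prescribes, and the function name and `half_life_days` parameter are illustrative:

```python
import numpy as np

def user_embedding(item_embeddings, history, ages_days, half_life_days=30.0):
    """Aggregate a user's history into one vector, weighting recent items more.

    Exponential time decay: an interaction half_life_days old counts half as
    much as one from today. As the half-life grows, this approaches a plain mean.
    """
    weights = 0.5 ** (np.asarray(ages_days, dtype=float) / half_life_days)
    vec = (weights[:, None] * item_embeddings[history]).sum(axis=0) / weights.sum()
    return vec / np.linalg.norm(vec)  # renormalise for cosine-similarity search

# Usage: item 0 was seen today, item 1 thirty days ago (weight 0.5).
E = np.array([[1.0, 0.0], [0.0, 1.0]])
v = user_embedding(E, [0, 1], ages_days=[0, 30])
print(v)
```

The resulting vector can be sent to the same vector database query used for semantic search, which is why the search infrastructure carries over.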


Frequently asked questions

What is a Two-Tower model? A Two-Tower (or dual-encoder) model uses separate encoders for users and items, training them end-to-end with contrastive loss so that relevant user-item pairs are close in embedding space. This is essentially the same architecture as bi-encoder text embedding models.
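A minimal NumPy sketch of the Two-Tower objective, assuming in-batch negatives (each user's positive item doubles as a negative for every other user in the batch) and single linear layers standing in for the towers. It computes the loss only; training would normally use an autodiff framework to update the tower weights. All names are illustrative.

```python
import numpy as np

def in_batch_softmax_loss(user_vecs, item_vecs, temperature=0.05):
    """Contrastive loss for a batch of (user, item) positive pairs.

    Row i of user_vecs is paired with row i of item_vecs; every other item in
    the batch serves as a negative for user i.
    """
    u = user_vecs / np.linalg.norm(user_vecs, axis=1, keepdims=True)
    v = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    logits = (u @ v.T) / temperature             # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # positives sit on the diagonal

def two_tower_loss(user_feats, item_feats, W_user, W_item, temperature=0.05):
    # Each tower is a single linear layer here, a placeholder for a deeper network.
    return in_batch_softmax_loss(user_feats @ W_user, item_feats @ W_item, temperature)

# Usage: aligned pairs give a near-zero loss, mismatched pairs a large one.
aligned = in_batch_softmax_loss(np.eye(3), np.eye(3))
shuffled = in_batch_softmax_loss(np.eye(3), np.eye(3)[[1, 2, 0]])
print(aligned, shuffled)
```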

How is recommendation different from personalised search? Personalised search takes an explicit query and re-ranks results based on user history. Recommendation has no explicit query; it surfaces items proactively based on predicted interest.
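The difference can be sketched as a scoring rule. This assumes personalised search adds a weighted user-affinity term to query relevance; the linear blend, the function name, and the `beta` weight are illustrative assumptions, and `beta = 0` reduces it to plain semantic search:

```python
import numpy as np

def personalised_rerank(query_vec, doc_vecs, user_vec, beta=0.3, top_k=3):
    """Rank documents for an explicit query, nudged towards the user's history.

    score = sim(query, doc) + beta * sim(user, doc). All vectors are assumed
    L2-normalised, so dot products are cosine similarities.
    """
    scores = doc_vecs @ query_vec + beta * (doc_vecs @ user_vec)
    return [int(i) for i in np.argsort(-scores)[:top_k]]

# Usage: doc 0 matches the query best, but doc 1 also matches the user's interests.
docs = np.array([[1.0, 0.0], [0.8, 0.6]])
query = np.array([1.0, 0.0])
user = np.array([0.0, 1.0])
print(personalised_rerank(query, docs, user, beta=0.0, top_k=2))  # plain search
print(personalised_rerank(query, docs, user, beta=0.5, top_k=2))  # personalised
```

A recommender drops the query term entirely and ranks by user affinity alone.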

What role does a reranker play in recommendation? After fast ANN retrieval of candidates, a reranker (cross-encoder or gradient-boosted model) can score each candidate more precisely using richer features, following the same retrieve-then-rerank pattern as search pipelines.
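The two-stage pattern can be sketched as follows. Stage 1 uses exact dot products in place of an ANN index, and `rerank_fn` is a hypothetical stand-in for a cross-encoder or gradient-boosted model; only the candidates returned by stage 1 are ever scored by the reranker.

```python
import numpy as np

def retrieve_then_rerank(user_vec, item_vecs, rerank_fn, n_candidates=100, top_k=10):
    """Two-stage pipeline: cheap vector retrieval, then a costlier reranker.

    rerank_fn(candidate_ids) -> scores stands in for a model that can use
    richer features than the embedding alone.
    """
    sims = item_vecs @ user_vec
    candidates = np.argsort(-sims)[:n_candidates]      # stage 1: recall-oriented
    rerank_scores = np.asarray(rerank_fn(candidates))  # stage 2: precision-oriented
    order = np.argsort(-rerank_scores)[:top_k]
    return [int(candidates[i]) for i in order]

# Usage: a hypothetical reranker that prefers item 3 among the retrieved candidates.
items = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]])
user = np.array([1.0, 0.0])
feature_scores = np.array([0.1, 0.5, 1.0, 0.9])
result = retrieve_then_rerank(user, items, lambda ids: feature_scores[ids],
                              n_candidates=3, top_k=2)
print(result)
```

Item 2 has the highest reranker score but never reaches stage 2, which illustrates the trade-off: the retrieval stage bounds recall, so `n_candidates` must be large enough for the reranker to see the items it would promote.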

