
How Does Haystack Work with Embedding Models?

Haystack is an open-source framework for building NLP and RAG pipelines by connecting modular components (document stores, embedders, retrievers, readers, and generators) into composable directed acyclic graphs. It works with embedding models through embedder components, which call the model at index time and at query time, and retriever components, which search the stored vectors. SIE integrates with Haystack as a self-hosted embedding backend, replacing managed API calls with GPU inference in your own cloud.


Why Haystack?

Haystack is the right choice when:

  • You want a structured pipeline framework rather than writing retrieval logic from scratch
  • You need to compose complex multi-hop or multi-stage pipelines (retrieve → rerank → generate → verify)
  • You’re building production RAG systems and want pre-built components for evaluation, caching, and monitoring
  • You want to swap components (different retrievers, LLMs, document stores) without rewriting pipeline logic
  • You need multi-modal pipelines handling both text and images

Haystack’s component abstraction means you can prototype with OpenAI embeddings and switch to SIE self-hosted inference for production without changing pipeline logic.


Core Haystack concepts

Components: individual pipeline steps with typed inputs and outputs (InMemoryEmbeddingRetriever, SentenceTransformersTextEmbedder, OpenAIGenerator, etc.)

Pipeline: a directed acyclic graph of components connected by their input/output types

Document Store: the storage layer (InMemoryDocumentStore, QdrantDocumentStore, WeaviateDocumentStore, etc.)

Documents: Haystack’s data type representing a piece of text with metadata and an optional embedding vector
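To make the "components connected by inputs and outputs" idea concrete, here is a deliberately tiny, library-free sketch. It is not Haystack's implementation: ToyPipeline, embed, and retrieve are hypothetical names, components are plain functions that return a dict of named outputs, and components are assumed to be added in a valid execution order. Haystack's real Pipeline additionally validates socket types and works out the execution order itself.

```python
class ToyPipeline:
    """Toy illustration of named outputs feeding named inputs (not Haystack code)."""

    def __init__(self):
        self.components = {}   # name -> callable(**inputs) returning a dict of outputs
        self.connections = []  # (sender, output_name, receiver, input_name)

    def add_component(self, name, fn):
        self.components[name] = fn

    def connect(self, sender: str, receiver: str):
        # "embedder.embedding" -> ("embedder", "embedding")
        s_name, s_out = sender.split(".")
        r_name, r_in = receiver.split(".")
        self.connections.append((s_name, s_out, r_name, r_in))

    def run(self, data: dict) -> dict:
        # Start each component with the inputs supplied by the caller.
        inputs = {name: dict(data.get(name, {})) for name in self.components}
        results = {}
        # Assumption: insertion order is already a valid topological order.
        for name, fn in self.components.items():
            results[name] = fn(**inputs[name])
            # Forward this component's outputs along its outgoing connections.
            for s_name, s_out, r_name, r_in in self.connections:
                if s_name == name:
                    inputs[r_name][r_in] = results[name][s_out]
        return results


def embed(text):
    # Fake "embedding": just the text length and a constant.
    return {"embedding": [float(len(text)), 1.0]}


def retrieve(query_embedding):
    return {"documents": [f"doc near {query_embedding[0]:.0f}"]}


toy = ToyPipeline()
toy.add_component("query_embedder", embed)
toy.add_component("retriever", retrieve)
toy.connect("query_embedder.embedding", "retriever.query_embedding")
out = toy.run({"query_embedder": {"text": "hello"}})
```

The connect strings follow the same "component.socket" shape that the Haystack examples in this article use, which is the part worth taking away from the sketch.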


How SIE integrates with Haystack

SIE acts as the embedding backend. Wrap the SIE client in two custom components, a text embedder for queries and a document embedder for indexing:

from haystack import component, Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from sie_sdk import SIEClient
from sie_sdk.types import Item

# Custom SIE embedder component: embeds a single query string
@component
class SIETextEmbedder:
    def __init__(self, model: str = "BAAI/bge-m3"):
        self.client = SIEClient("http://localhost:8080")
        self.model = model

    @component.output_types(embedding=list[float])
    def run(self, text: str):
        # is_query=True marks the input as a query rather than a document
        result = self.client.encode(self.model, Item(text=text), is_query=True)
        return {"embedding": result["dense"].tolist()}

# Custom SIE embedder component: embeds a batch of documents for indexing
@component
class SIEDocumentEmbedder:
    def __init__(self, model: str = "BAAI/bge-m3"):
        self.client = SIEClient("http://localhost:8080")
        self.model = model

    @component.output_types(documents=list[Document])
    def run(self, documents: list[Document]):
        # Document.content can be None, so fall back to an empty string
        texts = [doc.content or "" for doc in documents]
        encode_results = self.client.encode(
            self.model,
            [Item(text=t) for t in texts],
        )
        # Attach each dense vector to its document
        for doc, res in zip(documents, encode_results):
            doc.embedding = res["dense"].tolist()
        return {"documents": documents}

Building an indexing pipeline with SIE + Haystack

from haystack import Document, Pipeline
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()

# Indexing pipeline: embed documents with SIE, then write them to the store
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", SIEDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder.documents", "writer.documents")

# Index documents. `chunks` and `sources` are parallel lists you supply:
# the text of each chunk and the name of the source it came from.
raw_docs = [
    Document(content=chunk, meta={"source": src})
    for chunk, src in zip(chunks, sources)
]
indexing_pipeline.run({"embedder": {"documents": raw_docs}})
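The indexing snippet takes chunks and sources as given. One simple way to produce them is a fixed-size word window with overlap, sketched below. The helper chunk_words, the raw_texts mapping, and the window sizes are all hypothetical choices for illustration; real pipelines often split on sentences or tokens instead.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    pieces = []
    start = 0
    while start < len(words):
        pieces.append(" ".join(words[start:start + size]))
        # Stop once this window has reached the end of the text.
        if start + size >= len(words):
            break
        start += step
    return pieces


# Hypothetical input: source name -> raw text (tiny sizes so the result is easy to read).
raw_texts = {"contract.txt": " ".join(f"w{i}" for i in range(10))}

chunks, sources = [], []
for src, text in raw_texts.items():
    for piece in chunk_words(text, size=4, overlap=1):
        chunks.append(piece)
        sources.append(src)
```

Keeping chunks and sources as parallel lists matches the zip(chunks, sources) call in the indexing code, so the output can be passed to it directly.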

Building a RAG query pipeline with SIE + Haystack

from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder

PROMPT = """
Answer the question using the provided context.
Context: {% for doc in documents %}{{ doc.content }}{% endfor %}
Question: {{ query }}
"""

rag_pipeline = Pipeline()
rag_pipeline.add_component("query_embedder", SIETextEmbedder())
rag_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store, top_k=5))
rag_pipeline.add_component("prompt_builder", PromptBuilder(template=PROMPT))
# OpenAIGenerator reads the API key from the OPENAI_API_KEY environment variable by default
rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o"))

# Wire the steps together: embed query -> retrieve -> build prompt -> generate
rag_pipeline.connect("query_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.prompt")

# The same question goes to the embedder (for retrieval) and the prompt builder (for the final prompt)
question = "What are the termination conditions?"
result = rag_pipeline.run({
    "query_embedder": {"text": question},
    "prompt_builder": {"query": question},
})
print(result["llm"]["replies"][0])
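Conceptually, the retriever step in this pipeline scores every stored vector against the query vector and keeps the best top_k. The sketch below shows that idea without any library; it is not Haystack's implementation. The function names and the toy 3-dimensional vectors are illustrative assumptions, and cosine similarity is used here only as an example, since the similarity function a document store actually applies is a configuration choice.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_similarity(query_embedding, indexed, top_k=5):
    """indexed is a list of (text, embedding); returns the top_k (text, score) pairs, best first."""
    scored = [(text, cosine_similarity(query_embedding, emb)) for text, emb in indexed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy index: in practice these vectors come from the document embedder.
toy_index = [
    ("termination clause", [0.9, 0.1, 0.0]),
    ("payment schedule", [0.1, 0.9, 0.0]),
    ("notice period", [0.7, 0.3, 0.1]),
]
hits = top_k_by_similarity([1.0, 0.0, 0.0], toy_index, top_k=2)
```

With the toy data, the two texts whose vectors point closest to the query direction are returned first, which is the behaviour top_k=5 controls in the pipeline above.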

Using Haystack with Qdrant and SIE

For production, swap InMemoryDocumentStore for QdrantDocumentStore:

from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever

document_store = QdrantDocumentStore(
    url="http://localhost:6333",
    index="documents",
    embedding_dim=1024,  # BGE-M3 output dim
)
retriever = QdrantEmbeddingRetriever(document_store=document_store, top_k=20)
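Because embedding_dim has to agree with the vectors the embedding model produces, a quick pre-flight check before writing can catch a mismatch early. The helper below is a hypothetical sketch, not part of Haystack or the Qdrant integration; Doc is a minimal stand-in for haystack.Document so the example runs on its own, and the check only relies on an embedding attribute.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Doc:
    """Stand-in for haystack.Document with only the fields this check needs."""
    content: str
    embedding: Optional[list] = None


def find_dim_mismatches(documents: Sequence, expected_dim: int) -> list[int]:
    """Return the indices of documents whose embedding is missing or has the wrong length."""
    bad = []
    for i, doc in enumerate(documents):
        emb = getattr(doc, "embedding", None)
        if emb is None or len(emb) != expected_dim:
            bad.append(i)
    return bad


docs = [
    Doc("ok", [0.0] * 1024),       # matches the configured dimension
    Doc("wrong dim", [0.0] * 768),  # e.g. produced by a different model
    Doc("not embedded"),            # embedding step was skipped
]
mismatches = find_dim_mismatches(docs, expected_dim=1024)
```

Running such a check on the embedder's output, with expected_dim set to the same value passed to the document store, is one way to surface a model or configuration change before any vectors are written.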

Haystack vs LangChain vs LlamaIndex

Feature             | Haystack                   | LangChain             | LlamaIndex
Pipeline model      | Typed DAG                  | Chain / Agent         | Query engine
Component typing    | Strict                     | Loose                 | Medium
RAG focus           | General                    |                       |
Evaluation tooling  | Strong                     | Growing               | Good
Production maturity | High                       | High                  | High
Best for            | Production RAG, evaluation | Agents, diverse tasks | Document QA

All three integrate well with SIE; the choice comes down to team familiarity and pipeline complexity.


Frequently asked questions

Does Haystack have a built-in SIE integration? The examples above use the SIE SDK through custom components. A native Haystack SIE integration is also available; see the SIE + Haystack integration guide for the current implementation.

Can Haystack pipelines be serialised and deployed? Yes. Haystack pipelines serialise to YAML, enabling reproducible deployments and version-controlled pipeline definitions.

Does Haystack support reranking with SIE? Yes. Add a custom reranker component, which calls client.score(), between the retriever and the prompt builder.
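The core of such a reranker is small: score each retrieved text against the query, sort, and keep the best few. The sketch below shows only that logic and makes several assumptions. The scoring function is injected as score_fn because the signature and return type of client.score() are not shown in this article; word_overlap is a toy stand-in scorer so the example runs without a server; and rerank itself is a hypothetical helper that you would wrap in a custom component.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    texts: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Score each text against the query and return the top_k (text, score) pairs, best first."""
    scored = [(text, score_fn(query, text)) for text in texts]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def word_overlap(query: str, text: str) -> float:
    """Toy scorer: fraction of query words that also appear in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


candidates = [
    "payment terms and schedule",
    "termination conditions for either party",
    "conditions of use",
]
reranked = rerank("termination conditions", candidates, word_overlap, top_k=2)
```

A common pattern is to retrieve generously (for example top_k=20, as in the Qdrant retriever above) and let the reranker cut the list down before the prompt is built.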

