---
title: How Does Weaviate Work with Embedding Models?
description: Weaviate is an open-source vector database that stores objects with vector representations and enables semantic search, hybrid search, and filtered retrieval via a GraphQL or REST API. It works with embedding models by accepting vectors at insert time (generated by SIE) and indexing them in HNSW graphs for fast AN...
canonical_url: https://superlinked.com/glossary/how-does-weaviate-work-with-embedding-models
last_updated: 2026-06-11
---

# How Does Weaviate Work with Embedding Models?

Weaviate is an open-source vector database that stores objects with vector representations and enables semantic search, hybrid search, and filtered retrieval via a GraphQL or REST API. It works with embedding models by accepting vectors at insert time (generated by SIE) and indexing them in HNSW graphs for fast ANN retrieval. Weaviate's module system also enables direct integration with external vectorisers, though SIE's self-hosted approach keeps data within your own infrastructure.
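
To make "ANN retrieval" concrete, here is a minimal sketch of the exact search that an HNSW index approximates: score every stored vector against the query and keep the best `k`. The function names and the toy 3-dimensional vectors are illustrative assumptions, not part of Weaviate or SIE.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Brute-force search: score every vector, return the k best (index, score) pairs."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embedding models produce much longer vectors.
stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
top = exact_top_k([1.0, 0.0, 0.0], stored, k=2)
```

Exact search costs one comparison per stored vector for every query. An HNSW graph trades a small amount of recall for visiting only a fraction of the vectors.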

---

## Why Weaviate?

Weaviate is a strong choice when:

- Your team prefers **GraphQL** as the query interface
- You want **schema-based** data modelling with typed properties
- You need **hybrid search** (BM25 + vector) out of the box
- You're building with **LangChain or LlamaIndex** (Weaviate has first-class integrations)
- You want a **module ecosystem** for connecting additional ML models

Weaviate's schema approach, defining data classes with properties, makes it particularly well-suited for structured document retrieval where metadata filtering is as important as vector similarity.

---

## How Weaviate and SIE work together

SIE encodes documents; Weaviate stores and retrieves them:

```python
import weaviate
from sie_sdk import SIEClient
from sie_sdk.types import Item

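# Placeholder URLs: SIE and Weaviate cannot share a port on one host, so adjust to your setup
sie = SIEClient("http://localhost:8080")
w = weaviate.Client("http://localhost:8080")  # Weaviate client (v3-style API)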

# 1. Define schema
w.schema.create_class({
    "class": "Document",
    "vectorizer": "none",  # We provide vectors externally via SIE
    "properties": [
        {"name": "text", "dataType": ["text"]},
        {"name": "source", "dataType": ["text"]},
        {"name": "date", "dataType": ["date"]}
    ]
})

# 2. Encode and insert documents
# Assumes document_chunks, sources, and dates are parallel lists; dates are RFC 3339 strings
encode_results = sie.encode("BAAI/bge-m3", [Item(text=c) for c in document_chunks])
vectors = [r["dense"] for r in encode_results]

with w.batch as batch:
    for chunk, vector, source, date in zip(document_chunks, vectors, sources, dates):
        batch.add_data_object(
            data_object={"text": chunk, "source": source, "date": date},
            class_name="Document",
            vector=vector.tolist()
        )

# 3. Search
query_vector = sie.encode("BAAI/bge-m3", Item(text=user_query), is_query=True)["dense"]

result = (
    w.query
    .get("Document", ["text", "source", "date"])
    .with_near_vector({"vector": query_vector.tolist()})
    .with_limit(20)
    .do()
)
```
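
The v3-style client returns the raw GraphQL response as a nested dictionary, with matches under `data` → `Get` → the class name, and a top-level `errors` key when the query fails. A minimal sketch of unpacking it follows. `extract_hits` is a hypothetical helper and the sample response is hand-written for illustration. The `_additional.distance` field only appears if the query requests it, for example with `.with_additional(["distance"])`.

```python
def extract_hits(response, class_name):
    """Return the list of matched objects for class_name from a v3-style Get response."""
    if "errors" in response:
        raise RuntimeError(f"Weaviate query failed: {response['errors']}")
    return response.get("data", {}).get("Get", {}).get(class_name) or []


# Hand-written sample response, for illustration only.
sample_response = {
    "data": {
        "Get": {
            "Document": [
                {
                    "text": "HNSW builds a layered proximity graph.",
                    "source": "notes.md",
                    "date": "2024-03-01T00:00:00Z",
                    "_additional": {"distance": 0.12},
                },
                {
                    "text": "BM25 ranks documents by term statistics.",
                    "source": "faq.md",
                    "date": "2024-05-17T00:00:00Z",
                    "_additional": {"distance": 0.31},
                },
            ]
        }
    }
}

hits = extract_hits(sample_response, "Document")
summaries = [(hit["source"], hit["_additional"]["distance"]) for hit in hits]
```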

---

## Hybrid search with Weaviate and SIE

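Weaviate's hybrid search combines BM25 keyword search with vector search and merges the two result lists with a fusion algorithm, either ranked fusion (based on reciprocal ranks) or relative score fusion (based on normalised scores). The example below selects relative score fusion: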

```python
# Hybrid search: BM25 uses the raw query text, so only the vector side needs SIE encoding
result = (
    w.query
    .get("Document", ["text", "source"])
    .with_hybrid(
        query=user_query,           # BM25 uses text directly
        vector=query_vector.tolist(),  # SIE-encoded vector for semantic search
        alpha=0.5,                  # 0 = pure BM25, 1 = pure vector, 0.5 = balanced
        fusion_type="relativeScoreFusion"
    )
    .with_limit(20)
    .do()
)
```
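
To illustrate what `alpha` does, here is a simplified sketch of score-based fusion: each result list is min-max normalised, then blended with weight `alpha` on the vector side. This is an illustration of the idea only, not Weaviate's actual implementation, and the function names and scores are made up.

```python
def min_max_normalise(scores):
    """Scale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (score - low) / (high - low) for doc, score in scores.items()}


def relative_score_fusion(keyword_scores, vector_scores, alpha=0.5):
    """Blend normalised scores: alpha=0 is pure keyword, alpha=1 is pure vector."""
    keyword = min_max_normalise(keyword_scores)
    vector = min_max_normalise(vector_scores)
    fused = {
        doc: (1 - alpha) * keyword.get(doc, 0.0) + alpha * vector.get(doc, 0.0)
        for doc in set(keyword) | set(vector)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


keyword_scores = {"a": 4.0, "b": 2.0, "c": 0.0}  # made-up BM25 scores
vector_scores = {"a": 0.2, "b": 0.8, "d": 0.5}   # made-up similarity scores
ranked = relative_score_fusion(keyword_scores, vector_scores, alpha=0.5)
```

With these numbers, document `b` ranks first at `alpha=0.5` because it scores reasonably on both sides, while `a` wins only when the weight shifts fully to keywords.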

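BGE-M3 also produces sparse vectors. Weaviate's hybrid search computes its keyword side with BM25 rather than from user-supplied sparse vectors, so check the Weaviate documentation for your version before planning to store and query BGE-M3's sparse output.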

---

## Filtered vector search in Weaviate

Weaviate's `where` filter enables metadata-filtered ANN search:

```python
result = (
    w.query
    .get("Document", ["text", "source", "date"])
    .with_near_vector({"vector": query_vector.tolist()})
    .with_where({
        "path": ["date"],
        "operator": "GreaterThan",
        "valueDate": "2024-01-01T00:00:00Z"
    })
    .with_limit(20)
    .do()
)
```

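Weaviate pre-filters: it resolves the `where` clause against its inverted index into an allow-list of matching object IDs, stored as roaring bitmaps, and restricts the HNSW search to that list. This keeps filtering fast and maintains recall under selective filters.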
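
`where` filters are plain dictionaries, so they can be built programmatically and combined with the `And` operator. The sketch below uses hypothetical helper names (`rfc3339`, `date_after`, `text_equals`, `all_of`) and assumes naive datetimes are in UTC. The resulting dictionary can be passed to `.with_where(...)` in the v3-style client.

```python
from datetime import datetime, timezone


def rfc3339(dt):
    """Format a datetime as an RFC 3339 UTC timestamp such as 2024-01-01T00:00:00Z."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive datetimes are UTC
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_after(path, dt):
    """Filter for objects whose date property is later than dt."""
    return {"path": [path], "operator": "GreaterThan", "valueDate": rfc3339(dt)}


def text_equals(path, value):
    """Filter for objects whose text property equals value."""
    return {"path": [path], "operator": "Equal", "valueText": value}


def all_of(*filters):
    """Combine filters so that every one of them must match."""
    return {"operator": "And", "operands": list(filters)}


where = all_of(
    date_after("date", datetime(2024, 1, 1)),
    text_equals("source", "handbook.pdf"),
)
```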

---

## Weaviate v4 client (Python)

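The v4 Python client (`weaviate-client` 4.x) replaces the v3-style `weaviate.Client` API used in the examples above with a cleaner, collection-based API. The version number refers to the client library, not the Weaviate server: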

```python
import weaviate
import weaviate.classes as wvc

client = weaviate.connect_to_local()

# Create collection
documents = client.collections.create(
    name="Document",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
    properties=[
        wvc.config.Property(name="text", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="source", data_type=wvc.config.DataType.TEXT),
    ]
)

# Insert with vectors
documents.data.insert_many([
    wvc.data.DataObject(properties={"text": chunk, "source": src}, vector=vec.tolist())
    for chunk, src, vec in zip(chunks, sources, vectors)
])

# Search
results = documents.query.near_vector(
    near_vector=query_vector.tolist(),
    limit=20
)

client.close()  # Release the connection when finished
```

---

## Weaviate vs Qdrant: key differences

| | Weaviate | Qdrant |
|---|---|---|
| Query language | GraphQL + REST | REST + gRPC |
| Schema | Typed classes (auto-schema can infer them) | Optional (flexible payload) |
| Hybrid search | ✓ (built-in) | ✓ (built-in) |
| Module ecosystem | ✓ (vectorisers, readers) | Limited |
| Language | Go | Rust |
| Performance | Good | Slightly faster at scale |
| Best for | Structured docs, LangChain | Raw performance, flexibility |

---

## Frequently asked questions

**What is Weaviate's vectorizer module system?**
Weaviate modules allow automatic vectorisation at insert time using external models (OpenAI, Cohere, HuggingFace). When using SIE, set `vectorizer: "none"` and provide vectors directly. This gives you full control over the encoding model and keeps data off external APIs.

**Does Weaviate support multi-vector / ColBERT retrieval?**
Weaviate has added multi-vector support. Check the [SIE + Weaviate integration docs](/docs/integrations/weaviate) for current ColBERT implementation details.

**Can I run Weaviate and SIE in the same Kubernetes cluster?**
Yes. Both are containerised and can run in the same cluster, with SIE on GPU nodes and Weaviate on CPU nodes.

---

## Related resources

- [SIE + Weaviate full integration guide](/docs/integrations/weaviate)
- [What is a vector database?](/glossary/what-is-a-vector-database)
- [What is hybrid search?](/glossary/what-is-hybrid-search)
- [What is BGE-M3?](/glossary/what-is-bge-m3)
- [How does Qdrant work with embedding models?](/glossary/how-does-qdrant-work-with-embedding-models)
