---
title: How Does SIE Compare to Infinity?
description: SIE (Superlinked Inference Engine) and Infinity are both open-source servers for self-hosting text embedding and reranking models. Infinity is a lightweight, fast single-model server with a focus on OpenAI-compatible API endpoints. SIE is a broader inference platform with multi-model support, LoRA hot-loading, GPU c...
canonical_url: https://superlinked.com/glossary/sie-vs-infinity
last_updated: 2026-06-11
---

# How Does SIE Compare to Infinity?

SIE (Superlinked Inference Engine) and Infinity are both open-source servers for self-hosting text embedding and reranking models. Infinity is a lightweight, fast single-model server with a focus on OpenAI-compatible API endpoints. SIE is a broader inference platform with multi-model support, LoRA hot-loading, GPU cluster management via Terraform and Helm, and first-class support for document processing workloads.

---

## Quick comparison

| | SIE | Infinity |
|---|---|---|
| Model types | Embeddings, rerankers, OCR, extraction | Embeddings, rerankers, CLIP |
| Multi-model per deployment | ✓ (shared GPU cluster) | Limited (one model per instance typical) |
| LoRA hot-loading | ✓ | ✗ |
| GPU cluster (Terraform + Helm) | ✓ | Manual |
| AWS / GCP Terraform modules | ✓ | ✗ |
| SDK | ✓ (`sie-sdk`) | OpenAI-compatible REST |
| OpenAI-compatible API | ✓ | ✓ (primary design goal) |
| Dynamic batching | ✓ | ✓ |
| INT8 / quantisation | ✓ | ✓ |
| Licence | Apache 2.0 | MIT |
| Backed by | Superlinked | Michael Feil (open source) |

---

## What is Infinity?

Infinity is a high-throughput embedding inference server created by Michael Feil. Its primary design goals are:

- **OpenAI API compatibility**: drop-in replacement for OpenAI's embedding endpoint, making it easy to swap without changing client code
- **Speed**: aggressive batching, CUDA optimisations, and Flash Attention for high throughput
- **Simplicity**: minimal configuration, designed to be started with a single Docker command

```bash
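# -p publishes the port to the host; --gpus all assumes the NVIDIA Container Toolkit is installed
docker run --gpus all -p 7997:7997 michaelf34/infinity:latest \
  v2 --model-id BAAI/bge-m3 --port 7997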
```

It's a strong choice for teams that need a quick self-hosted replacement for the OpenAI embeddings API.
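As a rough illustration of what "OpenAI-compatible" means on the wire, the sketch below builds the same request an OpenAI client would send: a JSON `POST` to `<base_url>/embeddings` with `model` and `input` fields. It uses only the Python standard library. The helper names (`build_embeddings_request`, `parse_embeddings`, `fetch_embeddings`) are hypothetical, and it assumes the server is reachable at `http://localhost:7997` and returns the OpenAI response shape (`data[i].embedding`).

```python
import json
import urllib.request


def build_embeddings_request(base_url: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style POST to <base_url>/embeddings."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(payload: dict) -> list[list[float]]:
    """Return vectors in input order from an OpenAI-style response body."""
    rows = sorted(payload["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]


def fetch_embeddings(base_url: str, model: str, texts: list[str]) -> list[list[float]]:
    """Send the request; needs a running server, so it is not called here."""
    request = build_embeddings_request(base_url, model, texts)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_embeddings(json.load(response))
```

With a server running, `fetch_embeddings("http://localhost:7997", "BAAI/bge-m3", ["hello world"])` would return one vector per input text.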

---

## When should you use Infinity?

Infinity is a good fit when:

- You want an **OpenAI API drop-in**: your existing code uses `openai.embeddings.create()` and you want to swap to self-hosted without changing client code
- You need a **single model** served simply and quickly
- Your team prefers **minimal configuration** over infrastructure tooling
- You're deploying on existing infrastructure and don't need Terraform/Helm automation

---

## When should you use SIE?

SIE is the better choice when:

- You need **multiple models** in one deployment (embedding + reranker + OCR)
- You want **LoRA adapter hot-loading**: swap domain-specific adapters per request without a server restart
- You're deploying on **AWS or GCP** and want managed Terraform modules for the full cluster
- You need **document processing** capabilities (OCR, extraction) alongside embeddings
- You want a **production-grade SDK** rather than raw HTTP calls
- You need **SOC2 Type 2** certified infrastructure
- You want **built-in monitoring** and GPU utilisation metrics

---

## Performance comparison

Both servers implement dynamic batching and CUDA-optimised inference. In single-model, single-GPU benchmarks, Infinity and SIE achieve comparable throughput, because in that setup the GPU, not the server layer, is the bottleneck.
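For readers unfamiliar with dynamic batching, here is a generic sketch of the idea: requests that arrive close together are grouped into one GPU call, and a batch is closed when it is full or when its oldest request has waited too long. This is not the scheduler of either server. The `Request` type, `plan_batches`, and the default limits are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    arrival_ms: float  # when the request reached the server
    text: str          # payload to embed


def plan_batches(requests, max_batch_size=32, max_wait_ms=5.0):
    """Group requests into batches bounded by size and by waiting time.

    A batch closes when it already holds max_batch_size requests, or when
    the next request arrives more than max_wait_ms after the batch's first.
    """
    batches = []
    current = []
    for req in sorted(requests, key=lambda r: r.arrival_ms):
        if current:
            is_full = len(current) >= max_batch_size
            window_expired = req.arrival_ms - current[0].arrival_ms > max_wait_ms
            if is_full or window_expired:
                batches.append(current)
                current = []
        current.append(req)
    if current:
        batches.append(current)
    return batches
```

Raising `max_wait_ms` generally trades a little latency for larger batches, which is why GPU throughput rather than request handling tends to set the ceiling.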

The performance difference emerges at scale:

- **Multi-model workloads**: SIE's shared GPU memory pool is more efficient than running separate Infinity instances per model
- **Cluster scale**: SIE's auto-scaling handles traffic spikes; Infinity requires manual scaling
- **Concurrent mixed workloads**: encoding + reranking in the same pipeline benefits from SIE's coordinated batching

See the [SIE vs TEI vs OpenAI benchmark](/docs/examples/benchmark) for detailed throughput and cost data.

---

## Migration path: Infinity → SIE

If you're using Infinity and want to move to SIE, the transition is straightforward. SIE exposes an OpenAI-compatible endpoint, so client code changes are minimal:

```python
# Example input shared by both snippets
texts = ["first document", "second document"]

# Before (Infinity or OpenAI)
from openai import OpenAI
client = OpenAI(base_url="http://localhost:7997", api_key="dummy")
response = client.embeddings.create(model="BAAI/bge-m3", input=texts)
vectors = [e.embedding for e in response.data]

# After (SIE SDK: more features, same data)
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")
results = client.encode("BAAI/bge-m3", [Item(text=t) for t in texts])
vectors = [r["dense"] for r in results]
```

Alternatively, keep the existing OpenAI client code and only update `base_url` to point at SIE's OpenAI-compatible REST endpoint.
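When switching only the base URL, it can help to confirm that the new server returns equivalent vectors before cutting traffic over. The sketch below is one way to do that, under stated assumptions: the ports come from the examples above, the `/v1` suffix is inferred from SIE's `/v1/embeddings` endpoint and may differ in your deployment, and the `0.999` threshold and helper names are hypothetical.

```python
import math

# Assumed base URLs. An OpenAI-style client appends "/embeddings" to these.
BASE_URLS = {
    "infinity": "http://localhost:7997",
    "sie": "http://localhost:8080/v1",  # assumption: SIE serves /v1/embeddings here
}


def embeddings_url(base_url: str) -> str:
    """URL an OpenAI-style client posts to for a given base_url."""
    return base_url.rstrip("/") + "/embeddings"


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def vectors_match(old, new, threshold=0.999):
    """True if both servers returned pairwise near-identical directions."""
    if len(old) != len(new):
        return False
    return all(
        len(a) == len(b) and cosine_similarity(a, b) >= threshold
        for a, b in zip(old, new)
    )
```

Embed the same sample of texts through both servers and pass the two result lists to `vectors_match`. Differences in quantisation, pooling, or normalisation settings can lower the similarity, so treat a failed check as a prompt to compare configurations, not as proof of a bug.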

---

## SIE vs Infinity vs TEI summary

| Use case | Recommended |
|---|---|
| Quick OpenAI drop-in, single model | Infinity |
| Single model, HuggingFace ecosystem | TEI |
| Production, multi-model, AWS/GCP | SIE |
| LoRA domain adaptation | SIE |
| Document processing + embeddings | SIE |
| Minimal devops, just need it working | Infinity or TEI |

---

## Frequently asked questions

**Is Infinity actively maintained?**
Yes. Infinity is actively developed and has a growing community. It's a legitimate production choice for single-model embedding serving.

**Does SIE support the OpenAI embeddings API format?**
Yes. SIE exposes an OpenAI-compatible `/v1/embeddings` endpoint, so you can use it as a drop-in replacement without changing OpenAI client code.

**Can I run SIE and Infinity in the same pipeline?**
In theory yes, but in practice you'd choose one. Both solve the same problem: self-hosted GPU inference for embedding models.

---

## Related resources

- [SIE vs TEI comparison](/glossary/sie-vs-tei)
- [SIE vs TEI vs OpenAI benchmark](/docs/examples/benchmark)
- [What is self-hosted inference?](/glossary/what-is-self-hosted-inference)
- [How to deploy on AWS](/glossary/how-to-deploy-embedding-model-on-aws)
- [Browse models on SIE](/models)
