Config API
Add models to a running SIE cluster with a single API call. If the model’s adapter is already in a deployed bundle, no adapter image rebuild is needed. The change is written to sie-config, distributed over NATS, and mirrored by every gateway replica.
The Config API is split across two services:
| Service | Role |
|---|---|
| `sie-config` | Authoritative control plane. Owns writes, persistence, bundle metadata, snapshots, epoch, and NATS publishing. |
| `sie-gateway` | Read-side cache. Serves config reads, resolve, and per-replica SIE server sidecar readiness status. It does not handle config writes. |
Quick Example
```bash
# Add a model at runtime through sie-config
curl -X POST http://sie-config:8080/v1/configs/models \
  -H "Content-Type: application/x-yaml" \
  -H "Authorization: Bearer $SIE_ADMIN_TOKEN" \
  -H "Idempotency-Key: add-e5-base-001" \
  -d 'sie_id: intfloat/multilingual-e5-base
hf_id: intfloat/multilingual-e5-base
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
    adapter_options:
      loadtime: {}
      runtime:
        pooling: mean
        normalize: true'
```

Response:

```json
{
  "model_id": "intfloat/multilingual-e5-base",
  "created_profiles": ["default"],
  "existing_profiles_skipped": [],
  "warnings": [],
  "routable_bundles_by_profile": {"default": ["default"]},
  "router_id": "sie-config"
}
```

This response means the config was accepted, persisted, and applied to `sie-config`’s registry. NATS publish failures are surfaced in `warnings`; a fully unavailable publisher returns `503`. The response does not mean every eligible SIE server sidecar has reported readiness for the model. To check serving readiness, poll a gateway replica:

```bash
curl http://sie-gateway:8080/v1/configs/models/intfloat/multilingual-e5-base/status \
  -H "Authorization: Bearer $SIE_AUTH_TOKEN"
```

```json
{
  "model_id": "intfloat/multilingual-e5-base",
  "config_epoch": 42,
  "all_bundles_acked": true,
  "no_bundles": false,
  "bundles": [
    {
      "bundle_id": "default",
      "expected_bundle_config_hash": "sha256...",
      "total_eligible_workers": 3,
      "acked_workers": ["worker-a", "worker-b", "worker-c"],
      "pending_workers": [],
      "acked": true
    }
  ],
  "source": "gateway-registry"
}
```

How It Works
```text
Admin client
  -> POST /v1/configs/models on sie-config
       -> persist model YAML to ConfigStore
       -> mutate sie-config ModelRegistry
       -> increment config epoch
       -> publish NATS deltas:
            sie.config.models.{bundle_id} -> SIE server sidecars inside worker pods
            sie.config.models._all        -> gateways

Gateways
  -> apply _all deltas to their ModelRegistry
  -> poll /v1/configs/epoch for missed deltas or bundle drift
  -> expose /v1/configs/models/{id}/status for readiness
```

- Admin tooling sends `POST /v1/configs/models` to `sie-config`.
- `sie-config` validates that every new profile’s `adapter_path` is routable by at least one known bundle.
- A single-process asyncio write lock serializes persist, registry mutation, epoch increment, and NATS publish.
- SIE server sidecars subscribed to `sie.config.models.{bundle_id}` receive bundle-scoped config notifications and forward accepted YAML to the `sie-server` adapter over IPC.
- Gateways subscribed to `sie.config.models._all` update their in-memory registries.
- SIE server sidecars publish the updated `bundle_config_hash` in `sie.health.<worker_id>` after the `sie-server` adapter accepts the config, or after export reconciliation catches up.
- Gateway `/status` endpoints expose whether this replica has eligible SIE server sidecar health records with the expected hash.
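The serialized write path described above can be pictured with a small toy model. This is only a sketch under stated assumptions: `ConfigWriter`, its attributes, and `demo` are hypothetical names, the dictionaries stand in for ConfigStore and ModelRegistry, and "publishing" just appends to a list. Only the step order and the subject names come from this page.

```python
import asyncio


class ConfigWriter:
    """Toy stand-in for the sie-config write path (hypothetical, not the real code)."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()  # one lock per process serializes all writes
        self.epoch = 0
        self.store = {}      # stands in for ConfigStore
        self.registry = {}   # stands in for ModelRegistry
        self.published = []  # (subject, epoch) pairs instead of real NATS messages
        self.log = []

    async def add_model(self, sie_id: str, yaml_text: str, bundle_ids: list) -> int:
        async with self._lock:
            self.log.append(f"begin {sie_id}")
            self.store[sie_id] = yaml_text      # 1. persist
            await asyncio.sleep(0)              # yield control, as real I/O would
            self.registry[sie_id] = yaml_text   # 2. mutate the registry
            self.epoch += 1                     # 3. increment the config epoch
            for bundle_id in bundle_ids:        # 4. publish per-bundle deltas
                self.published.append((f"sie.config.models.{bundle_id}", self.epoch))
            self.published.append(("sie.config.models._all", self.epoch))
            self.log.append(f"end {sie_id}")
            return self.epoch


async def demo() -> ConfigWriter:
    writer = ConfigWriter()
    # Two concurrent writes; the lock keeps their steps from interleaving.
    await asyncio.gather(
        writer.add_model("model-a", "sie_id: model-a", ["default"]),
        writer.add_model("model-b", "sie_id: model-b", ["default"]),
    )
    return writer
```

Running `asyncio.run(demo())` leaves the toy writer at epoch 2, with each write's begin and end entries adjacent in the log, which is the property the single write lock is meant to provide.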
When to Use
| Scenario | Use Config API? | Alternative |
|---|---|---|
| Add a model with an existing adapter | Yes | - |
| Add a new profile to an existing model | Yes | - |
| Add a model that needs a new adapter | No | Create adapter, rebuild bundle image |
| Add a new bundle | No | Define in repo, rebuild images |
| Change a model’s adapter_path | No | Append-only; create a new profile instead |
The Config API is append-only. You can add models and profiles, but not modify or delete existing ones.
Endpoints
Endpoint Placement
| Endpoint | sie-config | sie-gateway |
|---|---|---|
| `POST /v1/configs/models` | Yes | No, returns 405 Method Not Allowed |
| `GET /v1/configs/models` | Yes | Yes, from gateway registry |
| `GET /v1/configs/models/{id}` | Yes | Yes, from gateway registry |
| `GET /v1/configs/models/{id}/status` | No | Yes, per-replica config-hash readiness |
| `GET /v1/configs/bundles` | Yes | Yes |
| `GET /v1/configs/bundles/{id}` | Yes | Yes |
| `POST /v1/configs/resolve` | Yes | Yes |
| `GET /v1/configs/export` | Yes | No, consumed by gateways |
| `GET /v1/configs/epoch` | Yes | No, consumed by gateways |
List Models
```bash
curl http://sie-gateway:8080/v1/configs/models
```

```json
{
  "models": [
    {
      "model_id": "BAAI/bge-m3",
      "profiles": ["default", "sparse"],
      "source": "gateway-registry"
    },
    {
      "model_id": "intfloat/multilingual-e5-base",
      "profiles": ["default"],
      "source": "gateway-registry"
    }
  ]
}
```

On the gateway, `source: "gateway-registry"` means the response comes from that replica’s in-memory config mirror. Call `sie-config` directly if you need to distinguish persisted API-added models from filesystem seed models.
Get Model
```bash
curl http://sie-gateway:8080/v1/configs/models/BAAI/bge-m3
```

On the gateway, this returns a minimal YAML registry view with `sie_id`, `source: gateway-registry`, and compatible bundles. Call `sie-config` directly for the full stored model YAML with profile definitions.
Add Model
```bash
# --data-binary sends the file as-is; curl's -d @file strips newlines,
# which would corrupt a multi-line YAML body.
curl -X POST http://sie-config:8080/v1/configs/models \
  -H "Content-Type: application/x-yaml" \
  -H "Authorization: Bearer $SIE_ADMIN_TOKEN" \
  --data-binary @model-config.yaml
```

| Status | Meaning |
|---|---|
| `201` | Model or profiles created |
| `200` | All profiles already existed (idempotent) |
| `400` | Invalid YAML |
| `401` | `SIE_ADMIN_TOKEN` is configured but the request is missing bearer auth |
| `403` | Write attempted with only the inference token configured |
| `409` | Profile exists with different content (content-equality check) |
| `422` | Validation failed (unroutable adapter, missing fields) |
| `503` | NATS unavailable or config store unavailable |
The gateway does not register this route. If you send the same POST to sie-gateway, the response is 405 Method Not Allowed.
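Client tooling usually needs to turn these status codes into a decision. The helper below is a minimal sketch of one way to do that: the function name and the action labels are illustrative assumptions rather than part of the SIE API, and only the status codes and their meanings come from the table above.

```python
def classify_add_model_response(status: int) -> str:
    """Map a POST /v1/configs/models status code to a coarse client action.

    The action labels are hypothetical; only the codes come from the docs.
    """
    if status == 201:
        return "created"        # model or profiles were created
    if status == 200:
        return "unchanged"      # every profile already existed
    if status in (400, 422):
        return "fix_request"    # invalid YAML or failed validation
    if status in (401, 403):
        return "fix_auth"       # missing bearer auth or no admin token configured
    if status == 405:
        return "wrong_service"  # the POST was sent to sie-gateway
    if status == 409:
        return "conflict"       # profile exists with different content
    if status == 503:
        return "retry_later"    # NATS or the config store is unavailable
    return "unexpected"
```

Treating only `503` as retryable is a design choice of this sketch: the other failures describe the request itself, so resending the same body would not be expected to help.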
List Bundles
```bash
curl http://sie-gateway:8080/v1/configs/bundles
```

```json
{
  "bundles": [
    {
      "bundle_id": "default",
      "priority": 10,
      "adapter_count": 18,
      "source": "gateway-registry",
      "connected_workers": 3
    }
  ]
}
```

Get Bundle

```bash
curl http://sie-gateway:8080/v1/configs/bundles/default
```

Returns bundle metadata as YAML, including the adapter list.
Resolve Routing
```bash
curl -X POST http://sie-gateway:8080/v1/configs/resolve \
  -H "Content-Type: application/json" \
  -d '{"model": "BAAI/bge-m3", "bundle": "default"}'
```

Returns the bundle that would be selected for a request without executing inference. Omit `bundle` to use the registry’s default bundle priority, or use the `default:/BAAI/bge-m3` model-spec form for an explicit bundle override.
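To illustrate the model-spec form, here is a small sketch that splits a spec such as `default:/BAAI/bge-m3` into a bundle and a model ID and builds a resolve payload from it. Both function names are hypothetical, and the parsing rule (split on the first `:/`) is inferred from the single example on this page rather than from a published grammar. Whether the server treats the two request forms identically is also an assumption.

```python
from typing import Optional, Tuple


def split_model_spec(spec: str) -> Tuple[Optional[str], str]:
    """Split 'bundle:/model' into (bundle, model); plain model IDs get bundle None.

    Assumption: the separator is the first ':/' in the string.
    """
    bundle, sep, model = spec.partition(":/")
    if sep and bundle and model:
        return bundle, model
    return None, spec


def resolve_payload(spec: str) -> dict:
    """Build a JSON body for POST /v1/configs/resolve from a model spec."""
    bundle, model = split_model_spec(spec)
    payload = {"model": model}
    if bundle is not None:
        payload["bundle"] = bundle  # explicit bundle override
    return payload
```

With these assumptions, `resolve_payload("default:/BAAI/bge-m3")` produces the same body as the curl example above, while `resolve_payload("BAAI/bge-m3")` omits `bundle` and leaves the choice to bundle priority.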
Config YAML Format
The model config format is the same as static model configs. For runtime writes, `sie-config` validates the YAML schema and requires new profiles to be routable by existing bundle adapters. Full metadata such as `hf_id`, `inputs`, and `tasks` is recommended for catalog quality; many adapters can run from `sie_id` plus `profiles` alone.
Minimal Config
```yaml
sie_id: intfloat/multilingual-e5-base
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
```

Full Config

```yaml
sie_id: intfloat/multilingual-e5-base
hf_id: intfloat/multilingual-e5-base

inputs:
  text: true

tasks:
  encode:
    dense:
      dim: 768

profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
    adapter_options:
      loadtime: {}
      runtime:
        pooling: mean
        normalize: true
  financial:
    extends: default
    adapter_options:
      runtime:
        pooling: mean
        normalize: true
        instruction: "Retrieve financial documents"
```

Profile Append

POST the same `sie_id` with additional profiles. Existing profiles are skipped; new ones are created.

```yaml
sie_id: intfloat/multilingual-e5-base
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
  medical:
    extends: default
    adapter_options:
      runtime:
        instruction: "Retrieve medical literature"
```

Response: `201` with `created_profiles: ["medical"]` and `existing_profiles_skipped: ["default"]`.
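The append semantics can be sketched as a pure function over parsed profile dictionaries. This is an illustration only: `plan_profile_append` is a hypothetical name, and it compares profiles with plain dictionary equality, whereas the page does not say how `sie-config` normalizes profiles before its content-equality check.

```python
def plan_profile_append(existing: dict, incoming: dict) -> dict:
    """Classify incoming profiles against already stored ones.

    Assumption: 'same content' means plain dict equality of the parsed YAML.
    """
    created, skipped, conflicts = [], [], []
    for name, body in incoming.items():
        if name not in existing:
            created.append(name)      # new profile: would be created
        elif existing[name] == body:
            skipped.append(name)      # identical profile: skipped
        else:
            conflicts.append(name)    # same name, different content
    if conflicts:
        status = 409
    elif created:
        status = 201
    else:
        status = 200
    return {
        "status": status,
        "created_profiles": created,
        "existing_profiles_skipped": skipped,
        "conflicts": conflicts,
    }
```

The status selection mirrors the Add Model status table: any conflicting profile yields `409`, at least one new profile yields `201`, and a body whose profiles all already exist yields `200`.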
Serving Readiness
`POST /v1/configs/models` does not wait for SIE server sidecar convergence. `sie-config` has no sidecar-health registry, so readiness is a read-side concern on each gateway replica. SIE server sidecars advertise the local `bundle_config_hash` after the `sie-server` adapter applies a config delta or replays a missed entry from `GET /v1/configs/export`.
| Field | Description |
|---|---|
| `config_epoch` | Highest control-plane epoch applied on this gateway |
| `all_bundles_acked` | `true` when every eligible bundle has at least one healthy SIE server sidecar health record with the expected hash |
| `no_bundles` | `true` when the model resolves to zero bundles on this gateway |
| `bundles[].expected_bundle_config_hash` | Hash SIE server sidecars must report for the bundle |
| `bundles[].acked_workers` | Healthy worker IDs whose reported hash matches |
| `bundles[].pending_workers` | Healthy eligible worker IDs that have not reported the expected hash |
all_bundles_acked: false does not mean the write failed. The model can already be in the catalog while SIE server sidecar health is still catching up or the worker pod is scaling from zero. Admin tooling that needs a fleet-wide view should poll every gateway replica.
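For a fleet-wide view, the per-replica status bodies can be folded into one summary. The sketch below assumes the caller has already fetched and parsed `/status` from each replica; `summarize_fleet` and the replica names used as keys are hypothetical, while the field names follow the status example on this page.

```python
def summarize_fleet(statuses: dict) -> dict:
    """Combine parsed /status bodies, keyed by replica name, into one summary.

    The fleet counts as ready only when every replica reports all_bundles_acked.
    """
    not_ready = {}
    for replica, body in statuses.items():
        if body.get("all_bundles_acked"):
            continue
        # Collect pending workers for each bundle that has not acked yet.
        pending = {
            bundle["bundle_id"]: bundle.get("pending_workers", [])
            for bundle in body.get("bundles", [])
            if not bundle.get("acked")
        }
        not_ready[replica] = {
            "no_bundles": body.get("no_bundles", False),
            "pending": pending,
        }
    return {"ready": not not_ready, "not_ready": not_ready}
```

Keeping the HTTP calls out of the function makes the aggregation easy to test with canned responses; how replicas are discovered and polled is left to the surrounding tooling.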
Persistence
API-added models are persisted by `sie-config`, not by the gateway. On `sie-config` startup, `SIE_CONFIG_RESTORE=true` restores model configs from the configured store. Gateways do not read the store directly; they fetch snapshots from `sie-config`.
Storage Backends
| Backend | Config | Use Case |
|---|---|---|
| Local filesystem | SIE_CONFIG_STORE_DIR=/data/config | Development or Kubernetes PVC |
| S3 | SIE_CONFIG_STORE_DIR=s3://bucket/prefix | AWS production persistence |
| GCS | SIE_CONFIG_STORE_DIR=gs://bucket/prefix | GCP production persistence |
sie-config runs as a single writer. The local backend writes atomically with a temp file, fsync, and replace; cloud backends use object-store PUT semantics.
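The temp file, fsync, and replace sequence is a common pattern for crash-safe local writes. The following is a generic sketch of that pattern using only the Python standard library; `atomic_write` is a hypothetical helper and is not taken from the `sie-config` source.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to path so readers see either the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())    # push the bytes to stable storage
        os.replace(tmp_path, path)    # swap the new file into place
    except BaseException:
        # Remove the temp file if anything failed before the replace.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Calling `atomic_write("/data/config/model.yaml", yaml_bytes)` with an illustrative path either leaves the previous file untouched or fully replaces it, which is why the pattern suits a single-writer store.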
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `SIE_CONFIG_STORE_DIR` | Local pod filesystem | Config store path used by `sie-config` |
| `SIE_CONFIG_RESTORE` | `false` | Set to `true` to restore API-added models from the store on `sie-config` startup |
| `SIE_NATS_URL` | None | NATS server URL for config distribution |
| `SIE_BUNDLES_DIR` | `/app/bundles` | Bundle YAML directory baked into the `sie-config` image |
| `SIE_MODELS_DIR` | `/app/models` | Baseline model YAML directory baked into the `sie-config` image |
NATS Distribution
Config changes are distributed to SIE server sidecars and gateways via NATS Core pub/sub. NATS is the transport for config deltas, not the durable source of truth.
| Subject | Subscribers | Purpose |
|---|---|---|
| `sie.config.models.{bundle_id}` | SIE server sidecars in that bundle | Per-bundle config notifications |
| `sie.config.models._all` | All gateways | Gateway registry sync |
Gateway Recovery
Gateways recover missed messages by polling `sie-config`:
- `GET /v1/configs/epoch` returns the authoritative epoch plus a `bundles_hash`.
- If the epoch or bundle hash drifts, the gateway re-runs bootstrap.
- Bootstrap fetches bundles from `GET /v1/configs/bundles{,/{id}}` and models from `GET /v1/configs/export`.
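The drift check behind this recovery loop can be sketched as a simple comparison. The `GatewayState` class and `needs_bootstrap` function are hypothetical, and the `epoch` key in the response body is an assumed field name; only `bundles_hash` is named on this page.

```python
from dataclasses import dataclass


@dataclass
class GatewayState:
    """What a gateway replica last applied (hypothetical shape)."""
    epoch: int
    bundles_hash: str


def needs_bootstrap(local: GatewayState, remote: dict) -> bool:
    """Return True when the polled epoch response differs from local state.

    Assumption: the response JSON carries 'epoch' and 'bundles_hash' keys.
    """
    epoch_drift = remote["epoch"] != local.epoch
    bundle_drift = remote["bundles_hash"] != local.bundles_hash
    return epoch_drift or bundle_drift
```

A poller would call this after each `GET /v1/configs/epoch` and re-run bootstrap when it returns `True`, covering both missed model deltas and a changed bundle set.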
NATS Unavailable
If NATS is configured but temporarily unavailable:
- Config writes return `503` with `{"detail": {"error": "nats_unavailable", "message": "..."}}` rather than persisting a change that cannot be distributed.
- Existing inference depends on the separate JetStream work queue and continues only if that queue path is healthy.
- Once config pub/sub recovers, gateways close any missed-delta gap through the epoch poller.
If only some bundle publishes fail, the write can still return 201 with a warnings entry such as nats_publish_partial. The config is durable, and gateways recover through the epoch/export path; SIE server sidecars on the affected bundle may lag until their live subscriber or export reconciler catches up.
Authentication
The Config API uses the same auth tokens as the rest of the SIE API:
| Operation | Token Required |
|---|---|
| `GET /v1/configs/*` | `SIE_AUTH_TOKEN` or `SIE_ADMIN_TOKEN` depending on deployment auth mode |
| `POST /v1/configs/models` on `sie-config` | `SIE_ADMIN_TOKEN` |
| `GET /v1/configs/export` on `sie-config` | `SIE_ADMIN_TOKEN` |
If neither token is configured, all endpoints are open (development mode). If SIE_AUTH_TOKEN is set but SIE_ADMIN_TOKEN is not, writes are rejected with 403; the inference token never grants config-write access.
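The write-side rules above can be laid out as a small decision function. This is a sketch of the documented cases only: `config_write_decision` and its return labels are hypothetical, and the case of a bearer token that does not match `SIE_ADMIN_TOKEN` is reported as `"unspecified"` because this page does not state which status it produces.

```python
import hmac
from typing import Optional


def config_write_decision(
    auth_token: Optional[str],
    admin_token: Optional[str],
    bearer: Optional[str],
) -> str:
    """Decide how a config write would be treated, per the documented rules."""
    if not auth_token and not admin_token:
        return "open"   # development mode: no tokens configured
    if not admin_token:
        return "403"    # only the inference token is configured
    if bearer is None:
        return "401"    # admin token configured, request has no bearer auth
    # Constant-time comparison avoids leaking token contents through timing.
    if hmac.compare_digest(bearer.encode(), admin_token.encode()):
        return "allow"
    return "unspecified"  # wrong token: behavior is not documented on this page
```

The first two arguments stand for the configured `SIE_AUTH_TOKEN` and `SIE_ADMIN_TOKEN` values, and the third for the token presented in the `Authorization` header.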
Helm Configuration
Kubernetes deployments run `sie-config` and `sie-gateway` as separate deployments. Enable NATS-based config distribution and persistent config storage in Helm values:

```yaml
nats:
  enabled: true

config:
  enabled: true
  configStore:
    enabled: true
    size: 10Gi

gateway:
  replicas: 2
```

The chart’s built-in persistence path is the `config.configStore` PVC. The `sie-config` service also supports `SIE_CONFIG_STORE_DIR=s3://...` or `gs://...`, but wiring that environment variable requires a chart overlay or custom deployment because the stock values file does not expose an `extraEnv` knob for the config service.
Limitations
- Append-only: Models and profiles cannot be modified or deleted after creation.
- Adapter must be bundled: The model’s `adapter_path` must exist in at least one known bundle. Adding models that require new adapters still requires an image rebuild.
- Bundles are build-time only: Bundles cannot be created or modified via API. Rebuild and redeploy `sie-config` plus worker-pod images for bundle changes; gateways pick up the new bundle set from `sie-config`.
- `sie-config` is single-writer: Run one replica. Multi-replica writes require shared idempotency state, which is intentionally not part of the current topology.
- Readiness is per gateway replica: `GET /v1/configs/models/{id}/status` reports the SIE server sidecar health records visible to that gateway. Poll all replicas for a fleet-wide view.
- Gateway cold start depends on `sie-config`: A fresh gateway that cannot reach `sie-config` starts with whatever optional filesystem seed was mounted. In the default deployment, typed requests may return `404` until bootstrap succeeds.