
google/siglip-so400m-patch14-384

SigLIP model pre-trained on WebLI at resolution 384x384. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository.
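The sigmoid loss replaces CLIP's softmax contrastive loss: every image–text pair in a batch becomes an independent binary classification (matched vs. unmatched), so no batch-wide normalization is required. A minimal NumPy sketch of the loss using the paper's initial temperature and bias (t = 10, b = -10); the embeddings below are random stand-ins, not model outputs:

```python
import numpy as np

def sigmoid_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss from "Sigmoid Loss for Language Image
    Pre-Training" (Zhai et al.). Each of the N*N image-text pairs is an
    independent binary classification: the diagonal (matched) pairs are
    positives, all off-diagonal pairs are negatives."""
    # L2-normalize so the dot product is a cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = t * img @ txt.T + b        # (N, N) similarity logits
    n = logits.shape[0]
    z = 2 * np.eye(n) - 1               # +1 on the diagonal, -1 elsewhere
    # Stable log-sigmoid: log sigma(x) = -log(1 + exp(-x)) = -logaddexp(0, -x)
    log_sig = -np.logaddexp(0.0, -z * logits)
    return -log_sig.sum() / n           # averaged over the batch, per the paper

# Toy batch: 4 image embeddings and 4 closely aligned text embeddings.
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
txt = img + 0.01 * rng.normal(size=(4, 8))
loss = sigmoid_loss(img, txt)
```

Shuffling the captions relative to the images raises the loss, since the diagonal pairs are then mismatched.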

Architecture: SigLIP
Parameters: 878M
Tasks: Encode
Outputs: Dense
Dimensions: 1,152 (dense)
Max Sequence Length: 64 tokens
License: apache-2.0
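Each image or caption is encoded into a single 1,152-dimensional dense vector, so retrieval reduces to nearest-neighbor search over embeddings. A minimal sketch with random stand-in vectors (real vectors would come from the model's image and text towers):

```python
import numpy as np

# Stand-ins for SigLIP embeddings, matching the model card's 1,152-dim
# dense output. In practice the corpus rows would be caption embeddings
# from the text tower and the query an image embedding from the image tower.
rng = np.random.default_rng(42)
corpus = rng.normal(size=(1000, 1152))
query = corpus[123] + 0.1 * rng.normal(size=1152)  # image near caption 123

def top_k(query, corpus, k=5):
    """Rank corpus vectors by cosine similarity to the query."""
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = c @ q
    idx = np.argsort(-scores)[:k]       # indices of the k best matches
    return idx, scores[idx]

idx, scores = top_k(query, corpus)      # idx[0] is the nearest caption
```

For corpora at the scale of the benchmark below (~32K captions), a brute-force matrix product like this is still fast; larger corpora would typically use an approximate nearest-neighbor index.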

Benchmarks

Flickr30kI2TRetrieval (general retrieval, English)

Image-to-text retrieval: retrieve captions from images.

Corpus: 31,783 documents; Queries: 1,000

Quality
- nDCG@10: 0.9001
- MAP@10: 0.8364
- MRR@10: 0.9663

Performance (L4-SPOT, b1 c8)
- Corpus: 202 tok/s, p50 523.6 ms
- Query: 9.7 img/s, p50 711.3 ms

Performance (L4, b1 c16)
- Corpus: 597 tok/s, p50 381.3 ms
- Query: 5.7 Mpix/s, p50 459.0 ms
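nDCG@10, MAP@10, and MRR@10 are standard ranking metrics; nDCG@10, for instance, rewards relevant captions near the top of the ranking, discounting hits logarithmically by rank and normalizing by an ideal ordering. A minimal binary-relevance sketch (a hypothetical helper for illustration, not the benchmark's evaluation harness):

```python
import math

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance nDCG@k: log-discounted gain of the returned
    ranking, normalized by the gain of an ideal ranking."""
    rels = [1 if doc in relevant_ids else 0 for doc in ranked_ids[:k]]
    # DCG: each relevant hit at rank i (1-based) contributes 1 / log2(i + 1).
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    # Ideal DCG: all relevant documents packed into the top ranks.
    idcg = sum(1 / math.log2(i + 2) for i in range(min(len(relevant_ids), k)))
    return dcg / idcg if idcg > 0 else 0.0

# A query whose single relevant caption is returned at rank 2:
score = ndcg_at_k(["d9", "d1", "d3"], {"d1"})
```

A perfect ranking scores 1.0; here the lone relevant hit at rank 2 scores 1/log2(3) ≈ 0.63.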
