
laion/CLIP-ViT-H-14-laion2B-s32B-b79K

Table of Contents
1. Model Details
2. Uses
3. Training Details
4. Evaluation
5. Acknowledgements
6. Citation
7. How To Get Started With the Model

Architecture: CLIP
Parameters: 986M
Tasks: Encode
Outputs: Dense
Dimensions: Dense, 1,024
Max Sequence Length: 77 tokens
License: MIT
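For reference, here is a minimal encoding sketch using the open_clip library, the standard loader for LAION CLIP checkpoints; the image path and captions are placeholders. It produces the 1,024-dimensional dense embeddings listed above, with text truncated or padded to the 77-token maximum.

```python
import torch
import open_clip
from PIL import Image

# Load the model, preprocessing transform, and tokenizer from the Hugging Face Hub.
model, _, preprocess = open_clip.create_model_and_transforms(
    "hf-hub:laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
)
tokenizer = open_clip.get_tokenizer("hf-hub:laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)   # placeholder image path
texts = tokenizer(["a photo of a cat", "a photo of a dog"])  # padded/truncated to 77 tokens

with torch.no_grad():
    img_emb = model.encode_image(image)  # shape: (1, 1024)
    txt_emb = model.encode_text(texts)   # shape: (2, 1024)
    # L2-normalize so dot products are cosine similarities.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)

similarity = img_emb @ txt_emb.T  # one cosine score per caption
```

Because both encoders project into the same space, the same normalized embeddings can be indexed and compared directly for cross-modal retrieval, which is what the benchmark below measures.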

Benchmarks

Flickr30kI2TRetrieval

Tags: general, retrieval, en

Image-to-text retrieval: retrieve captions from images

Corpus: 31,783 · Queries: 1,000
Quality
nDCG@10: 0.8624
MAP@10: 0.7856
MRR@10: 0.9488
Performance (L4-SPOT, b1 c8)
Corpus: 181 tok/s · p50 533.4 ms
Query: 10.2 img/s · p50 625.9 ms

Performance (L4, b1 c16)
Corpus: 580 tok/s · p50 388.7 ms
Query: 6.6 Mpix/s · p50 395.9 ms
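To make the quality numbers concrete, here is a hedged sketch of how image-to-text retrieval is typically scored: each image query ranks every corpus caption by cosine similarity, and the reciprocal rank of the first relevant caption is averaged over queries. The `mrr_at_10` helper is illustrative, not a library function, and the random embeddings stand in for real model outputs; this mirrors the benchmark protocol only approximately.

```python
import numpy as np

def mrr_at_10(scores: np.ndarray, relevant: list[set[int]]) -> float:
    """Mean reciprocal rank@10. `scores` is (num_queries, num_docs) cosine
    similarities; relevant[q] holds the correct caption indices for query q."""
    total = 0.0
    for q, rel in enumerate(relevant):
        top10 = np.argsort(-scores[q])[:10]  # highest-similarity captions first
        for rank, doc in enumerate(top10, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(relevant)

# Toy example: 1,000 image queries against a caption corpus, both 1,024-dim
# and L2-normalized, so a single matrix product yields cosine similarities.
rng = np.random.default_rng(0)
img = rng.normal(size=(1000, 1024)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(5000, 1024)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
sims = img @ txt.T
print(mrr_at_10(sims, [{i} for i in range(1000)]))
```

Read against the table above, an MRR@10 of 0.9488 means the first correct caption sits at or very near rank 1 for most image queries.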
