
openai/clip-vit-base-patch32

Disclaimer: This model card is taken and modified from the official CLIP repository, which can be found at https://github.com/openai/CLIP.

Architecture: CLIP
Parameters: 151M
Tasks: Encode
Outputs: Dense
Dimensions: 512 (dense)
Max Sequence Length: 77 tokens
License: MIT (per the official CLIP repository)

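A minimal usage sketch, assuming the Hugging Face transformers and Pillow packages are installed; the caption strings and image path below are placeholders. It produces the 512-dimensional dense embeddings listed above:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

captions = ["a photo of a cat", "a photo of a dog"]  # placeholder captions
image = Image.open("example.jpg")                    # placeholder image path

# Text beyond the 77-token maximum sequence length is truncated.
inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    out = model(**inputs)

text_embeds = out.text_embeds    # shape (2, 512)
image_embeds = out.image_embeds  # shape (1, 512)
```
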
Benchmarks

Flickr30kI2TRetrieval (general retrieval, English)

Image-to-text retrieval: retrieve captions from images.

Corpus size: 31,783 · Queries: 1,000

Quality:
- NDCG@10: 0.7165
- MAP@10: 0.6029
- MRR@10: 0.8521

Performance (NVIDIA L4, batch size 1, concurrency 16):
- Corpus throughput: 958 tokens/s
- Corpus latency (p50): 234.0 ms
- Query throughput: 10.0 Mpix/s
- Query latency (p50): 245.6 ms
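
For illustration, image-to-text retrieval with this model amounts to ranking caption embeddings by cosine similarity to each image embedding. The sketch below continues from the variables in the previous example and is not the benchmark harness itself; the value of k and the data are placeholders:

```python
import torch.nn.functional as F

# Normalize so the dot product equals cosine similarity.
text_embeds = F.normalize(text_embeds, dim=-1)
image_embeds = F.normalize(image_embeds, dim=-1)

# One row per image query, one column per candidate caption.
scores = image_embeds @ text_embeds.T  # shape (1, 2)

# Top-k captions per image (k=10 would mirror the @10 metrics above).
k = min(10, scores.shape[1])
top_scores, top_idx = scores.topk(k, dim=-1)
for rank in range(k):
    i = top_idx[0, rank].item()
    print(f"{rank + 1}. {captions[i]} (cosine={top_scores[0, rank].item():.4f})")
```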