
openai/clip-vit-base-patch32

Disclaimer: This model card is taken and modified from the official CLIP repository, which can be found at https://github.com/openai/CLIP.

Architecture: CLIP
Parameters: 151M
Tasks: Encode
Outputs: Dense
Dimensions: 512 (dense)
Max Sequence Length: 77 tokens
License: MIT (per the official CLIP repository)

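A minimal usage sketch, assuming the Hugging Face transformers and Pillow packages are installed; the caption strings and image path below are placeholders. It produces the 512-dimensional dense embeddings listed above:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

captions = ["a photo of a cat", "a photo of a dog"]  # placeholder captions
image = Image.open("example.jpg")                    # placeholder image path

# Text beyond the 77-token maximum sequence length is truncated.
inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    out = model(**inputs)

text_embeds = out.text_embeds    # shape (2, 512)
image_embeds = out.image_embeds  # shape (1, 512)
```
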
Benchmarks

Flickr30kI2TRetrieval (general retrieval, English)

Image-to-text retrieval: retrieve captions from images.

Corpus size: 31,783 · Queries: 1,000

Quality:
- NDCG@10: 0.7165
- MAP@10: 0.6029
- MRR@10: 0.8521

Performance (NVIDIA L4, batch size 1, concurrency 16):
- Corpus throughput: 958 tokens/s
- Corpus latency (p50): 234.0 ms
- Query throughput: 10.0 Mpix/s
- Query latency (p50): 245.6 ms
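
For illustration, image-to-text retrieval with this model amounts to ranking caption embeddings by cosine similarity to each image embedding. The sketch below continues from the variables in the previous example and is not the benchmark harness itself; the value of k and the data are placeholders:

```python
import torch.nn.functional as F

# Normalize so the dot product equals cosine similarity.
text_embeds = F.normalize(text_embeds, dim=-1)
image_embeds = F.normalize(image_embeds, dim=-1)

# One row per image query, one column per candidate caption.
scores = image_embeds @ text_embeds.T  # shape (1, 2)

# Top-k captions per image (k=10 would mirror the @10 metrics above).
k = min(10, scores.shape[1])
top_scores, top_idx = scores.topk(k, dim=-1)
for rank in range(k):
    i = top_idx[0, rank].item()
    print(f"{rank + 1}. {captions[i]} (cosine={top_scores[0, rank].item():.4f})")
```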