
laion/CLIP-ViT-H-14-laion2B-s32B-b79K

Table of Contents
1. Model Details
2. Uses
3. Training Details
4. Evaluation
5. Acknowledgements
6. Citation
7. How To Get Started With the Model

Architecture: CLIP
Parameters: 986M
Tasks: Encode
Outputs: Dense
Dimensions: Dense, 1,024
Max Sequence Length: 77 tokens
License: MIT
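For reference, here is a minimal encoding sketch using the open_clip library, the standard loader for LAION CLIP checkpoints; the image path and captions are placeholders. It produces the 1,024-dimensional dense embeddings listed above, with text truncated or padded to the 77-token maximum.

```python
import torch
import open_clip
from PIL import Image

# Load the model, preprocessing transform, and tokenizer from the Hugging Face Hub.
model, _, preprocess = open_clip.create_model_and_transforms(
    "hf-hub:laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
)
tokenizer = open_clip.get_tokenizer("hf-hub:laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)   # placeholder image path
texts = tokenizer(["a photo of a cat", "a photo of a dog"])  # padded/truncated to 77 tokens

with torch.no_grad():
    img_emb = model.encode_image(image)  # shape: (1, 1024)
    txt_emb = model.encode_text(texts)   # shape: (2, 1024)
    # L2-normalize so dot products are cosine similarities.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)

similarity = img_emb @ txt_emb.T  # one cosine score per caption
```

Because both encoders project into the same space, the same normalized embeddings can be indexed and compared directly for cross-modal retrieval, which is what the benchmark below measures.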

Benchmarks

Flickr30kI2TRetrieval

Tags: general, retrieval, en

Image-to-text retrieval: retrieve captions from images

Corpus: 31,783 · Queries: 1,000
Quality
nDCG@10: 0.8624
MAP@10: 0.7856
MRR@10: 0.9488
Performance (L4-SPOT, b1 c8)
Corpus: 181 tok/s · p50 533.4 ms
Query: 10.2 img/s · p50 625.9 ms

Performance (L4, b1 c16)
Corpus: 580 tok/s · p50 388.7 ms
Query: 6.6 Mpix/s · p50 395.9 ms
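To make the quality numbers concrete, here is a hedged sketch of how image-to-text retrieval is typically scored: each image query ranks every corpus caption by cosine similarity, and the reciprocal rank of the first relevant caption is averaged over queries. The `mrr_at_10` helper is illustrative, not a library function, and the random embeddings stand in for real model outputs; this mirrors the benchmark protocol only approximately.

```python
import numpy as np

def mrr_at_10(scores: np.ndarray, relevant: list[set[int]]) -> float:
    """Mean reciprocal rank@10. `scores` is (num_queries, num_docs) cosine
    similarities; relevant[q] holds the correct caption indices for query q."""
    total = 0.0
    for q, rel in enumerate(relevant):
        top10 = np.argsort(-scores[q])[:10]  # highest-similarity captions first
        for rank, doc in enumerate(top10, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(relevant)

# Toy example: 1,000 image queries against a caption corpus, both 1,024-dim
# and L2-normalized, so a single matrix product yields cosine similarities.
rng = np.random.default_rng(0)
img = rng.normal(size=(1000, 1024)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(5000, 1024)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
sims = img @ txt.T
print(mrr_at_10(sims, [{i} for i in range(1000)]))
```

Read against the table above, an MRR@10 of 0.9488 means the first correct caption sits at or very near rank 1 for most image queries.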
