openai/clip-vit-large-patch14

Disclaimer: This model card is adapted from the one in the official CLIP repository; the original can be found here.

Architecture: CLIP
Parameters: 428M
Tasks: Encode
Outputs: Dense
Dimensions: 768 (dense)
Max Sequence Length: 77 tokens
License:
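The specs above list dense 768-dimensional outputs and a 77-token context window. A minimal sketch of producing such embeddings with the Hugging Face transformers library (an assumed integration path, not part of this card; weights download on first use):

```python
# Sketch: encode text and an image with this checkpoint via transformers.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-large-patch14"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

texts = ["a photo of a cat", "a photo of a dog"]
image = Image.new("RGB", (224, 224), "white")  # placeholder image for illustration

# Text longer than 77 tokens must be truncated to fit the context window.
inputs = processor(text=texts, images=image, return_tensors="pt",
                   padding=True, truncation=True)
with torch.no_grad():
    out = model(**inputs)

image_emb = out.image_embeds  # projected image embeddings, shape (1, 768)
text_emb = out.text_embeds    # projected text embeddings, shape (2, 768)
```

Both modalities project into the same 768-dimensional space, which is what makes cross-modal retrieval by cosine similarity possible.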

Benchmarks

Flickr30kI2TRetrieval

Tags: general, retrieval, en

Image-to-text retrieval: retrieve captions from images.

Corpus: 31,783 · Queries: 1,000

Quality:
- nDCG@10: 0.7824
- MAP@10: 0.6816
- MRR@10: 0.9111

Performance (L4, b1, c16):
- Corpus: 977 tok/s, p50 latency 228.0 ms
- Query: 7.6 Mpix/s, p50 latency 340.1 ms
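The quality figures above come from ranking the caption corpus against each image query and scoring the ranked lists. A toy sketch of that scoring with random stand-in embeddings (not real CLIP outputs and not the actual evaluation harness; labels and sizes are invented):

```python
# Toy retrieval evaluation: cosine-similarity ranking plus nDCG@10 / MRR@10.
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 768))    # stand-in caption embeddings
queries = rng.normal(size=(5, 768))     # stand-in image embeddings
relevant = {q: {q} for q in range(5)}   # toy labels: caption i matches image i

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

scores = normalize(queries) @ normalize(corpus).T  # cosine similarity
ranking = np.argsort(-scores, axis=1)              # best caption first

def mrr_at_k(ranking, relevant, k=10):
    # Mean reciprocal rank of the first relevant hit within the top k.
    total = 0.0
    for q, row in enumerate(ranking):
        for rank, doc in enumerate(row[:k], start=1):
            if doc in relevant[q]:
                total += 1.0 / rank
                break
    return total / len(ranking)

def ndcg_at_k(ranking, relevant, k=10):
    # Binary-relevance nDCG: DCG of the top k over the ideal DCG.
    total = 0.0
    for q, row in enumerate(ranking):
        dcg = sum(1.0 / np.log2(rank + 1)
                  for rank, doc in enumerate(row[:k], start=1)
                  if doc in relevant[q])
        ideal = sum(1.0 / np.log2(rank + 1)
                    for rank in range(1, min(k, len(relevant[q])) + 1))
        total += dcg / ideal
    return total / len(ranking)
```

With random embeddings the scores are near chance; on the real benchmark, each of the 1,000 image queries is ranked against the full caption corpus the same way.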
