vidore/colqwen2.5-v0.2
ColQwen is a model built on a novel architecture and training strategy that uses Vision Language Models (VLMs) to efficiently index documents from their visual features.
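Models in the ColPali/ColQwen family are generally described as producing one embedding vector per query token and per page image patch, then ranking pages with a late-interaction (MaxSim) score. The sketch below illustrates that scoring step only. It is a minimal example under stated assumptions, not this model's actual inference code: the random vectors stand in for real embeddings, and `DIM`, `random_vecs`, and `maxsim_score` are hypothetical names chosen for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, page_vecs):
    """For each query vector take its best-matching page vector, then sum."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def random_vecs(rng, n, dim):
    """Hypothetical stand-in for model output: n random unit vectors."""
    return [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]


rng = random.Random(0)
DIM = 16  # illustrative size only, not the model's real embedding dimension

query = random_vecs(rng, 4, DIM)
pages = [
    random_vecs(rng, 20, DIM),           # unrelated page
    random_vecs(rng, 30, DIM),           # unrelated page
    query + random_vecs(rng, 10, DIM),   # page that contains the query vectors
]

scores = [maxsim_score(query, page) for page in pages]
best_page = max(range(len(pages)), key=scores.__getitem__)
print(best_page, [round(s, 3) for s in scores])
```

Because the third page contains exact copies of the query vectors, each query vector finds a similarity of 1 there, so that page ranks first. With real embeddings the same function would be applied to the model's query and page outputs.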
Benchmarks
Vidore3ComputerScienceRetrieval
Visual document retrieval on computer science papers and slides
Performance L4 b1 c16
Corpus 7.6 mpix/s
Corpus p50 1.9s
Query 337 tok/s
Query p50 414.9ms
Vidore3FinanceEnRetrieval
Visual document retrieval on financial reports
Performance L4 b1 c16
Corpus 7.6 mpix/s
Corpus p50 1.9s
Query 315 tok/s
Query p50 413.7ms
Vidore3HrRetrieval
Visual document retrieval on HR-related documents
Performance L4 b1 c16
Corpus 7.8 mpix/s
Corpus p50 1.9s
Query 377 tok/s
Query p50 429.2ms
Vidore3PharmaceuticalsRetrieval
Visual document retrieval on pharmaceutical documents
Performance L4 b1 c16
Corpus 5.4 mpix/s
Corpus p50 1.8s
Query 348 tok/s
Query p50 425.4ms
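The figures above report throughput (mpix/s for the corpus, tok/s for queries) alongside p50 latency. How this benchmark measures them is not stated here, so the following is only a sketch of the usual definitions: p50 as the median of per-request latencies, and throughput as total work divided by wall-clock time. All sample values are made up for illustration and are not measurements of this model; `p50` and `throughput` are hypothetical helper names.

```python
import statistics


def p50(latencies_s):
    """Median per-request latency in seconds."""
    return statistics.median(latencies_s)


def throughput(total_units, wall_time_s):
    """Units of work (megapixels or tokens) processed per second of wall time."""
    return total_units / wall_time_s


# Made-up per-request latencies in seconds, for illustration only.
sample_latencies = [1.7, 1.9, 2.4, 1.8, 2.0]
median_latency = p50(sample_latencies)

# Made-up totals: 30 megapixels of page images encoded in 4 seconds of wall time.
corpus_rate = throughput(30.0, 4.0)

print(f"p50 = {median_latency}s, throughput = {corpus_rate} mpix/s")
```

When requests overlap in time, throughput computed from wall-clock time can be much higher than a single request's latency would suggest, which is why the two numbers are reported separately.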