
vidore/colqwen2.5-v0.2

ColQwen is a model built on a novel architecture and training strategy that leverages Vision Language Models (VLMs) to efficiently index documents from their visual features.

Architecture
Qwen2
Parameters
7.0B
Tasks
Encode
Outputs
Multi-Vec
Dimensions
Multi-Vec: 128
Max Sequence Length
2,048 tokens
License
MIT
Languages
en
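The "Multi-Vec: 128" output above means the model emits one 128-dimensional embedding per query token and per document image patch, scored with late interaction (MaxSim) in the ColBERT/ColPali style: each query-token vector is matched against its most similar document-patch vector, and these maxima are summed. A minimal NumPy sketch of that scoring step, assuming L2-normalized embeddings (the function name and toy data are illustrative, not part of the model's API):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between one query and one document.

    query_vecs: (n_query_tokens, 128) L2-normalized embeddings
    doc_vecs:   (n_doc_patches, 128) L2-normalized embeddings
    """
    # Cosine similarity of every query token against every document patch.
    sims = query_vecs @ doc_vecs.T  # (n_query_tokens, n_doc_patches)
    # Each query token keeps its best-matching patch; sum over query tokens.
    return float(sims.max(axis=1).sum())

# Toy example with random unit vectors (dim 128, matching the card above).
rng = np.random.default_rng(0)
q = rng.normal(size=(5, 128))
q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(40, 128))
d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim_score(q, d))
```

Because each document is a bag of patch vectors rather than a single pooled vector, fine-grained visual details (table cells, figure labels) can match individual query tokens, which is what makes this family of models effective for visual document retrieval.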

Benchmarks

Vidore3ComputerScienceRetrieval

technology retrieval en

Visual document retrieval on computer science papers and slides

Performance (L4, b1, c16)
Corpus throughput: 7.6 Mpix/s
Corpus latency (p50): 1.9 s
Query throughput: 337 tok/s
Query latency (p50): 414.9 ms

Vidore3FinanceEnRetrieval

finance retrieval en

Visual document retrieval on financial reports

Performance (L4, b1, c16)
Corpus throughput: 7.6 Mpix/s
Corpus latency (p50): 1.9 s
Query throughput: 315 tok/s
Query latency (p50): 413.7 ms

Vidore3HrRetrieval

general retrieval en

Visual document retrieval on HR-related documents

Performance (L4, b1, c16)
Corpus throughput: 7.8 Mpix/s
Corpus latency (p50): 1.9 s
Query throughput: 377 tok/s
Query latency (p50): 429.2 ms

Vidore3PharmaceuticalsRetrieval

medical retrieval en

Visual document retrieval on pharmaceutical documents

Performance (L4, b1, c16)
Corpus throughput: 5.4 Mpix/s
Corpus latency (p50): 1.8 s
Query throughput: 348 tok/s
Query latency (p50): 425.4 ms
