Choosing the right model

Cloud-Based Models

LLM Vision is compatible with multiple providers, each offering a different set of models. Some providers run in the cloud, while others are self-hosted. To find the best model for your use case, check the figure below: it visualizes the averaged scores of the available cloud-based models. The higher the score, the more accurate the output.

Gemini 2.0 Flash is priced at just $0.175 per 1M input tokens, yet it outperforms GPT-4o on MMMU with a score of 72.7 (compared to 69.1).

Data is based on the MMMU Leaderboard
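To get a feel for what that pricing means in practice, here is a rough cost sketch based on the $0.175 per 1M input tokens figure quoted above. The token count for an image request is an assumption for illustration only; actual counts depend on image size and the provider's tokenizer.

```python
PRICE_PER_1M_INPUT = 0.175  # USD per 1M input tokens (Gemini 2.0 Flash)

def input_cost(tokens: int) -> float:
    """Return the input cost in USD for a given input token count."""
    return tokens / 1_000_000 * PRICE_PER_1M_INPUT

# A hypothetical camera snapshot plus prompt of ~1,000 input tokens:
print(f"${input_cost(1_000):.6f}")  # → $0.000175
```

Even at thousands of requests per month, input costs at this rate stay well under a dollar, which is why per-token pricing matters less than accuracy for most home automation workloads.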

Self-Hosted Models

Gemma 3 with 12B parameters delivers performance comparable to GPT-4o Mini while remaining efficient enough to fit within 12GB of VRAM.

Data is based on the MMMU Leaderboard
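The 12GB VRAM figure above lines up with a simple back-of-envelope estimate: weight memory is roughly the parameter count times the bytes per parameter. The quantization levels below are assumptions for illustration; real usage also needs headroom for the KV cache and activations, so treat these as lower bounds.

```python
def weight_vram_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# Gemma 3 12B at common quantization levels:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_vram_gb(12, bits):.0f} GB")
# 16-bit weights need ~24 GB; 8-bit quantization brings the
# weights down to ~12 GB, matching the figure cited above.
```

This is why quantized builds are the practical choice for consumer GPUs: the same model that needs a datacenter card at full precision fits on a 12GB card at 8-bit.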
