LLM Vision is compatible with multiple providers, each of which has different models available. Some providers run in the cloud, while others are self-hosted.
To see which model best fits your use case, check the figure below. It visualizes the averaged MMMU scores of the available cloud-based models; the higher the score, the more accurate the output.
gpt-5-mini is the recommended model due to its strong performance-to-price ratio.