AI Models with Vision Support
This page lists large language models that support vision. Compare the models by input context, output limit, and price per 1M tokens to find the best option for projects that depend on robust vision capabilities.
Models with this Capability
Claude 3 Haiku
Anthropic · Claude 3
Input
200K tokens
Output
4.1K tokens
Input Cost
$0.25/1M
Output Cost
$1.25/1M
Gemini 2.5 Flash Preview
Google · Gemini
Input
1M tokens
Output
Not listed
Input Cost
$0.15/1M
Output Cost
$0.60/1M
Gemini 2.0 Flash
Google · Gemini
Input
1M tokens
Output
Not listed
Input Cost
$0.10/1M
Output Cost
$0.40/1M
Gemini 1.5 Flash-8B
Google · Gemini
Input
1M tokens
Output
Not listed
Input Cost
$0.04/1M
Output Cost
$0.15/1M
Claude 3.7 Sonnet
Anthropic · Claude 3
Input
200K tokens
Output
4.1K tokens
Input Cost
$3.00/1M
Output Cost
$15.00/1M
Claude 3.5 Haiku
Anthropic · Claude 3.5
Input
200K tokens
Output
8.2K tokens
Input Cost
$0.80/1M
Output Cost
$4.00/1M
Claude 3.5 Sonnet
Anthropic · Claude 3.5
Input
200K tokens
Output
4.1K tokens
Input Cost
$3.00/1M
Output Cost
$15.00/1M
GPT-4o
OpenAI · GPT-4o
Input
128K tokens
Output
16.4K tokens
Input Cost
$2.50/1M
Output Cost
$10.00/1M
GPT-4o mini Audio
OpenAI · GPT-4o
Input
128K tokens
Output
4.1K tokens
Input Cost
$0.15/1M
Output Cost
$0.60/1M
omni-moderation
OpenAI · omni-moderation
Input
0 tokens
Output
0 tokens
Input Cost
$0.00/1M
Output Cost
$0.00/1M
o3-2025-04-16
OpenAI · o3
Input
200K tokens
Output
100K tokens
Input Cost
$10.00/1M
Output Cost
$40.00/1M
o3
OpenAI · o3
Input
200K tokens
Output
100K tokens
Input Cost
$10.00/1M
Output Cost
$40.00/1M
Gemini 2.5 Pro Preview
Google · Gemini
Input
1M tokens
Output
Not listed
Input Cost
$1.25/1M
Output Cost
$10.00/1M
Claude 3 Opus
Anthropic · Claude 3
Input
200K tokens
Output
4.1K tokens
Input Cost
$15.00/1M
Output Cost
$75.00/1M
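Because every price above is quoted per 1M tokens, comparing models for a real workload takes a small calculation. The following is a minimal sketch in Python, not an official calculator: the `PRICES` table and the `estimate_cost` helper are hypothetical names introduced for illustration, and the prices are copied from three of the cards above. It assumes you already know your token counts. How many tokens an image consumes differs by provider and is not stated on this page, so that count has to come from the provider's own documentation.

```python
# Prices in USD per 1M tokens as (input, output), copied from the cards above.
# PRICES and estimate_cost are illustrative names, not part of any provider SDK.
PRICES = {
    "Claude 3 Haiku": (0.25, 1.25),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "GPT-4o": (2.50, 10.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from per-1M-token prices.

    input_tokens must already include whatever tokens the provider
    charges for attached images (an assumption; this page does not
    say how images are counted).
    """
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Compare the same hypothetical workload across the three models.
    for name in PRICES:
        cost = estimate_cost(name, input_tokens=10_000, output_tokens=1_000)
        print(f"{name}: ${cost:.6f} per request")
```

Multiplying the per-request figure by the expected request volume gives a rough monthly estimate. Treat the result as a lower bound, since listed prices can change and some providers bill extras that are not shown here.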
Similar Capabilities
Multimodal Input
Found in 17 models with Vision
Long Context
Found in 13 models with Vision
Function Calling
Found in 12 models with Vision
Thinking
Found in 3 models with Vision
Audio Understanding
Found in 2 models with Vision
Video Understanding
Found in 2 models with Vision