AI Models with Audio Understanding Support

This page lists large language models that support audio understanding, meaning they accept audio as input. Compare the models, see how each one implements the feature, and find the best option for projects that depend on reliable audio understanding.

Providers

Google

Models with this Capability

Gemini 2.5 Flash Preview

Google · Gemini

ID: google-gemini-2.5-flash-preview

Preview

Input

1M tokens

Output

Not listed

Input Cost

$0.15/1M

Output Cost

$0.60/1M

Exceptional at:

mathematics

Multimodal Input

Long Context

Thinking

Gemini 2.0 Flash

Google · Gemini

ID: google-gemini-2.0-flash-live

Current

Input

1M tokens

Output

Not listed

Input Cost

$0.10/1M

Output Cost

$0.40/1M

Exceptional at:

instruction following

Multimodal Input

Long Context

Vision


Gemini 1.5 Pro

Google · Gemini

ID: google-gemini-1.5-pro

Current

Input

2M tokens

Output

Not listed

Input Cost

$1.25/1M

Output Cost

$5.00/1M

Exceptional at:

long context processing

complex reasoning

Multimodal Input

Long Context

Structured Output
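To compare the three listings on a concrete workload, the sketch below turns an audio clip's length into an estimated token count, checks it against each model's listed input window, and prices the request at the listed per-1M-token rates. It is a minimal sketch under stated assumptions. The rate of 32 tokens per second of audio is an assumption based on Google's Gemini audio documentation and may change. The listed input cost may be the text rate, and audio input can be billed at a different rate, so treat the results as rough comparisons rather than quotes. The names `ModelCard`, `audio_tokens`, and `estimate_cost` are hypothetical helpers for illustration only.

```python
from dataclasses import dataclass

# Assumption: audio is tokenized at about 32 tokens per second of audio.
# Confirm the current figure in the provider's documentation.
AUDIO_TOKENS_PER_SECOND = 32


@dataclass(frozen=True)
class ModelCard:
    """Figures copied from the listing above; prices are USD per 1M tokens."""
    model_id: str
    input_window: int         # listed maximum input tokens
    input_cost_per_m: float   # listed input rate (may not be the audio rate)
    output_cost_per_m: float  # listed output rate


MODELS = [
    ModelCard("google-gemini-2.5-flash-preview", 1_000_000, 0.15, 0.60),
    ModelCard("google-gemini-2.0-flash-live", 1_000_000, 0.10, 0.40),
    ModelCard("google-gemini-1.5-pro", 2_000_000, 1.25, 5.00),
]


def audio_tokens(seconds: float) -> int:
    """Estimate input tokens for an audio clip of the given length."""
    return round(seconds * AUDIO_TOKENS_PER_SECOND)


def estimate_cost(card: ModelCard, input_tokens: int, output_tokens: int) -> float:
    """Return the request cost in USD at the card's listed per-1M-token rates."""
    if input_tokens > card.input_window:
        raise ValueError(f"{input_tokens} tokens exceed {card.model_id}'s input window")
    return (input_tokens * card.input_cost_per_m
            + output_tokens * card.output_cost_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a 10-minute recording summarized in about 500 tokens.
    tokens_in = audio_tokens(10 * 60)
    # Print models from cheapest to most expensive for this workload.
    for card in sorted(MODELS, key=lambda c: estimate_cost(c, tokens_in, 500)):
        cost = estimate_cost(card, tokens_in, 500)
        print(f"{card.model_id}: {tokens_in} input tokens, ${cost:.4f}")
```

With these assumptions, a 10-minute clip comes to about 19,200 input tokens, which fits easily in every listed window. In this sketch the cost difference between models comes from the per-token rates, not from the context limits.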
