GPT-4o Audio - In-Depth Overview
OpenAI · GPT-4o
preview
Latest in family
Model ID: gpt-4o-audio-preview
Get all the details on GPT-4o Audio, an AI model from OpenAI. This page covers its token limits, pricing, key capabilities (multimodal input, function calling, and streaming), sample API code, and performance strengths.
Key Metrics
Input Limit
128K tokens
Output Limit
16.4K tokens
Input Cost
$2.50 / 1M tokens
Output Cost
$10.00 / 1M tokens
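As a rough illustration of how the listed rates translate into a per-request cost, the sketch below applies $2.50 and $10.00 per 1M tokens to hypothetical token counts. The helper name and the example counts are assumptions made for illustration, and the sketch bills every token at the listed rates because this page does not break out separate pricing for audio tokens.

```python
# Rates copied from the Key Metrics on this page (USD per 1M tokens).
INPUT_COST_PER_1M = 2.50
OUTPUT_COST_PER_1M = 10.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Hypothetical helper: it assumes every token is billed at the
    rates listed on this page.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * INPUT_COST_PER_1M / 1_000_000
    output_cost = output_tokens * OUTPUT_COST_PER_1M / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 10,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost_usd(10_000, 2_000):.4f}")
```

Actual billing may differ, so treat the result as an estimate rather than a quote.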
Sample API Code
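from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The audio preview model expects audio in the input or the output,
# so this request asks for a spoken reply along with its transcript.
response = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"},
    ],
)

# With audio output, the reply text is returned as the audio transcript.
print(response.choices[0].message.audio.transcript)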
Required Libraries
openai
Notes
This is a preview release of the GPT-4o Audio models. These models accept audio input, can generate audio output, and are available through the Chat Completions REST API. Structured outputs, fine-tuning, distillation, and predicted outputs are not supported.
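To make the audio-input path concrete, here is a minimal sketch of packaging a local audio clip for a Chat Completions request. The helper names `build_audio_message` and `ask_about_audio` are hypothetical, and the sketch assumes the clip is sent as a base64 string inside an `input_audio` content part alongside a text prompt.

```python
import base64
from typing import Optional


def build_audio_message(audio_bytes: bytes, prompt: str, audio_format: str = "wav") -> dict:
    """Build a user message that pairs a text prompt with an audio clip.

    Hypothetical helper: the "input_audio" content part carries the
    clip as a base64 string plus its format.
    """
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "input_audio",
                "input_audio": {"data": encoded, "format": audio_format},
            },
        ],
    }


def ask_about_audio(path: str, prompt: str) -> Optional[str]:
    """Send a local WAV file to the model and return its text reply.

    Needs the openai package and an OPENAI_API_KEY; it is defined
    here but not called.
    """
    from openai import OpenAI  # imported lazily so the builder works offline

    with open(path, "rb") as f:
        message = build_audio_message(f.read(), prompt)
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-audio-preview",
        modalities=["text"],
        messages=[message],
    )
    return response.choices[0].message.content
```

Keeping the message builder separate from the network call makes the payload easy to inspect before anything is sent.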
Capabilities
Supported Data Types
Input Types
text
audio
Output Types
text
audio
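For the audio output type, a short sketch of saving a spoken reply to disk may help. It assumes the response carries the audio as a base64 string (for example in `response.choices[0].message.audio.data` when the request asks for audio output); `save_audio_reply` is a hypothetical helper name, and the demo uses placeholder bytes rather than a real API response.

```python
import base64
import tempfile
from pathlib import Path


def save_audio_reply(b64_audio: str, path: str) -> int:
    """Decode base64 audio data, write it to `path`, and return the byte count."""
    audio_bytes = base64.b64decode(b64_audio)
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)


if __name__ == "__main__":
    # Round-trip demo with placeholder bytes instead of a real API response.
    fake_payload = base64.b64encode(b"placeholder audio bytes").decode("ascii")
    out_path = Path(tempfile.gettempdir()) / "reply_demo.wav"
    written = save_audio_reply(fake_payload, str(out_path))
    print(f"{written} bytes written to {out_path}")
```

With a real response, the call would look like `save_audio_reply(response.choices[0].message.audio.data, "reply.wav")`, with the file extension matching the audio format requested.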
Strengths & Weaknesses
Exceptional at
audio input processing
audio output generation
Good at
text input processing
text output generation
function calling
streaming
Poor at
structured outputs (not supported)
fine-tuning (not supported)
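Because function calling is listed as a strength while structured outputs are unavailable, tool-call arguments are worth validating by hand. The sketch below shows one way to do that; the `get_weather` tool and the `parse_tool_arguments` helper are hypothetical names chosen for illustration.

```python
import json

# Hypothetical tool definition in the Chat Completions "tools" format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def parse_tool_arguments(raw_arguments: str, required: list) -> dict:
    """Parse a tool call's JSON arguments and check that required keys exist.

    Without structured outputs, the arguments string is not guaranteed
    to match the schema, so it is validated before use.
    """
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return args
```

In a request, the definition would be passed as `tools=[WEATHER_TOOL]`, and the string to validate would come from `message.tool_calls[0].function.arguments`.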
Additional Information
Latest Update
Dec 17, 2024
Knowledge Cutoff
Oct 1, 2023