Back to all models
Get all the details on GPT-4o mini Audio, an AI model from OpenAI. This page covers its token limits, pricing structure, key capabilities such as multimodal_input, audio_to_text, text_to_audio, available API code samples, and performance strengths.
Key Metrics
Input Limit
128K tokens
Output Limit
4.1K tokens
Input Cost
$0.15/1M
Output Cost
$0.60/1M
Sample API Code
from openai import OpenAI
client = OpenAI()
# Example for speech to text (transcription)
audio_file= open("/path/to/audio.mp3", "rb")
transcription = client.audio.transcriptions.create(
model="gpt-4o-mini-audio-preview-2024-12-17",
file=audio_file
)
print(transcription.text)
# Example for text to speech
response = client.audio.speech.create(
model="gpt-4o-mini-audio-preview-2024-12-17",
voice="alloy",
input="Hello, this is a test of the text-to-speech model."
)
response.stream_to_file("output.mp3")
Required Libraries
openai
openai
Notes
A smaller, cost-optimized model capable of processing audio inputs and generating audio outputs. It is a specialized variant of the GPT-4o mini family, inheriting its multimodal capabilities.
Capabilities
multimodal input
audio to text
text to audio
cost optimized
vision
function calling
json mode
Supported Data Types
Input Types
text
audio
image
Output Types
text
audio
json
Strengths & Weaknesses
Exceptional at
audio to text transcription
text to audio synthesis
cost efficiency for audio tasks
Good at
multimodal understanding
Additional Information
Latest Update
Dec 17, 2024
Knowledge Cutoff
Oct 1, 2023