GPT-4o mini Audio - In-Depth Overview

OpenAI · GPT-4o

preview

Model ID: gpt-4o-mini-audio-preview-2024-12-17

Get all the details on GPT-4o mini Audio, an AI model from OpenAI. This page covers its token limits, pricing structure, key capabilities such as multimodal_input, audio_to_text, text_to_audio, available API code samples, and performance strengths.

Key Metrics

Input Limit

128K tokens

Output Limit

4.1K tokens

Input Cost

$0.15/1M

Output Cost

$0.60/1M

Sample API Code

from openai import OpenAI
client = OpenAI()
# Example for speech to text (transcription)
audio_file= open("/path/to/audio.mp3", "rb")
transcription = client.audio.transcriptions.create(
  model="gpt-4o-mini-audio-preview-2024-12-17",
  file=audio_file
)
print(transcription.text)
# Example for text to speech
response = client.audio.speech.create(
    model="gpt-4o-mini-audio-preview-2024-12-17",
    voice="alloy",
    input="Hello, this is a test of the text-to-speech model."
)
response.stream_to_file("output.mp3")

Required Libraries

openai

Notes

A smaller, cost-optimized model capable of processing audio inputs and generating audio outputs. It is a specialized variant of the GPT-4o mini family, inheriting its multimodal capabilities.