GPT-4o mini Audio - In-Depth Overview

OpenAI · GPT-4o

preview

Model ID: gpt-4o-mini-audio-preview

Get all the details on GPT-4o mini Audio, an AI model from OpenAI. This page covers its token limits, pricing structure, key capabilities such as streaming, multimodal_input, function_calling, available API code samples, and performance strengths.

Key Metrics

Input Limit

128K tokens

Output Limit

16.4K tokens

Input Cost

$0.15/1M

Output Cost

$0.60/1M

Sample API Code

from openai import OpenAI
client = OpenAI()

# Example: Text-to-Speech
speech_file_path = "speech.mp3"
response = client.audio.speech.create(
    model="gpt-4o-mini-audio-preview",
    voice="alloy",
    input="Hello, this is a test of the GPT-4o mini audio preview model."
)
response.stream_to_file(speech_file_path)

# Example: Speech-to-Text (Transcription)
audio_file= open("audio.mp3", "rb")
transcript = client.audio.transcriptions.create(
    model="gpt-4o-mini-audio-preview",
    file=audio_file
)
print(transcript.text)

Required Libraries

openai

Notes

A smaller model capable of audio inputs and outputs, designed to input audio or create audio outputs via the REST API. It is a preview release.

Capabilities

Streaming

Multimodal Input

Function Calling

Supported Data Types

Input Types

text

audio

Output Types

text

audio

Strengths & Weaknesses

Exceptional at

audio input processing

audio output generation

Good at

realtime conversations

transcription

speech synthesis

Poor at

structured outputs

fine tuning

Additional Information

Latest Update

Dec 17, 2024

Knowledge Cutoff

Oct 1, 2023

Similar Models

Gemini 2.5 Flash Preview

Google

preview

Gemini 2.0 Flash

Google

Current

Gemini 2.0 Flash

Google

Current

Similar Capabilities

Long Context

34 models

Thinking

4 models

Vision

32 models