Logo
Back to all models

GPT-4o mini Audio - In-Depth Overview

OpenAI · GPT-4o

preview

Get all the details on GPT-4o mini Audio, an AI model from OpenAI. This page covers its token limits, pricing structure, key capabilities such as multimodal_input, audio_to_text, text_to_audio, available API code samples, and performance strengths.

Key Metrics

Input Limit

128K tokens

Output Limit

4.1K tokens

Input Cost

$0.15/1M

Output Cost

$0.60/1M

Sample API Code

from openai import OpenAI
client = OpenAI()
# Example for speech to text (transcription)
audio_file= open("/path/to/audio.mp3", "rb")
transcription = client.audio.transcriptions.create(
  model="gpt-4o-mini-audio-preview-2024-12-17",
  file=audio_file
)
print(transcription.text)
# Example for text to speech
response = client.audio.speech.create(
    model="gpt-4o-mini-audio-preview-2024-12-17",
    voice="alloy",
    input="Hello, this is a test of the text-to-speech model."
)
response.stream_to_file("output.mp3")

Required Libraries

openai
openai

Notes

A smaller, cost-optimized model capable of processing audio inputs and generating audio outputs. It is a specialized variant of the GPT-4o mini family, inheriting its multimodal capabilities.

Capabilities

multimodal input
audio to text
text to audio
cost optimized
vision
function calling
json mode

Supported Data Types

Input Types

text
audio
image

Output Types

text
audio
json

Strengths & Weaknesses

Exceptional at

audio to text transcription
text to audio synthesis
cost efficiency for audio tasks

Good at

multimodal understanding

Additional Information

Latest Update

Dec 17, 2024

Knowledge Cutoff

Oct 1, 2023