Key Metrics
Input Limit
128K tokens
Output Limit
4.1K tokens
Input Cost
$5.00/1M
Output Cost
$20.00/1M
Sample API Code
Required Libraries
Notes
This is a preview release of the GPT-4o Realtime model, the default and current version, capable of responding to audio and text inputs in realtime over WebRTC or a WebSocket interface. It supports function calling. Snapshots are available for version locking.
Capabilities
realtime text processing
realtime audio processing
function calling
WebRTC interface support
WebSocket interface support
Supported Data Types
Input Types
text
audio
Output Types
text
audio
Strengths & Weaknesses
Exceptional at
realtime multimodal interaction (text and audio)
complex understanding
Good at
function calling
fast response times
Poor at
structured outputs (not supported)
fine-tuning (not supported)
distillation (not supported)
predicted outputs (not supported)
image input/output (not supported)
Additional Information
Latest Update
Dec 17, 2024
Knowledge Cutoff
Oct 01, 2023