I am trying to use the Agents SDK (Python version) to create a realtime session in transcription mode, but I am running into issues.
So far I have the following code, based on the example in the GitHub repo (openai-agents-python/blob/main/docs/realtime/quickstart.md).
1. Create an ephemeral key.
"session": {
    "type": "transcription",
    "audio": {
        "input": {
            "noise_reduction": {
                "type": "far_field",
            },
            "transcription": {
                "language": "en",
                "model": "gpt-4o-mini-transcribe",
                "prompt": TRANSCRIBE_SYSTEM_PROMPT,
            }
        }
    },
    "include": [
        "item.input_audio_transcription.logprobs",
    ],
}
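For completeness, the request that mints the ephemeral key from the config above can be sketched with only the standard library. This is a minimal sketch under assumptions: it assumes the GA endpoint `POST https://api.openai.com/v1/realtime/client_secrets` and that the key comes back in a top-level `value` field. `build_client_secret_payload` and `create_ephemeral_key` are hypothetical helper names, and `TRANSCRIBE_SYSTEM_PROMPT` is only a placeholder string here.

```python
import json
import urllib.request

# Placeholder; the real TRANSCRIBE_SYSTEM_PROMPT is defined elsewhere in my code.
TRANSCRIBE_SYSTEM_PROMPT = "Expect a conversation between a human and customer service agent."

# Assumed GA endpoint for minting ephemeral client secrets.
CLIENT_SECRETS_URL = "https://api.openai.com/v1/realtime/client_secrets"


def build_client_secret_payload(prompt: str) -> dict:
    """Hypothetical helper: request body for a transcription-mode session."""
    return {
        "session": {
            "type": "transcription",
            "audio": {
                "input": {
                    "noise_reduction": {"type": "far_field"},
                    "transcription": {
                        "language": "en",
                        "model": "gpt-4o-mini-transcribe",
                        "prompt": prompt,
                    },
                }
            },
            "include": ["item.input_audio_transcription.logprobs"],
        }
    }


def create_ephemeral_key(api_key: str, prompt: str) -> str:
    """Hypothetical helper: POST the payload with the standard key and return the ephemeral key."""
    body = json.dumps(build_client_secret_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        CLIENT_SECRETS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # Assumption: the ephemeral key is returned in the top-level "value" field.
    return data["value"]
```

Usage would be something like `ephemeral_key = create_ephemeral_key(os.environ["OPENAI_API_KEY"], TRANSCRIBE_SYSTEM_PROMPT)`, and that value is what gets passed as `api_key` when the session is started.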
2. Create an Agent and Runner.
from agents.realtime import RealtimeAgent, RealtimeRunner

agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep responses brief and conversational.",
)
runner = RealtimeRunner(starting_agent=agent)
Note:
I have deliberately not passed a `config` to the runner, because doing so gives me errors about using the wrong model in transcription mode.
3. Start the session.
async def main(ephemeral_key: str) -> None:
    session = await runner.run(
        model_config={"api_key": ephemeral_key, "url": "wss://api.openai.com/v1/realtime?intent=transcription"}
    )
    async with session:
        print("Session started! The agent will stream audio responses in real-time.")
        # Process events
        async for event in session:
            ...
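Inside the `async for` loop, the raw server payloads (the `data` dicts printed in the log at the end of this post) can be dispatched on their `type` field. This is a minimal sketch, assuming the transcription events are named `conversation.item.input_audio_transcription.delta` / `.completed` and carry `delta` / `transcript` fields; `handle_server_event` is a hypothetical helper that works on plain dicts so it does not depend on the SDK's event classes.

```python
from typing import Any, Optional


def handle_server_event(data: dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: turn a raw server event dict into a log line, or None to ignore it."""
    event_type = data.get("type")
    if event_type == "error":
        error = data.get("error", {})
        return f"ERROR [{error.get('code')}]: {error.get('message')}"
    # Assumed event names for incremental and final transcripts.
    if event_type == "conversation.item.input_audio_transcription.delta":
        return f"partial: {data.get('delta', '')}"
    if event_type == "conversation.item.input_audio_transcription.completed":
        return f"final: {data.get('transcript', '')}"
    return None
```

Judging by the log output, reaching the dict would look roughly like checking `event.type == "raw_model_event"` and then reading `event.data.data` when `event.data.type == "raw_server_event"`, but the exact SDK attribute names here are an assumption.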
Important:
I made sure to set `api_key` to the `ephemeral_key` I generated in step 1.
I have also passed the `url` parameter so that the connection is opened with `?intent=transcription` in the URL.
Issue:
After running this code, the server returns error events. I do not have any code that sends a `session.update` event to the session, so I believe the SDK is sending it automatically.
Session started! The agent will stream audio responses in real-time.
Raw model event: RealtimeModelRawServerEvent(data={'type': 'session.created', 'event_id': 'event_CGmN76McuLtMzxsjjXghR', 'session': {'type': 'transcription', 'object': 'realtime.transcription_session', 'id': 'sess_CGm...', 'expires_at': 1758118825, 'audio': {'input': {'format': {'type': 'audio/pcm', 'rate': 24000}, 'transcription': {'model': 'gpt-4o-mini-transcribe', 'language': 'en', 'prompt': 'Expect a conversation between a human and customer service agent. Expect technology terms to be used.'}, 'noise_reduction': {'type': 'far_field'}, 'turn_detection': {'type': 'server_vad', 'threshold': 0.7, 'prefix_padding_ms': 300, 'silence_duration_ms': 500}}}, 'include': ['item.input_audio_transcription.logprobs']}}, type='raw_server_event')
Raw model event: RealtimeModelRawServerEvent(data={'type': 'error', 'event_id': 'event_CGmN76FDdk3iXyhToLLZu', 'error': {'type': 'invalid_request_error', 'code': 'invalid_parameter', 'message': 'Passing a realtime session update event to a transcription session is not allowed.', 'param': '', 'event_id': None}}, type='raw_server_event')
Raw model event: RealtimeModelErrorEvent(error=RealtimeError(message='Passing a realtime session update event to a transcription session is not allowed.', type='invalid_request_error', code='invalid_parameter', event_id=None, param=''), type='error')
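The log shows that `session.created` arrives with the transcription config from the ephemeral key intact, and the error only appears afterwards, which fits the SDK sending its own realtime-type update. For comparison, this is roughly what a transcription-compatible `session.update` would look like if sent manually (for example over a raw websocket instead of through the SDK). The shape is an assumption based on the GA transcription session schema, and `build_transcription_session_update` is a hypothetical helper; whether the SDK can be made to send this instead is the open question.

```python
import json
from typing import Any


def build_transcription_session_update(audio_input: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: wrap input-audio settings in a transcription-type session.update."""
    return {
        "type": "session.update",
        "session": {
            # Presumably a "realtime" session type here is what triggers the error in the log above.
            "type": "transcription",
            "audio": {"input": audio_input},
        },
    }


update = build_transcription_session_update(
    {
        "transcription": {"model": "gpt-4o-mini-transcribe", "language": "en"},
        "turn_detection": {"type": "server_vad", "silence_duration_ms": 500},
    }
)
print(json.dumps(update, indent=2))
```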