How can Realtime Transcription mode be used with Agents SDK (Python)?

I am trying to use the Agents SDK (Python version) to create a realtime session in transcription mode, but I am running into issues.

So far I have the following code, based on the example code on GitHub at openai-agents-python/blob/main/docs/realtime/quickstart.md.

1. Create an ephemeral key with the following session configuration:

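```python
# Prompt text as echoed back by the server in the session.created event below
TRANSCRIBE_SYSTEM_PROMPT = (
    "Expect a conversation between a human and customer service agent. "
    "Expect technology terms to be used."
)

payload = {
    "session": {
        "type": "transcription",
        "audio": {
            "input": {
                "noise_reduction": {
                    "type": "far_field",
                },
                "transcription": {
                    "language": "en",
                    "model": "gpt-4o-mini-transcribe",
                    "prompt": TRANSCRIBE_SYSTEM_PROMPT,
                },
            }
        },
        "include": [
            "item.input_audio_transcription.logprobs",
        ],
    }
}
```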
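Step 1 has to run server-side, since it needs the real API key. Below is a minimal standard-library sketch of minting the key with the session configuration above. It assumes the GA Realtime endpoint `POST https://api.openai.com/v1/realtime/client_secrets` and that the key is returned in the response's `value` field; verify both against the current API reference. The helper names `build_client_secret_request` and `create_ephemeral_key` are hypothetical.

```python
import json
import urllib.request

# Assumption: GA Realtime endpoint for minting ephemeral client secrets.
CLIENT_SECRETS_URL = "https://api.openai.com/v1/realtime/client_secrets"

TRANSCRIBE_SYSTEM_PROMPT = (
    "Expect a conversation between a human and customer service agent. "
    "Expect technology terms to be used."
)


def build_client_secret_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that mints an ephemeral key."""
    body = {
        "session": {
            "type": "transcription",
            "audio": {
                "input": {
                    "noise_reduction": {"type": "far_field"},
                    "transcription": {
                        "language": "en",
                        "model": "gpt-4o-mini-transcribe",
                        "prompt": prompt,
                    },
                }
            },
            "include": ["item.input_audio_transcription.logprobs"],
        }
    }
    return urllib.request.Request(
        CLIENT_SECRETS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_ephemeral_key(api_key: str, prompt: str) -> str:
    """Send the request; assumes the response JSON carries the key in "value"."""
    req = build_client_secret_request(api_key, prompt)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["value"]
```

For example, `ephemeral_key = create_ephemeral_key(os.environ["OPENAI_API_KEY"], TRANSCRIBE_SYSTEM_PROMPT)`. Building the request separately from sending it makes the exact payload easy to log when debugging session-type errors.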

2. Create an Agent and Runner.

```python
from agents.realtime import RealtimeAgent, RealtimeRunner

agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep responses brief and conversational.",
)
runner = RealtimeRunner(starting_agent=agent)
```

! Note:
I have deliberately not passed a config to the runner, because doing so gives me errors about using the wrong model in transcription mode.

3. Start the session.

```python
async def main(ephemeral_key: str) -> None:
    session = await runner.run(model_config={
        "api_key": ephemeral_key,
        "url": "wss://api.openai.com/v1/realtime?intent=transcription",
    })

    async with session:
        print("Session started! The agent will stream audio responses in real-time.")
        # Process events
        async for event in session:
            ...
```
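For a transcription session, the useful part of the `async for event in session:` loop is the transcription events. The log below shows raw server events reaching the loop as `RealtimeModelRawServerEvent(data={...})`, so one option is to feed each `data` dict to a small collector. This sketch assumes the server emits `conversation.item.input_audio_transcription.delta` and `conversation.item.input_audio_transcription.completed` events carrying `item_id`, `delta` and `transcript` fields; `TranscriptCollector` is a hypothetical helper, not part of the Agents SDK.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Assumed Realtime API event types for transcription results.
DELTA = "conversation.item.input_audio_transcription.delta"
COMPLETED = "conversation.item.input_audio_transcription.completed"


class TranscriptCollector:
    """Hypothetical helper: accumulates transcripts keyed by conversation item id."""

    def __init__(self) -> None:
        self.partial: Dict[str, str] = defaultdict(str)  # in-progress text per item
        self.final: Dict[str, str] = {}                  # completed transcripts per item
        self.errors: List[str] = []                      # messages from "error" events

    def handle(self, data: dict) -> Optional[str]:
        """Process one raw server event dict; return the transcript when an item completes."""
        event_type = data.get("type")
        if event_type == DELTA:
            self.partial[data["item_id"]] += data.get("delta", "")
        elif event_type == COMPLETED:
            item_id = data["item_id"]
            self.partial.pop(item_id, None)  # discard partial text once the final arrives
            self.final[item_id] = data.get("transcript", "")
            return self.final[item_id]
        elif event_type == "error":
            # Error events carry {"error": {"message": ...}}, as in the log below
            self.errors.append(data.get("error", {}).get("message", ""))
        return None
```

Inside the loop you would pass the `data` dict of each raw server event to `collector.handle(...)`; the exact attribute path depends on how the SDK wraps the event, so check it against what your own log prints.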

! Important
I made sure to set api_key to the ephemeral_key that I generated in step 1.
I also set the url parameter so that the connection is opened with ?intent=transcription in the URL.
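To double-check that the `url` passed in `model_config` carries `intent=transcription` (and, per the reply further down the thread, no `model` query parameter), a small standard-library check can run before `runner.run(...)`. The helper names are hypothetical, and the "no `model`" rule comes from that reply rather than from official documentation.

```python
from urllib.parse import parse_qs, urlencode, urlparse

REALTIME_BASE = "wss://api.openai.com/v1/realtime"


def transcription_url(base: str = REALTIME_BASE) -> str:
    """Build the Realtime WebSocket URL with only the transcription intent."""
    return f"{base}?{urlencode({'intent': 'transcription'})}"


def check_transcription_url(url: str) -> None:
    """Raise ValueError if the URL lacks intent=transcription or includes a model."""
    params = parse_qs(urlparse(url).query)
    if params.get("intent") != ["transcription"]:
        raise ValueError(f"expected intent=transcription in {url!r}")
    if "model" in params:
        raise ValueError(f"unexpected model parameter in {url!r}")
```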

Issue:

After running this code, the server returns error events. I do not have any code that sends a ‘session update event’ to the session, so I believe the SDK is sending one automatically.

```
Session started! The agent will stream audio responses in real-time.
Raw model event: RealtimeModelRawServerEvent(data={'type': 'session.created', 'event_id': 'event_CGmN76McuLtMzxsjjXghR', 'session': {'type': 'transcription', 'object': 'realtime.transcription_session', 'id': 'sess_CGm...', 'expires_at': 1758118825, 'audio': {'input': {'format': {'type': 'audio/pcm', 'rate': 24000}, 'transcription': {'model': 'gpt-4o-mini-transcribe', 'language': 'en', 'prompt': 'Expect a conversation between a human and customer service agent. Expect technology terms to be used.'}, 'noise_reduction': {'type': 'far_field'}, 'turn_detection': {'type': 'server_vad', 'threshold': 0.7, 'prefix_padding_ms': 300, 'silence_duration_ms': 500}}}, 'include': ['item.input_audio_transcription.logprobs']}}, type='raw_server_event')
Raw model event: RealtimeModelRawServerEvent(data={'type': 'error', 'event_id': 'event_CGmN76FDdk3iXyhToLLZu', 'error': {'type': 'invalid_request_error', 'code': 'invalid_parameter', 'message': 'Passing a realtime session update event to a transcription session is not allowed.', 'param': '', 'event_id': None}}, type='raw_server_event')
Raw model event: RealtimeModelErrorEvent(error=RealtimeError(message='Passing a realtime session update event to a transcription session is not allowed.', type='invalid_request_error', code='invalid_parameter', event_id=None, param=''), type='error')
```

Hey, I’m not sure how to do it with the Agents SDK. However, when I was testing with WebSockets directly, I got the same error. I had to change the WebSocket URL to wss://api.openai.com/v1/realtime?intent=transcription (note that I did not include the model in the query parameters).

And this was my session payload. Hope this helps:

```python
{
    "type": "transcription_session.update",
    "event_id": generate_short_random_id("msg"),  # my own helper for unique event ids
    "session": {
        "input_audio_format": "pcm16",
        "input_audio_transcription": {
            "language": "en",
            "model": "gpt-4o-mini-transcribe",
            "prompt": "",
        },
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 500,
        },
        "input_audio_noise_reduction": {"type": "near_field"},
        "include": ["item.input_audio_transcription.logprobs"],
    },
}
```
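To try the reply's raw-WebSocket approach outside the Agents SDK, the payload can be built and sent over any already-connected WebSocket. This sketch assumes a connection object that exposes an async `send(str)` method. `generate_short_random_id` here is a hypothetical stand-in for the reply's own helper, whose implementation wasn't shown, and the beta-style `transcription_session.update` shape is copied from the reply rather than confirmed against the current API.

```python
import asyncio
import json
import secrets

# URL from the reply: intent=transcription, no model query parameter.
TRANSCRIPTION_WS_URL = "wss://api.openai.com/v1/realtime?intent=transcription"


def generate_short_random_id(prefix: str) -> str:
    """Hypothetical stand-in for the reply's helper: prefix plus 16 random hex chars."""
    return f"{prefix}_{secrets.token_hex(8)}"


def build_transcription_session_update(prompt: str = "") -> dict:
    """Rebuild the reply's transcription_session.update payload."""
    return {
        "type": "transcription_session.update",
        "event_id": generate_short_random_id("msg"),
        "session": {
            "input_audio_format": "pcm16",
            "input_audio_transcription": {
                "language": "en",
                "model": "gpt-4o-mini-transcribe",
                "prompt": prompt,
            },
            "turn_detection": {
                "type": "server_vad",
                "threshold": 0.5,
                "prefix_padding_ms": 300,
                "silence_duration_ms": 500,
            },
            "input_audio_noise_reduction": {"type": "near_field"},
            "include": ["item.input_audio_transcription.logprobs"],
        },
    }


async def send_session_update(ws, prompt: str = "") -> dict:
    """Serialize the update and send it over a connected WebSocket-like object."""
    event = build_transcription_session_update(prompt)
    await ws.send(json.dumps(event))
    return event


if __name__ == "__main__":
    # Dry run: print the JSON that would be sent instead of opening a connection.
    class _PrintWS:
        async def send(self, message: str) -> None:
            print(message)

    asyncio.run(send_session_update(_PrintWS()))
```

The real connection would be opened against `TRANSCRIPTION_WS_URL` and authenticated with the ephemeral key (for example via an `Authorization: Bearer` header); how headers are passed depends on the WebSocket library you use.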

Hi, thanks for the reply. Unfortunately, this is the URL I have already been using.