I have a Python implementation that works pretty well, but I cannot interrupt the assistant, even when I send response.cancel while the output audio is streaming.
Here is my test scenario:
- I am using client-side VAD (I also tested server VAD and saw the same problem). It detects speech, and if I start speaking while the AI is outputting audio, a “response.cancel” event is sent to the OpenAI service.
- At the start, I tell the AI to count to 20.
- The AI starts counting, and audio playback is smooth.
- I say “please stop counting”, and a “response.cancel” event is sent to the service.
Problem:
- The AI does not cancel the response; it always completes it (i.e. it counts all the way to the end).
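To make the scenario concrete, here is a minimal sketch of the client events involved in the interruption step. It is not my exact code: the helper name `build_interrupt_events` is hypothetical, and the `audio_end_ms` value of 1500 is a made-up placeholder for how much of the item's audio had actually been played. The event order follows the log below (clear input buffer, cancel, truncate).

```python
import json


def build_interrupt_events(item_id, content_index, audio_end_ms):
    """Return the client events sent when the user starts speaking over the assistant.

    Hypothetical helper: it only builds the JSON payloads, it does not send them.
    """
    return [
        # Drop any microphone audio buffered so far.
        {"type": "input_audio_buffer.clear"},
        # Ask the service to stop generating the in-progress response.
        {"type": "response.cancel"},
        # Tell the service how much of the assistant audio was actually heard.
        {
            "type": "conversation.item.truncate",
            "item_id": item_id,
            "content_index": content_index,
            "audio_end_ms": audio_end_ms,
        },
    ]


# Example with the item ID from the trace; 1500 ms is an assumed value.
for event in build_interrupt_events("item_AHaqWiaRzvSTacllhd5u8", 0, 1500):
    print(json.dumps(event))
```

Each payload would then be serialized and written to the WebSocket in that order.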
Here is part of the trace for more details:
INFO:main:Session updated: {‘id’: ‘sess_AHaqRB3pQQwCdnY5XCHKZ’, ‘object’: ‘realtime.session’, ‘model’: ‘gpt-4o-realtime-preview’, ‘expires_at’: 1728757471, ‘modalities’: [‘text’, ‘audio’], ‘instructions’: ‘You are a helpful assistant. Respond concisely. If user asks to tell story, tell story very shortly.’, ‘voice’: ‘alloy’, ‘turn_detection’: None, ‘input_audio_format’: ‘pcm16’, ‘output_audio_format’: ‘pcm16’, ‘input_audio_transcription’: {‘model’: ‘whisper-1’}, ‘tool_choice’: ‘auto’, ‘temperature’: 0.8, ‘max_response_output_tokens’: ‘inf’, ‘tools’: [{‘name’: ‘get_weather’, ‘description’: ‘Get the current weather for a location.’, ‘parameters’: {‘type’: ‘object’, ‘properties’: {‘location’: {‘type’: ‘string’}}, ‘required’: [‘location’]}, ‘type’: ‘function’}]}
INFO:vad:Speech started
INFO:main:Speech has started.
INFO:main:Sending audio data to the client.
… I said “Please count to 20”
INFO:main:New Conversation Item: {‘id’: ‘item_AHaqWiaRzvSTacllhd5u8’, ‘object’: ‘realtime.item’, ‘type’: ‘message’, ‘status’: ‘in_progress’, ‘role’: ‘assistant’, ‘content’: []}
INFO:main:New Part Added: {‘type’: ‘audio’, ‘transcript’: ‘’}
INFO:main:Transcript Delta: Sure
INFO:main:Transcript Delta: ,
INFO:main:Transcript Delta: here
INFO:main:Transcript Delta: you
INFO:main:Transcript Delta: go
INFO:main:Transcript Delta: :
INFO:main:Received audio delta for Response ID resp_AHaqWXORK0F9uAdKQAd9i, Item ID item_AHaqWiaRzvSTacllhd5u8, Content Index 0
INFO:main:Received audio delta for Response ID resp_AHaqWXORK0F9uAdKQAd9i, Item ID item_AHaqWiaRzvSTacllhd5u8, Content Index 0
INFO:main:Transcript Delta: One
INFO:main:Received audio delta for Response ID resp_AHaqWXORK0F9uAdKQAd9i, Item ID item_AHaqWiaRzvSTacllhd5u8, Content Index 0
INFO:main:Transcript Delta: ,
INFO:main:Received audio delta for Response ID resp_AHaqWXORK0F9uAdKQAd9i, Item ID item_AHaqWiaRzvSTacllhd5u8, Content Index 0
INFO:main:Received audio delta for Response ID resp_AHaqWXORK0F9uAdKQAd9i, Item ID item_AHaqWiaRzvSTacllhd5u8, Content Index 0
INFO:main:Transcript Delta: two
… I try to stop the assistant by saying “Please stop”
INFO:vad:Speech started
INFO:main:Speech has started.
INFO:main:User started speaking while audio is playing.
INFO:main:Clearing input audio buffer.
INFO:main:Cancelling response.
INFO:main:Truncate the current audio, current item ID: item_AHaqWiaRzvSTacllhd5u8, current audio content index: 0
CRITICAL:openai_realtime_common.web_socket_manager:Sending message: {“type”: “response.cancel”}
INFO:main:Sending audio data to the client.
INFO:main:Sending audio data to the client.
INFO:main:Received audio delta for Response ID resp_AHaqWXORK0F9uAdKQAd9i, Item ID item_AHaqWiaRzvSTacllhd5u8, Content Index 0
INFO:main:Sending audio data to the client.
INFO:main:Sending audio data to the client.
…
INFO:main:Transcript Delta: eight
…
… I have said “Please stop!”
INFO:vad:Speech ended
INFO:main:Speech has ended
INFO:main:Requesting the client to generate a response.
…
INFO:main:Received audio delta for Response ID resp_AHaqWXORK0F9uAdKQAd9i, Item ID item_AHaqWiaRzvSTacllhd5u8, Content Index 0
INFO:main:Received audio delta for Response ID resp_AHaqWXORK0F9uAdKQAd9i, Item ID item_AHaqWiaRzvSTacllhd5u8, Content Index 0
INFO:main:Transcript Delta: eleven
… Assistant just continues
INFO:main:Audio done for response ID resp_AHaqWXORK0F9uAdKQAd9i, item ID item_AHaqWiaRzvSTacllhd5u8
INFO:main:Audio transcript done: ‘Sure, here you go: One, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty.’ for response ID resp_AHaqWXORK0F9uAdKQAd9i
INFO:main:Content part done: ‘’ of type ‘audio’ for response ID resp_AHaqWXORK0F9uAdKQAd9i
INFO:main:Output item done for response ID resp_AHaqWXORK0F9uAdKQAd9i with content: [{‘type’: ‘audio’, ‘transcript’: ‘Sure, here you go: One, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty.’}]
INFO:main:Response completed with status ‘completed’ and ID ‘resp_AHaqWXORK0F9uAdKQAd9i’
… Later the assistant says it will stop, but this comes too late
INFO:main:New Conversation Item: {‘id’: ‘item_AHaqdkiBS12XzLRE39wTo’, ‘object’: ‘realtime.item’, ‘type’: ‘message’, ‘status’: ‘in_progress’, ‘role’: ‘assistant’, ‘content’: []}
INFO:main:New Part Added: {‘type’: ‘audio’, ‘transcript’: ‘’}
INFO:main:Transcript Delta: Alright
INFO:main:Transcript Delta: ,
INFO:main:Transcript Delta: I’ll
INFO:main:Transcript Delta: stop
INFO:main:Transcript Delta: .
Questions:
- The content index of the received audio deltas is always 0. Is the assistant able to interrupt audio within the current content index?
- Sometimes I do receive a response done event with status cancelled, so I am fairly sure the “response.cancel” event reaches the service.
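One thing that may be worth ruling out (an assumption, not something the trace confirms) is that audio deltas already received before the cancel are still queued locally and keep playing. The sketch below shows a playback buffer that can be flushed on interruption and that reports how many milliseconds were actually played, which is the kind of value a truncate event needs. The class name `PlaybackBuffer` is hypothetical, and the 24 kHz mono pcm16 format is an assumption about the output audio.

```python
from collections import deque

SAMPLE_RATE_HZ = 24_000  # assumption: pcm16 output is 24 kHz mono
BYTES_PER_SAMPLE = 2     # 16-bit samples


class PlaybackBuffer:
    """Hypothetical local queue of decoded audio chunks waiting to be played."""

    def __init__(self):
        self._chunks = deque()
        self.played_bytes = 0

    def push(self, pcm_bytes):
        """Queue a decoded audio delta for playback."""
        self._chunks.append(pcm_bytes)

    def pop(self):
        """Return the next chunk to play, or None if the queue is empty."""
        if not self._chunks:
            return None
        chunk = self._chunks.popleft()
        self.played_bytes += len(chunk)
        return chunk

    def flush(self):
        """Drop everything not yet played; return the number of bytes dropped."""
        dropped = sum(len(chunk) for chunk in self._chunks)
        self._chunks.clear()
        return dropped

    @property
    def played_ms(self):
        """Milliseconds of audio handed to the player so far."""
        return self.played_bytes * 1000 // (BYTES_PER_SAMPLE * SAMPLE_RATE_HZ)


buf = PlaybackBuffer()
for _ in range(3):
    buf.push(b"\x00" * 4800)  # 4800 bytes = 100 ms at 24 kHz pcm16 mono
buf.pop()                      # the player consumes the first chunk
dropped = buf.flush()          # the user interrupts: discard the rest
print(dropped, buf.played_ms)
```

With a buffer like this, the interruption handler would call `flush()` before sending the cancel, so that nothing received earlier continues to play.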
I would appreciate any suggestions for fixing this. I would like to be able to interrupt the assistant so that the conversation feels more lively.