Realtime API: Did anybody manage to provide previous conversation transcript history while keeping audio answers?

redvivi · October 5, 2024, 2:22pm

It looks like conversation.item.create only supports history of text type - not event transcript

Conversation item history can only be provided to the Realtime API in this format:
For user:

{
   "type":"conversation.item.create",
   "item":{
      "type":"message",
      "role":"user",
      "content":[
         {
            "type":"input_text",
            "text":"transcript"
         }
      ]
   }
}

For assistant:

{
   "type":"conversation.item.create",
   "item":{
      "type":"message",
      "status":"completed",
      "role":"assistant",
      "content":[
         {
            "type":"text",
            "text":"transcript"
         }
      ]
   }
}

Once all those conversation items are provided, the Realtime API answers only in text mode, not audio.

Is that normal?
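
For anyone trying to reproduce this, here is a minimal sketch of how a stored transcript could be turned into the events shown above. The helper name historyToItemEvents and the { role, text } shape of each history entry are assumptions for illustration, not part of the API. It only builds the payloads and does not by itself change the text-only behaviour described here.

```javascript
// Hypothetical helper: map stored turns to conversation.item.create events.
// Assumed input shape: [{ role: "user" | "assistant", text: string }, ...]
function historyToItemEvents(history) {
  return history.map(({ role, text }) => {
    const item = {
      type: "message",
      role,
      // User turns use "input_text" and assistant turns use "text",
      // matching the two payloads in the post above.
      content: [{ type: role === "user" ? "input_text" : "text", text }],
    };
    // The assistant payload in the post carries an explicit status.
    if (role === "assistant") item.status = "completed";
    return { type: "conversation.item.create", item };
  });
}

const events = historyToItemEvents([
  { role: "user", text: "Hello" },
  { role: "assistant", text: "Hi, how can I help?" },
]);

// Each event would then be sent over the already-open socket,
// e.g. ws.send(JSON.stringify(event)), where ws is your own connection.
console.log(JSON.stringify(events, null, 2));
```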

maig · October 11, 2024, 4:12pm

I use 2 different events to send transcript back to my client:

switch (response.type) {
    case 'response.done': {
        // Assistant turn: take the transcript from the first output item, if any.
        const agentMessage = response.response?.output?.[0]?.content?.find(content => content.transcript)?.transcript || 'Agent message not found';
        console.debug('Agent message:', agentMessage);
        const conversationItem = {
            event: 'conversation',
            streamSid: streamSid,
            conversation: {
                role: 'assistant',
                content: agentMessage
            }
        };
        connection.send(JSON.stringify(conversationItem));
        break;
    }
    case 'conversation.item.input_audio_transcription.completed': {
        // User turn: the transcription event carries the transcript directly.
        const userMessage = response.transcript;
        console.debug('User message:', userMessage);
        const conversationItem = {
            event: 'conversation',
            streamSid: streamSid,
            conversation: {
                role: 'user',
                content: userMessage
            }
        };
        connection.send(JSON.stringify(conversationItem));
        break;
    }
}

Hope this helps.

redvivi · October 11, 2024, 4:40pm

I'm not sure I follow.

My issue is not during the conversation. It occurs after loading the conversation history with the messages I mentioned in the OP: I am unable to converse by voice because the Realtime API falls back to text mode for the subsequent interaction.

chouaib06 · October 13, 2024, 8:51pm

I have the same issue. Did you find a solution?

samiy8030 · October 16, 2024, 6:11am

I have the same issue. Did you find a solution?

There is currently a little bit of a workaround. It’s not prod-ready, but it works for testing. Hoping OpenAI fixes this soon.

hagen.rode · October 30, 2024, 9:04am

What are people doing about this? That is, how are people maintaining conversation history? When sending multiple conversation.item.create events with text content, the response from the API changes to text.

The options I see so far are:

Send a single conversation.item.create event that has text as per below

{
                        "type": "input_text",
                        "text": """
                             User: Who was Vladimir Lenin?
                             Assistant: Lenin was the leader of the Bolshevik Revolution in Russia in 1917 and the founder of the Soviet Union. He was a Marxist revolutionary who aimed to create a socialist state and led Russia through a civil war, establishing the foundation of Soviet communism.
                             User: Who was Joseph Stalin?
                             Assistant: Stalin succeeded Lenin as the leader of the Soviet Union. Known for his totalitarian rule, he transformed the USSR into a major world power but did so through brutal policies, including widespread purges, forced collectivization, and repression. His rule is marked by significant industrial progress but also severe human rights abuses and mass deaths.
                             """,
                    }

Send a session.update event with instructions such as

"instructions": """
                You are a friendly AI assistant. Respond to me very briefly in English. Reference previous conversation history when asked. 
                
                Previous conversation:
                -----------------------------------
                User: Who was Vladimir Lenin?
                Assistant: Lenin was the leader of the Bolshevik Revolution in Russia in 1917 and the founder of the Soviet Union. He was a Marxist revolutionary who aimed to create a socialist state and led Russia through a civil war, establishing the foundation of Soviet communism.
                User: Who was Joseph Stalin?
                Assistant: Stalin succeeded Lenin as the leader of the Soviet Union. Known for his totalitarian rule, he transformed the USSR into a major world power but did so through brutal policies, including widespread purges, forced collectivization, and repression. His rule is marked by significant industrial progress but also severe human rights abuses and mass deaths.
                """,

How are others doing it? My approach at this stage will be to send a session.update event with the previous conversation.

Excuse the reference to Russian leaders.

j0rdan · October 31, 2024, 11:35am

I have also encountered this issue. If you try to build up the conversation history as outlined in the docs, the assistant just replies with text, ignoring all my instructions to reply with audio.
For the time being until we find a solution for this, I’m asking the assistant itself at the end of each session to provide a summary of the conversation in text, and I add that in the instructions for the next session.
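
A sketch of what that end-of-session request could look like, under some assumptions: that a response.create event accepts per-response modalities and instructions, and that the text comes back as a content part of type "text" in response.done. The function names and the prompt wording are illustrative only.

```javascript
// Ask the model for a text-only summary at the end of a session.
// Assumption: response.create accepts per-response modalities and instructions.
function buildSummaryRequest() {
  return {
    type: "response.create",
    response: {
      modalities: ["text"],
      instructions:
        "Summarize this conversation in a short paragraph. " +
        "Keep names, numbers, and decisions so the next session can refer to them.",
    },
  };
}

// Pull the summary text out of a response.done event.
// Assumption: text output arrives as content parts of type "text".
function extractSummaryText(event) {
  const parts = event.response?.output?.flatMap((item) => item.content ?? []) ?? [];
  return parts.find((part) => part.type === "text")?.text ?? null;
}
```

The returned string can then be stored and added to the instructions of the next session.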

dennis17 · January 9, 2025, 7:22am

I encountered this same issue.

I got it working by creating a summary of the previous conversation with a chat completion (a summary message created from the realtime session once it has ended would work as well) and then placing that summary in the new realtime session's instructions.

I tested it by asking it to give me random numbers and phrases in the first session, creating the summary, then, asking it to tell me those previous numbers and phrases again in the new session.
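
Roughly, the two pieces could look like the sketch below. It only builds the payloads: the chat completion request body would be POSTed to the Chat Completions endpoint with your own HTTP client, and the summary in the response then goes into the new session's instructions. The function names, the prompt wording, and the model value are placeholders, not a recommendation.

```javascript
// Build a Chat Completions request body that asks for a summary of earlier turns.
// Assumed history shape: [{ role: "user" | "assistant", text: string }, ...]
function buildSummaryCompletionRequest(history, model) {
  const transcript = history
    .map(({ role, text }) => `${role === "user" ? "User" : "Assistant"}: ${text}`)
    .join("\n");
  return {
    model, // placeholder: pass whichever chat model you use
    messages: [
      {
        role: "system",
        content:
          "Summarize the following conversation. Keep exact numbers and phrases the user may ask about later.",
      },
      { role: "user", content: transcript },
    ],
  };
}

// Put the summary into the instructions for the new realtime session.
function buildSessionUpdateWithSummary(baseInstructions, summary) {
  return {
    type: "session.update",
    session: {
      instructions: `${baseInstructions}\n\nSummary of the previous conversation:\n${summary}`,
    },
  };
}

const request = buildSummaryCompletionRequest(
  [
    { role: "user", text: "Give me a random number." },
    { role: "assistant", text: "Sure: 4821." },
  ],
  "gpt-4o-mini" // placeholder model name
);
console.log(JSON.stringify(request, null, 2));
```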

Hope this helps!

naiduk798 · February 17, 2025, 6:14am

I have also encountered the same issue. Did they provide any solution for this?

activescott · February 18, 2025, 3:25am

Similar to @dennis17, I’ve also had success creating a simplified “transcript” of the previous conversation and adding it to the instructions in the session request. I can provide an example if someone is stuck.

Bazz0r · February 19, 2025, 7:24pm

This would be helpful, thank you.
