Realtime API: Has anybody managed to provide previous conversation transcript history while keeping audio answers?

It looks like conversation.item.create only supports history of type text, not event transcripts.

As far as I can tell, conversation item history can only be provided to the Realtime API in this format:
For user:

{
   "type":"conversation.item.create",
   "item":{
      "type":"message",
      "role":"user",
      "content":[
         {
            "type":"input_text",
            "text":"transcript"
         }
      ]
   }
}

For assistant:

{
   "type":"conversation.item.create",
   "item":{
      "type":"message",
      "status":"completed",
      "role":"assistant",
      "content":[
         {
            "type":"text",
            "text":"transcript"
         }
      ]
   }
}
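
A minimal sketch of how the two event shapes above could be generated from a stored transcript. The `turns` array and the `buildHistoryEvents` helper are hypothetical names for illustration, not part of the Realtime API; only the event fields come from the examples above.

```javascript
// Hypothetical helper: turn stored transcript turns into
// conversation.item.create events using the shapes shown above.
function buildHistoryEvents(turns) {
  return turns.map(({ role, text }) => {
    if (role !== 'user' && role !== 'assistant') {
      throw new Error(`Unsupported role: ${role}`);
    }
    const item = {
      type: 'message',
      role,
      // User turns use "input_text"; assistant turns use "text".
      content: [{ type: role === 'user' ? 'input_text' : 'text', text }],
    };
    if (role === 'assistant') {
      // Only the assistant example above carries a status field.
      item.status = 'completed';
    }
    return { type: 'conversation.item.create', item };
  });
}

// Example with made-up transcript data.
const historyEvents = buildHistoryEvents([
  { role: 'user', text: 'Hello' },
  { role: 'assistant', text: 'Hi, how can I help?' },
]);
```

Each resulting event would then be serialized with JSON.stringify and sent over the open socket, one at a time and in order.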

Once all those conversation items have been provided, the Realtime API answers only in text mode, not audio.

Is that normal?

I use two different events to send the transcript back to my client:

// These cases sit inside a switch on the incoming event's type.
case 'response.done': {
    // Take the transcript of the first content part that has one.
    const agentMessage = response.response.output?.[0]?.content?.find(content => content.transcript)?.transcript || 'Agent message not found';
    console.debug('Agent message:', agentMessage);
    const conversationItem1 = {
        event: 'conversation',
        streamSid: streamSid,
        conversation: {
            role: 'assistant',
            content: agentMessage
        }
    };
    connection.send(JSON.stringify(conversationItem1));
    break;
}
case 'conversation.item.input_audio_transcription.completed': {
    // The transcription of the user's audio arrives in its own event.
    const userMessage = response.transcript;
    console.debug('User message:', userMessage);
    const conversationItem2 = {
        event: 'conversation',
        streamSid: streamSid,
        conversation: {
            role: 'user',
            content: userMessage
        }
    };
    connection.send(JSON.stringify(conversationItem2));
    break;
}
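
If the goal is to replay these transcripts later, the same two events could also feed a local history store. This is only a sketch: `history` and `recordTurn` are hypothetical names, and the event fields are the ones read by the handlers above.

```javascript
// Hypothetical in-memory store of finished turns.
const history = [];

// Record one server event if it carries a transcript; ignore everything else.
function recordTurn(event) {
  if (event.type === 'response.done') {
    const parts = event.response?.output?.[0]?.content ?? [];
    const transcript = parts.find((part) => part.transcript)?.transcript;
    if (transcript) {
      history.push({ role: 'assistant', text: transcript });
    }
  } else if (event.type === 'conversation.item.input_audio_transcription.completed') {
    if (event.transcript) {
      history.push({ role: 'user', text: event.transcript });
    }
  }
}
```

Calling recordTurn(response) before the switch would leave the existing handlers unchanged while keeping a transcript that can be stored between sessions.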

Hope this helps.

I'm not sure I follow.

My issue does not occur during the conversation. It occurs after loading the conversation history with the messages I showed in the original post: from then on I cannot converse by voice, because the Realtime API falls back to text mode for the subsequent interactions.

I have the same issue. Did you find a solution?

There is currently a little bit of a workaround. It’s not prod-ready, but it works for testing. Hoping OpenAI fixes this soon.

What are people doing about this? That is, how are people maintaining conversation history? When I send multiple conversation.item.create events with text content, the response from the API switches to text.

The options I see so far are:

  1. Send a single conversation.item.create event whose text content holds the whole transcript, as below
{
                        "type": "input_text",
                        "text": """
                             User: Who was Vladimir Lenin?
                             Assistant: Lenin was the leader of the Bolshevik Revolution in Russia in 1917 and the founder of the Soviet Union. He was a Marxist revolutionary who aimed to create a socialist state and led Russia through a civil war, establishing the foundation of Soviet communism.
                             User: Who was Joseph Stalin?
                             Assistant: Stalin succeeded Lenin as the leader of the Soviet Union. Known for his totalitarian rule, he transformed the USSR into a major world power but did so through brutal policies, including widespread purges, forced collectivization, and repression. His rule is marked by significant industrial progress but also severe human rights abuses and mass deaths.
                             """,
                    }
  2. Send a session.update event with instructions such as
"instructions": """
                You are a friendly AI assistant. Respond to me very briefly in English. Reference previous conversation history when asked. 
                
                Previous conversation:
                -----------------------------------
                User: Who was Vladimir Lenin?
                Assistant: Lenin was the leader of the Bolshevik Revolution in Russia in 1917 and the founder of the Soviet Union. He was a Marxist revolutionary who aimed to create a socialist state and led Russia through a civil war, establishing the foundation of Soviet communism.
                User: Who was Joseph Stalin?
                Assistant: Stalin succeeded Lenin as the leader of the Soviet Union. Known for his totalitarian rule, he transformed the USSR into a major world power but did so through brutal policies, including widespread purges, forced collectivization, and repression. His rule is marked by significant industrial progress but also severe human rights abuses and mass deaths.
                """,
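
Both options above come down to flattening the stored turns into one block of text. A rough sketch follows, with some assumptions: `formatTranscript`, `buildSingleItemEvent` and `buildSessionUpdate` are hypothetical helper names, and wrapping option 1 in a user message is a guess based on the event format from the original post.

```javascript
// Flatten stored turns into "User: ..." / "Assistant: ..." lines.
function formatTranscript(turns) {
  return turns
    .map(({ role, text }) => `${role === 'user' ? 'User' : 'Assistant'}: ${text}`)
    .join('\n');
}

// Option 1: one conversation.item.create event holding the whole transcript.
// Assumption: the transcript is sent as a single user message.
function buildSingleItemEvent(turns) {
  return {
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'user',
      content: [{ type: 'input_text', text: formatTranscript(turns) }],
    },
  };
}

// Option 2: a session.update event whose instructions embed the transcript.
function buildSessionUpdate(baseInstructions, turns) {
  const separator = '-'.repeat(35);
  return {
    type: 'session.update',
    session: {
      instructions: `${baseInstructions}\n\nPrevious conversation:\n${separator}\n${formatTranscript(turns)}`,
    },
  };
}
```

With option 2 no text items are added to the conversation itself, which seems to be why it is being considered here; whether it reliably keeps audio replies is not confirmed in this thread.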

How are others doing it? My approach at this stage will be to send a session.update event with the previous conversation.

Excuse the reference to Russian leaders.

I have also encountered this issue. If you try to build up the conversation history as outlined in the docs, the assistant just replies with text, ignoring all my instructions to reply with audio.
Until we find a solution, I'm asking the assistant itself at the end of each session to provide a text summary of the conversation, and I add that summary to the instructions for the next session.
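
A possible shape for that summary workaround, as a sketch only. It assumes response.create accepts per-response `modalities` and `instructions` overrides, which should be checked against the current API reference. The helper names and the prompt wording are made up for illustration.

```javascript
// Hypothetical prompt used to ask for the end-of-session summary.
const SUMMARY_PROMPT = 'Summarize the conversation so far in a few sentences.';

// Ask for a text-only summary at the end of a session.
// Assumption: response.create allows these per-response overrides.
function buildSummaryRequest() {
  return {
    type: 'response.create',
    response: { modalities: ['text'], instructions: SUMMARY_PROMPT },
  };
}

// Start the next session with the saved summary appended to the instructions.
function buildNextSessionUpdate(baseInstructions, summary) {
  return {
    type: 'session.update',
    session: {
      instructions: `${baseInstructions}\n\nSummary of the previous session:\n${summary}`,
    },
  };
}
```

The summary text itself would have to be read from the server's reply to the first event and stored by the application between sessions.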