The issue I have is that not during the conversation, but after loading the conversation history with the messages I mentioned in the OP, I am unable to converse because Realtime API falls back in text mode for the subsequent interaction.
What are people doing about this? i.e how are people maintaining conversation history? When sending multiple events of type conversation.item.create = text the response from the API changes to text.
The options I see so far are:
Send a single conversation.item.create event that has text as per below
{
"type": "input_text",
"text": """
User: Who was Vladimir Lenin?
Assistant: Lenin was the leader of the Bolshevik Revolution in Russia in 1917 and the founder of the Soviet Union. He was a Marxist revolutionary who aimed to create a socialist state and led Russia through a civil war, establishing the foundation of Soviet communism.
User: Who was Joseph Stalin?
Assistant: Stalin succeeded Lenin as the leader of the Soviet Union. Known for his totalitarian rule, he transformed the USSR into a major world power but did so through brutal policies, including widespread purges, forced collectivization, and repression. His rule is marked by significant industrial progress but also severe human rights abuses and mass deaths.
""",
}
Send a session.update event with instructions such as
"instructions": """
You are a friendly AI assistant. Respond to me very briefly in English. Reference previous conversation history when asked.
Previous conversation:
-----------------------------------
User: Who was Vladimir Lenin?
Assistant: Lenin was the leader of the Bolshevik Revolution in Russia in 1917 and the founder of the Soviet Union. He was a Marxist revolutionary who aimed to create a socialist state and led Russia through a civil war, establishing the foundation of Soviet communism.
User: Who was Joseph Stalin?
Assistant: Stalin succeeded Lenin as the leader of the Soviet Union. Known for his totalitarian rule, he transformed the USSR into a major world power but did so through brutal policies, including widespread purges, forced collectivization, and repression. His rule is marked by significant industrial progress but also severe human rights abuses and mass deaths.
""",
How are others doing it? My approach at this stage, will be to send a session.update event with the previous conversation.
I have also encountered this issue, if you try to build up the conversation history as outlined in the docs, the assistant just replies with text, ignoring all my instructions to reply with audio.
For the time being until we find a solution for this, I’m asking the assistant itself at the end of each session to provide a summary of the conversation in text, and I add that in the instructions for the next session.