Trouble Loading Previous Messages with Realtime API

Hi everyone,

I’m having trouble loading previous messages into the Realtime API. Has anyone successfully managed to do this?

Here’s the sequence of events I’m sending:

{
  "type": "session.update",
  "session": {
    "modalities": ["text", "audio"],
    "instructions": "Assist the user.",
    "voice": "ash",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "input_audio_transcription": {"model": "whisper-1"},
    "turn_detection": null,
    "temperature": 0.8
  }
}
{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "status": "completed",
    "role": "system",
    "content": [{"type": "input_text", "text": "Say hi to the user."}]
  }
}
{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "status": "completed",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello, how can I assist you today?"}]
  }
}
{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [{"type": "text", "input_text": "Hello, can you tell me a joke?"}]
  }
}
{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "status": "completed",
    "role": "system",
    "content": [{"type": "input_text", "text": "The user interrupted the conversation, continue from where you stopped."}]
  }
}

After this I send a response.create message:

{
  "type": "response.create",
  "response": {"modalities": ["text", "audio"]}
}

The issue I’m experiencing is that sometimes I only get text responses without audio, or I encounter errors for some messages. I’ve been unable to get it working reliably.

If anyone has insights, tips, or a working example, I’d greatly appreciate your help!

Thanks in advance!

When the role is user, the content part type must be input_text. Also, the text itself goes under the key text, not input_text:

[{"type": "input_text", "text": "Hello, can you tell me a joke?"}]
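To avoid mixing these up by hand, one option is a small helper that picks the content part type from the role. This is only a sketch based on the payloads shown in this thread (input_text for system and user, text for assistant); `make_message_item` and `CONTENT_TYPE_BY_ROLE` are made-up names, not part of any SDK.

```python
import json

# Content part type per role, as used in the payloads in this thread:
# system and user messages use "input_text", assistant messages use "text".
CONTENT_TYPE_BY_ROLE = {
    "system": "input_text",
    "user": "input_text",
    "assistant": "text",
}


def make_message_item(role: str, text: str) -> dict:
    """Build a conversation.item.create event for one text message (hypothetical helper)."""
    if role not in CONTENT_TYPE_BY_ROLE:
        raise ValueError(f"unsupported role: {role!r}")
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "status": "completed",
            "role": role,
            # The text always goes under the "text" key; only "type" varies by role.
            "content": [{"type": CONTENT_TYPE_BY_ROLE[role], "text": text}],
        },
    }


history = [
    ("assistant", "Hello, how can I assist you today?"),
    ("user", "Hello, can you tell me a joke?"),
]
events = [make_message_item(role, text) for role, text in history]
payloads = [json.dumps(event) for event in events]  # one JSON string per event to send
```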

Also, make sure to catch and log any error event you receive as it’s very easy to miss them.
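A minimal sketch of what that logging could look like, assuming each server event arrives as a JSON string and that error events have "type": "error" with the details nested under an "error" object. `check_for_error` is a hypothetical helper name; call it on every message you receive from the socket.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("realtime")


def check_for_error(raw_message: str) -> Optional[dict]:
    """Return the error payload if this server event is an error, else None."""
    event = json.loads(raw_message)
    if event.get("type") != "error":
        return None
    # Assumption: details live under the "error" key (e.g. "code", "message").
    error = event.get("error", {})
    logger.error(
        "Realtime API error: code=%s message=%s",
        error.get("code"),
        error.get("message"),
    )
    return error
```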

That is a known issue; I haven’t been able to get it to work reliably either. The model tends to drop out of voice mode when you construct the conversation history from text messages.
One thing you could do is have the model generate a summary of the conversation at the end of a session and send that summary in your next session. This is a gimmicky workaround, though: it only works when the conversation ends gracefully (e.g. you don’t get disconnected from the session). It might help you get started until we get an official fix for this.
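Roughly, the summary workaround could be wired up like this. It is a sketch under assumptions: that response.create accepts per-response instructions and a text-only modalities list, and that you store the resulting summary yourself between sessions. Both function names and the summary prompt are made up for illustration.

```python
from typing import Optional


def build_summary_request() -> dict:
    """Ask for a text-only summary at the end of a session (hypothetical prompt)."""
    return {
        "type": "response.create",
        "response": {
            "modalities": ["text"],
            "instructions": "Summarize the conversation so far in a few sentences.",
        },
    }


def build_session_update(base_instructions: str, previous_summary: Optional[str]) -> dict:
    """Fold a stored summary into the instructions of the next session."""
    instructions = base_instructions
    if previous_summary:
        # The history is passed as instructions, not as text message items.
        instructions += "\n\nSummary of the previous conversation:\n" + previous_summary
    return {"type": "session.update", "session": {"instructions": instructions}}
```

Since the history ends up in the instructions, no text-typed assistant messages are added to the conversation.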

You are right, that was a mistake I made while writing the question here; in my code it is correct.

Do we have any contact with the OpenAI team regarding this issue?

Sorry about this. This is a bug we’re currently looking into (reminder, the Realtime API is still in beta :sweat_smile:). One workaround is to add an audio message as your first user message, before the other history, which might help coax the model to respond with audio as well.
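For anyone who wants to try this, here is a sketch of an audio user message built from generated silence. Assumptions: the pcm16 format is 24 kHz mono 16-bit, and audio content parts use type input_audio with base64 data under the audio key. `make_silent_audio_item` is a hypothetical helper, and silence is just the simplest placeholder; a real recorded utterance may behave differently.

```python
import base64

SAMPLE_RATE_HZ = 24_000  # assumption: pcm16 input is 24 kHz mono
BYTES_PER_SAMPLE = 2     # 16-bit samples


def make_silent_audio_item(seconds: float = 1.0) -> dict:
    """Build a user message containing `seconds` of silent PCM16 audio."""
    num_samples = int(SAMPLE_RATE_HZ * seconds)
    pcm = bytes(num_samples * BYTES_PER_SAMPLE)  # all-zero bytes are digital silence
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_audio", "audio": base64.b64encode(pcm).decode("ascii")}
            ],
        },
    }


first_item = make_silent_audio_item()  # send this before the text history items
```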

Thank you for your response! I gave that approach a try by adding a 1-second silent audio clip as a user message before the others, but unfortunately, it didn’t work. I believe the best solution would be the one suggested by @j0rdan: summarizing the previous conversation and including it as context.

Can confirm this works in a couple of use cases I am involved with!

Thanks for the clarification. I tried your suggested workaround, but it doesn’t work reliably (I saw it work a couple of times, but not nearly often enough to make it viable).
If the conversation history is quite long, it contains a lot of assistant messages of type text, which I think biases the model toward responding with text from that point onward.
Looking forward to any further updates on this.