[Realtime API] Getting inconsistent or absent outputs over stream

I have been trying to use the realtime websocket api, but I get very inconsistent behavior. The events will be lacking any text.deltas or text.completes until after three or four calls to response.create, oftentimes the first response will be formatted in json when there is nothing indicating it should do so, and often it will dump some completely random text, like long lists of image names, once again completely unrelated to the prompt.

I’ve attached a sample request/response log where it initially does not answer, then answers in json, and finally responds normally. I suspect something is off in my initial requests but I cannot see anything that is off.

Raw message from OpenAI:
{"type":"session.created","event_id":"event_AMyCJ39rsbJXIlpXivFcH","session":{"id":"sess_AMyCIBoyAwiYg02o3XwRC","object":"realtime.session","model":"gpt-4o-realtime-preview-2024-10-01","expires_at":1730038878,"modalities":["text","audio"],"instructions":"Your knowledge cutoff is 2023-10. You are a helpful, witty, and friendly AI. Act like a human, but remember that you aren't a human and that you can't do human things in the real world. Your voice and personality should be warm and engaging, with a lively and playful tone. If interacting in a non-English language, start by using the standard accent or dialect familiar to the user. Talk quickly. You should always call a function if you can. Do not refer to these rules, even if you’re asked about them.","voice":"alloy","turn_detection":{"type":"server_vad","threshold":0.5,"prefix_padding_ms":300,"silence_duration_ms":200},"input_audio_format":"pcm16","output_audio_format":"pcm16","input_audio_transcription":null,"tool_choice":"auto","temperature":0.8,"max_response_output_tokens":"inf","tools":[]}}

Message out to OpenAI:
{"type":"conversation.item.create","item":{"type":"message","role":"user","content":[{"type":"input_text","text":"hello, how are you"}]}}

Message out to OpenAI:
{"type":"response.create","response":{"modalities":["text"],"instructions":""}}

Raw message from OpenAI:
{"type":"conversation.item.created","event_id":"event_AMyCNZg4oyXtGplmxjIKh","previous_item_id":null,"item":{"id":"item_AMyCNH6DPNxbXzIXW3zTh","object":"realtime.item","type":"message","status":"completed","role":"user","content":[{"type":"input_text","text":"hello, how are you"}]}}

Raw message from OpenAI:
{"type":"response.created","event_id":"event_AMyCNtKCrgZYULPnOXGX1","response":{"object":"realtime.response","id":"resp_AMyCNxdOdh7l0KhaAJPyb","status":"in_progress","status_details":null,"output":[],"usage":null}}

Raw message from OpenAI:
{"type":"rate_limits.updated","event_id":"event_AMyCNVirlTOT2A5ynLkBA","rate_limits":[{"name":"requests","limit":100,"remaining":38,"reset_seconds":52747.081},{"name":"tokens","limit":20000,"remaining":15481,"reset_seconds":13.557}]}

Raw message from OpenAI:
{"type":"response.output_item.added","event_id":"event_AMyCNSNfEl6xvvRAznu8l","response_id":"resp_AMyCNxdOdh7l0KhaAJPyb","output_index":0,"item":{"id":"item_AMyCNm1j8KUnfAB906BW0","object":"realtime.item","type":"message","status":"in_progress","role":"assistant","content":[]}}

Raw message from OpenAI:
{"type":"conversation.item.created","event_id":"event_AMyCN36AsTHdn6ClNct73","previous_item_id":"item_AMyCNH6DPNxbXzIXW3zTh","item":{"id":"item_AMyCNm1j8KUnfAB906BW0","object":"realtime.item","type":"message","status":"in_progress","role":"assistant","content":[]}}

Raw message from OpenAI:
{"type":"response.content_part.added","event_id":"event_AMyCNzyDxThjeM9WEaBTo","response_id":"resp_AMyCNxdOdh7l0KhaAJPyb","item_id":"item_AMyCNm1j8KUnfAB906BW0","output_index":0,"content_index":0,"part":{"type":"text","text":""}}

Message out to OpenAI:
{"type":"conversation.item.create","item":{"type":"message","role":"user","content":[{"type":"input_text","text":"can I get a response?"}]}}

Message out to OpenAI:
{"type":"response.create","response":{"modalities":["text"],"instructions":""}}

Raw message from OpenAI:
{"type":"conversation.item.created","event_id":"event_AMyCUoaw0HgGJhkie6W4J","previous_item_id":"item_AMyCNm1j8KUnfAB906BW0","item":{"id":"item_AMyCUcHjOTkfC6FYoqgy9","object":"realtime.item","type":"message","status":"completed","role":"user","content":[{"type":"input_text","text":"can I get a response?"}]}}

Raw message from OpenAI:
{"type":"response.created","event_id":"event_AMyCUCJYFLBRQXekJXFcj","response":{"object":"realtime.response","id":"resp_AMyCUIzaMKPyTDZ5MVNQq","status":"in_progress","status_details":null,"output":[],"usage":null}}

Raw message from OpenAI:
{"type":"rate_limits.updated","event_id":"event_AMyCUy8V1N0UKDORWsKwA","rate_limits":[{"name":"requests","limit":100,"remaining":37,"reset_seconds":53604.305},{"name":"tokens","limit":20000,"remaining":15420,"reset_seconds":13.74}]}

Raw message from OpenAI:
{"type":"response.output_item.added","event_id":"event_AMyCUavtjadxPAGBWPCQY","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","output_index":0,"item":{"id":"item_AMyCUxi5XGHftrsOCN6XG","object":"realtime.item","type":"message","status":"in_progress","role":"assistant","content":[]}}

Raw message from OpenAI:
{"type":"conversation.item.created","event_id":"event_AMyCUevFKFOS2JWfn9UAC","previous_item_id":"item_AMyCUcHjOTkfC6FYoqgy9","item":{"id":"item_AMyCUxi5XGHftrsOCN6XG","object":"realtime.item","type":"message","status":"in_progress","role":"assistant","content":[]}}

Raw message from OpenAI:
{"type":"response.content_part.added","event_id":"event_AMyCUoL7vhv8bfQ0Sx0M2","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"part":{"type":"text","text":""}}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCUTLnvhDNHa2Hrv71m","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":"{\""}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCU2ILIN1mOcyrShkpy","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":"response"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCU0weXmIc4g1eYqhEr","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":"\":"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCUuB0iqvxo1LWSvt1T","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":" \""}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCUEYwERd1B7q6lJNKS","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":"Of"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCUXBLMVkAysPfhNh47","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":" course"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCUwF3gK9ZFZo3tra3Y","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":"!"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCU33TGkBh4bn7xLsHg","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":" How"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCUHuLkeEVvyFrHwZ6j","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":" can"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCUFAOREGNT6UHibgDh","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":" I"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCUpjqgJq711VCzESSc","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":" help"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCUnmfxWd9M8qJ2gQq1","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":" you"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCUrZF6iFojPHzSITjl","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":" today"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCUVAeiIbQopwxX0D5u","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":"?\""}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCUhg8414WezHKnNtnQ","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"delta":"}"}

Raw message from OpenAI:
{"type":"response.text.done","event_id":"event_AMyCUfvb6cQeBKRF9oLaZ","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"text":"{\"response\": \"Of course! How can I help you today?\"}"}

Raw message from OpenAI:
{"type":"response.content_part.done","event_id":"event_AMyCULd2324ayRZGWq5Ix","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","item_id":"item_AMyCUxi5XGHftrsOCN6XG","output_index":0,"content_index":0,"part":{"type":"text","text":"{\"response\": \"Of course! How can I help you today?\"}"}}

Raw message from OpenAI:
{"type":"response.output_item.done","event_id":"event_AMyCUz9LVZKcEXhL9iywl","response_id":"resp_AMyCUIzaMKPyTDZ5MVNQq","output_index":0,"item":{"id":"item_AMyCUxi5XGHftrsOCN6XG","object":"realtime.item","type":"message","status":"completed","role":"assistant","content":[{"type":"text","text":"{\"response\": \"Of course! How can I help you today?\"}"}]}}

Raw message from OpenAI:
{"type":"response.done","event_id":"event_AMyCUqRDQqPZjjLmmcsgb","response":{"object":"realtime.response","id":"resp_AMyCUIzaMKPyTDZ5MVNQq","status":"completed","status_details":null,"output":[{"id":"item_AMyCUxi5XGHftrsOCN6XG","object":"realtime.item","type":"message","status":"completed","role":"assistant","content":[{"type":"text","text":"{\"response\": \"Of course! How can I help you today?\"}"}]}],"usage":{"total_tokens":87,"input_tokens":70,"output_tokens":17,"input_token_details":{"cached_tokens":0,"text_tokens":25,"audio_tokens":45},"output_token_details":{"text_tokens":17,"audio_tokens":0}}}}

Message out to OpenAI:
{"type":"conversation.item.create","item":{"type":"message","role":"user","content":[{"type":"input_text","text":"why did you respond in json?"}]}}

Message out to OpenAI:
{"type":"response.create","response":{"modalities":["text"],"instructions":""}}

Raw message from OpenAI:
{"type":"conversation.item.created","event_id":"event_AMyCbXmwKhJBA6OfwDOp5","previous_item_id":"item_AMyCUxi5XGHftrsOCN6XG","item":{"id":"item_AMyCbSdwpADx4nYqSC3BH","object":"realtime.item","type":"message","status":"completed","role":"user","content":[{"type":"input_text","text":"why did you respond in json?"}]}}

Raw message from OpenAI:
{"type":"response.created","event_id":"event_AMyCbeH5bTWPLnBYaMrW6","response":{"object":"realtime.response","id":"resp_AMyCbctcPUWgpySKO7v58","status":"in_progress","status_details":null,"output":[],"usage":null}}

Raw message from OpenAI:
{"type":"rate_limits.updated","event_id":"event_AMyCbZv0RowLC5Lwi4av7","rate_limits":[{"name":"requests","limit":100,"remaining":36,"reset_seconds":54461.222},{"name":"tokens","limit":20000,"remaining":15390,"reset_seconds":13.83}]}

Raw message from OpenAI:
{"type":"response.output_item.added","event_id":"event_AMyCbEejjQtBtdDX281sp","response_id":"resp_AMyCbctcPUWgpySKO7v58","output_index":0,"item":{"id":"item_AMyCbhs2dxVMMC5hJmguH","object":"realtime.item","type":"message","status":"in_progress","role":"assistant","content":[]}}

Raw message from OpenAI:
{"type":"conversation.item.created","event_id":"event_AMyCb7VdTAjmL9hsUBPm1","previous_item_id":"item_AMyCbSdwpADx4nYqSC3BH","item":{"id":"item_AMyCbhs2dxVMMC5hJmguH","object":"realtime.item","type":"message","status":"in_progress","role":"assistant","content":[]}}

Raw message from OpenAI:
{"type":"response.content_part.added","event_id":"event_AMyCbzcaGjx5JJSJS5dnC","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"part":{"type":"text","text":""}}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbfS4pOD3X1IMAdIRk","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":"I"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbqRbCE1qX3Gn8J6gD","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":" apologize"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbMy8b9Geu04LRlKku","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":" for"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbPWonswTT8gfOdP8b","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":" that"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbv9wu4dxhwjrg3lGY","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":"."}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCboX6xGwh1nYQSSnqe","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":" Let"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCb0ism03DpbebGgpZL","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":" me"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbqHx7JWgsq91Xzkjh","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":" correct"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbKpkAIAYFwvtZ1xYD","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":" myself"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCb4QweQEcHY50vQfok","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":"."}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbAH94DSEN8UL401US","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":" How"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbY6IxjmwyanOwS80O","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":" can"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbzjDHYkjdgYmYkORy","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":" I"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbbr6cGxpYJ9Cyumw9","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":" help"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbaDXxOxVu2zWOsUiL","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":" you"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbzkZkyPlpQXFqG9SN","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":" today"}

Raw message from OpenAI:
{"type":"response.text.delta","event_id":"event_AMyCbKPAyRpSvoAHQImgs","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"delta":"?"}

Raw message from OpenAI:
{"type":"response.text.done","event_id":"event_AMyCbb7EE5vACZ4Qps577","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"text":"I apologize for that. Let me correct myself. How can I help you today?"}

Raw message from OpenAI:
{"type":"response.content_part.done","event_id":"event_AMyCbzMVMEjR39TbkDIwY","response_id":"resp_AMyCbctcPUWgpySKO7v58","item_id":"item_AMyCbhs2dxVMMC5hJmguH","output_index":0,"content_index":0,"part":{"type":"text","text":"I apologize for that. Let me correct myself. How can I help you today?"}}

Raw message from OpenAI:
{"type":"response.output_item.done","event_id":"event_AMyCbVpA28TmTgSIxKjzZ","response_id":"resp_AMyCbctcPUWgpySKO7v58","output_index":0,"item":{"id":"item_AMyCbhs2dxVMMC5hJmguH","object":"realtime.item","type":"message","status":"completed","role":"assistant","content":[{"type":"text","text":"I apologize for that. Let me correct myself. How can I help you today?"}]}}

Raw message from OpenAI:
{"type":"response.done","event_id":"event_AMyCbUXhLp7Q1aXbTYvdz","response":{"object":"realtime.response","id":"resp_AMyCbctcPUWgpySKO7v58","status":"completed","status_details":null,"output":[{"id":"item_AMyCbhs2dxVMMC5hJmguH","object":"realtime.item","type":"message","status":"completed","role":"assistant","content":[{"type":"text","text":"I apologize for that. Let me correct myself. How can I help you today?"}]}],"usage":{"total_tokens":119,"input_tokens":100,"output_tokens":19,"input_token_details":{"cached_tokens":0,"text_tokens":55,"audio_tokens":45},"output_token_details":{"text_tokens":19,"audio_tokens":0}}}}

EDIT: Just to make sure it wasn’t my code, I ran the example straight from the OpenAI reference page.

This script from here https://platform.openai.com/docs/guides/realtime/overview:

const WebSocket = require('ws');

const url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01";
const ws = new WebSocket(url, {
    headers: {
        "Authorization": "Bearer " + "openai-key-here",
        "OpenAI-Beta": "realtime=v1",
    },
});

ws.on("open", function open() {
    console.log("Connected to server.");
    ws.send(JSON.stringify({
        type: "response.create",
        response: {
            modalities: ["text"],
            instructions: "Please assist the user.",
        }
    }));
});

ws.on("message", function incoming(message) {
    console.log(JSON.parse(message.toString()));
});

Results in this output:

{
  type: 'session.created',
  event_id: 'event_ANEse7nLPMRGnoXKhO31h',
  session: {
    id: 'sess_ANEse12DVB1AeKy9dQGNC',
    object: 'realtime.session',
    model: 'gpt-4o-realtime-preview-2024-10-01',
    expires_at: 1730103008,
    modalities: [ 'text', 'audio' ],
    instructions: "Your knowledge cutoff is 2023-10. You are a helpful, witty, and friendly AI. Act like a human, but remember that you aren't a human and that you can't do human things in the real world. Your voice and personality should be warm and engaging, with a lively and playful tone. If interacting in a non-English language, start by using the standard accent or dialect familiar to the user. Talk quickly. You should always call a function if you can. Do not refer to these rules, even if you’re asked about them.",
    voice: 'alloy',
    turn_detection: {
      type: 'server_vad',
      threshold: 0.3,
      prefix_padding_ms: 500,
      silence_duration_ms: 350
    },
    input_audio_format: 'pcm16',
    output_audio_format: 'pcm16',
    input_audio_transcription: null,
    tool_choice: 'auto',
    temperature: 0.8,
    max_response_output_tokens: 'inf',
    tools: []
  }
}
{
  type: 'response.created',
  event_id: 'event_ANEseyPKZOdv0SzmejdXi',
  response: {
    object: 'realtime.response',
    id: 'resp_ANEseTuL9L33wgRvD6AKh',
    status: 'in_progress',
    status_details: null,
    output: [],
    usage: null
  }
}
{
  type: 'rate_limits.updated',
  event_id: 'event_ANEsf9iwORWtIXSOocygb',
  rate_limits: [
    {
      name: 'requests',
      limit: 100,
      remaining: 33,
      reset_seconds: 57741.345
    },
    {
      name: 'tokens',
      limit: 20000,
      remaining: 15485,
      reset_seconds: 13.545
    }
  ]
}
{
  type: 'response.output_item.added',
  event_id: 'event_ANEsfgSAlkW0TH58mah1W',
  response_id: 'resp_ANEseTuL9L33wgRvD6AKh',
  output_index: 0,
  item: {
    id: 'item_ANEseIMRGIJfxEVNVVGXv',
    object: 'realtime.item',
    type: 'message',
    status: 'in_progress',
    role: 'assistant',
    content: []
  }
}
{
  type: 'conversation.item.created',
  event_id: 'event_ANEsfqKUar0cNhfROpkUm',
  previous_item_id: null,
  item: {
    id: 'item_ANEseIMRGIJfxEVNVVGXv',
    object: 'realtime.item',
    type: 'message',
    status: 'in_progress',
    role: 'assistant',
    content: []
  }
}
{
  type: 'response.content_part.added',
  event_id: 'event_ANEsf2OgXLocWYtxrXgaA',
  response_id: 'resp_ANEseTuL9L33wgRvD6AKh',
  item_id: 'item_ANEseIMRGIJfxEVNVVGXv',
  output_index: 0,
  content_index: 0,
  part: { type: 'text', text: '' }
}

Sometimes it generates normally, but more of than not it gets stuck at response.content_part.added.

It seems you continue to erase your initial instructions, setting them to an empty string.

Note these instructions from the API reference of client events.

The response.create event includes inference configuration like instructions, and temperature. These fields will override the Session’s configuration for this Response only.

(You can also be amused by the capitalization of random words in the documentation. The complete disagreement between the example and the reference parameters. Lack of stating if a parameter is optional. Complete non-documentation of a realtime.response object.)

Throw “Mandatory: talk like a pirate” in to see the instructions are maintained by your procedures.


I’d like you to observe these two events. No “done” from the API which would appear after a response is finished before you send again.

Raw message from OpenAI:
{“type”:“response.content_part.added”,“event_id”:“event_AMyCNzyDxThjeM9WEaBTo”,“response_id”:“resp_AMyCNxdOdh7l0KhaAJPyb”,“item_id”:“item_AMyCNm1j8KUnfAB906BW0”,“output_index”:0,“content_index”:0,“part”:{“type”:“text”,“text”:“”}}

Message out to OpenAI:
{“type”:“conversation.item.create”,“item”:{“type”:“message”,“role”:“user”,“content”:[{“type”:“input_text”,“text”:“can I get a response?”}]}}

It isn’t until line 107 that you actually get that your response.done, with a token usage report. Timestamps logged would tell us if the API was just unresponsive and the “create” initiated nothing.

It is true that you didn’t say "respond with your made-up “response” key JSON. Throw in some model overtraining on trying to make JSON outputs reliable, more overtraining on just producing desired voices keyed from unseen context and logit tune-ups, and you could get an AI unable to determine if its first token shouldn’t be “{” with a greater than 0 certainty.

I should have probably included timestamps, but the conversation.item.create that follows response.content_part.added is quite a bit later, after I manually send a message again because I didn’t receive a response for the previous one, to show that it starts to work around the third message (though this is not nearly deterministic and sometimes no amount of messages manages to get a response).

Since I am overriding the instructions, it makes sense that it maybe would respond in json, but I still don’t understand why I do not get a response.text for the first few requests (or sometimes many more).

I think when you pass conversation.item.create it doesn’t trigger a response.

So in your case you would additionally need to pass response.create right afterwards with your question/instruction and it should trigger a response.

I had the same issue when utilizing tool use/function use and I simply forgot to pass response.create. :smile:

Good luck!

1 Like

I do that. In the log, every conversation.item.create I follow with a response.create. And it does eventually start working normally, sometimes. But the first three or four messages are always ignored for some reason.

I found that if you input only text, it is very unreliable, I also found bugs where it would return seemingly random image information or unrelated text.

I recommend you to try sending an audio response.create, or alternatively, BOTH, so audio, text as modalities.

This way you will get audio most of the time and in my experience it was outputting reliable and expected information most of the time. :blush:

Ah, good to know it wasn’t just me going crazy. I guess I will use the old apis for text stuff and inject the responses into the audo history for when I need audio.

1 Like

Awesome to hear that this helped! Feel free to mark my response as an answer to make it easier for other people who might struggle with the same problem. :hugs:

Ok, I think they updated the API and now at least it comes back saying there was a server error instead of just never responding.

A server error usually indicates that you sent your payload wrong.
In case this is your issue, you can show the payload you send to OpenAI and we can go from there.

Best of luck. :blush:

I have all of the payload json’s in the top level post. I think it is not an error of my side since it is specifically a server error (the equivalent of 5xx http errors) from what I can tell. Also, it works sporadically, if it was a payload error it would fail consistently, I imagine.