I am working on a frontend prototype to test the Realtime API. I started with text-only messages, as described in the official docs. My sendMessage function is straightforward:
const sendMessage = () => {
  if (ws && isConnected.value) {
    const event = {
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [
          {
            type: 'input_text',
            text: 'Say three random words!'
          }
        ]
      }
    };
    ws.send(JSON.stringify(event));
    ws.send(
      JSON.stringify({
        type: 'response.create',
        response: {
          modalities: ['text'],
          instructions: 'Please assist the user.'
        }
      })
    );
  } else {
    console.warn('[AI Function] WebSocket is not connected.');
  }
};
On the first sendMessage call (right after the WebSocket connection becomes available), I often get strange answers with the following issues:
Issue 1: Forced JSON
The very first request in a session usually generates the answer in a random JSON format. Subsequent requests in the same session then usually work and return pure text.
Example:
User: Say three random words!
Assistant:
{"random_words":["Sunflower","Journey","Harmony"]}
The next requests in the same session then usually work and return pure text as expected:
User: Say three random words!
Assistant: Serendipity, Cascade, Whisper.
Issue 2: Potential Content Bleeding
The first request sometimes generates the answer in a random JSON format with random data in it (ignoring the question). Answers are sometimes very specific to certain topics (but ignoring my request). This looks like potential “content bleeding.”
Example:
User: How are you?
Assistant:
{"Account Balance": "$3552.60"}
Or:
User: Say “Hello!”
Assistant:
{"Temperature": "34 Degrees"}
Issue 3: Potential Cost Risk (and Maybe “Content Bleeding”)
The first request sometimes generates a huge number of pseudo-random “image placeholder” tags until I break the WebSocket connection.
Example:
User: Say three random words!
Assistant:
<|is_landscape_image|><|xlimage|><|image_border_1024|><|vq_image_2035|><|vq_image_5132|><|vq_image_5132|><|vq_image_5132|>
… [repeats many times]
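To limit the cost exposure while this is unresolved, a client-side guard could cancel a response once it has streamed more text than expected. The following is only a minimal sketch, not a fix for the underlying problem. createOutputGuard and maxChars are hypothetical names, and the event names used (response.created, response.text.delta, response.cancel) are my understanding of the API version used above; they may need adjusting.

```javascript
// Hypothetical helper, not part of the Realtime API.
// Counts the characters streamed for the current response and sends a
// cancel request once a budget is exceeded.
function createOutputGuard(send, maxChars = 2000) {
  let received = 0;
  let cancelled = false;

  // Call this with every parsed server event.
  return function onServerEvent(event) {
    if (event.type === 'response.created') {
      // A new response started: reset the counters.
      received = 0;
      cancelled = false;
      return;
    }
    // Assumption: text arrives as 'response.text.delta' events with a 'delta' string.
    if (event.type !== 'response.text.delta' || cancelled) return;

    received += (event.delta || '').length;
    if (received > maxChars) {
      cancelled = true; // send the cancel only once per response
      send(JSON.stringify({ type: 'response.cancel' }));
    }
  };
}
```

It would be wired in next to the existing message handling, for example: const guard = createOutputGuard((msg) => ws.send(msg)); and then guard(JSON.parse(e.data)) inside ws.onmessage. The budget is counted in characters rather than tokens, so it is only a rough cap.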
Issue 4: Incoming Events Stop After response.content_part.added
After the first “response.create”, the incoming events sometimes stop once “response.content_part.added” has been received. Most of the time, every subsequent response.create in the same session is then also “broken.”
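To make this stall easier to capture in logs, a small detector can record the last server event and report when a response has been open but silent for too long. This is a diagnostic sketch only. createStallDetector and timeoutMs are hypothetical names, and it assumes a response starts with response.created and ends with response.done.

```javascript
// Hypothetical diagnostic helper, not part of the Realtime API.
function createStallDetector(timeoutMs = 5000) {
  let lastType = null;
  let lastAt = null;
  let open = false; // true between response.created and response.done

  return {
    // Record every parsed server event together with its arrival time.
    record(event, now = Date.now()) {
      lastType = event.type;
      lastAt = now;
      if (event.type === 'response.created') open = true;
      if (event.type === 'response.done') open = false;
    },
    // Returns details about the stall, or null if nothing looks stuck.
    check(now = Date.now()) {
      if (!open || lastAt === null) return null;
      const silentMs = now - lastAt;
      return silentMs >= timeoutMs ? { lastType, silentMs } : null;
    }
  };
}
```

One way to use it: call detector.record(JSON.parse(e.data)) inside ws.onmessage, and run detector.check() from a setInterval, logging any non-null result. In the situation described above, the reported lastType would be response.content_part.added.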