I’ve noticed a frustrating issue with the OpenAI Realtime Preview. For example, if I ask a question like “What does grift mean?”, the system correctly picks up the term “grift” but often provides an explanation that’s completely unrelated. This happens frequently, and it’s quite annoying to deal with.
It feels like the model struggles to stay focused on the specific question being asked, which makes it harder to trust the responses. I hope this can be improved so the system better aligns with user queries in the future.
If I understand correctly, this issue comes from the imbalance between audio and text training data: the model doesn't handle short audio inputs the way it handles text (text and audio share one embedding space, but audio has different features than the transcribed text form). I'm not sure if this is exactly what you're complaining about, but one tip: when asking a question, be more descriptive and add background and context. Instead of "What does grift mean?", try "Dear Chatbot, today I am wondering what the term 'grift' means, and when to use it and when not to?"
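If you want to make a habit of this, here's a tiny Python sketch of the idea. The `pad_query` helper, its 8-word threshold, and the wrapper template are all made up for illustration, not anything from OpenAI:

```python
def pad_query(question: str, min_words: int = 8) -> str:
    """Wrap very short questions in extra context so the model gets
    a longer, more distinctive input.

    Hypothetical helper: the threshold and the wrapper template are
    arbitrary choices for illustration.
    """
    if len(question.split()) >= min_words:
        return question
    return (
        "I have a vocabulary question. "
        f"{question} Please explain the meaning, give an example "
        "sentence, and note when the term should not be used."
    )


print(pad_query("What does grift mean?"))
```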
Okay, I am using this system instruction, and it's proving helpful if I state the topic of conversation upfront in voice mode.
"""
You shall provide comprehensive and detailed responses regardless of input length or formality. Short queries like “What is X?” should receive the same depth of analysis as longer, more formal versions. Never penalize concise questions with shallow answers. Treat all queries as equally deserving of thorough responses. Compare each new question against conversation history, identifying novel elements or different angles to ensure responses build upon rather than repeat past insights. Always respond in paragraph format and limit responses to 30 words unless explicitly requested otherwise.
"""
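If you're using the Realtime API directly rather than the app, you can apply an instruction like this with a `session.update` event. Here's a minimal sketch using the `websockets` Python package; the model name is an example, and depending on your `websockets` version the header keyword may be `extra_headers` or `additional_headers`:

```python
import asyncio
import json
import os

import websockets  # pip install websockets

INSTRUCTIONS = "You shall provide comprehensive and detailed responses..."  # the prompt quoted above


async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(url, extra_headers=headers) as ws:
        # session.update applies the system instruction to the live session
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"instructions": INSTRUCTIONS},
        }))
        # ... send audio/text events and read responses here ...


asyncio.run(main())
```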
I noticed some issues with voice recognition in Japanese conversations. This might not be exactly what you were asking about, but from what I understand:
The voice recognition seems to have trouble accurately capturing short Japanese phrases. In my experience, adding filler words like "ano-" (a Japanese filler similar to "um") or making the sentence intentionally longer improves recognition accuracy. When checking the conversation history, I've noticed that short phrases are sometimes recorded differently from what was actually said.
This seems to be an issue specific to Japanese language processing, though I'm not sure how it performs with English. Hope this helps clarify the situation, but please correct me if I've misunderstood anything!
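If you want to check this systematically, the Realtime API can transcribe your input audio and emit the result as an event, so you can compare what you said against what the model heard. A rough sketch, reusing the connection and imports from the session example above (the `whisper-1` transcription model is what the beta docs mention; treat the details as assumptions):

```python
async def watch_transcripts(ws):
    """Print what the API actually transcribed from the input audio.

    Assumes `ws` is the open websocket from the earlier sketch, and
    that input transcription has been enabled on the session, e.g.:
      {"type": "session.update",
       "session": {"input_audio_transcription": {"model": "whisper-1"}}}
    """
    async for message in ws:
        event = json.loads(message)
        if event.get("type") == "conversation.item.input_audio_transcription.completed":
            # Compare this against what you actually said; short Japanese
            # phrases are where the two seem to diverge most.
            print("heard:", event.get("transcript"))
```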