Lack of Self-Knowledge and Rampant Hallucination with OpenAI Documentation

I used deep research to gather documentation comparing different APIs and model capabilities, to speed up development of my apps that integrate with OpenAI's API. But I've noticed it hallucinates so badly when citing OpenAI documentation that I'm suspicious there are guardrails in place to hinder developers. With other topics I have had good results, and it correctly represented its cited sources.

With GPT-5.2, a 30-45 minute deep research task about the GPT-Realtime documentation, limited to 2025 sources, came back with a report that was 100% useless and incorrect. It claimed GPT-Realtime is another name for gpt-4o and is just GPT-4 with audio output. The odd part was that the citations themselves weren't wrong: I read all the cited sources, all of them were up to date, and none of them contained this incorrect information. The second part of the research task was to collect and list all realtime speech-to-speech models available today. The resulting report said the only other realtime speech-to-speech models are SeamlessM4T and Bark, neither of which is a conversational chatbot model, and both are from 2023, despite being told to only search for models released in late 2024 and 2025. After I made it aware of its mistakes, redescribed the research task, and provided 5 examples of realtime speech-to-speech models that exist, it went for another 45 minutes and came back insisting again that gpt-4o is GPT-Realtime, and instead of finding me another example of a speech-to-speech model, it repeated the information I gave it.

Other models that assist developers, like Claude Code, have very good self-knowledge even without web search; if I want to learn the capabilities, I can just ask. But nearly every GPT model is absolutely clueless about OpenAI's API features and about the models OpenAI offers, and it rampantly hallucinates.

Some people respond that this is typical behaviour of deep research, but I disagree. Deep research can usually return useful results on most topics; if this were typical, the entire model/deep-research product would be a failure. It specifically hallucinates and ignores instructions when the task is related to gathering state-of-the-art AI research, or to building voice AI chatbots, even with OpenAI's own API. It doesn't make sense that they would sabotage a model that helps customers use their product, but I guess it makes sense to nerf software developers working on AI products.

gpt-4o-realtime-xx is just another name for gpt-4o; it has gpt-4o right in the name. gpt-4o is a multimodal model that can directly generate audio and is trained on voices.

OpenAI doesn't tell us the underlying multimodal model for gpt-realtime. Given the release date and the parallel models, it is possible that it is a GPT-5 variant, but it is still more likely a tuned gpt-4o, and they wanted to get rid of the 4o branding when using a name like “gpt-realtime-mini-2025-12-15”.

Look at the unidirectional gpt-4o models most recently released with the same date and mini size:

  • tts-1, tts-1-hd, gpt-4o-mini-tts, gpt-4o-mini-tts-2025-12-15
  • gpt-4o-transcribe, gpt-4o-mini-transcribe-2025-12-15

The first thing is to not act smarter than the AI; it's there to give you expert answers.


The fault is OpenAI putting documentation behind a login and loading it dynamically. They really should have a separate documentation site that runs no client-side code and doesn't take 20 seconds to show you an example API call, even when you have a direct link to the anchor within the page.

You can answer your own question by calling the models API endpoint and filtering the returned model IDs for the word “realtime”. For free.
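For example, a minimal sketch using the official openai Python SDK, assuming your OPENAI_API_KEY is set in the environment:

```python
# List every model visible to your API key and keep the realtime ones.
# Minimal sketch using the openai Python SDK (v1.x);
# assumes OPENAI_API_KEY is set in your environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

realtime_models = [m.id for m in client.models.list() if "realtime" in m.id]
for model_id in sorted(realtime_models):
    print(model_id)
```

The same data is available without the SDK: a plain GET request to https://api.openai.com/v1/models with your key in an Authorization: Bearer header returns a JSON body whose data array contains the model IDs you can filter.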