I used Deep Research to gather documentation comparing different APIs and model capabilities, to speed up development of my apps that integrate with OpenAI's API. But I've noticed it hallucinates so badly when citing OpenAI documentation that I'm starting to suspect there are guardrails hindering developers. With other topics I've had good results, and it correctly represented its cited sources.
With GPT-5.2, a 30-45 minute deep research task about the gpt-realtime documentation, limited to 2025 sources, came back with a report that was 100% useless and incorrect. It claimed gpt-realtime is another name for GPT-4o and is just GPT-4 with audio output. The odd part is that the citations themselves weren't wrong: I read every cited source, all of them were up to date, and none of them supported this incorrect information. The second part of the research task was to collect and list all realtime speech-to-speech models available today. The resulting report said the only other realtime speech-to-speech models are SeamlessM4T and Bark, neither of which is a conversational chatbot model and both of which are from 2023, despite being told to only search for models released in late 2024 and 2025. After I pointed out its mistakes, redescribed the research task, and provided five examples of realtime speech-to-speech models that exist, it ran for another 45 minutes and came back again thinking GPT-4o is gpt-realtime, and instead of finding me another example of a speech-to-speech model it just repeated the information I had given it.
Other models that assist developers, like Claude Code, have very good self-knowledge even without web search; if I want to learn their capabilities, I can just ask. But nearly every GPT model is absolutely clueless about OpenAI's API features and the models OpenAI offers, and it hallucinates rampantly.
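For what it's worth, the one thing I can verify without trusting the model's self-knowledge is the live model list from the API itself. Here's a minimal sketch using the official Python SDK (it assumes OPENAI_API_KEY is set in the environment; the "realtime" substring filter is just my own convention for spotting the realtime family):

```python
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# List every model id the API currently exposes to this account,
# then filter by name for the realtime family.
model_ids = [m.id for m in client.models.list()]
realtime_ids = [mid for mid in model_ids if "realtime" in mid]

print(f"Total models visible: {len(model_ids)}")
print("Realtime models:", realtime_ids)
```

That at least tells me what actually exists right now, even if the model's own answers about its product lineup can't be trusted.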
Some people respond that this is typical behaviour of deep research, but I disagree. Deep research usually returns useful results on most topics; if this were typical, the entire model/deep-research product would be a failure. It specifically hallucinates and ignores instructions when the task involves gathering state-of-the-art AI research or building voice AI chatbots, even with OpenAI's own API. It doesn't make sense that they would sabotage a model that helps customers use their product, but I guess it makes sense to nerf software developers working on AI products.