Hey all,
We’ve been using gpt-4-1106-preview for the past several months and have been getting high-quality JSON responses in an academic solution. We generate a batch of questions, grouped by type, with their correct answers, all within a single JSON object. As per the documentation, we tuned several settings to get high-quality, reliable JSON output.
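For context, this is roughly the shape of the request we send. It's a sketch, not our exact production payload: the prompt text and parameter values are illustrative, but `response_format={"type": "json_object"}` is the documented JSON-mode setting we rely on.

```python
# Sketch of a JSON-mode chat completion request payload.
# Prompt text and parameter values are illustrative; the
# response_format setting is the documented JSON-mode switch.
request = {
    "model": "gpt-4-1106-preview",
    "response_format": {"type": "json_object"},  # enforce JSON output
    "temperature": 0.2,  # lower temperature for more deterministic output
    "messages": [
        {
            "role": "system",
            "content": "You are a question generator. Respond only with a JSON object.",
        },
        {
            "role": "user",
            "content": "Generate 3 questions grouped by type, with correct answers, as JSON.",
        },
    ],
}
```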
However, because of its popularity and 128k-token context limit, we moved our entire application to the brand-new GPT-4o model, and we were in for a surprise. The new model is lightning fast everywhere, but the more complex the request becomes, the more unreliable the responses become. Because all of our response handlers are programmed to parse JSON, we requested JSON from this model as well, and it does not reliably return valid JSON; it breaks in many places. Worse, when concurrent requests are sent, roughly 50% of the responses break, and we rely on the full group of responses to build and save our information. When we switched back to the older model (gpt-4-1106-preview), all our responses were accurate and valid. However, it’s slower.
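In case it helps anyone reproduce the comparison, this is roughly how we detect the broken responses. It's a minimal sketch; `validate_json_response` is our own hypothetical helper, not part of any API.

```python
import json

def validate_json_response(raw: str):
    """Return the parsed object if `raw` is valid JSON, else None.

    We run every model response through a check like this before our
    handlers touch it; invalid responses get counted and retried.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None

# A well-formed response parses; a truncated one is rejected.
good = '{"questions": [{"type": "mcq", "answer": "B"}]}'
bad = '{"questions": [{"type": "mcq", "answer": '  # truncated mid-object

parsed = validate_json_response(good)   # dict with a "questions" list
broken = validate_json_response(bad)    # None
```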
Does anyone have an accurate comparison between the two models (gpt-4-1106-preview and gpt-4o) with regard to the validity and reliability of their JSON responses? Why is the older model still better than the supposedly upgraded new one?