Hello, I am having problems with my existing prompt's results when I use the GPT-4o August 6th model (gpt-4o-2024-08-06): the number of tokens in the output drops to about a third of what it was.
To explain a little about the application I'm building: it's a product that generates long articles based on keywords entered by the user. Each article has 6 paragraphs, and each paragraph should contain at least 4-5 sentences.
Using the May 13th model (gpt-4o-2024-05-13), the output uses about 3,500 tokens, but the August 6th model uses only about 1,300 tokens, and the quality of the results has decreased.
Output with gpt-4o-2024-08-06:
{
  "model": "gpt-4o-2024-08-06",
  "result": "some error message, in my language",
  "total_tokens": 1725
}
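For anyone who wants to reproduce the comparison, here is a minimal sketch using the Python SDK; the prompt is a stand-in for my real article prompt, and it just prints the completion token counts reported in the usage field of each response:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Stand-in for the real prompt: 6 paragraphs, 4-5 sentences each.
prompt = (
    "Write a long article based on these keywords: ... "
    "The article must have 6 paragraphs, each with at least 4-5 sentences."
)

for model in ("gpt-4o-2024-05-13", "gpt-4o-2024-08-06"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(model, "->", resp.usage.completion_tokens, "completion tokens")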
I'm having the same problem. We upgraded to gpt-4o-2024-08-06 explicitly because of the marketed increase in max output tokens (4,096 => 16,384). Despite this, running the exact same input on both gpt-4o and gpt-4o-2024-08-06 produces far fewer output tokens on the newer model, sometimes a reduction of nearly 90%.
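One thing worth ruling out on the request side is a max_tokens cap. A sketch of pinning it explicitly to the advertised maximum for the new snapshot, so a request-side limit can't explain the shorter outputs (the message content is a stand-in for the real input):

from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "..."}],  # same input used on gpt-4o
    max_tokens=16_384,  # advertised maximum output for this snapshot
)
print(resp.usage.completion_tokens)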
Any updates on this? We are definitely encountering this issue with our internal benchmark tests. We are running hundreds of queries, and there is a statistically significant difference in output token length and answer quality.
The most concerning part is that the previous model (2024-05-13) has become slower since the new version was released. If the latest model isn’t going to outperform the previous model in every aspect, and if OpenAI is cutting resources for the previous versions, we will have to make trade-offs in our migration. This is not ideal.
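For anyone who wants to check this themselves, a rough sketch of this kind of benchmark, measuring mean output length and mean latency per model; the prompt list is a stand-in for a real query set:

import time
from statistics import mean

from openai import OpenAI

client = OpenAI()
prompts = ["..."]  # stand-in for a real set of hundreds of queries

for model in ("gpt-4o-2024-05-13", "gpt-4o-2024-08-06"):
    tokens, latencies = [], []
    for prompt in prompts:
        start = time.perf_counter()
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        latencies.append(time.perf_counter() - start)
        tokens.append(resp.usage.completion_tokens)
    print(f"{model}: mean completion tokens {mean(tokens):.0f}, "
          f"mean latency {mean(latencies):.2f}s")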
I had already noticed shorter, cleaner, and more direct answers from gpt-4o-2024-08-06, and I liked that for situations where I manually ask it something (I use a GUI tool to do this through the API).
However, I also have a use case where I regularly use it to convert notes on the fly into structured text with a certain writing style that my prompt describes. The current gpt-4o model (gpt-4o-2024-05-13) does a very good job of staying close to the original text and keeping every single detail of it.
gpt-4o-2024-08-06, however, produces only a bit more than half of that text, and to achieve this it leaves out details from the original notes, sometimes an entire section. When I then tell it to also consider that section, it does, but leaves out a different section instead. It seems that this model is trained to save on output at all costs, which contradicts the increased output token capability.