Hello, I am having problems with my existing prompt results when I use the GPT-4o August 6th model: the number of tokens in the output drops to roughly a third of what it was.
To give some background on the application I'm building: it generates long articles from keywords entered by the user. Each article has 6 paragraphs, and each paragraph should contain at least 4-5 sentences.
With the May 13th model the output uses about 3,500 tokens, but the August 6th model uses only about 1,300 tokens, and the quality of the results has decreased as well.
Output with gpt-4o-2024-08-06:
```json
{
  "model": "gpt-4o-2024-08-06",
  "result": "some error msg, in my language",
  "total_tokens": 1725
}
```
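In case it helps anyone reproduce this, here is a minimal sketch of the comparison I am running, assuming the OpenAI Python SDK (v1.x); the keyword, the exact prompt wording, and the max_tokens value are illustrative placeholders, not my production code:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative prompt: the real one asks for a 6-paragraph article,
# 4-5 sentences per paragraph, based on a user-supplied keyword.
prompt = (
    "Write a long article about electric cars. "
    "It must have 6 paragraphs, and each paragraph must contain "
    "at least 4-5 sentences."
)

for model in ("gpt-4o-2024-05-13", "gpt-4o-2024-08-06"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=4096,  # same cap for both calls
    )
    usage = response.usage
    print(model,
          "completion_tokens:", usage.completion_tokens,
          "total_tokens:", usage.total_tokens)
```

Both calls use the same prompt and the same cap, so the only variable is the model snapshot; that is where I see the roughly 3,500 vs. 1,300 token gap described above.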
I'm having the same problem. We upgraded to gpt-4o-2024-08-06 specifically because of the advertised increase in maximum output tokens (4,096 => 16,384). Despite this, running the exact same input on both gpt-4o and gpt-4o-2024-08-06 produces far fewer output tokens on the newer model, sometimes a reduction of nearly 90%.
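For what it's worth, this is the kind of check we ran, sketched with the OpenAI Python SDK and the 16,384-token cap passed explicitly so a conservative default limit cannot be what truncates the answer (the input string below is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    # Placeholder for the same benchmark input we send to plain gpt-4o.
    messages=[{"role": "user", "content": "<same benchmark input as before>"}],
    max_tokens=16_384,  # the advertised output ceiling for this snapshot
)

print("completion_tokens:", response.usage.completion_tokens)
print("finish_reason:", response.choices[0].finish_reason)
# finish_reason == "length" would mean the cap was hit;
# "stop" means the model chose to end the answer on its own.
```

In our case the much shorter outputs appear regardless of the cap, which is why it looks like the model itself is ending answers earlier rather than running into a token limit.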
Any updates on this? We are definitely encountering this issue with our internal benchmark tests. We are running hundreds of queries, and there is a statistically significant difference in output token length and answer quality.
The most concerning part is that the previous model (2024-05-13) has become slower since the new version was released. If the latest model isn't going to outperform the previous one in every respect, and OpenAI is cutting resources for older versions, we will have to make trade-offs in our migration. That is not ideal.