Hello,
I am using Azure OpenAI with the models gpt-4o 2024-08-06 and 2024-11-20.
I have written a small script to compare both models, allowing me to use the same input and configuration for both.
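For context, the comparison boils down to something like the following (just a sketch using the openai Python SDK; the endpoint, key, API version, prompts, and deployment names below are placeholders, not my real values):

from openai import AzureOpenAI

# Sketch only: endpoint, key, api_version, and deployment names are placeholders.
client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",
    api_key="<api-key>",
    api_version="2024-10-21",
)

SYSTEM_PROMPT = "..."  # identical system prompt for both models
USER_PROMPT = "..."    # identical user prompt for both models

for deployment in ("gpt-4o-2024-08-06", "gpt-4o-2024-11-20"):
    resp = client.chat.completions.create(
        model=deployment,  # Azure deployment name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": USER_PROMPT},
        ],
        max_tokens=16384,
    )
    # Compare token usage and finish reason across the two deployments.
    print(deployment,
          resp.usage.prompt_tokens,
          resp.usage.completion_tokens,
          resp.choices[0].finish_reason)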
I observed that the 2024-11-20 model produces drastically shorter outputs for the same input (input tokens → output tokens):
• 2024-08-06: ~5963 tokens → ~5595 tokens
• 2024-11-20: ~same input → ~1400 tokens
Additionally, the newer model frequently writes placeholders such as “Text continues with identical edits…” or similar phrasing instead of producing the full text.
I’m wondering if anyone has any tips or suggestions on how to address this issue.
Thanks in advance for any tips!
Best regards,
Tim
More details:
Request message:
Headers: {
  "Content-Type": "application/json"
}
Body: {
  "messages": [
    { "role": "system", "content": "..." },
    { "role": "user", "content": "..." }
  ],
  "max_tokens": 16384
}
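In case it matters, the request above is sent against the standard Azure chat completions endpoint, roughly like this (resource name, deployment, api-version, and key are placeholders):

import requests

# Placeholders: resource, deployment, api-version, and key are not my real values.
url = ("https://<resource>.openai.azure.com/openai/deployments/"
       "gpt-4o-2024-11-20/chat/completions?api-version=2024-10-21")
headers = {"Content-Type": "application/json", "api-key": "<api-key>"}
body = {
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
    ],
    "max_tokens": 16384,
}

resp = requests.post(url, headers=headers, json=body, timeout=120)
print(resp.json()["usage"])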
Response message:
Body: {
  "choices": [
    {
      "content_filter_results": { /* all safe and false */ },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": " \n\n... \n\n[Text continues with identical edits ensuring tense consistency throughout] ",
        "refusal": null,
        "role": "assistant"
      }
    }
  ],
  "model": "gpt-4o-2024-11-20",
  "object": "chat.completion",
  "prompt_filter_results": [ /* all safe and false */ ],
  "usage": {
    "completion_tokens": 1398,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens": 5963,
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    },
    "total_tokens": 7361
  }
}