GPT-4o 2024-08-06 - Context Output 16k Tokens - My Requests Max Tokens Around ~3k

Hello everyone,

According to the documentation for GPT-4o, the output window is approximately 16k tokens. However, in my requests, I can’t seem to generate responses that exceed 3.1k tokens.

Could someone kindly guide me on this?

Thank you!

[
{
“id”: “chatcmpl-AjVlynmQu4R7OXV5ky8iau295QcVa”,
“object”: “chat.completion”,
“created”: 1735410258,
“model”: “gpt-4o-2024-08-06”,
“choices”: [
{
“index”: 0,
“message”: {
“role”: “assistant”,
“content”: "{content} ",
“refusal”: null
},
“logprobs”: null,
“finish_reason”: “stop”
}
],
“usage”: {
“prompt_tokens”: 2699,
“completion_tokens”: 2542,
“total_tokens”: 5241,
“prompt_tokens_details”: {
“cached_tokens”: 2432,
“audio_tokens”: 0
},
“completion_tokens_details”: {
“reasoning_tokens”: 0,
“audio_tokens”: 0,
“accepted_prediction_tokens”: 0,
“rejected_prediction_tokens”: 0
}
},
“system_fingerprint”: “fp_d28bcae782”
}
]

Hello @guile.brazil,

Welcome to the forum.

The maximum output tokens is the upper limit up to which the model can generate tokens. Any outputs exceeding this limit will have the ”finish_reason” : “length”.

In your case, the completion reached its logprob-ablistic end, i.e., the model finished generating tokens and emitted the default stop sequence.

2 Likes

Can you share your prompt?

Also, the latest 4o usually gives me longer output if I feed it a good prompt.

1 Like