4096 completion token limit with gpt-4o. Using assistant streaming API

I chat with OpenAI support and they confirmed that the gpt-4o has a limit of 4096 completion tokens and I should use a strategy to work around it. (btw, It is not documented on the model card). Now, I am trying to catch when the assistant streaming api call hits the limit. But, it seems to me that there is a discrepancy between the documentation at https://platform.openai.com/docs/assistants/deep-dive/run-lifecycle and how the API actually behaves.

The documentation says:

incomplete: Run ended due to max_prompt_tokens or max_completion_tokens reached. You can view the specific reason by looking at the incomplete_details object in the Run.
in_progress: While in_progress, the Assistant uses the model and tools to perform steps. You can view progress being made by the Run by examining the [Run Steps](https://platform.openai.com/docs/api-reference/runs/step-object).

How the API behaves:
To be clear, this is my Python code:

with OpenAIClient.api.beta.threads.runs.stream(**stream_params) as stream:
    run = stream.get_final_run()

A) When I don’t set max_completion_tokens, or set it to a number higher than 4096, the run finishes without any errors, with a “completed” status. Everything, including the “usage” information, is available. The only way to see the problem is to look at the end of the message and check whether the sentence, JSON, or whatever result I am expecting is cut off.
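For case A, the only signal is the message text itself. When I am expecting JSON output, one heuristic I could imagine is simply trying to parse the reply: a parse failure is a strong hint the output was cut off. (This is my own sketch; `looks_truncated_json` is a hypothetical helper, not part of the OpenAI SDK.)

```python
import json


def looks_truncated_json(text: str) -> bool:
    """Heuristic: a reply that was supposed to be JSON but does not
    parse is likely cut off mid-output. Hypothetical helper, not an
    SDK function; only useful when the expected output is JSON."""
    try:
        json.loads(text)
        return False
    except json.JSONDecodeError:
        return True
```

Of course this only works for structured output; for free-form prose there is no reliable client-side check, which is why I moved on to case B.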

B) To work around this, I tried setting max_completion_tokens to 4096. But, contrary to what the documentation says, the run finishes with an “in_progress” status rather than “incomplete”. The run also does not return “incomplete_details”. However, I found that the message object does return incomplete_details with the reason “max_tokens”. So the following code seems to work for detecting that the message has been cut off and that I need to apply my strategy:

if run.status in ("in_progress", "incomplete") \
        and final_messages \
        and final_messages[0].status == "incomplete" \
        and final_messages[0].incomplete_details.reason == "max_tokens":
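For readability, the same check could be packaged as a small helper. This is just my own sketch (`is_truncated_run` is a name I made up); the attribute access mirrors the Run and Message objects I see coming back from the SDK, including the observation above that the run itself may stay “in_progress”:

```python
def is_truncated_run(run, final_messages) -> bool:
    """Return True when the run appears to have hit max_completion_tokens.

    Checks both the run status and the first message's incomplete_details,
    since (as observed above) the run may report "in_progress" rather than
    "incomplete" when the limit is reached.
    """
    if run.status not in ("in_progress", "incomplete"):
        return False
    if not final_messages:
        return False
    msg = final_messages[0]
    return (
        msg.status == "incomplete"
        and msg.incomplete_details is not None
        and msg.incomplete_details.reason == "max_tokens"
    )
```

The guard on `incomplete_details is not None` is there because the field can be absent on messages that finished normally.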

I just want to check: do you see any issues with this approach? Am I on the wrong path? Have you encountered this issue?
