Error Encountered When Using max_tokens Parameter with GPT-4 API

Issue Description: I am encountering a 400 Bad Request error when attempting to use the max_tokens parameter in a request to the GPT-4 API. The issue occurs when I try to limit the number of tokens in the response by sending the following request:

"model": "gpt-4",
        "messages": [
            {
                "role": "system",
                "content": "Ты бот-помощник для составления списков продуктов. Твоя главная задача давать точные ответы на запросы пользователя."
            },
            {
                "role": "user", 
                "content": f"{message}"
            },
        ],
        "max_tokens": 2048

When I remove the max_tokens parameter from the request, the response is returned without any errors.

"model": "gpt-4",
        "messages": [
            {
                "role": "system",
                "content": "Ты бот-помощник для составления списков продуктов. Твоя главная задача давать точные ответы на запросы пользователя."
            },
            {
                "role": "user", 
                "content": f"{message}"
            },
        ],
        #"max_tokens": 2048

I have reviewed the documentation but found no clear information on how to correctly use the max_tokens parameter with the GPT-4 API. I would appreciate assistance in resolving this issue.

To add context, here is the real reason I am using this parameter:

I have been encountering an issue with the GPT-4 API where generated responses are truncated prematurely, even when the total tokens used are under the specified or default limit. The finish_reason field in the API response is length, which indicates the response was cut off for reaching a token limit, even though the total token count has not reached that limit.

The response received is truncated and the finish_reason parameter returns length. The completion_tokens count in the response is well below the maximum limit, yet the response ends abruptly without completing the information requested.
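
For reference, this is roughly how I read those fields out of the raw JSON response (a minimal sketch; response is the result of the POST shown earlier):

    # Inspect why the completion stopped and how many tokens were used.
    data = response.json()
    choice = data["choices"][0]

    print(choice["finish_reason"])             # "length"
    print(data["usage"]["prompt_tokens"])      # tokens consumed by the input
    print(data["usage"]["completion_tokens"])  # well below the 2048 limit
    print(data["usage"]["total_tokens"])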

I have tried varying the max_tokens parameter, but the issue persists. When max_tokens is not specified, the default behavior seems to truncate responses prematurely. I am seeking a solution to receive complete responses for the queries sent to the GPT-4 API, without any arbitrary truncation before reaching the token limit.

I have noticed similar issues being reported by other users on the OpenAI Developer Forum, where responses are being truncated with finish_reason indicating length, despite the total tokens being under the maximum limit.

I would appreciate any assistance or guidance on how to resolve this issue to ensure complete and detailed responses from the GPT-4 API. Please let me know if there’s any additional information required to diagnose and address this issue.

It seems the key point is that the token limit applies to input and output combined, i.e. if I have a large input query, the output will be cut off?

Welcome to the OpenAI community @defendershow

The reason you’re encountering length as the finish_reason is that your input is large enough to consume most of the model’s context length, which leaves little room for the generated response and causes it to be truncated.

The max_tokens value is checked before sampling, and in your case:
input + max_tokens > context length
Hence it results in a 400 error. (For example, with the 8,192-token context of the base gpt-4 model, a 7,000-token input plus max_tokens = 2048 exceeds the window.)

Also, in the case of chat completions, all of the context length remaining after the input is set as max_tokens automatically.
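
If you want to keep an explicit max_tokens, a rough way to avoid the 400 is to count the input tokens first and cap max_tokens to whatever remains of the context window, e.g. with tiktoken. This is only a sketch: the 8192 context length assumes the base gpt-4 model, payload stands for the request body dict from your post, and the per-message overhead is an approximation.

    import tiktoken

    CONTEXT_LENGTH = 8192  # base gpt-4; use 32768 for gpt-4-32k

    enc = tiktoken.encoding_for_model("gpt-4")

    # Count tokens in the input messages (`payload` is the request body dict).
    prompt_tokens = sum(len(enc.encode(m["content"])) for m in payload["messages"])
    # Chat formatting adds a few tokens per message; this is an approximation.
    prompt_tokens += 4 * len(payload["messages"]) + 3

    # Cap max_tokens so that input + max_tokens stays within the context length.
    payload["max_tokens"] = min(2048, CONTEXT_LENGTH - prompt_tokens)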
