Feature Request - Improved Handling of Maximum Context Length in Create Endpoints

Hi everyone,

I have a feature request for the OpenAI API: please consider adding a flag to the create endpoints that, when enabled, automatically truncates input tokens that exceed the model’s maximum context length.

This would prevent the InvalidRequestError that is raised today and make integration smoother, providing a more seamless developer experience (DX).
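
To make the idea concrete, here is roughly what I am imagining (a sketch only: the truncate parameter below is the proposal itself, not something the API accepts today):

```python
import openai

messages = [
    {"role": "user", "content": "..."},  # potentially a very long conversation history
]

# "truncate" is the proposed flag, NOT an existing parameter.
# When True, the API would drop the oldest messages server-side instead of
# raising InvalidRequestError when the prompt exceeds the context window.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    truncate=True,  # proposed; would default to False
)
```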

With this feature, users would not need to manually truncate their input prompts to fit within the context length limit; the flag would handle truncation automatically and keep the conversation within the allowed token range. It would also spare developers hacky workarounds like the following:

I would say do what the error message says. Reduce your input prompt by truncating the older messages.

Be proactive in estimating your input prompt and keep it under a certain level. You can do this with the estimate W = T / sqrt(2), where W is the number of English words and T is the number of tokens.

In your case, T = 1800 (or 1700 for more margin). A budget of 1700 tokens works out to about 1200 words, so if you count more than 1200 words, drop the older history until it fits under 1200 words.

Source: https://community.openai.com/t/gpt-3-how-to-reset-context-length-after-error/
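
In code, that heuristic amounts to something like the sketch below (the words-per-token ratio is only a rough estimate for English text):

```python
import math

def trim_by_word_count(messages, token_budget=1700):
    """Estimate W = T / sqrt(2) English words for a budget of T tokens,
    then drop the oldest messages until the word count fits."""
    word_budget = int(token_budget / math.sqrt(2))  # 1700 tokens -> ~1200 words
    trimmed = list(messages)
    while trimmed and sum(len(m["content"].split()) for m in trimmed) > word_budget:
        trimmed.pop(0)  # oldest message first
    return trimmed
```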

The counter to that is to use a token counter such as tiktoken prior to sending your message to the API.
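
For example (a minimal sketch; the model name is just an illustration, and a full chat payload also carries a few tokens of per-message overhead that this ignores):

```python
import tiktoken

def count_tokens(text, model="gpt-3.5-turbo"):
    """Count tokens exactly as the model's tokenizer would."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

print(count_tokens("How many tokens is this sentence?"))
```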

The same problem remains: performing such calculations shouldn’t be required. You could still do it yourself when you want custom control over truncation, but for the standard case (dropping the oldest messages first) a flag would suffice.
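
And that standard truncation is exactly the boilerplate every caller has to write today, something like this sketch (it ignores per-message formatting overhead, so real code should keep some margin):

```python
import tiktoken

def truncate_oldest_first(messages, max_tokens=4000, model="gpt-3.5-turbo"):
    """Drop the oldest messages until the conversation fits the token budget.
    A real implementation would probably also pin the system message in place."""
    enc = tiktoken.encoding_for_model(model)

    def total_tokens(msgs):
        return sum(len(enc.encode(m["content"])) for m in msgs)

    trimmed = list(messages)
    while len(trimmed) > 1 and total_tokens(trimmed) > max_tokens:
        trimmed.pop(0)  # oldest message first
    return trimmed
```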

Ultimately, the goal is to give API users an experience similar to the ChatGPT interface (you don’t delete old messages manually until the chat history fits, right?).

Well, you run into other problems when doing it that way. For example, a novice API user sends text that is too large and relies on some data from the start of the conversation that has now been truncated away; unaware of the truncated flag, they spend a long time trying to debug why some of their messages work and some don’t. Leaving things as they are places the responsibility for token management firmly on the caller, which actually reduces potential confusion.

The truncated flag would be false by default, so callers who don’t opt in would see no change in behaviour.
