Maximum Context Length Error with gpt-3.5-turbo-16k Models

Hi!

I am experiencing an error while working with the gpt-3.5-turbo-16k and the gpt-3.5-turbo-16k-0613 models, which are supposedly designed to handle a maximum context of 16,385 tokens. However, when I try to process a request that consists of 12,644 tokens, I’m encountering an “InvalidRequestError”.

The error message I receive is as follows:

“openai.error.InvalidRequestError: This model’s maximum context length is 8191 tokens, however you requested 12644 tokens (12644 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.”

I’m a bit confused about this, as my understanding is that the gpt-3.5-turbo-16k models should be able to handle up to 16,385 tokens. Could anyone shed some light on why I’m getting this error and how to rectify it?

Any assistance or guidance would be greatly appreciated.

Thank you.

Hi,

Can you post the code that is producing this error, please?

Hello Foxabilo,

Thank you for your prompt response. I appreciate your willingness to assist.

Here is the actual code snippet that I’m using:

import openai

openai.api_key = OPENAI_API_KEY  # API key defined elsewhere in my script

system_message = "You are a helpful assistant."
user_message = f"Here's a text. Please add more relevant details to it: {text}"  # `text` is defined earlier

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ],
)

print(response["choices"][0]["message"]["content"])

Please do let me know if you require additional information. Thank you again for your help!

Well, I just tested this bit of code

response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo-16k',
    messages=[{"role": "user", "content": prompt}],  # `prompt` is a test input defined elsewhere
    max_tokens=13000,
)

and it ran perfectly. I can only think that you are somehow calling the non-16k version. Do you only have one API call in your code?

I also notice you do not have max_tokens in your call; that might be something to try. It could be defaulting to 8k if you do not specify it.
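Something like this, roughly, is what I mean (assuming `openai` is already imported, your API key is set, and `prompt` is your existing input variable; the 13000 is just an illustrative value, not a recommendation). Checking the model echoed back in the response should also confirm whether the 16k variant is really being used:

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=13000,  # explicit completion budget; pick a value that fits your prompt
)

# The response reports which model actually handled the call.
print(response["model"])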


I’ve incorporated the ‘max_tokens’ parameter into my function call and also added a token count function to ensure that the total tokens, including the prompt and completion, do not exceed the model’s maximum context length. Here is my updated code:

import tiktoken
import openai

openai.api_key = OPENAI_API_KEY  # API key defined elsewhere in my script

# Documented context window for gpt-3.5-turbo-16k
MAX_TOKENS_CONTEXT = 16385

def count_tokens(string: str, encoding_name: str) -> int:
    """Return the number of tokens in a string for the given encoding."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

system_message = "You are a helpful assistant."
user_message = f"Here's a text. Please add more relevant details to it: {text}"  # `text` is defined earlier

complete_prompt = system_message + user_message

# gpt-3.5-turbo models use the cl100k_base encoding (not gpt2),
# so count with the matching tokenizer.
prompt_tokens = count_tokens(complete_prompt, "cl100k_base")

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ],
    # Reserve whatever is left of the context window for the completion.
    max_tokens=MAX_TOKENS_CONTEXT - prompt_tokens,
)

print(response["choices"][0]["message"]["content"])

As you had pointed out, not specifying 'max_tokens' seems to cause the call to default to 8191 tokens rather than the full 16,385 the model is capable of handling.
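One thing I may still refine: counting the concatenated system and user strings slightly undercounts the real prompt size, because each chat message also carries a few formatting tokens. A rough per-message counter, based on my reading of the tiktoken cookbook examples (the exact overhead values are an assumption and can differ between model versions), would look like this:

def num_tokens_from_messages(messages, encoding_name="cl100k_base"):
    # Count tokens per message, adding an assumed fixed overhead for the
    # chat formatting tokens that wrap each message. The values 3 and 3
    # follow the cookbook's figures for gpt-3.5-turbo-0613-style models
    # and may not be exact for other versions.
    encoding = tiktoken.get_encoding(encoding_name)
    tokens_per_message = 3
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():
            num_tokens += len(encoding.encode(value))
    num_tokens += 3  # assumed priming tokens for the assistant's reply
    return num_tokens

With that, max_tokens could be set to MAX_TOKENS_CONTEXT minus the per-message count instead of the plain string count.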

Thank you very much for your help and your insightful suggestions. I really appreciate your time and expertise!
