API | Max Token Error | Tier 4 | Fluctuating between 128000 and 4096

My account is on Tier 4.

I tried using max_tokens=128000 and got the following error:

This model's maximum context length is 128000 tokens. However, you requested 128295 tokens (295 in the messages, 128000 in the completion). Please reduce the length of the messages or completion.

So I used GPT2Tokenizer.from_pretrained("gpt2") to get the token count of my message and subtract it from my max tokens.

I reran it and got the following error:

max_tokens is too large: 127688. This model supports at most 4096 completion tokens, whereas you provided 127688.

It seems the goalposts for my maximum allowable tokens have been moved?

import openai
from transformers import GPT2Tokenizer

sk = "YOUR_API_KEY"   # placeholder; set your real key
temperature = 0.7     # was undefined in my original snippet
model_id = 'gpt-4-1106-preview'

def count_tokens(message):
    # Count tokens with GPT-2's tokenizer (the wrong one, as it turns out)
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    tokens = tokenizer.encode(message)
    return len(tokens)

def chat_gpt_conversation(conversation_log, api_key=sk):
    try:
        # Concatenate message contents to estimate the prompt's token count
        text = ""
        for i in conversation_log:
            text += i['content']
        context_count = count_tokens(text)

        openai.api_key = api_key
        response = openai.ChatCompletion.create(
            model=model_id,
            messages=conversation_log,
            temperature=temperature,
            # Requests all remaining context as output, which fails
            max_tokens=128000 - context_count
        )
        return response

    except Exception as e:
        print(e)
        return None

You’ve exceeded the model’s limitations in two different ways and got two different API errors:

  • First you went over the total context length.
  • Then you exceeded the maximum output that is allowed.

You are counting the tokens wrong.

GPT-4 doesn’t use GPT-2’s tokenizer; it uses a token encoder three generations beyond that, called cl100k_base. The library to use is tiktoken.
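For instance, a drop-in replacement for the count_tokens above (a minimal sketch; note that each chat message also adds a few formatting tokens, so this still slightly undercounts the true prompt size):

import tiktoken

# cl100k_base is the encoding used by gpt-4 and gpt-3.5-turbo models
encoding = tiktoken.get_encoding("cl100k_base")

def count_tokens(message):
    return len(encoding.encode(message))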

Solution: set max_tokens to 4096 or less, as that is all the output that OpenAI allows, and you will need to use jailbreak prompt engineering to get this gimped model to output anywhere near that.

Yes, you read that right: the maximum that gpt-4-turbo models (with 128k context) will output is 4k tokens. For OpenAI, input is near-free to process yet billable; output is what costs compute.
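Concretely, the fix in the code above is one line (a sketch reusing the same variables as the original snippet):

response = openai.ChatCompletion.create(
    model=model_id,
    messages=conversation_log,
    temperature=temperature,
    # Output is capped at 4096 tokens; also leave room for the prompt
    max_tokens=min(4096, 128000 - context_count)
)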


How about this response reported on Reddit? “As an AI developed by OpenAI I aim to follow guidelines and policies that prioritize ethical considerations, user safety, and the responsible use of AI. One of these guidelines restricts me from generating full, complete solutions for complex tasks, especially when they involve multiple advanced technologies like image processing, machine learning, and database management.”


Thank you for taking the time to explain. Greatly appreciated.

Happy programming to you too!


If you want more fun inspiration: imagine sending the current question and a bit of context to gpt-3.5-turbo, along with the size of the input context, and asking it to classify the complexity of the answer needed.

With enough rules, looking at whether you want to chat, code, summarize, or write at length, that AI could make a per-input determination of the best of gpt-3.5-turbo-1106 (16k context but 4k out), gpt-3.5-turbo-16k, GPT-4 (smarter), or gpt-4-preview (turbo), especially when the last can cost over $1 per call.
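A minimal sketch of that idea (the task labels, classifier prompt, and routing table here are all illustrative assumptions, not a tested recipe):

# Hypothetical routing table: task label -> model (illustrative only)
MODEL_BY_TASK = {
    "chat": "gpt-3.5-turbo-1106",
    "summarize": "gpt-3.5-turbo-16k",
    "code": "gpt-4",
    "long_form": "gpt-4-1106-preview",
}

def classify_task(question, context_snippet):
    # Ask the cheap model to label the request with one word
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content":
                "Classify the user's request as exactly one of: "
                "chat, summarize, code, long_form. Reply with the label only."},
            {"role": "user", "content": question + "\n\nContext:\n" + context_snippet},
        ],
        temperature=0,
        max_tokens=5,
    )
    return response["choices"][0]["message"]["content"].strip()

def pick_model(question, context_snippet):
    label = classify_task(question, context_snippet)
    # Fall back to the cheapest model on an unexpected label
    return MODEL_BY_TASK.get(label, "gpt-3.5-turbo-1106")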