Gpt-4-1106-preview 16385 max context tokens? (not output, total)

gpt-4-1106-preview shows a context window of 128k tokens on the API docs, but I am getting the following error when hitting the API:

This model’s maximum context length is 16385 tokens. However, your messages resulted in 18572 tokens (18487 in the messages, 85 in the functions). Please reduce the length of the messages or functions.

I am on usage tier 4 and not sure why this is happening when my request for output tokens is well under 4096. I could not find any answers when searching the forum.
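One thing that helps when debugging these errors is estimating the request size before sending it. The exact tool for this is tiktoken, but here is a dependency-free sketch using a rough characters-per-token stub (the helper names and the 4-chars-per-token rule of thumb are my own approximations, not the API's accounting); note the serialized function/tool definitions count against the context window too, as the error message shows:

```python
import json

def rough_token_count(text: str) -> int:
    # Rough approximation: ~4 characters per token for English text.
    # For exact counts, use tiktoken's encoding_for_model().
    return max(1, len(text) // 4)

def request_token_estimate(messages, tools=None) -> int:
    """Estimate total prompt tokens: message contents plus tool definitions."""
    total = 0
    for m in messages:
        total += rough_token_count(m.get("content", ""))
        total += 4  # approximate per-message framing overhead
    if tools:
        # Function/tool definitions are serialized into the prompt as well.
        total += rough_token_count(json.dumps(tools))
    return total

messages = [{"role": "system", "content": "x" * 400}]
print(request_token_estimate(messages))  # prints 104 with this approximation
```

If the estimate is near the model's advertised window, trim before calling rather than waiting for the 400 error.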

I just tried this out:

trial 1: expected error, expected message

Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted in 131079 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}

Here is code set to send 18490 tokens in the messages and 27 in the functions. It is set to gpt-3.5-turbo, which will give the expected error. Switch the commented lines from gpt-3.5 to gpt-4 if you want to pay $0.17 a test (nobody’s paying my API bill…)

import openai
from openai import OpenAI
client = OpenAI()

tools = [
  {
    "type": "function",
    "function": {
      "name": "disable",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
          },
        },
      },
    },
  },
]

msg = "!@" * (2**16)  # 131072 tokens
msg = "!@" * 9242  # string is two tokens

completion = None
try:
    completion = client.chat.completions.create(
        # model="gpt-4-1106-preview", max_tokens=1,
        model="gpt-3.5-turbo", max_tokens=1,
        messages=[
            {"role": "system", "content": msg},
        ],
        tools=tools,
    )
except Exception as e:
    print(e)
if completion:
    print(completion)
You can also comment out the msg = "!@" * 9242 line, which will send more than the full advertised context length and produce the first error message I show above.

Interesting: I tried it on another box with the 1.x OpenAI Python package and it works. On our production machine we’re on the 0.x version, and upgrading the package introduces breaking changes. Guess we’ll have to refactor, get that going here shortly, and try again.
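For anyone facing the same refactor, the main call-site change from 0.x to 1.x is moving from the module-level openai.ChatCompletion.create to a client instance's client.chat.completions.create (the wrapper function names below are illustrative, and no API call is made here):

```python
def old_style_call(openai_module, msg):
    # openai 0.x: module-level call, configured via openai.api_key
    return openai_module.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": msg}],
    )

def new_style_call(client, msg):
    # openai 1.x: instantiate OpenAI() once, then call through the client;
    # responses are typed objects (completion.choices[0].message.content)
    # rather than plain dicts.
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": msg}],
    )
```

Exception classes moved as well in 1.x, so any except clauses around the old openai.error types need updating too.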
