Why am I hitting a 300,000-token limit on GPT-4.1, which should have a 1M context length?

Hello everyone

  • I am sending requests to the new GPT-4.1 model via the LangSmith playground (see the attached picture).
  • There are around 330,000 tokens in the message.
  • From this I get an error with the following message:

openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 300000 tokens. However, your messages resulted in 330294 tokens (including 57 in the response_format schemas.). Please reduce the length of the messages or schemas.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}

According to the documentation there should be a 1M-token context window, am I right?
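In case you want to verify the token count on your side before sending, here is a minimal sketch using tiktoken (assuming the o200k_base encoding used by the GPT-4o/4.1 family; the server-side count can differ slightly because of message framing and schema tokens):

import tiktoken

# Assumption: GPT-4.1 models use the o200k_base encoding; the API's own
# count may be a bit higher due to message framing and response_format schemas.
enc = tiktoken.get_encoding("o200k_base")

def count_tokens(text: str) -> int:
    """Return the approximate token count for a single message string."""
    return len(enc.encode(text))

print(count_tokens("your very long message here"))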

And yes, I am 100% sure I am using GPT-4.1 :smiley: :

  • invocation_params
      • _type: "openai-chat"
      • model: "gpt-4.1-mini"
      • model_name: "gpt-4.1-mini"
      • response_format: "<class 'backend.edmund.tools.eplan_tool.pydantic_models.RerankingFormat'>"
      • stop: null
      • stream: false
      • temperature: 0.1
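For reference, invocation params like the ones above typically come from a LangChain ChatOpenAI call along these lines (a sketch only; RerankingFormat's fields are hypothetical stand-ins, since only its import path appears in the params):

from langchain_openai import ChatOpenAI
from pydantic import BaseModel

# Hypothetical stand-in for the real RerankingFormat model; its actual
# fields are not shown in this thread.
class RerankingFormat(BaseModel):
    ranked_ids: list[int]

llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0.1)
# Passing a Pydantic class as the structured-output schema is what produces
# the response_format entry seen in the invocation params above.
structured_llm = llm.with_structured_output(RerankingFormat)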


Hi @jakub.szlaur

I was unable to reproduce this in my tests.

CompletionUsage(completion_tokens=23, prompt_tokens=343222, total_tokens=343245, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))
Here's the code I used:

import openai
from pydantic import BaseModel

# longst is a local module exposing a single very long test string
from longst import longst

client = openai.OpenAI()


class Typos(BaseModel):
    pos: str
    word: str
    correction: str


# Triple the long string to push the prompt well past 300k tokens
text = longst * 3

r = client.beta.chat.completions.parse(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": text}],
    response_format=Typos,
)

print(r.usage)

longst is just a very long string, i.e. around 141k tokens long.
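If you don't have a comparable string handy, a quick stand-in (assumed filler text; the exact token count will vary with the tokenizer):

# Assumption: roughly 10 o200k_base tokens per repetition, so ~140k tokens total
longst = "The quick brown fox jumps over the lazy dog. " * 14000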

Can you check whether the issue occurs when you run this code?
