GPT-4o Context Window is 128K but Getting error model's maximum context length is 8192 tokens, however you requested 21026 tokens

udaykumarj · June 5, 2024, 7:21am

We have a paid OpenAI API subscription (currently Tier 1) and call the models through the API from our Python code. I am using the GPT-4o model to perform RAG over our custom data. But when I take a 10-page document and ask a question about it, I get the following error:

Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 21026 tokens (21026 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

My question is: GPT-4o has a context window of 128K, so ideally I should not get the above error.
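
For anyone comparing their own numbers against this error, here is a minimal sketch that pulls the figures out of the message text. The function name `parse_context_error` is hypothetical, and the regular expression simply assumes the wording quoted above; it is not part of any OpenAI library.

```python
import re

# Matches the wording of the error message quoted in this post.
_PATTERN = re.compile(
    r"maximum context length is (\d+) tokens, however you requested (\d+) tokens "
    r"\((\d+) in your prompt; (\d+) for the completion\)"
)


def parse_context_error(message: str) -> dict:
    """Extract the token figures from a context-length error message."""
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like a context-length error")
    limit, requested, prompt, completion = map(int, match.groups())
    return {
        "limit": limit,
        "requested": requested,
        "prompt": prompt,
        "completion": completion,
        "overflow": requested - limit,  # tokens over the reported limit
    }


message = (
    "This model's maximum context length is 8192 tokens, however you requested "
    "21026 tokens (21026 in your prompt; 0 for the completion). "
    "Please reduce your prompt; or completion length."
)
info = parse_context_error(message)
print(info)
```

The reported limit (8192) does not match a 128K window, so it may be worth checking which model actually received the failing request.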

Diet · June 5, 2024, 7:25am

Welcome to the community!

There are multiple things that could be going wrong here. Could you post your entire request? (Don’t forget to take out your API key.)

udaykumarj · June 5, 2024, 7:54am

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0, max_tokens=256)

We pass 'llm' to the ServiceContext of LlamaIndex.
We load our custom data and pass it to GPTVectorStoreIndex.
Then, using LlamaIndex, we run inference over the custom data.
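
The thread does not establish the cause, but the error reports 0 completion tokens even though max_tokens=256 is set here, so one possibility (a guess, not confirmed) is that the failing request went to a different model in the pipeline, for example during an embedding step. Either way, splitting documents into smaller chunks before indexing keeps each request small. Below is a plain-Python sketch of that idea. It is not the LlamaIndex API (which has its own splitters); `split_into_chunks` is a hypothetical helper, and the "about 3 words per 4 tokens" ratio is only a rough rule of thumb for English text, not a real tokenizer.

```python
def split_into_chunks(text: str, max_tokens: int = 512) -> list[str]:
    """Split text into whitespace-delimited chunks under an approximate token budget.

    Assumption: roughly 3 words per 4 tokens. Use a real tokenizer when an
    exact count matters.
    """
    words = text.split()
    max_words = max(1, max_tokens * 3 // 4)  # approximate word budget per chunk
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


document = "word " * 1000  # stand-in for a long document
chunks = split_into_chunks(document, max_tokens=400)
print(len(chunks), [len(c.split()) for c in chunks])
```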

merefield · June 5, 2024, 8:06am

I think the community would find it helpful if you could share the ultimate REST calls made to the API - can you get those from logs?
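
If you do post logs, a small sketch like the following can strip credentials first. It assumes keys appear either after a "Bearer" token in an Authorization header or as strings starting with "sk-"; both patterns and the `redact_secrets` name are illustrative, so check the output by eye before sharing.

```python
import re

# Assumed shapes of secrets in a log line; adjust to what your logs contain.
_BEARER_PATTERN = re.compile(r"(Bearer\s+)\S+", re.IGNORECASE)
_KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_\-]{8,}")


def redact_secrets(log_text: str) -> str:
    """Replace bearer tokens and sk-style keys with a placeholder."""
    redacted = _BEARER_PATTERN.sub(r"\1[REDACTED]", log_text)
    return _KEY_PATTERN.sub("[REDACTED]", redacted)


sample = (
    "POST /v1/chat/completions "
    "Authorization: Bearer sk-abc123def456ghi789 "
    '{"model": "gpt-4o"}'
)
cleaned = redact_secrets(sample)
print(cleaned)
```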

anon22939549 · June 5, 2024, 11:56pm

My guess is this is a langchain issue. Make sure you’re updated to the latest version or use the official OpenAI library.

udaykumarj · June 6, 2024, 6:46am

We are using an up-to-date LangChain library in our code. We also updated the openai library to the latest version and, using the following approach, still got this error:
"'ChatCompletion' object has no attribute 'system_prompt'" (the error comes from Step 1 below)

from openai import OpenAI

client = OpenAI(api_key="OPEN_AI_KEY_XXXXXXXX")
response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=256,
    temperature=0,
    messages=[
        {"role": "system", "content": "You are a helpful assistant for question-answering tasks."},
        {"role": "user", "content": user_input},
    ],
)

Step 1: Passing the 'response' object to the ServiceContext object of LlamaIndex
Step 2: Loading data from Elasticsearch using LlamaIndex
Step 3: Passing the ServiceContext object to the GPTVectorStoreIndex class of LlamaIndex
Step 4: Creating the index using LlamaIndex
Step 5: Creating the query_engine
Step 6: Running the query with the query engine over the custom documents loaded from Elasticsearch

anon22939549 · June 6, 2024, 7:17am

It’s a langchain issue.

You’ll be more likely to get help there.

vupham · July 4, 2024, 5:02am

I’m currently using the GPT-4 API with a 4K token limit, as confirmed in the Playground. How can I increase the maximum token count to 128K?

_j · July 5, 2024, 1:48am

The setting you see is for the maximum response length, not the total model context length.

All models since November 2023 have a cap on the length OpenAI allows them to produce as output. That makes them large-input, low-output models (and input is now processed very fast, with little attention paid to it).
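
To make the distinction concrete, here is a small sketch that checks a request against both limits separately. The 128K context window and 4K output cap are the figures mentioned in this thread and are treated as assumptions here; `ModelLimits` and `check_request` are hypothetical names, and actual limits vary by model version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    context_window: int     # total tokens: prompt plus completion
    max_output_tokens: int  # cap on the completion alone


# Assumed figures taken from this thread, not looked up from the API.
ASSUMED_GPT_4O = ModelLimits(context_window=128_000, max_output_tokens=4_096)


def check_request(prompt_tokens: int, max_tokens: int, limits: ModelLimits) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if max_tokens > limits.max_output_tokens:
        problems.append(
            f"max_tokens {max_tokens} exceeds output cap {limits.max_output_tokens}"
        )
    if prompt_tokens + max_tokens > limits.context_window:
        problems.append(
            f"prompt + completion {prompt_tokens + max_tokens} exceeds "
            f"context window {limits.context_window}"
        )
    return problems


# The request from the original post fits a 128K window...
print(check_request(21_026, 256, ASSUMED_GPT_4O))
# ...but not a model with an 8192-token window.
print(check_request(21_026, 256, ModelLimits(8_192, 4_096)))
```

Raising the Playground slider only changes max_tokens, which is bounded by the output cap; it does not change how much input the model accepts.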

hxrdxk · October 21, 2024, 6:56am

Hey, same error. Have you found a solution?
