We have subscribed to a paid OpenAI API key for accessing OpenAI models from our Python code, and we are currently on Tier 1. I am using the 'gpt-4o' model and performing RAG over our custom data. But when I take a 10-page document and ask a question over it, I get the following error:
Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 21026 tokens (21026 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
My question is: GPT-4o has a context window of 128K tokens, so ideally I should not be getting the error above.
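One way to see why the request fails is to estimate the prompt's token count before sending it. This is a minimal sketch using a rough ~4-characters-per-token rule of thumb for English text (an assumption, not the model's real tokenizer; an exact count would need a tokenizer library such as tiktoken):

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_context(prompt: str, completion_budget: int, context_window: int) -> bool:
    """Check that the prompt plus the reserved completion budget fit the window."""
    return estimate_tokens(prompt) + completion_budget <= context_window

# A 10-page document easily exceeds an 8,192-token window:
doc = "word " * 20000                   # ~100,000 characters of text
print(fits_context(doc, 256, 8192))     # False: the prompt alone is ~25,000 tokens
print(fits_context(doc, 256, 128000))   # True for a 128K window
```

Since the error reports 21026 prompt tokens against an 8192-token limit, the request that fails is being served by something with an 8K window, not by a model with 128K of context.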
We are using an up-to-date LangChain library in our code. We also updated the openai library to the latest version, and with the following approach we still got the error "'ChatCompletion' object has no attribute 'system_prompt'" (the error comes from Step 1 below):
from openai import OpenAI

client = OpenAI(api_key="OPEN_AI_KEY_XXXXXXXX")
response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=256,
    temperature=0,
    messages=[
        {"role": "system",
         "content": "You are a helpful assistant for question-answering tasks."},
        {"role": "user", "content": user_input},
    ],
)
Step 1: passing the 'response' object to the ServiceContext object of LlamaIndex
Step 2: loading data from Elasticsearch using LlamaIndex
Step 3: passing the ServiceContext object to the GPTVectorStoreIndex class of LlamaIndex
Step 4: creating the index using LlamaIndex
Step 5: creating the query_engine
Step 6: firing the query through the query engine over the custom documents (loaded from Elasticsearch)
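The usual way a pipeline like the one above stays under the context limit is to send only the top-k retrieved chunks to the model, never the whole document. Here is a library-free sketch of that idea; the keyword-overlap scoring and the 2,000-character chunk size are hypothetical simplifications (a real retriever, such as LlamaIndex's, would use embedding similarity):

```python
def chunk_text(text: str, chunk_chars: int = 2000) -> list[str]:
    """Split a document into fixed-size character chunks (~500 tokens each)."""
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

def score(chunk: str, question: str) -> int:
    """Toy relevance score: count of question words present in the chunk."""
    words = {w.lower() for w in question.split()}
    return sum(1 for w in words if w in chunk.lower())

def build_prompt(document: str, question: str, top_k: int = 3) -> str:
    """Keep only the top-k most relevant chunks so the prompt stays small."""
    chunks = chunk_text(document)
    best = sorted(chunks, key=lambda c: score(c, question), reverse=True)[:top_k]
    context = "\n---\n".join(best)
    return f"Context:\n{context}\n\nQuestion: {question}"

doc = ("Payment terms are net 30 days. " * 200
       + "The warranty period is two years. " * 200)
prompt = build_prompt(doc, "What is the warranty period?")
print(len(prompt))  # bounded by top_k * chunk_chars plus small overhead
```

The key property is that the prompt size depends on top_k and the chunk size, not on the document size, so a 10-page (or 1,000-page) source can never overflow the window.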
The setting you see is the maximum response length, not the total model context length.
All models released since November 2023 cap the length OpenAI allows them to produce as output. That makes them large-input-context, low-output models (and input is now processed very fast, with low attention paid to it).
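The distinction can be written as two separate checks: max_tokens is bounded by the model's output cap, while the prompt plus the reserved completion must fit the context window. A sketch with assumed, illustrative limits (the 4,096 output cap is an example figure, not an authoritative value for any specific gpt-4o snapshot):

```python
def request_ok(prompt_tokens: int, max_tokens: int,
               context_window: int, max_output: int) -> bool:
    """A request is valid only if the completion cap respects the model's
    output limit AND prompt + reserved completion fit the context window."""
    return max_tokens <= max_output and prompt_tokens + max_tokens <= context_window

# Numbers from the error above; the 8192 window matches an older model,
# not gpt-4o's advertised 128K context:
print(request_ok(21026, 0, 8192, 4096))      # False: the prompt alone exceeds 8192
print(request_ok(21026, 256, 128000, 4096))  # True: fits a 128K window
```

So a 21,026-token prompt fails against an 8K window even with max_tokens set to 0, which is exactly what the 400 error reports.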