This model's maximum context length is 8191 tokens, even when using gpt-3.5-turbo-16k

from langchain import OpenAI
from llama_index import LLMPredictor, GPTVectorStoreIndex, PromptHelper

llm_predictor = LLMPredictor(llm=OpenAI(model="gpt-3.5-turbo-16k", max_retries=3))
prompt_helper = PromptHelper()
custom_LLM_index = GPTVectorStoreIndex(
    documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper
)

Error
This model’s maximum context length is 8191 tokens, however you requested 16296 tokens (16296 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.

Even though the maximum context size for the 16k model is 16,385 tokens.

How am I supposed to find a workaround for this?

Thanks

If you are getting an error message saying 8192, can you please post the code used and a log of the error returned, as that should not be happening.

The 16k model has a 16k context. For general use, treat this number as roughly 16,000, as the system uses some tokens for internal purposes; if you absolutely need more, let's say 16,100 tokens, then you need to experiment and ensure your application works across use cases.

That being said, the 16k limit covers the prompt and the reply combined, i.e. a prompt of 2k plus a response of 16k would be 18k and would cause an error. In the case of a 2k prompt you should set the reply token limit to 14k at most.
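As a rough sketch of that budgeting (the numbers are purely illustrative; max_tokens is the langchain parameter that caps the completion length):

from langchain import OpenAI

# Illustrative: with a ~2k-token prompt, cap the reply at 14k so that
# prompt + completion stays inside the 16k context window.
llm = OpenAI(model="gpt-3.5-turbo-16k", max_retries=3, max_tokens=14000)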

You can make use of the tiktoken library to calculate token numbers if you need to be precise.
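For example, a minimal count might look like this (the prompt string is just a placeholder; gpt-3.5-turbo models use the cl100k_base encoding):

import tiktoken

# gpt-3.5-turbo(-16k) uses the cl100k_base encoding
encoding = tiktoken.get_encoding("cl100k_base")

prompt = "Question Related to CSV"
prompt_tokens = len(encoding.encode(prompt))

# whatever is left of the ~16k context can go to the reply
max_reply_tokens = 16000 - prompt_tokens
print(prompt_tokens, max_reply_tokens)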

import os
from pathlib import Path

from langchain import OpenAI
from llama_index import download_loader


os.environ['OPENAI_API_KEY'] = "..."


# load the CSV into llama_index documents
PandasCSVReader = download_loader("PandasCSVReader")
loader = PandasCSVReader()
documents = loader.load_data(file=Path('trial.csv'))

from llama_index import LLMPredictor, GPTVectorStoreIndex, PromptHelper

# define LLM
llm_predictor = LLMPredictor(llm=OpenAI(model="gpt-3.5-turbo-16k", max_retries=3))

prompt_helper = PromptHelper()

custom_LLM_index = GPTVectorStoreIndex(
    documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper
)

query_engine = custom_LLM_index.as_query_engine()

response = query_engine.query("Question Related to CSV")

print(response)

Yeah, I basically need to query this CSV file daily (it updates with new data), so 8k can't handle that amount of data; even 16k wouldn't be enough, but it's still a bit better.

Ok, well, you need to leave room for the reply and the data you send to it, so you will have to truncate any input down such that you leave enough room for the response.
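As a minimal sketch of that truncation, assuming you do it with tiktoken (truncate_to_tokens is a hypothetical helper here, not part of langchain or llama_index):

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def truncate_to_tokens(text, max_tokens):
    # keep only the first max_tokens tokens of the text
    tokens = encoding.encode(text)
    return encoding.decode(tokens[:max_tokens])

# e.g. keep ~14k tokens of input so ~2k are left for the response
truncated = truncate_to_tokens(open('trial.csv').read(), 14000)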

Yeah, I was going to try that as well, but I still don't know why it has been giving me the 8k maximum context size error.

I think that is a bug in the error message text, i.e. it got copy-pasted from an 8k model and just not updated yet. Myself and others have done extensive testing on the 16k context and it is absolutely 16k :smiley: OpenAI’s offerings are always looked over with a fine-toothed comb.
