I have fed an Assistants API assistant with about 400 MB of data (PDFs), and now, when I run any simple prompt, it fails.
With gpt-4o the error is the 30,000-token max limit, and with gpt-3.5-… the error is unknown.
How can I deal with this?
PS: I really need the 400 MB of data to be considered, and at the same time I want to optimize token consumption.
You might take a look at vector storage options like ChromaDB, Pinecone, Weaviate, etc.
When used in conjunction with the likes of LangChain or LlamaIndex, these can be set up fairly simply in Python to build very powerful vector stores over effectively unlimited amounts of source data.
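For example, a minimal sketch using LlamaIndex with a local ChromaDB store (assuming the `llama-index`, `llama-index-vector-stores-chroma` and `chromadb` packages; the folder path, collection name and query are placeholders, and OpenAI embeddings are used by default so `OPENAI_API_KEY` must be set):

```python
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

# Load the PDFs from a local folder (SimpleDirectoryReader parses PDFs by default).
documents = SimpleDirectoryReader("./pdfs").load_data()

# Persist embeddings in a local Chroma collection instead of the Assistants vector store.
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection("my_pdfs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build the index once; later queries retrieve only the top-k matching chunks,
# so each model call stays small regardless of how much source data you have.
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine(similarity_top_k=5)
print(query_engine.query("What does document X say about topic Y?"))
```

Because only the top few chunks are sent to the model per query, the API calls stay well under the token budget no matter how large the source corpus grows.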
Is there any way a layperson can do this?
The 30k-token limit could be because your org is at Tier 1.
I'd recommend either upgrading the tier or using gpt-4o-mini.
Thank you all; I will take your answers into consideration. I might actually try using LlamaIndex locally, though I'm still worried about server performance if we scale (hopefully).
That is an impossibility. You can only load a small amount of information into the context window of an AI model. Unless by “considered” you simply mean “searched upon”.
Using Assistants, the platform will load that small slice of top search matches returned by file_search and keep making model calls, pushing past your tokens-per-minute budget and causing a rate-limit error.
The important thing here is that it is not the amount of data in the vector store that matters; it is the size of the search return that gets loaded into the model: roughly 15,000-16,000 tokens from a typical file_search at the default parameters (20 chunks of up to 800 tokens each).
You can reduce the number of chunks returned from file_search, down from the default of 20, as a setting on the assistant. Or you can delete the vector store and re-embed all the documents with a smaller chunk size. Or both. Also avoid long conversations, since the platform stuffs as much past conversation into the model as will fit.
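A rough sketch of both adjustments with the OpenAI Python SDK (Assistants v2); the assistant and file IDs are placeholders, and depending on your SDK version the vector-store endpoints may live under `client.vector_stores` rather than `client.beta.vector_stores`:

```python
from openai import OpenAI

client = OpenAI()

# 1. Lower the number of file_search chunks placed into the model (default is 20).
client.beta.assistants.update(
    "asst_XXXX",
    tools=[{"type": "file_search", "file_search": {"max_num_results": 5}}],
)

# 2. Re-embed a file into a new vector store with smaller chunks
#    (defaults are 800-token chunks with 400-token overlap).
vector_store = client.beta.vector_stores.create(name="smaller-chunks")
client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id="file_XXXX",
    chunking_strategy={
        "type": "static",
        "static": {"max_chunk_size_tokens": 400, "chunk_overlap_tokens": 100},
    },
)

# 3. Point the assistant at the new store.
client.beta.assistants.update(
    "asst_XXXX",
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```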
An external database will not fix the rate limit, but at least with Chat Completions you can avoid sending API calls that are guaranteed to fail.
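A hedged sketch of that pre-check, assuming `tiktoken` and the official `openai` package (the 30,000 figure and the helper name are illustrative, not anything from the docs):

```python
import tiktoken
from openai import OpenAI

TPM_BUDGET = 30_000  # example Tier-1-style limit; use your org's actual limit
enc = tiktoken.get_encoding("o200k_base")  # encoding used by the gpt-4o family

def safe_chat(client: OpenAI, retrieved_context: str, question: str):
    prompt = f"Answer using this context:\n{retrieved_context}\n\nQuestion: {question}"
    n_tokens = len(enc.encode(prompt))
    if n_tokens > TPM_BUDGET:
        # Trim the retrieved context (or fetch fewer chunks) instead of sending
        # a request that is certain to hit the rate limit.
        raise ValueError(f"Prompt is {n_tokens} tokens; reduce retrieved context first.")
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
```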
Or pay OpenAI more money, $50+ in past payments, to level up a tier, which is obviously what they want by crippling the rate limit.