I have fed an Assistants API assistant with about 400 MB of data (PDFs), and now, when I run any simple prompt, it fails.
With gpt-4o the error is the 30,000-token max limit, and with gpt-3.5-… the error is unknown.
How can I deal with this?
PS: I really need the 400 MB of data to be considered, and at the same time I want to optimize token consumption.
You might take a look at vector storage options like ChromaDB, Pinecone, Weaviate, etc.
When used in conjunction with the likes of LangChain or LlamaIndex, these can be set up fairly simply in Python to build very powerful vector stores over effectively unlimited amounts of source data.
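For example, a minimal sketch using LlamaIndex with a local ChromaDB store (assuming the `llama-index`, `llama-index-vector-stores-chroma` and `chromadb` packages; the folder path, collection name and query are placeholders, and OpenAI embeddings are used by default so `OPENAI_API_KEY` must be set):

```python
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

# Load the PDFs from a local folder (SimpleDirectoryReader parses PDFs by default).
documents = SimpleDirectoryReader("./pdfs").load_data()

# Persist embeddings in a local Chroma collection instead of the Assistants vector store.
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection("my_pdfs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build the index once; later queries retrieve only the top-k matching chunks,
# so each model call stays small regardless of how much source data you have.
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine(similarity_top_k=5)
print(query_engine.query("What does document X say about topic Y?"))
```

Because only the top few chunks are sent to the model per query, the API calls stay well under the token budget no matter how large the source corpus grows.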
Is there any way a layperson can do this?
The 30k-token limit could be because your org is at Tier 1.
I'd recommend either upgrading the tier or using gpt-4o-mini.
Thank you all; I will take your answers into consideration. I might actually try using LlamaIndex locally, though I'm still worried about server performance if we scale (hopefully).
That is an impossibility. You can only load a small amount of information into the context window of an AI model. Unless by “considered” you simply mean “searched upon”.
Using Assistants, the platform will load that small slice of top search matches returned by file_search and keep making model calls, pushing past your tokens-per-minute budget and causing a rate-limit error.
The important thing here is that it is not the amount of data in the vector store that matters; it is the size of the search return that gets loaded into the model: roughly 15,000-16,000 tokens from a typical file_search at the default parameters (20 chunks of up to 800 tokens each).
You can reduce the number of chunks returned from file_search, down from the default of 20, as a setting on the assistant. Or you can delete the vector store and re-embed all the documents with a smaller chunk size. Or both. Also avoid long conversations, since the platform stuffs as much past conversation into the model as will fit.
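A rough sketch of both adjustments with the OpenAI Python SDK (Assistants v2); the assistant and file IDs are placeholders, and depending on your SDK version the vector-store endpoints may live under `client.vector_stores` rather than `client.beta.vector_stores`:

```python
from openai import OpenAI

client = OpenAI()

# 1. Lower the number of file_search chunks placed into the model (default is 20).
client.beta.assistants.update(
    "asst_XXXX",
    tools=[{"type": "file_search", "file_search": {"max_num_results": 5}}],
)

# 2. Re-embed a file into a new vector store with smaller chunks
#    (defaults are 800-token chunks with 400-token overlap).
vector_store = client.beta.vector_stores.create(name="smaller-chunks")
client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id="file_XXXX",
    chunking_strategy={
        "type": "static",
        "static": {"max_chunk_size_tokens": 400, "chunk_overlap_tokens": 100},
    },
)

# 3. Point the assistant at the new store.
client.beta.assistants.update(
    "asst_XXXX",
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```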
An external database will not fix the rate limit, but at least with Chat Completions you can avoid sending API calls that are guaranteed to fail.
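A hedged sketch of that pre-check, assuming `tiktoken` and the official `openai` package (the 30,000 figure and the helper name are illustrative, not anything from the docs):

```python
import tiktoken
from openai import OpenAI

TPM_BUDGET = 30_000  # example Tier-1-style limit; use your org's actual limit
enc = tiktoken.get_encoding("o200k_base")  # encoding used by the gpt-4o family

def safe_chat(client: OpenAI, retrieved_context: str, question: str):
    prompt = f"Answer using this context:\n{retrieved_context}\n\nQuestion: {question}"
    n_tokens = len(enc.encode(prompt))
    if n_tokens > TPM_BUDGET:
        # Trim the retrieved context (or fetch fewer chunks) instead of sending
        # a request that is certain to hit the rate limit.
        raise ValueError(f"Prompt is {n_tokens} tokens; reduce retrieved context first.")
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
```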
Or pay OpenAI more money, $50+ in past payments, to level up a tier, which is obviously what they want by crippling the rate limit.