How to analyze across huge chunked data

Embeddings are useful for querying over large volumes of data.
In a vector database we store embeddings, and for retrieval we do a similarity match to return the most similar chunks.
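For context, that similarity-match retrieval step can be sketched roughly like this (the document names and embedding vectors below are made-up toys; in practice the vectors come from an embedding model and live in a vector store):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "vector database": chunk id -> pre-computed embedding.
db = {
    "doc-about-cats": [0.9, 0.1, 0.0],
    "doc-about-dogs": [0.8, 0.3, 0.1],
    "doc-about-tax":  [0.0, 0.1, 0.9],
}

query = [0.9, 0.15, 0.0]  # embedding of the user's query

# Retrieval = rank stored embeddings by similarity to the query embedding
# and return the closest chunk(s) -- nothing more than a nearest-neighbor lookup.
best = max(db, key=lambda k: cosine_similarity(query, db[k]))
print(best)  # -> "doc-about-cats"
```

Note that retrieval only *finds* similar chunks; it performs no aggregation across them, which is the crux of the question below.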
My question: is it possible for the OpenAI APIs to analyze data across different chunks/embeddings?
E.g., let's say I have a huge set of student records with scores. I chunked the data (because of the token limit) and stored the embeddings in a DB.
Now if I want the count of all students who secured scores >= 50, will it return the total count across all the data, or only the count for a single chunk/embedding?

@a.singhal034 Welcome to the forum!

Possibly. Based on what you describe, my understanding is that you plan to ask the question (scores >= 50) in a prompt and expect a valid, accurate result. That is just as error-prone as asking the model to do simple arithmetic by prompting.

If you really want an accurate result, then just do it the old-fashioned and accurate way: an SQL database and an SQL query.
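For your example that would look something like this (a minimal sketch using Python's built-in `sqlite3` with a hypothetical `students` table and sample rows; the schema and data are made up for illustration):

```python
import sqlite3

# Hypothetical schema: a `students` table with `name` and `score` columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Alice", 72), ("Bob", 45), ("Carol", 50), ("Dave", 88)],
)

# Exact aggregate across ALL rows -- no chunking, no token limit involved.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM students WHERE score >= 50"
).fetchone()
print(count)  # -> 3
conn.close()
```

The database engine scans every row, so the count is exact by construction, which a prompt over retrieved chunks cannot guarantee.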

Embeddings with prompts are quite useful, but they are not a one-size-fits-all tool.

Then isn’t that contradicting the token limit?
If it can return the total count, then it must have to load all the data into context… right?