Real context sharing by the Assistant within a thread

I was experimenting with the OpenAI API and an Assistant with the retrieval tool. I have a large knowledge base file, e.g. around 100K tokens, uploaded to the Assistant, and I need to ask different retrieval questions about it. But it seems the GPT model re-reads the whole file for every answer (visible in the consumed token count), which puzzled me: shouldn't the Assistant read the knowledge base once, build some internal context from it, and use that to answer subsequent questions without re-reading all the input tokens again?
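For reference, here is roughly how I set things up, as a minimal sketch against the beta Assistants API in the openai Python SDK v1.x; the file name, model, assistant name, and instructions are placeholders, not the exact values I used:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the large knowledge base file for use with Assistants.
kb_file = client.files.create(
    file=open("knowledge_base.txt", "rb"),  # placeholder file name
    purpose="assistants",
)

# Create an Assistant with the retrieval tool and attach the file to it.
assistant = client.beta.assistants.create(
    name="KB Assistant",
    instructions="Answer questions using the attached knowledge base.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[kb_file.id],
)
```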

More details on what I did (see the code sketch after this list):

  1. created an Assistant and uploaded the large knowledge base file to it
  2. created a new thread for that Assistant and added the first question as a message
  3. started a run on that thread and waited until it completed successfully (it took ~30 sec and consumed about 80K tokens, i.e. most of the file)
  4. checked the model's response to the first question
  5. added a second question as a message and repeated the run/wait steps until it was answered
  6. checked the response; it's fine, but the run again consumed ~90K tokens from the Assistant's knowledge base file…
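In code, steps 2 through 6 look roughly like this, continuing from the setup sketch above. This is again only a sketch: the question text and polling interval are placeholders, and the `run.usage` field at the end is an assumption on my part; it may not be populated on older API versions (originally I checked token consumption in the usage dashboard instead).

```python
import time

# Step 2: create a thread and add the first question as a message.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="First question about the knowledge base...",  # placeholder
)

# Step 3: start a run on the thread and poll until it finishes.
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
while run.status in ("queued", "in_progress"):
    time.sleep(2)  # arbitrary polling interval
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# Step 4: read the Assistant's answer (messages are listed newest-first).
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)

# Steps 5-6: adding another message and re-running repeats the block above.
# Inspecting per-run token consumption (if the API version exposes it) is
# where I see ~80-90K prompt tokens on every run, not just the first one.
print(run.usage)
```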

so if that is the expected mode of operation, then the Assistant really just seems like … a file holder?
Otherwise, why doesn't it build some kind of interpreted context internally after the first read of the input file and rely mainly on that for each subsequent question, without consuming the token budget again, e.g. re-reading only the small portion of the input file that wasn't covered by the previous reads?

Thanks! Apart from that, Assistants & Threads seem like much-needed features!