Help! Maintaining Large Contexts with Assistants

I urgently need some help. I’m struggling to figure this out:

  • The Assistant will have a set of educational materials to reference for answers.
  • Each user will have a fairly large (> 32k characters) context that will be needed to answer their queries.

Here are the options I’ve come up with, though I can’t help but feel a different model, like Gemini, might be better suited:

  1. Dynamically create and destroy instances of the same Assistant for each user. The challenge here is that I would have to attach the same reference files to each one, increasing costs and making the setup messy and complex.

  2. Dynamically attach the user’s context to a message in the thread. My concern is that, over the course of the conversation, that message may be pruned. The only workaround I can think of is to attach the same file ID to each of the user’s messages, but each attachment would incur a per-GB-per-day cost. (My assumptions and understanding here may be flawed.)

  3. Have an instance of gpt-4 take in the user’s full context, generate an appropriate summary, and then feed that summary to the Assistant along with the user’s query. This would ensure only the most important information is passed to the Assistant, with the relevant context preserved. But this would be very slow.
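Option 3 can be sketched as a small pipeline where the summarizer is injected as a callable, so a real GPT-4 call can be swapped in later. Everything here is hypothetical: the names `build_prompt` and `summarize` are mine, and the stub summarizer just truncates, standing in for an actual model call:

```python
from typing import Callable

def build_prompt(user_query: str, context: str,
                 summarize: Callable[[str], str],
                 max_context_chars: int = 32_000) -> str:
    """Summarize oversized user context before handing it to the Assistant.

    `summarize` would be a GPT-4 call in practice; anything small enough
    is passed through untouched, avoiding the extra (slow) model round-trip.
    """
    if len(context) > max_context_chars:
        context = summarize(context)
    return f"User context:\n{context}\n\nQuery:\n{user_query}"

# Stub summarizer standing in for a GPT-4 summarization call.
stub = lambda text: text[:1000] + "…"

prompt = build_prompt("What did I study last week?", "x" * 50_000, stub)
```

The upside of the injected-callable shape is that the slow summarization step only fires when the context actually exceeds the threshold.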

Things to keep in mind:

  • The user’s file/context will be regularly updated and is quite large in some cases.
  • Performance needs to be reasonable; the Assistants API is already slow enough, with some file-backed lookups taking over 60 seconds.

Any ideas? Should I just use a different setup for this? Do any of the options above make sense?

No. This should absolutely never be an option.

You would need to give the content to the Thread / Message (as a file, instructions, or injected content), not to the Assistant. The Assistant is a global entity; the Thread is scoped to the user.

Thanks for the response!

Yes, I get that, but maintaining context throughout the conversation is where my greatest uncertainty lies. With the conversation being truncated automatically, I’m assuming you would need to reference the same file ID with every user message to ensure the Assistant takes the content into consideration. Or is there a better / more efficient way of doing this? From what I’ve read in the docs, doing so would incur a per-GB-per-day cost for each file, but it’s unclear whether that applies just to the file itself or to each reference to the file…
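For a rough sense of scale on that cost question, here is a back-of-the-envelope calculation. The $0.20/GB/day rate is my reading of the Assistants pricing at the time and may be wrong, and whether it's billed per file or per reference is exactly the open question, so both cases are shown:

```python
def storage_cost(file_size_gb: float, days: int, copies: int,
                 rate_per_gb_day: float = 0.20) -> float:
    """Storage cost if every attached copy of the file is billed separately."""
    return file_size_gb * rate_per_gb_day * days * copies

# A 0.1 GB context file referenced from 30 messages over 30 days:
worst = storage_cost(0.1, days=30, copies=30)  # every reference billed, ≈ $18
best = storage_cost(0.1, days=30, copies=1)    # only the file itself billed, ≈ $0.60
```

The 30x spread between the two interpretations is why the billing detail matters so much here.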


Ah, I understand.

I’m assuming that you keep the user’s information in a separate database. In that case I would use Function Calling to retrieve it. You can instruct the model to call it any time the information is lost. You could even use Function Calling to append a new user message with the file attached again so it can perform retrieval.
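The retrieval side of that suggestion can be sketched as a tool definition plus a local handler. The schema follows the OpenAI function-calling format, but the `get_user_context` name, the in-memory stand-in database, and the field names are all my own assumptions:

```python
import json

# Hypothetical user store standing in for your real database.
USER_DB = {"user-123": {"grade": "10", "subjects": ["math", "physics"]}}

# Tool definition in the OpenAI function-calling schema format.
GET_USER_CONTEXT_TOOL = {
    "type": "function",
    "function": {
        "name": "get_user_context",
        "description": "Fetch the full stored context for a user. "
                       "Call this whenever the user's context is missing.",
        "parameters": {
            "type": "object",
            "properties": {"user_id": {"type": "string"}},
            "required": ["user_id"],
        },
    },
}

def handle_tool_call(name: str, arguments: str) -> str:
    """Dispatch a tool call from the model and return its JSON output."""
    args = json.loads(arguments)
    if name == "get_user_context":
        return json.dumps(USER_DB.get(args["user_id"], {}))
    raise ValueError(f"unknown tool: {name}")

output = handle_tool_call("get_user_context", '{"user_id": "user-123"}')
```

In an Assistants run, `output` would be submitted back as the tool output for the pending tool call, putting the user's context into the conversation on demand.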

The documentation doesn’t say anything about files being lost at truncation, and I’m not sure either. I don’t think the files just “disappear”, but I also don’t know whether the Assistant will still refer to them once the message they were attached to is removed.

Yes, it’s charged for each and every file, even if the file is a replica. I do believe this cost hasn’t actually been implemented yet, though.

There have been numerous people in your exact predicament, and I think I now understand the issue.

You want to upload a file that persists throughout the conversation, but you’re given only two options: the Assistant level or the Message level. Why we can’t upload to the Thread is beyond me.

Exactly, you got me @RonaldGRuckus

Function Calling or RAG won’t work for this because I need the Assistant to consider the full context of the user data in its responses. It’s not a huge file, but it’s enough to fill the context window substantially, sometimes fully. Attaching the file allows up to 2,000,000 tokens, but if the file is small enough, it might just be included in the context window for that one call (as far as I’ve understood). This seems to be the only reasonable option for now. Hopefully with gpt-4.5 or gpt-5 (whichever is next) the context window will be much larger…
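One way to decide between inlining the context and attaching it as a file is a rough token estimate. The ~4-characters-per-token rule of thumb is only an approximation for English text, and the limits here (a 128k context window, a 2M-token file cap) are my assumptions about the current models, not guarantees:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: roughly 4 characters per token for English text."""
    return len(text) // 4

def placement(context: str, window_tokens: int = 128_000,
              file_cap_tokens: int = 2_000_000) -> str:
    """Decide where the user context should go (a sketch, not a policy)."""
    tokens = estimate_tokens(context)
    if tokens <= window_tokens // 2:  # leave half the window for conversation
        return "inline"
    if tokens <= file_cap_tokens:
        return "file"
    return "summarize"
```

For example, a 40,000-character context estimates to ~10,000 tokens and would be inlined, while a multi-megabyte file would fall back to attachment or summarization.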

OpenAI’s documentation is still lacking quite a bit of detail, so I hope that gets updated soon!

You can submit the full user data as a Function Calling response in RAG. You could also approach it at a more granular level by labelling the keys as enums.
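The “keys as enums” idea can be sketched as a function schema where the model picks which slice of the user data it needs, so only that slice ever enters the context window. The section names (`profile`, `history`, `preferences`) and the data are hypothetical:

```python
import json

# Hypothetical per-user data, split into sections the model can request.
USER_DATA = {
    "profile": {"name": "Alex", "grade": "10"},
    "history": ["asked about derivatives", "asked about limits"],
    "preferences": {"style": "step-by-step"},
}

# The enum restricts the model to known sections, so a request for
# user data pulls in one labelled slice instead of the whole file.
GET_SECTION_TOOL = {
    "type": "function",
    "function": {
        "name": "get_user_section",
        "description": "Fetch one section of the user's data.",
        "parameters": {
            "type": "object",
            "properties": {
                "section": {
                    "type": "string",
                    "enum": ["profile", "history", "preferences"],
                }
            },
            "required": ["section"],
        },
    },
}

def get_user_section(section: str) -> str:
    """Return one section of the user data as the tool-call output."""
    return json.dumps(USER_DATA[section])
```

The trade-off versus attaching the whole file is exactly the one raised above: the model only sees what it asks for, rather than the full context.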

You’re right that they sometimes just send the whole file’s contents instead of vectorizing it. I believe the reason they don’t tell us the threshold is that they’re constantly adjusting it.
