I am building a Django web app where multiple users can interact with the Assistants API (i.e. each user has separate threads and assistants).
Right now for each request the procedure is roughly:
- Re-authenticate to get a client object (`openai = OpenAI()`)
- Create a run based on a stored thread, assistant and vector store, and stream the result to the user.
I feel the application is a bit slow and I wonder if this is the right approach. Alternatively I could store the client object in memory and deal with re-authentication after timeouts using exception handling.
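To make the second option concrete, here is a minimal sketch of what I mean by keeping the client in memory. `LazyClient` and its `get`/`reset` methods are hypothetical names I made up for illustration, not part of the openai library; the factory is passed in so the same holder works with `OpenAI` or anything else.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyClient(Generic[T]):
    """Hypothetical helper: build a client once per process and reuse it."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory              # e.g. the OpenAI class itself
        self._client: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # The lock ensures concurrent requests share one client
        # instead of each building their own.
        with self._lock:
            if self._client is None:
                self._client = self._factory()
            return self._client

    def reset(self) -> None:
        # Drop the cached client so the next get() builds a fresh one,
        # for example after catching an authentication error.
        with self._lock:
            self._client = None
```

In the app this would presumably be a module-level `shared = LazyClient(OpenAI)`, with each view calling `shared.get()` and the exception handler calling `shared.reset()` before retrying. Note that the cache is per process, so each worker process would still create its own client once.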
I guess everyone has to make this decision at some point, but I have been unable to find any recommendations, so I wonder if anyone else has input?