I am trying to build an application that many users (1,000+) can access at the same time. My approach is to create a single assistant with its own instructions. Then, every time a user wants to chat with it or get information from it, I create a thread for that particular session. This means that at any given time a single assistant could be working with 1,000 threads, and more as the user base grows.
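A minimal sketch of the one-assistant, thread-per-session setup, assuming session IDs come from the application itself. `SessionThreads`, `create_thread`, and `asst_123` are hypothetical names. With the openai Python SDK (v1.x), `create_thread` could wrap `client.beta.threads.create().id`; a local stand-in is used here so the sketch runs without an API key.

```python
import itertools
import threading
from typing import Callable, Dict


class SessionThreads:
    """Maps each user session to its own thread; all threads share one assistant ID."""

    def __init__(self, assistant_id: str, create_thread: Callable[[], str]) -> None:
        self.assistant_id = assistant_id
        # Hypothetical injected callable that returns a new thread ID.
        self._create_thread = create_thread
        self._threads: Dict[str, str] = {}
        self._lock = threading.Lock()

    def thread_for(self, session_id: str) -> str:
        # Reuse the session's thread if one already exists.
        with self._lock:
            existing = self._threads.get(session_id)
        if existing is not None:
            return existing
        # Create outside the lock so one slow API call does not block other sessions.
        new_id = self._create_thread()
        with self._lock:
            # If a concurrent request for the same session won the race,
            # keep its thread (the one created here simply goes unused).
            return self._threads.setdefault(session_id, new_id)

    def end_session(self, session_id: str) -> None:
        with self._lock:
            self._threads.pop(session_id, None)

    def active_sessions(self) -> int:
        with self._lock:
            return len(self._threads)


_counter = itertools.count(1)


def fake_create_thread() -> str:
    # Stand-in for the real API call, for local testing only.
    return f"thread_{next(_counter)}"


store = SessionThreads("asst_123", fake_create_thread)
for user in range(1000):
    store.thread_for(f"user-{user}")
print(store.active_sessions())  # prints 1000
```

Note that the assistant ID never changes here; only the number of threads grows with the number of sessions.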
I have looked at the rate limit documentation, and while my organization will move up tiers as we grow, my main concern is this: would a single assistant be enough to handle this load? Or should I create multiple assistants and add some load-balancing logic of my own when calling the APIs?
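For comparison, the multiple-assistant alternative could look like the minimal round-robin sketch below, assuming each new session is assigned an assistant once when it starts. `RoundRobinAssistants` and the `asst_a`/`asst_b`/`asst_c` IDs are hypothetical. The sketch only spreads sessions across assistant IDs; it does not by itself change any rate limit, and whether spreading traffic this way helps at all depends on how the limits are applied, which is the open question here.

```python
import itertools
import threading
from typing import Iterable


class RoundRobinAssistants:
    """Hypothetical balancer that cycles through several assistant IDs."""

    def __init__(self, assistant_ids: Iterable[str]) -> None:
        ids = list(assistant_ids)
        if not ids:
            raise ValueError("need at least one assistant ID")
        self._cycle = itertools.cycle(ids)
        # Lock so concurrent requests each get the next ID in order.
        self._lock = threading.Lock()

    def next_assistant(self) -> str:
        with self._lock:
            return next(self._cycle)


balancer = RoundRobinAssistants(["asst_a", "asst_b", "asst_c"])
picks = [balancer.next_assistant() for _ in range(6)]
print(picks)  # ['asst_a', 'asst_b', 'asst_c', 'asst_a', 'asst_b', 'asst_c']
```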