I am trying to use the Assistants API with retrieval and it is quite slow. It takes anywhere from 3 to 15 seconds to respond, sometimes even more.
This question is for the OpenAI team: which model will give me faster results on average, gpt-3.5-turbo-1106 or gpt-4-1106-preview?
Also, I am on Usage tier 3. Will upgrading to Usage tier 4 reduce latency?
GPT-3.5 will always be faster since it's a "lighter" model. GPT-4 is pretty resource-intensive on the servers, hence why it's slower.
Upgrading tiers will likely not change the outcome.
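If you want to compare for yourself rather than take averages on faith, here is a minimal timing sketch. It assumes the OpenAI Python SDK (openai>=1.x) and an `OPENAI_API_KEY` in the environment; the `time_call` helper is something I wrote for illustration, not part of the SDK, and I'm using a plain chat completion as a proxy since timing a full Assistants run involves polling:

```python
import time

def time_call(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds) of wall-clock time."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Sketch of comparing the two models from the thread (requires OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# for model in ("gpt-3.5-turbo-1106", "gpt-4-1106-preview"):
#     _, elapsed = time_call(
#         client.chat.completions.create,
#         model=model,
#         messages=[{"role": "user", "content": "Say hi"}],
#     )
#     print(f"{model}: {elapsed:.2f}s")
```

Run a handful of requests per model and average, since single calls vary a lot; that will tell you more than any general claim about which model is faster for your workload.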
Yeah, it performs better (more reliably) when you have a lot of input tokens. GPT-3.5 is currently failing on almost every request for me. I guess the model is too crowded.