Which model is faster: gpt-3.5-turbo-1106 or gpt-4-1106-preview?

I am trying to use the Assistants API with retrieval, and it is quite slow. It takes anywhere from 3 to 15 seconds to respond, sometimes even more.

This question is for the OpenAI team: which model will give me faster results on average, gpt-3.5-turbo-1106 or gpt-4-1106-preview?
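One way to get a concrete answer is to time a few identical requests against each model yourself. Here is a minimal timing sketch; the commented-out usage assumes the `openai` Python package and an `OPENAI_API_KEY` in the environment, and the model names are the ones in question:

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage against the Chat Completions API:
#
# from openai import OpenAI
# client = OpenAI()
# for model in ("gpt-3.5-turbo-1106", "gpt-4-1106-preview"):
#     _, elapsed = time_call(
#         client.chat.completions.create,
#         model=model,
#         messages=[{"role": "user", "content": "Say hi"}],
#     )
#     print(f"{model}: {elapsed:.2f}s")
```

Averaging over a handful of runs per model smooths out server-side variance, which can easily be larger than the difference between the models.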

Also, I am on Usage tier 3. Will upgrading to Usage tier 4 reduce latency?

Thank you for your time


GPT-3.5 will always be faster, since it's a "lighter" model. GPT-4 is quite resource-intensive on the servers, which is why it's slower.
Upgrading tiers will likely not change the outcome.

Actually, I've recently had the feeling that GPT-4 is performing quite a bit better. Maybe that's because fewer people are using it yet?

Yeah, it's performing better (more reliably) when you have a lot of input tokens. GPT-3.5 is currently failing on almost every request. I guess the model is too crowded.