Night-and-day difference in Assistants API latency between gpt-3.5 and gpt-4-turbo

I just switched my Assistant from gpt-4-turbo-preview to gpt-3.5-turbo-1106, and response times dropped by 60-70%. I originally went with gpt-4-turbo because the documentation used that specific model, but after seeing how slow it was, I almost gave up on the Assistants API entirely.

Many people have been complaining on the forums here about the slowness of the Assistants API, and it's likely they, too, are using GPT-4. The tutorials should consider switching to gpt-3.5; it would make for a much better developer experience for the vast majority of people.
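For anyone who wants to reproduce the comparison, the model is a single parameter when creating an assistant. A rough sketch, assuming the official `openai` Python SDK (v1.x); the assistant name, prompt, and the small `time_call` timing helper are all my own additions, not anything from the docs:

```python
import os
import time


def time_call(fn):
    """Time a single call; returns (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# The live comparison only runs when an API key is present.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()

    def run_once(model: str) -> None:
        assistant = client.beta.assistants.create(
            name="latency-probe",         # hypothetical name
            instructions="Answer in one sentence.",
            model=model,                  # the only line that changes
        )
        thread = client.beta.threads.create(
            messages=[{"role": "user", "content": "What is 2 + 2?"}]
        )
        run = client.beta.threads.runs.create(
            thread_id=thread.id, assistant_id=assistant.id
        )
        # Assistants runs are asynchronous: poll until the run finishes.
        while run.status in ("queued", "in_progress"):
            time.sleep(0.5)
            run = client.beta.threads.runs.retrieve(
                thread_id=thread.id, run_id=run.id
            )

    for model in ("gpt-3.5-turbo-1106", "gpt-4-turbo-preview"):
        _, elapsed = time_call(lambda: run_once(model))
        print(f"{model}: {elapsed:.1f}s")
```

One sample each is a crude measurement; averaging several runs per model would give a fairer picture.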

That’s a nice idea.
I’ll move this to the feedback category.


Facing the same issue. gpt-4-turbo-preview is much better at working with JSON responses, but the performance really sucks. gpt-3.5-turbo-1106 is faster, but it doesn't reliably return results in JSON even when I emphasize it in the prompts; it works sometimes but not always. Inconsistent behavior.
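One workaround worth noting: on the Chat Completions endpoint, gpt-3.5-turbo-1106 supports JSON mode via `response_format`, which constrains the model to emit valid JSON (your prompt still has to mention JSON, or the call errors). A minimal sketch, assuming the official `openai` Python SDK (v1.x); the `parse_json_or_none` defensive parser is my own addition for cases, like the Assistants API, where you can't enforce the format:

```python
import json
import os


def parse_json_or_none(text: str):
    """Defensive parse: return a dict on success, None on malformed output."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# The live call only runs when an API key is present.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        response_format={"type": "json_object"},  # JSON mode
        messages=[
            # JSON mode requires the word "JSON" somewhere in the messages.
            {"role": "system", "content": "Return the answer as a JSON object."},
            {"role": "user", "content": 'List three primes as {"primes": [...]}'},
        ],
    )
    data = parse_json_or_none(resp.choices[0].message.content)
    print(data)
```

Even with JSON mode on, validating the parsed structure before using it is cheap insurance against the occasional malformed reply.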