There’s always the option of converting your project to Chat Completions. My bot uses its own chain-of-thought loop and relies only on Completions. Answers from smaller models, even ones involving function calls, almost always come back in under 5 seconds. And humans probably don’t need a response in under one second anyway.
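If it helps, here’s a rough sketch of the kind of loop I mean. The `FINAL:` convention, the `ask` wrapper, and the step limit are just my own conventions, not anything from the API; the actual model call would go inside `ask` (e.g. a Chat Completions request).

```python
from typing import Callable

def cot_loop(ask: Callable[[str], str], question: str, max_steps: int = 3) -> str:
    """Minimal chain-of-thought loop: keep feeding the model's own
    intermediate reasoning back in until it emits a final answer.
    `ask` wraps one Chat Completions call and returns the reply text."""
    context = question
    reply = ""
    for _ in range(max_steps):
        reply = ask(context)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        context += "\n" + reply  # accumulate the intermediate step
    return reply  # give up after max_steps and return the last reply
```

In practice `ask` would be something like `lambda ctx: client.chat.completions.create(model=..., messages=[{"role": "user", "content": ctx}]).choices[0].message.content`, with whatever system prompt tells the model to prefix its final answer with `FINAL:`.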
Same issue. Last week it was not doing file search consistently, so I forced file search in the API call, which seemed to fix it. But now responses are so slow it’s just not usable.
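For anyone else hitting the inconsistent file search: this is roughly what I mean by forcing it, assuming the v2 Assistants runs endpoint and its `tool_choice` parameter. The thread and assistant IDs here are placeholders.

```python
# Placeholder IDs; the point is the tool_choice field, which forces the
# file_search tool instead of letting the model decide whether to use it.
run_kwargs = {
    "thread_id": "thread_abc123",
    "assistant_id": "asst_abc123",
    "tool_choice": {"type": "file_search"},
}
# The actual call would then be:
# run = client.beta.threads.runs.create(**run_kwargs)
```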
Assistants API is junk. It’s been documented here by me and many others. Use Chat Completions, ideally behind a wrapper that lets you switch to another provider when needed.
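A sketch of what that wrapper can look like: many providers expose OpenAI-compatible Chat Completions endpoints, so switching can be as simple as swapping the base URL, model name, and key. The provider table below is illustrative; the second entry’s URL and model name are made up.

```python
import os
from dataclasses import dataclass

@dataclass
class Provider:
    base_url: str      # OpenAI-compatible /v1 endpoint
    model: str         # default model to request from this provider
    api_key_env: str   # env var holding the key, so keys stay out of code

# Illustrative table: the first entry is OpenAI's real base URL; the
# "fallback" entry is a placeholder for whatever provider you switch to.
PROVIDERS = {
    "openai": Provider("https://api.openai.com/v1", "gpt-4o-mini", "OPENAI_API_KEY"),
    "fallback": Provider("https://example.com/v1", "some-model", "FALLBACK_API_KEY"),
}

def pick(name: str) -> Provider:
    """Resolve a provider by name; raises KeyError for unknown names."""
    return PROVIDERS[name]
```

The rest of the bot then only ever calls `pick(current_provider)` and builds its client from that, e.g. `OpenAI(base_url=p.base_url, api_key=os.environ[p.api_key_env])`, so failover is a one-line config change rather than a rewrite.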
Just tested the API: normal responses were noticeably delayed, but tool calls had a huge delay.
No matter how they rename the models, they are all the same, running into the same problems again and again.
They will fail no matter how big the model is, because it is merely predictive, not truly intelligent. Increasing a model’s size raises computational cost, since the larger vectors used during inference require more resources; it often improves prediction accuracy enough to make people believe AGI has been reached, but the bubble will burst in the coming months.
This format, where they control the model from the backend, will never work and is not reliable.