I’ve been enjoying the much better uptime and speed of the models these days when serving my users. Starting this week, though, GPT-4o is reminding me of the old days when the APIs were slow and frequently threw errors. I’ve found GPT-4o’s response time to vary widely, causing lots of request timeouts (Heroku only allows a request to run for 30 seconds). Sometimes the response is instantaneous, sometimes it hangs for a long-ass time. I have high rate limits on my organization’s endpoints. Anyone know what’s going on and why this seems to have changed recently?
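For reference, a minimal sketch of the kind of call I mean, using the openai Python SDK (v1.x) with a client-side timeout set just under Heroku’s 30-second cutoff so a hung request fails fast instead of killing the whole dyno request (the parameter values here are illustrative, not what I run in production):

```python
from openai import OpenAI

client = OpenAI(
    timeout=25.0,   # fail before Heroku's 30 s router limit kicks in
    max_retries=2,  # SDK retries transient errors with backoff
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```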
@jxl38 you may want to check out APIpie.ai
They route to the fastest provider by default, which should help avoid this kind of issue.
I would really not suggest adding more variables to your environment by using a routing API that may or may not pick a suitable LLM for the given task. That technology simply isn’t ready yet, and it will become frustrating to debug and manage.
First suggestion is to stream the response. A large response can cause an apparent delay of more than 30 seconds, because without streaming the full completion is generated and gathered before anything is returned.
If you stream, though, the connection should open almost immediately, as soon as the first token is available.
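Here’s a minimal sketch of streaming with the openai Python SDK (v1.x); the model, prompt, and print loop are just placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# stream=True returns chunks as tokens are generated, so the
# response starts as soon as the first token is ready.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about uptime."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks (e.g. the final one) carry no content
        print(delta, end="", flush=True)
print()
```

This also plays nicely with Heroku specifically: the router only requires the first byte within 30 seconds, and after that it applies a rolling 55-second window between bytes, so a streamed response sidesteps the timeout entirely.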
Second suggestion is to try running the same script locally. For whatever reason, I’ve found that responses take dramatically longer from Google Cloud Run than from another provider or from my local machine.
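A quick way to compare environments is to time a single identical call from each one; a rough sketch (model and prompt are placeholders):

```python
import time
from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hi."}],
)
elapsed = time.perf_counter() - start

# Run this from your laptop and from the deployed environment and
# compare; a large gap points at the host/network, not the API.
print(f"{elapsed:.2f}s, {resp.usage.total_tokens} tokens")
```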