I am running a loop of 240K queries to o1-mini. In the beginning it takes about 1 second per iteration; gradually it slows to about 20 seconds per iteration. I don't hit a rate-limit error. It's just slow. Why?
Are you using the async client?
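For reference, this is roughly what the async pattern looks like with the official `openai` Python SDK; the model name, concurrency cap, and the `ask`/`main` helpers are illustrative assumptions, not your actual code:

```python
# Hypothetical sketch of the async pattern, using the official openai SDK.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
sem = asyncio.Semaphore(10)  # assumption: cap in-flight requests at ~10

async def ask(prompt: str) -> str:
    async with sem:  # never more than 10 requests in flight at once
        resp = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

async def main(prompts: list[str]) -> list[str]:
    return await asyncio.gather(*(ask(p) for p in prompts))

# results = asyncio.run(main(prompts))
```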
o1-series models use test-time compute, which means your API requests can stay open considerably longer than with vanilla chat-completion models.
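If an o1 model is in play, you can raise the SDK's request timeout so long-running requests aren't cut off; a minimal sketch using the `openai` Python SDK's `timeout` option (the 900-second value is an arbitrary assumption, the SDK's default is 10 minutes):

```python
# Sketch: configuring the request timeout for long-running reasoning requests.
from openai import OpenAI

client = OpenAI(timeout=900.0)  # seconds; applies to every request on this client

# ...or override for a single call:
resp = client.with_options(timeout=900.0).chat.completions.create(
    model="o1-mini",
    messages=[{"role": "user", "content": "..."}],
)
print(resp.choices[0].message.content)
```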
What sort of deployment environment are you running the requests on?
240,000 simultaneous connections (or even batched ones) will require a massive number of open sockets.
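One way to test the socket theory from inside the process, assuming you're willing to install the third-party `psutil` package (plain `lsof -p <pid>` on macOS works too):

```python
# Sketch: count this process's open network sockets while the loop runs,
# to check whether sockets are accumulating as iterations slow down.
import os
import psutil  # assumption: pip install psutil

proc = psutil.Process(os.getpid())
conns = proc.connections(kind="inet")  # TCP/UDP sockets owned by this process
print(f"{len(conns)} open sockets")
for c in conns[:10]:  # sample a few
    print(c.status, c.raddr)
```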
Sorry, I meant gpt-4o-mini. I am running on a Mac with Python, and I moved to joblib's Parallel to simulate some traces in parallel. At first it runs super fast, hitting the rate limit (good), but after a while it becomes super slow again. I am monitoring the messages and token counts; they have not changed. It must be something to do with the API.
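For context, a minimal sketch of the setup as described (joblib's threading backend plus the `openai` SDK); the `run_trace` helper and the prompt list are placeholders, not the actual code:

```python
# Minimal sketch of the described setup: ~10 worker threads, each handling
# its share of queries serially, all reusing ONE client. (Constructing a new
# client inside every call would open a fresh connection pool each time,
# which ties back to the open-sockets question above.)
from joblib import Parallel, delayed
from openai import OpenAI

client = OpenAI()  # create once; the SDK client can be shared across threads

def run_trace(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

prompts = ["example query"] * 100  # placeholder for the real 240K queries
results = Parallel(n_jobs=10, backend="threading")(
    delayed(run_trace)(p) for p in prompts
)
```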
Can you share the current code you’re running?
Thanks!
Note these are not 240,000 simultaneous connections, just n ≈ 10 threads, each running its requests serially.
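One low-effort way to narrow it down: time each call to see whether the slowdown is in the API round-trip itself or elsewhere in the loop. A sketch, reusing the hypothetical `run_trace` from the sketch above:

```python
# Sketch: wrap each request with a timer so you can watch per-request
# latency drift over the course of the run. Names are illustrative.
import time

def timed_run_trace(prompt: str) -> str:
    t0 = time.perf_counter()
    out = run_trace(prompt)  # the per-query function from the sketch above
    print(f"request took {time.perf_counter() - t0:.2f}s")
    return out
```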
Can anyone help? No support from OpenAI. @sps