I am using Tier1 OpenAI gpt4o-mini via python threading (joblib) - first it works fine and fast. After few thousands of requests it become very slow, about 30 sec per query. queries are very simple. The code is open source and can be run. Just run experiment_2018.py
in :
The issue is not rate limit as I am trimming the context so the number of tokens do not grow over time. It must be an API issue.