There is no “gpt-3” model for you to use.
You also seem to be needlessly using a `response_format` parameter, passing the library a Pydantic `BaseModel`. That forces the API to set up a strict structured-output schema, which can take several seconds. I would omit it; you will also get higher-quality responses.
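As a minimal sketch, assuming the current `openai` Python SDK and a chat completions call (the model name and prompt here are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A plain request with no response_format / BaseModel schema attached,
# instead of e.g. client.beta.chat.completions.parse(..., response_format=MyModel):
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; substitute your model
    messages=[{"role": "user", "content": "Summarize the report in two sentences."}],
)
print(response.choices[0].message.content)
```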
You can rotate through the specific model snapshots gpt-4o-2024-11-20, gpt-4o-2024-08-06, and gpt-4o-2024-05-13, and see whether one of them responds faster at a particular time.
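A quick way to compare them, sketched with the same SDK (the one-word prompt is just a placeholder to keep completions short and the timing comparable):

```python
import time
from openai import OpenAI

client = OpenAI()
models = ["gpt-4o-2024-11-20", "gpt-4o-2024-08-06", "gpt-4o-2024-05-13"]

for model in models:
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    )
    elapsed = time.perf_counter() - start
    # Report completion tokens alongside latency, since longer outputs take longer
    print(f"{model}: {elapsed:.2f}s, {response.usage.completion_tokens} completion tokens")
```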
Sending no `temperature` parameter can also be faster. `max_tokens` is not necessary either; you set it higher than the AI's typical response length anyway, so it does nothing but add an unneeded constraint.
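You can check whether those extra parameters cost you anything on your own traffic by timing a bare request against one that carries them. Single calls are noisy, so average several runs; the model, prompt, and parameter values below are placeholders:

```python
import time
from openai import OpenAI

client = OpenAI()

def timed_call(**extra):
    """Time one chat completion, with any extra parameters passed through."""
    start = time.perf_counter()
    client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": "Say hello."}],
        **extra,
    )
    return time.perf_counter() - start

# Bare-minimum request: just model and messages.
print(f"bare request:      {timed_call():.2f}s")
# Same request with the optional sampling parameters included.
print(f"with extra params: {timed_call(temperature=0.7, max_tokens=500):.2f}s")
```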
You can eliminate the OpenAI SDK entirely and cut out the overhead of loading someone else's client library: just make RESTful requests to the API with a preinstalled library such as `requests`.
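A minimal sketch of the raw HTTP call, assuming the standard chat completions endpoint and an API key in the environment (model and prompt are placeholders):

```python
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    # Send only the fields you need: model and messages.
    json={
        "model": "gpt-4o-2024-08-06",
        "messages": [{"role": "user", "content": "Say hello."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

This is the same request the SDK ultimately makes; you just skip importing and constructing its client.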