Inconsistent Response Speed with GPT-4.0 Mini Completion API

I am currently using the Completion API with GPT-4.0 Mini and have noticed that the response speed is not consistent.

  • Sometimes the responses are very fast.
  • Other times, the responses are noticeably slow.

When I use GPT-4.0, the response speed is consistently stable, so this issue seems specific to GPT-4.0 Mini.

Could you please investigate why GPT-4.0 Mini has variable performance and suggest any potential optimizations or fixes?

Thank you for your assistance!

Environment Details:

  • API: OpenAI Completion API
  • Model: GPT-4.0 Mini
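
To put numbers on "sometimes fast, sometimes slow", a rough timing harness like the one below could help. It is only a sketch: `send_request` is a hypothetical placeholder for whatever function actually makes the API call, and `measure_latency` and `summarize` are illustrative names, not part of any OpenAI library. Running it once per model would show whether the spread really differs between them.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(send_request: Callable[[], object], runs: int = 20) -> List[float]:
    """Call send_request `runs` times; return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()  # hypothetical: wrap the real API call here
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Summarize the spread: median, slowest call, and sample standard deviation."""
    return {
        "median": statistics.median(durations),
        "max": max(durations),
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Stand-in for a real request so the sketch runs without network access.
    def fake_request() -> None:
        time.sleep(0.01)

    print(summarize(measure_latency(fake_request, runs=5)))
```

A large gap between the median and the maximum, or a high standard deviation relative to the median, would back up the impression that latency is inconsistent.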

It could be because Mini is more popular thanks to its lower price, which puts its infrastructure under greater load, so queue times (and therefore response times) vary more.

But I’m all for getting the verified facts …

You will have to hope the staff chime in on this one though.