Super Slow Completion Times via API After Resolving Credit Quota Limits

I recently ran into quota limits because my OpenAI account ran out of credits. After adding funds to my account, it took another 30 minutes of encountering errors before requests started working again. First – the reasonable amount of latency here is like 2-5 mins to me. Others experience what I did?

However, what’s worse is that I’m now experiencing extremely slow completion times when using the API. (like right now at 10:46am GMT Jan 19 2025) For example, when I run the same request via the web interface (logged into my account), it completes in less than 30 seconds. But when I use the API, it takes like 5 minutes or more.

This slow performance may have started even before I reached $0 in credits, but it has persisted since funding my account again.

Has anyone else experienced this issue? Are there specific steps I can take to resolve the slow API response times?

My request looks like this and contains ~28231 tokens:
{
“messages”: [
{
“role”: “system”,
“content”: "\n You extract media titles from text, such as songs, books, movies, and TV shows… [omitting a few lines] "
},
{
“role”: “user”,
“content”: "Extract entities from …[omitting a lot of lines] ",
}
],
“functions”: [
{
“name”: “extract_entities”,
“parameters”: {
“type”: “object”,
“properties”: {
“media_title”: {
“type”: “array”,
“description”: “Complete titles of movies, books, TV shows etc with confidence scores”,
“items”: {
“type”: “object”,
“properties”: {
“name”: {
“type”: “string”
},
“confidence”: {
“type”: “number”,
“minimum”: 0,
“maximum”: 1
}
},
“required”: [
“name”,
“confidence”
]
}
}
},
“required”: [
“media_title”
]
}
}
],
“function_call”: {
“name”: “extract_entities”
},
“temperature”: 0.1,
“model”: “gpt-4o”
}

The issue was due to the model getting stuck in an infinite loop of a pattern within the json structure. Results in that quadratic runtime till it blindly hits the output token limit.

The proximal cause was me starting to set a low temperature – of 0.2 for this particular task. I don’t exactly understand how that leads to repetition, but it seems that I have two options to fix:

  • Slightly higher temperature – will still work for me since the output doesn’t need to be 100% sane, just good enough.
  • Pass in “frequency_penalty” of something like 0.2-0.4.
1 Like

have you tried creating a new key?