API calls to davinci text 3 very slow and random speeds for identical prompts

Any answers about this? Getting up to a minute per request via the API. This is for around 500 to 1000 tokens total.

They say it’s resolved, but it’s clearly getting worse and worse.


I have had the same issue since this morning. It started a little on Thursday, so I added a delay of 4 seconds between each call, but since this morning I always get this error: 500 {'error': {'message': 'Internal server error', 'type': 'auth_subrequest_error', 'param': None, 'code': 'internal_error'}}

What can I do to fix this?


OpenAI just posted that the issue has been investigated and resolved in both the Playground and the API.

API requests to Davinci Text 3 from the UK: on 5th February 2023, the API was returning responses within seconds. Today, responses take closer to a minute. This is a disaster for a planned demo that will now need to be rescheduled. So the issue is not resolved for me.

Same. It is not good for showing off prototypes. Everyone will be looking at their watches…

Just got some testers for my MVP, and the API slowed down to the point where it times out constantly. What unlucky timing. Until yesterday it was working just fine through extensive testing.

It’s horrible. I’m getting random 429 errors, and it takes a very long time to generate even the simplest answers (1 minute seems to be the new normal). I’m on a paid account. I’ve had these issues for over a week now; before that, it was much faster and I rarely got these weird 429 errors.
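For the intermittent 429s specifically, one common client-side mitigation (a standard pattern, not something confirmed by anyone in this thread) is to retry with exponential backoff plus jitter. A minimal sketch in Python; `with_backoff` and its parameters are hypothetical names, and in real code you would catch the library's specific rate-limit exception rather than bare `Exception`:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying on failure with exponential backoff plus jitter.

    fn should raise an exception (e.g. on an HTTP 429) when the call
    should be retried.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # Sleep base_delay * 2^attempt, stretched by up to 2x random jitter.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Wrapping each completion call in something like this spreads retries out instead of hammering the API immediately, which is generally what rate-limited servers expect clients to do.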

I use AWS east coast servers. When I was testing from the cmd line 3-4 weeks back, I used to see frequent 429s. But now I rarely do. My app users are in Asia & USA. So, we cover the entire 24h span.

Also, regarding latencies, I routinely see 3–10 seconds, which I already thought was terrible. :)

I can confirm the same here in Germany. It got a bit better today, and the response time often gets back to normal in the evening (Central European Time), but it is still much worse than it used to be. (davinci-3, completions API)

The slow API is still an issue when trying from California.

Can you show your code, please? We might be able to help.

It’s not a matter of code. The speed has improved drastically now, but the quality is down; we are seeing a lot more “junk” or hallucinations returned for the same prompts. I guess they are balancing quality of output against performance.

For me the latency has gotten worse. It takes about 13 seconds to get a response from davinci. What is the issue?

Any progress on this? Same problems here (France).

I agree, it’s a huge problem. It’s so slow, and it sometimes gives random replies that have no relation to the topic.

I’m experiencing the same problems (Michigan, USA).

Edit: I found a solution to the problem, see below.

The same problem here, with requests taking up to 10 minutes and response times varying randomly from 60 seconds upward.

OK. As the title of this topic is “davinci text 3” then I assume everyone is on topic and discussing davinci-text-003.

Just got a completion to work with 600 tokens in 4.6 secs, but it is up and down, up and down,…

I found a hackish solution in Python. Although this thread is davinci-specific in the title, the underlying issue is not model-specific and I think it’s on-topic to use the gpt-3.5 model here. I use the multiprocessing library to launch the ChatGPT request in a separate process. I time-limit the execution of that process and kill it if it takes too long, then send a second request.

I tested this using the following demo code. In my test, the time-limited version (10 tries of 10 seconds max each) has a total runtime of 28 seconds, while the time-extended version (1 try of 100 seconds max each) has a total runtime of 127 seconds, almost entirely due to a single 95 second delay on one of the attempts. By limiting and killing these occasional extremely long delays and resending, I think we can avoid the worst of the problem for now.

Here is my demo code. Feel free to insert into your own projects.

Edit: a user report says that a 15 second delay before retrying results in much better real world performance. Code updated to reflect that change, though it may not make a difference in this test code.

import multiprocessing, time, timeit, openai

openai_key = ""
openai.api_key = openai_key


def sendChatGPTRequest(content, bot_model, queue):
    # Worker: send the request and push the completion text onto the queue.
    response = openai.ChatCompletion.create(
        model=bot_model,
        messages=[{"role": "user", "content": content}],
        max_tokens=1024,
        n=1,
        temperature=0.5,
    )
    queue.put(response["choices"][0]["message"]["content"])

def limitedWait(s, queue):
    # Poll until a result arrives or s seconds elapse.
    start = timeit.default_timer()
    while timeit.default_timer() - start < s and queue.empty():
        time.sleep(0.1)  # avoid a busy-wait that pins a CPU core
    return not queue.empty()

def getChatbotResponse(content, bot_model, max_tries, wait_time):
    start_request = timeit.default_timer()
    for i in range(max_tries):
        queue = multiprocessing.Queue()
        # Run the request in a separate process so it can be killed on timeout.
        p = multiprocessing.Process(target=sendChatGPTRequest, args=(content, bot_model, queue))
        p.start()

        if limitedWait(wait_time, queue):
            return (queue.get(), timeit.default_timer() - start_request)
        else:
            print("Trying again...")
            p.terminate()
    return (None, max_tries * wait_time)

if __name__ == '__main__':
    # Time-limited version: up to 10 tries of at most 15 seconds each.
    total_time = 0
    for i in range(20):
        outcome = getChatbotResponse("Please say 'Hello, World!'", "gpt-3.5-turbo-0301", 10, 15)
        print(outcome[0], "\nTime to generate:", outcome[1])
        total_time += outcome[1]
    print("total time:", total_time)
    # Time-extended version: a single try of at most 150 seconds.
    total_time = 0
    for i in range(20):
        outcome = getChatbotResponse("Please say 'Hello, World!'", "gpt-3.5-turbo-0301", 1, 150)
        print(outcome[0], "\nTime to generate:", outcome[1])
        total_time += outcome[1]
    print("total time:", total_time)