504 Gateway Timeout But Response Logged

So I’m running a larger translation task on GPT-4.1, and because I do not want to make each translation call manually, I run it in a for-loop with a fixed number of calls. For example, translating a German text into English, Greek, Dutch, Danish, and French means I call the API exactly 5 times. I noticed that occasionally I get 504 Gateway Timeouts, with OpenAI’s Python package retrying the request.

Here is what I see in my logs:

INFO: 2025-05-09 16:00:03 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 504 Gateway Time-out"
INFO: 2025-05-09 16:00:03 - Retrying request to /chat/completions in 0.473773 seconds
INFO: 2025-05-09 16:05:03 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 504 Gateway Time-out"
INFO: 2025-05-09 16:05:03 - Retrying request to /chat/completions in 0.912285 seconds
INFO: 2025-05-09 16:10:04 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 504 Gateway Time-out"
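
The sub-second delays in the log (0.47 s, then 0.91 s) look like jittered exponential backoff. Here is a minimal sketch of that general pattern — an illustration only, not openai-python’s actual retry code:

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Jittered exponential backoff: the wait roughly doubles per attempt,
    is capped, and gets a random jitter so clients do not retry in lockstep.
    This is a sketch of the pattern, not the SDK's exact formula."""
    return min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)

for attempt in range(3):
    print(f"Retrying in {backoff_delay(attempt):.6f} seconds")
```

With `base=0.5`, the first retry lands somewhere in 0.25–0.5 s, consistent with the 0.473773 s seen in the log.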

I also enabled the logs of Chat Completions here: https://platform.openai.com/logs

Based on my local logs, I never get a response back; the client keeps retrying until it finally raises an error. However, in the logs on OpenAI’s platform, I noticed that each retried request produced a logged model response. So a response does exist; it just never reached me. The model did not fail, but something downstream of it did, I believe.

I don’t think I am hitting any rate limits: one request takes several minutes to complete, and only after that is the next one sent, so I should always stay within the per-minute limit. I also see in my logs that the x-ratelimit headers always show the same values; they are not counting down, so I assume the limit resets between requests.

Since I translate multiple pairs in a row in the for-loop, you might expect these timeouts to propagate to every pair, but that is not happening. Sometimes it translates all pairs without a single timeout; sometimes it gets stuck on one particular pair but then continues as if nothing had happened. It is unclear whether this is related to the specific language pairs I am using or completely random.

This is the error message that was logged when the client ran out of retries:

<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>


<title>api.openai.com | 504: Gateway time-out</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/main.css" />


</head>
<body>
<div id="cf-wrapper">
    <div id="cf-error-details" class="p-0">
        <header class="mx-auto pt-10 lg:pt-6 lg:px-8 w-240 lg:w-full mb-8">
            <h1 class="inline-block sm:block sm:mb-2 font-light text-60 lg:text-4xl text-black-dark leading-tight mr-2">
              <span class="inline-block">Gateway time-out</span>
              <span class="code-label">Error code 504</span>
            </h1>
            <div>
               Visit <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_504&utm_campaign=api.openai.com" target="_blank" rel="noopener noreferrer">cloudflare.com</a> for more information.
            </div>
            <div class="mt-3">2025-05-09 13:54:32 UTC</div>
        </header>
        <div class="my-8 bg-gradient-gray">
            <div class="w-240 lg:w-full mx-auto">
                <div class="clearfix md:px-8">

<div id="cf-browser-status" class=" relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
  <div class="relative mb-10 md:m-0">

    <span class="cf-icon-browser block md:hidden h-20 bg-center bg-no-repeat"></span>
    <span class="cf-icon-ok w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>

  </div>
  <span class="md:block w-full truncate">You</span>
  <h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">

    Browser

  </h3>
  <span class="leading-1.3 text-2xl text-green-success">Working</span>
</div>

<div id="cf-cloudflare-status" class=" relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
  <div class="relative mb-10 md:m-0">
    <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_504&utm_campaign=api.openai.com" target="_blank" rel="noopener noreferrer">
    <span class="cf-icon-cloud block md:hidden h-20 bg-center bg-no-repeat"></span>
    <span class="cf-icon-ok w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>
    </a>
  </div>
  <span class="md:block w-full truncate">******</span>
  <h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">
    <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_504&utm_campaign=api.openai.com" target="_blank" rel="noopener noreferrer">
    Cloudflare
    </a>
  </h3>
  <span class="leading-1.3 text-2xl text-green-success">Working</span>
</div>

<div id="cf-host-status" class="cf-error-source relative w-1/3 md:w-full py-15 md:p-0 md:py-8 md:text-left md:border-solid md:border-0 md:border-b md:border-gray-400 overflow-hidden float-left md:float-none text-center">
  <div class="relative mb-10 md:m-0">

    <span class="cf-icon-server block md:hidden h-20 bg-center bg-no-repeat"></span>
    <span class="cf-icon-error w-12 h-12 absolute left-1/2 md:left-auto md:right-0 md:top-0 -ml-6 -bottom-4"></span>

  </div>
  <span class="md:block w-full truncate">api.openai.com</span>
  <h3 class="md:inline-block mt-3 md:mt-0 text-2xl text-gray-600 font-light leading-1.3">

    Host

  </h3>
  <span class="leading-1.3 text-2xl text-red-error">Error</span>
</div>

                </div>
            </div>
        </div>

        <div class="w-240 lg:w-full mx-auto mb-8 lg:px-8">
            <div class="clearfix">
                <div class="w-1/2 md:w-full float-left pr-6 md:pb-10 md:pr-0 leading-relaxed">
                    <h2 class="text-3xl font-normal leading-1.3 mb-4">What happened?</h2>
                    <p>The web server reported a gateway time-out error.</p>
                </div>
                <div class="w-1/2 md:w-full float-left leading-relaxed">
                    <h2 class="text-3xl font-normal leading-1.3 mb-4">What can I do?</h2>
                    <p class="mb-6">Please try again in a few minutes.</p>
                </div>
            </div>
        </div>

        <div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
  <p class="text-13">
    <span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">93d1a926fe48be5c</strong></span>
    <span class="cf-footer-separator sm:hidden">&bull;</span>
    <span id="cf-footer-item-ip" class="cf-footer-item hidden sm:block sm:mb-1">
      Your IP:
      <button type="button" id="cf-footer-ip-reveal" class="cf-footer-ip-reveal-btn">Click to reveal</button>
      <span class="hidden" id="cf-footer-ip">**.***.**.**</span>
      <span class="cf-footer-separator sm:hidden">&bull;</span>
    </span>
    <span class="cf-footer-item sm:block sm:mb-1"><span>Performance &amp; security by</span> <a rel="noopener noreferrer" href="https://www.cloudflare.com/5xx-error-landing?utm_source=errorcode_504&utm_campaign=api.openai.com" id="brand_link" target="_blank">Cloudflare</a></span>

  </p>
  <script>(function(){function d(){var b=a.getElementById("cf-footer-item-ip"),c=a.getElementById("cf-footer-ip-reveal");b&&"classList"in b&&(b.classList.remove("hidden"),c.addEventListener("click",function(){c.classList.add("hidden");a.getElementById("cf-footer-ip").classList.remove("hidden")}))}var a=document;document.addEventListener&&a.addEventListener("DOMContentLoaded",d)})();</script>
</div><!-- /.error-footer -->


    </div>
</div>
</body>
</html>

I wonder if you are using some hosting service that disconnects long-idle network connections.

To counter that, you can pass stream=True as an API parameter and collect the chunked response.

Here is an application like the one you describe, just short of the input/output handling (such as reading from and writing to files):

from openai import OpenAI

client = OpenAI(max_retries=0)  # disable the SDK's automatic retries

languages_list = ["French (France)", "Spanish (Spain)"]
translate_input = "I like turkey sandwiches with hot mayo."

system = [{"role": "system", "content": (
    "You are an automated language translator, producing no chat, only translation."
)}]
responses = {}

for language in languages_list:
    user = [{"role": "user", "content": (
        f"Translate to {language}: {translate_input}"
    )}]
    try:
        response = client.chat.completions.create(
            model="gpt-4.1",
            messages=system + user,
            max_tokens=5000,
            top_p=0.1,
            stream=True,
        )
        reply = ""
        for delta in response:
            if not delta.choices[0].finish_reason:
                word = delta.choices[0].delta.content or ""
                reply += word  # collect the response deltas
                print(word, end="")  # for console output stream
        print("")
        responses.update({language: reply})
    except Exception as error:
        print(f"{language} failed: {error}")
        responses.update({language: "FAIL"})
print("\n\nYou can save these responses:\n", responses)

I set the client’s max_retries attribute to 0 so it does not retry automatically; failures are recorded in the translation output instead.

I apologize if this sounds stupid, but can you elaborate on this:

I wonder if you are using some hosting service that disconnects long-idle network connections.

Are you saying it might be related to how I am connected to the Internet, and that connecting through a different provider might help?

I am still a bit hesitant to change the code because it is for an experiment, and I want to ensure the conditions for all translations are the same. However, I assume streaming the output should not affect quality, so perhaps it is a good idea to do so.

I noticed in the logs on OpenAI’s platform that the translations are completed, not even chunked. So it is strange that I am not receiving them; that would be plausible if it were related to my connection.

When you use non-streamed output, generation on OpenAI’s side continues even if the connection is closed.

With streaming, however, you can close the connection and generation will be stopped shortly after. Thus you don’t pay for what you don’t receive, which might be the first criterion you are looking for.

With streaming, you immediately start getting your output tokens within a second or so, as server-sent events. This prevents the network connection from looking idle to any inspector.

There are many hosting, worker, or serverless platforms that impose their own, quite short, timeout on network connections. They may raise their own error instead of simply closing the connection, just as you observe.

All you really need to change is the stream API parameter, and then iterate over and collect the responses, as demonstrated above.

But we don’t get the same information as with the non-streaming approach, right? Such as the number of input and output tokens? The finish_reason and system_fingerprint seem to be in there. If streaming sends each token individually, we could count the chunks to get the number of output tokens, but the number of input tokens is not in the response body, right?

The “usage” is in another chunk that comes after the finish-reason chunk. However, for backwards compatibility, you have to turn it on with an API parameter.

The stream is not split by token but by representable unit. Some things that are only one glyph, such as emoji, require multiple tokens.
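
Put together, the chunk layout described above can be sketched like this. The mock objects stand in for real ChatCompletionChunk objects, with field names assumed to match the SDK’s (choices, delta.content, finish_reason, usage):

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Concatenate content deltas and pull usage out of the final chunk.

    Assumes the layout described above: content chunks first, then a chunk
    carrying finish_reason, then (with stream_options={"include_usage": True})
    a last chunk whose usage is set and whose choices list is empty.
    """
    text, usage, finish_reason = "", None, None
    for chunk in chunks:
        if chunk.usage is not None:            # final usage-only chunk
            usage = chunk.usage
            continue
        choice = chunk.choices[0]
        if choice.finish_reason is not None:   # second-to-last chunk
            finish_reason = choice.finish_reason
        else:
            text += choice.delta.content or ""
    return text, usage, finish_reason

# Mock chunks standing in for real API chunks:
def content_chunk(content):
    return SimpleNamespace(
        usage=None,
        choices=[SimpleNamespace(finish_reason=None,
                                 delta=SimpleNamespace(content=content))],
    )

mock_stream = [
    content_chunk("Bon"),
    content_chunk("jour"),
    SimpleNamespace(  # finish-reason chunk, no more content
        usage=None,
        choices=[SimpleNamespace(finish_reason="stop",
                                 delta=SimpleNamespace(content=None))],
    ),
    SimpleNamespace(  # usage chunk: empty choices, usage filled in
        usage=SimpleNamespace(prompt_tokens=12, completion_tokens=2),
        choices=[],
    ),
]

text, usage, finish_reason = collect_stream(mock_stream)
print(text, finish_reason, usage.prompt_tokens, usage.completion_tokens)
```

Note that the usage chunk is checked before indexing into choices, since its choices list is empty.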

Okay, thanks for your responses.
So overall, it is possible to reproduce everything I collected originally using the streaming approach. With stream_options = {'include_usage': True}, the input and output tokens (prompt and completion) are found in the last chunk of the stream, and the system_fingerprint is stored there as well. The second-to-last chunk contains the finish_reason.

I ask just out of caution, but can I be sure that the output of a stream is no different from the output of the regular non-streaming approach?

The only difference is that the output text is created by concatenating all the chunks, whereas the non-streaming approach provides it directly. But the text should be the same.

Determinism cannot be 100% guaranteed, but you can play with seed and temperature for better reproducibility.
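
For the reproducibility point, here is a hypothetical parameter set along those lines; the names match the Chat Completions parameters discussed in this thread, but seed is best-effort only, and comparing system_fingerprint across runs shows whether the backend changed underneath you:

```python
# Hypothetical kwargs for a more reproducible streamed call; determinism
# is still best-effort even with temperature=0 and a fixed seed.
reproducible_kwargs = dict(
    model="gpt-4.1",
    temperature=0,                           # reduce sampling randomness
    seed=42,                                 # best-effort determinism
    stream=True,
    stream_options={"include_usage": True},  # final chunk carries usage
)
```

These would be passed through as `client.chat.completions.create(messages=..., **reproducible_kwargs)`.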