Async Request Hangs with Base64 Image Payload, Delayed Response After 10 Minutes

Pre-requisites

Environment Information

│ List of packages in environment: 
│
│  Name      Version   Build         Channel
│  openai    1.58.1    pyhd8ed1ab_0  conda-forge
│  prefect   3.0.0     pypi_0        pypi

Model: gpt-4o-2024-11-20
Image shape: (1280, 720, 3)

Issue Description

I am running a Prefect pipeline where one of the tasks asks a question via an asynchronous client (AsyncOpenAI). The _create_client method is invoked at the start of the program, and I use that async client to handle requests. Although my setup is more complex, the core issue can be summarized as follows:

Problem:
When the request contains a large base64-encoded image, the program hangs. However, if I wait 10 minutes, it eventually returns the correct response. Notably, the issue does not occur in any of the situations listed below.

Observations & Workarounds
1. When running the same request in a Jupyter notebook.
2. When using the standard synchronous OpenAI client instead of the asynchronous one.
3. When the image is removed from the request.
4. When shortening the base64 string manually.
5. When using the gpt-4o-mini-2024-07-18 model instead of gpt-4o-2024-11-20.
6. When uncommenting _refresh_client(), which refreshes the client instance before making the request.

Investigations & Dismissed Theories
1. Event loop mismatch: Initially suspected, but dismissed since removing the image fixes the issue.
2. Incomplete child event loop handling: Also dismissed for the same reason as above.

Code Snippets

Task Code:

from prefect import task


class AIStep:
    @task(task_run_name="{self.step_name}")
    async def task(self, question, base64_image, **kwargs):
        # Single user message carrying the image part and the text part.
        content = [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}},
            {"type": "text", "text": question},
        ]
        reply = await self.llm_provider._prompt([{"role": "user", "content": content}])
        return reply

Prompter Code:

class Prompter:
    async def _prompt(self, input_text, *, tools=None, **kwargs):
        # self._refresh_client()
        response = await self.client.chat.completions.create(
            model=self.model, messages=input_text, temperature=0.0, tools=tools
        )
        if len(response.choices) > 0:
            if len(response.choices) > 1:
                self.logger.warning(f"Too many responses {response.choices}")
            return response
        else:
            # str() so we truncate the rendered messages, not the message list
            raise Exception(f"Call to OpenAI didn't yield any choices: {response} for {str(input_text)[:200]}")

Client Creation & Refresh Code:

def _create_client(self):
    from openai import AsyncOpenAI
    return AsyncOpenAI(api_key=self.cfg["token"])

def _refresh_client(self):
    self.client = self._create_client()
    self.logger.info(f"Client refreshed for: {self}")

Questions
1. Why does the response take 10 minutes to arrive? Is there any known issue related to large base64-encoded image payloads or timeouts in the OpenAI API?
2. Am I being charged if I cancel the request before the 10-minute response? Does the API meter usage based on received requests or only on completed responses?

Any insight on this behavior or guidance on how to better handle such cases would be appreciated!

Hello @cfil,

  1. Long Response Time: The delay could be due to OpenAI’s handling of large base64-encoded image payloads, which can take longer to process, especially when the image exceeds certain size limits. It’s also possible the issue is related to server-side timeouts or throttling of large requests. You may want to try optimizing the image size or reducing the data being sent to see if that speeds up the response.

  2. Request Cancellation and Charges: OpenAI typically charges based on the processing of requests, not just the completion of responses. If you cancel a request before it completes, you might still incur charges depending on how much data was processed before cancellation. It’s best to consult OpenAI’s billing documentation for specific details about cancellation and charges.

  3. Suggestions: For better performance, you could consider using a more efficient image encoding method or compressing the image before sending it in the request. Additionally, testing with smaller payloads might help pinpoint whether the size of the base64 string is the root cause.

  1. 10 minutes seems excessive. We also tested the image, and it is below 1 MB, so size is definitely not the reason. And it’s always exactly 10 minutes, which feels like an OpenAI-specific issue.
  2. Again, it’s not a performance issue on our side (since the image is small), and we’re using the example given in the Vision documentation.

The remote API has hung without closing the connection or responding. The default SDK timeout is 600 seconds. The SDK then automatically retries a few times, and the response is eventually fulfilled by one of the retries.

I would lower the client timeout to no more than the longest response wait you anticipate.

You can also turn off retries with a client parameter so you get a better picture of individual failures.
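
For example, with the Python SDK that might look like the following (a sketch; pick a timeout that suits your own workload):

from openai import AsyncOpenAI

# A sketch: cap the timeout well below the 600-second default and disable
# automatic retries so individual failures surface immediately.
client = AsyncOpenAI(
    api_key="...",     # or rely on the OPENAI_API_KEY environment variable
    timeout=60.0,      # seconds; set to the longest response wait you expect
    max_retries=0,     # SDK default is 2 retries with exponential backoff
)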

Then do the image downsizing yourself, the same way the API does. A 1280x720 image would not incur any server-side downsizing, as its shortest dimension is below the 768 px maximum (it would then be split into 6 tiles). This network optimization is more impactful on even larger images.
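
As a sanity check on that tile arithmetic, here is a sketch of my reading of the documented resize-then-tile model (the 2048 px long-side cap, 768 px short-side cap, and 512 px tiles are from the docs; the helper functions are mine, not official code):

import math

def resized_dims(w, h, long_cap=2048, short_cap=768):
    # Documented server-side resize: fit the longest side under long_cap,
    # then the shortest side under short_cap, preserving aspect ratio.
    scale = min(1.0, long_cap / max(w, h), short_cap / min(w, h))
    return round(w * scale), round(h * scale)

def tile_count(w, h, tile=512):
    # Number of 512x512 tiles billed for a detail:high image after resizing.
    rw, rh = resized_dims(w, h)
    return math.ceil(rw / tile) * math.ceil(rh / tile)

print(tile_count(1280, 720))  # 720 < 768, so no resize; 3 x 2 = 6 tiles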

Then report the issue and the rate of occurrence.

Hmm, but the logs don’t show anything about retries, shouldn’t there be some debug messages about it?

The downsizing doesn’t make sense to me:

  1. the image in their official documentation is 2560 × 1669 (1.1 MB) and it works instantly… My image is smaller (762 KB).
from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])
  2. The tiling is 512x512 according to the pricing page, so I send 4 tiles.
  3. If the timeout is 3 minutes, why is it taking 10 minutes without any debug messages informing me it’s doing retries?

Retries

Certain errors will be automatically retried 2 times by default, with a short exponential backoff. Connection errors (for example, due to a network connectivity problem), 408 Request Timeout, 409 Conflict, 429 Rate Limit, and >=500 Internal errors will all be retried by default.

You can use the maxRetries option to configure or disable this.

Timeouts

Requests time out after 10 minutes by default. You can configure this with a timeout option.

The documentation answers your questions.
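
Per those quoted docs, both settings can also be overridden per request in the Python SDK via with_options (a sketch; messages stands in for your existing message list):

# with_options returns a copy of the client with the new settings applied,
# leaving the original client untouched.
response = await client.with_options(timeout=120.0, max_retries=0).chat.completions.create(
    model="gpt-4o-2024-11-20",
    messages=messages,
)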


The Vision docs also explain the maximum long-side and short-side dimensions the server resizes to, before the image is fed to the tiling mechanism (where you pay per detail:high tile).

Since you already have the image on your side before base64-encoding it, you could try other optimizations: a higher-compression PNG, or re-saving the JPEG at lower quality.

Then make sure you are using the matching MIME type in the image_url.
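
Putting those two suggestions together, a minimal sketch (assuming Pillow is installed; downscale_and_encode is a hypothetical helper, not part of any SDK):

import base64
import io

from PIL import Image

def downscale_and_encode(path, short_cap=768, quality=75):
    # Downscale so the shortest side is at most short_cap, re-save as JPEG
    # at reduced quality, and return a data URL whose MIME type matches.
    img = Image.open(path).convert("RGB")  # JPEG has no alpha channel
    scale = min(1.0, short_cap / min(img.size))
    if scale < 1.0:
        img = img.resize((round(img.width * scale), round(img.height * scale)))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    b64 = base64.b64encode(buf.getvalue()).decode("ascii")
    return f"data:image/jpeg;base64,{b64}"  # MIME matches the actual encoding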

For now, you have to work around an API that hangs inconsistently on the same input. You could then file a bug report, but there is no quality reporting mechanism that reliably gets staff attention.