Inconsistent Responses with PDF File Upload in OpenAI Chat Completion API

Hello,

I’m developing an application that analyzes PDF files using OpenAI’s Chat Completion API. I’m experiencing inconsistent results that are causing difficulties and would appreciate any help.

Problem:

  • When uploading the same PDF file and requesting analysis with the same code, sometimes the analysis works properly, while other times I receive responses like “Please upload a PDF” or “Please attach a file.”
  • This issue shows an unpredictable pattern where out of 10 requests with the same PDF, some succeed while others fail.

Details:

  • API Used: chat.completions.create endpoint (GPT-4o model)
  • Code Example:
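from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
pdf_path = "document.pdf"  # placeholder path, replace with your own file

# Step 1: Upload PDF file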
with open(pdf_path, "rb") as f:
    file_response = client.files.create(file=f, purpose="user_data")

file_id = file_response.id

# Step 2: Ask question with Chat Completion
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "file_id": file_id,
                    },
                },
                {
                    "type": "text",
                    "text": "Please analyze the content of this PDF.",
                },
            ],
        }
    ],
)

Solutions Attempted:

  1. Added a delay (sleep()) after the file upload to allow for processing
  2. Implemented retry logic
  3. Used different API keys
  4. Created new OpenAI projects

However, while these methods may provide temporary solutions, the issue continues to occur intermittently.
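For readers who want to try the retry approach from the list above, here is a minimal sketch. It is not the original poster's code: the marker phrases, the function names, and the backoff values are all assumptions chosen for illustration, and the request itself is passed in as a callable so the sketch does not depend on any particular client setup.

```python
import time
from typing import Callable

# Assumed marker phrases, taken from the replies quoted in this thread.
# Extend the tuple with whatever your own logs show.
MISSING_FILE_MARKERS = (
    "please upload",
    "please attach",
    "no pdf provided",
    "no content provided",
)


def looks_like_missing_file(reply: str) -> bool:
    """Heuristic: True if the reply suggests the model never saw the PDF."""
    lowered = reply.lower()
    return any(marker in lowered for marker in MISSING_FILE_MARKERS)


def ask_with_retry(
    send_request: Callable[[], str],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, int]:
    """Call send_request until the reply stops looking like a missing-file reply.

    Returns (reply, attempts_used). Raises RuntimeError if every attempt
    still looks like a missing-file reply.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        reply = send_request()
        if not looks_like_missing_file(reply):
            return reply, attempt
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"All {max_attempts} attempts looked like missing-file replies")
```

In practice send_request would wrap the chat.completions.create call shown earlier and return the text of the first choice (or an empty string if the content is None). This only masks the symptom, and a phrase-matching heuristic can misfire on a PDF whose real analysis happens to contain one of the marker phrases.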

Questions:

  1. What is the root cause of this inconsistency?
  2. Is this a known issue that OpenAI is aware of?
  3. Are there better ways to solve or mitigate this problem?
  4. Is there more detailed technical documentation on the PDF analysis process?

Any help or insights would be greatly appreciated. Thank you.


I’m seeing the exact same issue. My prompt included the file in base64 format, and this week roughly half of the responses from OpenAI started indicating a missing file. I tried uploading the file and referencing the file ID instead (as you’re doing) and still see the same issue. I’m logging all my requests and can confirm that the exact same prompt with the exact same PDF can produce both success and failure responses.
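One way to back up a claim like this from request logs is to store a fingerprint of each request next to its outcome. The sketch below is an assumption about how that could be done, not what the poster actually runs; fingerprint_request and its parameters are hypothetical names, and it uses only the standard library.

```python
import hashlib
import json


def fingerprint_request(messages: list, pdf_bytes: bytes, model: str) -> str:
    """Return a SHA-256 hex digest covering the model, the messages, and the PDF.

    Two requests with the same fingerprint sent byte-identical inputs, so any
    difference in the responses comes from the service side.
    """
    # sort_keys makes the serialization independent of dict key order.
    payload = json.dumps(
        {"model": model, "messages": messages},
        sort_keys=True,
        ensure_ascii=False,
    ).encode("utf-8")
    digest = hashlib.sha256()
    digest.update(payload)
    digest.update(b"\x00")  # separator between the JSON payload and the PDF bytes
    digest.update(pdf_bytes)
    return digest.hexdigest()
```

Logging the fingerprint together with a success or failure flag makes it easy to group log lines and show that a single fingerprint maps to both outcomes.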

We’re also seeing this problem. Our first few batches of base64-encoded PDFs processed successfully, but hundreds of subsequent batches failed with ‘no content provided’.

We have not hit the 100 GB organization limit. Our full dataset is only ~34 GB.

Also, the 10 GB end-user limit mentioned here doesn’t seem to apply: we received no errors, and the ‘no content’ responses started happening well before we had uploaded 10 GB of content.

Getting the same problem as you have described.


completion = client.beta.chat.completions.parse(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Here is the PDF to extract [redacted] from:"},
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "filename": "file.pdf",
                        "file_data": f"data:application/pdf;base64,{base64_string}"
                    },
                },
            ]
        }
    ],
    response_format=[redacted] # pydantic structured format
)

This code behaves non-deterministically. It usually works fine, but sometimes the model doesn’t seem to see the PDF at all and just returns the structured format containing “No PDF provided” or something similar. The issue seems to happen more often when a different PDF is uploaded on each request than when the same PDF is uploaded 10 times. Maybe it has something to do with query caching? I really don’t think it is my code.
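To help rule out the client side, a small pre-flight check on the payload can confirm that the bytes being sent are non-empty, look like a PDF, and survive base64 encoding. This is a sketch under assumptions: build_pdf_part is a hypothetical helper, the header check is only a rough heuristic, and the returned dict simply mirrors the "file" content part used in the snippet above.

```python
import base64


def build_pdf_part(pdf_bytes: bytes, filename: str = "file.pdf") -> dict:
    """Validate the PDF bytes and return a "file" content part for the request.

    Raises ValueError if the input is empty or has no PDF header near the start.
    """
    if not pdf_bytes:
        raise ValueError("PDF is empty")
    # Rough heuristic: PDF files carry a "%PDF-" header at or near the start.
    if b"%PDF-" not in pdf_bytes[:1024]:
        raise ValueError("No %PDF- header found; is this really a PDF?")

    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    # Round-trip check: the encoded string must decode to the original bytes.
    if base64.b64decode(encoded) != pdf_bytes:
        raise ValueError("base64 round trip did not reproduce the input")

    return {
        "type": "file",
        "file": {
            "filename": filename,
            "file_data": f"data:application/pdf;base64,{encoded}",
        },
    }
```

If this check passes for a request that still comes back with “No PDF provided”, the payload was well-formed when it left the client, which points away from the calling code.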
