Direct PDF file input now supported in the API

No need for pre-processing or storing PDF files — send files directly to the Responses and Chat Completions APIs. We extract both text and images from PDFs, and models with vision capabilities (o1, GPT-4o, etc.) can generate text with that context.

https://platform.openai.com/docs/guides/pdf-files?api-mode=chat

https://x.com/OpenAIDevs/status/1902114937624830106

10 Likes

Great! Will this be coming to the Azure Deployments as well?

1 Like

Hello @edwinarbus, that seems great, but is it already live? I get an error when I try to use it (with Chat Completions) following the doc at https://platform.openai.com/docs/guides/pdf-files?api-mode=chat

Error: 400 Invalid chat format. Content blocks are expected to be either text or image_url type.

It worked fine for me. Here’s some example Python code:

from openai import OpenAI
import os

test_pdf = r"HOGRI GABAergic CaMKIIα+ Amygdala Output Attenuates Pain and Modulates Emotional-Motivational Behavior via Parabrachial Inhibition 2022.pdf"
MINI = "gpt-4o-mini-2024-07-18"

prompt = """Look at the methods section of the article. Does this study use optogenetics? If so, what are the wavelengths and stimulation parameters used for light stimulation? 
Note that different parameters might be used for different experiments. List the relevant experiments by i. title; ii. short summary of experimental procedure (1-3 sentences); iii. light stimulation parameters: wavelength, pulse duration, number of pulses, pulse frequency; iv. page in which this information appears."""

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Upload the PDF with purpose="user_data" so it can be referenced by file_id
file = client.files.create(
    file=open(test_pdf, "rb"),
    purpose="user_data"
)

# Pass the uploaded file and the question as separate content blocks
completion = client.chat.completions.create(
    model=MINI,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "file_id": file.id,
                    }
                },
                {
                    "type": "text",
                    "text": prompt,
                },
            ]
        }
    ]
)

print(completion.choices[0].message.content)
2 Likes

Hi!
Any estimate on when this will be available for Azure deployments?
I just tested today and it worked perfectly with OpenAI’s API but not with Azure.

Thanks for this great feature, @edwinarbus. I added support to my open-source Discourse-based chatbot.

1 Like

I’m also getting error 400, using Chat Completions and the base64 example shown here: https://platform.openai.com/docs/guides/pdf-files?api-mode=chat

Hope this eventually starts working, as we really need it. Right now we are extracting PDF text for analysis, but it would be much better if images could be used along with the text.

@edwinarbus Feature request for PDF inputs: I’d like to specify the dpi of the converted images to control input token usage. For example, my use case does not need 300 dpi and would work fine with 100 dpi.

1 Like

I don’t think you have to worry about that.

Imagine sending a page rendered at 1000 dpi as vision input: the image is downscaled before billing, so the token charge you receive is pretty much the tile-based maximum of 765 tokens per page (I’ve observed up to 873, maybe a bonus ~100 tokens of hidden instructions).

Today’s challenge: actually try to get billed more for a page of an image-only PDF than its downscaling formula for vision “tiles” allows.
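To see why dpi barely matters, here’s a rough sketch of the published high-detail vision pricing (85 base tokens plus 170 per 512-px tile, after fitting the image within 2048×2048 and scaling its shortest side down to 768 px). The dpi OpenAI actually renders PDF pages at is their internal detail; the point is that downscaling makes a US-Letter page cost the same either way:

```python
import math

def vision_tokens(width: int, height: int,
                  base: int = 85, per_tile: int = 170) -> int:
    """Estimate high-detail vision tokens for one image, following the
    published downscaling rules (assumption: PDF pages are billed the
    same way as ordinary image inputs)."""
    # Step 1: fit within a 2048 x 2048 square (downscale only)
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # Step 2: scale so the shortest side is at most 768 px
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    # Step 3: count 512-px tiles and apply per-tile + base cost
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return per_tile * tiles + base

# A US-Letter page (8.5 x 11 in) rendered at 1000 dpi vs. 100 dpi:
print(vision_tokens(8500, 11000))  # 765
print(vision_tokens(850, 1100))    # 765 -- identical after downscaling
```

Both renderings collapse to a 768×994 image, i.e. 4 tiles, so the 1000 dpi version buys you nothing but upload size.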

Have you solved this issue? I’m getting error 400 too. I asked ChatGPT, and it claimed that type: "file" is not supported in the Chat Completions API message schema.

This is a huge quality-of-life update! Direct PDF input opens up so many use cases for document analysis, contract review, or even building research assistants.

Curious how well the API handles multi-column or scanned PDFs — has anyone tried it on more complex formats yet?

Something appears to have broken with this.

gpt-4o no longer seems to be able to “see” a PDF that’s attached to a conversation. This was not a problem until a week or so ago.

Here is a short notebook where you can reproduce the behaviour.

4 Likes

Seeing the exact same behavior as described by @m.k.lilley

Attaching a PDF to a prompt, via base64 encoding of the file data or a reference to an uploaded file_id, can produce either successful responses or responses indicating the AI cannot find the file data. Both successful and unsuccessful responses can result from the exact same file and the exact same prompt. I’ve observed this in gpt-4.1 and gpt-4.1-mini.

2 Likes

I’m not having this problem with gpt-4o and Chat Completions, FYI. I just ran a test.

But I’m not using a file_id; I’m passing the base64 data inline as part of the call.

Like here:

https://platform.openai.com/docs/guides/pdf-files#base64-encoded-files
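For anyone comparing the two approaches, here’s a minimal sketch of the inline base64 shape from that doc page. The helper name is mine; the content-block structure follows the documented file / file_data data-URL form, and the actual call is left to client.chat.completions.create:

```python
import base64
import os

def pdf_content_blocks(path: str, prompt: str) -> list:
    """Build Chat Completions content blocks for an inline base64 PDF.
    Helper name is hypothetical; the block shape follows the docs."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return [
        {
            "type": "file",
            "file": {
                "filename": os.path.basename(path),
                # The PDF travels inline as a data: URL, no upload step
                "file_data": f"data:application/pdf;base64,{b64}",
            },
        },
        {"type": "text", "text": prompt},
    ]
```

You’d pass the result as the content of a single user message, the same way the file_id example earlier in the thread does.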

I just swapped over to gpt-4o and sent two requests through. I’m currently using file upload and including file_id in my prompt. First request worked fine. Second request (using the same file as a new upload) returned “I cannot access or read PDF files directly. Please provide the text or details from the document for assistance.”

2 Likes

We were excited to use this feature, but are also having severe issues.

Our first few batches (using the Batch API) of base64-encoded PDFs processed successfully, but hundreds of subsequent batches returned a response of ‘no content provided’, ‘unable to read PDF’, or some similar variant.

We have not hit the 100 GB org limit. Our full dataset is only ~34 GB.

Also, the 10 GB end-user limit mentioned here doesn’t seem to apply - we received no errors, and the ‘no-content’ responses started happening well before we uploaded 10 GB of content.

This has cost us over $700 in faulty processing - we’d appreciate assistance with this ASAP.

Does the Responses API support input files for .xlsx or .csv?

  • user message context: No. input_file accepts only PDF files, as described.
  • file search tool: No. Flat tables of text do not perform well in semantic search over vector-store document chunks, and CSV and Excel files are disallowed.
  • code interpreter: Yes. The AI can write Python scripts to access and process the data if it is interpretable by extraction libraries or can be loaded into dataframes or other structured data.
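To illustrate the code-interpreter route for a CSV, here’s a hypothetical sketch that only builds the request payload and does not send it. The container shape follows the documented {"type": "auto", "file_ids": [...]} form; the model name and file ID are placeholders:

```python
def build_csv_analysis_request(file_id: str, question: str) -> dict:
    """Assemble a Responses API request that attaches an uploaded CSV
    to a code-interpreter container. Placeholder values throughout."""
    return {
        "model": "gpt-4.1-mini",  # placeholder model
        "tools": [
            {
                "type": "code_interpreter",
                # "auto" lets the API create a container holding the file
                "container": {"type": "auto", "file_ids": [file_id]},
            }
        ],
        "input": question,
    }

request = build_csv_analysis_request("file-abc123", "Summarize sales by region.")
# client.responses.create(**request) would run it; the model then writes
# Python inside the container to load and analyze the CSV.
```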
1 Like