Direct PDF file input now supported in the API

No need for pre-processing or storing PDF files — send files directly to the Responses and Chat Completions APIs. We extract both text and images from PDFs, and models with vision capabilities (o1, GPT-4o, etc.) can generate text with that context.

https://platform.openai.com/docs/guides/pdf-files?api-mode=chat

https://x.com/OpenAIDevs/status/1902114937624830106

10 Likes

Great! Will this be coming to the Azure Deployments as well?

1 Like

Hello @edwinarbus, that seems great, but is it already live? I get an error when I try to use it (with Chat Completions) following the doc at https://platform.openai.com/docs/guides/pdf-files?api-mode=chat

Error: 400 Invalid chat format. Content blocks are expected to be either text or image_url type.

It worked fine for me. Here’s some example Python code:

from openai import OpenAI
import os

test_pdf = r"HOGRI GABAergic CaMKIIα+ Amygdala Output Attenuates Pain and Modulates Emotional-Motivational Behavior via Parabrachial Inhibition 2022.pdf"
MINI = "gpt-4o-mini-2024-07-18"

prompt = """Look at the methods section of the article. Does this study use optogenetics? If so, what are the wavelengths and stimulation parameters used for light stimulation? 
Note that different parameters might be used for different experiments. List the relevant experiments by i. title; ii. short summary of experimental procedure (1-3 sentences); iii. light stimulation parameters: wavelength, pulse duration, number of pulses, pulse frequency; iv. page in which this information appears."""

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Upload the PDF with purpose="user_data" so it can be referenced by file_id
file = client.files.create(
    file=open(test_pdf, "rb"),
    purpose="user_data"
)

# Pass the uploaded file and the question as separate content blocks
completion = client.chat.completions.create(
    model=MINI,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "file_id": file.id,
                    }
                },
                {
                    "type": "text",
                    "text": prompt,
                },
            ]
        }
    ]
)

print(completion.choices[0].message.content)
2 Likes

Hi!
Any estimate on when this will be available for Azure deployments?
I just tested today and it worked perfectly with OpenAI’s API but not with Azure.

Thanks for this great feature, @edwinarbus. I added support to my open-source Discourse-based chatbot.

1 Like

I’m also getting error 400, using Chat Completions and the base64 example shown here: https://platform.openai.com/docs/guides/pdf-files?api-mode=chat

Hope this eventually starts working, as we really need it. Right now we are extracting PDF text for analysis, but it would be much better if images could be used along with the text.

@edwinarbus Feature request for PDF inputs: I’d like to specify the dpi of the converted images to control input token usage. For example, my use case does not need 300 dpi and would work fine with 100 dpi.

1 Like

I don’t think you have to worry about that.

Imagine sending a page rendered at 1000 dpi as vision input: the image is downscaled before billing, so the token charge you receive is pretty much the tile-based maximum of 765 tokens per page (I’ve observed up to 873, maybe a bonus ~100 tokens of hidden instructions).

Today’s challenge: actually try to get billed more for a page of an image-only PDF than its downscaling formula for vision “tiles” allows.
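To see why dpi barely matters, here’s a rough sketch of the published high-detail vision pricing (85 base tokens plus 170 per 512-px tile, after fitting the image within 2048×2048 and scaling its shortest side down to 768 px). The dpi OpenAI actually renders PDF pages at is their internal detail; the point is that downscaling makes a US-Letter page cost the same either way:

```python
import math

def vision_tokens(width: int, height: int,
                  base: int = 85, per_tile: int = 170) -> int:
    """Estimate high-detail vision tokens for one image, following the
    published downscaling rules (assumption: PDF pages are billed the
    same way as ordinary image inputs)."""
    # Step 1: fit within a 2048 x 2048 square (downscale only)
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # Step 2: scale so the shortest side is at most 768 px
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    # Step 3: count 512-px tiles and apply per-tile + base cost
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return per_tile * tiles + base

# A US-Letter page (8.5 x 11 in) rendered at 1000 dpi vs. 100 dpi:
print(vision_tokens(8500, 11000))  # 765
print(vision_tokens(850, 1100))    # 765 -- identical after downscaling
```

Both renderings collapse to a 768×994 image, i.e. 4 tiles, so the 1000 dpi version buys you nothing but upload size.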

Have you solved this issue? I’m getting error 400 too. I asked ChatGPT, and it claimed that type: "file" is not supported in the Chat Completions API message schema.

This is a huge quality-of-life update! Direct PDF input opens up so many use cases for document analysis, contract review, or even building research assistants.

Curious how well the API handles multi-column or scanned PDFs — has anyone tried it on more complex formats yet?

Something appears to have broken with this.

gpt-4o no longer seems to be able to “see” a PDF that’s attached to a conversation. This was not a problem until a week or so ago.

Here is a short notebook where you can reproduce the behaviour.

4 Likes

Seeing the exact same behavior as described by @m.k.lilley

Attaching a PDF to a prompt, via base64 encoding of the file data or a reference to an uploaded file_id, can produce either successful responses or responses indicating the AI cannot find the file data. Both successful and unsuccessful responses can result from the exact same file and the exact same prompt. I’ve observed this in gpt-4.1 and gpt-4.1-mini.

2 Likes

I’m not having this problem with gpt-4o and Chat Completions, FYI. I just ran a test.

But I’m not using a file_id; I’m passing the base64 data inline as part of the call.

Like here:

https://platform.openai.com/docs/guides/pdf-files#base64-encoded-files
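For anyone comparing the two approaches, here’s a minimal sketch of the inline base64 shape from that doc page. The helper name is mine; the content-block structure follows the documented file / file_data data-URL form, and the actual call is left to client.chat.completions.create:

```python
import base64
import os

def pdf_content_blocks(path: str, prompt: str) -> list:
    """Build Chat Completions content blocks for an inline base64 PDF.
    Helper name is hypothetical; the block shape follows the docs."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return [
        {
            "type": "file",
            "file": {
                "filename": os.path.basename(path),
                # The PDF travels inline as a data: URL, no upload step
                "file_data": f"data:application/pdf;base64,{b64}",
            },
        },
        {"type": "text", "text": prompt},
    ]
```

You’d pass the result as the content of a single user message, the same way the file_id example earlier in the thread does.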

I just swapped over to gpt-4o and sent two requests through. I’m currently using file upload and including file_id in my prompt. First request worked fine. Second request (using the same file as a new upload) returned “I cannot access or read PDF files directly. Please provide the text or details from the document for assistance.”

2 Likes

We were excited to use this feature, but are also having severe issues.

Our first few batches (using the Batch API) of base64-encoded PDFs processed successfully, but hundreds of subsequent batches returned a response of ‘no content provided’, ‘unable to read PDF’, or some similar variant.

We have not hit the 100 GB org limit. Our full dataset is only ~34 GB.

Also, the 10 GB end-user limit mentioned here doesn’t seem to apply - we received no errors, and the ‘no-content’ responses started happening well before we uploaded 10 GB of content.

This has cost us over $700 in faulty processing - we’d appreciate assistance with this ASAP.

Does the Responses API support input files for .xlsx or .csv?

  • user message context: No. input_file accepts only PDF files, as described.
  • file search tool: No. Flat tables of text do not perform well in semantic search over vector-store document chunks, and CSV and Excel files are disallowed.
  • code interpreter: Yes. The AI can write Python scripts to access and process the data if it is interpretable by extraction libraries or can be loaded into dataframes or other structured data.
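To illustrate the code-interpreter route for a CSV, here’s a hypothetical sketch that only builds the request payload and does not send it. The container shape follows the documented {"type": "auto", "file_ids": [...]} form; the model name and file ID are placeholders:

```python
def build_csv_analysis_request(file_id: str, question: str) -> dict:
    """Assemble a Responses API request that attaches an uploaded CSV
    to a code-interpreter container. Placeholder values throughout."""
    return {
        "model": "gpt-4.1-mini",  # placeholder model
        "tools": [
            {
                "type": "code_interpreter",
                # "auto" lets the API create a container holding the file
                "container": {"type": "auto", "file_ids": [file_id]},
            }
        ],
        "input": question,
    }

request = build_csv_analysis_request("file-abc123", "Summarize sales by region.")
# client.responses.create(**request) would run it; the model then writes
# Python inside the container to load and analyze the CSV.
```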
1 Like