No need for pre-processing or storing PDF files — send files directly to the Responses and Chat Completions APIs. We extract both text and images from PDFs, and models with vision capabilities (o1, 4o, etc.) can generate text using that context.
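For the Responses API, a minimal sketch of sending a PDF inline as base64 (no upload step) might look like this. The helper names `pdf_input_content` and `ask_about_pdf` are hypothetical; the `input_file` part with a `file_data` data URL and `response.output_text` follow OpenAI's Python SDK as I understand it, and `client` is assumed to be an `openai.OpenAI` instance.

```python
import base64
from pathlib import Path
from typing import Dict, List


def pdf_input_content(pdf_bytes: bytes, filename: str, prompt: str) -> List[Dict]:
    """Build Responses API user-message content: an inline PDF plus a text prompt."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return [
        {
            "type": "input_file",
            "filename": filename,
            # Data URL carrying the base64-encoded PDF bytes; no Files API upload needed.
            "file_data": f"data:application/pdf;base64,{encoded}",
        },
        {"type": "input_text", "text": prompt},
    ]


def ask_about_pdf(client, model: str, path: str, prompt: str) -> str:
    """Send one PDF and a question; `client` is assumed to be an openai.OpenAI instance."""
    pdf_path = Path(path)
    content = pdf_input_content(pdf_path.read_bytes(), pdf_path.name, prompt)
    response = client.responses.create(
        model=model,
        input=[{"role": "user", "content": content}],
    )
    # output_text concatenates the text parts of the model's reply.
    return response.output_text
```

Usage would be something like `ask_about_pdf(OpenAI(), "gpt-4o-mini", "paper.pdf", "Summarize the methods section.")`, where the file name is a placeholder.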
It worked fine for me. Here’s some example Python code:
```python
from openai import OpenAI
import os

# Local PDF to analyze (path relative to the working directory).
test_pdf = r"HOGRI GABAergic CaMKIIα+ Amygdala Output Attenuates Pain and Modulates Emotional-Motivational Behavior via Parabrachial Inhibition 2022.pdf"
MINI = "gpt-4o-mini-2024-07-18"
prompt = """Look at the methods section of the article. Does this study use optogenetics? If so, what are the wavelengths and stimulation parameters used for light stimulation?
Note that different parameters might be used for different experiments. List the relevant experiments by i. title; ii. short summary of experimental procedure (1-3 sentences); iii. light stimulation parameters: wavelength, pulse duration, number of pulses, pulse frequency; iv. page in which this information appears."""

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Upload the PDF once; "user_data" is the purpose used here for files sent as model input.
# The with-block closes the local file handle after the upload.
with open(test_pdf, "rb") as f:
    file = client.files.create(file=f, purpose="user_data")

# Reference the uploaded file by ID alongside the text prompt.
completion = client.chat.completions.create(
    model=MINI,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "file_id": file.id,
                    },
                },
                {
                    "type": "text",
                    "text": prompt,
                },
            ],
        }
    ],
)
print(completion.choices[0].message.content)
```
Hi!
Any estimate of when this will be available for Azure deployments?
I tested it today: it worked perfectly with OpenAI’s API but not with Azure.
I hope this eventually works there, as we really need it. Right now we extract the PDF text for analysis, but it would be much better if images could be used alongside the text.
@edwinarbus Feature request for PDF inputs: I’d like to specify the DPI of the converted images to control input token usage. For example, my use case does not need 300 DPI and would work fine at 100 DPI.
Have you solved this issue? I’m getting a 400 error too. When I asked ChatGPT, it said this happens "because type: 'file' is not supported in the Chat Completions API message schema."
This is a huge quality-of-life update! Direct PDF input opens up so many use cases for document analysis, contract review, or even building research assistants.
Curious how well the API handles multi-column or scanned PDFs — has anyone tried it on more complex formats yet?
Seeing the exact same behavior as described by @m.k.lilley
Attaching a PDF to a prompt, either as base64-encoded file data or as a reference to an uploaded file_id, can produce either a successful response or a response indicating that the AI cannot find the file data. Both outcomes can result from the exact same file and the exact same prompt. I’ve observed this in gpt-4.1 and gpt-4.1-mini.
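To compare the two attachment modes side by side, a small sketch that builds the Chat Completions `file` content part either from an uploaded `file_id` or from inline base64 data. The function names are hypothetical; the part shapes follow OpenAI's documented PDF-input format for Chat Completions as I understand it, so check them against the current docs.

```python
import base64
from typing import Dict, Optional


def chat_file_part(*, file_id: Optional[str] = None, pdf_bytes: Optional[bytes] = None,
                   filename: str = "document.pdf") -> Dict:
    """Return a Chat Completions "file" content part, by file_id or by inline base64."""
    if (file_id is None) == (pdf_bytes is None):
        raise ValueError("pass exactly one of file_id or pdf_bytes")
    if file_id is not None:
        # Mode 1: reference a file previously uploaded through the Files API.
        return {"type": "file", "file": {"file_id": file_id}}
    # Mode 2: embed the PDF itself as a base64 data URL.
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "type": "file",
        "file": {
            "filename": filename,
            "file_data": f"data:application/pdf;base64,{encoded}",
        },
    }


def user_message(file_part: Dict, prompt: str) -> Dict:
    """Wrap a file part and a text prompt into a single user message."""
    return {"role": "user", "content": [file_part, {"type": "text", "text": prompt}]}
```

Sending `user_message(chat_file_part(file_id=...), prompt)` and `user_message(chat_file_part(pdf_bytes=...), prompt)` with the same prompt makes it easy to log which mode produced which kind of reply.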
I just swapped over to gpt-4o and sent two requests through. I’m currently using file upload and including file_id in my prompt. First request worked fine. Second request (using the same file as a new upload) returned “I cannot access or read PDF files directly. Please provide the text or details from the document for assistance.”
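One possible stopgap while the root cause is unknown is to treat refusal-like replies as failures and retry. This is a hypothetical heuristic, not an official mechanism: the regex patterns are built from the failure messages quoted in this thread, `send` stands for any zero-argument function that makes one request and returns the reply text, and every retry is billed like a normal request.

```python
import re
import time
from typing import Callable

# Hypothetical heuristic: patterns derived from the failure messages quoted in this thread.
REFUSAL_PATTERNS = [
    r"cannot (access|read)\b.*\bpdf",
    r"unable to read (the )?pdf",
    r"no content (was )?provided",
]
_REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def looks_like_missing_file(reply: str) -> bool:
    """True if the reply resembles the 'model could not see the PDF' failures."""
    return bool(_REFUSAL_RE.search(reply))


def ask_with_retry(send: Callable[[], str], attempts: int = 3, delay_s: float = 2.0) -> str:
    """Call `send` until the reply no longer looks like a missing-file refusal."""
    reply = ""
    for attempt in range(attempts):
        reply = send()
        if not looks_like_missing_file(reply):
            return reply
        if attempt < attempts - 1:
            time.sleep(delay_s * (attempt + 1))  # simple linear backoff between attempts
    raise RuntimeError(f"still no file content after {attempts} attempts: {reply!r}")
```

This only masks the symptom; a pattern list like this will also miss refusals phrased differently.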
We were excited to use this feature, but are also having severe issues.
Our first few batches (using the Batch API) of base64-encoded PDFs processed successfully, but hundreds of subsequent batches returned a response of ‘no content provided’, ‘unable to read PDF’, or some similar variant.
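For reference, a sketch of building one Batch API request line that carries a base64-encoded PDF, plus a parser that maps each `custom_id` to its reply so failed rows can be found and resubmitted. The helper names are hypothetical; the JSONL fields (`custom_id`, `method`, `url`, `body`) and the output shape (`response.body.choices`) are the standard Batch API layout as I understand it.

```python
import base64
import json
from typing import Dict, Optional


def batch_request_line(custom_id: str, model: str, pdf_bytes: bytes,
                       filename: str, prompt: str) -> str:
    """One JSONL line for a Batch API job targeting /v1/chat/completions."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    body = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "file", "file": {
                    "filename": filename,
                    "file_data": f"data:application/pdf;base64,{encoded}",
                }},
                {"type": "text", "text": prompt},
            ],
        }],
    }
    return json.dumps({"custom_id": custom_id, "method": "POST",
                       "url": "/v1/chat/completions", "body": body})


def replies_by_id(output_jsonl: str) -> Dict[str, Optional[str]]:
    """Map custom_id to the assistant reply text (None when the row has no choices)."""
    results: Dict[str, Optional[str]] = {}
    for line in output_jsonl.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        # Errored rows may carry "response": null, so fall back to empty dicts.
        body = (row.get("response") or {}).get("body") or {}
        choices = body.get("choices") or []
        results[row["custom_id"]] = choices[0]["message"]["content"] if choices else None
    return results
```

Using a stable `custom_id` (for example the PDF's file stem) makes it straightforward to re-queue only the rows whose replies were empty or refusals.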
We have not hit the 100 GB org limit; our full dataset is only ~34 GB.
Also, the 10 GB end-user limit mentioned here doesn’t seem to apply: we received no errors, and the ‘no content’ responses started well before we had uploaded 10 GB of content.
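A back-of-the-envelope check, assuming the org storage limit counts the uploaded batch input files: base64 encoding turns every 3 bytes into 4 characters, so ~34 GB of PDFs becomes roughly 45 GB of batch input before JSON overhead, still under 100 GB. The sketch computes that and totals uploaded bytes per purpose over an iterable of file objects such as `client.files.list()`; both helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def base64_size(n_bytes: int) -> int:
    """Length of padded base64 output for n_bytes of input: 4 chars per 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


def bytes_by_purpose(files: Iterable) -> Dict[str, int]:
    """Total the `bytes` field per `purpose`, e.g. over client.files.list()."""
    totals: Dict[str, int] = defaultdict(int)
    for f in files:
        totals[f.purpose] += f.bytes or 0
    return dict(totals)


GB = 10**9
dataset = 34 * GB
print(f"~{base64_size(dataset) / GB:.1f} GB once base64-encoded")
```

Running `bytes_by_purpose(client.files.list())` against the org would show how much is actually stored under "batch" versus "user_data".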
This has cost us over $700 in faulty processing; we’d appreciate assistance with this ASAP.
user message context: No. input_file is only for PDF files, as described.
file search tool: No. Flat tables of text do not perform well in semantic search over vector-store document chunks, and CSV and Excel files are disallowed.
code interpreter: Yes. The AI can write Python scripts to access and process the data, as long as it can be read by extraction library modules or loaded into dataframes or other structured data.
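A minimal sketch of the code-interpreter route through the Responses API, assuming the CSV or Excel file has already been uploaded and its ID is known. `code_interpreter_request` is a hypothetical helper, and the tool shape `{"type": "code_interpreter", "container": {"type": "auto", "file_ids": [...]}}` is my reading of OpenAI's docs, worth double-checking before use.

```python
from typing import Any, Dict, List


def code_interpreter_request(model: str, file_ids: List[str], question: str) -> Dict[str, Any]:
    """Keyword arguments for client.responses.create() using the code interpreter tool."""
    return {
        "model": model,
        "tools": [{
            "type": "code_interpreter",
            # "auto" lets the API create a sandbox container; the listed files are
            # made available there for the model's Python code to open.
            "container": {"type": "auto", "file_ids": file_ids},
        }],
        "input": question,
    }
```

Usage would be along the lines of `client.responses.create(**code_interpreter_request("gpt-4.1", [uploaded.id], "Which month has the highest total?"))`, then reading `output_text` from the result; the model and question here are placeholders.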