Direct PDF file input now supported in the API

No need for pre-processing or storing PDF files: send files directly to the Responses and Chat Completions APIs. We extract both text and images from PDFs, and models with vision capabilities (o1, GPT-4o, etc.) can generate text with that context.

https://platform.openai.com/docs/guides/pdf-files?api-mode=chat

https://x.com/OpenAIDevs/status/1902114937624830106
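The announcement covers both the Responses and Chat Completions APIs; only a Chat Completions example appears below, so here is a sketch of what the Responses-side payload might look like. The `input_file`/`input_text` content types follow the linked guide; the model name and prompt are placeholders, so check the docs for the exact shape:

```python
def build_pdf_input(file_id: str, prompt: str) -> list:
    """Build the `input` payload for a Responses API call that pairs an
    uploaded PDF (referenced by file_id) with a text question."""
    return [{
        "role": "user",
        "content": [
            {"type": "input_file", "file_id": file_id},
            {"type": "input_text", "text": prompt},
        ],
    }]

# With the official client (requires OPENAI_API_KEY), usage would look like:
# from openai import OpenAI
# client = OpenAI()
# file = client.files.create(file=open("paper.pdf", "rb"), purpose="user_data")
# response = client.responses.create(
#     model="gpt-4o-mini",
#     input=build_pdf_input(file.id, "Summarize this PDF."),
# )
# print(response.output_text)
```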


Great! Will this be coming to Azure deployments as well?

Hello @edwinarbus, that seems great, but is it already live? I get an error when I try to use it (with Chat Completions), following the doc at https://platform.openai.com/docs/guides/pdf-files?api-mode=chat

Error: 400 Invalid chat format. Content blocks are expected to be either text or image_url type.

It worked fine for me. Here’s an example in Python:

from openai import OpenAI
import os

test_pdf = r"HOGRI GABAergic CaMKIIα+ Amygdala Output Attenuates Pain and Modulates Emotional-Motivational Behavior via Parabrachial Inhibition 2022.pdf"
MINI = "gpt-4o-mini-2024-07-18"

prompt = """Look at the methods section of the article. Does this study use optogenetics? If so, what are the wavelengths and stimulation parameters used for light stimulation? 
Note that different parameters might be used for different experiments. List the relevant experiments by i. title; ii. short summary of experimental procedure (1-3 sentences); iii. light stimulation parameters: wavelength, pulse duration, number of pulses, pulse frequency; iv. page in which this information appears."""

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Upload the PDF so it can be referenced by file_id in the chat request
file = client.files.create(
    file=open(test_pdf, "rb"),
    purpose="user_data"
)

# Pass the uploaded file alongside the text prompt in a single user message
completion = client.chat.completions.create(
    model=MINI,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "file_id": file.id,
                    }
                },
                {
                    "type": "text",
                    "text": prompt,
                },
            ]
        }
    ]
)

print(completion.choices[0].message.content)

Hi!
Any estimate of when this will be available for Azure deployments?
I just tested today and it worked perfectly with OpenAI’s API, but not with Azure.

Thanks for this great feature, @edwinarbus. I added support for it to my open-source Discourse-based chatbot:


I’m also getting a 400 error, using Chat Completions and the Base64 example shown here: https://platform.openai.com/docs/guides/pdf-files?api-mode=chat

Hope this will eventually work, as we really need it. Right now we extract PDF text for analysis, but it would be much better if images could be used along with the text.