No need for pre-processing or storing PDF files: send them directly to the Responses and Chat Completions APIs. We extract both text and images from PDFs, and models with vision capabilities (o1, GPT-4o, etc.) can generate text using that context.
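For example, here is a minimal sketch of the same flow through the Responses API; the file path, model, and prompt are placeholders only:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the PDF once so it can be referenced by file ID
uploaded = client.files.create(
    file=open("example.pdf", "rb"),  # placeholder path
    purpose="user_data",
)

# Reference the uploaded file alongside a text prompt in a single user turn
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_file", "file_id": uploaded.id},
                {"type": "input_text", "text": "Summarize the methods section of this paper."},
            ],
        }
    ],
)

print(response.output_text)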
It worked fine for me. Here's an example in Python:
from openai import OpenAI
import os

test_pdf = r"HOGRI GABAergic CaMKIIα+ Amygdala Output Attenuates Pain and Modulates Emotional-Motivational Behavior via Parabrachial Inhibition 2022.pdf"
MINI = "gpt-4o-mini-2024-07-18"

prompt = """Look at the methods section of the article. Does this study use optogenetics? If so, what are the wavelengths and stimulation parameters used for light stimulation?
Note that different parameters might be used for different experiments. List the relevant experiments by i. title; ii. short summary of experimental procedure (1-3 sentences); iii. light stimulation parameters: wavelength, pulse duration, number of pulses, pulse frequency; iv. page in which this information appears."""

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Upload the PDF so it can be referenced by file ID in the request
file = client.files.create(
    file=open(test_pdf, "rb"),
    purpose="user_data",
)

# Send the uploaded file and the prompt together as a single user message
completion = client.chat.completions.create(
    model=MINI,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "file_id": file.id,
                    },
                },
                {
                    "type": "text",
                    "text": prompt,
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
Hi!
Any estimate of when this will be available for Azure deployments?
Just tested today and it worked perfectly with OpenAI's API, but not with Azure.
Hope this will eventually work, as we really need it. Right now we are extracting PDF text for analysis, but it would be much better if images could be used alongside the text.
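For reference, our current workaround looks roughly like this; a sketch only, assuming pypdf for extraction, with the file path and prompt as illustrative placeholders:

from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Extract plain text page by page (figures and images are lost with this approach)
reader = PdfReader("example.pdf")  # placeholder path
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# Send the extracted text as an ordinary text-only message
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Analyze this paper:\n\n{text}"}],
)
print(completion.choices[0].message.content)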