Asking questions about scanned PDFs with the API

Hello, I want to give the ChatGPT API multiple scanned PDF files and ask questions about them. From what I’ve seen, I was only able to send images to the API. Can anyone help?

Hi there and welcome to the Community!

Your understanding is correct. To achieve that, you would have to supply the PDF pages as images to one of the models that support vision (e.g. gpt-4-turbo or the newer gpt-4o models).
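To make the "PDF pages as images" step concrete, here is a minimal sketch of one way to do the conversion. It assumes the third-party PyMuPDF package is installed (`pip install pymupdf`); the function names `pdf_pages_to_data_urls` and `png_bytes_to_data_url` and the default of 150 DPI are my own choices for illustration, not something from the OpenAI docs.

```python
import base64


def png_bytes_to_data_url(png_bytes):
    """Wrap raw PNG bytes in a base64 data URL."""
    encoded = base64.b64encode(png_bytes).decode("utf-8")
    return f"data:image/png;base64,{encoded}"


def pdf_pages_to_data_urls(pdf_path, dpi=150):
    """Render every page of a PDF to PNG and return one data URL per page."""
    # Assumption: PyMuPDF is available. It is imported here, inside the
    # function, so the helper above still works without it.
    import fitz  # PyMuPDF

    urls = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)  # rasterize the page
            urls.append(png_bytes_to_data_url(pix.tobytes("png")))
    return urls
```

Each returned string can be used as the `url` value of an `image_url` content part. A higher DPI may help with small print on scans, at the cost of larger requests.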

import base64

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

# Load image from file
def load_image_as_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Example usage
image_path = "./Photos/10_I.png"  # Replace with the path to your local image
# The MIME type in the data URL should match the file type (PNG here)
encoded_image = f"data:image/png;base64,{load_image_as_base64(image_path)}"

# Use this encoded image as part of your request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Tell me these fields: Sendername, sender taxcode, sender address, Date, Receiverbankcountry, Receiver client address, Receiverbankcode, Receiverbankname, Receiveraccount, Receivername, Currency, Amount, Details of payment, Invoice number.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": encoded_image},
                }
            ],
        }
    ],
    max_tokens=500,
)

print(response.choices[0].message.content)

This is the code I’m using now, but I want to provide PDF files as input because each PDF contains multiple images. I also want to upload more than one PDF. Is that not possible? Do I have to send the images one by one?
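On the one-by-one question: as far as I know, a single user message can carry several `image_url` parts, so the pages do not have to be sent in separate requests. Below is a minimal sketch under that assumption. `build_multi_image_message` and `ask_about_images` are hypothetical helper names, and `image_data_urls` is assumed to be a list of base64 data URLs built the same way as `encoded_image` in the code above.

```python
def build_multi_image_message(question, image_data_urls):
    """Build one user message holding the question plus several images."""
    content = [{"type": "text", "text": question}]
    for url in image_data_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}


def ask_about_images(client, question, image_data_urls, model="gpt-4o"):
    """Send all images in a single chat completion request."""
    # `client` is expected to be an openai.OpenAI() instance
    response = client.chat.completions.create(
        model=model,
        messages=[build_multi_image_message(question, image_data_urls)],
        max_tokens=500,
    )
    return response.choices[0].message.content
```

Pages from several PDFs can go into the same list. Very long documents may still need to be split across requests because of request size and token limits.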

I’d recommend having a look at this OpenAI cookbook:
