Comparing PDF Files to get Changes

minaabdelmassih57 · November 30, 2023, 9:35am

Hello,

I know that it is possible to upload files to GPT-4 paid version and tell it to compare them in a prompt, I was wondering if I can do the same through Python API, as far as I know files can be uploaded to assistants and can be set to knowledge retrieval but I just want to compare the two files even if they had images in them (given they are PDF files) and everything and for it to tell me the differences.

Is this available or doable in any way?
Thanks in advance!

_j · November 30, 2023, 9:56am

Sure, if OpenAI can program it, you can program it, and you can do it a bit more efficiently and task-oriented.

For searchable PDFs that have text embedded, there’s a couple different python libraries that can extract text from PDFs.

For those PDFs that are primarily images, you’ll need to do some additional OCR on the pages, also possible to do with Python (a forum search for “python ocr” might be a good start).

Doing a per-page extraction, you can easily identify the magnitude of differences in code, even if just by length of text. You can then just send and inquire about those with differences (although insertions may change all following pages).

Then you just need to select the most performative and affordable model with the context length required to have the tokens of both texts loaded at the same time.

minaabdelmassih57 · November 30, 2023, 9:59am

Hello,

Thanks for your reply!

What you mentioned is what I had in mind at first indeed, but I was wondering if there was any direct way to compare the 2 PDF files directly just like the paid version but through the API.

I take it that there is no direct way to do this through the Python API, right?

Thanks so much for your help!

_j · November 30, 2023, 10:08am

The API is language-agnostic. Python is just one of many languages you can use to interact with the RESTful API for OpenAI models.

“Assistants” on the API has ‘code interpreter’, where you can upload files, and then have the AI use its own python writing skills to perform tasks. It may be able to perform some of the PDF parsing for you with its own python code and then get the returned descriptions to answer the same way.

If the AI can code it, the AI can also give a coding solution to you, and if the AI can barely code it, you can work on the project with the AI until it is immutable code that works 100% on your side.

“retrieval” and files upload will also allow PDFs and does some PDF to text. The resulting “files” then can be attached to messages. However, the operation of this is opaque and you would have to try it yourself to see how well the AI can answer.

Shwapx · May 22, 2024, 2:44pm

Did you manage to find good solution to do that?

maurice3 · June 2, 2024, 4:08pm

Not a working solution (yet), but I tried asking my openai assistant to calculate the checksum of an uploaded file so that I can compare it with a local calculation. Unfortunately the assistant (model=“gpt-3.5-turbo”) I used either hit a rate limit error 100% of the time, or the assistant did not know how to do the calculation. I finally just decided to put in an enhancement request to include the checksum when getting a list of uploaded files.

Topic		Replies	Views
Compare PDF files and shows changes API gpt-4 , chatgpt , api , assistants-api	6	2366	May 23, 2024
Can you explain how to analyze a PDF file in GPT-4? API	9	72322	December 13, 2023
What is the best way to parse a PDF file with ChatGPT? API	9	49519	November 16, 2024
How can I upload pdf files in chatgpt and ask for a summary of it? API chatgpt , api	6	34839	December 23, 2023
How can I make the assistant 'read' scanned documents that are in PDF format? API assistants-api , file-uploads	3	206	June 2, 2025

Comparing PDF Files to get Changes

Related topics