Hey everyone,
I’m working on a research project that involves an individual opening the ChatGPT WebApp, uploading a PDF that contains some text, and then asking ChatGPT to summarize it. Let’s say everyone has access to 4o.
To convince everybody that doing this will result in getting all the relevant information in the summary, I am planning to run it on 1000 PDFs on my own and analyzing the results. Since I have over 1000 PDFs to run, I have set up a pipeline in python using OpenAI’s API. Now the OpenAI API does not allow for PDF uploads (yes there is the new Assistants beta version but they mention it is not good for summarization yet). So I can use some simple methods like using PyPDF2 to read and extract the text from the PDF and feed that in as part of the prompt. However, I want to be rigorous here and not make any assumptions. So I want to know the exact method used by OpenAI to parse the uploaded PDFs in ChatGPT so that I can write a python script to mimic that.
For instance, in Gemini’s API documentation (ai.google.dev/gemini-api/docs/document-processing?lang=python), they say that you can use ``
doc_data = base64.standard_b64encode(doc_file.read()).decode(“utf-8”)" to extract the data from the PDF and include that as part of the prompt. I want something concrete like this for ChatGPT as well.
Can someone please guide me?
Thanks!