GPT-4 API for Educational Application

mohendy25 · December 25, 2023, 9:44am

I’m trying to build an application that could take in PDFs and images uploaded by professors, store them in the backend, and use these materials as the relevant context in a chatbot-like student interface. In general, how can I use the GPT-4 API to accomplish this? The purpose of the application is to not only use the text within PDFs as context, but also the images, graphs, and visual representation.

EricGT · December 25, 2023, 10:18am

I can not help you with your general question by can give some advice for this part

Internally a PDF file is more like an archive file in that it is a collection of different resources organized into a hierarchy. The images, graphs, and visual representations can be stored as different resources.

The first major note about the internals of a PDF file is that most of the text and images created using LaTeX will be as PostScript source code. This makes it very hard to extract the non-text part such as graphs into a meaningful means for use other than as a display representation.

However if is not uncommon to find images that were created external to LaTeX to be included as originally created, often as image files and such.

While there are many free PDF to text applications and sites, to do what you desire will most often require commercial software, e.g. Adobe Acrobat or IDR Solutions BuildVU.

platypus · January 24, 2025, 12:51pm

This is a very old topic and I am not sure if you are still active @mohendy25 but if it helps the rest of the community, I would just mention that another way of handling these documents in a multi-modal way (parsing not just text, but also tables, images and graphs), is to treat each PDF document page as an image, and pass it to Vision API, and effectively translate the non-text components to some standardized text format (often Markdown).

One general PDF parsing tool that I really like, that is open source, is Docling. It is completely self contained and uses two open source models for layout and table understanding, and outputs Markdown or JSON. It unfortunately doesn’t handle images and graphs, yet.

There are also managed document services like Unstructured.io, and both Azure and Google Cloud have their own Document Intelligence (or Document AI) services, if you want to go down that route.

Topic		Replies	Views
Can you explain how to analyze a PDF file in GPT-4? API	9	72424	December 13, 2023
Can API cut images (such as mathematical figures) from the PDFs? API gpt-4 , api , pdf	7	298	December 3, 2024
Scanned pdf with API and ask questions API chatgpt , api	3	1626	October 15, 2024
How to use GPT 4 api to parse PDFs that contains graphical data API gpt-4	1	2636	May 14, 2023
What is the best way to parse a PDF file with ChatGPT? API	9	50058	November 16, 2024

GPT-4 API for Educational Application

Related topics