GPT-4 API for Educational Application

I’m trying to build an application that could take in PDFs and images uploaded by professors, store them in the backend, and use these materials as the relevant context in a chatbot-like student interface. In general, how can I use the GPT-4 API to accomplish this? The purpose of the application is to not only use the text within PDFs as context, but also the images, graphs, and visual representation.

1 Like

I can not help you with your general question by can give some advice for this part

Internally a PDF file is more like an archive file in that it is a collection of different resources organized into a hierarchy. The images, graphs, and visual representations can be stored as different resources.

The first major note about the internals of a PDF file is that most of the text and images created using LaTeX will be as PostScript source code. This makes it very hard to extract the non-text part such as graphs into a meaningful means for use other than as a display representation.

However if is not uncommon to find images that were created external to LaTeX to be included as originally created, often as image files and such.

While there are many free PDF to text applications and sites, to do what you desire will most often require commercial software, e.g. Adobe Acrobat or IDR Solutions BuildVU.

2 Likes

This is a very old topic and I am not sure if you are still active @mohendy25 but if it helps the rest of the community, I would just mention that another way of handling these documents in a multi-modal way (parsing not just text, but also tables, images and graphs), is to treat each PDF document page as an image, and pass it to Vision API, and effectively translate the non-text components to some standardized text format (often Markdown).

One general PDF parsing tool that I really like, that is open source, is Docling. It is completely self contained and uses two open source models for layout and table understanding, and outputs Markdown or JSON. It unfortunately doesn’t handle images and graphs, yet.

There are also managed document services like Unstructured.io, and both Azure and Google Cloud have their own Document Intelligence (or Document AI) services, if you want to go down that route.