Can you explain how to analyze a PDF file in GPT-4?

It depends on what kind of analysis you want to perform. There are a number of ways to analyze a PDF depending on the complexity of the data and your skills.

  1. The obvious way is to simply copy and paste your text into the OpenAI prompt. This is inefficient and likely won't work for very long documents. However, it will allow you to quickly gauge whether GPT meets your needs.
  2. Programmatically convert your PDF into text using Python, then call the OpenAI API. This approach is best if you have a set of tasks you want to automate and/or a large volume of files. Take care not to exceed the model's token limit: to summarize a very long document, you'd need to split it into chunks, summarize each chunk, and then combine the results. For structured/semi-structured data such as invoices, you can use the method elaborated in this Medium post, for instance.
  3. Finally, you can use a dedicated platform that specializes in unstructured/semi-structured data, such as nnext.ai, to process your data. This is best for data that has a regular format, such as invoices, purchase orders, shipping notes, price lists, etc. NNext will allow you to upload a bunch of documents, convert them into a tabular format, and search & query them in natural language or SQL.
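
To make the chunking idea in option 2 a bit more concrete, here is a rough sketch in Python. It assumes you have already extracted the PDF text into a string with whatever library you prefer. The names `split_into_chunks`, `summarize_long_text` and the `summarize` callable are hypothetical; `summarize` stands in for your own wrapper around the OpenAI API call. Character counts are used only as a crude proxy for tokens, so treat the default sizes as placeholders to tune.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 8000, overlap: int = 200) -> List[str]:
    """Split text into pieces of at most max_chars characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in one of them. Characters are only a rough
    stand-in for tokens (an assumption of this sketch).
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")

    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap` so the next chunk repeats the tail of this one.
        start = end - overlap
    return chunks


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],
    max_chars: int = 8000,
    overlap: int = 200,
) -> str:
    """Summarize each chunk, then summarize the joined partial summaries.

    `summarize` is a hypothetical callable supplied by you, e.g. a function
    that sends its argument to the OpenAI API and returns the reply text.
    """
    partial = [summarize(chunk) for chunk in split_into_chunks(text, max_chars, overlap)]
    if not partial:
        return ""
    if len(partial) == 1:
        return partial[0]
    return summarize("\n".join(partial))
```

You would call it as `summarize_long_text(pdf_text, my_openai_summarizer)`. If the joined partial summaries are themselves too long for one request, you would need to repeat the chunk-and-summarize step on them, which this sketch does not do.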