Accurately read PDF files?

Someone has already made this; it's called ChatPDF.com. See post #17 by joao.occhiucci in the thread "ChatPDF.com - Chat with any PDF using the new ChatGPT API".

You need to parse your PDF into meaningful chunks (e.g. paragraphs) and create an embedding of each chunk. Then create an embedding of your prompt, run a semantic search against the chunk embeddings, and rank the results by similarity. Keep track of which embedding corresponds to which paragraph on which page (useful for source checking). Finally, include the most similar paragraphs in the chat prompt until X tokens are reached. Voila!
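The steps above can be sketched roughly as follows. This is only an illustration under some loud assumptions: `toy_embed` is a bag-of-words stand-in for a real embedding model, `count_tokens` just counts whitespace-separated words instead of using a real tokenizer, and the page text is assumed to have already been extracted from the PDF. All function and class names here are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int        # page the paragraph came from (for source checking)
    paragraph: int   # paragraph index within that page
    text: str


def split_into_chunks(pages):
    """Split each page's text into paragraph chunks on blank lines."""
    chunks = []
    for page_no, page_text in enumerate(pages, start=1):
        paragraphs = [p.strip() for p in re.split(r"\n\s*\n", page_text) if p.strip()]
        for para_no, para in enumerate(paragraphs, start=1):
            chunks.append(Chunk(page_no, para_no, para))
    return chunks


def toy_embed(text):
    """ASSUMPTION: stand-in embedding (word counts), not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def count_tokens(text):
    """ASSUMPTION: crude token estimate, one token per whitespace-separated word."""
    return len(text.split())


def select_context(question, chunks, max_tokens):
    """Rank chunks by similarity to the question; keep the best until the budget is hit."""
    q_vec = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, toy_embed(c.text)), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        cost = count_tokens(chunk.text)
        if used + cost > max_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected


def build_prompt(question, selected):
    """Assemble a chat prompt that cites page and paragraph for each source."""
    sources = "\n\n".join(
        f"[page {c.page}, paragraph {c.paragraph}] {c.text}" for c in selected
    )
    return f"Answer using only the sources below.\n\n{sources}\n\nQuestion: {question}"


pages = [
    "Cats sleep most of the day.\n\nDogs need daily walks.",
    "The invoice total is 42 dollars.",
]
chunks = split_into_chunks(pages)
selected = select_context("What is the invoice total?", chunks, max_tokens=10)
prompt = build_prompt("What is the invoice total?", selected)
```

In a real setup you would swap `toy_embed` for calls to an actual embedding model, cache the chunk embeddings instead of recomputing them per question, and count tokens with the tokenizer that matches your chat model.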

This goes into the method in more depth: "Question answering using embeddings-based search" (OpenAI Cookbook).
