Accurately read PDF files?

joshuasy10 · April 10, 2023, 1:25pm

Someone has already made this; it's called ChatPDF.com. See the thread "ChatPDF.com - Chat with any PDF using the new ChatGPT API" (post #17 by joao.occhiucci).

You need to parse your PDF into meaningful chunks (e.g. paragraphs) and create embeddings of these chunks. Then create an embedding of your prompt, perform a semantic search, and rank the results by similarity. You should keep track of which embedding corresponds to which paragraph on which page (useful for source checking). Finally, include the most similar paragraphs in the chat prompt until X tokens are reached. Voilà!
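A rough sketch of those steps in Python, in case it helps. It assumes the PDF text has already been extracted into one string per page. `toy_embed` is a bag-of-words stand-in I made up so the example runs without an API key; in practice you would swap in a real embedding model. The token count is just a whitespace word count, not a real tokenizer, and every function and variable name here is hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int        # page the paragraph came from (for source checking)
    paragraph: int   # paragraph index within that page
    text: str


def toy_embed(text: str) -> Counter:
    """ASSUMPTION: stand-in for a real embedding model (bag-of-words counts)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_pages(pages: list[str]) -> list[Chunk]:
    """Split each page into paragraphs on blank lines, remembering the source."""
    chunks = []
    for page_no, page_text in enumerate(pages, start=1):
        paragraphs = [p.strip() for p in re.split(r"\n\s*\n", page_text) if p.strip()]
        for para_no, para in enumerate(paragraphs, start=1):
            chunks.append(Chunk(page_no, para_no, para))
    return chunks


def select_context(question: str, chunks: list[Chunk], max_tokens: int) -> list[Chunk]:
    """Rank chunks by similarity to the question, then add them until the budget is hit."""
    q_vec = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, toy_embed(c.text)), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        cost = len(chunk.text.split())  # crude token estimate (word count)
        if used + cost > max_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected


def build_prompt(question: str, selected: list[Chunk]) -> str:
    """Put the selected paragraphs, tagged with their source, ahead of the question."""
    context = "\n".join(f"[page {c.page}, para {c.paragraph}] {c.text}" for c in selected)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"


# Made-up page text standing in for an extracted PDF.
pages = [
    "Invoices are due within 30 days.\n\nLate payments incur a 2% fee.",
    "Refunds are issued within 14 days of a return request.\n\n"
    "Shipping is free for orders over 50 euros.",
]
question = "How long do refunds take?"
chunks = chunk_pages(pages)
selected = select_context(question, chunks, max_tokens=12)
prompt = build_prompt(question, selected)
print(prompt)
```

With a real embedding model you would compute the chunk embeddings once and store them next to the page and paragraph numbers, instead of re-embedding every chunk per question as this sketch does.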

This goes into the method more deeply: "Question answering using embeddings-based search" (OpenAI Cookbook)
