Best practice to store PDF tables in a database and use OpenAI to answer questions

Hi Guys,
I’m working on a workflow where I use OpenAI models to read PDF documents and generate answers for a predefined set of questions.

My current approach is:

  1. Extract the PDF content

  2. Split the content into multiple chunks

  3. Store these chunks in a database

  4. Retrieve relevant chunks as context and pass them to OpenAI for answer generation
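For reference, the four steps above can be sketched roughly like this. The chunk size, overlap, and keyword-overlap scoring are illustrative choices only (a real setup would use embeddings and a vector store), and the final prompt is just a placeholder for the OpenAI call:

```python
# Minimal sketch of the extract -> chunk -> store -> retrieve flow.
# Chunk size/overlap and the scoring function are illustrative, not tuned.

def split_into_chunks(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split extracted text into fixed-size chunks with a small overlap."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def retrieve(chunks: list[str], question: str, top_k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval; a real system would use embeddings."""
    q_words = set(question.lower().split())
    scored = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return scored[:top_k]

# Stand-in for text extracted from a PDF
document = "Revenue grew 12% in 2023. " * 30 + "Headcount stayed flat at 250."
chunks = split_into_chunks(document)          # step 2 (store these rows in step 3)
context = retrieve(chunks, "What happened to revenue in 2023?")  # step 4
prompt = "Answer using only this context:\n" + "\n---\n".join(context)
```

The `prompt` string is what would then be sent to the OpenAI model as context.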

This works reasonably well for plain text. However, I’m running into challenges when the PDF contains multiple tables (sometimes complex or spanning multiple pages).

My question is:

What is the best way to extract, represent, and store tables from PDFs as chunks so that LLMs can easily understand and reason over them during context retrieval?

When you say tables, do you mean images of tables?

Normal tables in PDF with multiple columns…

Hmm, I'm not sure how to handle this, because non-image tables in PDFs often look like tables but are fundamentally just graphical elements (text and lines) that mimic a table's appearance rather than encode a true data structure.
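That said, once a table has been recovered as rows of cells (for example via pdfplumber's `page.extract_tables()`, which returns lists of rows where empty cells are `None`), one option is to serialize each table as Markdown and store it as its own chunk, since LLMs tend to parse Markdown tables well. A minimal sketch, assuming the rows are already extracted:

```python
# Serialize an extracted table as a Markdown chunk for retrieval.
# Assumes rows of cells as produced by e.g. pdfplumber's extract_tables(),
# where missing/empty cells come back as None.
from typing import Optional

def table_to_markdown(rows: list[list[Optional[str]]]) -> str:
    header, *body = rows
    def fmt(row):
        # Treat None as empty, flatten embedded newlines within a cell
        return "| " + " | ".join((c or "").replace("\n", " ").strip() for c in row) + " |"
    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines += [fmt(r) for r in body]
    return "\n".join(lines)

rows = [
    ["Region", "Q1", "Q2"],
    ["EMEA", "1.2", "1.4"],
    ["APAC", None, "0.9"],  # empty cell as pdfplumber would report it
]
print(table_to_markdown(rows))
```

For tables spanning multiple pages, you'd concatenate the row lists before serializing, and it can help to prepend a one-line description of the table (caption or surrounding heading) to the chunk so retrieval can match it.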