I’m looking to extract highly specific information from a three-page PDF document. Specifically, I want to retrieve six key data points, including the product name, manufacturer, contained substances, and legal notices.
The document is divided into four distinct sections, each addressing a different topic, and the information I need is scattered across the entire text.
I’m trying to identify the most efficient strategy to obtain answers that are precise, coherent, concise, and especially free of hallucinations.
Here are the options I’m considering:
- Send the full text in a single prompt to the LLM. Each section is about 1000 tokens, so the total fits within a 4000-token context window.
- Embed the full document and then ask a separate question.
- Embed each section separately, then query the model either once or up to six times (one query per data point).
- If the total data size exceeds 3000–4000 tokens, a retrieval-augmented generation (RAG) approach would be required. In that case, what chunking strategy would you recommend? And should I prompt with one general question or split it into six targeted ones?
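For the RAG option, here is a minimal sketch of the section-based chunking I have in mind, so the question is concrete. It assumes the PDF text has already been extracted to a plain string and that sections start with markdown-style `## ` headings; both are assumptions for illustration, not part of my actual pipeline. Token counts use a rough 4-characters-per-token heuristic rather than a real tokenizer.

```python
import re

def split_into_sections(text: str) -> list[str]:
    """Split on markdown-style headings so each section becomes one chunk.

    The lookahead keeps the heading line attached to its section.
    """
    parts = re.split(r"(?m)^(?=## )", text)
    return [p.strip() for p in parts if p.strip()]

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def chunk_section(section: str, max_tokens: int = 1000,
                  overlap_tokens: int = 100) -> list[str]:
    """Split an oversized section into overlapping windows.

    The overlap is there so a data point that straddles a chunk
    boundary is still retrievable from at least one chunk.
    """
    if estimate_tokens(section) <= max_tokens:
        return [section]
    max_chars = max_tokens * 4
    step = max_chars - overlap_tokens * 4
    return [section[i:i + max_chars] for i in range(0, len(section), step)]
```

Each resulting chunk would then be embedded separately; the open question is whether to retrieve against one general extraction query or six targeted ones.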
I’d also really appreciate it if you could outline the strengths and weaknesses of each approach, especially in terms of inference cost (compute, time) and hallucination risk.