Hi everyone,
I’m working on extracting long lists from multi-section documents (think large inventories or extensive feature lists). The items in a single list are often spread across sections; for example, primary entries might be listed in one area, while others are scattered through additional endorsements or extensions.
I’m running into two main challenges:
- Completeness: Lists with more than roughly 150 to 200 items often come back incomplete despite prompt tuning.
- Consistency Across Runs: Using the same prompt repeatedly sometimes gives different results, which makes the extraction unreliable.
I’ve tried splitting the documents by section and using structured prompts, but I’d love any advice on methods for stable, comprehensive extraction across varied document sections. I’m also using Azure Document Intelligence and embedding models to improve accuracy. Has anyone tackled a similar use case successfully?
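
For reference, the section-splitting approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not a tested pipeline: `extract_items` is a hypothetical placeholder for whatever per-section model call is used, `fake_extractor` only stands in for it so the snippet runs, and the normalization rule (lowercase, collapsed whitespace) is an assumption about what counts as a duplicate.

```python
import re
from typing import Callable, Iterable


def normalize(item: str) -> str:
    """Collapse whitespace and case so near-identical items compare equal."""
    return re.sub(r"\s+", " ", item).strip().lower()


def merge_section_items(
    sections: Iterable[str],
    extract_items: Callable[[str], list[str]],
) -> list[str]:
    """Run the extractor once per section and merge results without duplicates.

    `extract_items` is a hypothetical stand-in for the per-section model call.
    """
    merged: dict[str, str] = {}
    for section in sections:
        for item in extract_items(section):
            key = normalize(item)
            if key:
                # Keep the first spelling seen; dicts preserve insertion order,
                # so the merged list comes out in first-seen order.
                merged.setdefault(key, item.strip())
    return list(merged.values())


def fake_extractor(section: str) -> list[str]:
    # Placeholder only: treats every non-empty line as one item.
    return [line for line in section.splitlines() if line.strip()]


sections = ["Widget A\nWidget B", "widget  b\nWidget C"]
items = merge_section_items(sections, fake_extractor)
print(items)
```

Merging on a normalized key keeps the final list independent of how often an item repeats across sections, though it does nothing about items the extractor misses within a section.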