Strategies for Consistent and Complete Long List Extraction with GPT-4 2024-10-01-preview Model

hamraev · November 10, 2024, 1:06am

Hi everyone,

I’m working on extracting long lists from multi-section documents (think large inventories or extensive feature lists). These items often appear in different sections; for example, primary entries might be listed in one area, while others are scattered throughout in additional endorsements or extensions.

I’m running into two main challenges:

1. Completeness: Lists with more than 150 to 200 items often come back incomplete, despite prompt tuning.
2. Consistency across runs: Running the same prompt repeatedly sometimes gives different results, which makes the extraction unreliable.

I’ve tried splitting the documents into sections and using structured prompts, but I’d love any advice on methods for stable, comprehensive extraction across varied document sections. I’m also using Azure Document Intelligence and embedding models to improve accuracy. Has anyone tackled similar use cases with success?
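
To make the section-splitting idea concrete, here is a minimal sketch of one way it could look: extract items from each section separately, then merge them with whitespace- and case-insensitive deduplication. The names `extract_all`, `normalize`, and `extract_items` are hypothetical; `extract_items` stands in for whatever single model call handles one section and is not part of any real SDK. The optional `runs` parameter takes the union of several runs per section, which is only an assumption about how run-to-run variation might be reduced.

```python
from typing import Callable, Iterable


def normalize(item: str) -> str:
    """Collapse whitespace and case so near-identical items compare equal."""
    return " ".join(item.split()).casefold()


def extract_all(
    sections: Iterable[str],
    extract_items: Callable[[str], list[str]],
    runs: int = 1,
) -> list[str]:
    """Extract items section by section and merge them without duplicates.

    extract_items is a hypothetical callable wrapping one model call per
    section. With runs > 1, the union of repeated runs is kept.
    """
    seen: dict[str, str] = {}  # normalized key -> first spelling seen
    for section in sections:
        for _ in range(runs):
            for item in extract_items(section):
                key = normalize(item)
                if key and key not in seen:
                    seen[key] = item.strip()
    # Dicts preserve insertion order, so items stay in document order.
    return list(seen.values())
```

Taking the union across runs trades one problem for another: it can recover items a single run dropped, but it also keeps anything a single run invented, so a check of each item against the source text would still be needed.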
