GPT-4o's Response Quality Drops for Larger Inputs in Chunked Processing

I’m working on a project where I need to match two sets of structured data using the GPT-4o model. Since the data is large, I process it in chunks and send the chunks sequentially to avoid token rate limits.
The agent takes two input files.
The issue I’m facing:

  • The first few chunks (usually only the first one or two) give accurate results, correctly classifying transactions into three categories (exact match, possible match, unmatched), although even here some rows are occasionally missed.
  • But as processing continues, the quality drops: some matches become inconsistent, and the responses become less structured.

How I’m Handling Chunking & API Calls:

  1. Splitting data into chunks of ~50 records each.
  2. Processing each chunk separately by sending it to GPT-4o via ChatCompletion.acreate() (a simplified sketch of this loop follows the list).
  3. Introducing a delay (2s) between API calls to avoid rate limits.
  4. Final pass: any records left unmatched after the first pass are processed again.
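For context, here is a simplified sketch of the processing loop. It assumes the pre-1.0 openai Python SDK (which is where ChatCompletion.acreate() comes from); chunk_records, the prompt text, and the chunk pairing are stand-ins for my actual helpers, not the exact code:

```python
import asyncio
import json
import openai  # pre-1.0 SDK; assumes OPENAI_API_KEY is set in the environment

CHUNK_SIZE = 50          # ~50 records per chunk
DELAY_BETWEEN_CALLS = 2  # seconds between API calls, to stay under rate limits

def chunk_records(records, size=CHUNK_SIZE):
    """Yield successive fixed-size slices of the record list."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

async def match_chunk(chunk_a, chunk_b, system_prompt):
    """Send one pair of chunks to GPT-4o and return the parsed JSON result."""
    user_prompt = (
        "File A records:\n" + json.dumps(chunk_a) +
        "\n\nFile B records:\n" + json.dumps(chunk_b) +
        "\n\nClassify each record as exact match, possible match, or unmatched. "
        "Return JSON only."
    )
    response = await openai.ChatCompletion.acreate(
        model="gpt-4o",
        temperature=0.2,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    # Will raise if the model wraps the JSON in extra text
    return json.loads(response["choices"][0]["message"]["content"])

async def run_all(records_a, records_b, system_prompt):
    """Process corresponding chunks from both files sequentially."""
    results = []
    for chunk_a, chunk_b in zip(chunk_records(records_a), chunk_records(records_b)):
        results.append(await match_chunk(chunk_a, chunk_b, system_prompt))
        await asyncio.sleep(DELAY_BETWEEN_CALLS)  # simple fixed delay between calls
    return results
```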

What I’ve Tried:

  • Adjusting the chunk size (tried 30 to 100 records).
  • Sending only the columns required for matching to reduce input size.
  • Lowering the temperature to 0.2 for consistency.
  • Including a system prompt to enforce structured JSON output.
  • Ensuring consistent formatting across chunks.
  • Adding a retry mechanism for API failures (the system prompt and retry logic are sketched below).
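This is roughly how the JSON-enforcing system prompt and the retry mechanism look (again a sketch; the exact prompt wording, output keys, and backoff values are simplified stand-ins for what I actually use):

```python
import asyncio
import openai
from openai.error import APIError, RateLimitError, Timeout  # pre-1.0 SDK error classes

SYSTEM_PROMPT = (
    "You are a reconciliation assistant. For every record, return a JSON array of objects "
    'with the keys "record_id", "matched_id", and "category", where "category" is one of '
    '"exact_match", "possible_match", or "unmatched". Return JSON only, with no extra text.'
)

async def call_with_retry(messages, max_retries=3, backoff=5):
    """Retry the API call on transient failures with a simple exponential backoff."""
    for attempt in range(max_retries):
        try:
            return await openai.ChatCompletion.acreate(
                model="gpt-4o",
                temperature=0.2,
                messages=messages,
            )
        except (RateLimitError, APIError, Timeout):
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            await asyncio.sleep(backoff * (2 ** attempt))  # 5s, 10s, 20s, ...
```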

Questions:

  1. Why might the model’s responses degrade as more chunks are processed?
  2. Is there a better approach to ensure consistency across multiple API calls?
  3. Is there an alternative approach where I can upload the full data instead of chunking it?
  4. Is there another method that is fast, efficient, and reliable, given that the records contain financial data?

Any insights or best practices for LLM-based matching of two Excel files with large structured datasets would be greatly appreciated!