Best Practices for Structuring Long Contexts with the OpenAI API

Hi everyone,

I’ve been experimenting with the OpenAI API for applications where users paste in long documents (10–20 pages), and I’m trying to figure out the most efficient way to handle that much context without hitting token limits.

So far, I’ve tried:

  • Splitting the document into smaller chunks and using embeddings + vector search to retrieve relevant passages.

  • Summarizing sections before feeding them into the model.

  • Combining both approaches for hybrid retrieval + summarization (rough sketch of that pipeline right after this list).
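
Here’s roughly what my current hybrid pipeline looks like, in case that helps frame the question. This is only a minimal sketch: it assumes the official openai Python SDK (v1+), the model names ("text-embedding-3-small", "gpt-4o-mini") are placeholders for whatever embedding and chat models you have access to, and the character-based chunking and context budget are crude stand-ins for real tokenization (e.g. tiktoken).

```python
# Hybrid sketch: chunk the document, embed the chunks, retrieve the most relevant
# ones for a question, and summarize the retrieved text if it exceeds a rough budget.
import math
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CHUNK_CHARS = 2000            # ~500 tokens per chunk (rough heuristic)
TOP_K = 6                     # how many chunks to pull into the prompt
CONTEXT_CHAR_BUDGET = 8000    # summarize retrieved text beyond this size


def chunk_text(text: str, size: int = CHUNK_CHARS) -> list[str]:
    """Split the document into fixed-size character chunks (no overlap, for brevity)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(texts: list[str]) -> list[list[float]]:
    """Embed a batch of strings with the embeddings endpoint."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(question: str, chunks: list[str], chunk_vecs: list[list[float]]) -> list[str]:
    """Rank chunks by cosine similarity to the question and keep the top K."""
    q_vec = embed([question])[0]
    ranked = sorted(zip(chunks, chunk_vecs), key=lambda cv: cosine(q_vec, cv[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:TOP_K]]


def compress(context: str, question: str) -> str:
    """Summarize retrieved context only if it blows past the budget."""
    if len(context) <= CONTEXT_CHAR_BUDGET:
        return context
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Summarize the following passages, keeping every detail "
                       f"relevant to this question: {question}\n\n{context}",
        }],
    )
    return resp.choices[0].message.content


def answer(question: str, document: str) -> str:
    chunks = chunk_text(document)
    chunk_vecs = embed(chunks)  # in practice, cache these in a vector store
    context = compress("\n\n---\n\n".join(retrieve(question, chunks, chunk_vecs)), question)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```

In practice I persist the chunk embeddings instead of re-embedding per question, and I add some overlap between chunks, but the shape of the pipeline is the same.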

The challenge is finding the right balance between retrieval accuracy and preserving enough context for the model to generate useful outputs.

Has anyone here developed a reliable workflow for this? Do you prefer embeddings, hierarchical summarization, or another method entirely?

Any insights, patterns, or even example code would be really helpful for those of us building document-heavy applications.

Thanks!


You can just send all hundreds of pages of text straight to GPT-4.1; its input window is around one million tokens. Solved.
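
If you want it concrete, it really is one call, something like the sketch below. The model name and file path are just illustrative, and keep in mind you still pay for every input token you send.

```python
# Sketch of the no-chunking approach: inline the whole document in a single request.
# Assumes the openai Python SDK and access to a long-context model such as gpt-4.1.
from openai import OpenAI

client = OpenAI()

with open("big_document.txt") as f:  # hypothetical file holding the pasted document
    document = f.read()

resp = client.chat.completions.create(
    model="gpt-4.1",  # long-context model; window on the order of a million tokens
    messages=[
        {"role": "system", "content": "Answer questions about the provided document."},
        {"role": "user", "content": document + "\n\nQuestion: What are the key points?"},
    ],
)
print(resp.choices[0].message.content)
```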