Hi everyone,
I’ve been experimenting with using the OpenAI API for applications where users paste in long documents (10–20 pages), and I’m trying to figure out the most efficient way to handle context without hitting token limits.
So far, I’ve tried:
- Splitting the document into smaller chunks and using embeddings + vector search to retrieve relevant passages (first sketch below).
- Summarizing sections before feeding them into the model (second sketch, after the next paragraph).
- Combining both approaches for hybrid retrieval + summarization.
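To make the first bullet concrete, here's a trimmed-down version of my chunk + embed + retrieve step. Treat it as a sketch, not a recommendation: `text-embedding-3-small`, the fixed-size character chunking, and the in-memory dot-product search are just the simplest things I could get running.

```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def chunk_text(text: str, size: int = 1500, overlap: int = 200) -> list[str]:
    """Naive fixed-size character chunks with overlap; a token-aware
    splitter (e.g. tiktoken) would be more precise."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def top_k_chunks(question: str, chunks: list[str],
                 chunk_vecs: np.ndarray, k: int = 5) -> list[str]:
    # OpenAI embeddings are unit-normalized, so a plain dot product
    # is equivalent to cosine similarity here.
    scores = chunk_vecs @ embed([question])[0]
    best = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in best]

# Usage: retrieve relevant passages, then put only those in the prompt.
# document = open("long_doc.txt").read()
# chunks = chunk_text(document)
# vecs = embed(chunks)
# context = "\n\n".join(top_k_chunks("What are the key findings?", chunks, vecs))
```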
The challenge is finding the right balance between retrieval accuracy and preserving enough context for the model to generate useful outputs.
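For the summarization side, this is the rough map-reduce pattern I've been testing: summarize each chunk, then summarize the summaries. The model name and prompts are placeholders, and it reuses `client` and `chunk_text` from the sketch above.

```python
def summarize(text: str,
              instruction: str = "Summarize the following in 3-4 sentences.") -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable model works
        messages=[{"role": "user", "content": f"{instruction}\n\n{text}"}],
    )
    return resp.choices[0].message.content

def hierarchical_summary(document: str) -> str:
    # Map step: summarize each chunk independently.
    partials = [summarize(c) for c in chunk_text(document)]
    # Reduce step: fold the partial summaries into one coherent summary.
    # For very long documents this step could itself be applied recursively.
    return summarize(
        "\n\n".join(partials),
        "Combine these partial summaries into one coherent summary.",
    )
```

The reduce step is where I lose the most detail, which is exactly the trade-off I mentioned above.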
Has anyone here developed a reliable workflow for this? Do you prefer embeddings, hierarchical summarization, or another method entirely?
Any insights, patterns, or even example code would be really helpful for those of us building document-heavy applications.
Thanks!