Using gpt-4 API to Semantically Chunk Documents

jr.2509 · April 12, 2024, 9:11am

There’s a very nascent idea that I have been toying with over the past few days. What if we could just get the model to return the boundaries of each semantic chunk, i.e. the first few and last few words that would make the chunk uniquely identifiable?

With that information you could then likely just apply a regular script to extract the actual text of the chunks. If that were possible, then a single API call, or at least a reduced number of calls, might be enough, which would save time and cost.
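
To make the extraction step concrete, here is a rough sketch of what such a script might look like. It assumes the model has already returned, for each chunk, a pair of phrases (the first few and last few words), that the pairs come in document order, and that each phrase appears verbatim in the text. The function name `extract_chunks` and the sample data are made up for illustration, and no API call is shown.

```python
from typing import List, Tuple


def extract_chunks(text: str, boundaries: List[Tuple[str, str]]) -> List[str]:
    """Slice `text` into chunks given (first_words, last_words) pairs.

    Assumption: pairs are in document order and each phrase occurs
    verbatim in `text`. A ValueError is raised if a phrase is missing.
    """
    chunks = []
    cursor = 0  # only search forward, so a repeated phrase matches after the previous chunk
    for first_words, last_words in boundaries:
        start = text.find(first_words, cursor)
        if start == -1:
            raise ValueError(f"start phrase not found: {first_words!r}")
        end = text.find(last_words, start)
        if end == -1:
            raise ValueError(f"end phrase not found: {last_words!r}")
        end += len(last_words)  # include the closing phrase in the chunk
        chunks.append(text[start:end])
        cursor = end
    return chunks


document = (
    "Cats are small carnivores. They sleep most of the day. "
    "Rust is a systems language. It has no garbage collector."
)
# Hypothetical model output: boundary phrases for two chunks.
boundaries = [
    ("Cats are small", "of the day."),
    ("Rust is a", "garbage collector."),
]
chunks = extract_chunks(document, boundaries)
```

In practice the model may paraphrase or slightly alter the boundary words, so a missing phrase probably needs a fallback (fuzzy matching, or re-asking for that one chunk) rather than a hard failure.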
