Using gpt-4 API to Semantically Chunk Documents

SomebodySysop · May 5, 2024, 5:47pm

I agree, but @jr.2509 and I discovered the hard way that the models can be pretty unreliable at accurately reproducing the first and last sentences of a chunk, not to mention the regex issues when there are errant spaces, linefeeds, tabs, etc. Using line numbers has proven to be the most accurate and consistent way to identify blocks of text, at least so far.
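Here is a minimal Python sketch of the line-number approach. It assumes the model is prompted to reply with a JSON list of inclusive, 1-based `{"start": ..., "end": ...}` line ranges. The function names `number_lines` and `extract_chunks` and that reply format are hypothetical, not from this thread or from the OpenAI API. The actual model call is left out.

```python
import json


def number_lines(text: str) -> str:
    """Prefix each line with its 1-based line number so the model can cite lines."""
    return "\n".join(f"{i}: {line}" for i, line in enumerate(text.splitlines(), start=1))


def extract_chunks(text: str, model_reply: str) -> list[str]:
    """Slice the ORIGINAL text using line ranges returned by the model.

    Hypothetical reply format (an assumption, set by your prompt):
    [{"start": 1, "end": 2}, ...] with inclusive, 1-based line numbers.
    """
    lines = text.splitlines()
    chunks = []
    for r in json.loads(model_reply):
        start, end = int(r["start"]), int(r["end"])
        # Reject ranges the model may have hallucinated or reversed.
        if not (1 <= start <= end <= len(lines)):
            raise ValueError(f"invalid range {start}-{end}")
        # Copy lines verbatim; no regex matching against model-reproduced sentences.
        chunks.append("\n".join(lines[start - 1:end]))
    return chunks


# Usage: the numbered text goes into the prompt, the reply comes back as ranges.
doc = "Section 1\nIntro text.\n\nSection 2\nMore  text\twith odd spacing."
prompt_body = number_lines(doc)
reply = '[{"start": 1, "end": 2}, {"start": 4, "end": 5}]'  # example model reply
chunks = extract_chunks(doc, reply)
```

The chunks are sliced from the original lines instead of being located by matching sentences the model reproduced. As a result, errant spaces and tabs pass through unchanged (for example, the double space and tab in the sample's last line), and no regex is involved.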
