Using gpt-4 API to Semantically Chunk Documents

SomebodySysop · June 6, 2024, 8:34am

By “methodology”, I am referring to one of the methods described in the videos I posted here: Using gpt-4 API to Semantically Chunk Documents - #112 by SomebodySysop

Level 1: Character Splitting - Simple static character chunks of data
Level 2: Recursive Character Text Splitting - Recursive chunking based on a list of separators
Level 3: Document Specific Splitting - Various chunking methods for different document types (PDF, Python, Markdown)
Level 4: Semantic Splitting - Embedding walk based chunking
Level 5: Agentic Splitting - Experimental method of splitting text with an agent-like system. Useful if you believe that token cost will trend toward $0.00
Bonus Level: Alternative Representation Chunking + Indexing - Derivative representations of your raw text that will aid in retrieval and indexing
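
To make the first two levels concrete, here is a minimal sketch in plain Python. It is not the code from the videos: the function names, the default separator list, and the chunk sizes are my own assumptions for illustration, and the libraries used there will differ in detail.

```python
def split_by_characters(text, chunk_size=100, overlap=0):
    """Level 1: fixed-size character chunks, optionally overlapping."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def split_recursively(text, chunk_size=100, separators=("\n\n", "\n", " ", "")):
    """Level 2: split on the coarsest separator first, then recurse with
    finer separators on any piece that is still too long.

    The separator list is an assumed default, not taken from the videos.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    # No separators left (or the empty separator): fall back to Level 1.
    if not separators or separators[0] == "":
        return split_by_characters(text, chunk_size)

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        # Try to merge this piece into the chunk being built.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        # The merge would overflow: flush what we have so far.
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(split_recursively(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "alpha beta\n\ngamma delta epsilon\n\nzeta"
    print(split_by_characters(sample, chunk_size=12))
    print(split_recursively(sample, chunk_size=12))
```

The recursive version keeps paragraphs and words intact where it can and only cuts mid-word as a last resort, which is the main practical difference from Level 1. Levels 3 to 5 need document parsers, embeddings, or model calls, so they are not sketched here.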
