How to perform dynamic chunking based on the content

20euai044 · November 30, 2023, 6:31am

I'm working with the GPT API and need to send the top 5 chunks based on similarity.

Sometimes I face an issue where some content is lost because of the chunk size. Is there any way to do dynamic chunking to overcome this, so that the chunking varies with the content and less content is lost?
If you have an answer, please pin it; it could help a lot of folks.
I'd also appreciate any research papers on this…

_j · November 30, 2023, 9:29am

You can embed the data in two different vector databases, each with a different chunk size and splitting technique.

Based on the other context you have, you can then query just the database whose chunk size is preferable.
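As a rough illustration of the two splitting techniques, here is a minimal sketch in plain Python. The function names, the character-based sizes, and the blank-line paragraph rule are all assumptions for the example, not something a particular vector database requires. Each list of chunks would then be embedded into its own index.

```python
import re


def fixed_size_chunks(text, size=200, overlap=50):
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def paragraph_chunks(text, max_chars=1000):
    """Split on blank lines, then pack whole paragraphs into chunks.

    A chunk grows until adding the next paragraph would exceed max_chars,
    so chunk length follows the content. A single paragraph longer than
    max_chars is kept whole rather than cut.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        candidate = f"{current}\n\n{p}" if current else p
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = p
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The fixed-size splitter gives small, uniform chunks; the paragraph splitter never cuts inside a paragraph, which is one simple way to reduce content lost at chunk boundaries.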

You can also run both embedding searches simultaneously, then do a quick check to see whether a small chunk significantly overlaps a big one and discard it if so, or even mix and match the results.
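One way to do that overlap check, sketched under the assumption that every chunk is stored with its character offsets in the source document. The `Hit` record, `covered_fraction`, `merge_hits`, and the 0.5 cutoff are hypothetical names and values chosen for this example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    doc_id: str
    start: int    # character offset in the source document, inclusive
    end: int      # character offset in the source document, exclusive
    score: float  # similarity score returned by the search


def covered_fraction(small, large):
    """Fraction of the small chunk's span that lies inside the large chunk."""
    if small.doc_id != large.doc_id or small.end <= small.start:
        return 0.0
    overlap = min(small.end, large.end) - max(small.start, large.start)
    return max(0, overlap) / (small.end - small.start)


def merge_hits(large_hits, small_hits, max_overlap=0.5):
    """Keep all large hits, plus small hits not mostly covered by a large one."""
    kept = list(large_hits)
    for s in small_hits:
        if all(covered_fraction(s, l) < max_overlap for l in large_hits):
            kept.append(s)
    return sorted(kept, key=lambda h: h.score, reverse=True)
```

Small chunks that sit inside an already retrieved large chunk add no new text, so dropping them frees context space for chunks from other parts of the data.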

You also do not “need” to send top-5 chunks. You should use a similarity threshold. If your database has airplane part prices, and I ask for dog grooming tips, the amount of your database needed is zero.
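A minimal sketch of threshold-based retrieval, assuming the chunk embeddings are already available as plain lists of floats. The `retrieve` function and the 0.8 threshold are illustrative assumptions; a workable cutoff depends on the embedding model and has to be tuned on your own data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, threshold=0.8, max_chunks=5):
    """Return up to max_chunks (score, text) pairs scoring at or above threshold.

    `chunks` is a list of (text, embedding) pairs. The result may be empty
    when nothing in the database is relevant to the query.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    relevant = [item for item in scored if item[0] >= threshold]
    relevant.sort(key=lambda item: item[0], reverse=True)
    return relevant[:max_chunks]
```

With this shape, the top-5 limit becomes an upper bound rather than a quota: an off-topic question returns an empty list instead of five weak matches.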
