What is the chunking strategy used by the Assistant?

Retrieval augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users. Once a file is uploaded and passed to the Assistant, OpenAI will automatically chunk your documents, index and store the embeddings, and implement vector search to retrieve relevant content to answer user queries.

Referencing https://platform.openai.com/docs/assistants/tools/knowledge-retrieval

There are several chunking strategies used in the wild (e.g. fixed-size, context-aware, etc.), all with different tradeoffs. Which chunking strategy (or strategies) does the Assistant's knowledge retrieval use?
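For reference, the flow I'm describing is roughly this (a minimal sketch with the Python SDK; the file name and model are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Upload a document for the Assistant to retrieve from
file = client.files.create(
    file=open("product_docs.pdf", "rb"),
    purpose="assistants",
)

# Attach it to an assistant with the retrieval tool enabled;
# chunking, embedding, and vector search all happen on OpenAI's side
assistant = client.beta.assistants.create(
    name="Product Support Assistant",
    instructions="Answer questions using the attached product documentation.",
    model="gpt-4-turbo-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[file.id],
)
```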


I would love to know more about the chunking strategy too

Same question here, and also how pricing works, since we have no control over how much of the chunked text ends up in model invocations.


This is a very important question that should be clarified in the documentation. We currently use a custom Qdrant-based solution with carefully separated QA pairs, indexed individually. This approach significantly improves answer quality; the Assistants API, however, expects you to merge your data and pass it in as large files, only for it to be re-split by an unknown chunking strategy. We wouldn't consider migrating until this is clarified in the documentation.
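For context, our current setup looks roughly like this (a simplified sketch; the collection name, embedding model, and example QA pair are placeholders):

```python
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

openai_client = OpenAI()
qdrant = QdrantClient(url="http://localhost:6333")

def embed(text: str) -> list[float]:
    # One 1536-dimensional embedding per QA pair
    return openai_client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

qdrant.recreate_collection(
    collection_name="qa_pairs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

qa_pairs = [
    {"question": "How do I reset my password?",
     "answer": "Open Settings > Security and choose Reset password."},
    # ... one point per QA pair, never merged into one large file
]

qdrant.upsert(
    collection_name="qa_pairs",
    points=[
        PointStruct(id=i, vector=embed(pair["question"]), payload=pair)
        for i, pair in enumerate(qa_pairs)
    ],
)
```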


Does anyone know? This is kind of a blocker that will force me to stop using Assistants, because I cannot compare and improve my results.

In general, if the assistant outputs a suboptimal answer, I would try to improve it by checking how the data is chunked, what the vector DB returns, and whether my data was processed correctly.

If I can’t control these variables, then I will be forced to build my own RAG pipeline.


The API does return references to the documents used, but I'm not convinced it really reports all the documents it used, and one certainly cannot know which chunks were used. That makes fine-tuning the documents and prompts difficult.
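For anyone trying this, the references come back as annotations on the assistant's messages, roughly like below (a sketch with the Python SDK; the thread ID is a placeholder). It tells you which files were cited, but not which chunks:

```python
from openai import OpenAI

client = OpenAI()

messages = client.beta.threads.messages.list(thread_id="thread_abc123")

for message in messages.data:
    for part in message.content:
        if part.type != "text":
            continue
        for annotation in part.text.annotations:
            # file_citation annotations point at the source file,
            # not at the specific chunk that was used
            if annotation.type == "file_citation":
                print(annotation.text, "->", annotation.file_citation.file_id)
```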

Also, there seems to be no way of telling when an uploaded document has been fully "digested" (fully chunked, embedded, and available for RAG).

I just stumbled over this in the documentation:

" Inspecting file search chunks"

You can get granular information about a past run step using the REST API, specifically using the `include` query parameter to get the file chunks that are being used to generate results.

https://platform.openai.com/docs/assistants/tools/file-search#customizing-file-search-settings
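With the Python SDK (which just passes that `include` query parameter through to the REST endpoint), something like this should surface the retrieved chunks and their scores; the thread and run IDs are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Ask for the file_search result contents to be included in the run steps
run_steps = client.beta.threads.runs.steps.list(
    thread_id="thread_abc123",
    run_id="run_abc123",
    include=["step_details.tool_calls[*].file_search.results[*].content"],
)

for step in run_steps.data:
    if step.step_details.type != "tool_calls":
        continue
    for tool_call in step.step_details.tool_calls:
        if tool_call.type == "file_search":
            for result in tool_call.file_search.results:
                # Each result is one retrieved chunk with a relevance score
                print(result.file_name, result.score)
                for content in result.content:
                    print(content.text[:200])
```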