What is the chunking strategy used by the Assistant?

Retrieval augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users. Once a file is uploaded and passed to the Assistant, OpenAI will automatically chunk your documents, index and store the embeddings, and implement vector search to retrieve relevant content to answer user queries.

Referencing https://platform.openai.com/docs/assistants/tools/knowledge-retrieval

There are several chunking strategies used in the wild (e.g. fixed-size, context-aware, etc.), all with different tradeoffs. Which chunking strategy (or strategies) does the Assistant's knowledge retrieval use?
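For reference, the flow I'm describing is roughly this (a minimal sketch with the Python SDK; the file name and model are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Upload a document for the Assistant to retrieve from
file = client.files.create(
    file=open("product_docs.pdf", "rb"),
    purpose="assistants",
)

# Attach it to an assistant with the retrieval tool enabled;
# chunking, embedding, and vector search all happen on OpenAI's side
assistant = client.beta.assistants.create(
    name="Product Support Assistant",
    instructions="Answer questions using the attached product documentation.",
    model="gpt-4-turbo-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[file.id],
)
```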


I would love to know more about the chunking strategy too

Same question here, and also how pricing works, since we have no control over how much of the chunked text ends up in model invocations.


This is a very important question that should be clarified in the documentation. We currently use a custom Qdrant-based solution with carefully separated QA pairs, indexed individually. This approach significantly improves answer quality; the Assistants API, however, expects you to merge your data and pass it in as large files, only for it to be re-split by an unknown chunking strategy. We wouldn't consider migrating until this is clarified in the documentation.
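For context, our current setup looks roughly like this (a simplified sketch; the collection name, embedding model, and example QA pair are placeholders):

```python
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

openai_client = OpenAI()
qdrant = QdrantClient(url="http://localhost:6333")

def embed(text: str) -> list[float]:
    # One 1536-dimensional embedding per QA pair
    return openai_client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

qdrant.recreate_collection(
    collection_name="qa_pairs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

qa_pairs = [
    {"question": "How do I reset my password?",
     "answer": "Open Settings > Security and choose Reset password."},
    # ... one point per QA pair, never merged into one large file
]

qdrant.upsert(
    collection_name="qa_pairs",
    points=[
        PointStruct(id=i, vector=embed(pair["question"]), payload=pair)
        for i, pair in enumerate(qa_pairs)
    ],
)
```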


Does anyone know? This is kind of a blocker that will force me to stop using Assistants, because I cannot compare and improve my results.

In general, if the assistant outputs a suboptimal answer, I would try to improve it by checking how the data is chunked, what the vector DB returns, and whether my data was processed correctly.

If I can’t control these variables, then I will be forced to build my own RAG pipeline.


The API does return references to the documents used, but I'm not convinced it really reports all the documents it used, and one certainly cannot know which chunks were used. That makes fine-tuning the documents and prompts difficult.
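For anyone trying this, the references come back as annotations on the assistant's messages, roughly like below (a sketch with the Python SDK; the thread ID is a placeholder). It tells you which files were cited, but not which chunks:

```python
from openai import OpenAI

client = OpenAI()

messages = client.beta.threads.messages.list(thread_id="thread_abc123")

for message in messages.data:
    for part in message.content:
        if part.type != "text":
            continue
        for annotation in part.text.annotations:
            # file_citation annotations point at the source file,
            # not at the specific chunk that was used
            if annotation.type == "file_citation":
                print(annotation.text, "->", annotation.file_citation.file_id)
```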

Also, there seems to be no way of telling when an uploaded document has been fully "digested" (fully chunked, embedded, and available for RAG).

I just stumbled over this in the documentation:

" Inspecting file search chunks"

You can get granular information about a past run step using the REST API, specifically using the `include` query parameter to get the file chunks that are being used to generate results.

https://platform.openai.com/docs/assistants/tools/file-search#customizing-file-search-settings
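With the Python SDK (which just passes that `include` query parameter through to the REST endpoint), something like this should surface the retrieved chunks and their scores; the thread and run IDs are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Ask for the file_search result contents to be included in the run steps
run_steps = client.beta.threads.runs.steps.list(
    thread_id="thread_abc123",
    run_id="run_abc123",
    include=["step_details.tool_calls[*].file_search.results[*].content"],
)

for step in run_steps.data:
    if step.step_details.type != "tool_calls":
        continue
    for tool_call in step.step_details.tool_calls:
        if tool_call.type == "file_search":
            for result in tool_call.file_search.results:
                # Each result is one retrieved chunk with a relevance score
                print(result.file_name, result.score)
                for content in result.content:
                    print(content.text[:200])
```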