What is the chunking strategy used by the Assistant?

Retrieval augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users. Once a file is uploaded and passed to the Assistant, OpenAI will automatically chunk your documents, index and store the embeddings, and implement vector search to retrieve relevant content to answer user queries.

Referencing https://platform.openai.com/docs/assistants/tools/knowledge-retrieval

There are several chunking strategies used in the wild (e.g. fixed-size, context-aware, etc.), all with different tradeoffs. Which chunking strategy (or strategies) does the Assistant's knowledge retrieval use?
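For anyone unfamiliar with the tradeoffs being discussed: a minimal sketch of the simplest of these, fixed-size chunking with overlap, might look like the following. This is purely illustrative and is not what OpenAI does internally; the function name and parameters are my own, and the character-based sizes stand in for whatever token-based sizes a real pipeline would use.

```python
def chunk_fixed(text: str, chunk_size: int = 800, overlap: int = 200) -> list[str]:
    """Split `text` into fixed-size chunks, each overlapping the previous
    one by `overlap` characters so context isn't cut off at boundaries.
    Hypothetical sketch; real systems usually count tokens, not characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last chunk reached the end of the text
    return chunks
```

A context-aware strategy would instead split on semantic boundaries (paragraphs, headings, QA pairs), which is exactly why knowing which approach the Assistant uses matters for result quality.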


I would love to know more about the chunking strategy too

Same question here, and also how pricing works, since we have no control over how much of the chunked text ends up in model invocations.


This is a very important question that should be clarified in the documentation. We currently use a custom Qdrant-based solution with carefully separated QA pairs, indexed individually. This approach significantly improves answer quality; however, the Assistants API expects you to merge your data and pass it as large files, only for it to be re-split based on an unknown chunking strategy. We wouldn't consider migrating until this is clarified in the documentation.
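To make the contrast concrete, the QA-pair-per-point approach described above can be sketched in plain Python (the helper name and `embed` callable are hypothetical, and the dict layout loosely mirrors a Qdrant point with id, vector, and payload; it is not the actual Qdrant client API):

```python
def build_points(qa_pairs, embed):
    """Index each QA pair as its own retrievable point, instead of merging
    everything into one large file that gets re-split by an opaque chunker.
    `embed` is any callable mapping text -> vector (hypothetical stand-in
    for a real embedding model)."""
    points = []
    for i, (question, answer) in enumerate(qa_pairs):
        points.append({
            "id": i,
            "vector": embed(question),  # retrieve by similarity to the question
            "payload": {"question": question, "answer": answer},
        })
    return points
```

Because each pair is a separate point, retrieval returns exactly one intact QA unit, which is the property that gets lost when the data is merged and re-chunked on unknown boundaries.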

Does anyone know this? It's kind of a blocker that will force me to stop using Assistants, because I cannot compare and improve my results.

In general, if the Assistant outputs a suboptimal answer, I would try to improve it by checking how the data is chunked, inspecting the vector DB output, and verifying that my data was processed correctly.

If I can’t control these variables, then I will be forced to build my own RAG pipeline.
