Retrieval augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users. Once a file is uploaded and passed to the Assistant, OpenAI will automatically chunk your documents, index and store the embeddings, and implement vector search to retrieve relevant content to answer user queries.
There are several chunking strategies in common use (e.g. fixed-size, context-aware, etc.), each with different tradeoffs. Which chunking strategy (or strategies) does the Assistant's knowledge retrieval use?
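For readers unfamiliar with the two strategies named above, here is a minimal sketch of each. It is purely illustrative and says nothing about what the Assistants API actually does, which is the open question. The function names `fixed_size_chunks` and `paragraph_chunks` are made up for this example, and sizes are measured in characters rather than tokens to keep it dependency-free.

```python
import re


def fixed_size_chunks(text: str, size: int, overlap: int = 0) -> list[str]:
    """Fixed-size: cut every `size` characters, repeating `overlap` characters between neighbours."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def paragraph_chunks(text: str, max_chars: int) -> list[str]:
    """Context-aware (one simple variant): pack whole paragraphs into chunks of up to `max_chars`."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for p in paragraphs:
        candidate = f"{current}\n\n{p}" if current else p
        if len(candidate) <= max_chars or not current:
            # Fits, or it is a single oversized paragraph that we keep whole.
            current = candidate
        else:
            chunks.append(current)
            current = p
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = "A one.\n\nB two.\n\nC three."
    print(fixed_size_chunks(doc, size=10, overlap=2))  # may cut mid-sentence
    print(paragraph_chunks(doc, max_chars=14))         # never cuts inside a paragraph
```

The tradeoff is visible even at this scale: fixed-size chunks are predictable in length but can split a sentence in half, while paragraph-based chunks preserve meaning at the cost of uneven sizes.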
I would love to know more about the chunking strategy too.
Same question here. I'd also like to know how pricing works, since we have no control over how much of the chunked text is used in model invocations.
This is a very important question that should be clarified in the documentation. We currently use a custom Qdrant-based solution with carefully separated QA pairs, each indexed individually. This approach significantly improves answer quality. The Assistants API, however, expects you to merge your data and pass it as large files, only for them to be re-split using an unknown chunking strategy. We wouldn’t think of migrating until this is clarified in the documentation.
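To make the concern concrete, here is a small sketch of what can happen when individually indexed QA pairs are merged into one file and then re-split. The sample data and the helpers `one_chunk_per_pair` and `merge_then_resplit` are hypothetical, and the fixed-size re-split is only a stand-in assumption, since the real strategy is undocumented.

```python
# Hypothetical sample data; any QA pairs would do.
qa_pairs = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the sign-in page."),
    ("Which plans include SSO?", "SSO is available on the Enterprise plan."),
]


def one_chunk_per_pair(pairs: list[tuple[str, str]]) -> list[str]:
    """One retrievable unit per QA pair, so a question and its answer always stay together."""
    return [f"Q: {q}\nA: {a}" for q, a in pairs]


def merge_then_resplit(pairs: list[tuple[str, str]], size: int) -> list[str]:
    """Merge all pairs into one document, then cut it every `size` characters.

    Assumption: a plain fixed-size split stands in for the unknown chunker.
    """
    merged = "\n\n".join(one_chunk_per_pair(pairs))
    return [merged[i:i + size] for i in range(0, len(merged), size)]


if __name__ == "__main__":
    intact = set(one_chunk_per_pair(qa_pairs))
    resplit = merge_then_resplit(qa_pairs, size=60)
    broken = [c for c in resplit if c not in intact]
    print(f"{len(broken)} of {len(resplit)} re-split chunks no longer match a whole QA pair")
```

Under this assumption, a chunk can end in the middle of an answer or mix the tail of one pair with the start of the next, which is exactly the loss of separation we are worried about.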