Is there a way for the API to fetch the text chunks that were retrieved by the file search tool for a given thread message? I’ve tried the Playground and the Python API, but they seem only able to display the file name(s), not the specific chunks.
I’d like to see the actual chunks because it would let users diagnose the RAG’s output, and it would also let us compare performance (e.g. answer faithfulness) across different RAG systems.
Vector stores cannot be used outside of Assistants.
You can set the chunk size and overlap yourself when creating the vector store, and you can set the maximum number of chunk results to return (the default is 20).
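As a minimal sketch, assuming the Assistants v2 parameter shapes from the public API reference, these are the payloads that control chunking and result count (the helper names here are my own; actually sending them requires the OpenAI SDK and an API key):

```python
# Sketch of the request payloads, not a definitive recipe. Field names follow
# the published Assistants v2 reference: a "static" chunking_strategy on the
# vector store, and max_num_results on the file_search tool spec.

def chunking_strategy(max_chunk_size_tokens=800, chunk_overlap_tokens=400):
    """Static chunking config passed when creating a vector store (defaults shown)."""
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }

def file_search_tool(max_num_results=20):
    """file_search tool spec; max_num_results caps how many chunks are returned."""
    return {"type": "file_search", "file_search": {"max_num_results": max_num_results}}

print(chunking_strategy(512, 128))
print(file_search_tool(10))
```

You would pass the first dict as `chunking_strategy=` when creating the vector store (or attaching files), and the second in the assistant's `tools=` list.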
Then it’s not really RAG; it’s a search function. The search is a 256-dimension text-embedding-3-large embeddings search, using a query the AI wrote. There is no similarity threshold, just a maximum chunk count.
A typical search return is larger than what the AI can repeat back. You can ask a typical question, or ask it to send a specific search query to myfiles_browser, and then ask for a short report from which you can infer where in the documents the results came from.
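A hypothetical sketch of that diagnostic prompt, just building the user-message payload you would add to the thread (the wording and helper name are my own illustration, not an official technique):

```python
# Hypothetical helper: build a thread message that steers the assistant into
# running a specific file search and reporting back what it was shown.

def diagnostic_message(query):
    """User message asking for a specific search plus a short source report."""
    return {
        "role": "user",
        "content": (
            f"Search your files for: {query!r}. "
            "Then write a short report quoting each passage the search returned, "
            "so I can tell where in the documents the results came from."
        ),
    }

msg = diagnostic_message("text to speech endpoint")
print(msg["content"])
```

From the quoted passages in the reply, you can usually infer which chunks were retrieved, even though the API does not hand them to you directly.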
This may be more obfuscated if you didn’t provide plain text; in this case the file was the 1 MB OpenAI API YAML specification, and I asked how to generate speech. The actual assistant may be less helpful in dumping out the knowledge without some prompt engineering.