I read this in the OpenAI Assistants API tools documentation:
The model then decides when to retrieve content based on the user Messages. The Assistants API automatically chooses between two retrieval techniques:
1. it either passes the file content in the prompt for short documents, or
2. performs a vector search for longer documents
In the case of longer documents, how is this going to work? Will it follow the standard RAG retrieval methodology:
Break long docs into chunks → convert each chunk to a vector using an embedding model → store these vectors in a vector store → retrieve the chunks whose vectors are most similar to the user query's embedding vector and pass them into the model's prompt.
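For context, the pipeline described above can be sketched in a few lines of Python. This is purely illustrative: the `embed` function here is a toy character-bigram counter standing in for a real embedding model (the docs don't say which one the Assistants API uses, which is exactly the question), and cosine similarity is one common choice of metric, not a confirmed detail.

```python
import math

def embed(text):
    # Toy stand-in for a real embedding model: a character-bigram
    # count vector. Real systems use a learned embedding model.
    vec = {}
    for a, b in zip(text.lower(), text.lower()[1:]):
        vec[a + b] = vec.get(a + b, 0) + 1
    return vec

def cosine(u, v):
    # Cosine similarity between two sparse count vectors (dicts).
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def chunk(doc, size=200):
    # Naive fixed-size chunking; real systems often chunk with overlap.
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def retrieve(query, doc, top_k=1):
    # 1. chunk, 2. embed each chunk ("vector store"),
    # 3. embed the query, 4. return the most similar chunk(s).
    store = [(c, embed(c)) for c in chunk(doc)]
    q = embed(query)
    ranked = sorted(store, key=lambda cv: cosine(q, cv[1]), reverse=True)
    return [c for c, _ in ranked[:top_k]]
```

The retrieved chunk(s) would then be placed into the model's prompt alongside the user question.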
If the above is the methodology, which embedding model and similarity metric are being used?
If not, what methodology is used for longer documents?