What model does OpenAI use for the embeddings? And when do the documents get chunked: after they are referenced and added to the vector store? Thanks for the info, folks…
Hello,
By default, if you're using
embeddings = OpenAIEmbeddings()
it will use text-embedding-ada-002. Otherwise, you can choose your model by passing it explicitly:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
Hope this helped
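If it helps, here's a minimal runnable sketch, assuming the langchain-openai package (in older LangChain versions the import path differs):

from langchain_openai import OpenAIEmbeddings

# Defaults to text-embedding-ada-002 if no model is passed
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Embed a single query; returns the embedding as a list of floats
vector = embeddings.embed_query("hello world")
print(len(vector))  # 1536 dimensions for text-embedding-3-small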
I mean when it's being chunked and processed from your file; I don't mean the Embeddings API.
Unfortunately, we know almost nothing when it comes to File Search… At the moment its inner workings are kind of a black box: the embedding model, the search query formulation, the search results, and the search parameters. File Search itself performs really quite well, even with (in our case) over 3,000 different files, and manages to provide a relevant answer maybe 90% of the time. However, we need more information on how it actually works, and we need to be able to tweak the parameters. So I unfortunately cannot answer your question.
Can this be configured/modified through the Assistants/Files UI on the platform?
You can configure the chunking of files when uploading them to vector stores for embedding/vectorization, and you can customize the maximum number of results File Search may return, in the UI as well as via the API. However, that is all we can do to "customize" File Search at the moment. You can try to steer it in a general direction through prompting, telling the assistant how to formulate the msearch query (what it uses to query the semantic and keyword search tool) and what to do with the results it receives, but that is neither reliable nor ideal, since there is no good way to inspect the search procedure. A rough sketch of the two knobs you do have is below.
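Here is a minimal sketch of both settings via the API, assuming the official openai Python SDK (on older SDK versions the vector store endpoints live under client.beta.vector_stores instead). The IDs are placeholders, and the numbers shown (800/400 tokens, 20 results) are just the documented defaults, not recommendations:

from openai import OpenAI

client = OpenAI()

# 1) Custom chunking when adding a file to a vector store.
#    "static" chunking lets you set chunk size and overlap in tokens;
#    800/400 are the service defaults.
client.vector_stores.files.create(
    vector_store_id="vs_...",   # your vector store ID
    file_id="file-...",         # an already-uploaded file ID
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 800,
            "chunk_overlap_tokens": 400,
        },
    },
)

# 2) Cap how many chunks File Search may return to the assistant.
assistant = client.beta.assistants.create(
    model="gpt-4o",
    tools=[{"type": "file_search", "file_search": {"max_num_results": 20}}],
    tool_resources={"file_search": {"vector_store_ids": ["vs_..."]}},
)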