While using Assistants with storage (a vector store), what is the best file format for the model to analyze correctly and faster than others? I’m using the .json format, which has 40,661 lines, but retrieving data from it is too slow.
Note that the vector store currently holds only one file for testing; production may have 10+ files with more data.
I know that’s not the answer you expect, but from my personal experience, the best way to store data for a RAG engine is a combination of a relational database, plus a vector database, plus a robust REST API that lets the AI run queries over the data.
Storing everything inside a file works at the proof-of-concept stage, but when you go to production you end up exactly where you are now. So maybe you should consider this option from the very beginning.
As for the file format, Markdown worked well for me. But it depends on the data you’re storing.
Are you talking about the time between when a document is uploaded and when the vector store becomes available, or about the time for language generation?
The data format doesn’t slow down an embeddings-based vector store. The data is extracted from any file into text readable by the AI and becomes available to it quite quickly; the only additional embeddings call at runtime is on the search query emitted by the AI.
The problem with structured data such as JSON is that when that knowledge is chunked, it may lose the context of where it sits in the hierarchy. The file might not be split at boundaries that match the contained data objects, or at the level needed to understand individual items. Also, semantic search has little value on data where everything is essentially similar and many elements land in one chunk: a query like “answer about everyone named Smith” has a low chance of working.
Plain text is the most inspectable for you: it needs little additional processing before reaching the AI, and it has no extra formatting or container markup to confuse the embeddings or inflate the similarity between chunks.
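To make the chunking point concrete, here is a minimal sketch of preprocessing a JSON array into plain text before upload, so that each record becomes a self-contained block and chunk boundaries fall between records instead of mid-object. The field names and data are hypothetical, just for illustration:

```python
import json

# Hypothetical input: a JSON array of records, as you might load from your .json file.
records = json.loads("""[
    {"name": "John Smith", "role": "engineer", "city": "Boston"},
    {"name": "Jane Smith", "role": "designer", "city": "Austin"}
]""")

def record_to_text(rec: dict) -> str:
    """Render one record as a self-contained plain-text block.

    Each block repeats every field with its label, so even if the chunker
    splits between records, no block depends on surrounding hierarchy.
    """
    return "\n".join(f"{key}: {value}" for key, value in rec.items())

# A blank line between records gives the chunker a natural split point.
plain_text = "\n\n".join(record_to_text(r) for r in records)
print(plain_text)
```

The idea is simply: one entity per block, full context repeated in each block, and obvious split points, which is what JSON as-is doesn’t give you.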
thanks for your response @sergeliatko
could you explain this more?
Do you mean that my vector store or assistant files should be .sql or .db files,
or do you mean that I upload many files and the assistant treats them as a database?
combination of a relational database plus a vector database plus a robust rest API
A basic approach would be (layers accessed from outside to inside):
- Load balancer/reverse proxy as entry point
- REST API that processes the request, where you define your endpoints/operations, preprocess the request, decide which backend features to use, and transform the results returned to the client
- Relational database where your structured data is stored, ideally accessible as a db service via internal API
- Vector database where your embeddings are stored (traceable to the entities in the relational DB and back), also set up as a service accessible by #2 and #3
- AI service that is used by #2 when needed
Example flow:
The user asks the assistant to check the sales stats for a product but messes up the product name.
The assistant hits the endpoint /stats?product=a. #1 sends the request to #2, which sends the SQL to #3 and gets a “no such product” response. Then #2 either directly searches #4 for products similar to “a”, or checks with #5 what to do next (#5 reports the error to the product-department AI and returns a function-call response telling #2 to search products in #4). If only one product matches, #2 gets its sales stats and returns them to the assistant; otherwise it sends the results to #5 to be instructed to send a “precision request” back to the assistant with the list of possible product names…
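The control flow in that example can be sketched in a few lines. This is not a real implementation: the relational DB (#3), vector DB (#4), and product data are stubbed as in-memory fakes, and the similarity search is a naive substring match standing in for an embeddings query. All names are illustrative:

```python
# Hypothetical product table standing in for the relational DB (#3).
PRODUCTS = {"alpha widget": 120, "beta widget": 95}  # product -> units sold

def sql_lookup(name: str):
    """Stub for #3: exact-match query against the relational database."""
    return PRODUCTS.get(name)

def vector_search(query: str):
    """Stub for #4: similarity search (here, a naive substring match)."""
    return [p for p in PRODUCTS if query in p]

def stats_endpoint(product: str) -> dict:
    """Sketch of the /stats?product=... handler inside layer #2."""
    sales = sql_lookup(product)
    if sales is not None:                  # exact hit in the relational DB
        return {"product": product, "sales": sales}
    matches = vector_search(product)       # fall back to similarity search
    if len(matches) == 1:                  # unambiguous: answer directly
        return {"product": matches[0], "sales": sql_lookup(matches[0])}
    # Ambiguous or empty: return a "precision request" so the assistant
    # can ask the user which product they meant.
    return {"error": "ambiguous product", "candidates": matches}

print(stats_endpoint("a"))  # a garbled name falls through to the candidate list
```

The key design point is that the fallback logic lives in #2, not in the assistant: the assistant only sees one endpoint and either gets stats or a list of candidates to clarify with the user.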
Sounds complex, but in reality it’s not as complex as you might imagine. Just pay attention to the API definitions and edge cases.
The benefits: pretty much no limit to what is doable with this approach.
P.S. The “assistant” is the front-door AI that interacts with the user, and it only has the API definitions of #2. That’s all it needs to know about. It’s more like an “adapter” between a human and your app, available via the API. The true magic is in the code of #2 and the features of #5.
See an example of a custom GPT connected to an app using this approach (screenshot from my phone).
And here are the tricky parts of that:
The unconfirmed bookings actually live in a third-party database of form submissions and are called “reservations” through the API. So the assistant went to the third-party database, checked the unconfirmed bookings, then went into my app’s database to see whether the guest had paid for tickets, found two orders, then went back to the third-party database and updated the first and second reservations so they no longer count as “pending”. The sales stats come from the orders table in the app. Notice that I ask for ticket sales, not for orders, and I never specify what status should be set in the third-party database; the assistant knows all that from the API definitions.
Have you tried a plain-text doc? In my experience that brought better results than JSON. Not sure if that’d be possible in your scenario, though.