Hi, thanks for your reply. Well, I don't think it's that clear-cut how it should work: not everyone is an experienced programmer, and people naturally look for the easiest solution. I'm not a programmer myself, but I enjoy following new tech, and I work at a company of about 20 people where I wanted to try building a RAG system on my own.
I managed to do it using Baserow as the data source, Qdrant as the vector database, and the whole flow built in N8N, everything self-hosted on my own server. It all works, but since it's not my core business, I do get a bit tired of managing query prompts, scoring, monitoring, and so on. So I started looking for an easier solution, even a paid one.
ChatGPT Enterprise does offer this, but with significant file limitations, and besides, it's quite expensive to roll out for 20 people at a company our size. The money might be acceptable, but the limit on the number of files is restrictive, and I don't want to merge files, for many reasons. So I figured there was nothing to be done: there simply wasn't a better solution than the one I already had.
And then I came across the vector store, which does almost everything. Yes, it's a bit of a black box, but I don't mind that; if something works, it's fine for me, since, as I said, I'm not a programmer. And because I already have all my data nicely organized in Baserow, I quickly spun up a test version using the store, and ran into the issue that it found the right record but didn't return the important field, simply because that field had been cut off into a bad chunk.
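Just to make the setup concrete: something along these lines is how a Baserow table could be exported to one small file per record, with the fields that must not get lost written at the top. All names, IDs, and the token below are placeholders, not my actual flow, and pagination is ignored for brevity:

```python
import requests
from pathlib import Path

BASEROW_URL = "https://baserow.example.com"  # placeholder self-hosted instance
TABLE_ID = 123                               # placeholder table ID
TOKEN = "baserow-api-token"                  # placeholder database token

def export_records(out_dir: str = "records") -> None:
    """Write one small text file per Baserow row, with the key fields first."""
    Path(out_dir).mkdir(exist_ok=True)
    resp = requests.get(
        f"{BASEROW_URL}/api/database/rows/table/{TABLE_ID}/?user_field_names=true",
        headers={"Authorization": f"Token {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    for row in resp.json()["results"]:
        # Keep the fields that must never be lost at the very top of the file,
        # so a bad chunk boundary still leaves them attached to the record.
        lines = [
            f"Record ID: {row['id']}",
            f"Name: {row.get('Name', '')}",               # hypothetical field
            f"Important field: {row.get('Status', '')}",  # hypothetical field
            "",
            str(row.get("Description", "")),              # hypothetical long-text field
        ]
        Path(out_dir, f"record_{row['id']}.txt").write_text("\n".join(lines))

if __name__ == "__main__":
    export_records()
```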
So yes, I can upload multiple source files per record into the vector store, and yes, that probably solves it. But it's still a pity, because very little would actually be needed. If every chunk simply carried the file_id of its source file, then when composing a response, all chunks with that file_id could be retrieved as well. A per-file context limit could easily be enforced if that were a concern.
I can of course put this in attributes, but that doesn't really help me here. Or rather, it does, but then I'd need to stitch the answers together myself (roughly like the sketch below). It would just be one additional feature that could make the store attractive to more people. I understand that professionals don't need it, and people who don't understand the tech won't care either. But I think there are quite a few people in between: people like me who enjoy this, but don't want to spend time managing flows, scoring, and so on.
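To show what I mean by stitching: with my existing Qdrant setup, "fetch all sibling chunks with the same file_id" looks roughly like this. It assumes each point carries file_id, chunk_index, and text in its payload; the collection name, URL, and embedding model are just what I happen to use, not anything official:

```python
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

openai_client = OpenAI()                             # uses OPENAI_API_KEY from the environment
qdrant = QdrantClient(url="http://localhost:6333")   # placeholder self-hosted instance
COLLECTION = "records"                               # placeholder collection name

def retrieve_full_record(question: str) -> str:
    """Vector-search for the best chunk, then pull every chunk sharing its file_id."""
    # Embed the question (model name is just the one I use).
    vector = openai_client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # Find the single best-matching chunk.
    hit = qdrant.search(collection_name=COLLECTION, query_vector=vector, limit=1)[0]
    file_id = hit.payload["file_id"]

    # Fetch all sibling chunks from the same source file, so no field gets lost.
    siblings, _ = qdrant.scroll(
        collection_name=COLLECTION,
        scroll_filter=Filter(
            must=[FieldCondition(key="file_id", match=MatchValue(value=file_id))]
        ),
        limit=100,
        with_payload=True,
    )
    # Reassemble the record in chunk order before handing it to the model.
    siblings.sort(key=lambda p: p.payload.get("chunk_index", 0))
    return "\n".join(p.payload["text"] for p in siblings)
```

Which is exactly the kind of plumbing I was hoping the store would take off my hands.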