Preach, Brother! Ain’t it the truth. I mean, if I just wanted a simple question-in, answer-out chatbot, I would have been done months ago. It’s the what-ifs that get you bogged down. What if the user asks the question this way instead of that way? What if their question is more of a keyword search than a semantic search? What options can you give, besides “Sorry!”, if the search is not successful? I mean, it goes on and on and on…
I only skimmed through this thread, but from what I can see, the initial poster has skipped a lot of steps in the process. I mean, the learning process. I’ve seen a lot of people throwing out a lot of good suggestions and ideas, but it’s like making suggestions in Arabic to someone who only knows French.
So, I would suggest that @abhi3hack first understand what an embedding is, and the difference between embeddings and fine-tuning: https://www.youtube.com/watch?v=9qq6HTr7Ocw&t=110s&ab_channel=DavidShapiro~AI
If you understand and decide embeddings are the way to go, take a look at this flowchart and make the time to understand what each step of the process is, and how it is accomplished.
And, if you don’t understand the flowchart, watch this tutorial, which steps through the same process: https://www.youtube.com/watch?v=Ix9WIZpArm0
Then, when you truly understand this entire “chatbot” / “chat with your data” process, come back and ask: “So, how can I do this without uploading my private data to a remote vector store?” The answers you get back will make a lot more sense.
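To give a rough idea of where that question leads, here is a minimal sketch of a "vector store" that lives entirely in local memory. Everything in it is an assumption made for illustration: `toy_embed` is a bag-of-words stand-in for a real embedding model, and `LocalVectorStore` and the `min_score` cutoff are hypothetical names, not any library's API. It only shows the shape of the idea: embed, store, compare by cosine similarity, and return nothing when no match is good enough.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LocalVectorStore:
    """Hypothetical in-memory store: nothing leaves the machine."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text):
        self.items.append((text, toy_embed(text)))

    def search(self, query, k=1, min_score=0.1):
        """Return up to k (score, text) pairs scoring at least min_score."""
        q = toy_embed(query)
        scored = sorted(((cosine(q, v), t) for t, v in self.items), reverse=True)
        return [(s, t) for s, t in scored[:k] if s >= min_score]


store = LocalVectorStore()
store.add("Refunds are processed within five business days.")
store.add("Our office is closed on public holidays.")

hits = store.search("how long do refunds take")
misses = store.search("weather forecast tomorrow")
```

An empty result from `search` is the point where you decide what to offer besides “Sorry!”, for example a keyword search or a rephrasing prompt.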
I’m no expert at this. I’ve got tons more to learn. I’m just sharing what I did to get to the point where I could competently code chat applications from A to Z.