I’m working on a QnA system for the business I work at. We have a big knowledge store of long documents that change every now and then. I’ve built a script that splits each document into smaller chunks (up to around 1000 tokens); then, using the embeddings API, I create the embeddings and store them. When a user asks a question, I embed it as well, use cosine similarity to find the top-k most relevant text fragments, and pass those to the completion API. So far so good.
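For reference, the retrieval step of my pipeline looks roughly like this (a simplified sketch with NumPy; the embedding call and storage layer are left out, and the function names are just placeholders):

```python
import numpy as np

def top_k_fragments(query_vec, fragment_vecs, fragments, k=3):
    """Rank stored text fragments by cosine similarity to the embedded question."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(fragment_vecs, dtype=float)
    # Normalize both sides so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = m @ q
    # Indices of the k most similar fragments, best first.
    idx = np.argsort(sims)[::-1][:k]
    return [fragments[i] for i in idx]
```

The returned fragments then go into the completion prompt as context.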
However, I was wondering how I would build a system where the user can either ask follow-up questions (where the question alone won’t carry enough context, so the embedding search wouldn’t work on it by itself) or ask questions on completely different topics.
I’ve thought of maybe summarizing the previous context and adding it to the prompt, but I fear that could ‘harm’ the search if the question is about a different topic.
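One variant of this idea I’ve been considering: instead of always prepending a summary, first ask the model to rewrite the follow-up into a standalone question, instructing it to leave topic-switching questions unchanged, and then embed the rewritten question. A minimal sketch of the prompt-building part (the actual completion API call is omitted; the wording is just an assumption of how such an instruction could look):

```python
def build_rewrite_prompt(history, follow_up):
    """Build a prompt asking the model to condense a follow-up into a
    standalone question, given the prior conversation turns.

    history: list of (role, text) tuples, e.g. [("user", "..."), ("assistant", "...")]
    """
    turns = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        "Given the conversation below, rewrite the final question so it is "
        "fully self-contained. If it is already self-contained or concerns "
        "a new topic, return it unchanged.\n\n"
        f"Conversation:\n{turns}\n\n"
        f"Final question: {follow_up}\n"
        "Standalone question:"
    )
```

The string this returns would be sent to the completion API; the model’s answer is then embedded and used for the cosine-similarity search as before, which (in theory) handles both follow-ups and topic switches with a single mechanism.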
My question is: what is the best way to approach this problem? Multiple API calls are not an issue, and fine-tuning wouldn’t be a problem either.