LangChain. They’ve already done a LOT of work around this. There is one function I tested, `ConversationalRetrievalChain` I think, that takes your history, boils it down into one standalone question, then sends that to another chain (a QA chain, maybe) which searches the documents, asks a few leading questions, and supplies that to another LLM; the answer is then fed back into the message list. And you can choose whether or not it's kept in memory. Pretty slick
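The condense-then-answer pattern described above can be sketched in plain Python. The `llm` and `search_docs` functions below are stand-in stubs, not LangChain's actual API; this just shows the flow under those assumptions:

```python
# Sketch of the condense-then-answer retrieval pattern.
# `llm` and `search_docs` are illustrative stubs, not real API calls.

def llm(prompt):
    # Stand-in for a real chat-completion call.
    return f"[answer based on: {prompt[:40]}...]"

def search_docs(query, k=3):
    # Stand-in for a vector-store similarity search.
    corpus = ["Doc about courses", "Doc about pricing", "Doc about refunds"]
    return corpus[:k]

def conversational_retrieval(history, question):
    # 1. Condense chat history + new question into one standalone question.
    condensed = llm(
        "Rephrase as a standalone question.\n"
        f"History: {history}\nQuestion: {question}"
    )
    # 2. Retrieve documents relevant to the condensed question.
    docs = search_docs(condensed)
    # 3. Answer using the retrieved context, then append to the message list.
    answer = llm(f"Context: {docs}\nQuestion: {condensed}")
    history.append((question, answer))
    return answer
```

The condensing step is what lets follow-up questions like “how much does it cost?” still retrieve the right documents, since the history is folded into the query before the vector search runs.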
here are my two cents:
the word “course” alone should be enough to surface similar results, so it may have to do with the similarity threshold/score value you have in place when making the embedding query.
First, I’d say get more matching results, then adjust your prompt as suggested in the thread. Good luck!
@Klassdev If you are using similarity search, first retrieve the top 5 scored embeddings, then pass them to OpenAI and check the result.
I am also seeing issues with processing large files with a lot of text. If I ask specific questions or ask it to generate a summary, the results are not accurate. My guess is that the context getting passed is missing data, due to the embeddings not being accurate. I am thinking that preprocessing and arranging the data before creating embeddings might help, or adding metadata might help too.
Does anyone have any suggestions on how to improve accuracy when dealing with documents 200+ pages long?
Are you overlapping your chunks? If not, you should consider doing that. It involves taking some of the previous chunk and some of the next chunk and putting them at the start and end of the chunk you are currently embedding: let’s say 25% of the previous chunk at the start and 25% of the next chunk at the end, along with 50% of the current chunk. What you are effectively doing is creating a rolling window of chunk context that slides through the data. This means that semantic meaning that would normally get broken at the boundaries of a chunk now continues smoothly in at least one chunk.
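The 25% / 50% / 25% rolling window above can be sketched as a simple character-based chunker (sizes and fractions here are illustrative; real splitters usually work on tokens):

```python
def rolling_chunks(text, size=400, overlap_frac=0.25):
    # Each chunk keeps `overlap_frac` of the previous chunk at its start
    # and `overlap_frac` of the next chunk at its end, so meaning that
    # straddles a boundary survives intact in at least one chunk.
    core = int(size * (1 - 2 * overlap_frac))  # the 50% "current" portion
    pad = int(size * overlap_frac)             # the 25% borrowed each side
    chunks = []
    start = 0
    while start < len(text):
        lo = max(0, start - pad)
        hi = min(len(text), start + core + pad)
        chunks.append(text[lo:hi])
        start += core
    return chunks
```

With `size=400`, consecutive chunks share 200 characters, so the window advances by half a chunk each step — the “sliding” behaviour described above.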
Yes, we are overlapping. Is 25% overlap necessary? Here are my observations from my latest implementation.
Do you have any suggestions on how I could deal with the below problem?
- If I ask a specific question, the answer is accurate.
- When I want to summarize, I need certain things to be part of the summary, so I ask multiple questions at one time. And this is where I see issues: the response misses info.
For example, if I ask “what is the person’s name?”, “what is their age?”, “what is their height?”, etc. individually, it works. But if I ask them together, it will sometimes miss the height, etc.
For my summarizer I have at least 100 questions and I am using a 16K model.
Vector retrieval can become unreliable when attempting to pull back multiple aspects of the data in one query; keep it to single datapoints for the best results.
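One way to keep queries single-datapoint while still building a combined summary is to run one retrieval and one completion per question, then stitch the answers together. A sketch with stubbed-out `retrieve` and `llm` calls (all names here are illustrative):

```python
def answer_one(question, retrieve, llm):
    # One retrieval + one completion per single-datapoint question,
    # so each embedding query carries exactly one semantic intent.
    context = retrieve(question)
    return llm(f"Context: {context}\nQuestion: {question}")

def summarize(questions, retrieve, llm):
    # Ask each question on its own, then assemble the answers.
    answers = {q: answer_one(q, retrieve, llm) for q in questions}
    return "\n".join(f"{q} {a}" for q, a in answers.items())
```

The cost is one API round trip per question, which is exactly the latency trade-off discussed later in the thread.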
Yup, that’s what I am implementing already. Wanted to see if there was a more optimized way of asking multiple questions together
Maybe in the future, if embedding models get more advanced. The issue right now is that you are running a semantic query past a fairly simple model, and it will lose the meaning when presented with multiple sub-requests in a single embedding. Embeddings capture the sentiment of a section of information, not the individual sentiments of each part within a larger group — at least not in any way we currently understand; maybe there are some researchers working on this somewhere.
This is a great explanation and makes a ton of sense.
I was able to get higher accuracy, but now I am running into an issue with API response time. It varies between 30s and 120s. I’ve seen a lot of people saying OpenAI is slowing down their API on purpose.
If I cluster all the questions into a single call, it’s not bad. But if I ask them one at a time, the response time is a deal breaker.
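If the questions do have to stay one per call, issuing them concurrently can bring total wall-clock time close to the slowest single call rather than the sum of all of them. A sketch using a thread pool, with `ask` as a stand-in for the real OpenAI request:

```python
from concurrent.futures import ThreadPoolExecutor

def ask(question):
    # Stand-in for a single per-question API call.
    return f"answer to {question}"

def ask_all(questions, max_workers=8):
    # Fire the per-question calls concurrently; pool.map preserves
    # the original question order in the returned answers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ask, questions))
```

Threads work here because the calls are I/O-bound; just keep `max_workers` below your API rate limit so the parallel burst doesn’t trigger 429 errors.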