Can a good prompt prevent 'hallucination'?

I need help fixing hallucinations in a tool I am building that summarizes a news topic from a series of news articles.

My process is: load multiple news articles → chunk the data with a recursive text splitter (10,000 characters, 1,000 overlap) → remove irrelevant chunks by keywords (to reduce noise) → create embeddings with OpenAI → store the vectors in FAISS → retrieve the top 10 chunks with a similarity search on the query → send each to the LLM (temp=0) with the prompt “please summarize into bullet points anything about the Toronto teachers strike” → concatenate the summaries and send them to the LLM with the prompt “please provide a verbose summary of the Toronto teachers strike”
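In code, that process looks roughly like this (a simplified sketch assuming LangChain and the OpenAI Python client; the `articles` list is a placeholder and exact import paths depend on your LangChain version):

```python
# Sketch of the pipeline described above. `articles` is a stand-in for
# however you load your news articles.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from openai import OpenAI

client = OpenAI()
TOPIC = "Toronto teachers strike"
articles = ["<full text of article 1>", "<full text of article 2>"]  # placeholder

# 1. Split the loaded articles into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=10_000, chunk_overlap=1_000)
chunks = splitter.split_text("\n\n".join(articles))

# 2. Crude keyword filter to drop obviously irrelevant chunks
chunks = [c for c in chunks if "teacher" in c.lower() or "strike" in c.lower()]

# 3. Embed, index with FAISS, and pull the top 10 matches for the topic
store = FAISS.from_texts(chunks, OpenAIEmbeddings())
top_docs = store.similarity_search(TOPIC, k=10)

# 4. Summarize each retrieved chunk, then combine the summaries
def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

bullet_summaries = [
    ask(f"please summarize into bullet points anything about the {TOPIC} "
        f"from the text below:\n{d.page_content}")
    for d in top_docs
]
final = ask(f"please provide a verbose summary of the {TOPIC}:\n"
            + "\n".join(bullet_summaries))
```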

This is working pretty well… but I am seeing random bullet points showing up that have nothing to do with the Toronto teachers strike, like “- the Toronto Maple Leafs beat the Calgary Flames 3-0”.

I can see this is happening because my docs, after splitting, will have a small segment from a different news topic (like the sports section) at the beginning of the text, followed by the section I want.

Has anyone run into this and found a way, through prompting, to ensure the bullet points only relate to the topic I want?

Hi and welcome to the Developer Forum!

That is a classic garbage-in, garbage-out problem.

You can make use of introspection if you are using GPT-4. In this method you pass the model's reply back to the model with a prompt asking it to check for any errors or inconsistencies and whether it answered the original question as well as it could. It will usually pick up these kinds of errors.
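A rough illustration of that introspection pass (the `draft_answer` here is just an example of a first reply; the call is a standard chat completion):

```python
# Send the model's first reply back with a verification prompt.
from openai import OpenAI

client = OpenAI()
draft_answer = (
    "- The Toronto Maple Leafs beat the Calgary Flames 3-0\n"
    "- Teachers walked out across Toronto on Monday"
)  # example of a first-pass reply containing an off-topic bullet

review = client.chat.completions.create(
    model="gpt-4",
    temperature=0,
    messages=[{
        "role": "user",
        "content": (
            "Original question: summarize into bullet points anything about "
            "the Toronto teachers strike.\n\n"
            f"Draft answer:\n{draft_answer}\n\n"
            "Check the draft for errors or inconsistencies, confirm it answers "
            "the original question as well as it could, remove any bullet "
            "points that are not about the Toronto teachers strike, and return "
            "the corrected bullet points only."
        ),
    }],
)
print(review.choices[0].message.content)
```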


Ensure you have a clean dataset. If you’re using LangChain or LlamaIndex, you can apply post-processing to the response. Sometimes requesting verbosity leads to responses that are out of context or unnecessary. What exactly are you prompting? Is your prompt the one below?

“please summarize into bullet points anything about the Toronto teachers strike”
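One hand-rolled form of post-processing, as a sketch (this is not the built-in LangChain/LlamaIndex postprocessors, just a filter that drops bullets whose embedding is not close enough to the topic; the `summary` string and the 0.75 threshold are placeholders you would tune):

```python
import numpy as np
from langchain_openai import OpenAIEmbeddings

summary = (
    "- Teachers walked out across Toronto on Monday\n"
    "- The Toronto Maple Leafs beat the Calgary Flames 3-0\n"
)  # stand-in for the LLM's bullet-point reply

emb = OpenAIEmbeddings()
topic_vec = np.array(emb.embed_query("Toronto teachers strike"))

def on_topic(line: str, threshold: float = 0.75) -> bool:
    # Cosine similarity between the bullet and the topic query
    vec = np.array(emb.embed_query(line))
    cosine = vec @ topic_vec / (np.linalg.norm(vec) * np.linalg.norm(topic_vec))
    return cosine >= threshold

bullets = [b for b in summary.splitlines() if b.strip().startswith("-")]
cleaned = "\n".join(b for b in bullets if on_topic(b))
```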

Thank you for the response! Yes, the prompt is “please summarize into bullet points anything about the Toronto teachers strike from the text below: {context}” (though I’ve tried many variations). One thing I’ve been wondering: does the word “summarize” make the LLM think it needs to take the whole context and summarize it?
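For illustration, a variation along those lines might reframe the task as extraction rather than summarization and explicitly allow an empty answer (hypothetical wording, not the exact prompt used here):

```python
# `context` stands in for the retrieved chunk text.
context = "<retrieved chunk text>"  # placeholder

prompt = (
    "From the text below, extract bullet points ONLY about the Toronto "
    "teachers strike. Ignore all other topics (sports scores, weather, other "
    "news). If the text contains nothing about the Toronto teachers strike, "
    "reply with exactly \"NO RELEVANT CONTENT\" and nothing else.\n\n"
    f"Text:\n{context}"
)
```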

To execute this process, you’ll need a combination of tools and services, including web scraping tools, text processing libraries, OpenAI’s API, FAISS, and more.


This answers the headline, but maybe not the actual question? I was asking a how-to question about git and was getting output with suggestions for nonexistent features. I resorted to what anthropomorphizers call the flattery pattern, and said something like “Eric is an expert in git who gives careful and accurate answers to his colleagues. How would he answer {question}?”, which finally got me some technically accurate output.


I’m having the same issue. The model seems to be confusing the context with the prompt. I had success with this in the past, but with the recent update I’m back at square one. Thanks for the feedback. I see something here I will try out.