Hallucination vs Confabulation

Out of curiosity, why has ‘hallucination’ been chosen as the word to describe these errors? It seems more appropriate to describe them as confabulations, but my background is in neuroscience/neurology rather than computer science. I’m sure I’m missing something simple.

Grateful for any insights.



I like that. It makes more sense to me now that you point it out.


And here are ChatGPT’s thoughts on the topic.

Yes, it would be more accurate to say that AI models, especially language models like GPT-4, confabulate rather than hallucinate. Confabulation refers to the generation of plausible-sounding but potentially inaccurate or fabricated information, which is a common characteristic of AI language models when they produce responses based on limited or incomplete knowledge. This term better captures the nature of AI outputs as it emphasizes the creation of coherent, yet possibly incorrect, information rather than suggesting the experience of sensory perceptions in the absence of external stimuli, as hallucination implies.


I think your idea here is great,
Confabulation is definitely a more accurate term than hallucination. I think the term hallucination was chosen because it’s a general term, more readily understandable by most humans. Confabulation is more correct, but most people, especially those who are not native speakers, including myself, will have to look it up to know exactly what it means. :laughing:

Keep in mind that GPT doesn’t have “thoughts”. It attempts to produce the output that is most likely to gain positive feedback from you, and it does this based on probability. If we ask it a very leading question, like this one:

would it be more accurate to say that AI models, especially language models like GPT-4, hallucinate rather than confabulate?

By telling the model “hallucinate rather than confabulate”, we’re telling it what to answer, and it does exactly that:

In the context of AI language models like GPT-4, the term “hallucinate” is sometimes used to describe situations where the model generates text that is not factually accurate or consistent with reality. For example, if a language model generates a news article about a fictitious event, that could be considered a form of “hallucination.” However, the term “confabulate” is typically used to describe situations where a person generates false memories or information in an attempt to fill gaps in their memory. This concept doesn’t really apply to AI models.

I still agree with you tho :laughing:


I started running this down in the research papers.

So far I have found this:

Survey of Hallucination in Natural Language Generation by Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Wenliang Dai, Andrea Madotto, Pascale Fung (PDF)

Researchers started referring to such undesirable generation as hallucination [125] 1.

Footnote: 1

The term “hallucination” first appeared in Computer Vision (CV) in Baker and Kanade [5] and carried more positive meanings, such as superresolution [5, 112], image inpainting [48], and image synthesizing [226]. Such hallucination is something we take advantage of rather than avoid in CV. Nevertheless, recent works have started to refer to a specific type of error as “hallucination” in image captioning [13, 159] and object detection [4, 83], which denotes non-existing objects detected or localized incorrectly at their expected position. The latter conception is similar to “hallucination” in NLG.

Feel free to run down references such as [5] and [125]. The footnote is enough for me.

However, the Wikipedia article for AI hallucination does note:

In artificial intelligence (AI), a hallucination or artificial hallucination (also occasionally called confabulation [1] or delusion [2]) is a confident response by an AI that does not seem to be justified by its training data.[3]


Good job searching the literature!

Out of curiosity, can you describe your method for finding that paper?

I especially liked this figure in the section about visual hallucinations:


I can’t remember the exact Google query used, but it was something like this, because the paper was near the top of the list.

natural language hallucination ai word origin pdf

In earlier research I learned of the Wikipedia page for AI hallucination, so I have also been using AI with the word hallucination when searching.

As can be seen, just four ideas combined get one to the paper:

  1. natural language
  2. hallucination ai
  3. word origin
  4. pdf

I openly share this but no matter how much I share it seems to remain a secret.

To easily find research papers on the internet, add the keyword pdf to the list of search words; especially if the title of the paper is given, you will get much higher quality results.


for “Survey of Hallucination in Natural Language Generation”
search for "Survey of Hallucination in Natural Language Generation" pdf


This is interesting. Based on the historical context, could the processes that led to the initial hallucination cases (reporting things that aren’t there in an image, in computer vision?) be considered something entirely different from the processes that result in confabulation in LLMs? Would it be useful to separate these terms while we’re trying to understand these errors more deeply?


My personal view is no. Both words have an established meaning. Reusing a word with an established meaning and/or adding another meaning to it just compounds the problem.

Many decades ago I took Latin and learned how many modern words in languages derived from Latin were created, such as galaxy. (OK, galaxy derives from Greek, but the idea is the same.) It was then obvious how many words came about over the centuries. But now the art of coining words from Latin bases is fading away, and instead people are reusing existing words. One word that is badly overused is functor; it is a pain to use as a search keyword because it is connected to so many things with very different meanings.

What is even worse is one-letter operators in programming languages. How many know that the comma (,) is really a binary operator? Most just think of it as a line ending or separator, but if you write parsers, it really is an operator just as much as (+) is.
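A quick JavaScript illustration of the comma-as-operator point (the variable names are just for demonstration):

```javascript
// In expression position the comma is a binary operator: it evaluates
// its left operand, discards the result, and yields the right operand.
const x = (1, 2, 3);
console.log(x); // 3

// The same character acts as a mere separator inside an array literal.
const arr = [1, 2, 3];
console.log(arr.length); // 3
```

A parser has to treat those two commas very differently, even though they look identical in the source text.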


This is why I love bringing so many different fields of study together.
Great collaboration and I just learned something this Saturday morning!


Yeah, I agree with Eric on this. I don’t think it makes sense to separate them, but if you can create some demarcation criteria that separate them, I’ll say go for it :laughing:


While we’re at it:

I’d like to share this absolutely amazing tool for searching research papers. I found it yesterday, so I’ve only done minimal testing:

It does minimal keyword extraction with BERT, searches arXiv, and finds similar keywords and papers, then serves the results on a local web interface. It’s very interesting.
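For anyone who wants to script just the search step themselves, the public arXiv API can be queried directly. A minimal sketch in JavaScript — the function name `buildArxivQuery` is mine, and only the endpoint and the `search_query`/`start`/`max_results` parameters come from the arXiv API documentation:

```javascript
// Endpoint of the public arXiv API (returns an Atom feed of matching papers).
const ARXIV_API = "http://export.arxiv.org/api/query";

// Build a query URL that searches paper titles for an exact phrase.
// buildArxivQuery is a hypothetical helper name, not part of any library.
function buildArxivQuery(title, maxResults = 5) {
  const params = new URLSearchParams({
    search_query: `ti:"${title}"`, // "ti:" restricts the search to titles
    start: "0",
    max_results: String(maxResults),
  });
  return `${ARXIV_API}?${params.toString()}`;
}

const url = buildArxivQuery("Survey of Hallucination in Natural Language Generation");
console.log(url);
```

Fetching that URL returns an Atom XML feed you can parse for titles, abstracts, and PDF links.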


Yes, I have noticed its tendency to please. And I know I should make my questions more neutral. It makes me realize that, when talking to people, I tend to load my question towards the desired answer.

I have been practicing my critical thinking. I guess I need to try harder.


It seems to me that the original use of hallucination was more objective, rather than suggesting it was intrinsically wrong, as per this quote.

So under the right conditions, hallucinations are “creative”. But in the context of some prompts, that creativity is not welcome; the outputs are just outright wrong, or even a lie. And we already recognize this in our own world: “She was creative with the truth”, euphemistically meaning she lied.

So based on that, I think hallucination might be a useful generalized term, which becomes “creativity” when we want fiction or another creative output, and “confabulation” when we are looking for facts.


Don’t be too hard on yourself, even if you had phrased your question in a perfectly neutral way GPT would still have applied different weights to your input according to pre-training data and order of appearance :laughing:


By the way, when a patient is confabulating, it’s bizarre and fascinating (I consider myself a compassionate doctor, and this isn’t something I play with when I’m interacting with patients, but there are times where you need to legitimately test for it). You’ll hear this incredibly compelling, fabricated story if you just start it off in pretty much any direction. To me, it almost seems to reflect the LLM that we all have (when the processes that ensure/incorporate reason and context aren’t functioning appropriately but the language center is completely spared/preserved).


I thought of the perfect term the other day, something like “wild imagination”, because it’s exactly like when you say to a person “you have a wild imagination”.


I like the idea of using a term (even just personally) that makes it clear LLM outputs aren’t really a reliable source for truthful information (yet…). I feel like confabulation is more appropriate semantically (compared to hallucination) for inaccurate or untruthful LLM outputs, and that the opposite could be considered true for things related to computer vision, especially after learning about an original/very early use thanks to the scholar @EricGT.

I think “untruth” would be reasonable, but Orwell gave that one a bad rap :sweat_smile:


My background is also in neuroscience/neuropsychology. I think it’s more likely that the computer scientists didn’t know that confabulation is exactly the term they were reaching for. Wonder if it ever changes or if this sticks.


I don’t think the term hallucination is the best fit for the job, but I still believe it is good enough for the general public, especially since it is a more widely known term.

Confabulation might cover the neuroscience side, but it doesn’t relate to what’s happening in the model.

I think a better fit could be the term pareidolia.