Scientific Criticisms of LLMs are Somewhat BS

This post is meant for discussion and sharing of ideas. I just felt compelled to push back on the tidal wave of party poopers writing papers on LLMs suggesting the fun is over. 2024 is still the ground floor.

I am a trained scientist. My expertise is in research on humans, not AI (although I'm now doing both). I just have to post here because I'm disappointed with the level of LLM criticism being published.

As a researcher I understand the mindset of scientists publishing on LLMs, and I find them annoying these days. If you, like me, find these critiques annoying, let me help you understand why. Forgive me for using clear words when, to be accurate, I shouldn't sound so definitive :slight_smile: I no longer do lab research. I conduct and apply research findings in the real world and have lost interest in talking like I don't know anything (which is how scientists prefer to speak).

Why are the AI researchers being so annoying:

(1) Scientists disprove what came before. You only publish what adds to the knowledge, and therefore it must prove something new or disprove prior research. Every new paper is basically saying 'ha ha, you were wrong.' Annoying.

(2) Scientists must run controlled experiments to prove anything at all, and thus the best research is artificially constrained and does not mimic reality. This means they find strong evidence that when a human has 10 arms they cannot play the piano, and conclude that more arms don't help you play the piano. Never happens in real life, buddy: annoying.

(3) Only scientists in the same subfield can properly read and understand scientific papers written by a single set of authors which leads to a lot of misunderstandings. Scientists write without proper explanation or interest in explaining themselves to anyone who doesn’t already understand them: annoying.

All this to say that I am getting a little annoyed at all these inflammatory journal titles about LLMs.

For example, highly constrained unrealistic lab experiments on LLM training using synthetic data seem to underperform, according to the scientists.

I call BS.

If the scientists can't figure out how to generate useful synthetic data, and LLMs that can work with that data, they shouldn't be publishing on the impact of synthetic data.

I am going to sound definitive again: there is a way to generate and consume synthetic data in order to train superior LLMs. Of course there is. It’s obvious. LLMs work like playing ping pong. The LLM hits back what you serve to it.

Again, just because the method used in one study, or even most studies, shows this quality gap - Does - Not - Mean - there is no viable method to get from point A to point B.

Always keep in mind that the collective wisdom of science at any given moment is wrong, or, scientific research would cease. So as long as we need more scientific research for any topic, the implication is we have been wrong for a long time and are collectively working to fix that.


I think you might be over-generalizing what scientists actually do. What discipline were you trained in?

And could you give some examples of articles that you're mad about? It's very hard to understand this criticism without concrete examples :laughing:


Well, do you know what the main causes of that are? It's not so hard to figure out.

Thanks for the comments and questions. How about rather than discussing specific articles we can discuss inflammatory jargon such as “hallucination” and “catastrophic forgetting.”

Jargon is efficient but a somewhat lazy form of communication. Nonetheless, LLM researchers continue to take well-known terms and redefine them for use within their own communities. That happens a lot in every field, but with AI, they are using words from human experience. So rather than using a term like "over-generalization" they call it hallucination. In everyday life, hallucinating carries extremely negative connotations. I find that somewhat irresponsible.

These emotionally charged words are, first of all, fictional terms and could have been labeled anything at all. They strike fear in non-scientific audiences, and my main criticism of the publishing AI researchers, who keep pumping out papers on the rarest of rare times these tools fail to operate accurately, is that they are not helping as much as they could be. They are being noticed, and therefore rewarded, and so it continues.

I agree I over-generalized, but I also warned I would. "Mad" is not the best term, though; it's more that I'm annoyed. I'm annoyed because their work turns into fear and slows adoption.

I believe communication of expertise requires ethical judgment about the audience and how that communication will be perceived. It’s not an easy balance. Scientists don’t write for the news, they write for each other. But the authors are to blame for the reactions to their words. We can’t control what others think and do but we can control what we publish.

I would love your perspective.

In one way or another, a lot of people in technology and science are scientists.
Well, imagine the scientific community as a diverse ecosystem. In this ecosystem, you've got two main species: the logical and neutral ones, who are like the rare gems, and the emotional-intelligence-and-ethics aficionados, who are a bit more common.
Now, the logical and neutral scientists are the ones who thrive on pure reason and efficiency.
On the other hand, you've got the emotional intelligence and ethics folks. They're not necessarily slowing things down, but they tend to approach things in a somewhat peculiar way, or are sometimes too limiting. For example, Claude AI can't write a horror book or anything similar…
Well, this is my logical view. It may be relative or hard to understand, but it is my opinion.


Over-generalization isn't the same as hallucination. This isn't just jargon; these terms have specific technical meanings within AI research, which aren't the same as the ones applied within psychology.

The term "hallucination" refers to generating incorrect information, and "catastrophic forgetting" describes losing old information when learning new data. The usage of these terms is intended for clarity within the research community, not to evoke fear in the general population.


First, I’m not sure I’d categorize these as inflammatory, but I’m curious to hear what terms you feel would be better.

Personally, I’ve used confabulate more-or-less interchangeably with hallucinate.


Thank you for sharing. I like your big picture view and that’s not at all what I thought you would say. So to summarize, you are suggesting there is a marketplace of ideas. One side is targeting objective facts and the other is trying to infer meaning and the two are colliding.

I agree this is a good explanation of what is emerging and the fact that this is not centrally controlled implies little can be done about it. I agree there too.

It is shaping the future of AI work and public perception so WE certainly can be aware and look past the emotions and spot the gaps.

Understanding the momentum helps us all predict the future and explain the past :nerd_face:


I see. Well, I interact with typical office workers every day and have had 12 months of discussions with resistant average people.

Words like hallucination in the context of AI cause them anxiety. Fear immobilizes; it's an inhibitor. Think of the child wanting to cross the street who first stops to look both ways to avoid getting hit. So when AI is discussed using terms that evoke concepts like illegal drugs and mental disorders such as schizophrenia, it's going to create adoption barriers, and the news will grab the fear and sell more clicks.

Fear tricks us into thinking we are being safe through inaction when really, most of us are safe already, and inaction with AI is self defeating (so fear here is actually causing risk for the non-adopting adult worker).

First of all, to me, hallucinations are just a side effect of user error. So it's not exactly fair, in my view, to blame it on the AI, as if the method of generative AI were inherently flawed and will always cause hallucinations. That is only true for the one-prompt, one-reply interaction with a general model on a general task.

With a proper setup, "hallucinations" can drop to near zero. So while, yes, AI output contains incorrect information in higher amounts than anyone wants, there are nearly infinite ways to reduce this side effect of user/developer error to near-zero, even undetectable, levels.
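As a hedged sketch of one such "setup" (not any particular product's method): ground the model in supplied source text instead of letting it answer from memory. The function name, wording, and example text below are all illustrative assumptions; the actual model call is left out.

```python
# A minimal sketch of a grounding setup: the model is told to answer
# only from a supplied source, which leaves it far less room to guess.
# Only the prompt is built here; the call to a chat API is omitted,
# and every name in this snippet is an illustrative assumption.

def grounded_prompt(question: str, source_text: str) -> str:
    """Instruct the model to answer only from the source, and to
    admit it when the source does not contain the answer."""
    return (
        "Answer the question using ONLY the source below. If the "
        "source does not contain the answer, reply exactly: "
        "'The source does not say.'\n\n"
        f"Source:\n{source_text}\n\n"
        f"Question: {question}"
    )

prompt = grounded_prompt(
    "When was the policy last updated?",
    "Travel policy v2, last updated 2023-06-01.",
)
```

The design point is that the instruction plus the source travel together in one prompt, so the checker-facing text never relies on the model's memory.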

I would have to think about an ideal term, but what the LLM is actually doing is over-generalizing. That's not unique to LLMs: Amazon, Netflix, social media ads, Google search, and political polls all do this.

Maybe it should be called "generative errors," since that is what it is actually doing. I tell people "it guesses wrong sometimes." I translate these terms for end users because understanding reduces fear. I then explain that going into great detail with an LLM, further and further from the first answer, is when it becomes more likely to guess wrong.

The current terms make people believe asking it anything at all is likely to yield fiction presented as fact and that is not true.

Further, it's incredibly easy to check output for errors: copy it, then paste it into a new conversation after first typing Is this accurate: " "
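That two-step check can be sketched in a few lines. The `call_model` function below is a stub standing in for whatever chat API you use (the real call is an assumption left out); the point is just that the verifier sees the answer in a fresh context, wrapped in the template above.

```python
# Sketch of the check described above: wrap a previous answer in the
# "Is this accurate:" template and send it to a *fresh* conversation.
# call_model is a stub; a real setup would call an LLM chat API here.

def call_model(prompt: str) -> str:
    # Stand-in for a real chat API call; echoes what it would receive.
    return f"(model sees only this) {prompt}"

def accuracy_check(answer: str) -> str:
    # Fresh context: the checker gets nothing but the wrapped answer,
    # so it cannot be anchored by the original conversation's drift.
    return call_model(f'Is this accurate: "{answer}"')

reply = accuracy_check("The capital of France is Paris.")
```

Because the checker starts cold, it has no stake in the earlier conversation's wrong turns, which is exactly why the copy-paste trick works in practice.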

Hallucination as a term invokes fear and helplessness. That is just one term keeping adoption slow.

Slow adoption turns out to be good for you and me. But I still don’t like it, especially given it is in part due to AI scientists and AI spokespeople using scary words :nerd_face:

I would argue that what is striking any fear in people is the over-hyped forecast of the imminent arrival of killer AI bots, not if GPT 3.5 gets a maths question wrong! :slight_smile:

(I’m personally completely fine with the term “hallucination” and understand exactly what it means and its gravity - or lack of it!)


Yes, I know. Every field defines its jargon using a special dictionary it creates. That doesn't change the fact that words have preexisting meanings prior to their use as jargon in a new subfield.

AI will be plagued with misunderstandings and probably always has been because it continues to use preexisting terms from psychology.

Neurons in AI are incredibly simple compared to the neurons in the human brain, for example. The billions of neurons in an LLM like GPT-4 are dumb compared to fewer than 100 healthy adult human brain neurons. Yet, because it's the same word, many if not most educated AI researchers don't realize this fact and believe that because LLMs have billions of neurons and the human brain has billions of neurons, AGI is right around the corner.

This false equivalency scares the public and is burning out the AI insiders.

GPT-3.5 (and the follow-ons) are THE breakthrough. We don't need to wait, we need to implement.

Words matter and having to say “X and Y occur, but first let me define X and Y for you because they don’t mean what you think they mean” is poor communication.

Communication is about the receiver, not the sender. It is the job of the sender to make it simple and easy for the receiver to understand what they mean, not just what they say.

Those last two sentences are the heart of my post here and I’m glad I’m raising some awareness because it doesn’t sound like my message is obvious to everyone :grinning:

So write a paper demonstrating the opposite, with detailed examples, analysis, and proof?


Your main argument here is that the term "hallucination" invokes fear and anxiety in the average person. I don't see it like that, and I think the argument you're making is derogatory towards people with mental health issues.


This isn’t what it is doing though.

This isn’t a useful term though because it’s not descriptive of the type of error.

It really doesn’t.

It’s really not a scary word, it’s a descriptive word that conveys what is occurring inside the model.

You don't need to like it, nor do you need to use it, but it's the accurate term used in the scientific literature, and you don't get to police that.

Closing the topic now as it seems there is nothing left to be said on the matter.
