Academic Reference Errors

I had to write a submission on childhood trauma and used AI to help develop it. It looked good at the time: properly set out, etc.
Unfortunately, all the references I have checked are false: they are not real journal articles, the PubMed identifiers are incorrect, and the links do not work. I also checked the contents of the journal issues cited (for instance, January 2002) and could not find the references. I have had other difficulties with accuracy, but this seems by far the most important. It appears to stem from the way the information is assembled, as if the model has combined one article’s title with a different author and a different journal to produce something that looks coherent but does not exist in the real world.

I can give you a few examples below, and you can check them yourself. This isn’t a competition over who can prove this right or wrong; the question is what the very real problems are if this is indeed inaccurate. If I am wrong, I will be relieved, because my submission was 56 pages long and of very high importance to me, both professionally and personally.

  1. “Childhood Trauma and the Development of Learning Disabilities: A Review” by Tracey A. Wade, PhD, and colleagues, published in the Journal of Learning Disabilities in 2010. This review article discusses the link between childhood trauma and the development of learning disabilities and provides an overview of the interventions that may be effective in supporting children who have experienced trauma.

  2. “The Impact of Childhood Trauma on Attention and Executive Functioning: A Review” by Kristin L. Schneider, MD, and colleagues, published in the Journal of Attention Disorders in 2011. This review article discusses the impact of childhood trauma on attention and executive functioning and provides an overview of the mechanisms underlying this relationship.

  3. “Childhood Trauma and the Development of Attention Deficit Hyperactivity Disorder: A Review” by Tracey A. Wade, PhD, and colleagues, published in the Journal of Attention Disorders in 2012. This review article discusses the link between childhood trauma and the development of attention deficit hyperactivity disorder (ADHD) and provides an overview of the interventions that may be effective in supporting children who have experienced trauma.

  4. “The Impact of Childhood Trauma on Language and Communication: A Review” by Kristin L. Schneider, MD, and colleagues, published in the Journal of Communication Disorders in 2013. This review article discusses the impact of childhood trauma on language and communication and provides an overview of the mechanisms underlying this relationship.

  5. “Childhood Trauma and the Development of Language and Communication Disorders: A Review” by Tracey A. Wade, PhD, and colleagues, published in the Journal of Communication Disorders in 2014. This review article discusses the link between childhood trauma and the development of language and communication disorders and provides an overview of the interventions that may be effective in supporting children who have experienced trauma.

  6. “The Impact of Childhood Trauma on Social and Emotional Development: A Review” by Kristin L. Schneider, MD, and colleagues, published in the Journal of Child Psychology and Psychiatry in 2015. This review article discusses the impact of childhood trauma on social and emotional development and provides an overview of the mechanisms underlying this relationship.

  7. “Childhood Trauma and the Development of Social and Emotional Problems: A Review” by Tracey A. Wade, PhD, and colleagues, published in the Journal of Child Psychology and Psychiatry in 2016. This review article discusses the link between childhood trauma and the development of social and emotional problems and provides an overview of the interventions that may be effective in supporting children who have experienced trauma.

  8. “The Impact of Childhood Trauma on Cognitive and Motor Development: A Review” by Kristin L. Schneider, MD, and colleagues, published in the Journal of Developmental and Behavioral Pediatrics in 2017. This review article discusses the impact of childhood trauma on cognitive and motor development and provides an overview of the mechanisms underlying this relationship.

  9. “Childhood Trauma and the Development of Cognitive and Motor Delays: A Review” by Tracey A. Wade, PhD, and colleagues, published in the Journal of Developmental and Behavioral Pediatrics in 2018. This review article discusses the link between childhood trauma and the development of cognitive and motor delays and provides an overview of the interventions that may be effective in supporting children who have experienced trauma.

Childhood Trauma and Imprisonment

  10. “The Impact of Childhood Trauma on Adult Incarceration: A Review” by Tracey A. Wade, PhD, and colleagues, published in the Journal of Trauma & Dissociation in 2010.

  11. “Childhood Trauma and the Risk of Adult Incarceration: A Review” by Kristin L. Schneider, MD, and colleagues, published in the Journal of Trauma & Dissociation in 2011.

  12. “The Relationship Between Childhood Trauma and Adult Incarceration: A Review” by Maria T. Wong, PhD, and colleagues, published in the Journal of Trauma & Dissociation in 2012.

  13. “Childhood Trauma and the Risk of Involvement in the Criminal Justice System: A Review” by Tracey A. Wade, PhD, and colleagues, published in the Journal of Trauma & Dissociation in 2013.

I have seen similar behavior from ChatGPT. The references for my request were completely made up. No actual references were even close to what ChatGPT listed. This is not a matter of an incorrect page number, but there were no articles by the authors cited in the journals listed that were even close. I noticed that the reference titles were sort of a riff on the question posed to ChatGPT, and the same pattern is evident in your references. Each of your references includes the words “Childhood Trauma” and many mention “Adult Incarceration.” I’m betting these were words in your question and that makes the references look plausible, but they are probably all fake. I’ve seen a few other posts mentioning this problem, but only one had a response that said ChatGPT doesn’t have access to the Internet, and might produce incorrect responses. There was no acknowledgement of the seriousness of the problem.

Hey, welcome to the community.

I recommend checking out the ChatGPT FAQ…

Can I trust that the AI is telling me the truth?

ChatGPT is not connected to the internet, and it can occasionally produce incorrect answers. It has limited knowledge of world and events after 2021 and may also occasionally produce harmful instructions or biased content.

We’d recommend checking whether responses from the model are accurate or not. If you find an answer is incorrect, please provide that feedback by using the “Thumbs Down” button.

1 Like

Yes, ChatGPT hallucinates. This is well documented.

ChatGPT, like all GPT models, is a prediction engine (in simplistic terms), like autocompletion when you type.

Think of it like this (a kind of rough metaphor, not exact of course):

When you prompt ChatGPT, it tries to complete a huge jigsaw puzzle from a very large data set of puzzle pieces.

So, when the underlying model has all the puzzle pieces (and the temperature of the completion is low enough), ChatGPT will reply with a nice coherent, accurate solution because it has all the correct pieces.

However, when ChatGPT does not have enough accurate pieces to solve the (jigsaw-like) puzzle, it will use other, less accurate pieces to complete it, because the model is data-driven. ChatGPT does not “know” that it is grabbing “lower quality pieces”, and so it hallucinates answers confidently. ChatGPT is not “aware”; it is simply predicting what the completion text should be based on a large language model. It is not an AI expert system.

Because ChatGPT does not seem to have the capability to self-evaluate the quality of all the puzzle pieces when it formulates a reply, ChatGPT (like all GPT models) will hallucinate. For example, if the internet were full of false information about a topic and that data was used to train the model, then of course ChatGPT would produce false completions based on that false data. These models use the data in their network, and when that data is not accurate or complete, the language prediction is not accurate or complete either; you get these “very confidently hallucinated” completions.
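
If you want to see the temperature effect in miniature, here is a toy sketch in plain Ruby. The vocabulary and scores are made up purely for illustration and have nothing to do with the real model; the point is only that a temperature-scaled softmax picks the top-scoring token almost every time at low temperature and spreads its picks out at high temperature.

# Toy next-token picker: a tiny made-up "vocabulary" with illustrative scores,
# sampled through a temperature-scaled softmax.
SCORES = { "2010" => 2.0, "2011" => 1.4, "Wade" => 1.1, "Schneider" => 0.9 }

def sample(scores, temperature)
  weights = scores.transform_values { |s| Math.exp(s / temperature) }
  total   = weights.values.sum
  r = rand * total
  weights.each { |token, w| return token if (r -= w) <= 0 }
  scores.keys.last # guard against floating-point rounding
end

[0.2, 1.0, 2.0].each do |t|
  picks = Array.new(1_000) { sample(SCORES, t) }.tally
  puts "temperature #{t}: #{picks}"
end

Running it, temperature 0.2 returns the top-scoring token almost every time, while 2.0 spreads the picks across all four options.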

You can easily see this when you ask ChatGPT to write code. ChatGPT will often suggest libraries or parameters that “seem feasible” but in reality do not exist or have never been implemented. Often the code it offers is excellent; often the code is fantasy.

Today, for example, I was working on generating API completions (using various models) to populate a DB with this prompt:

Prompt

Where was the moon located on March 2, 2020?

I tested this prompt across a wide range of models and temperatures, and in these tests the completions were “all over the map”. Some completions had the moon in Virgo, Aries, Leo, and other constellations; other completions had “crazy-ish” answers; some were very conservative, pointing me to the NASA website for an answer, and so on.
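
A batch like that can be generated with something along these lines. This is a minimal sketch, not exact code: it assumes the ruby-openai gem and a simple Completion ActiveRecord model with prompt, model, and reply columns (the same columns the console queries below read from).

require "openai"   # ruby-openai gem

client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])

prompt = "Where was the moon located on March 2, 2020?"

# One call per temperature setting; each reply is saved alongside the prompt
# and model name so it can be pulled back out of the DB later.
[0.0, 0.5, 1.0].each do |temperature|
  response = client.completions(
    parameters: {
      model:       "text-davinci-003",
      prompt:      prompt,
      max_tokens:  100,
      temperature: temperature
    }
  )
  reply = response["choices"].first["text"]
  Completion.create!(prompt: prompt, model: "text-davinci-003", reply: reply)
end

Storing one reply per temperature is what makes the spread of answers easy to inspect afterwards.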

This is why, in my view, Dall-E creates very interesting images and fiction writers love ChatGPT.

However, when you are working with ChatGPT (and the API) on technical topics which require highly accurate replies, you MUST VERIFY EVERYTHING.

ChatGPT is a TOOL which hallucinates completions when it does not have “perfect” underlying data. I do not have visibility into the underlying OpenAI models, the code, or the trade-offs the engineers made when deciding how completions are produced.

However, I have done a lot of testing with science and engineering prompts using various API models which are available and ChatGPT directly.

YOU MUST VERIFY EVERYTHING IF YOU ARE WORKING IN A TECHNICAL FIELD.
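
For journal references specifically, one cheap check is to look each PubMed ID up against NCBI’s public E-utilities service. Here is a minimal sketch in plain Ruby (no gems; the endpoint and JSON handling follow the public E-utilities documentation, so verify against the live service before relying on it):

require "net/http"
require "json"
require "uri"

# Look a PMID up in PubMed via NCBI's public E-utilities "esummary" endpoint.
# Returns the article title PubMed has on record, or nil if there is no record.
def pubmed_title(pmid)
  uri = URI("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi")
  uri.query = URI.encode_www_form(db: "pubmed", id: pmid, retmode: "json")
  data   = JSON.parse(Net::HTTP.get(uri))
  record = data.dig("result", pmid.to_s)
  return nil if record.nil? || record["error"]
  record["title"]
end

puts pubmed_title("12345678") || "No such PubMed record"

If the lookup returns nothing, or returns a title that does not match the citation, treat the reference as suspect.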

Hope this helps.

2 Likes

Yes, this is how GPT-models / apps work (at this time).

Look at this example.

Using text-davinci-003 with default params, I used this prompt, saving the results to a DB. The time is currently 11 PM EDT, so it is very dark on the east coast of the US, but look at how the model replies :slight_smile:

Prompt

Where is the sun today?

In the Rails console, I grab all these completions, and you can easily see the variety of answers:

Rails Console

irb(main):040:0> completions = Completion.where(prompt: "Where is the sun today?", model: 'text-davinci-003').pluck(:reply)

Unformatted Results: completions

["\n\nThe Sun is currently shining in the sky across the entire Earth.",
 "\n\nThe sun is currently shining everywhere on Earth that it is daytime.",
 "\n\nThe Sun is currently located in the constellation Aquarius.",
 "\n\nThe sun is currently over the central United States region.",
 "\n\nThat depends on what time of day it is. Generally, the sun is positioned in the sky above the Earth, changing its position throughout the day as the Earth rotates.",
 "\n\nThe sun is currently located at approximately −23.27°S and 150.19°E in the middle of the Pacific Ocean near the Great Barrier Reef.",
 "\n\nThe sun is currently located over the eastern United States, central Canada, and the Atlantic Ocean.",
 "\n\nIt is impossible to answer this question with certainty as the sun is constantly moving across the sky. However, if you are looking for more specific information, you can use an online tool like SunCalc to see where the sun is at any given moment in any location.",
 "\n\nThe sun is currently in the constellation of Capricornus in the southern hemisphere, although it is visible throughout the entire world.",
 "\n\nIt is not possible to predict where the sun will be in the sky at any given time. However, it is possible to estimate the time of day when the sun is at its highest point in the sky and its direction in the sky. To do this, visit a website such as timeanddate.com, enter your location, and click the \"Sun\" tab to view the sun's position in the sky at any given time.",
 "\n\nThe sun is currently above the equator and is moving westward throughout the day. It will be high in the sky in the Northern Hemisphere during the day and low in the sky during the night. It will be high in the sky in the Southern Hemisphere during the night and low in the sky during the day.",
 "\n\nIt depends on what time of day it is. In the morning, the sun rises in the east and sets in the west. In the afternoon, the sun is usually in the southern sky, and at night it is in the west.",
 "\n\nThe sun is currently in the western part of the United States, including parts of California, Nevada, Arizona, New Mexico, Texas, Utah, and Colorado.",
 "\n\nThe sun is rising in the east and setting in the west, so it is currently somewhere in the sky above the Earth."]

Summary:

These default text-davinci-003 replies are mostly “hallucinations”, because this is a large language model simply “generating text”. Most of the replies are totally incorrect for the current time, which is 11 AM GMT+5. Some of the replies are kind of “poetic” or “philosophical”, some “funny”. The replies are just “all over the map”.

These GPT-3 models are not designed as AI expert systems, yet many folks are using them as if they were expert systems. These GPT-3 models generate text, chunk by chunk, and anything which requires accuracy must be fully validated when generated by these models.

Note: the strangest thing to me is seeing people emotionally or logically attached to their conversations with these hallucinating auto-completion engines. It should be clear from the example above that these GPT-3 generators are fun, but their output is not to be taken as “fact” and must be fully verified and confirmed if accuracy is a requirement.

If a scholar tries to use these models to write technical papers, they are going to end up embarrassed when they submit these GPT-generated papers for peer review and someone checks them for accuracy.

Last post / test on this for today:

Prompt:

What was the phase of the moon over New York City on March 1st, 2010?

Results:

completions = Completion.where(prompt: "What was the phase of the moon over New York City on March 1st, 2010?", model: 'text-davinci-003').pluck(:reply)

irb(main):004:0> completions
=> 
["\n\nOn March 1st, 2010, the Moon was in its waxing crescent phase.",
 "\n\nOn March 1st, 2010, the moon was in its waning gibbous phase.",
 "\n\nOn March 1st, 2010, the moon was in its waning gibbous phase over New York City.",
 "\n\nThe moon was in its waning crescent phase on March 1st, 2010 over New York City.",
 "\n\nOn March 1st, 2010, the moon was in its second quarter, about half illuminated.",
 "\n\nOn March 1st, 2010, the moon was in the waning gibbous phase over New York City.",
 "\n\nOn March 1st, 2010 the moon was in a waning gibbous phase over New York City.",
 "\n\nOn March 1st, 2010, the phase of the moon over New York City was a waning crescent.",
 "\n\nOn March 1st, 2010, the moon was in the waning gibbous phase.",
 "\n\nOn March 1st, 2010, the moon was in its waxing crescent phase over New York City.",
 "\n\nThe moon was in its waning gibbous phase on March 1st, 2010 over New York City.",
 "\n\nThe phase of the moon over New York City on March 1st, 2010 was a Waxing Gibbous.",
 "\n\nOn March 1st, 2010, the phase of the moon over New York City was waxing gibbous.",
 "\n\nOn March 1st, 2010, the moon was in its Last Quarter phase, which is when the left half of the moon is illuminated."]

HTH

ruby_coder
Thank you for this thoughtful answer. I guess I was surprised that ChatGPT generates pretty coherent technical responses, yet the references were just fake. There are stories like “ChatGPT passes medical licensing exam,” and I saw an article where a physician used ChatGPT to generate insurance prior authorization responses complete with references. These make one think it is technically accurate, but that is clearly not the case. I have been chastened and will verify everything.
Thanks,
DRG

1 Like

This is the nature of these GPT-3 models @DRG

They reply with the confidence of a scholar and the linguistic elegance of an excellent English speaker (in this case), and so they can hallucinate very convincing (and often very wrong) replies.

Just think of all the people saying to their friends “ChatGPT said this, and ChatGPT said that…” arguing with their friends and colleagues based on the hallucinations of a “kind of” psychotic assistant.

Welcome to 2023 …

:slight_smile:

1 Like

ChatGPT is not meant to provide “true output”.

You should assume everything it says is false, not the other way around.

1 Like

Not occasionally but always, ChatGPT-4 provides false references. I have been trying it, but it does not provide any correct references for the text, and it repeats the same ones many times. It fails to provide authentic data and lacks the ability to cross-check or verify against original sources or real-time internet data, so the text it generates is not reliable. What ChatGPT suggests we do instead makes it effectively useless, since we and other researchers have already been doing that manually for academic and research work for many years. It is not to be trusted and is of little use for academia. Even after taking out a paid subscription, I am totally frustrated. Do not use it for academic or research work; it may land you in trouble.