Ground Truths: Curious About LLM Accuracy Research & Methods of Detecting/Avoiding Confabulation


Hey OpenAI Community,

I’m grateful to be part of this group. Thank you for all of the incredible insights; I’ve learned a lot in my first few hours since joining.

I’ve got a question about how OpenAI handles tagging parts of text in LLM inputs and outputs. How can you distinguish ground-truth knowledge from what the LLM simply predicts as the next token? More generally, I’m curious how confabulation is avoided. My intuition is that external knowledge bases might be the key here, and that there may be a way to tag LLM output to mark what is trustworthy and what still needs ground-truth verification. Any thoughts or resources on this would be super helpful!
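To make the tagging idea concrete, here is a toy sketch (purely my own illustration, not any OpenAI feature): each sentence of model output is checked against a small external knowledge base and labeled "verified" or "unverified". The knowledge base and the exact-match rule are stand-in assumptions; a real system would need retrieval and entailment checking.

```python
# Toy knowledge base of verified facts (an assumption for illustration).
KNOWLEDGE_BASE = {
    "water boils at 100 degrees celsius at sea level",
    "the earth orbits the sun",
}

def tag_output(sentences):
    """Label each sentence 'verified' if it matches the knowledge base,
    otherwise 'unverified' (i.e., it is pure model prediction)."""
    tagged = []
    for s in sentences:
        key = s.strip().lower().rstrip(".")
        label = "verified" if key in KNOWLEDGE_BASE else "unverified"
        tagged.append((label, s))
    return tagged

output = [
    "The Earth orbits the Sun.",
    "The treaty was signed in 1742.",  # plausible-sounding but unchecked
]
for label, sentence in tag_output(output):
    print(f"[{label}] {sentence}")
```

In practice the hard part is deciding which spans even contain checkable claims; exact string matching is only a placeholder for that step.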

Also, I know I need to be patient, but I’m also determined to get access to the plugins feature in a legal and moral way. If you have any tips or tricks on how to get off the waiting list faster, or even some good vibes to send my way, I’d really appreciate it. 🙂

Thanks for being kind and helpful (this is just a cool thing to be a part of, and I’m excited to keep on learning).




I wonder if a potential approach could weight content in LLM output, something like the following:

1) Generate the sentence from components like A + B + C.
2) Identify that component C requires ground-truth knowledge.
3) Replace C with an output that keeps the same sentence structure but is derived from an external knowledge base, so it aligns with verified ground truth rather than a generic LLM continuation.
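The generate-flag-replace idea could be sketched roughly like this (my own toy illustration; the knowledge base, the `fact_key` flag, and the component format are all assumptions, since detecting which component needs ground truth is the unsolved part):

```python
# Toy external knowledge base (an assumption for illustration).
KNOWLEDGE_BASE = {"boiling_point_of_water": "100 °C at sea level"}

def compose(components, kb):
    """components: list of (text, fact_key) pairs. fact_key is None for
    components the model may freely generate (A, B), or a knowledge-base
    key for components that must come from ground truth (C)."""
    parts = []
    for text, fact_key in components:
        if fact_key is not None:
            # Step 3: swap the generated component for a KB-derived value,
            # keeping its slot in the sentence structure; flag it if the
            # knowledge base has no entry.
            parts.append(kb.get(fact_key, f"[UNVERIFIED: {text}]"))
        else:
            parts.append(text)
    return " ".join(parts)

sentence = compose(
    [
        ("Water boils at", None),                    # A: safe phrasing
        ("around 90 °C", "boiling_point_of_water"),  # C: needs ground truth
    ],
    KNOWLEDGE_BASE,
)
print(sentence)  # -> Water boils at 100 °C at sea level
```

The interesting design question is step 2: here the flag is supplied by hand, whereas a real system would need the model (or a separate classifier) to decide which components carry factual claims.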

Recognizing when ground truth knowledge is necessary seems quite challenging, and I welcome any thoughts or insights on this approach.