Response accuracy and retrieval accuracy

Does anyone know what an acceptable response and retrieval accuracy from DAVINCI might be? For example, if I ask it 100 questions and it answers 80 of them correctly, is 80% an industry-standard benchmark for these models? Same question for the retrieval part: if it retrieves the correct context 90 times out of 100, is 90% an acceptable industry-standard number?

I think it depends on the domain of knowledge your questions are coming from.

For general, well-known things, I would expect near-100% accuracy, since it's trained on the internet, books, etc. That covers any domain where "wisdom of crowds" gives the correct answer.

For your own domain knowledge, specific facts, and so on, accuracy could be as bad as 0%.

For technical knowledge, like STEM (science, technology, engineering, math), it can also be really bad.

If it's your own domain knowledge, it's best to use embeddings to retrieve the relevant context and let the LLM answer from that retrieved knowledge, as in the sketch below.
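A minimal sketch of that retrieve-then-answer pattern, assuming the legacy `openai` Python SDK (v0.x) with its `Embedding.create` / `Completion.create` calls and a hypothetical two-document knowledge base:

```python
import numpy as np
import openai

def embed(text):
    # Embed a string with ada-002 (legacy openai SDK v0.x call style)
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Hypothetical knowledge base; in practice you'd embed these once
# and keep the vectors in a proper vector store
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm EST.",
]
doc_vectors = [embed(d) for d in documents]

def retrieve(question, k=1):
    # Rank documents by cosine similarity to the question embedding
    q = embed(question)
    scores = [np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
              for v in doc_vectors]
    best = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in best]

question = "How long do I have to return an item?"
context = "\n".join(retrieve(question))

# Constrain davinci to answer only from the retrieved context
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)
answer = openai.Completion.create(
    model="text-davinci-003", prompt=prompt, max_tokens=100
)["choices"][0]["text"]
print(answer)
```

The document list, prompt wording, and `k=1` cutoff here are all illustrative; the point is that the model never has to recall your private facts, only read them.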

If it's STEM related, you could try chain-of-thought techniques to get it to reason through each step before supplying the answer.
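A rough sketch of the zero-shot variant of that idea (appending "Let's think step by step"), again assuming the legacy `Completion` API; the question and prompt wording are just examples:

```python
import openai

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"

# Zero-shot chain-of-thought: nudge the model to lay out its reasoning
# before committing to a final answer
prompt = f"Q: {question}\nA: Let's think step by step."

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=200,
    temperature=0,
)
print(resp["choices"][0]["text"])
```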

For “wisdom of crowds”, just use the raw LLM.


By "industry standard" do you mean the AI industry? Each industry where AI is applied will have a very different benchmark of what is acceptable (medical vs. tech bloggers, for example). Also remember these aren't fact machines programmed to be 100% accurate; they are language models that are often accurate. And the prompts (and embeddings) dramatically affect the output, so saying "GPT is globally correct 80% of the time" does not translate to your specific implementation being correct 80% of the time.

Might help to take a step back and share what you are looking to benchmark or what your concerns are.

So, I was trying to see if there is a comparison against traditional conversational AI, which I think is mostly in the range of 70-80%.