I wanted to solicit ideas for a new RAG benchmark I’d like to create. OpenAI recently released SimpleQA, which shows just how horrible ungrounded LLMs are at answering fact-based questions. The leader, o1-preview, scores 42.7%. Through proper grounding you can easily get these scores up into the 90…

Ideas for a RAG benchmark

stevenic November 10, 2024, 3:57am 4

Ideally, this benchmark would be similar to SWE-bench, where any RAG provider is free to put their system to the test and compete for a spot on the leaderboard.
