Performance of GPT-4o on the Needle in a Haystack Benchmark

The ONLY reason I cancelled ChatGPT-4 (Turbo) and moved over to Claude (Opus) was how easily it lost track of conversations. I had to keep reminding it of the initial text I had provided, and of the small details too.

I didn’t have extremely long conversations with it, because I quickly realized it struggled with really long ones (that’s what she…), but they weren’t short either — moderate, I’d say. And when it started messing up, I would just clear all my conversations and start fresh. But I had to do that quite often.

So this benchmark is really important to me. Unfortunately, the only one I’ve found so far is the one shared above, http://nian.llmonpy.ai, and it compares GPT-4o to Claude (Sonnet), which makes no sense — it should be comparing it to Claude (Opus).
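For anyone curious what a needle-in-a-haystack test actually measures, here's a minimal sketch of how one is typically constructed (the helper names are hypothetical; real benchmarks like the linked one vary the needle, the context length, and the insertion depth):

```python
def build_haystack_prompt(filler: str, needle: str, depth: float, target_chars: int) -> str:
    """Repeat filler text up to target_chars, then insert the needle
    at a relative depth between 0.0 (start) and 1.0 (end)."""
    haystack = (filler * (target_chars // len(filler) + 1))[:target_chars]
    pos = int(len(haystack) * depth)
    return haystack[:pos] + " " + needle + " " + haystack[pos:]

def score_retrieval(model_answer: str, expected: str) -> bool:
    """Pass/fail: did the model's reply reproduce the hidden fact?"""
    return expected.lower() in model_answer.lower()

filler = "The sky was grey and the streets were quiet. "
needle = "The secret passcode is 7042."
prompt = build_haystack_prompt(filler, needle, depth=0.5, target_chars=2000)
# The prompt plus a question like "What is the secret passcode?" would be
# sent to each model; score_retrieval then checks the reply.
```

Running this across many depths and context lengths, and for each model, is what produces those retrieval-accuracy heatmaps — which is exactly the "losing track of small details" problem described above.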

I guess I’ll have to wait until it’s released to the public for free and test it myself. Then I’ll simply go with whichever is best, Claude or OpenAI — I don’t care, I just want it to get the job done.
