I’m curious whether anyone has tested the new GPT-4o model on the “Needle in a Haystack” benchmark. If not, perhaps the developers could share their insights on how it might perform compared with GPT-4 Turbo. Any feedback or data would be greatly appreciated!

Performance of GPT-4o on the Needle in a Haystack Benchmark

qrdl May 14, 2024, 7:57am 2

posted a few here

For your query - https://twitter.com/LouisKnightWebb/status/1790265899255017893

but also note

https://twitter.com/JoshPurtell/status/1790102029773246861

I find the latter quite curious. It echoes my openai-eval PR around multistep math problems, where the answer has to be given with no other text.

