Performance of GPT-4o on the Needle in a Haystack Benchmark

I posted a few here.

For your query - https://twitter.com/LouisKnightWebb/status/1790265899255017893

But also note:

https://twitter.com/JoshPurtell/status/1790102029773246861

I find the latter quite curious. It echoes my openai-eval PR around multistep math problems, where the model has to provide the answer and no other text.
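To make the "answer and no other text" constraint concrete, here is a rough sketch of how a strict exact-match check could look. This is not the code from the PR and not the evals library API; the function name `is_answer_only` and the sample data are made up purely for illustration.

```python
def is_answer_only(completion: str, expected: str) -> bool:
    """Return True only if the completion is exactly the expected answer.

    Hypothetical helper: surrounding whitespace is ignored, but any
    extra words around the answer make the check fail.
    """
    return completion.strip() == expected.strip()


# Illustrative (completion, expected) pairs, not real model outputs.
samples = [
    ("42", "42"),                 # bare answer
    ("The answer is 42.", "42"),  # correct value wrapped in extra text
    (" 42\n", "42"),              # bare answer with stray whitespace
]

results = [is_answer_only(completion, expected) for completion, expected in samples]
print(results)
```

Under a check like this, a model that gets the math right but adds any explanation still scores as a failure, which is why the format constraint matters so much for the measured result.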
