Some more benchmark stuff here - List of fresh gpt-4o benchmarks, please add - #2 by qrdl
Seeing a lot of results that are failing on longer contexts