What does 7 Free Weekly Evals actually mean?

I’d like to raise some concerns and possible bugs I’ve encountered with the current evals system.

According to the stated policy, users are supposed to receive 7 free weekly evals (excluding tool-use models). However, I’ve noticed that I was billed for some eval runs even though I hadn’t exceeded this weekly limit. Since I use a wide range of models, including GPT-4.5, these runs have sometimes come with unexpectedly high costs. Additionally, I’ve observed that billing sometimes starts partway through a run. I’m not sure if there is a token limit for a single eval run, or if this is simply a delay in the billing process.

Overall, the program feels quite opaque. As an academic researcher working on evals, I was genuinely excited about this program. However, not being able to see how many free evals remain or the cost of each run makes it very challenging to manage my usage and avoid unforeseen charges. Unfortunately, this lack of transparency has already resulted in fees of about $1,000, which is a significant amount for a PhD student.

I believe that greater clarity would benefit both users and OpenAI. I hope these issues can be addressed to make the system more accessible and user-friendly for the research community.

I’m also curious about how these work.

In my previous experience, evals were either charged as a separate item (billed) or deducted from the already granted ‘free tokens on traffic shared with OpenAI’.

In the latter case, I don’t remember the exact details, but it messed with the total quota, and I ended up exceeding it and being billed. In my case, though, the amount was too small to be worth a complaint.