I am starting to use OpenAI Evals and have a few questions:
- Are there prior results of running OpenAI Evals on GPT-3.5 and GPT-4 somewhere?
- When should I use OpenAI Evals vs. HELM? The two frameworks differ, but I'd like to know when one is preferred over the other.