When to use OpenAI Evals vs HELM

I am starting to use OpenAI Evals and have a few questions:

  1. Are prior results from running OpenAI Evals on GPT-3.5 and GPT-4 published anywhere?
  2. When should I use OpenAI Evals vs HELM? The two frameworks are different, but I would like to know when one is preferred over the other.