Metaheuristic Behavior Platform (MHBP)

Hello,

I’d like to share my project for evaluating LLM-based APIs. In a few words, you can think of MHBP as a UI for running prompts such as those found (for example) in OpenAI’s ‘evals’ repository.

Current capabilities:

  • store definitions of API endpoints
  • import and store prompts - supported sources are a git repository, a direct URL to a JSON file, and local upload (via the UI)
  • define an evaluation as an API endpoint plus a list of prompts to evaluate
  • run an evaluation, analyze the answers (right now only full equality is supported, i.e. the class evals.elsuite.basic.match:Match - see the sketch after this list), and collect errors for further analysis
  • visually present the errors stored in the DB
  • support OpenAI’s evals git repository format
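For readers who haven’t used the evals repository: each prompt file there is JSONL, one JSON object per line with an "input" chat and an "ideal" reference answer, so a full-equality check boils down to comparing the endpoint’s answer against "ideal". Here is a minimal Python sketch of that flow; load_samples, exact_match, run_evaluation, and call_endpoint are illustrative names only, not MHBP’s actual API.

```python
import json

# One sample line in OpenAI's evals JSONL format: an "input" chat
# plus an "ideal" reference answer.
SAMPLE_JSONL = '''\
{"input": [{"role": "system", "content": "Answer with one word."}, {"role": "user", "content": "2+2=?"}], "ideal": "4"}
'''

def load_samples(text):
    """Parse JSONL: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def exact_match(sample, answer):
    """Full-equality check, as the post describes for
    evals.elsuite.basic.match:Match; "ideal" may be a single
    string or a list of accepted strings."""
    ideal = sample["ideal"]
    expected = ideal if isinstance(ideal, list) else [ideal]
    return answer.strip() in (e.strip() for e in expected)

def run_evaluation(samples, call_endpoint):
    """call_endpoint stands in for whatever client sends
    sample["input"] to the stored API endpoint and returns a string."""
    errors = []
    for sample in samples:
        answer = call_endpoint(sample["input"])
        if not exact_match(sample, answer):
            errors.append({"ideal": sample["ideal"], "answer": answer})
    return errors  # mismatches collected for further analysis

# Usage with a stub endpoint that always answers "4":
print(run_evaluation(load_samples(SAMPLE_JSONL), lambda chat: "4"))  # -> []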

Roadmap:

  • add triggers that fire depending on the error level (see the sketch after this list)
  • bundle evaluations into collections
  • support more formats for importing prompts - CoQA, MedMCQA, and others
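On the triggers item: a minimal sketch of how such a trigger could work, assuming it fires once a run’s error rate crosses a configured threshold. ErrorRateTrigger and everything else here is a hypothetical illustration, not MHBP’s planned design.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ErrorRateTrigger:
    """Hypothetical trigger: fires once a run's error rate crosses a threshold."""
    threshold: float                 # e.g. 0.10 == 10% wrong answers
    action: Callable[[float], None]  # notification hook, webhook, etc.

    def check(self, total: int, errors: int) -> None:
        rate = errors / total if total else 0.0
        if rate >= self.threshold:
            self.action(rate)

# Example: report when more than 10% of answers fail the exact-match check.
trigger = ErrorRateTrigger(0.10, lambda r: print(f"error rate {r:.0%} exceeded"))
trigger.check(total=200, errors=25)  # fires: 12.5% >= 10%
```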

I would appreciate any feedback about MHBP.

Apache 2.0 License

GitHub - https://github.com/sergmain/mhbp
Wiki - https://github.com/sergmain/mhbp/wiki

Sergio Lissner

Very cool! Can you merge in some of the PRs from the openai/evals project?

It would be interesting if you provided a graph of how the APIs are performing over time. I see quite a lot of posts here and there asking “did GPT-3/4/etc. get dumber?” I think you’d attract a lot of eyeballs with an open-source graph about that.

Kind of an intelligence dashboard.

Or at the very least I know I’d retweet it quite a lot 🙂