Hello,
I’d like to share my project for evaluating LLM-based APIs. In a few words, you can think of MHBP as a UI for running prompts such as those found (for example) in OpenAI’s ‘evals’ repository.
Current capabilities:
- store definitions of API endpoints
- import and store prompts; supported sources are a git repository, a direct URL to a JSON file, or local upload (via the UI)
- define an evaluation with an API endpoint and a list of prompts
- run an evaluation, analyze answers (right now only full-equality matching is supported, i.e. the class evals.elsuite.basic.match:Match; see the sketch after this list), and collect errors for further analysis
- visual presentation of errors stored in the database
- support for OpenAI’s ‘evals’ git repository format
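To make the matching step concrete, here is a minimal Python sketch of what full-equality matching amounts to, assuming prompt samples in the JSONL style used by OpenAI’s ‘evals’ repository (an "input" message list plus an "ideal" answer). The function name and sample data are illustrative only, not MHBP’s actual code.

import json

def exact_match(model_answer: str, ideal) -> bool:
    # Full-equality check: the answer must be exactly equal to one of the ideal answers.
    ideal_answers = ideal if isinstance(ideal, list) else [ideal]
    return any(model_answer == expected for expected in ideal_answers)

# One sample line in the evals-style JSONL format (hypothetical example):
sample_line = '{"input": [{"role": "system", "content": "Answer with a single word."}, {"role": "user", "content": "What is the capital of France?"}], "ideal": "Paris"}'
sample = json.loads(sample_line)

print(exact_match("Paris", sample["ideal"]))   # True
print(exact_match("paris", sample["ideal"]))   # False: strict string equality in this sketch

Answers that fail the check would be recorded as errors in the database for later inspection in the UI.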
Roadmap
- add triggers that fire depending on the error rate
- bundle evaluations into collections
- support more formats for importing prompts: CoQA, MedMCQA, and others
I would appreciate any feedback about MHBP.
Apache 2.0 License
GitHub - https://github.com/sergmain/mhbp
Wiki - https://github.com/sergmain/mhbp/wiki
Sergio Lissner