Hello,
I’d like to share my project for evaluating LLM-based APIs. In a few words, you can think of MHBP as a UI for running prompts such as those found (for example) in OpenAI’s ‘evals’ repository.
Current capabilities:
- store definitions of API endpoints
- import and store prompts; supported sources are a git repository, a direct URL to a JSON file, or local upload (via the UI)
- define an evaluation with an API endpoint and a list of prompts
- run an evaluation, analyze answers (right now only full-equality matching is supported, i.e. the class evals.elsuite.basic.match:Match; see the sketch after this list), and collect errors for further analysis
- visual presentation of errors stored in the database
- support for OpenAI’s ‘evals’ git repository format
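To make the matching step concrete, here is a minimal Python sketch of what full-equality matching amounts to, assuming prompt samples in the JSONL style used by OpenAI’s ‘evals’ repository (an "input" message list plus an "ideal" answer). The function name and sample data are illustrative only, not MHBP’s actual code.

import json

def exact_match(model_answer: str, ideal) -> bool:
    # Full-equality check: the answer must be exactly equal to one of the ideal answers.
    ideal_answers = ideal if isinstance(ideal, list) else [ideal]
    return any(model_answer == expected for expected in ideal_answers)

# One sample line in the evals-style JSONL format (hypothetical example):
sample_line = '{"input": [{"role": "system", "content": "Answer with a single word."}, {"role": "user", "content": "What is the capital of France?"}], "ideal": "Paris"}'
sample = json.loads(sample_line)

print(exact_match("Paris", sample["ideal"]))   # True
print(exact_match("paris", sample["ideal"]))   # False: strict string equality in this sketch

Answers that fail the check would be recorded as errors in the database for later inspection in the UI.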
Roadmap
- add triggers that fire depending on the error rate
- bundle evaluations into collections
- support more formats for importing prompts: CoQA, MedMCQA, and others
I would appreciate any feedback about MHBP.
Apache 2.0 License
GitHub - https://github.com/sergmain/mhbp
Wiki - https://github.com/sergmain/mhbp/wiki
Sergio Lissner