Fail-closed certification for LLM outputs (TruthCert/Burhan) + benchmark focus on “false-ship rate”

Hi all — I’m sharing Burhan/TruthCert, an open protocol + repo scaffold for “fail-closed” use of LLM outputs in high-stakes workflows (coding + evidence extraction/HTA).
Core idea: an output only SHIPS if it passes a published policy (multi-witness attempts + arbitration, scope lock, provenance, versioned validators); otherwise it REJECTS. A rough sketch of the gate is below.
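
To make the gate concrete, here is a minimal Python sketch of a fail-closed decision, not the repo's actual API: `arbitrate`, `in_scope`, `validators_pass`, the quorum of 2, and the prefix-based scope check are all illustrative assumptions.

```python
from dataclasses import dataclass
from collections import Counter

@dataclass
class Attempt:
    output: str
    provenance: dict  # e.g. model id, prompt hash, validator version (illustrative)

def arbitrate(attempts, quorum=2):
    """Multi-witness arbitration (toy): a candidate wins only if >= quorum attempts agree."""
    counts = Counter(a.output for a in attempts)
    output, votes = counts.most_common(1)[0]
    return output if votes >= quorum else None

def in_scope(output, allowed_prefix):
    """Toy scope lock: the output must stay inside the declared task scope."""
    return output.startswith(allowed_prefix)

def validators_pass(attempt, validator_version="v1"):
    """Toy versioned validator check: provenance must record the expected validator."""
    return attempt.provenance.get("validator") == validator_version

def ship_or_reject(attempts, allowed_prefix):
    """Fail-closed gate: SHIP only if every policy check passes, otherwise REJECT."""
    candidate = arbitrate(attempts)
    if candidate is None:
        return ("REJECT", "witnesses disagree")
    winner = next(a for a in attempts if a.output == candidate)
    if not in_scope(winner.output, allowed_prefix):
        return ("REJECT", "scope lock violated")
    if not validators_pass(winner):
        return ("REJECT", "validator failure")
    return ("SHIP", winner.output)

# Usage: two of three witnesses agree, scope and validator checks pass -> SHIP
attempts = [
    Attempt("def add(a, b): return a + b", {"validator": "v1"}),
    Attempt("def add(a, b): return a + b", {"validator": "v1"}),
    Attempt("def add(a, b): return a - b", {"validator": "v1"}),
]
print(ship_or_reject(attempts, allowed_prefix="def add"))
```

The point of the fail-closed default is that any check failure yields REJECT rather than a silent ship.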
Primary metric: false-ship rate, i.e. the rate of shipped-but-wrong artifacts (the harm metric), reported alongside reject rate and cost per correctly shipped artifact.
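
For concreteness, a toy calculation of the three numbers, assuming false-ship rate is normalized by shipped artifacts and reject rate by total attempts; the benchmark may define the denominators differently.

```python
def benchmark_metrics(shipped_wrong, shipped_right, rejected, total_cost):
    """Toy metric definitions; denominators are assumptions, not the repo's spec."""
    shipped = shipped_wrong + shipped_right
    attempts = shipped + rejected
    false_ship_rate = shipped_wrong / shipped if shipped else 0.0  # harm metric
    reject_rate = rejected / attempts if attempts else 0.0
    cost_per_correct_ship = total_cost / shipped_right if shipped_right else float("inf")
    return false_ship_rate, reject_rate, cost_per_correct_ship

# e.g. 2 wrong ships, 38 correct ships, 60 rejects, $120 total cost
print(benchmark_metrics(2, 38, 60, 120.0))  # (0.05, 0.6, ~3.16)
```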
Repo (spec + templates + benchmark scaffolding + Zenodo DOI):

DOI: Truthcert