Paste any Benchlist run URL (or run.json URL). We spin up a clean container, re-execute the exact replay command, and return an independent, signed attestation of the score we observed. With two signed runs from different attestors, the claim either gains corroborating evidence or it doesn't.
Model launches often cite a new SOTA. Replay the cited run in a clean container and publish the delta between the claimed score and the replayed one. Signed. Dated. Linkable.
Enterprise buyers demand independent evidence before a 7-figure contract. A $0.50 replay is cheaper than a $50k audit.
Re-run with our canonical container. Confirm dataset hashes. Compare against published number. Cite both signatures.
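The checks above (confirm dataset hashes, compare against the published number) can be sketched roughly as follows. This is a minimal illustration, not Benchlist's implementation: the use of SHA-256, the function names, and the shape of the inputs are all assumptions made for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a dataset's raw bytes (hash algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def confirm_dataset_hashes(files: dict[str, bytes], expected: dict[str, str]) -> list[str]:
    """Return the names of datasets that are missing or whose hash does not match."""
    mismatched = []
    for name, want in expected.items():
        data = files.get(name)
        if data is None or sha256_hex(data) != want:
            mismatched.append(name)
    return mismatched


def score_delta(replayed: float, published: float) -> float:
    """Positive when the replay scored higher than the published number."""
    return replayed - published
```

An empty list from confirm_dataset_hashes means every expected dataset was present and matched; the delta is what you would report alongside both signatures.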
Each replay runs inside a container freshly pulled with docker pull benchlist/runner:<pinned> on a dedicated attestor node, using the inference API key you provide (or your credit balance, if you have pre-funded one). That attestor issues a new Ed25519 signature, which is then anchored. The resulting run.json links back to the original via replayOf.
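To make the replayOf link concrete, here is a small sketch of how a reader might compare an original run.json with its replay. Only the replayOf field name comes from the description above; the url, attestor, and score fields, the idea that replayOf holds the original run's URL, and the example.com addresses are hypothetical placeholders. Signature verification is deliberately left out.

```python
def compare_replay(original: dict, replay: dict) -> dict:
    """Summarise how a replay relates to the run it claims to reproduce.

    Assumed (hypothetical) fields: "url", "attestor", "score".
    "replayOf" is taken from the text and assumed to hold the original's URL.
    """
    return {
        # Does the replay point back at this original run?
        "linked": replay.get("replayOf") == original.get("url"),
        # Was the replay signed by a different attestor than the original?
        "independent": replay.get("attestor") != original.get("attestor"),
        # Replayed score minus originally reported score.
        "delta": replay["score"] - original["score"],
    }


original = {"url": "https://example.com/runs/abc", "attestor": "node-a", "score": 0.83}
replay = {
    "url": "https://example.com/runs/def",
    "replayOf": "https://example.com/runs/abc",
    "attestor": "node-b",
    "score": 0.81,
}
summary = compare_replay(original, replay)
```

A replay only counts as corroboration when it is both linked to the original and independent of it; the delta is the number worth publishing.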