Ship your release with a receipt. $99.
Mistral, DeepSeek, Qwen, Liquid, Hermes, AI21, Cohere, Snowflake, Databricks. If you publish open-weight models, your launch numbers compete with frontier vendors who own their narrative. Benchlist gives you a third-party signed leaderboard receipt within 4 hours of payment: an embeddable badge for your Hugging Face model card and an OG-tagged proof page that previews in every Slack thread your model lands in.
Get receipts for one model release. Includes n=50 runs on 8 canonical benchmarks (GSM8K, MMLU-Pro, GPQA, ARC-Challenge, HellaSwag, Winogrande, OpenBookQA, MATH-500), one Ed25519-signed proof page per benchmark, and an embeddable SVG badge for your model card. Delivered within 4 hours of payment or your money back.
Stripe Checkout · USD · receipt emailed
How it works.
Hosted at /verify/<id>. Browser-replayable. Reproducible offline.
Paste <img src="https://benchlist.ai/badge/<model>.svg"> into your model card. Live attestation count, last-checked, links to all 8 receipts.
Why it matters.
"Self-reported" is the stigma.
Benchmarks in vendor announcements are taken with a grain of salt. Buyers wait for third-party validation. A Benchlist receipt is the third-party validation, signed and replayable.
Mistral, DeepSeek, Qwen all face this.
Open-weight labs without a frontier-lab brand have to overcome scepticism on every release. A receipt levels the playing field. The score speaks for itself, signed.
Hugging Face downloads ≠ trust.
Download counts measure marketing, not capability. A signed attestation measures capability. Side by side on your model card, the receipt is the convincing piece.
See a real receipt.
This is what every benchmark in your launch certificate looks like once issued. Score, sample count, Wilson 95% CI, Ed25519 signature, attestor pubkey, and a 'Re-run for $0.50' button anyone can click. Hosted at /verify/<id>, OG-tagged so it previews cleanly in Slack and X.
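For readers who want to sanity-check the interval printed on a receipt, here is a minimal sketch of a Wilson 95% score interval. It assumes the benchmark score is a plain fraction of correct items out of n; the function name and the 40-of-50 figures are illustrative only and are not taken from Benchlist's implementation.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Return the Wilson score interval for a binomial proportion.

    successes: number of correct items (hypothetical input)
    n:         number of items scored
    z:         normal quantile; 1.959964 corresponds to a 95% interval
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")

    p = successes / n                      # observed accuracy
    denom = 1 + z * z / n                  # shared denominator
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom

    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Illustrative numbers only: 40 correct out of 50.
    low, high = wilson_interval(40, 50)
    print(f"accuracy 0.80, 95% CI [{low:.2f}, {high:.2f}]")  # roughly [0.67, 0.89]
```

With only 50 items the interval is wide, about plus or minus 11 points around 80% accuracy, which is why the receipt reports the sample count next to the score.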
Common questions.
What if my model has a custom chat template / system prompt?
What if benchmarks are contaminated in my training set?
Can I dispute a number that looks wrong?
What does $99 actually cover?
Need extra benchmarks (BigCodeBench, SWE-bench, MTEB)?
Submit your release.
Drop in your model URL and a contact email. We'll respond within an hour with a Stripe link for the $99 launch certificate. Once you have paid, we run the 8-benchmark package within 4 hours and email you the receipt URLs.