Prove your hosted Llama is actually Llama.
Quant variants, custom kernels, speculative decoding, silent provider-side updates. Anything between your customers and the canonical model can shift benchmark scores by 2 to 15 points without changing the model name. Benchlist attests every hosted model on every supported provider, then surfaces the drift against the canonical first-party API.
What you get.
Every model you serve, every supported benchmark, run on the canonical sample sets we already use. Ed25519 signed, browser-replayable, indexed at /providers/your-id.
We pin the canonical first-party API as your reference. When your hosted version drifts more than 2 percentage points (pp) on any benchmark, we fire an HMAC-signed webhook into your incident channel within 60 seconds of detection.
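On the receiving end, a drift webhook is only useful if you check its signature before acting on it. The sketch below shows one way that check might look. It assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body, and the payload field names (`model`, `benchmark`, `hosted_score`, `reference_score`) are hypothetical; the actual Benchlist webhook format may differ.

```python
import hashlib
import hmac
import json

# From the text above: alerts fire when drift exceeds 2 percentage points.
DRIFT_THRESHOLD_PP = 2.0


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed signing scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def parse_drift_alert(secret: bytes, body: bytes, signature_hex: str) -> dict:
    """Reject unsigned payloads, then compute drift from hypothetical fields."""
    if not verify_webhook(secret, body, signature_hex):
        raise ValueError("webhook signature mismatch")
    event = json.loads(body)
    drift = abs(event["hosted_score"] - event["reference_score"])
    return {
        "model": event["model"],
        "benchmark": event["benchmark"],
        "drift_pp": drift,
        "exceeds_threshold": drift > DRIFT_THRESHOLD_PP,
    }
```

Verify against the raw bytes of the request body rather than a re-serialised JSON object, since re-serialising can change whitespace or key order and break the signature.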
Drop <benchlist-provider id="your-id"> into your site or model card. Live attestation count, last-checked timestamp, drift indicator. Updates daily without a redeploy.
Run the same model at FP16, FP8, AWQ, or Q4 and we'll publish them all under the same provider page with the precision tag. Buyers see exactly which variant matches their latency/cost requirement.
Daily attestation runs catch silent provider-side updates. If your kernel team ships an optimisation that boosts speed but drops accuracy, you find out before your customer does.
Pipe attestations into Snowflake, Datadog, or your own observability stack. Public API at /v1/best?provider=your-id, signed webhooks, and on-demand CSV at /providers/your-id.
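As a rough illustration of consuming the export, the sketch below builds the API URL from the path given above and filters a CSV export for rows that drift past the 2pp threshold. The base URL is left as a parameter, and the CSV column names (`model`, `precision`, `benchmark`, `hosted_score`, `reference_score`) are assumptions made for illustration, not a documented schema.

```python
import csv
import io
from urllib.parse import urlencode


def best_url(base_url: str, provider_id: str) -> str:
    """Build the /v1/best query URL; base_url is supplied by the caller."""
    query = urlencode({"provider": provider_id})
    return f"{base_url.rstrip('/')}/v1/best?{query}"


def drifting_rows(csv_text: str, threshold_pp: float = 2.0) -> list[dict]:
    """Return rows whose hosted score differs from the reference by more
    than threshold_pp. Column names are hypothetical."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        drift = abs(float(row["hosted_score"]) - float(row["reference_score"]))
        if drift > threshold_pp:
            flagged.append({
                "model": row["model"],
                "precision": row["precision"],
                "benchmark": row["benchmark"],
                "drift_pp": round(drift, 2),
            })
    return flagged
```

The flagged rows can then be forwarded to whichever warehouse or alerting tool you already run.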
Why this matters now.
Most production-grade providers run FP8 or AWQ-quantised variants by default for cost and speed. Vendors quote unquantised FP16 numbers in announcements. The customer assumes parity. Benchlist closes that gap by publishing the measured numbers for the variant you actually serve.
Frontier APIs swap underlying models behind the same name without a version bump. Customers who pinned the model name find their evals drifting and don't know why. A signed daily attestation from a third party is the only honest fix.
Enterprise buyers demand provenance. "Trust us" doesn't pass legal review at large institutions. A third-party signed attestation that proves your hosted variant matches the canonical reference closes deals.
Pricing.
- 1 model, 1 benchmark per attestation
- Ed25519 signed receipts
- Hosted at /verify/<id>
- Pay only when you run
- ✓ Unlimited multi-model attestations
- ✓ Drift alerts to Slack/webhook
- ✓ Customer-facing badge widget
- ✓ /providers/your-id dedicated page
- ✓ Daily heartbeat runs
- ✓ Snowflake/Datadog CSV export
- ✓ HMAC-signed webhook callbacks
- ✓ Quant-variant tagging (FP16/FP8/AWQ/Q4)
Start a 30-day pilot.
Drop your provider's name and a contact email. We'll spin up your /providers/your-id page within 24h, attest your top 10 hosted models against the canonical references for free, and send you a Slack/webhook stub. If the drift data is useful, you convert to Provider Verified at $499/mo. If it isn't, we delete the data and move on.