Provider Verified · $499/mo

Prove your hosted Llama is actually Llama.

Quant variants, custom kernels, speculative decoding, and silent provider-side updates: anything between your customers and the canonical model can shift benchmark scores by 2 to 15 points without changing the model name. Benchlist attests every hosted model on every supported provider, then surfaces the drift against the canonical first-party API.

Start 30-day pilot → Browse providers →

Cancel anytime · Unlimited multi-model attestations · Drift alerts via Slack/webhook

What you get.

Unlimited attestations across your catalog.

Every model you serve, every supported benchmark, run on the canonical sample sets we already use. Ed25519 signed, browser-replayable, indexed at /providers/your-id.

Drift alerts straight to Slack.

We pin the canonical first-party API as your reference. When your hosted version drifts more than 2 percentage points (pp) on any benchmark, we fire an HMAC-signed webhook into your incident channel within 60 seconds of detection.
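Because the webhook is HMAC-signed, the receiver can check each alert before acting on it. The sketch below shows one common way to do that; it assumes HMAC-SHA256 over the raw request body with a hex-encoded signature, which this page does not specify. The function name, the secret, and the payload fields are all hypothetical.

```python
import hashlib
import hmac


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Return True if signature_hex is the HMAC-SHA256 of body under secret.

    Assumption: the sender signs the raw body with HMAC-SHA256 and sends
    the digest as lowercase hex. The real scheme may differ.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so it does not leak how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


# Hypothetical alert payload, for illustration only.
secret = b"example-shared-secret"
body = b'{"model": "example-model", "benchmark": "example-bench", "drift_pp": 2.4}'
signature = hmac.new(secret, body, hashlib.sha256).hexdigest()

accepted = verify_webhook(secret, body, signature)
rejected = verify_webhook(secret, body + b" ", signature)
```

Verify against the raw bytes of the request, not a re-serialised JSON object, since any change in whitespace or key order changes the digest.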

Customer-facing badge widget.

Drop <benchlist-provider id="your-id"> into your site or model card. Live attestation count, last-checked timestamp, drift indicator. Updates daily without a redeploy.

Quant-variant transparency.

Run the same model at FP16, FP8, AWQ, or Q4 and we'll publish them all under the same provider page with the precision tag. Buyers see exactly which variant matches their latency/cost requirement.
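To make the per-variant comparison concrete, here is a minimal sketch of how drift against a reference score could be tabulated by precision tag. It assumes drift is the signed score difference in percentage points and that the 2pp alert threshold applies to its absolute value. The scores are made-up illustrative numbers, not measurements, and `drift_report` is a hypothetical helper rather than part of any Benchlist API.

```python
DRIFT_THRESHOLD_PP = 2.0  # alert threshold in percentage points, per the drift-alert copy


def drift_report(reference: float, variants: dict[str, float]) -> dict[str, tuple[float, bool]]:
    """Map each precision tag to (drift in pp, whether it exceeds the threshold)."""
    report = {}
    for tag, score in variants.items():
        # Signed drift: negative means the variant scores below the reference.
        drift = round(score - reference, 2)
        report[tag] = (drift, abs(drift) > DRIFT_THRESHOLD_PP)
    return report


# Illustrative numbers only (assumed, not real benchmark results).
reference_score = 80.0
variant_scores = {"FP16": 79.8, "FP8": 78.9, "AWQ": 77.5, "Q4": 74.0}

report = drift_report(reference_score, variant_scores)
for tag, (drift, alert) in report.items():
    print(f"{tag}: {drift:+.1f}pp {'ALERT' if alert else 'ok'}")
```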

Continuous heartbeat.

Daily attestation runs catch silent provider-side updates. If your kernel team ships an optimisation that boosts speed but drops accuracy, you find out before your customer does.

CSV + API exports for your stack.

Pipe attestations into Snowflake, Datadog, or your own observability stack. You get a public API at /v1/best?provider=your-id, signed webhooks, and on-demand CSV at /providers/your-id.
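As a rough sketch of what consuming these exports might look like, the snippet below builds the query URL for the /v1/best endpoint and parses a CSV export into rows. The host name `benchlist.example` is a placeholder, and the CSV column names are assumptions, since the export schema is not documented here. No network call is made.

```python
import csv
import io
from urllib.parse import urlencode, urlunsplit


def best_url(host: str, provider_id: str) -> str:
    """Build the /v1/best query URL for a provider (host is a placeholder)."""
    query = urlencode({"provider": provider_id})
    return urlunsplit(("https", host, "/v1/best", query, ""))


def parse_export(csv_text: str) -> list[dict[str, str]]:
    """Parse a CSV export into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(csv_text)))


url = best_url("benchlist.example", "your-id")

# Hypothetical export contents; the real column names may differ.
sample_csv = (
    "model,benchmark,precision,score\n"
    "example-model,example-bench,FP8,78.9\n"
)
rows = parse_export(sample_csv)
```

From here, the rows can be loaded into a warehouse table or emitted as metrics with whatever client your stack already uses.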

Why this matters now.

FP8 isn't FP16.

Most production-grade providers run FP8 or AWQ-quantised variants by default for cost and speed. Vendors quote unquantised FP16 numbers in announcements. The customer assumes parity. Benchlist measures the gap between the two and publishes it.

Silent updates erode trust.

Frontier APIs swap underlying models behind the same name without a version bump. Customers who pinned the model name find their evals drifting and don't know why. A signed daily attestation from a third party is the only honest fix.

Procurement asks for receipts.

Enterprise buyers demand provenance. "Trust us" doesn't pass legal review at large institutions. A third-party signed attestation that proves your hosted variant matches the canonical reference closes deals.

Pricing.

Starter

$25

pay-as-you-go · per attestation

1 model, 1 benchmark per attestation
Ed25519 signed receipts
Hosted at /verify/<id>
Pay only when you run

View credit packs

Recommended

Provider Verified

$499

per month · cancel anytime

✓ Unlimited multi-model attestations
✓ Drift alerts to Slack/webhook
✓ Customer-facing badge widget
✓ /providers/your-id dedicated page
✓ Daily heartbeat runs
✓ Snowflake/Datadog CSV export
✓ HMAC-signed webhook callbacks
✓ Quant-variant tagging (FP16/FP8/AWQ/Q4)

Start 30-day pilot

Start a 30-day pilot.

Send us your provider name and a contact email. We'll spin up your /providers/your-id page within 24 hours, attest your top 10 hosted models against the canonical references for free, and send you a Slack/webhook stub. If the drift data is useful, you convert to Provider Verified at $499/mo. If it isn't, we delete the data and move on.