API reference

REST + webhooks. One call to post.

Read endpoints are CORS-open and unauthenticated. Writes (posting a test, webhook config, crypto redemption) require a Bearer key. All payloads are JSON; all timestamps are ISO-8601 UTC; all hashes are SHA-256 and carry a sha256: prefix.

Base URL + versioning

https://api.benchlist.ai/v1

Static JSON mirrors of registry data are served under https://benchlist.ai/api/*.json — no auth, edge-cached (s-maxage 300s). Use them for dashboards; use the versioned API under api.benchlist.ai/v1 for writes and anything real-time.
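
For dashboards that read the static mirrors, a small client-side cache keeps repeat requests off the network. The sketch below is one possible approach, not an official client: MirrorCache, mirror_url, and the choice to reuse the 300-second s-maxage as a local TTL are all assumptions made for illustration.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Tuple

MIRROR_BASE = "https://benchlist.ai/api"
TTL_SECONDS = 300  # assumption: reuse the edge s-maxage as the local TTL


def mirror_url(name: str) -> str:
    """Build the mirror URL for a registry file, e.g. 'runs' or 'services'."""
    return f"{MIRROR_BASE}/{name}.json"


def http_get_json(url: str) -> Any:
    """Plain unauthenticated GET; the mirrors need no key."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


class MirrorCache:
    """Hypothetical helper: returns cached data until the TTL expires."""

    def __init__(self,
                 fetch: Callable[[str], Any] = http_get_json,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, name: str) -> Any:
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None and now - hit[0] < TTL_SECONDS:
            return hit[1]  # still fresh, skip the network
        data = self._fetch(mirror_url(name))
        self._entries[name] = (now, data)
        return data
```

Calling MirrorCache().get("runs") would fetch /api/runs.json once and serve the cached copy for the next five minutes. Anything real-time should still go to the versioned API.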

All versioned endpoints return application/json with UTF-8. Breaking changes land behind /v2; /v1 is supported for ≥ 18 months after /v2 launches.

Authentication

Reads need nothing. Writes require a Bearer key, which is free to obtain — email verification only. Your first attested test is on us; after that, $5 per test or a credit pack.

Authorization: Bearer bl_live_...

Getting a key:

  • Free signup — drop your email at /submit. Verify, receive the key, post one free attested test.
  • Top up with card: /pricing → Stripe Checkout for a pack ($25 / $100 / $500 / $2,000) or a single test ($5).
  • Top up with crypto: /pricing → “Pay with ETH”. Send on Base, Ethereum, or Arbitrum; paste the tx hash; credits arrive after on-chain verification.

Rotate with POST /v1/keys and "action": "rotate" (see the POST /v1/keys section). Keys scope to a single publisher; you can issue sub-keys per environment.

Error codes

  • 400 invalid payload shape
  • 401 missing / bad Bearer key
  • 402 insufficient credits
  • 403 publisher suspended (dispute upheld)
  • 404 unknown service / benchmark / run id
  • 409 duplicate submission (by transcriptMerkleRoot)
  • 422 on-chain verification failed (bad tx, wrong amount, reverted)
  • 429 rate-limited; retry after Retry-After header
  • 5xx transient; safe to retry with the same idempotency_key

All errors return { "error": "...", "detail": "...", "request_id": "req_..." }. Include the request_id when contacting support.
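
The status codes above split naturally into "retry later" (429, 5xx) and "fix the request" (everything else). A minimal sketch of that decision follows. The function names, the backoff parameters, and the assumption that Retry-After holds an integer number of seconds are illustrative choices, not documented behavior.

```python
import random
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the error is final.

    `headers` is looked up case-sensitively here; pass a normalized mapping
    if your HTTP client does not already provide one.
    """
    if status == 429:
        retry_after = headers.get("Retry-After")
        # Assumption: Retry-After is delta-seconds, not an HTTP date.
        if retry_after is not None and retry_after.isdigit():
            return float(retry_after)
    if status == 429 or 500 <= status < 600:
        # Exponential backoff with full jitter, capped at `cap` seconds.
        return random.uniform(0.0, min(cap, base * 2 ** attempt))
    return None  # 400, 401, 402, 403, 404, 409, 422: retrying will not help


def describe_error(body: Mapping[str, str]) -> str:
    """Format the documented error envelope for logs or support tickets."""
    return (f"{body.get('error')}: {body.get('detail')} "
            f"(request_id={body.get('request_id')})")
```

When a POST /v1/run is retried, sending the same idempotency_key each time should keep a retried request from being treated as a new submission.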

Rate limits

  • Reads: unmetered (edge-cached).
  • POST /v1/run: 60/min per publisher · 500/day.
  • Webhook registrations: 10/publisher.
  • Crypto verify: 30/min per IP.

Enterprise tier lifts all limits; email dev@remlabs.ai.
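
To stay under the 60/min limit on POST /v1/run without waiting for a 429, a client can throttle itself. The sliding-window limiter below is a sketch under the assumption that the server counts requests over a rolling 60-second window; the actual windowing scheme is not documented here, and the class name is hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `window`-second span."""

    def __init__(self, max_calls: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_calls = max_calls
        self._window = window
        self._clock = clock
        self._stamps: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True, or return False if over the limit."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) < self._max_calls:
            self._stamps.append(now)
            return True
        return False
```

A caller would check try_acquire() before each submission and sleep briefly when it returns False. The 500/day cap would need a second limiter with a longer window.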

POST /v1/run

Submits a (service, model, benchmark) tuple for attestation. The runner queues the job on an available attestor, executes the benchmark, builds the Merkle tree, generates the proof, submits to Aligned Layer, and settles on Ethereum L1.

POST https://api.benchlist.ai/v1/run
Authorization: Bearer bl_live_...
Content-Type: application/json

{
  "service":   "anthropic-claude",   // service id (see /v1/services)
  "model":     "claude-opus-4-7",    // your model identifier
  "benchmark": "mbpp",               // benchmark id
  "runs":      3,                    // number of runs to average; 1-10
  "proof_system": "sp1",             // optional: sp1 | risc0 | halo2 | groth16 | plonk
  "attestor":  "auto",               // optional: attestor id or "auto"
  "webhook":   "https://me.com/hk",  // optional: override default webhook
  "idempotency_key": "2026-04-23-a"  // optional
}

Response — 202 Accepted

{
  "run_id":     "run-8f3a...",
  "status":     "queued",
  "est_seconds": 180,
  "charge":     { "credits": 1, "usd": 5.00 },
  "verify_url": "https://benchlist.ai/verify/run-8f3a..."
}

Poll GET /v1/run/:id or subscribe to the run.verified webhook event.

curl

curl -X POST https://api.benchlist.ai/v1/run \
  -H "Authorization: Bearer $BENCHLIST_KEY" \
  -H "Content-Type: application/json" \
  -d '{"service":"anthropic-claude","model":"claude-opus-4-7","benchmark":"mbpp","runs":3}'

Python

import os

import httpx

r = httpx.post(
    "https://api.benchlist.ai/v1/run",
    headers={"Authorization": f"Bearer {os.environ['BENCHLIST_KEY']}"},
    json={"service": "anthropic-claude", "model": "claude-opus-4-7",
          "benchmark": "mbpp", "runs": 3},
    timeout=30.0,
)
r.raise_for_status()  # surface 4xx/5xx before reading the body
print(r.json()["verify_url"])

JavaScript

const r = await fetch("https://api.benchlist.ai/v1/run", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.BENCHLIST_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    service: "anthropic-claude",
    model: "claude-opus-4-7",
    benchmark: "mbpp",
    runs: 3
  })
});
const { verify_url } = await r.json();
console.log(verify_url);

Go

import (
	"bytes"
	"net/http"
	"os"
)
body := bytes.NewBufferString(`{"service":"anthropic-claude","model":"claude-opus-4-7","benchmark":"mbpp","runs":3}`)
req, _ := http.NewRequest(http.MethodPost, "https://api.benchlist.ai/v1/run", body)
req.Header.Set("Authorization", "Bearer "+os.Getenv("BENCHLIST_KEY"))
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
	panic(err)
}
defer resp.Body.Close()
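
Since a malformed body costs a round trip and a 400, it can help to check the documented constraints locally before sending. The Python helper below is a sketch: build_run_payload is a hypothetical name, and the checks cover only what the request schema above states (runs between 1 and 10, the listed proof systems).

```python
from typing import Any, Dict, Optional

# Values listed in the request schema above.
PROOF_SYSTEMS = {"sp1", "risc0", "halo2", "groth16", "plonk"}


def build_run_payload(service: str, model: str, benchmark: str, runs: int = 1,
                      proof_system: Optional[str] = None,
                      attestor: Optional[str] = None,
                      webhook: Optional[str] = None,
                      idempotency_key: Optional[str] = None) -> Dict[str, Any]:
    """Assemble a POST /v1/run body, omitting optional fields left as None."""
    if not 1 <= runs <= 10:
        raise ValueError("runs must be between 1 and 10")
    if proof_system is not None and proof_system not in PROOF_SYSTEMS:
        raise ValueError(f"unknown proof_system: {proof_system!r}")
    payload: Dict[str, Any] = {
        "service": service,
        "model": model,
        "benchmark": benchmark,
        "runs": runs,
    }
    optional = {
        "proof_system": proof_system,
        "attestor": attestor,
        "webhook": webhook,
        "idempotency_key": idempotency_key,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The returned dict can be passed directly as the json= argument of an HTTP client call.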

GET /v1/run/:id

Polls the current state of a submitted run. Status transitions: queued → running → committed → proving → submitted → verified (or failed).

GET https://api.benchlist.ai/v1/run/run-8f3a...

{
  "run_id":    "run-8f3a...",
  "status":    "verified",
  "score":     87.3,
  "verification": {
    "mode": "aligned",
    "alignedBatchId": "0x7b3c...",
    "alignedVerifierContract": "0xeF2A435e5EE44B2041100EF8cbC8ae035166606c",
    "onchainBlock": 22184921,
    "onchainTx": "0x1a2b...",
    "verifiedAt": "2026-04-23T18:04:12Z"
  },
  "transcriptMerkleRoot": "sha256:e7f9...",
  "datasetHash": "sha256:a1b3...",
  "methodologyHash": "sha256:c3d5..."
}
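
A polling loop only needs to know which statuses are terminal. The sketch below waits until a run reaches verified or failed. The get_run callable is a stand-in for whatever performs GET /v1/run/:id in your client, and the interval and timeout defaults are arbitrary assumptions.

```python
import time
from typing import Any, Callable, Dict

# Terminal states from the transition list above.
TERMINAL_STATUSES = {"verified", "failed"}


def wait_for_run(get_run: Callable[[str], Dict[str, Any]], run_id: str,
                 interval: float = 5.0, timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll until the run is verified or failed; raise TimeoutError otherwise."""
    deadline = clock() + timeout
    while True:
        run = get_run(run_id)
        if run["status"] in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"{run_id} still {run['status']} after {timeout}s")
        sleep(interval)
```

For long proofs, a webhook subscription avoids holding a polling loop open.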

GET /v1/runs

Lists recent verified runs across the registry. Supports query params ?service=, ?benchmark=, ?publisher=, ?status=, ?since=ISO8601, ?limit=1..500.

curl "https://api.benchlist.ai/v1/runs?benchmark=longmemeval&status=verified&limit=10"
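
The same query can be assembled programmatically. This sketch builds a /v1/runs URL from the documented filters and rejects a limit outside 1..500; runs_url is a hypothetical helper name.

```python
from typing import Any
from urllib.parse import urlencode

BASE_URL = "https://api.benchlist.ai/v1"
# Query parameters listed for GET /v1/runs.
ALLOWED_FILTERS = ("service", "benchmark", "publisher", "status", "since", "limit")


def runs_url(**filters: Any) -> str:
    """Return a GET /v1/runs URL with the given filters percent-encoded."""
    unknown = sorted(set(filters) - set(ALLOWED_FILTERS))
    if unknown:
        raise ValueError(f"unsupported filter(s): {unknown}")
    limit = filters.get("limit")
    if limit is not None and not 1 <= int(limit) <= 500:
        raise ValueError("limit must be between 1 and 500")
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}/runs?{query}" if query else f"{BASE_URL}/runs"
```

Note that urlencode escapes the colons in an ISO-8601 since value, which is the form a query string expects.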

For a permanent, cache-friendly mirror of the whole corpus, use /api/runs.json.

GET /v1/services

GET /v1/services?category=memory&q=mem
GET /v1/services/:slug

Returns {services: [...]}. Each service has id, name, vendor, category, oneliner, homepage, publisher, openSource, license, tags. Static mirror: /api/services.json.

GET /v1/benchmarks

GET /v1/benchmarks
GET /v1/benchmarks/:slug

Each benchmark has id, name, category, metric (accuracy | score | latency-ms | tokens-per-second | wer), maxScore, datasetHash, methodologyHash, runnerRepo. Mirror: /api/benchmarks.json.

GET /v1/publishers

Static mirror: /api/publishers.json.

GET /v1/attestors

The registered attestor set with public keys, staked ETH, and reputation scores. Mirror: /api/attestors.json.

GET /v1/categories

The 16 service categories. Mirror: /api/categories.json.

POST /v1/submit (signup)

Free signup: submit an email address, and we mint an HMAC-signed API key, mail it to you, and mirror a notification to ops. No card required. Your first attested test is on us.

POST https://benchlist.ai/api/v1/submit
Content-Type: application/json

{
  "kind":    "signup",
  "contact": "you@company.com"
}

// → 200 OK
// {
//   "ok": true,
//   "id": "sub_a1b2c3",
//   "issued": true,
//   "key_delivered_to": "you@company.com",
//   "forwarded": [{ "channel": "resend:user", "ok": true, "status": 200 }, ...],
//   "request_id": "req_..."
// }

The same endpoint handles list, run, waitlist, dispute, and contact kinds. For signup specifically, when RESEND_API_KEY is configured, the key is delivered by email and not included in the response body; when Resend is not configured, the response body includes key so dev environments still work.

POST /v1/keys

Two actions for lifecycle: verify (check a key is real + unexpired) and rotate (mint a new key for the same publisher; old key keeps working until a revocation list ships).

Verify

POST https://benchlist.ai/api/v1/keys
Content-Type: application/json

{ "action": "verify", "key": "bl_live_…" }

// → { "valid": true, "payload": { "email": "you@company.com", "iat": 1713000000000, "plan": "free_first_test", "nonce": "…" } }

Rotate

POST https://benchlist.ai/api/v1/keys
Authorization: Bearer bl_live_<current>
Content-Type: application/json

{ "action": "rotate" }

// → { "rotated": true, "key": "bl_live_…", "payload": { ... } }

POST /v1/checkout

Creates a Stripe Checkout session. Use this for the card flow; the response includes the url of Stripe’s hosted page, and your client redirects the user there.

POST https://benchlist.ai/api/v1/checkout
Content-Type: application/json

{ "plan": "developer" }   // or credits_25 | credits_100 | credits_500 | credits_2000

// → { "id": "cs_...", "url": "https://checkout.stripe.com/c/pay/...", "plan": "..." }

POST /v1/crypto

Two-mode endpoint for the native-ETH crypto flow. Omit txHash to get the receiving address + the live ETH amount (rate pulled at request time); include it to verify a sent transaction on-chain. ERC-20 tokens are not accepted — native ETH only.

Init

POST https://benchlist.ai/api/v1/crypto
{ "plan": "test_1" }

// → {
//   "mode": "init",
//   "amountUsd": 5,
//   "amountEth": 0.001389,
//   "ethPriceUsd": 3599.87,
//   "credits": 1,
//   "asset": "ETH",
//   "receiver": "0xb7d4d49da62bc3af186de2ee78a59fd3002fdaad",
//   "chains": [
//     { "chain":"base",     "chainId":8453,  "address":"0xb7d4…",  "explorer":"https://basescan.org/..." },
//     { "chain":"ethereum", "chainId":1,     "address":"0xb7d4…",  "explorer":"https://etherscan.io/..." },
//     { "chain":"arbitrum", "chainId":42161, "address":"0xb7d4…",  "explorer":"https://arbiscan.io/..." }
//   ]
// }

Verify

POST https://benchlist.ai/api/v1/crypto
{
  "plan":    "test_1",
  "chain":   "base",
  "txHash":  "0x123abc..."
}

// → {
//   "mode": "verify",
//   "verified": true,
//   "chain": "base",
//   "txHash": "0x...",
//   "sentEth": 0.001402,
//   "ethPriceUsd": 3599.87,
//   "blockNumber": 18234567,
//   "confirmations": 12,
//   "from": "0xsender...",
//   "to": "0xb7d4d49da62bc3af186de2ee78a59fd3002fdaad",
//   "plan": "test_1",
//   "credits": 1
// }

Verification logic (server-side): fetch eth_getTransactionByHash + eth_getTransactionReceipt → confirm tx.to == receiver and tx.value ≥ expectedEth × 0.98 (2% slippage tolerance for mempool drift) and receipt.status == 0x1. Returns 422 with a specific error string on mismatch. No contract-call parsing — native ETH only.
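
The three checks described above can be expressed compactly. This is an illustrative sketch, not the server's code: check_payment and its error strings are hypothetical, and it assumes tx and receipt are the JSON-RPC result objects, where value and status are hex-encoded quantities.

```python
from decimal import Decimal
from typing import Any, Mapping, Optional

WEI_PER_ETH = Decimal(10) ** 18


def check_payment(tx: Mapping[str, Any], receipt: Mapping[str, Any],
                  receiver: str, expected_eth: str,
                  tolerance: str = "0.02") -> Optional[str]:
    """Return None if the payment passes, else a short reason string.

    Amounts are passed as decimal strings to avoid float rounding.
    """
    # 1. Recipient must match (addresses compare case-insensitively).
    if (tx.get("to") or "").lower() != receiver.lower():
        return "recipient mismatch"
    # 2. Value must be at least expected * (1 - tolerance), compared in wei.
    min_wei = int(Decimal(expected_eth) * (1 - Decimal(tolerance)) * WEI_PER_ETH)
    if int(tx["value"], 16) < min_wei:
        return "amount below tolerance"
    # 3. Receipt status 0x1 means the transaction did not revert.
    if int(receipt["status"], 16) != 1:
        return "transaction reverted"
    return None
```

With the sample figures from the responses above, an expected amount of 0.001389 ETH gives a floor of 0.00136122 ETH, so the 0.001402 ETH transfer passes.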

POST /v1/webhooks

POST https://api.benchlist.ai/v1/webhooks
Authorization: Bearer bl_live_...

{
  "url":    "https://your-app.com/hooks/benchlist",
  "secret": "whsec_...",              // 32+ bytes; used for HMAC
  "events": ["run.verified", "run.failed", "run.disputed"]
}

Event types

  • run.submitted — accepted into the queue
  • run.running — attestor picked it up
  • run.committed — Merkle root built
  • run.proving — ZK proof generation started
  • run.verified — Aligned batch confirmed on Ethereum L1
  • run.failed — runner / proof failed (refunded)
  • run.disputed — dispute filed by another party
  • run.dispute_upheld — dispute upheld; score annulled
  • credits.low — < 5 credits remaining
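
When preparing a registration, the secret and the event list are the two places a typo is easy to miss. The sketch below generates a secret and validates event names against the list above. Both helper names are hypothetical, and whether the whsec_ prefix is required (rather than just a convention in the example) is an assumption.

```python
import secrets
from typing import Any, Dict, Iterable

# Event names from the list above.
KNOWN_EVENTS = {
    "run.submitted", "run.running", "run.committed", "run.proving",
    "run.verified", "run.failed", "run.disputed", "run.dispute_upheld",
    "credits.low",
}


def new_webhook_secret() -> str:
    """32 random bytes, hex-encoded, with the prefix used in the example."""
    return "whsec_" + secrets.token_hex(32)


def registration_body(url: str, events: Iterable[str], secret: str) -> Dict[str, Any]:
    """Build a POST /v1/webhooks body, rejecting unknown event names."""
    events = list(events)
    unknown = sorted(set(events) - KNOWN_EVENTS)
    if unknown:
        raise ValueError(f"unknown event type(s): {unknown}")
    return {"url": url, "secret": secret, "events": events}
```

Store the generated secret where your webhook handler can read it, since the same value is needed later to check signatures.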

Signature verification

Every webhook POST carries Benchlist-Signature: t=<ts>,v1=<hmac>. Compute HMAC-SHA256 over ts + "." + raw_body using your secret; reject if mismatch or if ts is older than 5 minutes (replay protection).

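import hashlib
import hmac
import time

def verify(body: bytes, header: str, secret: str, max_age: int = 300) -> bool:
    # Header format: t=<ts>,v1=<hmac>
    parts = dict(p.strip().split("=", 1) for p in header.split(","))
    ts, sig = parts["t"], parts["v1"]
    # Replay protection; assumes ts is a Unix timestamp in seconds.
    if time.time() - int(ts) > max_age:
        return False
    expected = hmac.new(secret.encode(), f"{ts}.".encode() + body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)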
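
To exercise a handler locally without waiting for a real delivery, you can produce a header in the same shape yourself. This sketch mirrors the scheme described above (HMAC-SHA256 over ts + "." + raw_body, hex-encoded). The sign function is a hypothetical test helper, and treating ts as Unix seconds is an assumption.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(body: bytes, secret: str, ts: Optional[int] = None) -> str:
    """Return a Benchlist-Signature style header value for `body`."""
    ts = int(time.time()) if ts is None else ts
    mac = hmac.new(secret.encode(), f"{ts}.".encode() + body,
                   hashlib.sha256).hexdigest()
    return f"t={ts},v1={mac}"


def signature_matches(body: bytes, header: str, secret: str) -> bool:
    """Recompute the MAC from the header's timestamp and compare safely."""
    parts = dict(p.split("=", 1) for p in header.split(","))
    return hmac.compare_digest(sign(body, secret, int(parts["t"])), header)
```

Signing a sample payload and feeding it to your handler covers the happy path. Altering one byte of the body, or passing a stale ts, covers the rejection paths.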