Open-weight models, attested on your hardware.

Every score on this page came from a local Ollama daemon serving Q4_K_M quants on a consumer GPU, run against the same canonical sample sets as the cloud-API attestations. Each result is Ed25519-signed by the publisher's local attestor and replayable in your browser.

Local runs: 148 (Ed25519 signed)
Models: 6 (open-weight, quantized)
Benchmarks: 13 (deterministic graders)
Perfect runs: 24 (100% scores)

Local model leaderboard

Mean score across each model's attested benchmark runs, sorted by mean accuracy. Runs with n < 30 are excluded from the per-(model, benchmark) matrix below.

#  Model                    Runs  Avg score  Benches  Perfect
1  deepseek-coder-v2-15.7b  20    67.2±2.9   13       7
2  llama3-8b-q40            22    65.5±2.8   13       7
3  mistral-7b-q4km          20    56.5±3.1   13       4
4  qwen3.6-27b-dense-q5km   22    54.7±2.9   13       6
5  qwen3.6-35b-q4km         32    19.2±1.9   13
6  glm-4.7-flash-30b-q4km   32    16.4±1.8   13
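
As a rough illustration of how the leaderboard columns could be derived from raw run records, here is a minimal Python sketch. The record layout, the model and benchmark names, and the `leaderboard` function are all hypothetical and are not taken from the runner. The ± column is not reproduced, because this page does not say how it is computed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical run records: (model, benchmark, score in percent).
attested_runs = [
    ("model-a", "bench-1", 100.0),
    ("model-a", "bench-2", 50.0),
    ("model-b", "bench-1", 80.0),
    ("model-b", "bench-2", 100.0),
    ("model-b", "bench-3", 30.0),
]


def leaderboard(runs):
    """Aggregate runs per model and sort by mean score, highest first."""
    by_model = defaultdict(list)
    for model, bench, score in runs:
        by_model[model].append((bench, score))

    rows = []
    for model, entries in by_model.items():
        scores = [score for _, score in entries]
        rows.append({
            "model": model,
            "runs": len(scores),                             # Runs column
            "avg": round(mean(scores), 1),                   # Avg score column
            "benches": len({bench for bench, _ in entries}), # Benches column
            "perfect": sum(1 for s in scores if s == 100.0), # Perfect column
        })

    rows.sort(key=lambda row: row["avg"], reverse=True)
    return rows


for rank, row in enumerate(leaderboard(attested_runs), start=1):
    print(rank, row["model"], row["runs"], row["avg"], row["benches"], row["perfect"])
```

A run counts as perfect here only when its score is exactly 100, which matches the "100% scores" label above but is otherwise an assumption about how the page counts them.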

Top (model × benchmark) cells

Best attested run per (model, benchmark) pair. Click any cell to replay the signature in-browser.
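
One way to pick the best attested run for each (model, benchmark) pair is sketched below in Python. The field names (`run_id`, `model`, `benchmark`, `score`) and the tie-breaking rule are assumptions made for illustration, not the receipt format the runner actually emits.

```python
# Hypothetical run records; field names are illustrative only.
cell_runs = [
    {"run_id": "r1", "model": "model-a", "benchmark": "bench-1", "score": 60.0},
    {"run_id": "r2", "model": "model-a", "benchmark": "bench-1", "score": 90.0},
    {"run_id": "r3", "model": "model-b", "benchmark": "bench-1", "score": 40.0},
    {"run_id": "r4", "model": "model-a", "benchmark": "bench-1", "score": 90.0},
]


def best_per_cell(runs):
    """Return the highest-scoring run for each (model, benchmark) pair.

    On a tie, the run seen first is kept (an arbitrary choice for this sketch).
    """
    best = {}
    for run in runs:
        key = (run["model"], run["benchmark"])
        if key not in best or run["score"] > best[key]["score"]:
            best[key] = run
    return best


for (model, benchmark), run in best_per_cell(cell_runs).items():
    print(model, benchmark, run["run_id"], run["score"])
```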

Run your own local benchmark

Pull a model in Ollama, run our open runner, and get a signed receipt your community can re-verify. Local runner docs →