Open-weight models, attested on your hardware.
Every score on this page came from a local Ollama daemon running quantized weights on a consumer GPU. Weights are Q4_K_M unless the model tag indicates otherwise, as with q40 and q5km. The runs used the same canonical sample sets as the cloud-API attestations. Each result is Ed25519-signed by the publisher's local attestor and can be replayed in your browser.
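Replaying a result means re-checking an Ed25519 signature over exactly the bytes that were signed. The sketch below makes three assumptions: the attested record is a small JSON object, its field names (`model`, `bench`, `score`) are hypothetical, and it is serialized with sorted keys before signing. This page does not describe the publisher's actual receipt format or attestor. The Ed25519 arithmetic follows the RFC 8032 reference code in pure Python so the example needs no dependencies. It is slow and not constant-time, so real keys belong in a vetted library.

```python
import hashlib
import json

# Minimal pure-Python Ed25519 modeled on the RFC 8032 reference code.
# Illustration only: slow and not constant-time.
P = 2**255 - 19                                      # field prime
L = 2**252 + 27742317777372353535851937790883648493  # order of the base point
D = -121665 * pow(121666, P - 2, P) % P              # curve constant d
SQRT_M1 = pow(2, (P - 1) // 4, P)                    # a square root of -1 mod P

def _hash_mod_l(data: bytes) -> int:
    """SHA-512 digest read as a little-endian integer, reduced mod L."""
    return int.from_bytes(hashlib.sha512(data).digest(), "little") % L

def _add(p1, p2):
    """Point addition in extended coordinates (X, Y, Z, T); also used for doubling."""
    a = (p1[1] - p1[0]) * (p2[1] - p2[0]) % P
    b = (p1[1] + p1[0]) * (p2[1] + p2[0]) % P
    c = 2 * p1[3] * p2[3] * D % P
    d = 2 * p1[2] * p2[2] % P
    e, f, g, h = b - a, d - c, d + c, b + a
    return (e * f % P, g * h % P, f * g % P, e * h % P)

def _mul(s, pt):
    """Scalar multiplication by double-and-add."""
    acc = (0, 1, 1, 0)  # neutral element
    while s:
        if s & 1:
            acc = _add(acc, pt)
        pt = _add(pt, pt)
        s >>= 1
    return acc

def _recover_x(y, sign):
    """Recover x from y and its sign bit, or None if no such point exists."""
    if y >= P:
        return None
    x2 = (y * y - 1) * pow(D * y * y + 1, P - 2, P) % P
    if x2 == 0:
        return None if sign else 0
    x = pow(x2, (P + 3) // 8, P)
    if (x * x - x2) % P:
        x = x * SQRT_M1 % P
    if (x * x - x2) % P:
        return None
    return P - x if (x & 1) != sign else x

_GY = 4 * pow(5, P - 2, P) % P
_GX = _recover_x(_GY, 0)
B = (_GX, _GY, 1, _GX * _GY % P)  # base point

def _compress(pt) -> bytes:
    zinv = pow(pt[2], P - 2, P)
    x, y = pt[0] * zinv % P, pt[1] * zinv % P
    return (y | ((x & 1) << 255)).to_bytes(32, "little")

def _decompress(data: bytes):
    y = int.from_bytes(data, "little")
    sign, y = y >> 255, y & ((1 << 255) - 1)
    x = _recover_x(y, sign)
    return None if x is None else (x, y, 1, x * y % P)

def _expand(seed: bytes):
    digest = hashlib.sha512(seed).digest()
    a = int.from_bytes(digest[:32], "little")
    a = (a & ((1 << 254) - 8)) | (1 << 254)  # clamp the secret scalar
    return a, digest[32:]

def public_key(seed: bytes) -> bytes:
    return _compress(_mul(_expand(seed)[0], B))

def sign(seed: bytes, msg: bytes) -> bytes:
    a, prefix = _expand(seed)
    pub = _compress(_mul(a, B))
    r = _hash_mod_l(prefix + msg)
    r_enc = _compress(_mul(r, B))
    s = (r + _hash_mod_l(r_enc + pub + msg) * a) % L
    return r_enc + s.to_bytes(32, "little")

def verify(pub: bytes, msg: bytes, sig: bytes) -> bool:
    """Check [s]B == R + [H(R || A || msg)]A."""
    if len(pub) != 32 or len(sig) != 64:
        return False
    a_pt, r_pt = _decompress(pub), _decompress(sig[:32])
    s = int.from_bytes(sig[32:], "little")
    if a_pt is None or r_pt is None or s >= L:
        return False
    lhs = _mul(s, B)
    rhs = _add(r_pt, _mul(_hash_mod_l(sig[:32] + pub + msg), a_pt))
    # Projective equality: X1*Z2 == X2*Z1 and Y1*Z2 == Y2*Z1.
    return (lhs[0] * rhs[2] - rhs[0] * lhs[2]) % P == 0 and (
        lhs[1] * rhs[2] - rhs[1] * lhs[2]) % P == 0

def canonical(record: dict) -> bytes:
    """Deterministic JSON so signer and verifier see byte-identical input."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()

demo_seed = bytes(range(32))  # demo key only; never hard-code a real key
record = {"model": "mistral-7b-q4km", "bench": "openbookqa", "score": 72}
sig = sign(demo_seed, canonical(record))
print(verify(public_key(demo_seed), canonical(record), sig))                   # True
print(verify(public_key(demo_seed), canonical({**record, "score": 100}), sig))  # False
```

Changing any field, even reordering nothing but editing the score, changes the canonical bytes, so the old signature no longer verifies.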
| Metric | Count | Note |
|---|---|---|
| Local runs | 148 | Ed25519 signed |
| Models | 6 | open-weight, quantized |
| Benchmarks | 13 | deterministic graders |
| Perfect runs | 24 | 100% scores |
Local model leaderboard
Mean score across each model's attested benchmark runs, sorted by mean accuracy. Runs with n < 30 are excluded from the per-(model, benchmark) matrix below.
| # | Model | Runs | Avg score (%) | Benchmarks | Perfect runs |
|---|---|---|---|---|---|
| 1 | deepseek-coder-v2-15.7b | 20 | 67.2±2.9 | 13 | 7 |
| 2 | llama3-8b-q40 | 22 | 65.5±2.8 | 13 | 7 |
| 3 | mistral-7b-q4km | 20 | 56.5±3.1 | 13 | 4 |
| 4 | qwen3.6-27b-dense-q5km | 22 | 54.7±2.9 | 13 | 6 |
| 5 | qwen3.6-35b-q4km | 32 | 19.2±1.9 | 13 | 0 |
| 6 | glm-4.7-flash-30b-q4km | 32 | 16.4±1.8 | 13 | 0 |
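The leaderboard columns can be rebuilt from raw runs with a group-by on model. This is a minimal sketch, assuming each run is a (model, benchmark, score) record with the score in percent. The page does not say how the ± value is computed, so the sketch treats it as the standard error of the mean across a model's runs. That is an assumption, not the publisher's method. `Run`, `leaderboard` and `fmt` are hypothetical names.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev

@dataclass(frozen=True)
class Run:
    model: str
    bench: str
    score: float  # percent correct, 0-100

def leaderboard(runs):
    """Aggregate runs per model and sort by mean score, best first."""
    by_model = {}
    for run in runs:
        by_model.setdefault(run.model, []).append(run)
    rows = []
    for model, model_runs in by_model.items():
        scores = [r.score for r in model_runs]
        # Assumption: "±" is the standard error of the mean across runs.
        sem = stdev(scores) / sqrt(len(scores)) if len(scores) > 1 else 0.0
        rows.append({
            "model": model,
            "runs": len(scores),
            "avg": mean(scores),
            "sem": sem,
            "benches": len({r.bench for r in model_runs}),
            "perfect": sum(1 for s in scores if s == 100),
        })
    rows.sort(key=lambda row: row["avg"], reverse=True)  # sort on the unrounded mean
    return rows

def fmt(row):
    """Render an Avg score cell in the page's style, e.g. 67.2±2.9."""
    return f"{row['avg']:.1f}±{row['sem']:.1f}"
```

Sorting uses the unrounded mean, so two models that round to the same displayed value still get a deterministic order.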
Top (model × benchmark) cells
Best attested run per (model, benchmark) pair, sorted by score. Any cell can be replayed in the browser to re-check its signature.
| Model | Benchmark | Best score (%) |
|---|---|---|
| qwen3.6-27b-dense-q5km | OpenBookQA | 86 |
| llama3-8b-q40 | ARC-Challenge | 84 |
| deepseek-coder-v2-15.7b | ARC-Challenge | 74 |
| qwen3.6-27b-dense-q5km | CommonsenseQA | 72 |
| mistral-7b-q4km | OpenBookQA | 72 |
| qwen3.6-27b-dense-q5km | WinoGrande | 70 |
| deepseek-coder-v2-15.7b | GSM8K | 70 |
| qwen3.6-27b-dense-q5km | ARC-Challenge | 68 |
| llama3-8b-q40 | OpenBookQA | 68 |
| mistral-7b-q4km | WinoGrande | 68 |
| deepseek-coder-v2-15.7b | OpenBookQA | 66 |
| llama3-8b-q40 | CommonsenseQA | 62 |
| llama3-8b-q40 | GSM8K | 62 |
| mistral-7b-q4km | ARC-Challenge | 62 |
| mistral-7b-q4km | CommonsenseQA | 56 |
| deepseek-coder-v2-15.7b | CommonsenseQA | 54 |
| deepseek-coder-v2-15.7b | WinoGrande | 48 |
| qwen3.6-27b-dense-q5km | GSM8K | 40 |
| llama3-8b-q40 | WinoGrande | 36 |
| deepseek-coder-v2-15.7b | MMLU-Pro | 26 |
| glm-4.7-flash-30b-q4km | WinoGrande | 24 |
| glm-4.7-flash-30b-q4km | CommonsenseQA | 22 |
| qwen3.6-27b-dense-q5km | MMLU-Pro | 22 |
| glm-4.7-flash-30b-q4km | ARC-Challenge | 16 |
| mistral-7b-q4km | MATH-500 | 16 |
| qwen3.6-35b-q4km | CommonsenseQA | 16 |
| llama3-8b-q40 | MMLU-Pro | 14 |
| glm-4.7-flash-30b-q4km | OpenBookQA | 12 |
| qwen3.6-27b-dense-q5km | MATH-500 | 12 |
| llama3-8b-q40 | MATH-500 | 12 |
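Selecting these cells amounts to taking a maximum per (model, benchmark) key and sorting the results. This is a minimal sketch under two assumptions. First, each run carries a sample count `n`. Second, the n < 30 cutoff stated for the leaderboard applies to that count, although the page does not define n. Ties keep their input order because the page does not state its tie-break. `top_cells` is a hypothetical helper.

```python
def top_cells(runs, min_n=30):
    """Best score per (model, bench) pair, highest first.

    runs: iterable of (model, bench, score, n) tuples, where n is assumed to be
    the number of graded samples in the run. Runs with n < min_n are skipped.
    """
    best = {}
    for model, bench, score, n in runs:
        if n < min_n:
            continue
        key = (model, bench)
        if key not in best or score > best[key]:
            best[key] = score
    cells = [(model, bench, score) for (model, bench), score in best.items()]
    cells.sort(key=lambda cell: cell[2], reverse=True)  # stable: ties keep first-seen order
    return cells
```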
Run your own local benchmark
Pull a model in Ollama, run our open runner, get a signed receipt your community can re-verify.
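This page does not show the open runner itself. The sketch below only approximates the loop it describes. It sends each prompt to a local Ollama daemon through the `/api/generate` HTTP endpoint (default port 11434), using temperature 0 and a fixed seed. It then grades multiple-choice answers by exact letter match. The helper names, the prompt format and the rule "take the last standalone letter A to E" are all hypothetical. They are not the publisher's runner or graders.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def ask(model: str, prompt: str, seed: int = 0) -> str:
    """One non-streaming generation with temperature 0 and a fixed seed."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0, "seed": seed},
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]

# Hypothetical extraction rule: the last standalone capital letter A-E.
CHOICE = re.compile(r"\b([A-E])\b")

def extract_choice(text):
    matches = CHOICE.findall(text)
    return matches[-1] if matches else None

def grade(items, answer_fn):
    """Percent of (prompt, gold_letter) items whose extracted letter matches gold."""
    correct = sum(1 for prompt, gold in items if extract_choice(answer_fn(prompt)) == gold)
    return 100.0 * correct / len(items)
```

Once a model has been pulled (for example with `ollama pull mistral`), `grade(items, lambda p: ask("mistral", p))` would score a list of `(prompt, gold_letter)` pairs. The grader is a pure function, so it can be tested without a running daemon.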