Open-weight models, attested on your hardware.

Every score on this page was produced by a local Ollama daemon serving quantized weights (Q4_K_M unless the model name indicates otherwise) on a consumer GPU. The runs use the same canonical sample sets as the cloud-API attestations. Each result is Ed25519-signed by the publisher's local attestor and can be replayed in your browser.

Local runs: 148 (Ed25519 signed)
Models: open-weight, quantized
Benchmarks: deterministic graders
Perfect runs: 100% scores

Local model leaderboard

Each model's average score across all of its attested benchmark runs, sorted by mean accuracy. (model, bench) pairs with fewer than 30 runs (n < 30) are excluded from the per-(model, bench) matrix below.

#	Model	Runs	Avg score (%)	Benches	Perfect runs
1	deepseek-coder-v2-15.7b	20	67.2±2.9	13	7
2	llama3-8b-q40	22	65.5±2.8	13	7
3	mistral-7b-q4km	20	56.5±3.1	13	4
4	qwen3.6-27b-dense-q5km	22	54.7±2.9	13	6
5	qwen3.6-35b-q4km	32	19.2±1.9	13
6	glm-4.7-flash-30b-q4km	32	16.4±1.8	13