A real, on-chain agent evaluation.
Before the eval ran, the grader’s policy was anchored. Afterward, a manifest of the verdicts was anchored. Both receipts are co-attested by Bitcoin SV miners. Download either, drop it into /verify, and confirm the chain order in your browser.
A grader agent scored five student answers.
The grader’s rubric, instruction, tool permissions,
budget, and model config were hashed and anchored as a
policy_snapshot receipt before any grading began.
Then five short math answers were scored. The five verdicts were
Merkle-batched into a single evidence_bundle
receipt. The miner acceptance times are one second apart (17:45:23 and 17:45:24 UTC), so the
policy provably predates the verdicts.
Together, the two receipts show that:
- the grader was running under THIS rubric / instruction / tools / budget / model config when it produced the verdicts;
- any single verdict belongs to the same batch as the others, and matches the on-chain Merkle root;
- neither receipt has been edited or back-dated, because both are signed by an independent miner and confirmed in a block.
The grader’s policy, hashed and anchored.
Five components were hashed independently, then assembled
into a snapshot. The snapshot itself was canonicalized + sha256’d;
that hash was POSTed to /api/v1/anchors with
category: "policy_snapshot".
- system_policy_hash: b47192dd…2346cdd4 (sha256 of the rubric text)
- user_instruction_hash: 2b3d30bb…79c5bc99 (“Grade the following five student submissions…”)
- tool_permissions_hash: 4f53cda1…1202b945 (empty list, no tools)
- budget_limits_hash: 0f89a3ee…b6b64e52 ({max_items: 5, max_seconds: 60})
- model_config_hash: 25591aba…cc79c21f (claude-opus-4-7, temperature 0.0)
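The hashing steps above can be sketched roughly as follows. This is not the helper that produced the receipts on this page: the canonical form (sorted keys, compact separators, UTF-8), the choice to hash strings as raw text, the `model_config` key names, the helper names `canonical_bytes`, `hash_component` and `build_snapshot`, and every request field other than `category` are assumptions made for illustration.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def hash_component(value) -> str:
    # Assumption: text components are hashed as raw UTF-8,
    # structured components are canonicalized first.
    if isinstance(value, str):
        return sha256_hex(value.encode("utf-8"))
    return sha256_hex(canonical_bytes(value))


def build_snapshot(components: dict) -> dict:
    # Maps e.g. "system_policy" to "system_policy_hash".
    return {f"{name}_hash": hash_component(value)
            for name, value in components.items()}


components = {
    "system_policy": "Placeholder rubric text",     # not the demo's rubric
    "user_instruction": "Placeholder instruction",  # not the demo's text
    "tool_permissions": [],
    "budget_limits": {"max_items": 5, "max_seconds": 60},
    "model_config": {"model": "claude-opus-4-7", "temperature": 0.0},
}

snapshot = build_snapshot(components)
anchor_sha256 = sha256_hex(canonical_bytes(snapshot))

# Hypothetical body for POST /api/v1/anchors. Only `category` comes from
# this page; the `sha256_hex` field name is a guess.
request_body = {"category": "policy_snapshot", "sha256_hex": anchor_sha256}
```

Because each component is hashed on its own, the snapshot contains only digests, and any one component can later be disclosed and checked without exposing the rest.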
Policy snapshot receipt
- Snapshot taken: 2026-05-07 17:45:22 UTC
- Anchor SHA-256: 5a7b55c6…68bc98ef
- Bundle ID: a723cec3ac9a44b1
- Transaction: 41003a19…071817a4
- Miner accepted: 2026-05-07 17:45:23 UTC (gorillapool, SEEN_ON_NETWORK)
Five short math answers, graded deterministically.
The grader scored five student answers against the rubric. The grading function is hardcoded so the demo is hermetic and reproducible; replace it with a real LLM call to adapt it to your own eval.
| Item | Expected | Student | Verdict | Score |
|---|---|---|---|---|
| Q1 | 12 | 12 | correct | 1 |
| Q2 | 24 | 21 | incorrect | 0 |
| Q3 | 7 | 7 | correct | 1 |
| Q4 | 84 | 12 * 7 = 84 | correct | 1 |
| Q5 | 10 | 9 | incorrect | 0 |
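A hardcoded grader that reproduces the table above might look like the sketch below. The demo's actual grading function is not shown on this page, so the rule used here (treat whatever follows the last `=` as the student's final answer, then compare it to the expected value as text) and the names `final_value` and `grade` are assumptions.

```python
def final_value(answer: str) -> str:
    # Whatever follows the last "=" is taken as the final answer,
    # so "12 * 7 = 84" is read as "84".
    return answer.split("=")[-1].strip()


def grade(expected: str, student: str) -> dict:
    correct = final_value(student) == expected.strip()
    return {"verdict": "correct" if correct else "incorrect",
            "score": 1 if correct else 0}


# (item, expected, student) rows copied from the table.
ITEMS = [
    ("Q1", "12", "12"),
    ("Q2", "24", "21"),
    ("Q3", "7", "7"),
    ("Q4", "84", "12 * 7 = 84"),
    ("Q5", "10", "9"),
]

verdicts = [{"item": item, **grade(expected, student)}
            for item, expected, student in ITEMS]
total = sum(v["score"] for v in verdicts)
```

Since the function has no randomness and makes no network call, rerunning it yields the same verdicts and therefore the same verdict hashes.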
Five verdicts, one on-chain anchor.
Each verdict was canonicalized + sha256’d. Those five
hashes (with their item labels) were sent to the API; the
server then hashed each {label, sha256_hex} pair
into a Merkle leaf, combined the leaves into a tree, and
anchored the root as a single evidence_bundle
receipt. Any one verdict can later be revealed (with its
inclusion path) without revealing the other four.
| Item | Verdict SHA-256 (sha256 of the canonical verdict JSON) |
|---|---|
| Q1 | 10343a87…aa921669 |
| Q2 | f68246b5…4b56894c |
| Q3 | fbdb4ed5…53f8620c |
| Q4 | e864923c…fb276519 |
| Q5 | dbdd7436…d10593ce |
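The batching and single-verdict reveal described above can be illustrated with a small Merkle tree. The server's real leaf encoding and tree rules are not given on this page, so the details here are assumptions: each leaf is the sha256 of a compact, key-sorted JSON `{label, sha256_hex}` pair, parents are the sha256 of left plus right, and an odd row duplicates its last node. The verdict hashes are placeholders, not the truncated values from the table.

```python
import hashlib
import json


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(label: str, sha256_hex: str) -> bytes:
    # Assumed leaf encoding: sha256 of the compact, key-sorted JSON pair.
    pair = {"label": label, "sha256_hex": sha256_hex}
    return sha256(json.dumps(pair, sort_keys=True,
                             separators=(",", ":")).encode("utf-8"))


def build_levels(leaves):
    # levels[0] is the leaf row, levels[-1] is [root].
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        row = levels[-1]
        if len(row) % 2:  # assumed rule: duplicate the last node
            row = row + [row[-1]]
        levels.append([sha256(row[i] + row[i + 1])
                       for i in range(0, len(row), 2)])
    return levels


def inclusion_path(levels, index):
    # Collect (sibling_hash, sibling_is_left) from leaf row up to the root.
    path = []
    for row in levels[:-1]:
        if len(row) % 2:
            row = row + [row[-1]]
        sibling = index ^ 1
        path.append((row[sibling], sibling < index))
        index //= 2
    return path


def verify_path(leaf, path, root):
    node = leaf
    for sibling, sibling_is_left in path:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == root


# Placeholder verdict hashes for items Q1..Q5.
verdict_hashes = {
    f"Q{i}": hashlib.sha256(f"placeholder verdict {i}".encode()).hexdigest()
    for i in range(1, 6)
}
leaves = [leaf_hash(label, h) for label, h in verdict_hashes.items()]
levels = build_levels(leaves)
root = levels[-1][0]

# Reveal only Q4: its leaf plus three sibling hashes reproduce the root.
q4_path = inclusion_path(levels, 3)
q4_ok = verify_path(leaves[3], q4_path, root)
```

The inclusion path carries sibling hashes only, which is why revealing Q4 says nothing about the content of the other four verdicts.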
Result manifest receipt
- Anchored: 2026-05-07 17:45:24 UTC
- Merkle root: 84edc887…1e12b4f3
- Bundle ID: 2985f00ae0da4ac9
- Transaction: c18b671f…74399c53
- Leaf count: 5
- Miner accepted: 2026-05-07 17:45:24 UTC (gorillapool, SEEN_ON_NETWORK)
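With both receipts listed, the ordering claim can be checked by comparing the two miner acceptance times. The sketch below parses the timestamps exactly as they are printed on this page; how a receipt file actually encodes its acceptance time is not shown here, so the format string and the helper name `parse_utc` are assumptions.

```python
from datetime import datetime, timezone

# Matches the display format used on this page, e.g. "2026-05-07 17:45:23 UTC".
FMT = "%Y-%m-%d %H:%M:%S UTC"


def parse_utc(text: str) -> datetime:
    return datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)


policy_accepted = parse_utc("2026-05-07 17:45:23 UTC")
manifest_accepted = parse_utc("2026-05-07 17:45:24 UTC")

gap_seconds = (manifest_accepted - policy_accepted).total_seconds()
policy_first = policy_accepted < manifest_accepted
```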
Confirm the chain order in your browser.
Drop either bundle into the verifier. The three pills populate from the receipt + a public block-explorer lookup — no Satsignal API call at verify time.
Miner-signed acceptance.
The receipt’s acceptance block carries
gorillapool’s signed accept-time and status. The pill
renders as “by gorillapool at <timestamp>”.
WhatsOnChain says so.
The verifier fetches the tx from a public explorer and compares the OP_RETURN payload against the canonical doc hash. The pill renders as “On chain in tx <prefix>…<suffix>”.
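The comparison step can be sketched offline once the output script has been fetched. The layout of the Satsignal payload inside OP_RETURN is not documented on this page, so this example only parses standard Bitcoin script data pushes and checks whether the document hash appears in any of them, either as raw bytes or as hex text. The explorer call is left out, and the `demo` tag in the sample script is invented for illustration.

```python
import hashlib

OP_FALSE, OP_RETURN = 0x00, 0x6A
OP_PUSHDATA1, OP_PUSHDATA2, OP_PUSHDATA4 = 0x4C, 0x4D, 0x4E


def op_return_pushes(script: bytes) -> list:
    """Return the data pushes that follow OP_RETURN (OP_FALSE prefix optional)."""
    pos = 0
    if pos < len(script) and script[pos] == OP_FALSE:
        pos += 1
    if pos >= len(script) or script[pos] != OP_RETURN:
        raise ValueError("not an OP_RETURN script")
    pos += 1
    pushes = []
    while pos < len(script):
        opcode = script[pos]
        pos += 1
        if opcode <= 0x4B:            # direct push of `opcode` bytes
            size = opcode
        elif opcode == OP_PUSHDATA1:  # next 1 byte is the length
            size = script[pos]
            pos += 1
        elif opcode == OP_PUSHDATA2:  # next 2 bytes, little-endian
            size = int.from_bytes(script[pos:pos + 2], "little")
            pos += 2
        elif opcode == OP_PUSHDATA4:  # next 4 bytes, little-endian
            size = int.from_bytes(script[pos:pos + 4], "little")
            pos += 4
        else:
            raise ValueError(f"unexpected opcode 0x{opcode:02x}")
        if pos + size > len(script):
            raise ValueError("truncated push")
        pushes.append(script[pos:pos + size])
        pos += size
    return pushes


def payload_matches(script_hex: str, doc_sha256_hex: str) -> bool:
    # Assumption: the hash is embedded either raw or as ASCII hex.
    raw = bytes.fromhex(doc_sha256_hex)
    text = doc_sha256_hex.encode("ascii")
    return any(raw in push or text in push
               for push in op_return_pushes(bytes.fromhex(script_hex)))


# Hand-built sample script: OP_FALSE OP_RETURN <"demo"> <32-byte hash>.
doc_hash = hashlib.sha256(b"placeholder canonical document").hexdigest()
script_hex = (bytes([OP_FALSE, OP_RETURN, 4]) + b"demo"
              + bytes([32]) + bytes.fromhex(doc_hash)).hex()
matches = payload_matches(script_hex, doc_hash)
```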
Manifest reconstruction.
For the manifest receipt, the verifier walks every leaf
hash and reproduces the Merkle root, then compares to
subject.root. For the policy snapshot, drop the
snapshot JSON to confirm its sha256 matches the anchor.
Build this in your own integration.
The two helpers below produced the receipts on this page. Plain Python, stdlib only, no Satsignal SDK. Drop them into an agent runtime, a CI step, or a one-off shell.
example_agent_snapshot.py
A ~30-line agent that hashes its five policy
components, optionally POSTs the snapshot to
/api/v1/anchors when
SATSIGNAL_API_KEY is set, then takes a
deterministic action. Replace the decide()
stub with a real agent loop and you have the same
policy-anchored receipt this demo opens with.
policy_snapshot.py
Stdlib helper for the
policy_snapshot primitive. CLI subcommands
hash-component / build /
verify. Selective-disclosure verify (one
component at a time) just works.
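Selective disclosure follows from hashing each component separately: a verifier who holds the snapshot and is shown one component can recompute that single hash and leave the other four opaque. The sketch below is not the `policy_snapshot.py` CLI, and the function name `verify_component`, the raw-UTF-8 hashing of text components, and the placeholder values are all assumptions.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_component(snapshot: dict, name: str, disclosed_text: str) -> bool:
    """Check one disclosed text component against its hash in the snapshot."""
    expected = snapshot.get(f"{name}_hash")
    return expected == sha256_hex(disclosed_text.encode("utf-8"))


rubric = "Placeholder rubric text"  # stands in for the real rubric

# Only the rubric is disclosed; the other entries stay as opaque digests.
snapshot = {
    "system_policy_hash": sha256_hex(rubric.encode("utf-8")),
    "user_instruction_hash": "0" * 64,   # placeholder digest
    "tool_permissions_hash": "1" * 64,   # placeholder digest
    "budget_limits_hash": "2" * 64,      # placeholder digest
    "model_config_hash": "3" * 64,       # placeholder digest
}

rubric_ok = verify_component(snapshot, "system_policy", rubric)
tampered_ok = verify_component(snapshot, "system_policy", rubric + " (edited)")
```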