A real, on-chain agent evaluation.
Before the eval ran, the grader’s policy was anchored. Afterwards, a manifest of the verdicts was anchored. Both proofs sit on the public BSV chain: download either one, drop it into /verify, and confirm its anchor in your browser.
A grader agent scored five student answers.
The grader’s rubric, instruction, tool permissions,
budget, and model config were hashed and anchored as a
policy_snapshot proof before any grading began.
Then five short math answers were scored. The five verdicts were
Merkle-batched into a single evidence_bundle
proof. Both proofs are confirmed in the same BSV block (948102),
each independently verifiable. The policy proof was broadcast
first, but that ordering is an off-chain record — the chain
itself attests only that both existed by the block’s time.
Together, the two proofs let a third party check that:
- the grader was running under THIS rubric / instruction / tools / budget / model config when it produced the verdicts;
- any single verdict belongs to the same batch as the others, and matches the on-chain Merkle root;
- neither proof has been edited or back-dated, because both are anchored in transactions confirmed in public blocks anyone can look up.
The grader’s policy, hashed and anchored.
Five components were hashed independently, then assembled
into a snapshot. The snapshot itself was canonicalized + sha256’d;
that hash was POSTed to /api/v1/anchors with
category: "policy_snapshot".
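The hashing steps above can be sketched in a few lines of stdlib Python. This is an illustration under stated assumptions, not the code that produced the proof on this page: the canonical form (sorted keys, compact separators, UTF-8), the component values marked as placeholders, and the payload field name `sha256_hex` are all assumptions. The sketch stops short of the network call and only builds the body that would be POSTed.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    # Assumed canonical form: sorted keys, compact separators, UTF-8 bytes.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_snapshot(components: dict) -> dict:
    # Hash each component independently, so one can be revealed without the others.
    return {
        f"{name}_hash": sha256_hex(canonical_json(value))
        for name, value in components.items()
    }


components = {
    "system_policy": "<rubric text>",            # placeholder, not the real rubric
    "user_instruction": "<instruction text>",    # placeholder, not the real instruction
    "tool_permissions": [],                       # empty list (no tools)
    "budget_limits": {"max_items": 5, "max_seconds": 60},
    "model_config": {"model": "claude-opus-4-7", "temperature": 0.0},
}

snapshot = build_snapshot(components)

# The snapshot itself is canonicalized and hashed; that digest is what gets anchored.
anchor_sha256 = sha256_hex(canonical_json(snapshot))

# Hypothetical request body for POST /api/v1/anchors (field names are assumptions).
payload = {"category": "policy_snapshot", "sha256_hex": anchor_sha256}
print(json.dumps(payload, indent=2))
```

Because every component is hashed before assembly, the snapshot contains only digests, so publishing it does not leak the rubric or the instruction text.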
- system_policy_hash: b47192dd…2346cdd4 (sha256 of the rubric text)
- user_instruction_hash: 2b3d30bb…79c5bc99 (“Grade the following five student submissions…”)
- tool_permissions_hash: 4f53cda1…1202b945 (empty list, no tools)
- budget_limits_hash: 0f89a3ee…b6b64e52 ({max_items: 5, max_seconds: 60})
- model_config_hash: 25591aba…cc79c21f (claude-opus-4-7, temperature 0.0)
Policy snapshot proof
- Snapshot taken: 2026-05-07 17:45:22 UTC
- Anchor SHA-256: 5a7b55c6…68bc98ef
- Bundle ID: a723cec3ac9a44b1
- Transaction: 41003a19…071817a4
- Miner accepted: 2026-05-07 17:45:23 UTC (gorillapool, SEEN_ON_NETWORK)
Five short math answers, graded deterministically.
The grader scored five student answers against the rubric. The grading function is hardcoded so the demo is hermetic and reproducible — replace it with a real LLM call to adapt to your own eval.
| Item | Expected | Student | Verdict | Score |
|---|---|---|---|---|
| Q1 | 12 | 12 | correct | 1 |
| Q2 | 24 | 21 | incorrect | 0 |
| Q3 | 7 | 7 | correct | 1 |
| Q4 | 84 | 12 * 7 = 84 | correct | 1 |
| Q5 | 10 | 9 | incorrect | 0 |
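A hardcoded grader consistent with the table could look like the sketch below. The rule it applies (compare whatever follows the last `=` sign to the expected value) is an assumption chosen so that the Q4 answer with shown working still counts; the demo's actual grading function may differ.

```python
def final_answer(text: str) -> str:
    # Keep only what follows the last "=", so "12 * 7 = 84" yields "84".
    return text.rsplit("=", 1)[-1].strip()


def grade(expected: str, student: str) -> dict:
    correct = final_answer(student) == expected.strip()
    return {
        "verdict": "correct" if correct else "incorrect",
        "score": 1 if correct else 0,
    }


# (item, expected, student answer), copied from the table above.
items = [
    ("Q1", "12", "12"),
    ("Q2", "24", "21"),
    ("Q3", "7", "7"),
    ("Q4", "84", "12 * 7 = 84"),
    ("Q5", "10", "9"),
]

verdicts = [{"item": label, **grade(exp, stu)} for label, exp, stu in items]
for v in verdicts:
    print(v["item"], v["verdict"], v["score"])
```

Swapping `grade()` for a real LLM call changes nothing downstream, as long as each verdict is still a plain dictionary that can be canonicalized and hashed.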
Five verdicts, one on-chain anchor.
Each verdict was canonicalized + sha256’d. Those five
hashes (with their item labels) were sent to the API; the
server then hashed each {label, sha256_hex} pair
into a Merkle leaf, combined the leaves into a tree, and
anchored the root as a single evidence_bundle
proof. Any one verdict can later be revealed (with its
inclusion path) without revealing the other four.
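The batching and selective reveal described above can be sketched as follows. The tree construction here is an assumption, not the server's documented scheme: each leaf is the sha256 of the compact, key-sorted JSON of the {label, sha256_hex} pair, each parent is sha256(left || right) over raw digests, and an odd node is paired with a copy of itself. The verdict hashes are stand-ins derived from dummy strings, since the real ones are truncated on this page.

```python
import hashlib
import json


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(label: str, sha256_hex: str) -> bytes:
    # Assumed leaf encoding: canonical JSON of the {label, sha256_hex} pair.
    doc = json.dumps(
        {"label": label, "sha256_hex": sha256_hex},
        sort_keys=True, separators=(",", ":"),
    )
    return sha256(doc.encode("utf-8"))


def merkle_levels(leaves):
    # Returns every level of the tree, leaves first, root last.
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2:
            level = level + [level[-1]]  # assumed rule: duplicate the odd node
        levels.append(
            [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        )
    return levels


def inclusion_path(levels, index):
    # List of (sibling_hash, sibling_is_left) from the leaf up to the root.
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify_path(leaf, path, root):
    node = leaf
    for sibling, sibling_is_left in path:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == root


labels = ["Q1", "Q2", "Q3", "Q4", "Q5"]
# Stand-in verdict hashes; the real ones come from the canonical verdict JSON.
verdict_hashes = [sha256(f"verdict for {l}".encode()).hex() for l in labels]

leaves = [leaf_hash(l, h) for l, h in zip(labels, verdict_hashes)]
levels = merkle_levels(leaves)
root = levels[-1][0]

# Reveal only Q4: its leaf plus the inclusion path is enough to reach the root.
q4_path = inclusion_path(levels, 3)
print(verify_path(leaves[3], q4_path, root))
```

With five leaves the path for any one verdict holds three sibling hashes, and none of them exposes the content of the other four verdicts.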
| Item | Verdict SHA-256 (sha256 of the canonical verdict JSON) |
|---|---|
| Q1 | 10343a87…aa921669 |
| Q2 | f68246b5…4b56894c |
| Q3 | fbdb4ed5…53f8620c |
| Q4 | e864923c…fb276519 |
| Q5 | dbdd7436…d10593ce |
Result manifest proof
- Anchored: 2026-05-07 17:45:24 UTC
- Merkle root: 84edc887…1e12b4f3
- Bundle ID: 2985f00ae0da4ac9
- Transaction: c18b671f…74399c53
- Leaf count: 5
- Miner accepted: 2026-05-07 17:45:24 UTC (gorillapool, SEEN_ON_NETWORK)
Confirm the anchors in your browser.
Drop either bundle into the verifier. The three pills populate from the proof + a public block-explorer lookup — no Satsignal API call at verify time.
Broadcast acceptance record.
The proof’s acceptance block carries
gorillapool’s reported accept-time and status —
informational, unsigned, never load-bearing for verification.
Pill renders by gorillapool at <timestamp>.
WhatsOnChain says so.
The verifier fetches the transaction from a public explorer and compares the OP_RETURN payload against the canonical doc hash. The pill renders “On chain in tx <prefix>…<suffix>”.
Manifest reconstruction.
For the manifest proof, the verifier walks every leaf
hash and reproduces the Merkle root, then compares to
subject.root. For the policy snapshot, drop the
snapshot JSON to confirm its sha256 matches the anchor.
Build this in your own integration.
The two helpers below produced the proofs on this page. Plain Python, stdlib only, no Satsignal SDK. Drop them into an agent runtime, a CI step, or a one-off shell.
example_agent_snapshot.py
A ~30-line agent that hashes its five policy
components, optionally POSTs the snapshot to
/api/v1/anchors when
SATSIGNAL_API_KEY is set, then takes a
deterministic action. Replace the decide()
stub with a real agent loop and you have the same
policy-anchored proof this demo opens with.
policy_snapshot.py
Stdlib helper for the
policy_snapshot primitive, with CLI subcommands
hash-component, build, and
verify. verify also handles selective disclosure,
checking one component at a time.
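Selective disclosure amounts to re-hashing one revealed component and comparing it with the digest recorded in the snapshot. The sketch below illustrates that idea only; it is not the source of policy_snapshot.py. The function names, the `<name>_hash` key layout, and the canonical JSON form are assumptions.

```python
import hashlib
import json


def component_hash(value) -> str:
    # Assumed canonical form: sorted keys, compact separators, UTF-8 bytes.
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def verify_component(snapshot: dict, name: str, revealed_value) -> bool:
    # True only if the revealed value hashes to the digest stored in the snapshot.
    recorded = snapshot.get(f"{name}_hash")
    return recorded is not None and recorded == component_hash(revealed_value)


# A snapshot as a verifier might receive it: digests only, no component contents.
budget = {"max_items": 5, "max_seconds": 60}
snapshot = {
    "budget_limits_hash": component_hash(budget),
    "tool_permissions_hash": component_hash([]),
}

print(verify_component(snapshot, "budget_limits", budget))               # matches
print(verify_component(snapshot, "budget_limits", {"max_items": 50}))    # does not match
```

The other components stay private: the verifier sees their digests in the snapshot but learns nothing about their contents.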