Human Verdict Labels Need Durable Tokens

durable tokens · fail closed · normalization · prompt contracts · review gates · verdict contracts · Jun 16, 2026

Some workflow failures look like parser bugs. The system sees a verdict, fails to route correctly, and the obvious suspicion is that the parser needs to be smarter.

Often the deeper problem is not parser capability. It is a loose contract between the text a human or model produces and the durable value the workflow stores.

When a verdict can move work forward, pause it, or send it back for rework, the system needs both surfaces to be explicit: the human-readable label and the machine token that carries the verdict into state.

Problem

A review gate may produce a label that is clear to a person: accepted, rejected, blocked, or accepted with risks.

Automation cannot safely rely on a label merely being clear to a reader. It needs an exact value it can parse, normalize, store, route, and test.

The failure mode appears when a producer prompt says, in effect, "return a verdict," but does not define the exact label, line format, accepted aliases, durable token, and fail-closed behavior. A downstream parser may already know how to handle the right label, but the upstream artifact is still free to drift.

That is how a correct parser can sit behind an unreliable gate.

Story / example

Consider a review workflow with a risk-bearing approval outcome.

For humans, the label might be:

ACCEPT WITH RISKS

For the workflow, the durable token might be:

accepted_with_risks

Those two values are related, but they are not interchangeable.

The human label belongs in the review artifact because people need to read the verdict. The durable token belongs in state because automation needs to route from it. The risk-bearing meaning must survive the conversion. It should not collapse into a clean accept, and it should not remain as ambiguous prose that later stages have to guess at.
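One way to keep the two surfaces paired without treating them as interchangeable is to declare them together. The sketch below is illustrative only: the `Verdict` enum and every label and token other than `ACCEPT WITH RISKS` and `accepted_with_risks` are assumptions, not names taken from a real workflow.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical verdict set. Each member carries both surfaces."""

    # (durable token, human-readable label)
    ACCEPTED = ("accepted", "ACCEPT")
    ACCEPTED_WITH_RISKS = ("accepted_with_risks", "ACCEPT WITH RISKS")
    REJECTED = ("rejected", "REJECT")
    BLOCKED = ("blocked", "BLOCKED")

    def __init__(self, token: str, label: str) -> None:
        self.token = token  # stored in workflow state
        self.label = label  # printed in the review artifact


risky = Verdict.ACCEPTED_WITH_RISKS
print(risky.label)  # ACCEPT WITH RISKS
print(risky.token)  # accepted_with_risks
```

Because the risk-bearing verdict is its own member, it cannot silently become `Verdict.ACCEPTED` during conversion.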

The safer contract names the whole chain:

  1. The producer prompt defines the exact verdict line and allowed labels.
  2. The generated artifact emits one of those labels.
  3. The parser normalizes the label into a canonical machine token.
  4. The controller routes from the durable token.
  5. Tests prove the risk-bearing path reaches the intended boundary.
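The chain above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `VERDICT: <LABEL>` line format, the labels other than `ACCEPT WITH RISKS`, and the route names (`merge`, `risk_signoff`, `rework`, `pause`) are all hypothetical.

```python
import re

# Assumed contract: exactly one line of the form "VERDICT: <LABEL>".
LABEL_TO_TOKEN = {
    "ACCEPT": "accepted",
    "ACCEPT WITH RISKS": "accepted_with_risks",
    "REJECT": "rejected",
    "BLOCKED": "blocked",
}

# Assumed routing table: durable token -> next workflow step.
ROUTES = {
    "accepted": "merge",
    "accepted_with_risks": "risk_signoff",
    "rejected": "rework",
    "blocked": "pause",
}

VERDICT_LINE = re.compile(r"^VERDICT: (.+)$", re.MULTILINE)


class VerdictError(ValueError):
    """Raised when the artifact does not satisfy the verdict contract."""


def parse_token(artifact: str) -> str:
    """Steps 2-3: find the single verdict line and normalize its label."""
    labels = VERDICT_LINE.findall(artifact)
    if len(labels) != 1:
        raise VerdictError(f"expected one verdict line, found {len(labels)}")
    label = labels[0].strip()
    if label not in LABEL_TO_TOKEN:
        raise VerdictError(f"unknown verdict label: {label!r}")
    return LABEL_TO_TOKEN[label]


def route(token: str) -> str:
    """Step 4: route from the durable token only, never from raw text."""
    return ROUTES[token]


artifact = "Summary: change is small.\nVERDICT: ACCEPT WITH RISKS\nRisks: no load test.\n"
token = parse_token(artifact)
print(token, "->", route(token))  # accepted_with_risks -> risk_signoff
```

Step 5 then becomes an ordinary test: feed an artifact with the risk-bearing label through `parse_token` and `route`, and assert that it lands on the sign-off step rather than the clean-accept step.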

If any link is vague, the gate becomes fragile even if the code path is mostly correct.

Lesson

Human-readable labels and durable machine tokens solve different problems.

The label makes the artifact understandable. The token makes the workflow enforceable.

Both should be first-class parts of the contract. A prompt that produces verdicts should include exact examples of valid artifact text, the expected normalized token, accepted aliases if any, rejected paraphrases, and the fail-closed behavior for missing or ambiguous output.
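One way to keep the prompt and the parser from drifting is to render the prompt's verdict section from the same table the parser reads. The sketch below assumes a hypothetical `VerdictSpec` record, a `VERDICT: <LABEL>` line format, and example aliases and paraphrases invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerdictSpec:
    label: str  # exact artifact text
    token: str  # normalized durable token
    aliases: tuple = ()  # accepted alternative spellings, if any


# Hypothetical contract table shared by prompt rendering and parsing.
SPECS = (
    VerdictSpec("ACCEPT", "accepted"),
    VerdictSpec("ACCEPT WITH RISKS", "accepted_with_risks",
                aliases=("ACCEPTED WITH RISKS",)),
    VerdictSpec("REJECT", "rejected"),
    VerdictSpec("BLOCKED", "blocked"),
)

# Hypothetical paraphrases the prompt explicitly rules out.
REJECTED_PARAPHRASES = ("LGTM", "mostly fine", "accept, I guess")


def render_prompt_section(specs, rejected) -> str:
    """Build the verdict instructions that go into the producer prompt."""
    lines = [
        "End your review with exactly one line of the form:",
        "VERDICT: <LABEL>",
        "Allowed labels and the tokens they normalize to:",
    ]
    for spec in specs:
        alias_note = ""
        if spec.aliases:
            alias_note = " (aliases: " + ", ".join(spec.aliases) + ")"
        lines.append(f"- {spec.label} -> {spec.token}{alias_note}")
    lines.append("These paraphrases are NOT valid verdicts: " + "; ".join(rejected))
    lines.append("A missing or ambiguous verdict line counts as no verdict; "
                 "the work does not advance.")
    return "\n".join(lines)


print(render_prompt_section(SPECS, REJECTED_PARAPHRASES))
```

The design choice here is a single source of truth: adding a label or alias in `SPECS` changes the prompt text and the parser's lookup together.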

The important rule is to preserve semantics during normalization. A verdict with risk attached is not the same as clean approval. A blocked verdict is not the same as an unknown verdict. A human phrase that looks close is not a state transition until it has been normalized through the contract.
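A small sketch makes the distinction concrete. It assumes a hypothetical `needs_verdict` holding state for output that could not be normalized; the labels other than `ACCEPT WITH RISKS` are likewise illustrative.

```python
from typing import Optional

# Hypothetical exact-label table.
TOKENS = {
    "ACCEPT": "accepted",
    "ACCEPT WITH RISKS": "accepted_with_risks",
    "REJECT": "rejected",
    "BLOCKED": "blocked",
}


def normalize(label: str) -> Optional[str]:
    """Return the durable token for an exact label, or None if nothing matches."""
    return TOKENS.get(label.strip())


def next_state(label: str) -> str:
    """Map a label to the state the workflow should record."""
    token = normalize(label)
    if token is None:
        # Fail closed: record that no verdict was understood. This is a
        # different state from "blocked", which is a real reviewer decision.
        return "needs_verdict"
    return token


print(next_state("ACCEPT WITH RISKS"))       # accepted_with_risks
print(next_state("BLOCKED"))                 # blocked
print(next_state("Accept, with some risk"))  # needs_verdict
```

The near-miss phrase causes no transition at all: it is neither promoted to approval nor recorded as a deliberate block.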

Broader implication

This pattern applies anywhere a generated artifact feeds a governed workflow:

| Contract surface | Purpose | Common failure |
| --- | --- | --- |
| Producer prompt | tells the author what to emit | underspecified label or format |
| Artifact text | gives humans a readable verdict | prose drifts from parseable form |
| Parser or classifier | converts text into a token | aliases are unclear or too broad |
| Durable state | records the enforceable result | risk-bearing states get flattened |
| Routing logic | chooses the next legal action | untested token paths fall through |

The fix is not always more parser flexibility. More flexibility can make the system accept ambiguous text with false confidence.

The better fix is a tighter contract: exact labels, exact durable tokens, narrow normalization, and tests at the layer where the failure matters.
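The contrast between a flexible parser and a narrow one can be shown side by side. Both functions below are hypothetical sketches, and the labels other than `ACCEPT WITH RISKS` are assumptions; the loose version is included only as an anti-pattern.

```python
def loose_normalize(text: str) -> str:
    """Anti-pattern: substring matching that guesses at intent."""
    lowered = text.lower()
    if "accept" in lowered:
        return "accepted"
    if "reject" in lowered:
        return "rejected"
    return "blocked"


# Hypothetical exact-label table for the narrow version.
EXACT = {
    "ACCEPT": "accepted",
    "ACCEPT WITH RISKS": "accepted_with_risks",
    "REJECT": "rejected",
    "BLOCKED": "blocked",
}


def narrow_normalize(text: str) -> str:
    """Accept only contract labels; anything else is an error, not a guess."""
    label = text.strip()
    if label not in EXACT:
        raise ValueError(f"not a contract label: {text!r}")
    return EXACT[label]


# The loose parser flattens the risk-bearing verdict into clean approval...
print(loose_normalize("ACCEPT WITH RISKS"))     # accepted
# ...and reads a refusal as approval.
print(loose_normalize("I cannot accept this"))  # accepted
# The narrow parser keeps the distinct token.
print(narrow_normalize("ACCEPT WITH RISKS"))    # accepted_with_risks
```

Calling `narrow_normalize("I cannot accept this")` raises `ValueError`, which is the fail-closed behavior the contract asks for.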

Closing

Prompt wording can be runtime infrastructure when automation depends on its output.

If a verdict controls the workflow, do not leave the relationship between human label and durable token implicit. Preserve the label for people. Preserve the token for the machine. Test the path between them.

Human verdict labels need durable tokens.

Related concepts

  • Audit Verdicts Are Runtime State
  • Prompt Guardrails Need Artifact Contracts
  • Generated Output Is an Interface Contract
  • Continue After Durable State Changes
  • Fail-Closed Workflow Design