Governed Pauses Are Not Failures
Jul 05, 2026
A command that stops for human review can look like a failure if you read only the process exit code. The durable fix is to classify structured outcomes deliberately, preserve evidence on every path, and assign exactly one owner for failure output.
The problem
The symptom was familiar: an interactive CLI command reached a human decision gate, paused correctly, and showed the operator a pause message, yet the outer command still reported failure. The operator saw guidance once, then saw failure text again. The subprocess exited with a non-zero code, and the outer process exited with failure too.
The first instinct is to special-case the pause exit code at the facade. That treats the bug as a numeric exception. In this case, the deeper failure was semantic misclassification. Lower layers already represented the pause correctly: a governed pause with a named outcome, no failure code, and a paused internal exit. The defect sat one layer higher, where the interactive facade treated every non-zero adapter exit as failure.
A second defect was coupled to the first. Failure rendering had two owners. One phase wrote failure text directly, and the command wrapper rendered the returned failure again. The item therefore required both classification correction and output-ownership correction.
What actually happened
The item was tightly scoped in code but not trivial in contract. Only two product files ultimately changed, yet the specification had to resolve:
- which paused outcomes count as successful governed pauses
- whether a release-boundary pause shares the same facade semantics as a human-decision pause
- which layer owns failure output
- how structured evidence is retained on genuine failure paths
The structural fix was better than a shallow exit-code exception. The implementation introduced one classifier using outcome, failure code, and exit code, with an explicit allow-list for known governed pause outcomes. Unknown or inconsistent paused results still fail closed.
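A minimal sketch of such a classifier, in Python, might look like the following. The outcome names, the numeric exit codes, and the `AdapterResult` type are hypothetical stand-ins chosen for illustration; the real adapter contract defines its own values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Hypothetical values for illustration; the real adapter contract defines its own.
SUCCESS_EXIT = 0
PAUSED_EXIT = 3
SUCCESS_OUTCOME = "completed"
# Explicit allow-list: new pause outcomes must be added here deliberately.
GOVERNED_PAUSE_OUTCOMES = frozenset({"paused_for_human_decision"})


class Classification(Enum):
    SUCCESS = "success"
    GOVERNED_PAUSE = "governed_pause"
    FAILURE = "failure"


@dataclass(frozen=True)
class AdapterResult:
    outcome: str
    failure_code: Optional[str]
    exit_code: int


def classify(result: AdapterResult) -> Classification:
    """Classify from outcome, failure code, and exit code together."""
    if result.failure_code is not None:
        return Classification.FAILURE
    if result.outcome == SUCCESS_OUTCOME and result.exit_code == SUCCESS_EXIT:
        return Classification.SUCCESS
    if result.outcome in GOVERNED_PAUSE_OUTCOMES and result.exit_code == PAUSED_EXIT:
        return Classification.GOVERNED_PAUSE
    # Unknown outcomes and inconsistent combinations fail closed.
    return Classification.FAILURE
```

Under these assumed values, a human-decision pause with the paused exit and no failure code classifies as a governed pause, while an unlisted pause outcome with the same exit code classifies as a failure.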
The implementation preserved architecture boundaries. It did not alter the controller, shared classifier, adapter contract, route posture, lifecycle taxonomy, or item-state model. Changing those layers would have widened a facade defect into a lifecycle redesign.
The test suite itself contained misleading assumptions. Existing tests treated the human-decision pause as failure and used exit-code examples inconsistent with the real adapter contract. The item corrected both production behaviour and tests that had encoded the defect.
The final result, within accepted scope:
- governed pauses return facade success and outer process exit 0
- genuine failures return exit 1
- structured post-gate data is preserved
- failure output has one owner
- unknown outcomes fail closed
- real CLI evidence confirmed the operator-visible path
Residual questions remained outside scope. Other pause types are intentionally left fail-closed rather than explicitly supported. Item-state normalization and temporary artifact leakage were excluded. Future governed pause types require deliberate addition to the allow-list rather than automatic acceptance.
The lesson
A governed pause cannot be classified from exit code alone.
For similar defects, begin by tracing the structured result from the authoritative lower layer through each facade and wrapper. Define a decision table covering normal success, governed pause, genuine failure, and unknown or inconsistent result.
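One way to make that decision table reviewable is to write it down as data before implementing anything. The sketch below is an assumption about how that could look: the row names, outcome strings, failure code, and exit values are all hypothetical, and the check only verifies that the table is internally consistent.

```python
from typing import List, NamedTuple, Optional

PAUSED_EXIT = 3  # hypothetical internal pause exit code


class Row(NamedTuple):
    case: str
    outcome: str
    failure_code: Optional[str]
    adapter_exit: int   # internal exit code from the adapter
    facade_ok: bool     # facade result
    outer_exit: int     # outer process exit


# Hypothetical rows; a real table would use the adapter's actual names.
DECISION_TABLE = [
    Row("normal success", "completed", None, 0, True, 0),
    Row("governed pause", "paused_for_human_decision", None, PAUSED_EXIT, True, 0),
    Row("genuine failure", "failed", "E_PHASE", 1, False, 1),
    Row("unknown pause outcome", "paused_for_something_new", None, PAUSED_EXIT, False, 1),
    Row("inconsistent pause", "paused_for_human_decision", "E_PHASE", PAUSED_EXIT, False, 1),
]


def check_table(rows: List[Row]) -> List[str]:
    """Return a list of problems; an empty list means the table is consistent."""
    problems = []
    for row in rows:
        # Facade success must map to outer exit 0, everything else to 1.
        expected_outer = 0 if row.facade_ok else 1
        if row.outer_exit != expected_outer:
            problems.append(f"{row.case}: outer exit disagrees with facade result")
        # A row carrying a failure code can never be a facade success.
        if row.facade_ok and row.failure_code is not None:
            problems.append(f"{row.case}: success row carries a failure code")
    return problems
```

Keeping the adapter exit, facade result, and outer exit as separate columns makes it harder to confuse them during specification review.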
Practical rules:
- Use an explicit allow-list for successful pause outcomes. Named pause outcomes succeed at the facade only when accompanied by the expected paused exit and no failure code. Broad category rules are unsafe for new lifecycle outcomes.
- Assign one render owner per output class. Failure rendering for the CLI path should belong to the command wrapper. Successful pause guidance may remain owned by the phase, but no single failure should ever be rendered by both layers.
- Preserve structured evidence. The interactive facade should keep the adapter's structured result rather than flattening it into a message. Genuine failure paths must still retain structured post-gate evidence.
- Name the three exit concepts separately. The internal pause exit code, the facade result, and the outer process exit are distinct contracts. Prompts, tests, artifacts, and terminal output should name them consistently.
- Audit tests against the real adapter contract. Existing tests around lifecycle interfaces may encode historical bugs. Check them against authoritative lower-layer behaviour before trusting them as regression proof.
- Include a real CLI smoke test. Layered unit tests are necessary but not sufficient for operator-visible CLI behaviour. One real command run should confirm pause rendering, absence of false failure, and outer exit 0.
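The ownership and evidence rules above can be sketched together. This is an illustration under stated assumptions, not the actual implementation: `FacadeResult`, `run_phase`, `command_wrapper`, the field names, and the error format are all hypothetical.

```python
import io
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, TextIO


@dataclass(frozen=True)
class FacadeResult:
    ok: bool                      # facade result
    adapter_exit: int             # internal exit code, kept as evidence only
    outcome: str
    failure_code: Optional[str] = None
    message: str = ""
    evidence: Dict[str, Any] = field(default_factory=dict)  # structured post-gate data


def run_phase(result: FacadeResult, out: TextIO) -> FacadeResult:
    """The phase owns pause guidance and never writes failure text."""
    if result.ok and result.message:
        print(result.message, file=out)
    return result  # returned unflattened, evidence intact


def command_wrapper(result: FacadeResult, err: TextIO) -> int:
    """The wrapper is the only owner of failure output; returns the outer exit."""
    if result.ok:
        return 0
    print(f"error [{result.failure_code}]: {result.message}", file=err)
    return 1


# Usage with in-memory streams standing in for stdout and stderr.
out, err = io.StringIO(), io.StringIO()
failed = FacadeResult(False, 1, "failed", "E_PHASE", "phase failed", {"gate": "post"})
outer_exit = command_wrapper(run_phase(failed, out), err)
```

Because the phase returns the failure instead of printing it, the failure text appears once, and the structured evidence is still attached to the result the wrapper receives.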
Adjacent lifecycle features—such as other pending pause types—should stay out of scope unless proven inseparable.
The broader principle
Lifecycle-facing CLI defects can be small in code but contract-heavy. The correct fix is rarely a special-case for one exit code. It is deliberate outcome classification, preserved evidence, single render ownership, and fail-closed handling for unknown states.
When automation depends on CLI semantics, separate subprocess completion, semantic validity, durable persistence, and lifecycle authorization. A controlled human-gate pause is a successful lifecycle outcome, not an unknown delivery failure. Facade-level result handling must reflect that distinction.
Audits add value when they trace behaviour through all layers before selecting the layer to change. A baseline audit that proves lower layers are already correct prevents an unnecessary controller or lifecycle rewrite and isolates the defect to the facade or wrapper.
Specification review should force an outcome-to-facade decision table before implementation. Verification should distinguish adapter exit code from outer command exit code explicitly, with both focused seam tests and one real operator-path smoke test.
How to apply it
Before changing a lifecycle-sensitive CLI interface, answer six questions:
- What is the authoritative outcome field?
- What is the internal exit code?
- What is the failure code?
- What is the outer process exit?
- Who owns render output for each class?
- What structured evidence must be preserved on success and failure?
Without those answers, a facade that reads only exit codes will misclassify governed pauses as failures.
Checklist for lifecycle-result handling:
- Trace structured results from the authoritative layer through every facade and wrapper
- Define a decision table: success, governed pause, genuine failure, unknown
- Use an explicit allow-list for pause outcomes that count as facade success
- State one owner for stdout guidance and one owner for stderr failure output
- Preserve structured evidence in returned result types on both success and failure paths
- Correct tests that encode the old defect rather than adding new tests around them
- Require one real CLI smoke test at acceptance for operator-visible CLI changes
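The smoke test in the last item can be a few lines around a real subprocess run. The sketch below assumes that pause guidance goes to stdout and that a correctly paused command writes nothing to stderr; the `smoke` helper and the stand-in command are hypothetical, and a real check would invoke the actual CLI.

```python
import subprocess
import sys
from typing import List


def smoke(argv: List[str], expected_guidance: str) -> List[str]:
    """Run the command for real and return a list of operator-visible problems."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    problems = []
    if proc.returncode != 0:
        problems.append(f"outer exit was {proc.returncode}, expected 0")
    if expected_guidance not in proc.stdout:
        problems.append("pause guidance missing from stdout")
    if proc.stderr.strip():
        problems.append("unexpected failure text on stderr")
    return problems


# Stand-in for the real CLI: prints pause guidance and exits 0.
stand_in = [sys.executable, "-c", "print('Paused for human review.')"]
problems = smoke(stand_in, "Paused for human review.")
```

An empty list means the operator-visible path behaved as specified: guidance rendered, no false failure text, and outer exit 0.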
When a command pauses for human review and the operator sees failure anyway, stop patching exit codes. Find where structured outcomes are flattened—and classify them deliberately before anything upstream treats the pause as an error.