Fix the Report, Not the System

cli ux durable state failure reporting operator experience safety boundaries Jun 30, 2026

When an operational command feels broken, the safe fix usually improves what it reports and how it behaves — not the validation or the state machine underneath it.

The problem

A command that operators run early in a governed workflow felt broken. The complaints were real, but they were about experience, not safety. The text-editor step that collected human input could be polluted by helper comment lines that the system itself had seeded. The on-screen prompt described the input format more narrowly than the parser actually accepted. And when a downstream worker exited successfully but produced invalid output, the failure was reported as a generic error that hid which step had actually failed.

It would have been easy to treat this as one bug, or to "fix" it by loosening something underneath.

What actually happened

The audit found that the machinery underneath was already sound. Validation was strict. Invalid output already failed closed instead of being accepted. The step counter already meant exactly what it claimed — completed, durable steps — and nothing more.

In other words, the system was not unsafe. It was badly narrated. Three separate reporting and UX defects had stacked up around an otherwise correct state machine, and together they made a safe command feel unreliable.

That distinction mattered because the command sat next to lifecycle authority and human decision points. A change here could easily have drifted into relaxing validation, moving who owns a decision, or redefining what a durable step means — none of which were the actual problem.

The lesson

The fix was structural but bounded. Helper guidance was removed from the editable buffer and moved to terminal output, with a short informational pause before the editor opens so the operator can read it. Raw file input was preserved byte-for-byte. A distinct "output validation failed" event was added, so that a worker which exits successfully but emits invalid output reports that precisely, while genuine process failures keep their generic error. The step counter was left alone.

The work cleared up four boundaries that had been blurred:

user-authored input versus instructional scaffolding
what the parser accepts versus what the prompt displays
a subprocess failing versus a subprocess succeeding but emitting invalid output
completed durable steps versus the step that is currently failing

None of those required weakening a check.

A real tradeoff surfaced along the way. In-buffer guidance is genuinely helpful to a human staring at an empty editor — but it lives inside content that later becomes durable, user-authored input. Seeding instructions and stripping them out later is fragile. The accepted answer was terminal-only guidance plus a pause, rather than editable helper text that something downstream has to remember to remove.

The broader principle

The first command a person touches in an automated system is a trust boundary, but it is also a narrator. Much of what feels "broken" about such a command is not the logic. It is what the command says it is doing, what it puts in front of the operator, and how precisely it names a failure.

When the perceived defect is reporting, fix the report. Do not pay for a narration problem by weakening validation, moving authority, or redefining durable state. And name failures specifically: "the process crashed," "the process succeeded but its output was invalid," and "this earlier step completed" are different facts, and collapsing them into one generic error is itself a defect.

How to apply it

Treat any editor buffer that becomes saved input as durable-risk territory. Prefer a blank or user-only buffer; put guidance in the terminal, not in the content.
Specify the terminal and editor experience explicitly: what prints, what waits, what launches, what aborts on empty input — and what must not quietly become a confirmation gate.
Distinguish failure kinds in operator output. Subprocess failure, invalid-output failure, write failure, and stage-advance failure are not the same event.
Preserve existing counters and state semantics unless changing them is the actual task. Don't redefine durable state to solve a reporting confusion; add separate metadata for the active step instead.
When the work is about reporting, write tests that pin the exact operator-facing lines.
Decide up front whether a live, hands-on smoke test is required or merely optional. Static tests can prove sequencing, clean buffers, and failure messages, but they cannot fully prove a live editor takeover. Make that call deliberately rather than leaving it ambiguous.

A small UX or reporting change can be quietly important when it sits near lifecycle authority. The win is to improve the operator's experience without touching the guarantees underneath.