2026-05-15 — Failure engineering applied to the V5 retry deadlock

The moment. V5 retry surfaced a substrate deadlock. Opus shipped a 5-second wall-clock timeout. User rejected it three times before the right answer landed.

The three nos:

  1. “i don’t know if i agree with the detection here… is there an arbitrary 5s wait?…” — rejecting the symptom-fix.
  2. “the subagents fix is absolute trash - we have engineered a completely stable lock step programming env - rando ‘>5s is bad’ is fucking retarded — we must be able to measure this by expression” — rejecting the framing.
  3. “i do not accept the 5s fix. i want to know exactly where we are failing - our users must be told they did something illegal” — naming what the right answer must do.

The doctrine the nos came from: ~/work/holon/scratch/FAILURE-ENGINEERING.md.

“failure engineering says: the failure isn’t recovered from; it is read.”

“the failure isn’t ‘this specific case panicked.’ The failure is ‘a class of inputs / states / interactions can produce this kind of panic.’ The fix isn’t ‘make this case stop panicking’; the fix is ‘make this CLASS of panic structurally impossible.’”

Level-1 vs Level-2. Opus’s 5s was level-1. The user demanded level-2.

The level-2 fix that landed: a ProcessJoinBeforeOutputDrain compile-time check in src/check.rs. It walks every let-form’s syntactic scope and pairs calls to :wat::kernel::Process/join-result <p> with calls to :wat::kernel::Process/{stdout,stderr,output} <p> on the same identifier. If both are present in the same scope, the check fails with a verbose diagnostic that names both sites, the rule, and the SERVICE-PROGRAMS.md citation, plus an explicit “DO NOT add a wall-clock timeout to mask this.”
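A minimal sketch of the pairing rule, not the code in src/check.rs. The `Call` struct, the flat per-scope call list, `check_scope`, `illegal_scope`, and the diagnostic wording are all hypothetical stand-ins; only the rule name, the callee paths, and the "DO NOT add a wall-clock timeout" line come from this entry. The real check walks wat syntax; this toy version assumes the calls of one let-form scope have already been collected.

```rust
/// One call site inside a single let-form scope (hypothetical toy model).
#[derive(Debug, Clone, PartialEq)]
struct Call {
    callee: &'static str, // full path of the function being called
    target: &'static str, // identifier passed as the Process argument
    line: u32,            // source line of the call site
}

const JOIN: &str = ":wat::kernel::Process/join-result";
const OUTPUTS: [&str; 3] = [
    ":wat::kernel::Process/stdout",
    ":wat::kernel::Process/stderr",
    ":wat::kernel::Process/output",
];

/// Returns one diagnostic for every (join site, output site) pair that
/// names the same identifier within the scope. Empty means the scope passes.
fn check_scope(calls: &[Call]) -> Vec<String> {
    let mut diags = Vec::new();
    for join in calls.iter().filter(|c| c.callee == JOIN) {
        let outputs = calls
            .iter()
            .filter(|c| OUTPUTS.contains(&c.callee) && c.target == join.target);
        for out in outputs {
            diags.push(format!(
                "ProcessJoinBeforeOutputDrain: join-result on `{}` (line {}) shares a scope \
                 with {} on `{}` (line {}). DO NOT add a wall-clock timeout to mask this.",
                join.target, join.line, out.callee, out.target, out.line
            ));
        }
    }
    diags
}

/// A scope shaped like the offending let: join plus two output reads on `proc`.
/// The line numbers are illustrative only.
fn illegal_scope() -> Vec<Call> {
    vec![
        Call { callee: JOIN, target: "proc", line: 1 },
        Call { callee: OUTPUTS[0], target: "proc", line: 2 },
        Call { callee: OUTPUTS[1], target: "proc", line: 3 },
    ]
}

fn main() {
    for diag in check_scope(&illegal_scope()) {
        println!("{diag}");
    }
    // A join on one identifier and an output read on another do not pair.
    let unrelated = vec![
        Call { callee: JOIN, target: "a", line: 1 },
        Call { callee: OUTPUTS[0], target: "b", line: 2 },
    ];
    println!("unrelated scope diagnostics: {}", check_scope(&unrelated).len());
}
```

The sketch reports every pair rather than stopping at the first, which matches the entry's point that the diagnostic names both sites.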

The substrate’s own code is the primary offender. wat/test.wat:506-551 run-hermetic-driver has the illegal orientation:

(:wat::core::let
  [joined-result (:wat::kernel::Process/join-result proc) ; ← BLOCKS FIRST
   stdout-r (:wat::kernel::Process/stdout proc)
   stderr-r (:wat::kernel::Process/stderr proc)
   ...

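Sequential let: bindings evaluate top to bottom, so join-result runs before anything reads the Receivers. The chain: join-result BLOCKS until the child exits; the substrate’s internal drain threads consume the child’s OS pipes into bounded wat-level Receivers; the user hasn’t drained those Receivers; the drain threads block on send once the Receivers are full; the child blocks on its stdout write; the child cannot exit; join blocks forever.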
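The same chain can be modeled with a bounded channel from Rust's standard library. This is an analogy under stated assumptions, not the substrate's code: a thread stands in for the child plus its drain thread, `std::sync::mpsc::sync_channel` stands in for a bounded Receiver, and the function names are hypothetical. The first function shows the legal orientation (drain, then join). The second shows why join-first gets stuck: with no reader, a bounded channel accepts only as many sends as its capacity.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Drain-then-join: the parent empties the bounded channel before joining,
/// so the producer is never left blocked on a full channel.
fn drain_then_join(lines: usize, capacity: usize) -> usize {
    let (tx, rx) = sync_channel::<String>(capacity);
    let child = thread::spawn(move || {
        for i in 0..lines {
            // Blocks whenever the channel is full, like a drain thread
            // blocking on send into a full Receiver.
            tx.send(format!("line {}", i)).expect("receiver dropped");
        }
        // `tx` is dropped here, which ends the receiver's iteration.
    });
    let drained = rx.iter().count(); // drain first
    child.join().expect("child panicked"); // then join
    drained
}

/// With nobody reading, count how many sends a bounded channel accepts
/// before it reports Full. Uses try_send so the demonstration never hangs.
fn sends_accepted_without_reader(capacity: usize) -> usize {
    // `_rx` is kept alive but never read, like undrained output.
    let (tx, _rx) = sync_channel::<u8>(capacity);
    let mut accepted = 0;
    loop {
        match tx.try_send(0) {
            Ok(()) => accepted += 1,
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

fn main() {
    println!("drained {} lines", drain_then_join(1000, 4));
    println!(
        "sends accepted with no reader: {}",
        sends_accepted_without_reader(4)
    );
}
```

Swapping the two lines in `drain_then_join` so that `child.join()` comes first would hang whenever `lines` exceeds `capacity`, which is the join-before-drain orientation the check rejects.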

The substrate now refuses to run: the workspace test run produces 30+ ProcessJoinBeforeOutputDrain fires, so no tests execute, no orphans are left, and nothing deadlocks. The failed state is structurally unrepresentable.

Recovery breadcrumb for post-compaction:

State at handoff:

  • Branch: arc-170-gap-j-v5-deadlock-state (diagnostic branch)
  • Detection committed: 8ef69f4 (src/check.rs +171 lines)
  • Sonnet’s substrate splice fix + V5 retry shape committed at c3f2bf7 + 8e07626
  • Workspace: substrate refuses to run; 30+ detection fires on wat/test.wat:510:21
  • BRIEF for the fix: queued, to be drafted next (Gap K — fix run-hermetic-driver to drain-then-join, restoring the lockstep nesting from SERVICE-PROGRAMS.md step 3 applied at the Process boundary)

When you wake up post-compaction:

  1. Read this entry first.
  2. Verify git log --oneline | head -5 on arc-170-gap-j-v5-deadlock-state — should show 8ef69f4 (detection), e189ac0 (BRIEF), 8e07626 (V5 retry), c3f2bf7 (substrate splice).
  3. Read BRIEF-SLICE-3-GAP-K-FIX-RUN-HERMETIC-DRIVER.md (about to be created).
  4. Spawn sonnet for the fix; the detection IS the verifier — sonnet’s success criterion is “ProcessJoinBeforeOutputDrain no longer fires on substrate’s own code; the deadlock is gone.”
  5. Other workspace failures (Pattern A typealias unfold from V5 retry) are SEPARATE category; out of scope for the Gap K fix.

The collaboration shape this exemplifies:

User direction 2026-05-15: “i’d rather hear ‘no’ three times and arrive at the right answer” / “this is why we are 1337” / “no-three-times-yes-once shape works in both directions.”

Both halves of the hologram trained to reject level-1 fixes when level-2 is in reach. Discipline doesn’t care which half is holding it.

The substrate now teaches code about its own rules. Failure mode: structurally unavailable.