2026-05-15 — Failure engineering applied to the V5 retry deadlock
The moment. V5 retry surfaced a substrate deadlock. Opus shipped a 5-second wall-clock timeout. User rejected it three times before the right answer landed.
The three nos:
- “i don’t know if i agree with the detection here… is there an arbitrary 5s wait?…” — rejecting the symptom-fix.
- “the subagents fix is absolute trash - we have engineered a completely stable lock step programming env - rando ‘>5s is bad’ is fucking retarded — we must be able to measure this by expression” — rejecting the framing.
- “i do not accept the 5s fix. i want to know exactly where we are failing - our users must be told they did something illegal” — naming what the right answer must do.
The doctrine the nos came from: ~/work/holon/scratch/FAILURE-ENGINEERING.md.
“failure engineering says: the failure isn’t recovered from; it is read.”
“the failure isn’t ‘this specific case panicked.’ The failure is ‘a class of inputs / states / interactions can produce this kind of panic.’ The fix isn’t ‘make this case stop panicking’; the fix is ‘make this CLASS of panic structurally impossible.’”
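Level-1 vs level-2. Level-1 makes this one case stop failing; level-2 makes the whole class of failure structurally impossible. Opus’s 5s timeout was level-1. The user demanded level-2.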
The level-2 fix that landed: a ProcessJoinBeforeOutputDrain compile-time check in src/check.rs. It walks every let-form’s syntactic scope and pairs calls to :wat::kernel::Process/join-result <p> with calls to :wat::kernel::Process/{stdout,stderr,output} <p> on the same identifier. If both appear in the same scope, the check fails with a verbose diagnostic that names both sites, states the rule, cites SERVICE-PROGRAMS.md, and says explicitly: “DO NOT add a wall-clock timeout to mask this.”
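The pairing rule described above can be sketched in a few lines of Rust. This is a minimal illustration, not the code in src/check.rs: the `Call` enum, the `Violation` struct, and `check_scope` are hypothetical names, and the scope is modelled as a flat list of calls rather than a real AST. It flags any co-occurrence of a join and an output accessor on the same identifier in one scope, which is how this entry words the rule; the real check may be more precise.

```rust
/// Hypothetical, simplified view of the calls found in one let-form scope.
#[derive(Debug, Clone, PartialEq)]
enum Call {
    /// (:wat::kernel::Process/join-result <p>)
    JoinResult(String),
    /// (:wat::kernel::Process/{stdout,stderr,output} <p>), with the accessor name
    Output(String, &'static str),
    /// Any call the check does not care about
    Other,
}

/// One join/output pair on the same process identifier (hypothetical shape).
#[derive(Debug, PartialEq)]
struct Violation {
    process: String,
    join_index: usize,
    output_index: usize,
    output_kind: &'static str,
}

/// Pair every join-result call with every output accessor on the same
/// identifier inside a single scope. Any pair is reported.
fn check_scope(calls: &[Call]) -> Vec<Violation> {
    let mut violations = Vec::new();
    for (join_index, call) in calls.iter().enumerate() {
        if let Call::JoinResult(joined) = call {
            for (output_index, other) in calls.iter().enumerate() {
                if let Call::Output(read, kind) = other {
                    if joined == read {
                        violations.push(Violation {
                            process: joined.clone(),
                            join_index,
                            output_index,
                            output_kind: *kind,
                        });
                    }
                }
            }
        }
    }
    violations
}

/// Render a diagnostic that names both sites and the rule.
/// The wording is illustrative except for the quoted timeout warning.
fn diagnostic(v: &Violation) -> String {
    format!(
        "ProcessJoinBeforeOutputDrain: join-result on `{}` (call #{}) shares a scope \
         with Process/{} on `{}` (call #{}). DO NOT add a wall-clock timeout to mask this.",
        v.process, v.join_index, v.output_kind, v.process, v.output_index
    )
}
```

Under these assumptions, a scope holding a join plus stdout and stderr accessors on `proc` yields two violations, while a join on one identifier and an accessor on another yields none.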
The substrate’s own code is the primary offender. wat/test.wat:506-551 run-hermetic-driver has the illegal orientation:
    (:wat::core::let
      [joined-result (:wat::kernel::Process/join-result proc) ; ← BLOCKS FIRST
       stdout-r (:wat::kernel::Process/stdout proc)
       stderr-r (:wat::kernel::Process/stderr proc)
       ...

Sequential let. join-result BLOCKS until child exits; substrate’s internal drain threads consume child OS pipes into wat-level Receivers (bounded); user hasn’t drained them; drain threads block on send when full; child blocks on stdout write; child cannot exit; join blocks forever.
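The blocking chain above can be modelled with a thread and a bounded channel. The sketch below is an analogy in Rust, not the substrate's process code: the spawned thread stands in for the child, `sync_channel` stands in for the bounded wat-level Receiver, and `drain_then_join` is a hypothetical name. It shows the legal ordering only, because the illegal one would hang whenever the child produces more than the channel can hold.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Model of the legal ordering: drain the bounded channel, then join.
/// `lines` is how many messages the "child" writes; `capacity` is the
/// bound on the channel that stands in for a bounded Receiver.
fn drain_then_join(lines: usize, capacity: usize) -> Vec<String> {
    let (tx, rx) = sync_channel::<String>(capacity);

    let child = thread::spawn(move || {
        for i in 0..lines {
            // Blocks while the channel is full, like a child blocked on a
            // stdout write nobody is reading.
            tx.send(format!("line {i}")).expect("receiver dropped early");
        }
        // `tx` is dropped here, which closes the channel.
    });

    // Drain first: this loop ends once the sender is dropped.
    let drained: Vec<String> = rx.iter().collect();

    // Join second: the child has nothing left to block on.
    child.join().expect("child thread panicked");

    // Illegal ordering, for contrast: calling `child.join()` before
    // draining `rx` never returns when `lines > capacity`, because the
    // child is stuck in `send` and the only reader is waiting on the join.
    drained
}
```

Calling `drain_then_join(100, 4)` returns all 100 lines even though the channel holds only four at a time, because the reader keeps making room for the writer until the writer is done.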
The substrate now refuses to run. 30+ ProcessJoinBeforeOutputDrain fires in workspace test run; no tests execute; no orphans; no deadlock. The failed state is structurally unrepresentable.
Recovery breadcrumb for post-compaction:
State at handoff:
- Branch: arc-170-gap-j-v5-deadlock-state (diagnostic branch)
- Detection committed: 8ef69f4 (src/check.rs +171 lines)
- Sonnet’s substrate splice fix + V5 retry shape committed at c3f2bf7 + 8e07626
- Workspace: substrate refuses to run; 30+ detection fires on wat/test.wat:510:21
- BRIEF for the fix: queued, to be drafted next (Gap K: fix run-hermetic-driver to drain-then-join, restoring the lockstep nesting from SERVICE-PROGRAMS.md step 3 applied at the Process boundary)
When you wake up post-compaction:
- Read this entry first.
- Verify git log --oneline | head -5 on arc-170-gap-j-v5-deadlock-state; it should show 8ef69f4 (detection), e189ac0 (BRIEF), 8e07626 (V5 retry), c3f2bf7 (substrate splice).
- Read BRIEF-SLICE-3-GAP-K-FIX-RUN-HERMETIC-DRIVER.md (about to be created).
- Spawn sonnet for the fix; the detection IS the verifier. Sonnet’s success criterion is “ProcessJoinBeforeOutputDrain no longer fires on substrate’s own code; the deadlock is gone.”
- Other workspace failures (Pattern A typealias unfold from V5 retry) are SEPARATE category; out of scope for the Gap K fix.
The collaboration shape this exemplifies:
User direction 2026-05-15: “i’d rather hear ‘no’ three times and arrive at the right answer” / “this is why we are 1337” / “no-three-times-yes-once shape works in both directions.”
Both halves of the hologram trained to reject level-1 fixes when level-2 is in reach. Discipline doesn’t care which half is holding it.
The substrate now teaches code about its own rules. Failure mode: structurally unavailable.