What the Bar Missed

The Bar That Had to Mean It ended on a correction: stone 245.6 had skipped wat-tests/ as “coverage/scaffolding, not the warded surface,” and the user struck the framing:

that’s like the most important thing to ward - the tests are our demos of how to use wat /well/

That sentence reopened arc 245 against its own close. This post is what the reopen found, and the bar’s blind spot turned out to be two layers deep: the surface it skipped, and underneath that, a tier running red in the dark.

The tier that ran dark (June 6)

The routine green gate was lib-only by design — the integration tier had been excluded since the arc-170 concurrency leaks made it unsafe to run blind. The exclusion was honest when it was made. What nobody re-checked is what it cost: while five arcs shipped (246 through 250), the default integration tier accumulated 33 failing binaries and 147 failing tests, invisible, because nothing routine ever ran them. Stone 245.7 built the leak-contained runner (scripts/integration-run.sh) and recorded the ground-truth map (INVENTORY-245.7-baseline.tsv): 1460 passing, 147 failing, 59 ignored. The user set the campaign’s terms in one line:

wipe the dungeon clean - diablo 2 style - the quest isn’t done until there’s no survivors

The method, fixed before the first strike and held through all five: ground (run the binary, read the corpses) → conferre the verdict → strike → gates → commit → next room. At the verdict, a real substrate gap gets FILLED; a stale pre-clojure-ification test gets MODERNIZED or DELETED, because greening a stale test re-enshrines a retired form. The verdict step is the campaign’s spine. A red test is not a work order; it is a claim, and half the claims in a tier that ran dark through a dialect reckoning are claims about a language that no longer exists.
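One way to picture the verdict step is as a gate in front of the strike: no room gets struck while any of its red tests lacks a verdict. The sketch below is illustrative only. The names Verdict, RedTest, Tally, and plan_strike are hypothetical and not taken from the project; only the three verdicts (fill, modernize, delete) come from the method above.

```rust
/// The three outcomes of the verdict step (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Fill,      // real substrate gap: fix the substrate
    Modernize, // stale test: rewrite it in the live form
    Delete,    // stale test with nothing left worth asserting
}

/// A failing test plus the verdict a human has (or has not yet) recorded.
#[derive(Debug, Clone)]
pub struct RedTest {
    pub name: String,
    pub verdict: Option<Verdict>,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Tally {
    pub fill: usize,
    pub modernize: usize,
    pub delete: usize,
}

/// Refuse to plan a strike until every red test in the room has a verdict.
pub fn plan_strike(room: &[RedTest]) -> Result<Tally, String> {
    let mut tally = Tally::default();
    for test in room {
        match test.verdict {
            Some(Verdict::Fill) => tally.fill += 1,
            Some(Verdict::Modernize) => tally.modernize += 1,
            Some(Verdict::Delete) => tally.delete += 1,
            None => return Err(format!("no verdict for {}", test.name)),
        }
    }
    Ok(tally)
}
```

The judgment itself stays human; the code only makes "strike without verdict" unrepresentable as a successful plan.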

The five strikes

Room 1 — ordering is relational (f681d1d0, 46/46). The largest room was a recorded debt: arc 237’s design had banked a future-stone (DESIGN-STONE-237.8b.md:211) that nobody had opened. The runtime’s values_compare engine was alive but unreachable from check — and the conferre verdict on the 46 corpses was 40 FILL, 6 MODERNIZE. The fill: infer_ordering, minted as the sibling of infer_equality — ordering joins equality as the second relational intrinsic, the constraint flowing between the arguments (< : a:T, b:T), exactly the partition The Dialect Reckoning inscribed. The four ordering defclauses retired from wat/core.wat; cross-type comparison stays a check error; the NoMatchingClause class went extinct in the tier.
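The relational shape of the fill can be sketched in a few lines: the constraint sits between the two arguments rather than in a table of per-type clauses. This is a toy, not the project's checker. The Ty and CheckError types and the shared relate helper are assumptions made for illustration; only the names infer_equality and infer_ordering, and the rule that cross-type comparison is a check error, come from the text above.

```rust
/// A toy type universe (hypothetical; the real checker's types are not shown in this post).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Ty {
    Int,
    Float,
    Str,
    Bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CheckError {
    TypeMismatch { left: Ty, right: Ty },
}

/// The relational rule: both arguments must be the same T; the result is Bool.
fn relate(a: &Ty, b: &Ty) -> Result<Ty, CheckError> {
    if a == b {
        Ok(Ty::Bool)
    } else {
        Err(CheckError::TypeMismatch { left: a.clone(), right: b.clone() })
    }
}

/// Equality: the first relational intrinsic.
pub fn infer_equality(a: &Ty, b: &Ty) -> Result<Ty, CheckError> {
    relate(a, b)
}

/// Ordering: the sibling, sharing the same constraint between arguments.
pub fn infer_ordering(a: &Ty, b: &Ty) -> Result<Ty, CheckError> {
    relate(a, b)
}
```

In this sketch a cross-type comparison fails at check time with a mismatch naming both sides, instead of surfacing later as a missing clause.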

Room 2 — the matches? boundary (35dfc10d, 22/22). The resolver had never been taught that matches?’s pattern argument is DSL data, not code — when resolver strictness landed, the boundary was never drawn. Cleared by mirroring the quote-family precedent.

Room 3 — variadic defn, resurrected (f3d1fc9e, 16/16). Three layers down. The top: eleven test argspecs were corrupted — phantom rest-binder triples, modernized. Under that, the real gap, and the sharpest single find of the campaign: try_parse_fn_shape_def’s .ok()? had silently swallowed RestBinderNotSupported — so a top-level user variadic defn never registered at all. The 249.5 rest-binder work had reached macros and defclauses; the user-defn path never got it, and the swallow meant the absence produced no error anywhere. Threaded through registration, check, eval, and reflection.
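The swallow is easy to reproduce in miniature. Result::ok() maps every Err to None, and the ? that follows returns that None from the enclosing function, so "this is not a function definition" and "this is a function definition we cannot handle" become the same silent absence. The sketch below uses a made-up token-slice input and hypothetical function names; only RestBinderNotSupported and the .ok()? idiom are from the text above.

```rust
#[derive(Debug, PartialEq)]
pub enum ParseError {
    NotAFnShape,            // benign: the form is simply something else
    RestBinderNotSupported, // real: a variadic defn this path cannot parse
}

/// Toy parser over pre-split tokens (made-up input shape, not wat syntax).
/// A "&" token stands in for a rest-binder.
fn parse_fn_shape(tokens: &[&str]) -> Result<String, ParseError> {
    if tokens.first() != Some(&"defn") {
        return Err(ParseError::NotAFnShape);
    }
    let name = tokens.get(1).ok_or(ParseError::NotAFnShape)?;
    if tokens[2..].contains(&"&") {
        return Err(ParseError::RestBinderNotSupported);
    }
    Ok((*name).to_string())
}

/// The swallowing shape: `.ok()?` erases which error occurred.
pub fn try_register_swallowing(tokens: &[&str]) -> Option<String> {
    let name = parse_fn_shape(tokens).ok()?;
    Some(name)
}

/// One possible repair: only the benign error becomes "nothing to register".
pub fn try_register_strict(tokens: &[&str]) -> Result<Option<String>, ParseError> {
    match parse_fn_shape(tokens) {
        Ok(name) => Ok(Some(name)),
        Err(ParseError::NotAFnShape) => Ok(None),
        Err(other) => Err(other),
    }
}
```

With the swallowing version a variadic definition yields None, indistinguishable from a form that was never a definition; the strict version surfaces the unsupported case as an error the caller has to face.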

Room 4 — cond’s :else (2d518d78, 10/10). The finer sibling of room 2: the arms of a cond are code, but the :else marker is not, and the resolver walked it as a call head.
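Rooms 2 and 4 share one idea: a strict resolver needs to know which positions hold code and which hold data. A toy walker shows both boundaries. Everything here is an assumption made for illustration: the Form type, the argument order of matches? (subject first, pattern second), and the arm shape of cond (a list whose first element is either a test or the :else marker). The real resolver is not shown in this post.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Form {
    Sym(String),
    Keyword(String),
    List(Vec<Form>),
}

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    BadCallHead(Form),
}

pub fn sym(s: &str) -> Form { Form::Sym(s.to_string()) }
pub fn kw(s: &str) -> Form { Form::Keyword(s.to_string()) }
pub fn list(items: Vec<Form>) -> Form { Form::List(items) }

/// Record every symbol the resolver would treat as code.
pub fn resolve(form: &Form, seen: &mut Vec<String>) -> Result<(), ResolveError> {
    match form {
        Form::Sym(s) => { seen.push(s.clone()); Ok(()) }
        Form::Keyword(_) => Ok(()),
        Form::List(items) => resolve_list(items, seen),
    }
}

fn resolve_list(items: &[Form], seen: &mut Vec<String>) -> Result<(), ResolveError> {
    let head = match items.first() {
        Some(h) => h,
        None => return Ok(()),
    };
    let name = match head {
        Form::Sym(s) => s.as_str(),
        other => return Err(ResolveError::BadCallHead(other.clone())),
    };
    match name {
        // Quote family: the payload is data, never walked.
        "quote" => Ok(()),
        // Assumed shape (matches? subject pattern): walk the subject, skip the pattern.
        "matches?" => match items.get(1) {
            Some(subject) => resolve(subject, seen),
            None => Ok(()),
        },
        "cond" => {
            for arm in &items[1..] { resolve_cond_arm(arm, seen)?; }
            Ok(())
        }
        _ => {
            seen.push(name.to_string());
            for arg in &items[1..] { resolve(arg, seen)?; }
            Ok(())
        }
    }
}

/// An arm is code, except for a leading :else marker.
fn resolve_cond_arm(arm: &Form, seen: &mut Vec<String>) -> Result<(), ResolveError> {
    match arm {
        Form::List(parts) => {
            let body = match parts.split_first() {
                Some((Form::Keyword(k), rest)) if k.as_str() == ":else" => rest,
                _ => &parts[..],
            };
            for f in body { resolve(f, seen)?; }
            Ok(())
        }
        other => resolve(other, seen),
    }
}
```

Outside a cond, the same (:else y) list hits the general call path and fails as a bad call head, which is the failure shape room 4 describes.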

The long tail (ce655a68, 55 tests across 27 binaries), classified rather than ground singly: a third wave of resolver strictness, error-shape drift modernized at live shapes, arc-242 nil-rot in embedded sources. The deepest corpse was taken separately (e148540d, 19/19): an LRU service driver that had been dying silently behind (Err _) discards. Retired arity-suffix rot killed it at startup, the discard pattern swallowed the failure, and it stayed dark for weeks in the lowest level of the dependency tree.

And one catch that belongs on the record because it points the other way: at scoring, the long-tail executor’s blanket fn-opacity rule turned out to have reopened F5 — the macro-purity hole arc 249 had closed; HOF’d function bodies could execute at expand time again. Caught by the orchestrator’s own re-run, reverted to the one sound contextual exception, and fenced with two living witnesses (lib suite at 922). The executor’s green was never accepted at its word; the gun pointed back is what made the campaign’s speed safe.

The result: 1646 passed / 0 failed / 59 ignored across 187 binaries. From 147 failures to zero in five strikes — three real substrate gap-classes filled, zero stale tests greened. Then the endgame (ef585672): the leak-contained runner folded into green-gate.sh as check 3/3, so the tier that ran dark for weeks can never rot dark again. The fix for the wound and the fix for the class of wound, in the same campaign.

No survivors extends to future tests

A carry-out from the clear deserves its own paragraph, because it sharpened a doctrine. The splice stone’s first framing — “runtime param-binding gap” — was wrong, and the re-diagnosis (9d7f93e6) was a vindication: the failing pattern was textbook anaphoric capture, which the 249.5 hygiene work refuses by design; it had only ever passed under pre-hygiene name-only resolution. The failure was the hygiene holding. It was re-cast as a permanent witness (anaphoric_splice_capture_refused_by_hygiene), and the real gap underneath — walk_quasiquote’s Vector arm had never learned the ~@ splice the List arm got in 249.3a — was probed three layers deep and cut the same day.

Mid-stone, an interim #[ignore]’d “red contract for a future stone” went onto the tree — and the user caught it as a survivor-in-the-making. It was reverted, and the stone cut immediately instead. The dungeon-clear’s doctrine extends to tests written for future work: an ignored red test is a corpse you planted yourself, and it waits in the dark the same as the ones you found there.

The debt, paid

Same day. Then the reopen’s original cause: task #181, the test-surface ward. The test-kind guard ran its first full muster (cernere, intueri, probare, vocare, exigere, complectens, with circumspicere last on the perimeter), and the findings vindicated the corrective that opened the debt. The standout class: 15 latent bombs. These were live retired forms hiding inside #[ignore]’d proof files, invisible to every routine gate, set to detonate at startup the day arc 170 lifts the ignores. The inscription names the class: the dark corner inside the dark corner. All 15 were defused (096237e2), the trickiest two verified by contained un-ignore: the panic moved from startup to the run phase, check clean, exactly as the fix predicted. wat/test.wat, the framework the entire corpus stands on, took its stamp, and the drifted stamps from the campaign’s own fixes were re-earned the same day (1d229c89: five fresh-eyes reviews, and the fence’s seven-attempt adversarial ledger held on every attempt).

The corpus is warded as the demos it is — which is what the corrective said it was all along. The bar from arc 245 v1 was sound; what it missed was never the bar’s height but its perimeter, and the perimeter is now drawn where the user drew it: around everything a reader will imitate.

Likely Contributions to the Field

Conferre-before-strike for red test backlogs. A failing test is treated as a claim requiring a verdict — real gap (fill) versus stale assertion about a retired language form (modernize or delete) — before any fix is written. Greening a stale test re-enshrines the retired form; the verdict step is what makes clearing 147 failures safe in a language that changed underneath them. Zero stale tests were greened across the campaign.
The dark-tier failure mode, named and structurally closed. An honestly-excluded test tier (excluded for real leak reasons) accumulated 147 invisible failures across five shipped arcs. The fix pairs the wound with its class: clear the backlog, then fold the contained runner into the routine gate so exclusion-rot cannot recur silently. “Excluded for good reasons” is shown to need an expiry mechanism, not just a justification.
Silent-swallow archaeology. Two of the campaign’s deepest finds were failures consumed by error-handling idioms — a .ok()? that ate an unsupported-feature error so a whole registration path no-op’d, and a service driver dying behind (Err _) discards. Both were invisible precisely because the code handled the error; the pattern (grep the discards, run the corpses) generalizes to any codebase that converts failures to options.
Latent bombs in ignored tests as a defect class. Retired language forms living inside #[ignore]’d files pass every gate until the day the ignores lift — a time-delayed failure class the campaign defused by contained un-ignore verification, and a doctrine extension: no-survivors applies to tests written for future work, including your own.