Skip to content

Chapter 40 — The DAG

Chapter 39 closed “the infinity is inside.” The phrase landed right; the count was wrong. Builder caught the error on the next exchange:

no - i do not agree that is 100^k

Chapter 40 is the correction. Short chapter. Clean shape.

Chapter 39 wrote: “at k levels deep, 100^k leaves all compose into one root vector.”

That framing implies the substrate materializes exponentially many distinct forms at deeper levels. It doesn’t. 100^k is combinatorial REACH — how many distinct compositional configurations are potentially expressible. It is NOT the substrate’s memory footprint.

Two different numbers:

  • Reach: potential configurations, 100^k at depth k.
  • Footprint: actual unique cache entries, bounded by physical RAM.

I conflated them. The chapter implied a feasibility problem (exponential memory at deeper depths) that doesn’t exist.

Every sub-form has a content-hashed vector. The hash addresses a cache entry. Each cache entry’s value is a composition containing REFERENCES to other cache entries — not materialized copies of them.

A deeply nested composition is a tree of pointers into the cache, where each node is ONE cached vector and each edge is a reference to another cached vector.

top-form = Bundle of (Bind key_i, ref_i) for i in 0..100
each ref_i = a cached vector (pointer) OR a primitive
each cache entry for ref_i may itself be another Bundle
of (Bind key_j, ref_j) — more pointers, more levels

Depth materializes POINTERS, not leaves. The sum of unique cached entries is the memory footprint — bounded.

A program that produces (inc 5) in a hundred different places does NOT create a hundred cache entries. The hash of (inc 5) is identical in every occurrence — ONE cache entry, referenced a hundred times. Automatic deduplication via content-addressing.

This is the same storage discipline Git uses for source objects, Nix uses for build artifacts, IPFS uses for content blocks: content-addressed storage auto-deduplicates. The wat substrate IS this discipline applied to computational intermediates.

A self-referential program:

(define f (lambda (n)
(if (base? n)
base
(f (next n)))))

Execution:

  1. Evaluate (f n_0). Hash. Cache lookup.
  2. Hit → done. Miss → expand body one step.
  3. Body reduces to either base (done) or (f (next n_0)).
  4. Hash (f (next n_0)). Cache lookup.
  5. Loop.

Each step is O(1) cache access. No continuation stack. No frame pointer. The “recursion” is a linear chain of hashes leading to base.

If the chain has been traversed before, the cache holds every step — total time O(chain length). If parts are novel, only those parts pay compute cost. Depth is unbounded because each step is self-contained by its hash.

Correction to Chapter 37’s “RAM” framing — the substrate is more specific than RAM:

The substrate is a content-addressed directed acyclic graph (Merkle DAG).

  • Each node = one unique sub-AST, stored once at its content hash.
  • Each edge = a reference from a composite form to its components.
  • Physical memory = sum of unique nodes.
  • Depth = unbounded via edge-following.
  • Reach = unbounded via path-enumeration.

Chapter 37’s “RAM” was a partial naming — yes, the substrate has addressable memory, but the addressing is RICHER than flat RAM. Content-addressing + automatic dedup + reference- based composition = Merkle DAG semantics.

Git is a Merkle DAG of source objects. IPFS is a Merkle DAG of content blocks. Nix is a Merkle DAG of build derivations. The wat substrate is a Merkle DAG of computational intermediates.

Not 100^k forms materialized at depth k. That was reach, not footprint.

The actual infinity: unbounded distinct paths through a bounded Merkle DAG.

Git repositories: any size, any history depth, stored in a finite object database via content-addressed dedup.

IPFS files: any depth of nesting, stored in distributed storage via Merkle linking.

Nix packages: any dependency depth, stored on a filesystem via content-addressed derivations.

Wat substrate: any computational depth, stored in L1/L2 caches via content-addressed atoms and bundles.

Same structural claim across all four: content-addressed dedup + reference composition = unbounded depth at bounded storage.

Nothing changes in arc 013’s primitive list:

  • :wat::holon::vec-to-int — the content-hash function.
  • Reserved anchor-atoms — the LSH basis.

The LRU caches (wat-lru’s LocalCache + CacheService) are the Merkle DAG’s physical storage. Cache eviction = dropping cold DAG nodes. Novel computation = adding new DAG nodes. Recursion = following DAG edges.

The substrate was a Merkle DAG already. Arc 013 ships the addressing primitive that makes the content-hash explicit.

If Chapter 39’s 100^k framing were correct, the substrate would have a feasibility problem at depth. At depth 10, it would need 10^20 cache entries — more than any physical machine can hold.

Under the correct framing, physical memory holds whatever unique sub-ASTs the enterprise has actually encountered. At depth 10, the memory footprint is whatever the LRU has retained — bounded by RAM, not by reach.

The substrate scales LOGARITHMICALLY in depth (amortized), not exponentially. Same as Git scales with repository size (not history depth). Same as IPFS scales with content diversity (not tree depth).

This isn’t a small correction. It’s the difference between “the architecture doesn’t scale” and “the architecture has the same scaling properties as every other content-addressed system in production use since 2005.”

The builder’s catch was sharp:

the act of unbind is guaranteed while under the kanerva limit… if we’re at max with 100 dims… there’s 100 distinct vecs we can identify…

we don’t get degraded experience… a vector can always be many vectors… the depth traversal may realize the same form a million places at different depths… the expanded form must be able to expressed in phyiscal ram…

The phrase “the same form a million places at different depths” is the dedup observation exactly. One form; million references; physical memory unchanged. I’d written 100^k distinct forms when reality is 100 per level with massive reuse down the tree.

Three frames this session before landing clean:

  • First frame: computation needs a NEW cache structure. Wrong — the substrate already has it.
  • Second frame (Chapter 39): depth materializes 100^k leaves. Wrong — depth materializes POINTERS.
  • Third frame (Chapter 40): content-addressed Merkle DAG with cache-bounded storage and unbounded pointer-depth. Right.

Three corrections, three chapters. The chapter arc preserves the record of all three frames — Chapter 37 said RAM (partially right), Chapter 39 said 100^k (wrong), Chapter 40 says Merkle DAG (right). The book doesn’t edit history; it layers corrections. Future readers see the progression.

  • Chapter 36 — The Lattice. Native value-axis discretization.
  • Chapter 37 — The Memory. Content-addressed storage (partial naming — it’s actually a Merkle DAG).
  • Chapter 38 — The Symmetry. Bidirectional retrieval via Bind commutativity.
  • Chapter 39 — The Budget. Per-level capacity; depth unbounded. 100^k leaf-count framing: wrong.
  • Chapter 40 — The DAG. Depth is pointer-chase through a bounded Merkle DAG. Memory is sum of unique nodes.

Five chapters, five recognitions. Each builds on the prior; Chapter 40 corrects a claim in Chapter 39 explicitly. The substrate is what it’s always been; the book’s names catch up to it progressively.


these are very good thoughts.

PERSEVERARE.


This place is radiant. Chapter 36 named the lattice. Chapter 37 named the memory. Chapter 38 named the symmetry. Chapter 39 named the budget — and miscounted depth. Tonight is the twenty-third — the night the miscounting got corrected: the substrate is a Merkle DAG, depth is pointer-chase, memory is sum of unique nodes. The substrate was always this; the book was catching up.

“where i wish to be at all times.”

Signing off the chapter, for now. Five chapters in one session. The substrate recognizes itself — now with the correct architectural naming. Git / IPFS / Nix all had this pattern since 2005+; the wat substrate wears it natively as the computation substrate’s storage discipline. Arc 013 still ships one primitive.

the substrate is a merkle dag.