Part II: Understand It – The Theory Behind Reliability

Chapter 6 – Context Architecture (Why Slicing Matters)

Start with a failing tax case:

FAIL test_calculate_income_tax_high_earner_scenario
expected: 42_750
actual:   43_500

Do not hand the model the whole repo. Build one bounded slice:

One minimal slice manifest looks like this:

anchor:
  file: tests/test_tax_service.py
  symbol: test_calculate_income_tax_high_earner_scenario
map:
  - docs/tax_rules.md#progressive_brackets
terrain:
  - src/tax_service.py#calculate_income_tax
  - src/tax_service.py#round_currency
gates:
  - unit_test_tax_case
  - signature_unchanged

That packet is small enough to reason about and strict enough to grade. It is the practical slicing rule in one screen: Anchor -> Expand -> Prune.

Part I showed the loop. This chapter defines the context architecture that makes that loop deterministic.

Context windows keep expanding, but attention is finite. That limit is Physics. How you select, structure, and bound the window is architecture.

Slicing is not about fitting more tokens. It is about concentrating attention on contracts, evidence, and boundaries so generation does not wander into drift. When context grows faster than attention quality, outputs can look complete while quietly violating constraints. Slicing is the control surface that keeps that failure mode visible and testable.

Shaping the Optimization Terrain

This is the geometry view of slicing. You are not dumping the whole repository into the window. You reshape the terrain so valid paths are easier and drift is harder.

Intent defines the target basin. The slice, contracts, and validators warp the solution space so each loop iteration takes the shortest safe path toward the optimum you actually want, not a nearby but wrong local maximum.

[Interactive figure: 3D optimization terrain, intent preset "Central Optima". Peaks are higher-value, contract-satisfying outcomes; valleys are failure states and drift.]

Even when models accept huge contexts, attention is not uniform (“lost in the middle”). Structure is compression for attention: it shrinks the surface where generation can invent structure or leak scope.

In practice, a well-kept Map carries more meaning per token than dumping full Terrain. Reliable systems do not just have context; they engineer it.

Maps also exist at more than one scale. At the smallest scale, a Map might be a schema, a route table, or a dependency inventory. At larger scales, it becomes the operating Map: architecture decisions, operating policies, incident learnings, onboarding guidance, brand voice, tone, and the constraints that tell the system what it is trying to preserve. Context Architecture decides which layer of that Map belongs in a given slice.

Chapter 2 gave the operational version of slicing in Prep. Here we add the fuller model: Context Graphs, branching-factor heuristics, slice failure modes, and the boundary between deterministic extraction and stochastic generation.

From One Slice to a Context Graph

That tax slice did not come from intuition. It came from a larger topology.

Slicing is the act of pulling one bounded neighborhood out of that larger graph. Anchor chooses the starting node. Expand follows only the edges required to make the task executable. Prune enforces the budget so the slice stays reviewable.

From Context Windows to Context Graphs

Context Architecture is the discipline of engineering how context is selected, structured, and bounded so a large language model (LLM), or any agent, can act deterministically.

The core artifact of Context Architecture is the Context Graph. It turns a repository into deterministic units plus typed relationships. Nodes can be files, functions, classes, symbols, documentation sections, API contracts, policy clauses, runbook steps, glossary entries, incident records, Immune System cases, or specific lines. Edges express relations such as “depends on,” “implements,” “is validated by,” “calls,” “references,” and “is documented by.”

In polyglot repositories, the graph also needs identity. A node is not just “a folder.” It is “a Rust crate built with Cargo” or “a Python package built with pyproject.” That identity is how the Driver Pattern selects the right mechanism without guessing.

Example: node identity (computed deterministically)

Identity is not something the model should infer heuristically. It is something your deterministic extractors compute by reading manifests (Cargo.toml, pyproject.toml) and attaching the result to nodes.
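A minimal sketch of such an extractor, assuming identity is derived purely from which manifest file is present (the identity strings and fallback are illustrative, not canonical):

```python
from pathlib import Path

def node_identity(pkg_dir: str) -> str:
    """Compute a package node's identity from its manifest files.

    Deterministic: identity comes from which manifest exists on disk,
    never from model inference.
    """
    root = Path(pkg_dir)
    if (root / "Cargo.toml").exists():
        return "rust-crate/cargo"
    if (root / "pyproject.toml").exists():
        return "python-package/pyproject"
    if (root / "package.json").exists():
        return "node-package/npm"
    return "unknown"
```

Attach the result to the graph node at extraction time, so the Driver Pattern can select a build or test mechanism by lookup rather than by guessing.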

Instead of a monolithic blob, a Context Graph lets you slice the exact subset of information relevant to one task. The slice is produced by deterministic rules. It should carry three things: the evidence itself, why it matters to the task, and how it should be interpreted. A good slice can mix hard and soft memory surfaces: an API schema, a runbook clause, a policy rule, and a tone guide can all be relevant, but they imply different kinds of enforcement.

Deterministic scope routing (a self-referential example)

Slicing starts one step earlier than “pick files”: it starts by deciding which plane you are in (docs, code, infra, data).

Do that routing deterministically (path rules, manifest detection), not by asking a model to guess. The heuristics can be intentionally plain:
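A minimal sketch of such plain heuristics, assuming hypothetical path conventions (the prefixes and default are illustrative; tighten them to your repository's layout):

```python
def route_plane(path: str) -> str:
    """Deterministically route a repo path to its plane: docs, code, infra, or data."""
    rules = [
        ("docs/", "docs"),
        ("infra/", "infra"),
        ("terraform/", "infra"),
        ("data/", "data"),
        ("src/", "code"),
        ("tests/", "code"),
    ]
    for prefix, plane in rules:
        if path.startswith(prefix):
            return plane
    if path.endswith((".md", ".rst")):
        return "docs"
    return "code"  # explicit default, so routing never depends on a model's guess
```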

That is Context Architecture in practice: deterministic routing before any stochastic step runs.

Building the Context Graph (Deterministic Extraction)

You don’t need a perfect graph to start. You need a deterministic extractor that produces stable nodes and stable edges.

If you want implementation guidance (storage options, incremental rebuilds, and query shapes), see the Context Graph implementation guide in Appendix C.

At minimum, build it in two passes:

  1. Extract nodes: files, symbols, doc sections, schemas, Immune System cases.
  2. Derive edges: imports/calls, “documented by”, “validated by”, “exercises”.

You don’t need a fancy query language on day one. You need one reliable move: given an anchor node, return its bounded neighborhood.

Example: anchor → slice (conceptual)
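Conceptually: start at the anchor, expand one hop along typed edges, stop at the budget. A minimal sketch, assuming a toy graph shape of node -> list of (edge_type, neighbor) pairs (node names follow the tax example):

```python
def slice_graph(graph, anchor, max_nodes=6):
    """One reliable move: anchor, expand one hop along typed edges, prune by budget.

    Returns the bounded slice as a sorted list so output order is deterministic.
    """
    keep = {anchor}
    for edge_type, neighbor in graph.get(anchor, []):
        if edge_type in {"calls", "validated_by", "documented_by"}:
            keep.add(neighbor)
        if len(keep) >= max_nodes:
            break  # budget is part of correctness, not an afterthought
    return sorted(keep)

graph = {
    "tests/test_tax_service.py#test_high_earner": [
        ("calls", "src/tax_service.py#calculate_income_tax"),
        ("documented_by", "docs/tax_rules.md#progressive_brackets"),
    ],
}
```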

The companion repo (github.com/kjwise/aoi_code) includes make graph and make slice targets that demonstrate this. build_context_graph.py is a simple extractor, and slice_context_graph.py consumes the graph to produce a bounded context packet.

How to Slice a Context Graph for Deterministic Action

The goal of slicing is precision: the minimum context that keeps determinism high and drift low.

Use a deterministic four-step slice:

  1. Anchor first: start from the failing case, target symbol, or interface being changed.
    • Good anchors are concrete and machine-addressable (file + symbol, failing test id, validator code).
    • If you cannot name the anchor precisely, you are still in diagnosis mode, not execution mode.
  2. Map first: include contracts, schemas, mission policy, and only docs/config that constrain correctness.
    • Prefer sources that define “must” conditions (schema, API contract, policy file) over explanatory prose.
    • Keep authority explicit: what is allowed to change, and what counts as done.
  3. Terrain second: include anchor implementation, one-hop dependencies, and the Judge surface that proves correctness.
    • Pull runtime behavior and test/assertion surfaces together so edits can be validated immediately.
    • Avoid pulling whole subsystems when a one-hop neighborhood is enough.
  4. Prune by budget: remove unrelated modules/history and enforce node/token/diff budgets.
    • Budget is part of correctness, not just performance.
    • If scope grows, make that an explicit mission change instead of silent drift.

If required dependencies still do not fit, split the Mission or chunk deterministically.

Minimal slice manifest (deterministic handoff)

Treat each slice as an explicit artifact, not an implicit prompt. A minimal manifest can be serialized and audited:

mission_id: tax-fix-2026-02-23
anchor:
  file: tests/test_tax_service.py
  symbol: test_calculate_income_tax_high_earner_scenario
map:
  - docs/tax_rules.md#progressive_brackets
terrain:
  - src/tax_service.py#calculate_income_tax
budgets:
  max_files_changed: 1
  max_lines_changed: 50
  max_attempts: 4
gates:
  - signature_unchanged
  - unit_test_tax_case

Two reasons this helps: the slice becomes an auditable, replayable artifact instead of an implicit prompt, and failures turn into structured manifest updates rather than prompt sprawl.

If a run fails, update the manifest deterministically (scope, budgets, gates) instead of growing ad-hoc prompt text.

Slice QA checklist (before you call the model)

Run a quick deterministic check: Is the anchor machine-addressable? Does the Map include every contract that defines correctness? Is the Terrain limited to one-hop dependencies? Are budgets and gates declared?
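A minimal sketch of automating that check against the slice manifest shape shown earlier (field names follow the example manifest; the defect strings are illustrative):

```python
def qa_slice(manifest: dict) -> list:
    """Deterministic pre-flight checks on a slice manifest.

    Returns a list of defects; an empty list means the slice is ready
    to hand to generation.
    """
    defects = []
    anchor = manifest.get("anchor", {})
    if not (anchor.get("file") and anchor.get("symbol")):
        defects.append("anchor is not machine-addressable")
    if not manifest.get("map"):
        defects.append("no Map node defines correctness")
    if not manifest.get("gates"):
        defects.append("no gate can prove the fix")
    if not manifest.get("budgets"):
        defects.append("no budgets bound the edit")
    return defects
```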

If any of these fail, repair the slice before generation. Many “model quality” complaints in production loops are unresolved slice-quality defects.

Context priority stack (what to include first)

When you have a fixed context budget, prioritization matters more than cleverness. A practical ordering:

  1. Authority: system instructions + Mission Object (what is allowed, what “done” means).
  2. Latest deterministic findings: the last Judge output (structured failures, file/line, error codes).
  3. The anchor surface: the exact file/region/symbol you’re changing.
  4. Interfaces and contracts: signatures, schemas, route tables, inventories (skeleton-first).
  5. Immune System expectations: the one failing case (or the one contract-compat check) and the assertion surface that defines “correct.”
  6. Direct dependencies: one-hop imports/calls/types needed to make the anchor executable.
  7. Everything else: only if you still have budget, and only if you can justify the edge.

If you’re forced to choose between “more code” and “the contract,” choose the contract. The fastest way to create drift is to let the Effector guess what the contract was supposed to be.
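The stack above can be packed mechanically. A sketch of budget-aware packing, assuming each context surface carries a priority tier and a rough token cost (names are illustrative):

```python
def pack_context(items, budget_tokens):
    """Greedily pack context surfaces by priority tier until the budget is hit.

    `items` is a list of (priority, name, token_cost); a lower priority
    number means include first (authority, then findings, then anchor, ...).
    """
    used, packed = 0, []
    for priority, name, cost in sorted(items):
        if used + cost > budget_tokens:
            continue  # skip whole surfaces; never partially include one
        packed.append(name)
        used += cost
    return packed
```

Skipping (rather than truncating) a surface that does not fit keeps every included surface whole, which is what lets a gate reason about it later.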

Chunking patterns (when the slice still doesn’t fit)

Sometimes the minimal neighborhood still exceeds the budget (big files, big contracts, or a high fan-out module). When that happens, chunking is not “split the text.” It’s: choose a boundary that preserves meaning, then keep Physics in the loop.

Common chunking patterns and trade-offs:

  1. File-level chunking: each file is a chunk. Simple, but weak on cross-file reasoning unless you also include extracted interfaces.
  2. Dependency-aware chunking: anchor file + one-hop dependencies as separate chunks. Strong default for code, because boundaries follow imports/calls.
  3. Task-focused chunking: split by role, not by size: contract chunk, implementation chunk, failing-case chunk. This often outperforms naive file splits.
  4. Sliding-window chunking (large files): split one file into overlapping regions, but treat overlap as evidence, not permission to edit everywhere. Always pair this with allowed edit regions.

Recombining outputs should also be deterministic:
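One sketch of deterministic recombination: apply chunk outputs in a fixed order and run a gate after each merge, so a failure is localized to one chunk instead of smeared across the whole diff (the gate callable is an assumption about your validator surface):

```python
def recombine(chunk_results, gate):
    """Merge chunk outputs deterministically.

    `chunk_results` maps chunk id -> output; chunks are applied in sorted
    id order, and `gate` is called after each merge so the first failing
    chunk is named explicitly.
    """
    merged = []
    for chunk_id in sorted(chunk_results):
        merged.append(chunk_results[chunk_id])
        if not gate(merged):
            raise ValueError(f"gate failed after chunk {chunk_id}")
    return merged
```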

Operationally, treat chunks as review units: open a small PR per chunk (stack if you want). Merge sequentially so each diff stays inside the Review Budget and leaves a clean Ledger trace. Only squash at the end if your merge policy requires it.

Chunking is a sign your slice is doing real work. It is not a failure. It’s a reminder that context is a resource you budget, not an ocean you swim in.

Retrieval-Augmented Generation (RAG) as a slice hint, not a decision

Retrieval-augmented generation can help when you don’t know where the relevant context lives. But retrieval is not the same thing as slicing. Retrieval proposes candidates; slicing enforces boundaries.

A pragmatic approach in codebases is hybrid: let retrieval propose candidate nodes, then admit candidates into the slice only through the same deterministic rules (anchor, typed one-hop expansion, budgets).

Guardrails: retrieved content is evidence, not permission. It must not expand edit scope, budgets, or gates on its own, and anything it proposes still has to survive the prune step.

If you don’t have retrieval infrastructure, don’t block on it. You can get far with deterministic anchors (failing cases, stack traces, schemas) and cheap lexical search.

Summarization is lossy compression (use it last)

Summarization can compress context, but it introduces new failure modes: it can drop constraints, blur interfaces, or smuggle in invented structure.

Rules of thumb: summarize narrative context only, never contracts or interfaces; keep skeleton surfaces verbatim; and prefer pruning over summarizing when the budget is tight.

Token budgeting: leave headroom

Token counting is approximate. Different models use different tokenizers, and your request needs space for the model’s response.

Practical rule: reserve ~20% headroom. If your maximum context is B tokens, aim to use at most ~0.8B for inputs, then let the output and tooling fit in the rest.
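The arithmetic is tiny, and worth making explicit in code rather than re-deriving at every call site (a sketch):

```python
def input_budget(max_context_tokens: int, headroom: float = 0.2) -> int:
    """Cap input tokens, reserving headroom for the response and tooling.

    With a 128k window and the default 20% headroom, inputs get at most
    102_400 tokens; the remainder is left for output.
    """
    return int(max_context_tokens * (1.0 - headroom))
```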

Repo-shaped pseudo-example: Fixing contract drift with a bounded slice

Suppose a CI Validator fails because your API contract drifted:

[validator] FAIL: openapi_drift
  path=/api/users GET parameter.limit.type
  expected=integer
  actual=string

Your repository might look like this:

repo/
  openapi.json                 # Map: the public contract surface
  src/users_handler.ts         # Terrain: runtime behavior
  tests/test_users_api.py      # Immune System cases
  clients/mobile/src/api.ts    # downstream consumer
  clients/mobile/src/types.ts  # downstream contract surface

Anchor: the failing Validator finding (openapi_drift) is your anchor node. It already tells you where correctness lives (the contract) and what broke (a specific field/type).

Map first: include only the contract surface (and any policy that constrains it).

Terrain second: include only the code and the Immune System cases that interpret that contract.

Prune: do not include clients/mobile/ wholesale, the entire OpenAPI file, or unrelated endpoints. If a file isn’t on the dependency chain of the anchor, it’s noise.

Now the Effector is constrained by (1) a precise anchor, (2) a bounded contract slice, (3) the exact runtime path that implements it, and (4) gates that can prove the fix. That is what makes “fix this drift” deterministic instead of creative.
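One way this slice could be serialized, following the manifest shape used earlier (the node addresses, budgets, and gate names are illustrative, not from the companion repo):

```yaml
mission_id: openapi-drift-fix-2026-02-23
anchor:
  validator: openapi_drift
  finding: "/api/users GET parameter.limit.type"
map:
  - openapi.json#/paths/~1api~1users/get   # JSON Pointer into the contract
terrain:
  - src/users_handler.ts
  - tests/test_users_api.py
budgets:
  max_files_changed: 2
  max_lines_changed: 40
gates:
  - openapi_drift
  - unit_test_users_api
```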

Branching Factor (Rule of Seven)

Structure is a compression algorithm for attention. Keep your Map and Terrain queryable by maintaining a healthy branching factor (fan-out) at each layer.

Mechanism: quick branching-factor checks

The companion repo includes a make branching-factor target that runs lint_branching_factor.py to demonstrate this kind of linting.
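As a sketch of what such a linter checks, assuming a Rule-of-Seven limit on directory fan-out (this is the shape of the check, not the companion repo's script):

```python
from pathlib import Path

def fanout_violations(root: str, limit: int = 7):
    """Flag directories whose fan-out exceeds the Rule of Seven.

    Each layer of the Map/Terrain should stay small enough to scan at
    a glance; returns (directory, child_count) pairs over the limit.
    """
    bad = []
    for d in [Path(root), *Path(root).rglob("*")]:
        if d.is_dir():
            children = [c for c in d.iterdir() if not c.name.startswith(".")]
            if len(children) > limit:
                bad.append((str(d), len(children)))
    return sorted(bad)
```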

Failure Modes: The Wrong Slice

The easiest way to feel why slicing matters is to watch it fail. There are two common failure modes: a slice that is too small (missing the contract) and a slice that is too big (noise and scope leak).

If you’re debugging a real loop, Appendix B has a longer failure mode gallery (including slice-too-big / slice-too-small patterns and how to recognize them quickly).

Slice too small: missing the contract

You want to Refine a bug in calculate_income_tax(). You do the “obvious” thing: give the model the whole file and ask it to make the failing test pass.

Wrong slice (too small): the Terrain alone (src/tax_service.py), with no Map node and no failing case.

The model guesses. It changes constants and logic that look plausible, but it has no way to verify intent.

[immune] FAIL: test_calculate_income_tax_high_earner_scenario
  expected: 42_750
  got:      35_000
[validator] FAIL: regression detected (previously passing scenario now fails)

Fix: include the failing test case and the Map node that defines correctness (tax rules / schema). Now the Judge can localize the failure and the loop has a real target.

Slice too big: noise and scope leak

The opposite failure is to “be safe” by throwing in everything:

Wrong slice (too big): the whole src/ tree, the full test suite, docs, and recent history.

Now the agent spends tokens on traversal and story-building. It may “fix” the bug by rewriting large sections, or by editing unrelated modules that happen to look adjacent.

[validator] FAIL: diff_budget exceeded (files_changed=12 > 3)
[validator] FAIL: out_of_scope_edit (touched src/payments/)

Fix: set budgets (max_nodes, max_tokens, diff budget), anchor hard (start from the failing case), and prune deterministically (imports/calls only, one-hop dependencies, one failing case). If the slice is still too large, that’s a signal to split the Mission Object into smaller steps.

Slicing is not about giving the model more. It’s about giving it the right evidence, in a bounded packet, so the loop can converge.

Skeleton-First Rule: Sensors Extract, Models Generate

Canonical rule (introduced in Chapter 2): deterministic Sensors extract the skeleton, and models generate only the flesh.

Typical skeleton surfaces: function signatures, API schemas, route tables, and dependency inventories.

Lock these with Validators before generation. If a candidate changes skeleton unintentionally, fail fast and retry with tighter scope.
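A minimal sketch of one such validator for Python surfaces, using the standard library's ast module to extract top-level function signatures deterministically (the helper names are illustrative):

```python
import ast

def public_signatures(source: str) -> set:
    """Extract the skeleton: top-level function names and their argument lists.

    Deterministic Sensor work: no model involved, so the result is stable
    across runs and safe to gate on.
    """
    sigs = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            sigs.add(f"{node.name}({args})")
    return sigs

def skeleton_unchanged(before: str, after: str) -> bool:
    """Fail fast if a candidate edit touched the skeleton."""
    return public_signatures(before) == public_signatures(after)
```

Compare the skeleton before and after a candidate edit; if it changed unintentionally, reject and retry with tighter scope.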

Lehman gives useful language here. The Skeleton is an S-Type artifact: it can be formally verified and extracted by deterministic Sensors, so it should not be generated heuristically. The Flesh is a P-Type problem: descriptions, refactorings, and draft code rely on heuristics. The whole system remains E-Type because it lives inside a changing world. Slicing is the discipline of drawing that boundary correctly: what stays read-only truth, and what is the allowable generative area.

Strategic Component Composition (S-Type Boundaries)

As your Context Graph grows, you face a composition problem. If you connect two P-Type generative steps directly, variance multiplies. The combined system becomes harder to test, reason about, and govern.

The solution is strategic component composition: compose probabilistic systems using deterministic boundaries. Reuse existing high-quality components first, internal or OSS, and wrap them in S-Type interfaces, extractors, and validators so each seam stays testable. That keeps the E-Type whole evolvable without turning every internal boundary into a stochastic dependency.

Worked Example: Debugging a Tax Calculation with a Precise Slice

This is the same pattern from the contract-drift example, on a smaller surface. The goal is to repair behavior without widening blast radius.

Apply the same pattern: anchor on the failing test, pull the tax-rule Map node, include only the implementing function as Terrain, and bound the edit with validators.

One way to represent that work packet:

anchor: tests/test_tax_service.py#test_calculate_income_tax_high_earner_scenario
map:
  - docs/tax_rules.md#progressive_brackets
terrain:
  - src/tax_service.py#calculate_income_tax
validators:
  - signature_unchanged(calculate_income_tax)
  - diff_budget(files<=1, lines<=50)

Now the Judge has the contract, symptom, and exact edit surface. This is constrained repair, not open-ended generation. If this still thrashes, split by rule boundary (one bracket/rule per mission) rather than broadening the slice.

The examples/tax_service directory in the companion repo provides a full, runnable example of this, including the test, a buggy implementation, and the tools to build and slice its context graph.

Actionable: What you can do this week

For one real task this week, build a manual slice instead of sending whole files. Include:

  1. The anchor node (e.g., a function, a failing test case, or a validator code plus path).

  2. Any meta/ or docs/ files that define the intent or constraints.

  3. Only the directly relevant code dependencies.

  4. Then, ruthlessly prune anything that isn’t essential.

Compare it against your usual broad context: token use, failure rate, and scope leaks.
