Chapter 6 – Context Architecture (Why Slicing Matters)
Start with a failing tax case:
FAIL test_calculate_income_tax_high_earner_scenario
expected: 42_750
actual: 43_500
Do not hand the model the whole repo. Build one bounded slice:
- Anchor: the failing test case.
- Expand: the tax rule table, the function under test, and the one-hop helpers it calls.
- Prune: everything else, including unrelated endpoints, docs, and historical tickets.
One minimal slice manifest looks like this:
anchor:
file: tests/test_tax_service.py
symbol: test_calculate_income_tax_high_earner_scenario
map:
- docs/tax_rules.md#progressive_brackets
terrain:
- src/tax_service.py#calculate_income_tax
- src/tax_service.py#round_currency
gates:
- unit_test_tax_case
- signature_unchanged

That packet is small enough to reason about and strict enough to grade. It is the practical slicing rule in one screen: Anchor -> Expand -> Prune.
Part I showed the loop. This chapter defines the context architecture that makes that loop deterministic.
Context windows keep expanding, but attention is finite. That limit is Physics. How you select, structure, and bound the window is architecture.
Slicing is not about fitting more tokens. It is about concentrating attention on contracts, evidence, and boundaries so generation does not wander into drift. When context grows faster than attention quality, outputs can look complete while quietly violating constraints. Slicing is the control surface that keeps that failure mode visible and testable.
Shaping the Optimization Terrain
This is the geometry view of slicing. You are not dumping the whole repository into the window. You reshape the terrain so valid paths are easier and drift is harder.
Intent defines the target basin. The slice, contracts, and validators warp the solution space so each loop iteration takes the shortest safe path toward the optimum you actually want, not a nearby but wrong local maximum.
Even when models accept huge contexts, attention is not uniform (“lost in the middle”). Structure is compression for attention: it shrinks the surface where generation can invent structure or leak scope.
In practice, a well-kept Map carries more meaning per token than dumping full Terrain. Reliable systems do not just have context; they engineer it.
Maps also exist at more than one scale. At the smallest scale, a Map might be a schema, a route table, or a dependency inventory. At larger scales, it becomes the operating Map: architecture decisions, operating policies, incident learnings, onboarding guidance, brand voice, tone, and the constraints that tell the system what it is trying to preserve. Context Architecture decides which layer of that Map belongs in a given slice.
Chapter 2 gave the operational version of slicing in
Prep. Here we add the fuller model: Context Graphs,
branching-factor heuristics, slice failure modes, and the boundary
between deterministic extraction and stochastic generation.
From One Slice to a Context Graph
That tax slice did not come from intuition. It came from a larger topology.
- The failing test is one node.
- The tax rule document is a Map node that defines correctness.
- calculate_income_tax and its helper functions are Terrain nodes.
- The test and validator surfaces are Judge nodes.
Slicing is the act of pulling one bounded neighborhood out of that
larger graph. Anchor chooses the starting node.
Expand follows only the edges required to make the task
executable. Prune enforces the budget so the slice stays
reviewable.
From Context Windows to Context Graphs
Context Architecture is the discipline of engineering how context is selected, structured, and bounded so a large language model (LLM), or any agent, can act deterministically.
The core artifact of Context Architecture is the Context Graph. It turns a repository into deterministic units plus typed relationships. Nodes can be files, functions, classes, symbols, documentation sections, API contracts, policy clauses, runbook steps, glossary entries, incident records, Immune System cases, or specific lines. Edges express relations such as “depends on,” “implements,” “is validated by,” “calls,” “references,” and “is documented by.”
In polyglot repositories, the graph also needs identity. A node is not just “a folder.” It is “a Rust crate built with Cargo” or “a Python package built with pyproject.” That identity is how the Driver Pattern selects the right mechanism without guessing.
Example: node identity (computed deterministically)
- services/ledger/ → Rust package (cargo test)
- services/api/ → Python package (pytest)
Identity is not something the model should infer heuristically. It is
something your deterministic extractors compute by reading manifests
(Cargo.toml, pyproject.toml) and attaching the
result to nodes.
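The extractor side of this can be sketched in a few lines. This is an illustrative sketch, not the companion repo's code; the manifest-to-identity table is an assumption for the example.

```python
from pathlib import Path

# Hypothetical manifest -> identity table. The entries are assumptions
# for illustration, not a fixed registry.
MANIFESTS = {
    "Cargo.toml": ("rust", "cargo test"),
    "pyproject.toml": ("python", "pytest"),
}

def node_identity(pkg_dir: str) -> dict:
    """Compute a package node's identity by reading manifests on disk."""
    root = Path(pkg_dir)
    for manifest, (lang, test_cmd) in MANIFESTS.items():
        if (root / manifest).is_file():
            return {"path": pkg_dir, "language": lang, "test_command": test_cmd}
    # No recognized manifest: surface that fact instead of guessing.
    return {"path": pkg_dir, "language": "unknown", "test_command": None}
```

Because identity comes from files on disk, the same input always yields the same node attributes, which is exactly what the Driver Pattern needs.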
Instead of a monolithic blob, a Context Graph lets you slice the exact subset of information relevant to one task. The slice is produced by deterministic rules. It should carry three things: the evidence itself, why it matters to the task, and how it should be interpreted. A good slice can mix hard and soft memory surfaces: an API schema, a runbook clause, a policy rule, and a tone guide can all be relevant, but they imply different kinds of enforcement.
Deterministic scope routing (a self-referential example)
Slicing starts one step earlier than “pick files”: it starts by deciding which plane you are in (docs, code, infra, data).
Do that routing deterministically (path rules, manifest detection), not by asking a model to guess. The heuristics can be intentionally plain:
- *.md targets are treated as book scope.
- *.py targets are treated as code scope.
- directories that contain a book/ subtree are treated as book scope.
That is Context Architecture in practice: deterministic routing before any stochastic step runs.
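Those plain heuristics fit in one small function. A minimal sketch, assuming this chapter's book/code split; real routers add planes for infra and data:

```python
# Deterministic scope routing: path rules decide the plane before any
# stochastic step runs. The rule set is intentionally small and auditable.
def route_scope(target: str) -> str:
    parts = target.split("/")
    if "book" in parts:
        return "book"      # directories containing a book/ subtree
    if target.endswith(".md"):
        return "book"      # markdown targets are book scope
    if target.endswith(".py"):
        return "code"      # python targets are code scope
    return "unrouted"      # refuse to guess; demand explicit routing
```

The "unrouted" fallback matters: an unroutable target is a missing rule to add, not a question for the model.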
Building the Context Graph (Deterministic Extraction)
You don’t need a perfect graph to start. You need a deterministic extractor that produces stable nodes and stable edges.
If you want implementation guidance (storage options, incremental rebuilds, and query shapes), see the Context Graph implementation guide in Appendix C.
At minimum, build it in two passes:
- Extract nodes: files, symbols, doc sections, schemas, Immune System cases.
- Derive edges: imports/calls, “documented by”, “validated by”, “exercises”.
You don’t need a fancy query language on day one. You need one reliable move:
- choose an anchor
- include the smallest neighborhood that makes the anchor executable (contract + code + tests)
- prune aggressively (node/token budgets, one-hop edges)
Example: anchor → slice (conceptual)
- Anchor: the failing case.
- Map: the contract (schema / rules doc) that defines correctness.
- Terrain: the implementation under test + direct dependencies.
- Prune: one failing case, one-hop edges, strict budgets.
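The one reliable move can be sketched against a toy edge map. The graph shape here (node mapped to its one-hop dependencies) is an assumption for illustration, not the companion repo's storage format:

```python
def slice_one_hop(edges: dict, anchor: str, max_nodes: int) -> list:
    """Select the anchor plus its one-hop neighborhood, under a node budget."""
    selected = [anchor]
    for neighbor in edges.get(anchor, []):
        if len(selected) >= max_nodes:
            break          # the budget is part of correctness, not a nicety
        if neighbor not in selected:
            selected.append(neighbor)
    return selected
```

Note that pruning is just the budget check: anything past `max_nodes` never enters the slice, and growing the budget is an explicit decision.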
The companion repo (github.com/kjwise/aoi_code) includes
make graph and make slice targets that
demonstrate this. build_context_graph.py is a simple
extractor, and slice_context_graph.py consumes the graph to
produce a bounded context packet.
How to Slice a Context Graph for Deterministic Action
The goal of slicing is precision: the minimum context that keeps determinism high and drift low.
Use a deterministic four-step slice:
- Anchor first: start from the failing case, target
symbol, or interface being changed.
- Good anchors are concrete and machine-addressable (file + symbol, failing test id, validator code).
- If you cannot name the anchor precisely, you are still in diagnosis mode, not execution mode.
- Map first: include contracts, schemas, mission
policy, and only docs/config that constrain correctness.
- Prefer sources that define “must” conditions (schema, API contract, policy file) over explanatory prose.
- Keep authority explicit: what is allowed to change, and what counts as done.
- Terrain second: include anchor implementation,
one-hop dependencies, and the Judge surface that proves correctness.
- Pull runtime behavior and test/assertion surfaces together so edits can be validated immediately.
- Avoid pulling whole subsystems when a one-hop neighborhood is enough.
- Prune by budget: remove unrelated modules/history
and enforce node/token/diff budgets.
- Budget is part of correctness, not just performance.
- If scope grows, make that an explicit mission change instead of silent drift.
If required dependencies still do not fit, split the Mission or chunk deterministically.
Minimal slice manifest (deterministic handoff)
Treat each slice as an explicit artifact, not an implicit prompt. A minimal manifest can be serialized and audited:
mission_id: tax-fix-2026-02-23
anchor:
file: tests/test_tax_service.py
symbol: test_calculate_income_tax_high_earner_scenario
map:
- docs/tax_rules.md#progressive_brackets
terrain:
- src/tax_service.py#calculate_income_tax
budgets:
max_files_changed: 1
max_lines_changed: 50
max_attempts: 4
gates:
- signature_unchanged
- unit_test_tax_case

Two reasons this helps:
- It separates selection (what evidence is in-bounds) from generation (what candidate to propose).
- It gives the Judge and Ledger a stable contract for replay, diffing, and incident review.
If a run fails, update the manifest deterministically (scope, budgets, gates) instead of growing ad-hoc prompt text.
Slice QA checklist (before you call the model)
Run a quick deterministic check:
- Anchor is explicit (file + symbol or validator code + path), not a vague sentence.
- At least one contract source is present (schema/interface/policy), not just implementation files.
- At least one Judge surface is present (failing case or validator assertion).
- Write scope is declared (allowlist or allowed regions), not implied.
- Budgets are finite (max_attempts, scope/diff limits), not open-ended.
- Retrieved evidence has provenance (file + line or section id).
- No unrelated subsystems are included “just in case.”
- The slice can be replayed without hidden chat state.
If any of these fail, repair the slice before generation. Many “model quality” complaints in production loops are unresolved slice-quality defects.
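The checklist can run as a deterministic pre-flight. The manifest field names below mirror the example manifests in this chapter but are assumptions, not a fixed schema:

```python
# Slice QA as code: each checklist rule becomes a boolean over the
# manifest dict, and the defect codes are this sketch's invention.
def slice_defects(manifest: dict) -> list:
    defects = []
    if not manifest.get("anchor"):
        defects.append("anchor_missing")
    if not manifest.get("map"):
        defects.append("no_contract_source")
    if not manifest.get("gates"):
        defects.append("no_judge_surface")
    if not manifest.get("budgets", {}).get("max_attempts"):
        defects.append("open_ended_attempts")
    return defects  # empty list means the slice may proceed to generation
```

Running this before every generation call turns "the slice felt wrong" into a named defect you can fix and replay.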
Context priority stack (what to include first)
When you have a fixed context budget, prioritization matters more than cleverness. A practical ordering:
- Authority: system instructions + Mission Object (what is allowed, what “done” means).
- Latest deterministic findings: the last Judge output (structured failures, file/line, error codes).
- The anchor surface: the exact file/region/symbol you’re changing.
- Interfaces and contracts: signatures, schemas, route tables, inventories (skeleton-first).
- Immune System expectations: the one failing case (or the one contract-compat check) and the assertion surface that defines “correct.”
- Direct dependencies: one-hop imports/calls/types needed to make the anchor executable.
- Everything else: only if you still have budget, and only if you can justify the edge.
If you’re forced to choose between “more code” and “the contract,” choose the contract. The fastest way to create drift is to let the Effector guess what the contract was supposed to be.
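The priority stack becomes mechanical once it is a list. A sketch of budget-aware assembly; whitespace token counting is a deliberate simplification (real loops use the model's tokenizer), and the section names are assumptions:

```python
# Walk the stack in priority order; a section is included only if it
# still fits. Lower-priority sections are skipped, never truncated.
PRIORITY = ["authority", "judge_findings", "anchor", "contracts",
            "immune_cases", "dependencies", "extras"]

def assemble_context(sections: dict, max_tokens: int) -> list:
    included, used = [], 0
    for name in PRIORITY:
        text = sections.get(name, "")
        cost = len(text.split())
        if text and used + cost <= max_tokens:
            included.append(name)
            used += cost
    return included
```

Skip-rather-than-truncate is the design choice: a half-included contract is worse than an omitted background doc.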
Chunking patterns (when the slice still doesn’t fit)
Sometimes the minimal neighborhood still exceeds the budget (big files, big contracts, or a high fan-out module). When that happens, chunking is not “split the text.” It’s: choose a boundary that preserves meaning, then keep Physics in the loop.
Common chunking patterns and trade-offs:
- File-level chunking: each file is a chunk. Simple, but weak on cross-file reasoning unless you also include extracted interfaces.
- Dependency-aware chunking: anchor file + one-hop dependencies as separate chunks. Strong default for code, because boundaries follow imports/calls.
- Task-focused chunking: split by role, not by size: contract chunk, implementation chunk, failing-case chunk. This often outperforms naive file splits.
- Sliding-window chunking (large files): split one file into overlapping regions, but treat overlap as evidence, not permission to edit everywhere. Always pair this with allowed edit regions.
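Task-focused chunking, the strongest of these defaults for repair work, can be sketched directly. The path conventions (`docs/`, `tests/`, `.json` as contract surfaces) are assumptions for this example:

```python
# Split slice nodes by role, not by size: each chunk stays meaningful
# on its own and maps to one kind of evidence.
def chunk_by_role(nodes: list) -> dict:
    chunks = {"contract": [], "failing_case": [], "implementation": []}
    for node in nodes:
        if node.startswith("docs/") or node.endswith(".json"):
            chunks["contract"].append(node)      # defines correctness
        elif node.startswith("tests/"):
            chunks["failing_case"].append(node)  # proves correctness
        else:
            chunks["implementation"].append(node)
    return chunks
```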
Recombining outputs should also be deterministic:
- Prefer diff-shaped outputs against a working tree, not free-form rewrites.
- Validate each candidate locally (parse/scope first) before you attempt to merge it with other candidates.
- If chunk A implies a change in chunk B, don’t “guess the other chunk.” Expand the slice and run another iteration with the missing contract/code present.
Operationally, treat chunks as review units: open a small PR per chunk (stack if you want). Merge sequentially so each diff stays inside the Review Budget and leaves a clean Ledger trace. Only squash at the end if your merge policy requires it.
Chunking is a sign your slice is doing real work. It is not a failure. It’s a reminder that context is a resource you budget, not an ocean you swim in.
Retrieval-Augmented Generation (RAG) as a slice hint, not a decision
Retrieval-augmented generation can help when you don’t know where the relevant context lives. But retrieval is not the same thing as slicing. Retrieval proposes candidates; slicing enforces boundaries.
A pragmatic approach in codebases is hybrid:
- Lexical retrieval: symbol/name search (fast, high precision when you have good anchors).
- Structural retrieval: import graphs, call graphs, test-to-code mappings (deterministic, semantics-aware).
- Embedding retrieval: useful for prose and “what file talks about X” questions, less reliable as a sole signal for code edits.
Guardrails:
- Treat retrieved text as evidence with provenance (file + line), not authority.
- Run deterministic filters after retrieval: scope allowlists, dependency rules, and node/token budgets.
- Expect retrieval errors (false positives/negatives). If the loop is thrashing, treat it as a slice problem before you blame the model.
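These guardrails can be sketched as one deterministic post-filter that runs after any retriever, lexical, structural, or embedding-based; the prefix-allowlist shape is an assumption:

```python
# "Retrieval proposes, slicing enforces": apply a scope allowlist and a
# node budget to candidates regardless of how they were retrieved.
def filter_candidates(candidates: list, allowlist: list, max_nodes: int) -> list:
    in_scope = [c for c in candidates
                if any(c.startswith(prefix) for prefix in allowlist)]
    return in_scope[:max_nodes]   # budget enforced after scoping
```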
If you don’t have retrieval infrastructure, don’t block on it. You can get far with deterministic anchors (failing cases, stack traces, schemas) and cheap lexical search.
Summarization is lossy compression (use it last)
Summarization can compress context, but it introduces new failure modes: it can drop constraints, blur interfaces, or smuggle in invented structure.
Rules of thumb:
- Never summarize the Mission Object or the latest Judge findings.
- Prefer deterministic extraction (skeleton-first) over stochastic summarization for contracts and interfaces.
- If you must summarize, do it on low-authority evidence (background docs) and carry provenance so a human can verify.
- Treat summaries as a cache you can invalidate: when a summary becomes suspicious, re-extract from the Terrain.
Token budgeting: leave headroom
Token counting is approximate. Different models use different tokenizers, and your request needs space for the model’s response.
Practical rule: reserve ~20% headroom. If your maximum context is B tokens, aim to use at most ~0.8B for inputs, then let the output and tooling fit in the rest.
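As code, the headroom rule is a one-liner; the 20% default is this chapter's rule of thumb, not a model-specific constant:

```python
# Cap inputs at (1 - headroom) of the model's maximum context and leave
# the rest for the response and tooling overhead.
def input_budget(max_context: int, headroom: float = 0.2) -> int:
    return int(max_context * (1.0 - headroom))
```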
Repo-shaped pseudo-example: Fixing contract drift with a bounded slice
Suppose a CI Validator fails because your API contract drifted:
[validator] FAIL: openapi_drift
path=/api/users GET parameter.limit.type
expected=integer
actual=string
Your repository might look like this:
repo/
openapi.json # Map: the public contract surface
src/users_handler.ts # Terrain: runtime behavior
tests/test_users_api.py # Immune System cases
clients/mobile/src/api.ts # downstream consumer
clients/mobile/src/types.ts # downstream contract surface
Anchor: the failing Validator finding
(openapi_drift) is your anchor node. It already tells you
where correctness lives (the contract) and what broke
(a specific field/type).
Map first: include only the contract surface (and any policy that constrains it).
- the exact openapi.json region that defines GET /api/users and limit
- the Mission Object for the repair (scope allowlist + budgets + acceptance criteria), for example:
goal: "Restore contract alignment for GET /api/users pagination"
scope_allowlist:
  - openapi.json
  - src/users_handler.ts
  - clients/mobile/src/types.ts
budgets:
  max_files_changed: 3
  max_lines_changed: 80
gates:
  - validate_openapi_compat
  - unit_tests
  - mobile_typecheck
Terrain second: include only the code and the Immune System cases that interpret that contract.
- src/users_handler.ts (the parsing/serialization path for limit)
- the one failing check (or contract-compat case) that proves the drift
- clients/mobile/src/types.ts only if it is directly coupled to the contract surface (otherwise omit it)
Prune: do not include
clients/mobile/ wholesale, the entire OpenAPI file, or
unrelated endpoints. If a file isn’t on the dependency chain of the
anchor, it’s noise.
Now the Effector is constrained by (1) a precise anchor, (2) a bounded contract slice, (3) the exact runtime path that implements it, and (4) gates that can prove the fix. That is what makes “fix this drift” deterministic instead of creative.
Branching Factor (Rule of Seven)
Structure is a compression algorithm for attention. Keep your Map and Terrain queryable by maintaining a healthy branching factor (fan-out) at each layer.
- Junk drawers (high fan-out): A directory with 50 files, or a Mission Object with 30 flat constraints, dilutes signal. Important details get washed out.
- Rabbit holes (low fan-out): Five layers of folders each containing one item wastes context on traversal and hides peer relationships.
- Heuristic: aim for ~7 (±2) siblings per node (directories, headings, grouped constraints). If you’re at 20, you’re missing a sub-abstraction. If you’re at 1, flatten.
Mechanism: quick branching-factor checks
- Count immediate children in a directory layer.
- Count sibling headings in a doc layer.
- If you’re consistently above ~10, create a sub-abstraction; if you’re at 1, flatten.
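These checks reduce to a verdict function. A sketch in the spirit of that kind of lint (this is not the companion repo's actual code); the thresholds follow the ~7 (±2) heuristic with ~10 as the split trigger:

```python
# Turn a raw child count into an actionable verdict for one layer of a
# directory tree or document outline.
def branching_verdict(child_count: int, high: int = 10) -> str:
    if child_count <= 1:
        return "flatten"   # rabbit hole: merge this layer with its parent
    if child_count > high:
        return "split"     # junk drawer: introduce a sub-abstraction
    return "ok"
```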
The companion repo includes a make branching-factor
target that runs lint_branching_factor.py to demonstrate
this kind of linting.
Failure Modes: The Wrong Slice
The easiest way to feel why slicing matters is to watch it fail. There are two common failure modes:
- Slice too small: you omit a critical constraint or signal.
- Slice too big: you drown the anchor in noise and invite scope leak.
If you’re debugging a real loop, Appendix B has a longer failure mode gallery (including slice-too-big / slice-too-small patterns and how to recognize them quickly).
Slice too small: missing the contract
You want to Refine a bug in calculate_income_tax(). You
do the “obvious” thing: give the model the whole file and ask it to make
the failing test pass.
Wrong slice (too small):
- Includes tax_service.py
- Omits the tax rules source (so the model can’t see the contract)
The model guesses. It changes constants and logic that look plausible, but it has no way to verify intent.
[immune] FAIL: test_calculate_income_tax_high_earner_scenario
expected: 30000
got: 35000
[validator] FAIL: regression detected (previously passing scenario now fails)
Fix: include the failing test case and the Map node that defines correctness (tax rules / schema). Now the Judge can localize the failure and the loop has a real target.
Slice too big: noise and scope leak
The opposite failure is to “be safe” by throwing in everything:
Wrong slice (too big):
- src/ (multiple modules, not just the tax function)
- tests/ (entire suite, not the failing case)
- docs/ (all tax docs, plus unrelated decisions)
Now the agent spends tokens on traversal and story-building. It may “fix” the bug by rewriting large sections, or by editing unrelated modules that happen to look adjacent.
[validator] FAIL: diff_budget exceeded (files_changed=12 > 3)
[validator] FAIL: out_of_scope_edit (touched src/payments/)
Fix: set budgets (max_nodes,
max_tokens, diff budget), anchor hard (start from the
failing case), and prune deterministically (imports/calls only, one-hop
dependencies, one failing case). If the slice is still too large, that’s
a signal to split the Mission Object into smaller steps.
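The two validator findings above are cheap to implement as deterministic gates. A sketch over an assumed diff-summary shape; the field names and failure codes are this example's inventions:

```python
# Grade a candidate diff against manifest budgets and the scope
# allowlist before any merge is considered.
def budget_failures(diff: dict, budgets: dict, allowlist: list) -> list:
    failures = []
    if diff["files_changed"] > budgets["max_files_changed"]:
        failures.append("diff_budget")
    for path in diff["paths"]:
        if not any(path.startswith(prefix) for prefix in allowlist):
            failures.append("out_of_scope_edit")
            break  # one out-of-scope edit is enough to fail the candidate
    return failures
```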
Slicing is not about giving the model more. It’s about giving it the right evidence, in a bounded packet, so the loop can converge.
Skeleton-First Rule: Sensors Extract, Models Generate
Canonical rule (introduced in Chapter 2): deterministic Sensors extract the skeleton, and models generate only the flesh.
Typical skeleton surfaces:
- signatures and interface contracts
- schemas and route definitions
- Immune System case names/assertion surfaces
- file paths, imports, and allowed edit regions
Lock these with Validators before generation. If a candidate changes skeleton unintentionally, fail fast and retry with tighter scope.
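For Python code, the skeleton can be extracted with a genuinely deterministic Sensor, the standard library's `ast` module. A minimal signature-lock sketch (argument names only; a fuller validator would also compare defaults, annotations, and class surfaces):

```python
import ast

# Extract function signatures deterministically, then fail fast if a
# candidate edit changed any of them: Flesh may change, Skeleton may not.
def signatures(source: str) -> dict:
    tree = ast.parse(source)
    return {node.name: [a.arg for a in node.args.args]
            for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef)}

def skeleton_changed(before: str, after: str) -> bool:
    return signatures(before) != signatures(after)
```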
Lehman gives useful language here. The Skeleton is an S-Type artifact: it can be formally verified and extracted by deterministic Sensors, so it should not be generated heuristically. The Flesh is a P-Type problem: descriptions, refactorings, and draft code rely on heuristics. The whole system remains E-Type because it lives inside a changing world. Slicing is the discipline of drawing that boundary correctly: what stays read-only truth, and what is the allowable generative area.
Strategic Component Composition (S-Type Boundaries)
As your Context Graph grows, you face a composition problem. If you connect two P-Type generative steps directly, variance multiplies. The combined system becomes harder to test, reason about, and govern.
The solution is strategic component composition: compose probabilistic systems using deterministic boundaries. Reuse existing high-quality components first, internal or OSS, and wrap them in S-Type interfaces, extractors, and validators so each seam stays testable. That keeps the E-Type whole evolvable without turning every internal boundary into a stochastic dependency.
Worked Example: Debugging a Tax Calculation with a Precise Slice
This is the same pattern from the contract-drift example, on a smaller surface. The goal is to repair behavior without widening blast radius.
Apply the same pattern:
- Anchor (Terrain signal): the failing case.
- Map (intent): the contract that defines correctness (schema, rules doc, API contract).
- Terrain (code): the function under test (and only the constants it touches).
- Physics (gates): budgets + Validators that lock the skeleton (signature unchanged) and bound the blast radius.
One way to represent that work packet:
anchor: tests/test_tax_service.py#test_calculate_income_tax_high_earner_scenario
map:
- docs/tax_rules.md#progressive_brackets
terrain:
- src/tax_service.py#calculate_income_tax
validators:
- signature_unchanged(calculate_income_tax)
- diff_budget(files<=1, lines<=50)
Now the Judge has the contract, symptom, and exact edit surface. This is constrained repair, not open-ended generation. If this still thrashes, split by rule boundary (one bracket/rule per mission) rather than broadening the slice.
The examples/tax_service directory in the companion repo
provides a full, runnable example of this, including the test, a buggy
implementation, and the tools to build and slice its context graph.
Actionable: What you can do this week
For one real task this week, build a manual slice instead of sending whole files. Include:
The anchor node (e.g., a function, a failing test case, or a validator code plus path).
Any meta/ or docs/ files that define the intent or constraints.
Only the directly relevant code dependencies.
Then, ruthlessly prune anything that isn’t essential.
Compare it against your usual broad context: token use, failure rate, and scope leaks.