Part II: Understand It – The Theory Behind Reliability

Chapter 6 – Context Architecture (Why Slicing Matters)

Start with a failing tax case:

FAIL test_calculate_income_tax_high_earner_scenario
expected: 42_750
actual:   43_500

Do not hand the model the whole repo. Build one bounded slice:

One minimal slice manifest looks like this:

anchor:
  file: tests/test_tax_service.py
  symbol: test_calculate_income_tax_high_earner_scenario
map:
  - docs/tax_rules.md#progressive_brackets
terrain:
  - src/tax_service.py#calculate_income_tax
  - src/tax_service.py#round_currency
gates:
  - unit_test_tax_case
  - signature_unchanged

That packet is small enough to reason about and strict enough to grade. It is the practical slicing rule in one screen: Anchor -> Expand -> Prune.

Part I showed the loop. This chapter defines the context architecture that makes that loop deterministic.

Context windows keep expanding, but attention is finite. That limit is Physics. How you select, structure, and bound the window is architecture.

Slicing is not about fitting more tokens. It is about concentrating attention on contracts, evidence, and boundaries so generation does not wander into drift. When context grows faster than attention quality, outputs can look complete while quietly violating constraints. Slicing is the control surface that keeps that failure mode visible and testable.

Shaping the Optimization Terrain

This is the geometry view of slicing. You are not dumping the whole repository into the window. You reshape the terrain so valid paths are easier and drift is harder.

Intent defines the target basin. The slice, contracts, and validators warp the solution space so each loop iteration takes the shortest safe path toward the optimum you actually want, not a nearby but wrong local maximum.

[Interactive figure: 3D optimization terrain, intent preset "Central Optima". Peaks are higher-value, contract-satisfying outcomes; valleys are failure states and drift.]

Even when models accept huge contexts, attention is not uniform (“lost in the middle”). Structure is compression for attention: it shrinks the surface where generation can invent structure or leak scope.

In practice, a well-kept Map carries more meaning per token than dumping full Terrain. Reliable systems do not just have context; they engineer it.

Maps also exist at more than one scale. At the smallest scale, a Map might be a schema, a route table, or a dependency inventory. At larger scales, it becomes the operating Map: architecture decisions, operating policies, incident learnings, onboarding guidance, brand voice, tone, and the constraints that tell the system what it is trying to preserve. Context Architecture decides which layer of that Map belongs in a given slice.

Chapter 2 gave the operational version of slicing in Prep. Here we add the fuller model: Context Graphs, branching-factor heuristics, slice failure modes, and the boundary between deterministic extraction and stochastic generation.

From One Slice to a Context Graph

That tax slice did not come from intuition. It came from a larger topology.

Slicing is the act of pulling one bounded neighborhood out of that larger graph. Anchor chooses the starting node. Expand follows only the edges required to make the task executable. Prune enforces the budget so the slice stays reviewable.

From Context Windows to Context Graphs

Context Architecture is the discipline of engineering how context is selected, structured, and bounded so a large language model (LLM), or any agent, can act deterministically.

The core artifact of Context Architecture is the Context Graph. It turns a repository into deterministic units plus typed relationships. Nodes can be files, functions, classes, symbols, documentation sections, API contracts, policy clauses, runbook steps, glossary entries, incident records, Immune System cases, or specific lines. Edges express relations such as “depends on,” “implements,” “is validated by,” “calls,” “references,” and “is documented by.”

In polyglot repositories, the graph also needs identity. A node is not just “a folder.” It is “a Rust crate built with Cargo” or “a Python package built with pyproject.” That identity is how the Driver Pattern selects the right mechanism without guessing.

Example: node identity (computed deterministically)

Identity is not something the model should infer heuristically. It is something your deterministic extractors compute by reading manifests (Cargo.toml, pyproject.toml) and attaching the result to nodes.
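A minimal sketch of such an extractor, assuming identity is derived purely from which manifest file is present (the identity strings and fallback are illustrative, not canonical):

```python
from pathlib import Path

def node_identity(pkg_dir: str) -> str:
    """Compute a package node's identity from its manifest files.

    Deterministic: identity comes from which manifest exists on disk,
    never from model inference.
    """
    root = Path(pkg_dir)
    if (root / "Cargo.toml").exists():
        return "rust-crate/cargo"
    if (root / "pyproject.toml").exists():
        return "python-package/pyproject"
    if (root / "package.json").exists():
        return "node-package/npm"
    return "unknown"
```

Attach the result to the graph node at extraction time, so the Driver Pattern can select a build or test mechanism by lookup rather than by guessing.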

Instead of a monolithic blob, a Context Graph lets you slice the exact subset of information relevant to one task. The slice is produced by deterministic rules. It should carry three things: the evidence itself, why it matters to the task, and how it should be interpreted. A good slice can mix hard and soft memory surfaces: an API schema, a runbook clause, a policy rule, and a tone guide can all be relevant, but they imply different kinds of enforcement.

Deterministic scope routing (a self-referential example)

Slicing starts one step earlier than “pick files”: it starts by deciding which plane you are in (docs, code, infra, data).

Do that routing deterministically (path rules, manifest detection), not by asking a model to guess. The heuristics can be intentionally plain:
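A minimal sketch of such plain heuristics, assuming hypothetical path conventions (the prefixes and default are illustrative; tighten them to your repository's layout):

```python
def route_plane(path: str) -> str:
    """Deterministically route a repo path to its plane: docs, code, infra, or data."""
    rules = [
        ("docs/", "docs"),
        ("infra/", "infra"),
        ("terraform/", "infra"),
        ("data/", "data"),
        ("src/", "code"),
        ("tests/", "code"),
    ]
    for prefix, plane in rules:
        if path.startswith(prefix):
            return plane
    if path.endswith((".md", ".rst")):
        return "docs"
    return "code"  # explicit default, so routing never depends on a model's guess
```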

That is Context Architecture in practice: deterministic routing before any stochastic step runs.

Building the Context Graph (Deterministic Extraction)

You don’t need a perfect graph to start. You need a deterministic extractor that produces stable nodes and stable edges.

If you want implementation guidance (storage options, incremental rebuilds, and query shapes), see the Context Graph implementation guide in Appendix C.

At minimum, build it in two passes:

  1. Extract nodes: files, symbols, doc sections, schemas, Immune System cases.
  2. Derive edges: imports/calls, “documented by”, “validated by”, “exercises”.

You don’t need a fancy query language on day one. You need one reliable move: given an anchor node, return its bounded neighborhood.

Example: anchor → slice (conceptual)
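Conceptually: start at the anchor, expand one hop along typed edges, stop at the budget. A minimal sketch, assuming a toy graph shape of node -> list of (edge_type, neighbor) pairs (node names follow the tax example):

```python
def slice_graph(graph, anchor, max_nodes=6):
    """One reliable move: anchor, expand one hop along typed edges, prune by budget.

    Returns the bounded slice as a sorted list so output order is deterministic.
    """
    keep = {anchor}
    for edge_type, neighbor in graph.get(anchor, []):
        if edge_type in {"calls", "validated_by", "documented_by"}:
            keep.add(neighbor)
        if len(keep) >= max_nodes:
            break  # budget is part of correctness, not an afterthought
    return sorted(keep)

graph = {
    "tests/test_tax_service.py#test_high_earner": [
        ("calls", "src/tax_service.py#calculate_income_tax"),
        ("documented_by", "docs/tax_rules.md#progressive_brackets"),
    ],
}
```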

The companion repo (github.com/kjwise/aoi_code) includes make graph and make slice targets that demonstrate this. build_context_graph.py is a simple extractor, and slice_context_graph.py consumes the graph to produce a bounded context packet.

How to Slice a Context Graph for Deterministic Action

The goal of slicing is precision: the minimum context that keeps determinism high and drift low.

Use a deterministic four-step slice:

  1. Anchor first: start from the failing case, target symbol, or interface being changed.
    • Good anchors are concrete and machine-addressable (file + symbol, failing test id, validator code).
    • If you cannot name the anchor precisely, you are still in diagnosis mode, not execution mode.
  2. Map first: include contracts, schemas, mission policy, and only docs/config that constrain correctness.
    • Prefer sources that define “must” conditions (schema, API contract, policy file) over explanatory prose.
    • Keep authority explicit: what is allowed to change, and what counts as done.
  3. Terrain second: include anchor implementation, one-hop dependencies, and the Judge surface that proves correctness.
    • Pull runtime behavior and test/assertion surfaces together so edits can be validated immediately.
    • Avoid pulling whole subsystems when a one-hop neighborhood is enough.
  4. Prune by budget: remove unrelated modules/history and enforce node/token/diff budgets.
    • Budget is part of correctness, not just performance.
    • If scope grows, make that an explicit mission change instead of silent drift.

If required dependencies still do not fit, split the Mission or chunk deterministically.

Minimal slice manifest (deterministic handoff)

Treat each slice as an explicit artifact, not an implicit prompt. A minimal manifest can be serialized and audited:

mission_id: tax-fix-2026-02-23
anchor:
  file: tests/test_tax_service.py
  symbol: test_calculate_income_tax_high_earner_scenario
map:
  - docs/tax_rules.md#progressive_brackets
terrain:
  - src/tax_service.py#calculate_income_tax
budgets:
  max_files_changed: 1
  max_lines_changed: 50
  max_attempts: 4
gates:
  - signature_unchanged
  - unit_test_tax_case

Two reasons this helps: the slice becomes an auditable, replayable artifact instead of an implicit prompt, and failures turn into structured manifest updates rather than prompt sprawl.

If a run fails, update the manifest deterministically (scope, budgets, gates) instead of growing ad-hoc prompt text.

Slice QA checklist (before you call the model)

Run a quick deterministic check: Is the anchor machine-addressable? Does the Map include every contract that defines correctness? Is the Terrain limited to one-hop dependencies? Are budgets and gates declared?
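A minimal sketch of automating that check against the slice manifest shape shown earlier (field names follow the example manifest; the defect strings are illustrative):

```python
def qa_slice(manifest: dict) -> list:
    """Deterministic pre-flight checks on a slice manifest.

    Returns a list of defects; an empty list means the slice is ready
    to hand to generation.
    """
    defects = []
    anchor = manifest.get("anchor", {})
    if not (anchor.get("file") and anchor.get("symbol")):
        defects.append("anchor is not machine-addressable")
    if not manifest.get("map"):
        defects.append("no Map node defines correctness")
    if not manifest.get("gates"):
        defects.append("no gate can prove the fix")
    if not manifest.get("budgets"):
        defects.append("no budgets bound the edit")
    return defects
```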

If any of these fail, repair the slice before generation. Many “model quality” complaints in production loops are unresolved slice-quality defects.

Context priority stack (what to include first)

When you have a fixed context budget, prioritization matters more than cleverness. A practical ordering:

  1. Authority: system instructions + Mission Object (what is allowed, what “done” means).
  2. Latest deterministic findings: the last Judge output (structured failures, file/line, error codes).
  3. The anchor surface: the exact file/region/symbol you’re changing.
  4. Interfaces and contracts: signatures, schemas, route tables, inventories (skeleton-first).
  5. Immune System expectations: the one failing case (or the one contract-compat check) and the assertion surface that defines “correct.”
  6. Direct dependencies: one-hop imports/calls/types needed to make the anchor executable.
  7. Everything else: only if you still have budget, and only if you can justify the edge.

If you’re forced to choose between “more code” and “the contract,” choose the contract. The fastest way to create drift is to let the Effector guess what the contract was supposed to be.
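The stack above can be packed mechanically. A sketch of budget-aware packing, assuming each context surface carries a priority tier and a rough token cost (names are illustrative):

```python
def pack_context(items, budget_tokens):
    """Greedily pack context surfaces by priority tier until the budget is hit.

    `items` is a list of (priority, name, token_cost); a lower priority
    number means include first (authority, then findings, then anchor, ...).
    """
    used, packed = 0, []
    for priority, name, cost in sorted(items):
        if used + cost > budget_tokens:
            continue  # skip whole surfaces; never partially include one
        packed.append(name)
        used += cost
    return packed
```

Skipping (rather than truncating) a surface that does not fit keeps every included surface whole, which is what lets a gate reason about it later.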

Chunking patterns (when the slice still doesn’t fit)

Sometimes the minimal neighborhood still exceeds the budget (big files, big contracts, or a high fan-out module). When that happens, chunking is not “split the text.” It’s: choose a boundary that preserves meaning, then keep Physics in the loop.

Common chunking patterns and trade-offs:

  1. File-level chunking: each file is a chunk. Simple, but weak on cross-file reasoning unless you also include extracted interfaces.
  2. Dependency-aware chunking: anchor file + one-hop dependencies as separate chunks. Strong default for code, because boundaries follow imports/calls.
  3. Task-focused chunking: split by role, not by size: contract chunk, implementation chunk, failing-case chunk. This often outperforms naive file splits.
  4. Sliding-window chunking (large files): split one file into overlapping regions, but treat overlap as evidence, not permission to edit everywhere. Always pair this with allowed edit regions.

Recombining outputs should also be deterministic:
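One sketch of deterministic recombination: apply chunk outputs in a fixed order and run a gate after each merge, so a failure is localized to one chunk instead of smeared across the whole diff (the gate callable is an assumption about your validator surface):

```python
def recombine(chunk_results, gate):
    """Merge chunk outputs deterministically.

    `chunk_results` maps chunk id -> output; chunks are applied in sorted
    id order, and `gate` is called after each merge so the first failing
    chunk is named explicitly.
    """
    merged = []
    for chunk_id in sorted(chunk_results):
        merged.append(chunk_results[chunk_id])
        if not gate(merged):
            raise ValueError(f"gate failed after chunk {chunk_id}")
    return merged
```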

Operationally, treat chunks as review units: open a small PR per chunk (stack if you want). Merge sequentially so each diff stays inside the Review Budget and leaves a clean Ledger trace. Only squash at the end if your merge policy requires it.

Chunking is a sign your slice is doing real work. It is not a failure. It’s a reminder that context is a resource you budget, not an ocean you swim in.

Retrieval-Augmented Generation (RAG) as a slice hint, not a decision

Retrieval-augmented generation can help when you don’t know where the relevant context lives. But retrieval is not the same thing as slicing. Retrieval proposes candidates; slicing enforces boundaries.

A pragmatic approach in codebases is hybrid: let retrieval propose candidate nodes, then admit candidates into the slice only through the same deterministic rules (anchor, typed one-hop expansion, budgets).

Guardrails: retrieved content is evidence, not permission. It must not expand edit scope, budgets, or gates on its own, and anything it proposes still has to survive the prune step.

If you don’t have retrieval infrastructure, don’t block on it. You can get far with deterministic anchors (failing cases, stack traces, schemas) and cheap lexical search.

Summarization is lossy compression (use it last)

Summarization can compress context, but it introduces new failure modes: it can drop constraints, blur interfaces, or smuggle in invented structure.

Rules of thumb: summarize narrative context only, never contracts or interfaces; keep skeleton surfaces verbatim; and prefer pruning over summarizing when the budget is tight.

Token budgeting: leave headroom

Token counting is approximate. Different models use different tokenizers, and your request needs space for the model’s response.

Practical rule: reserve ~20% headroom. If your maximum context is B tokens, aim to use at most ~0.8B for inputs, then let the output and tooling fit in the rest.
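The arithmetic is tiny, and worth making explicit in code rather than re-deriving at every call site (a sketch):

```python
def input_budget(max_context_tokens: int, headroom: float = 0.2) -> int:
    """Cap input tokens, reserving headroom for the response and tooling.

    With a 128k window and the default 20% headroom, inputs get at most
    102_400 tokens; the remainder is left for output.
    """
    return int(max_context_tokens * (1.0 - headroom))
```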

Repo-shaped pseudo-example: Fixing contract drift with a bounded slice

Suppose a CI Validator fails because your API contract drifted:

[validator] FAIL: openapi_drift
  path=/api/users GET parameter.limit.type
  expected=integer
  actual=string

Your repository might look like this:

repo/
  openapi.json                 # Map: the public contract surface
  src/users_handler.ts         # Terrain: runtime behavior
  tests/test_users_api.py      # Immune System cases
  clients/mobile/src/api.ts    # downstream consumer
  clients/mobile/src/types.ts  # downstream contract surface

Anchor: the failing Validator finding (openapi_drift) is your anchor node. It already tells you where correctness lives (the contract) and what broke (a specific field/type).

Map first: include only the contract surface (and any policy that constrains it).

Terrain second: include only the code and the Immune System cases that interpret that contract.

Prune: do not include clients/mobile/ wholesale, the entire OpenAPI file, or unrelated endpoints. If a file isn’t on the dependency chain of the anchor, it’s noise.

Now the Effector is constrained by (1) a precise anchor, (2) a bounded contract slice, (3) the exact runtime path that implements it, and (4) gates that can prove the fix. That is what makes “fix this drift” deterministic instead of creative.
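One way this slice could be serialized, following the manifest shape used earlier (the node addresses, budgets, and gate names are illustrative, not from the companion repo):

```yaml
mission_id: openapi-drift-fix-2026-02-23
anchor:
  validator: openapi_drift
  finding: "/api/users GET parameter.limit.type"
map:
  - openapi.json#/paths/~1api~1users/get   # JSON Pointer into the contract
terrain:
  - src/users_handler.ts
  - tests/test_users_api.py
budgets:
  max_files_changed: 2
  max_lines_changed: 40
gates:
  - openapi_drift
  - unit_test_users_api
```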

Branching Factor (Rule of Seven)

Structure is a compression algorithm for attention. Keep your Map and Terrain queryable by maintaining a healthy branching factor (fan-out) at each layer.

Mechanism: quick branching-factor checks

The companion repo includes a make branching-factor target that runs lint_branching_factor.py to demonstrate this kind of linting.
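As a sketch of what such a linter checks, assuming a Rule-of-Seven limit on directory fan-out (this is the shape of the check, not the companion repo's script):

```python
from pathlib import Path

def fanout_violations(root: str, limit: int = 7):
    """Flag directories whose fan-out exceeds the Rule of Seven.

    Each layer of the Map/Terrain should stay small enough to scan at
    a glance; returns (directory, child_count) pairs over the limit.
    """
    bad = []
    for d in [Path(root), *Path(root).rglob("*")]:
        if d.is_dir():
            children = [c for c in d.iterdir() if not c.name.startswith(".")]
            if len(children) > limit:
                bad.append((str(d), len(children)))
    return sorted(bad)
```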

Failure Modes: The Wrong Slice

The easiest way to feel why slicing matters is to watch it fail. There are two common failure modes: a slice that is too small (missing the contract) and a slice that is too big (noise and scope leak).

If you’re debugging a real loop, Appendix B has a longer failure mode gallery (including slice-too-big / slice-too-small patterns and how to recognize them quickly).

Slice too small: missing the contract

You want to Refine a bug in calculate_income_tax(). You do the “obvious” thing: give the model the whole file and ask it to make the failing test pass.

Wrong slice (too small): the Terrain alone (src/tax_service.py), with no Map node and no failing case.

The model guesses. It changes constants and logic that look plausible, but it has no way to verify intent.

[immune] FAIL: test_calculate_income_tax_high_earner_scenario
  expected: 42_750
  got:      35_000
[validator] FAIL: regression detected (previously passing scenario now fails)

Fix: include the failing test case and the Map node that defines correctness (tax rules / schema). Now the Judge can localize the failure and the loop has a real target.

Slice too big: noise and scope leak

The opposite failure is to “be safe” by throwing in everything:

Wrong slice (too big): the whole src/ tree, the full test suite, docs, and recent history.

Now the agent spends tokens on traversal and story-building. It may “fix” the bug by rewriting large sections, or by editing unrelated modules that happen to look adjacent.

[validator] FAIL: diff_budget exceeded (files_changed=12 > 3)
[validator] FAIL: out_of_scope_edit (touched src/payments/)

Fix: set budgets (max_nodes, max_tokens, diff budget), anchor hard (start from the failing case), and prune deterministically (imports/calls only, one-hop dependencies, one failing case). If the slice is still too large, that’s a signal to split the Mission Object into smaller steps.

Slicing is not about giving the model more. It’s about giving it the right evidence, in a bounded packet, so the loop can converge.

Skeleton-First Rule: Sensors Extract, Models Generate

Canonical rule (introduced in Chapter 2): deterministic Sensors extract the skeleton, and models generate only the flesh.

Typical skeleton surfaces: function signatures, API schemas, route tables, and dependency inventories.

Lock these with Validators before generation. If a candidate changes skeleton unintentionally, fail fast and retry with tighter scope.
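A minimal sketch of one such validator for Python surfaces, using the standard library's ast module to extract top-level function signatures deterministically (the helper names are illustrative):

```python
import ast

def public_signatures(source: str) -> set:
    """Extract the skeleton: top-level function names and their argument lists.

    Deterministic Sensor work: no model involved, so the result is stable
    across runs and safe to gate on.
    """
    sigs = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            sigs.add(f"{node.name}({args})")
    return sigs

def skeleton_unchanged(before: str, after: str) -> bool:
    """Fail fast if a candidate edit touched the skeleton."""
    return public_signatures(before) == public_signatures(after)
```

Compare the skeleton before and after a candidate edit; if it changed unintentionally, reject and retry with tighter scope.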

Lehman gives useful language here. The Skeleton is an S-Type artifact: it can be formally verified and extracted by deterministic Sensors, so it should not be generated heuristically. The Flesh is a P-Type problem: descriptions, refactorings, and draft code rely on heuristics. The whole system remains E-Type because it lives inside a changing world. Slicing is the discipline of drawing that boundary correctly: what stays read-only truth, and what is the allowable generative area.

Strategic Component Composition (S-Type Boundaries)

As your Context Graph grows, you face a composition problem. If you connect two P-Type generative steps directly, variance multiplies. The combined system becomes harder to test, reason about, and govern.

The solution is strategic component composition: compose probabilistic systems using deterministic boundaries. Reuse existing high-quality components first, internal or OSS, and wrap them in S-Type interfaces, extractors, and validators so each seam stays testable. That keeps the E-Type whole evolvable without turning every internal boundary into a stochastic dependency.

Worked Example: Debugging a Tax Calculation with a Precise Slice

This is the same pattern from the contract-drift example, on a smaller surface. The goal is to repair behavior without widening blast radius.

Apply the same pattern: anchor on the failing test, pull the tax-rule Map node, include only the implementing function as Terrain, and bound the edit with validators.

One way to represent that work packet:

anchor: tests/test_tax_service.py#test_calculate_income_tax_high_earner_scenario
map:
  - docs/tax_rules.md#progressive_brackets
terrain:
  - src/tax_service.py#calculate_income_tax
validators:
  - signature_unchanged(calculate_income_tax)
  - diff_budget(files<=1, lines<=50)

Now the Judge has the contract, symptom, and exact edit surface. This is constrained repair, not open-ended generation. If this still thrashes, split by rule boundary (one bracket/rule per mission) rather than broadening the slice.

The examples/tax_service directory in the companion repo provides a full, runnable example of this, including the test, a buggy implementation, and the tools to build and slice its context graph.

Actionable: What you can do this week

For one real task this week, build a manual slice instead of sending whole files. Include:

  1. The anchor node (e.g., a function, a failing test case, or a validator code plus path).

  2. Any meta/ or docs/ files that define the intent or constraints.

  3. Only the directly relevant code dependencies.

  4. Then, ruthlessly prune anything that isn’t essential.

Compare it against your usual broad context: token use, failure rate, and scope leaks.
