Part III: Scale It From Loop to System

Chapter 9 – The Dream Daemon (Background Maintenance)

Start with the 90% version most teams should actually run:

02:00 nightly scan
  -> rank the top maintenance target
  -> open no PRs
  -> emit one report with evidence
09:00 human picks one item
  -> run one bounded maintenance loop
  -> review the diff like any other PR

That is already a Dream loop. It is scheduled, boring, and useful. It fights entropy without asking the team to trust background merges.

By now, we have seen two key loops: Mission Objects package intent (Chapter 7), and Map-Updaters keep system understanding current (Chapter 8). Both usually start from an explicit trigger: a code commit, a schema change, or a manual command. But a lot of maintenance work has no clean trigger. It comes from slow entropy: docs drift, dependencies age, and conventions quietly diverge.

Background maintenance is how you handle that slow drift. If a system lives for months or years, you need a way to fight entropy systematically, not just react when something breaks.

This chapter introduces the Dream Daemon as a pattern: a controller that turns entropy signals into bounded maintenance work. Here, daemon just means a background process or controller, not anything mystical. We start at Depth 0 (sensors only) and then outline the deeper implementations you can grow into.

Dream is powerful, but it is not “free productivity.” A daemon that can propose changes while humans sleep can also compound mistakes while humans sleep. What makes the difference is governance: task selection, budgets, and gates.

Dream is a Control Loop, Not a Schedule

Always-On Loops (Maintenance That Compounds)

A Dream Daemon is not a cron job. It is a bounded control loop that senses entropy, picks one target, and exits with a deterministic decision: PASS, DEFER, or ESCALATE.

[Interactive dashboard] Debt (left axis) vs P95 cost/outcome (right axis) over a 26-week timeline; the dashed marker tracks the selected week. The chart shows the long-horizon trend. The controls below model the bounded loop: one selected target per cycle, a hard cycle budget (6x), and deterministic admission gates (signal ranker improves the debt trajectory, guardrails active via map + governance, cycle budget not exhausted, receipt written to the Ledger). The daemon's state of record includes status, cycle count, last run, and a ledger path such as ledger/dream-daemon/week-08/cycle-00.json, plus a minimum-progress window (W=4). Cycle outcomes: cyan = PASS, amber = DEFER, violet = ESCALATE.

A scheduler is just a clock. The Dream Daemon is the loop that turns measured entropy into bounded work.

Start with the lowest-risk version: measure entropy and emit a report. Scheduling is optional. Automated changes come later, after you trust your sensors, budgets, and gates. The mistake is thinking “cron” is the architecture. Cron is packaging.

Scrum, Kanban, and most other delivery processes are also control loops: select work, execute, inspect, adapt. The difference is enforcement. Those loops run on meetings and social contracts. Dream takes the same posture and compiles it into executable artifacts: Sensors emit signals, a deterministic ranker selects targets, Effectors produce bounded diffs, Validators grade, and governance gates decide what is admitted.

The Dream Loop: Sense → Decide → Act → Verify

At its best, Dream is a controller with a simple posture:

  1. Sense: run entropy sensors and collect signals.
  2. Decide: rank + budget, then pick work (or defer).
  3. Act: dispatch to an allowlisted action to produce a diff.
  4. Verify: run the same Validators you require for merges.

The pattern stays the same. What changes is the depth of implementation: at Depth 0 you stop after Sense + Decide. A human performs Act. Verification still runs through the Immune System.
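The four steps above can be sketched as a single bounded cycle. This is a minimal sketch, not the actual tools/dream.py API: the hook names (sensors, rank_and_budget, dispatch, validate) are illustrative assumptions.

```python
# Hypothetical sketch of one bounded Dream cycle; hook names are
# illustrative, not the tools/dream.py interface.

PASS, DEFER, ESCALATE = "PASS", "DEFER", "ESCALATE"

def dream_cycle(sensors, rank_and_budget, dispatch, validate):
    """Sense -> Decide -> Act -> Verify, exiting with a deterministic decision."""
    signals = [s for sensor in sensors for s in sensor()]  # 1. Sense
    target = rank_and_budget(signals)                      # 2. Decide (or defer)
    if target is None:
        return DEFER                   # nothing admissible within budget
    diff = dispatch(target)                                # 3. Act (allowlisted)
    if diff is None:
        return DEFER                   # effector declined or budget exhausted
    return PASS if validate(diff) else ESCALATE            # 4. Verify
```

At Depth 0, `dispatch` is simply a human: the controller stops after Decide and hands over the ranked target.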

Dream reads both directions:

flowchart TD
  S[Sensors] --> RB["Rank + Budget"]
  RB --> M[Mission]
  M --> E[Effector]
  E --> IS["Immune System"]
  IS --> PR[PR]
  PR --> R[Review]

Dream Maintains Memory, Not Just Code

Dream is not only janitorial code automation. It is one of the ways a system keeps its operating Map alive.

When reality teaches something new, the loop should be able to route that learning back into the Map:

If an incident never changes the Map, the organization has no memory. It just pays tuition repeatedly.

This is where a Dream Manifest becomes useful: a versioned backlog of entropy signals, work selectors, and memory updates. It tells Dream not just what code to clean up, but what lessons to preserve.
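One possible shape for such a manifest, in the same YAML style as the evidence payload later in this chapter. Every field name here is a hypothetical illustration, not a spec:

```yaml
# Hypothetical Dream Manifest sketch; field names are illustrative.
version: 1
sensors:
  - id: coverage_gap
    threshold: 75            # flag files below 75% coverage
  - id: duplication
selectors:
  allowlist: ["tests", "dedupe", "split", "refactor"]
  max_targets_per_cycle: 1
memory_updates:
  - surface: "runbooks/"
    requires: "human_steward_approval"
```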

When Dream starts touching memory surfaces that shape future behavior, provenance matters. It is not enough to know that a runbook, validator, or policy changed. You want the Ledger to say which operating or constitutional surface moved, what evidence justified the move, and who authorized it: a human steward, or a prior bounded loop acting inside an explicit delegation.


Task Selection Is the Problem

“Run a maintenance daemon” is easy to say. The hard question is: what is it allowed to work on, and how does it choose?

You want Dream to spend its budget on high-impact, high-feasibility maintenance that is low risk and easy to verify. That is a task selection problem, not an agent problem.

A practical task taxonomy

Not all maintenance work is daemon-shaped. A simple taxonomy:

| Task category | Good Dream targets? | Why |
| --- | --- | --- |
| Deterministic hygiene (formatters, lint fixes, dead imports) | Yes | Low risk, high feasibility, easy to verify |
| Coverage gaps and missing tests | Often | The Validator signal is clear, but scope must be bounded |
| Local refactors (simplify a function, reduce complexity) | Sometimes | Useful when scoped to one symbol/file with strict budgets |
| Deduplication (remove repeated helpers, merge near-copies) | Sometimes | Risk of subtle behavior changes; needs strong tests/ratchets |
| Product behavior changes (business logic) | No by default | High risk, ambiguous intent, requires domain judgment |
| Architecture changes (new modules, moving boundaries) | Rare | Expensive and destabilizing unless tightly constrained |

Dream becomes safe when it is not “a general fixer,” but a controller for an allowlisted catalog of actions, each with known Physics.

A scoring heuristic (impact × feasibility ÷ cost × risk)

Dream does not need a perfect model of value. It needs a conservative filter.

One way to formalize the decision:

score(task) = (impact(task) * feasibility(task)) / (cost(task) * risk(task))

Estimate each term from deterministic signals whenever possible, rather than from model judgment.

The important part is not the algebra. It is the posture: only attempt tasks that are easy to grade, cheap to try, and cheap to undo.
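A minimal sketch of that conservative filter, assuming each term is a hypothetical estimator in (0, 1] derived from deterministic signals (coverage reports, diff size, test strength). The score floor is an illustrative threshold, not a prescribed value:

```python
def score(task):
    """Conservative filter: impact * feasibility / (cost * risk).

    Each term is assumed to be in (0, 1], estimated from deterministic
    signals wherever possible.
    """
    return (task["impact"] * task["feasibility"]) / (task["cost"] * task["risk"])

def admissible(tasks, floor=1.0):
    """Only attempt tasks that clear a conservative score floor,
    highest-scoring first."""
    return sorted((t for t in tasks if score(t) >= floor),
                  key=score, reverse=True)
```

The exact floor matters less than the posture: a risky architecture change with high nominal impact still scores below a cheap, easily graded hygiene fix.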

Treat the task-selection heuristic and the Dream Manifest as Map surfaces: versioned, reviewable, and queryable like any other operating artifact.

Higher-order maintenance depends on that legibility. If Dream can later ask “why is this selector set this way?” or “why did this validator become mandatory?”, the answer has to live in a queryable Ledger entry, not in oral history.

Implementation Depths (Start at Depth 0)

You can implement Dream as a ladder. Each step adds autonomy, but also raises the governance bar.

Depth 0: Sensors Only (default)

Depth 0 is a deterministic entropy scan that outputs a ranked worklist with evidence. It does not open PRs. It does not modify files. It produces targets.

This is enough to change team behavior because it makes maintenance specific: instead of a vague mandate to “clean things up,” the team sees one ranked target with evidence attached.

In this repository, the Dream controller is implemented in tools/dream.py and wired behind make dream (one cycle) and make dream-loop (continuous mode). It can dispatch bounded maintenance actions, so do not treat the current implementation as “Depth 0 by default.” Treat its scan + decide phase as the Depth 0 building block: deterministic signals turned into a ranked target with evidence.

The scan combines a few signals: test coverage, code duplication, and file size/complexity.

Then it chooses one file and one allowlisted action. The rough logic is:

if coverage is low:
    choose action = "tests"
elif duplication dominates:
    choose action = "dedupe"
elif file is bloated (too long or too complex):
    choose action = "split"
else:
    choose action = "refactor"

This is not a universal truth. It is a starting heuristic you can tune. The win is that the “Decide” step is explicit and versioned. You can inspect it, adjust thresholds, and argue about it with evidence.

One repo-shaped scoring sketch (matching the spirit of tools/dream.py):

best = None
for file in roots:
    base = static_score(file)          # size + complexity snapshot
    cov = coverage_percent(file)       # 0-100
    dup = duplication_score(file)      # higher = more repeated units

    if cov < 60:
        action, score = "tests", 120 - cov   # lower coverage = higher priority
    elif cov < 75:
        action, score = "tests", 90 - cov
    elif dup_dominates(dup, base):
        action, score = "dedupe", base + dup_weight(dup)
    elif is_bloated(file):                   # too long or too complex
        action, score = "split", base + size_penalties(file)
    else:
        action, score = "refactor", base

    if best is None or score > best[0]:
        best = (score, file, action)

# best now holds the highest-scoring (score, file, action);
# run one bounded action against it.

At Depth 0, you stop at “Sense + Decide”: emit a ranked worklist and attach evidence. A human (or a later Depth) can then pick one item and run a normal Software Development as Code (SDaC) loop against that bounded target, under the same Physics gates.

If you do implement a writer, keep the unit of work small: one target per cycle, strict diff budgets, and a hard stop when the signal doesn’t improve (Chapter 5’s minimum-progress ratchets).
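Chapter 5's minimum-progress ratchet can be sketched as a simple check over the tracked signal's history. This is a hypothetical implementation; tools/dream.py may do it differently:

```python
def should_stop(signal_history, window=4, min_gain=0.0):
    """Hard stop when the tracked signal (higher = better, e.g. coverage %)
    fails to improve over the last `window` attempts.

    Hypothetical sketch of a minimum-progress ratchet.
    """
    if len(signal_history) <= window:
        return False                   # still collecting a baseline
    baseline = signal_history[-window - 1]
    recent = signal_history[-window:]
    return max(recent) - baseline <= min_gain
```

Calling this after every attempt gives the writer its hard stop: if four attempts in a row fail to move the signal past the pre-window baseline, the cycle exits with DEFER instead of burning more budget.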

Depth 0.5: Scheduled reports (no diffs)

Depth 0.5 is Depth 0 on a schedule.

You run the sensors nightly or weekly and publish the report as an artifact: a dashboard entry, a ticket, or a message with the top findings and their evidence. Nothing is modified. No diffs are generated. The automation is real, but it stays read-only.

Treat the schedule as a throttle, not as “off-hours.” Global teams and incident timelines do not respect a single clock.

This is often the lowest-friction path to adoption: you get a consistent maintenance signal without triggering the fear that “the agent is writing code in the background.”

Depth 1: Scheduled, human-approved proposals

Once you trust your sensors, you can move from “report” to “proposal”: the daemon opens one small, pre-scoped PR for the top-ranked target, and a human approves or rejects every diff.

At this depth, the daemon is not “creative.” It is a work scheduler for a fixed catalog of maintenance actions.

Depth 2: Autonomous selection (bounded)

At higher autonomy, Dream becomes a controller: it chooses which strategy to apply based on signal shape (coverage gaps, complexity hotspots, duplication spikes). This is where you must tighten budgets and allowlists: one target per cycle, strict diff budgets, a fixed catalog of actions, and hard stops when the signal does not improve.

Depth 3: Autonomous merge (aspirational)

Auto-merge is possible, but only after Depth 1–2 are stable and your governance is mature. If you cannot explain why a diff exists and reproduce the verification, you cannot auto-merge it.

Daemon Safety Controls (Budgets, Gates, Kill Switches)

Dream is safe only when it is bounded. Practical controls: hard diff and cycle budgets, an allowlisted action catalog, minimum-progress ratchets, the same Validators you require for any merge, and a kill switch a human can pull at any time.

The safety goal is not “never fail.” It is “fail small, fail early, and fail reversibly.”
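A deterministic admission gate that encodes those controls might look like the sketch below. The budget keys mirror the evidence payload shown later in this chapter; the kill-switch file path is a hypothetical convention:

```python
import os

def admit(cycle, max_cycles, files_changed, lines_changed,
          budgets, kill_switch_path="STOP_DREAM"):
    """Deterministic admission gate: fail small, early, and reversibly.

    Hypothetical sketch; budget keys mirror the dream_evidence payload.
    """
    if os.path.exists(kill_switch_path):
        return "ESCALATE"              # a human pulled the kill switch
    if cycle >= max_cycles:
        return "DEFER"                 # cycle budget exhausted
    if files_changed > budgets["max_files_changed"]:
        return "DEFER"                 # diff touches too many files
    if lines_changed > budgets["max_lines_changed"]:
        return "DEFER"                 # diff is too large
    return "PASS"
```

Note that every branch returns one of the three deterministic decisions; there is no path where the gate silently continues.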

Integration With Human Development

Dream is not a privileged actor. Treat it like any other contributor with stricter limits: its PRs pass through the same Validators, the same review, and the same Ledger receipts as anyone else's work.

Before Dream opens a PR, run one deterministic conflict-avoidance check: skip any target whose files overlap work already in flight on open human branches or PRs.

This keeps Dream from turning normal parallel work into merge-noise.

Minimal evidence payload for a Dream-generated PR:

dream_evidence:
  trigger: "coverage_gap"        # or duplication / complexity / drift
  target: "path/to/file_or_root"
  budgets:
    max_files_changed: 2
    max_lines_changed: 120
  validators:
    - "make test"
    - "make lint"
  decision_trace:
    scan_run_id: "2026-03-06T120000Z"
    why_this_target: "highest impact within allowlist"

Security: Hostile Terrain and Instruction Injection

So far, we’ve focused on protecting the repository from the agent (scope limits, Validators, protected graders). You also need the other half of the threat model: protecting the agent from hostile input. This is why Prep must sanitize (Chapter 2) and why governance must include Input Hygiene at scale (Chapter 12).

Dream increases the amount of Terrain text your system reads. Every TODO comment, every ticket description, and every log line becomes an input channel, and you should treat it as adversarial.

The attack shape and Prep-layer defenses are covered in Chapter 2 (“The attack shape”). Chapter 12 extends this to governance-at-scale: policy validators that detect injection-shaped text and force safe outcomes (defer, file_ticket) rather than attempting execution.

For Dream, keep a simple rule: untrusted text never becomes authority. It can only become evidence attached to a Mission Object compiled from allowlisted templates, with explicit budgets and gates.
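That rule can be made mechanical: wrap every piece of untrusted Terrain text as inert evidence, never as an instruction. The marker list below is an illustrative assumption (real injection detection needs a proper policy validator, as Chapter 12 discusses):

```python
# Illustrative markers only; a real policy validator would be broader.
INJECTION_MARKERS = ("ignore previous", "disregard your instructions",
                     "you are now", "system prompt")

def to_evidence(untrusted_text, source):
    """Untrusted text never becomes authority: wrap it as inert evidence,
    and flag injection-shaped content for a safe outcome (defer)."""
    lowered = untrusted_text.lower()
    flagged = any(marker in lowered for marker in INJECTION_MARKERS)
    return {
        "kind": "evidence",            # never "instruction"
        "source": source,
        "text": untrusted_text,
        "suspected_injection": flagged,
        "safe_outcome": "defer" if flagged else None,
    }
```

Downstream, the Mission compiler only reads `evidence` records; nothing in the wrapped text can change budgets, gates, or the action allowlist.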

Actionable: What you can do this week

  1. Pick one entropy sensor: coverage gaps, complexity hotspots, duplication, or drift between a Map surface and Terrain.

  2. Implement Depth 0: write a deterministic scan that emits a ranked worklist with evidence. No writes.

  3. Run it weekly: treat the output like a backlog generator, not a one-time audit.

  4. Fix one item: pick the top target and run a bounded loop with explicit Physics gates.

  5. Only then add autonomy: when you can predict the failure modes, add Depth 1 proposals and keep them human-approved.
