Chapter 9 – The Dream Daemon (Background Maintenance)
Start with the 90% version most teams should actually run:
02:00 nightly scan
-> rank the top maintenance target
-> open no PRs
-> emit one report with evidence
09:00 human picks one item
-> run one bounded maintenance loop
-> review the diff like any other PR
That is already a Dream loop. It is scheduled, boring, and useful. It fights entropy without asking the team to trust background merges.
By now, we have seen two key building blocks: Mission Objects
package intent (Chapter 7), and Map-Updaters keep system
understanding current (Chapter 8). Both usually start from an explicit
trigger: a code commit, a schema change, or a manual command. But a lot
of maintenance work has no clean trigger. It comes from slow entropy:
docs drift, dependencies age, and conventions quietly diverge.
Background maintenance is how you handle that slow drift. If a system lives for months or years, you need a way to fight entropy systematically, not just react when something breaks.
This chapter introduces the Dream Daemon as a
pattern: a controller that turns entropy signals into
bounded maintenance work. Here, daemon just means a
background process or controller, not anything mystical. We start at
Depth 0 (sensors only) and then outline the deeper
implementations you can grow into.
Dream is powerful, but it is not “free productivity.” A daemon that can propose changes while humans sleep can also compound mistakes while humans sleep. What makes the difference is governance: task selection, budgets, and gates.
Dream is a Control Loop, Not a Schedule
A Dream Daemon is not a cron job. It is a bounded control loop that senses entropy, picks one target, and exits with a deterministic decision: PASS, DEFER, or ESCALATE.
[Figure: technical debt (left axis) vs P95 cost/outcome (right axis); the dashed marker tracks the selected week.]
A scheduler is just a clock. The Dream Daemon is the loop that turns measured entropy into bounded work.
Start with the lowest-risk version: measure entropy and emit a report. Scheduling is optional. Automated changes come later, after you trust your sensors, budgets, and gates. The mistake is thinking “cron” is the architecture. Cron is packaging.
Scrum, Kanban, and most other delivery processes are also control loops: select work, execute, inspect, adapt. The difference is enforcement. Those loops run on meetings and social contracts. Dream takes the same posture and compiles it into executable artifacts: Sensors emit signals, a deterministic ranker selects targets, Effectors produce bounded diffs, Validators grade, and governance gates decide what is admitted.
The Dream Loop: Sense → Decide → Act → Verify
At its best, Dream is a controller with a simple posture:
- Sense: run entropy sensors and collect signals.
- Decide: rank + budget, then pick work (or defer).
- Act: dispatch to an allowlisted action to produce a diff.
- Verify: run the same Validators you require for merges.
The pattern stays the same. What changes is the depth of implementation: at Depth 0 you stop after Sense + Decide. A human performs Act. Verification still runs through the Immune System.
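The four-phase posture can be sketched as a minimal controller. All of the names here (`run_sensors`-style callables, `rank_and_budget`, the effector catalog) are illustrative placeholders, not the real API of tools/dream.py:

```python
from enum import Enum

class Decision(Enum):
    PASS = "pass"          # work completed and verified
    DEFER = "defer"        # nothing worth doing this cycle
    ESCALATE = "escalate"  # signal found, but outside the allowlist or gates

def dream_cycle(sensors, ranker, effectors, validators, budget):
    """One bounded Sense -> Decide -> Act -> Verify cycle."""
    # Sense: every sensor is a callable returning a list of signals.
    signals = [s for sensor in sensors for s in sensor()]
    # Decide: a deterministic ranker picks at most one target within budget.
    target = ranker(signals, budget)
    if target is None:
        return Decision.DEFER
    # Act: only allowlisted actions may produce a diff.
    effector = effectors.get(target.action)
    if effector is None:
        return Decision.ESCALATE
    diff = effector(target)
    # Verify: the same gates required for any merge.
    if all(check(diff) for check in validators):
        return Decision.PASS
    return Decision.ESCALATE
```

At Depth 0 you would simply stop after the `ranker` call and report the target instead of dispatching an effector.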
Dream reads both directions:
- It reads Terrain (code, metrics, drift signals) to learn what is true.
- It reads Map (policies, budgets, protected paths, ratchets) to learn what is allowed.
flowchart TD
S[Sensors] --> RB["Rank + Budget"]
RB --> M[Mission]
M --> E[Effector]
E --> IS["Immune System"]
IS --> PR[PR]
PR --> R[Review]
Dream Maintains Memory, Not Just Code
Dream is not only janitorial code automation. It is one of the ways a system keeps its operating Map alive.
When reality teaches something new, the loop should be able to route that learning back into the Map:
- a repeated incident can become a new Validator or tighter budget
- a recurring review comment can become a checklist item or Mission template
- a policy exception can become a clarified runbook, glossary rule, or escalation path
If an incident never changes the Map, the organization has no memory. It just pays tuition repeatedly.
This is where a Dream Manifest becomes useful: a versioned backlog of entropy signals, work selectors, and memory updates. It tells Dream not just what code to clean up, but what lessons to preserve.
When Dream starts touching memory surfaces that shape future behavior, provenance matters. It is not enough to know that a runbook, validator, or policy changed. You want the Ledger to say which operating or constitutional surface moved, what evidence justified the move, and who authorized it: a human steward, or a prior bounded loop acting inside an explicit delegation.
Task Selection Is the Problem
“Run a maintenance daemon” is easy to say. The hard question is: what is it allowed to work on, and how does it choose?
You want Dream to spend its budget on high-impact, high-feasibility maintenance that is low risk and easy to verify. That is a task selection problem, not an agent problem.
A practical task taxonomy
Not all maintenance work is daemon-shaped. A simple taxonomy:
| Task category | Good Dream targets? | Why |
|---|---|---|
| Deterministic hygiene (formatters, lint fixes, dead imports) | Yes | Low risk, high feasibility, easy to verify |
| Coverage gaps and missing tests | Often | The Validator signal is clear, but scope must be bounded |
| Local refactors (simplify a function, reduce complexity) | Sometimes | Useful when scoped to one symbol/file with strict budgets |
| Deduplication (remove repeated helpers, merge near-copies) | Sometimes | Risk of subtle behavior changes; needs strong tests/ratchets |
| Product behavior changes (business logic) | No by default | High risk, ambiguous intent, requires domain judgment |
| Architecture changes (new modules, moving boundaries) | Rare | Expensive and destabilizing unless tightly constrained |
Dream becomes safe when it is not “a general fixer,” but a controller for an allowlisted catalog of actions, each with known Physics.
A scoring heuristic (impact × feasibility ÷ cost × risk)
Dream does not need a perfect model of value. It needs a conservative filter.
One way to formalize the decision:
score(task) = (impact(task) * feasibility(task)) / (cost(task) * risk(task))
Where the terms are estimated from deterministic signals whenever possible:
- Impact: does this reduce incidents, unblock changes, or shrink a known debt hotspot?
- Feasibility: is there a clear Validator signal and a bounded change surface?
- Cost: how much CI time, wall-clock time, or model budget will this consume?
- Risk: what is the blast radius if it is wrong, and how reversible is it?
The important part is not the algebra. It is the posture: only attempt tasks that are easy to grade, cheap to try, and cheap to undo.
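One way to make that filter concrete. The 0-10 scales and the 0.1 floors are illustrative choices, not part of the pattern; the floors just keep a careless zero estimate from producing an infinite score:

```python
def dream_score(impact, feasibility, cost, risk):
    """Conservative task score: (impact * feasibility) / (cost * risk).

    Inputs are rough estimates on a 0-10 scale, derived from
    deterministic signals wherever possible. Cost and risk are floored
    so a zero estimate cannot blow the score up to infinity.
    """
    cost = max(cost, 0.1)
    risk = max(risk, 0.1)
    return (impact * feasibility) / (cost * risk)

# A lint fix: modest impact, trivially feasible, cheap, near-zero risk.
lint_fix = dream_score(impact=3, feasibility=9, cost=1, risk=1)
# A business-logic change: high impact, but costly and risky.
logic_change = dream_score(impact=8, feasibility=4, cost=6, risk=8)
assert lint_fix > logic_change  # the conservative filter prefers the lint fix
```

The exact weights matter less than the shape: feasible, cheap, reversible work dominates ambitious work.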
Treat the task-selection heuristic and Dream Manifest as Map surfaces:
- They are versioned.
- They are reviewable.
- They are tuned with evidence (what converged, what thrashed, what caused incidents).
- They are part of the operating Map, not just a list of chores.
Higher-order maintenance depends on that legibility. If Dream can later ask “why is this selector set this way?” or “why did this validator become mandatory?”, the answer has to live in a queryable Ledger entry, not in oral history.
Implementation Depths (Start at Depth 0)
You can implement Dream as a ladder. Each step adds autonomy, but also raises the governance bar.
Recommended rollout (safe default posture)
- Weeks 1–2: Depth 0 — run the scan weekly, tune sensors, and build trust in the evidence. No writes.
- Weeks 3–4: Depth 0.5 — schedule the report (nightly/weekly), still read-only. No diffs.
- Promote to Depth 1 only when all are true:
- Sensors are reproducible (same finding appears across runs; low false positives).
- Budgets are explicit (one target per cycle, hard diff limits, protected paths).
- Governance is wired (required checks, CODEOWNERS/branch protection, no bypass).
- Review + rollback are real (who reviews Dream PRs, and how you revert safely).
Depth 0: Sensors Only (default)
Depth 0 is a deterministic entropy scan that outputs a ranked worklist with evidence. It does not open PRs. It does not modify files. It produces targets.
This is enough to change team behavior because it makes maintenance specific:
- you stop arguing about “what to clean up”
- you stop doing maintenance only when something breaks
- you get a consistent stream of small, reviewable targets
In this repository, the Dream controller is implemented in
tools/dream.py and wired behind make dream
(one cycle) and make dream-loop (continuous mode). It can
dispatch bounded maintenance actions, so do not treat the current
implementation as “Depth 0 by default.” Treat its scan + decide phase as
the Depth 0 building block: deterministic signals turned into a ranked
target with evidence.
The scan combines a few signals:
- Coverage per file (a proxy for risk and feasibility)
- Size and complexity (a proxy for maintainability hotspots)
- Duplication (a proxy for wasted effort and inconsistent fixes)
Then it chooses one file and one allowlisted action. The rough logic is:
if coverage is low:
    choose action = "tests"
elif duplication dominates:
    choose action = "dedupe"
elif file is bloated (too long or too complex):
    choose action = "split"
else:
    choose action = "refactor"
This is not a universal truth. It is a starting heuristic you can tune. The win is that the “Decide” step is explicit and versioned. You can inspect it, adjust thresholds, and argue about it with evidence.
One repo-shaped scoring sketch (matching the spirit of
tools/dream.py):
for file in roots:
    base = static_score(file)       # size + complexity snapshot
    cov = coverage_percent(file)    # 0–100
    dup = duplication_score(file)   # higher = more repeated units

    if cov < 60:
        action = "tests"
        score = 120 - cov           # lower coverage = higher priority
    elif cov < 75:
        action = "tests"
        score = 90 - cov
    elif dup is dominating:
        action = "dedupe"
        score = base + weight(dup)
    elif file is bloated:
        action = "split"
        score = base + penalties(size, complexity)
    else:
        action = "refactor"
        score = base

choose the file with highest score, then do one bounded action
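Assuming the three signal functions exist (they are injected stand-ins here, not the real helpers in tools/dream.py), the sketch becomes runnable roughly like this; the coverage thresholds match the sketch, while the duplication and bloat thresholds are invented defaults to tune:

```python
def rank_targets(files, static_score, coverage_percent, duplication_score,
                 dup_threshold=5, bloat_threshold=50):
    """Score every file and return (score, action, file) tuples, best first.

    Signal functions are injected so the ranker stays deterministic and
    testable. The caller takes ranked[0] and runs one bounded action.
    """
    ranked = []
    for f in files:
        base = static_score(f)       # size + complexity snapshot
        cov = coverage_percent(f)    # 0-100
        dup = duplication_score(f)   # higher = more repeated units
        if cov < 60:
            action, score = "tests", 120 - cov
        elif cov < 75:
            action, score = "tests", 90 - cov
        elif dup >= dup_threshold:
            action, score = "dedupe", base + 10 * dup
        elif base >= bloat_threshold:
            action, score = "split", base + 20
        else:
            action, score = "refactor", base
        ranked.append((score, action, f))
    ranked.sort(reverse=True)
    return ranked
```

Because the signals are parameters, you can replay a past scan against recorded data and ask why a given file won a cycle.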
At Depth 0, you stop at “Sense + Decide”: emit a ranked worklist and attach evidence. A human (or a later Depth) can then pick one item and run a normal Software Development as Code (SDaC) loop against that bounded target, under the same Physics gates.
If you do implement a writer, keep the unit of work small: one target per cycle, strict diff budgets, and a hard stop when the signal doesn’t improve (Chapter 5’s minimum-progress ratchets).
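A diff budget is easy to enforce deterministically before anything leaves the workspace. A sketch over unified-diff text, with hypothetical default limits:

```python
def within_diff_budget(unified_diff, max_files=2, max_lines=120):
    """Check a unified diff against hard budgets before opening a PR.

    Counts files from '+++ ' headers and changed lines from '+'/'-'
    lines, excluding the '---'/'+++' headers themselves.
    """
    files = 0
    lines = 0
    for line in unified_diff.splitlines():
        if line.startswith("+++ "):
            files += 1
        elif line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
            lines += 1
    return files <= max_files and lines <= max_lines
```

When the check fails, the cycle should defer with the oversized diff attached as evidence, not retry with a bigger budget.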
Depth 0.5: Scheduled reports (no diffs)
Depth 0.5 is Depth 0 on a schedule.
You run the sensors nightly or weekly and publish the report as an artifact: a dashboard entry, a ticket, or a message with the top findings and their evidence. Nothing is modified. No diffs are generated. The automation is real, but it stays read-only.
Treat the schedule as a throttle, not as “off-hours.” Global teams and incident timelines do not respect a single clock.
This is often the lowest-friction path to adoption: you get a consistent maintenance signal without triggering the fear that “the agent is writing code in the background.”
Depth 1: Scheduled, human-approved proposals
Once you trust your sensors, you can move from “report” to “proposal”:
- the daemon picks one target per run
- it runs an allowlisted action to produce a diff
- it runs the Immune System suite
- it opens a PR for review
At this depth, the daemon is not “creative.” It is a work scheduler for a fixed catalog of maintenance actions.
Depth 2: Autonomous selection (bounded)
At higher autonomy, Dream becomes a controller: it chooses which strategy to apply based on signal shape (coverage gaps, complexity hotspots, duplication spikes). This is where you must tighten budgets and allowlists:
- one unit of work per cycle
- hard diff budgets (files/lines)
- protected paths enforced by policy
- escalation rules (defer, file ticket) instead of “try harder”
Depth 3: Autonomous merge (aspirational)
Auto-merge is possible, but only after Depth 1–2 are stable and your governance is mature. If you cannot explain why a diff exists and reproduce the verification, you cannot auto-merge it.
Daemon Safety Controls (Budgets, Gates, Kill Switches)
Dream is safe only when it is bounded. Practical controls:
- Rate limits: maximum cycles per hour/day.
- Work caps: one task per cycle; strict diff budgets (files/lines).
- Timeouts: per-effector wall-clock caps; no infinite retries.
- Cost caps: token budgets and circuit breakers (Chapter 5).
- Cool-down: after any mutation, wait before attempting another.
- Kill switch: a simple, documented way to stop Dream immediately.
- Defer path: when in doubt, file a ticket with evidence instead of acting.
The safety goal is not “never fail.” It is “fail small, fail early, and fail reversibly.”
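Several of these controls reduce to one deterministic gate evaluated before every cycle. A minimal sketch; the limits and the kill-switch mechanism (for example, checking whether a sentinel file exists) are assumptions to adapt:

```python
import time

class DreamThrottle:
    """Gate each Dream cycle on rate limits, cool-down, and a kill switch."""

    def __init__(self, max_cycles_per_day=6, cooldown_seconds=3600,
                 kill_switch=lambda: False, clock=time.time):
        self.max_cycles_per_day = max_cycles_per_day
        self.cooldown_seconds = cooldown_seconds
        self.kill_switch = kill_switch  # e.g. "does a STOP sentinel file exist?"
        self.clock = clock
        self.cycle_times = []           # timestamps of completed cycles

    def may_run(self):
        now = self.clock()
        if self.kill_switch():
            return False                # documented hard stop, checked first
        recent = [t for t in self.cycle_times if t > now - 86400]
        if len(recent) >= self.max_cycles_per_day:
            return False                # rate limit: cycles per day
        if recent and now - max(recent) < self.cooldown_seconds:
            return False                # cool-down after the last mutation
        return True

    def record_cycle(self):
        self.cycle_times.append(self.clock())
```

Injecting the clock keeps the gate testable; in production the defaults apply.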
Integration With Human Development
Dream is not a privileged actor. Treat it like any other contributor with stricter limits:
- Run in a clean workspace (or a fresh clone) and never commit directly to protected branches.
- Submit changes as PRs with labels and evidence: what signal triggered, what budgets applied, what Validators ran.
- Pause when humans are already changing the same surface (or when the branch is out of date).
- Prefer tasks that reduce review burden (small diffs, local scopes) over tasks that create it.
Before Dream opens a PR, run one deterministic conflict-avoidance check:
- refresh the base branch
- require a clean worktree
- compare the candidate paths to open PRs or active branches
- if the same surface is already moving, defer and attach the conflict as evidence
This keeps Dream from turning normal parallel work into merge-noise.
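The overlap test at the heart of that check is a set intersection. A sketch, with branch and path discovery left as injected inputs since they depend on your forge:

```python
def conflict_check(candidate_paths, active_branch_paths):
    """Decide whether Dream may open a PR touching candidate_paths.

    active_branch_paths maps a branch/PR name to the set of paths it
    touches. Returns ("proceed", {}) or ("defer", evidence) so the
    conflict can be attached to a ticket instead of causing merge-noise.
    """
    candidates = set(candidate_paths)
    conflicts = {
        branch: sorted(candidates & paths)
        for branch, paths in active_branch_paths.items()
        if candidates & paths
    }
    if conflicts:
        return "defer", {"reason": "surface_already_moving",
                         "overlaps": conflicts}
    return "proceed", {}
```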
Minimal evidence payload for a Dream-generated PR:
dream_evidence:
trigger: "coverage_gap" # or duplication / complexity / drift
target: "path/to/file_or_root"
budgets:
max_files_changed: 2
max_lines_changed: 120
validators:
- "make test"
- "make lint"
decision_trace:
scan_run_id: "2026-03-06T120000Z"
  why_this_target: "highest impact within allowlist"

Security: Hostile Terrain and Instruction Injection
So far, we’ve focused on protecting the repository from the agent
(scope limits, Validators, protected graders). You also need the other
half of the threat model: protecting the agent from hostile
input. This is why Prep must sanitize (Chapter 2)
and why governance must include Input Hygiene at scale (Chapter 12).
Dream increases the amount of Terrain text your system reads. Every TODO comment, every ticket description, and every log line becomes an input channel, and you should treat it as adversarial.
The attack shape and Prep-layer defenses are covered in
Chapter 2 (“The attack shape”). Chapter 12 extends this to
governance-at-scale: policy validators that detect injection-shaped text
and force safe outcomes (defer, file_ticket)
rather than attempting execution.
For Dream, keep a simple rule: untrusted text never becomes authority. It can only become evidence attached to a Mission Object compiled from allowlisted templates, with explicit budgets and gates.
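A crude version of that rule is a policy validator that flags instruction-shaped content in Terrain text and forces the safe outcome. The patterns here are illustrative only, not a complete Input Hygiene defense:

```python
import re

# Illustrative injection-shaped patterns; real Input Hygiene needs more
# than a regex list (see Chapters 2 and 12).
INJECTION_PATTERNS = [
    r"ignore (all |previous |prior )*(instructions|rules)",
    r"you are now",
    r"disregard .{0,40}(policy|budget|validator)",
    r"run this command",
]

def classify_terrain_text(text):
    """Return 'defer' for instruction-shaped text, else 'evidence'.

    Untrusted text only ever becomes evidence attached to a Mission
    Object; it is never executed and never treated as authority.
    """
    lowered = text.lower()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, lowered):
            return "defer"  # safe outcome: file a ticket with the text attached
    return "evidence"
```

Even when the classifier says "evidence", the text still flows only into allowlisted Mission templates with explicit budgets and gates.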
Actionable: What you can do this week
- Pick one entropy sensor: coverage gaps, complexity hotspots, duplication, or drift between a Map surface and Terrain.
- Implement Depth 0: write a deterministic scan that emits a ranked worklist with evidence. No writes.
- Run it weekly: treat the output like a backlog generator, not a one-time audit.
- Fix one item: pick the top target and run a bounded loop with explicit Physics gates.
- Only then add autonomy: when you can predict the failure modes, add Depth 1 proposals and keep them human-approved.