Chapter 3 – Define “Good” (Your First Validators)
You’ve built a loop and named the sandwich. Now define “good” so the gate means something.
Validators turn “looks fine” into PASS/FAIL and give you
deterministic failure you can act on. Start small: hard physics first
(linters, types, schemas), then a few task-specific checks you can
tighten over time.
Two practical rules:
- A Validator is a Sensor, not a Judge. It emits a structured signal; the Judge decides whether to refine, revert, or escalate.
- Put the cheapest Validators first. Failing fast is not pessimism; it’s throughput.
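A minimal sketch of the second rule, assuming nothing about your stack: represent each Validator as a named check with a rough cost, run them cheapest-first, and stop at the first failure. The Validator tuple and run_validators names here are illustrative, not a real API.

# Sketch: run validators cheapest-first and stop at the first failure.
# Validator and run_validators are illustrative names, not a library API.
from typing import Callable, NamedTuple, Optional, Tuple

class Validator(NamedTuple):
    name: str
    cost: int                        # rough relative cost; lower runs first
    check: Callable[[str], bool]     # returns True on PASS

def run_validators(artifact: str, validators: list) -> Tuple[bool, Optional[str]]:
    for v in sorted(validators, key=lambda item: item.cost):
        if not v.check(artifact):
            return False, v.name     # fail fast: report which gate failed
    return True, None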
Where Validators live in real systems
In production SDaC tooling, Validators are rarely “just scripts.” They usually exist as explicit Steps in a Workflow graph so the loop is inspectable and reproducible.
In the engine behind this book, the workflow runtime lives in
core/workflow.py, and common Physics checks are implemented
as Step classes in core/steps/ (for example
LintStep, TypeCheckStep,
CoverageStep, and TestStep in
core/steps/code_steps.py). A controller wires these Steps
together into a deterministic graph with named transitions, then records
a trace.
That implementation detail matters: when Validators are first-class nodes, you can tag them, order them, retry them, visualize them, and reuse the same gates across every loop (human-written or machine-written changes).
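The exact Step API depends on the engine; as a rough sketch of the idea (not the actual interface of core/workflow.py or core/steps/code_steps.py), a Validator Step is just a node that runs a deterministic check, appends a structured record to the trace, and returns a named transition:

# Sketch of a Validator as a first-class workflow Step.
# The class shape, "pass"/"fail" transitions, and trace key are illustrative,
# not the engine's real API.
import subprocess

class LintStepSketch:
    name = "lint"
    tags = {"physics", "cheap"}

    def run(self, context: dict) -> str:
        # Deterministic check: ruff's exit code is the signal.
        result = subprocess.run(["ruff", "check", context["target_dir"]],
                                capture_output=True, text=True)
        context.setdefault("trace", []).append({
            "validator": self.name,
            "status": "pass" if result.returncode == 0 else "fail",
            "output": result.stdout,
        })
        return "pass" if result.returncode == 0 else "fail"  # named transition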
A useful Validator output shape looks like this:
{
"validator": "json_schema",
"status": "fail",
"artifact": "generated/pr_summary_bad.json",
"errors": [
{
"path": "reviewer_suggestions",
"message": "'charlie_qa' is not of type 'array'"
}
]
}
The Hard Physics of Validation
Forget the philosophical debates about AI truthfulness for a moment. Instead, focus on the “hard physics” of code and data:
- Linters: Do your generated code blocks conform to a style guide? Are they valid syntax?
- Type Checkers: Does the generated data structure use the expected data types (e.g., an integer where an integer is expected, not a string)?
- Schema Validators: Does the overall structure of the generated output match a predefined schema (e.g., a JSON Schema, a Protobuf definition, a Pydantic model)?
- Contract Tests: Does the generated output fulfill a specific contract, like an API request or response format?
- Policy Checks: Does the generated content comply with specific policies, such as security rules or legal disclaimers?
These are “hard physics” because they are entirely deterministic. Given an input and a rule, the outcome is always the same: valid or invalid. This is your anchor in the sea of stochastic generation.
Validator Taxonomy: Hard First, Then Semantic
Hard physics gets you admissible artifacts. It does not get you correct artifacts.
In practice, an Immune System blends:
- Hard Physics Validators: syntax, formatting, types, schemas, path invariants.
- Terminology Validators: banned/redirect terms, canonical vocabulary, contract language.
- Semantic Validators: domain invariants you care about (e.g., “reviewers must be real users,” “no new public API without docs,” “diff touches only allowed paths”).
- Policy Validators: security and compliance rules (secrets scanning, licensing checks, dependency allow/deny lists).
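For example, a Semantic Validator for the last invariant in the Semantic bucket above ("diff touches only allowed paths") can be a few lines of deterministic code. A minimal sketch, with a hypothetical allowlist of path prefixes:

# Sketch: semantic validator that fails if a diff touches paths outside an allowlist.
# ALLOWED_PREFIXES is a hypothetical policy; adapt it to your repository layout.
ALLOWED_PREFIXES = ("generated/", "meta/schemas/")

def validate_touched_paths(changed_files: list) -> dict:
    violations = [p for p in changed_files if not p.startswith(ALLOWED_PREFIXES)]
    return {
        "validator": "allowed_paths",
        "status": "pass" if not violations else "fail",
        "errors": [{"path": p, "message": "path not in allowlist"} for p in violations],
    }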
Validation vs. evaluation (keep the gate deterministic)
It’s easy to blur two different activities:
- Validation (Physics): deterministic PASS/FAIL. Same input, same result. This is what blocks merges and halts loops.
- Evaluation (Judgement support): heuristic or model-based scoring. Useful, but not reproducible enough to be a gate.
You can absolutely use evaluation to help your Judge decide what to do next (refine, revert, escalate). But do not mistake it for Physics.
Rule: if you can’t make it deterministic, it isn’t a Validator. Treat it as a Sensor and keep it downstream of human review or hard policy.
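A sketch of the distinction in code (the names are illustrative): the Validator is a pure PASS/FAIL gate, while the evaluation score is only recorded for the Judge and never gates anything by itself.

# Sketch: keep the gate deterministic; treat scores as Sensor data for the Judge.
# score_readability() is a toy placeholder for any heuristic or model-based evaluator.
from jsonschema import validate, ValidationError

def validate_schema(artifact: dict, schema: dict) -> bool:
    """Deterministic Physics: same input, same PASS/FAIL. This may block a merge."""
    try:
        validate(instance=artifact, schema=schema)
        return True
    except ValidationError:
        return False

def score_readability(text: str) -> float:
    """Heuristic evaluation: useful signal for the Judge, never a gate on its own."""
    return min(1.0, len(text.split()) / 200)  # toy stand-in for a real evaluator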
Portability Map: Same Physics, Different Tooling
The pattern is stack-agnostic. The tool names change; the deterministic PASS/FAIL signals do not.
| Surface | Python | TypeScript | Rust |
|---|---|---|---|
| Formatting | ruff format / black | prettier | rustfmt |
| Linting | ruff | eslint | clippy |
| Types | mypy / pyright | tsc | rustc |
| Immune System suite | pytest | jest | cargo test |
This is not limited to application code. Infrastructure and policy
surfaces have hard physics too. For Terraform, a deterministic validator
suite might include terraform validate,
tflint, and a policy scanner; see Appendix C for a
ready-to-copy recipe.
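Whatever the surface, the integration pattern is the same: invoke the tool, read its exit code. A hedged sketch, assuming terraform and tflint are on your PATH (the full recipe lives in Appendix C):

# Sketch: the portable integration pattern is "run the tool, trust its exit code".
# Assumes terraform and tflint are installed; see Appendix C for the complete recipe.
import subprocess
import sys

CHECKS = [
    ["terraform", "validate"],
    ["tflint"],
]

for cmd in CHECKS:
    if subprocess.run(cmd).returncode != 0:
        print(f"FAIL: {' '.join(cmd)}")
        sys.exit(1)
print("PASS: all infrastructure physics checks")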
A Validator That Catches a Real Bug
Let’s anchor this with a concrete example. Imagine you’re using an LLM to help summarize pull requests and suggest reviewers. Your system expects a JSON output with specific fields. A common GenAI bug class is structural drift—the model sometimes deviates from the requested JSON format, either subtly changing field names, missing fields, or providing the wrong data type for a value.
We’ll use a JSON Schema validator to enforce the expected structure.
First, define a simple JSON Schema for our
pull_request_summary.json.
File: meta/schemas/pr_summary_v1.json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Pull Request Summary Schema",
"description": "Schema for generated pull request summaries.",
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Concise title for the pull request."
},
"description": {
"type": "string",
"description": "Detailed description of the changes in the pull request."
},
"reviewer_suggestions": {
"type": "array",
"items": {
"type": "string"
},
"description": "List of suggested reviewers (usernames)."
},
"impact_scope": {
"type": "string",
"enum": ["minor", "medium", "major"],
"description": "Estimated impact scope of the changes."
}
},
"required": ["title", "description", "reviewer_suggestions", "impact_scope"]
}
Now, let’s look at two hypothetical GenAI outputs.
Generated output 1 (Correct):
File: generated/pr_summary_good.json
{
"title": "Refactor User Authentication Flow",
"description": "Rewrites the user authentication service to improve security and performance. Migrates from JWT to session-based authentication.",
"reviewer_suggestions": ["alice", "bob_dev"],
"impact_scope": "major"
}
Generated output 2 (Incorrect - a common error):
In this example, reviewer_suggestions is a string
instead of an array.
File: generated/pr_summary_bad.json
{
"title": "Fix: Login Bug",
"description": "Corrects an issue where users could not log in if their password contained special characters.",
"reviewer_suggestions": "charlie_qa",
"impact_scope": "minor"
}
Here’s a simple Python script (validate_pr_summary.py)
that uses the jsonschema library to validate these outputs
against our schema:
# Example: validate_pr_summary.py
import json
import sys
from jsonschema import validate, ValidationError
def validate_json_file(schema_path, data_path):
    try:
        # Load both files inside the try so malformed JSON is caught cleanly below.
        with open(schema_path, 'r') as f:
            schema = json.load(f)
        with open(data_path, 'r') as f:
            data = json.load(f)
        validate(instance=data, schema=schema)
        print(f"Validation PASSED for {data_path}")
        return True
    except ValidationError as e:
        print(f"Validation FAILED for {data_path}:")
        print(e.message)
        print(f"Path: {'.'.join(str(p) for p in e.path)}")
        return False
    except json.JSONDecodeError as e:
        print(f"ERROR: Invalid JSON file {data_path}: {e}")
        return False

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python validate_pr_summary.py <schema_path> <data_path>")
        sys.exit(1)
    schema_file = sys.argv[1]
    data_file = sys.argv[2]
    if not validate_json_file(schema_file, data_file):
        sys.exit(1)  # Indicate failure with a non-zero exit code
Running the validator:
# Validate the good output
python validate_pr_summary.py meta/schemas/pr_summary_v1.json generated/pr_summary_good.json
# Expected output: Validation PASSED for generated/pr_summary_good.json
# Validate the bad output
python validate_pr_summary.py meta/schemas/pr_summary_v1.json generated/pr_summary_bad.json
# Expected output:
# Validation FAILED for generated/pr_summary_bad.json:
# 'charlie_qa' is not of type 'array'
# Path: reviewer_suggestions
This is your deterministic gate. The
validate_pr_summary.py script exits with 0 for
success and 1 for failure. This exit code is the universal
signal a build system needs to decide whether to proceed or halt. This
simple validator immediately catches a real, common class of GenAI
errors—structure violations—before they ever hit a downstream
system.
Failure Modes: False Positives and False Negatives
Once you have Validators, you’ll hit the two classic failure modes:
- False positive: the Validator fails on an artifact you would accept.
- False negative: the Validator passes an artifact you would reject.
The fix is not to abandon validation. The fix is to tune it.
Example false positive (too strict)
Imagine you require impact_scope for every PR summary.
That’s reasonable for code changes, but maybe your workflow allows
docs-only PRs where impact_scope is intentionally omitted.
Your JSON Schema would fail an otherwise useful summary.
Tune options:
- Split the schema into variants (pr_summary_code_v1.json vs pr_summary_docs_v1.json) and select deterministically in Prep; a minimal selection sketch follows this list.
- Keep one schema but make impact_scope optional, then add a separate Policy Validator that requires it only when the diff includes code paths.
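Here is that selection sketch, assuming "docs-only" means every changed file is a .md file (the rule and the variant filenames follow the first option above; adjust to your own layout):

# Sketch: choose the schema variant deterministically in Prep.
# The "docs-only means every changed file ends in .md" rule is an assumption.
def select_schema(changed_files: list) -> str:
    docs_only = all(f.endswith(".md") for f in changed_files)
    return ("meta/schemas/pr_summary_docs_v1.json"
            if docs_only
            else "meta/schemas/pr_summary_code_v1.json")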
Example false negative (too weak)
Our schema doesn’t prove that "reviewer_suggestions" is
a good list. This passes schema validation but is still wrong
in practice:
{
"reviewer_suggestions": ["definitely_not_a_user_123"]
}
Tune options:
- Add a Semantic Validator that cross-checks suggestions against an allowlist (or your directory) and fails if any are unknown.
- Add a budget (“at most 3 suggestions”) and a policy (“must include at least 1 code owner when touching protected paths”).
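A sketch combining both tune options, with a hypothetical allowlist standing in for your user directory:

# Sketch: semantic validator for reviewer_suggestions.
# KNOWN_USERS is a hypothetical allowlist; in practice, query your directory service.
KNOWN_USERS = {"alice", "bob_dev", "charlie_qa"}
MAX_SUGGESTIONS = 3  # budget from the second tune option

def validate_reviewers(suggestions: list) -> dict:
    errors = []
    if len(suggestions) > MAX_SUGGESTIONS:
        errors.append({"path": "reviewer_suggestions",
                       "message": f"more than {MAX_SUGGESTIONS} suggestions"})
    for name in suggestions:
        if name not in KNOWN_USERS:
            errors.append({"path": "reviewer_suggestions",
                           "message": f"'{name}' is not a known user"})
    return {"validator": "reviewer_allowlist",
            "status": "pass" if not errors else "fail",
            "errors": errors}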
Composing Validators: Your Immune System
One validator is good, but multiple validators working together form
a more robust “immune system” for your generated artifacts. You can
compose them by chaining them together, often using a simple script or a
Makefile.
Suppose your generated directory contains not just JSON, but also Python code and documentation. Each might need its own type of validation:
- .py files: black (formatter), mypy (type checker)
- .json files: jsonschema (schema validator)
- .md files: markdownlint (style checker)
Your Makefile or validate.sh script could
then look like this:
# Example: Makefile
.RECIPEPREFIX := >
.PHONY: validate
validate: validate_json validate_python
validate_json:
> echo "--- Validating JSON schemas ---"
> python validate_pr_summary.py meta/schemas/pr_summary_v1.json generated/pr_summary_good.json
> python validate_pr_summary.py meta/schemas/pr_summary_v1.json generated/pr_summary_bad.json || (echo "JSON validation found errors. See above." && exit 1)
validate_python:
> echo "--- Linting Python files ---"
> # This assumes generated/module.py exists.
> black --check generated/module.py || (echo "Black formatting check failed." && exit 1)
> mypy generated/module.py || (echo "Mypy type checking failed." && exit 1)
In this Makefile, if validate_pr_summary.py
exits with a non-zero code for
generated/pr_summary_bad.json, the
|| (echo ... && exit 1) ensures that the
validate target will fail immediately, preventing
subsequent steps. This creates a powerful, composable error detection
system.
At this point, you have seen how to build a deterministic gate for your AI-generated outputs. Any deviation from your explicit definition of “good” (as encoded in schemas, linters, or type systems) will halt your SDaC loop, preventing erroneous outputs from propagating.
Coverage and Strictness: A Ladder, Not a Cliff
Don’t aim for perfect validation from the start. Think of building your validator “immune system” as a ladder:
1. Start with the basics (Minimal Coverage, Low Strictness): Ensure your outputs are valid JSON/YAML/Python. Check for the absolute minimum required fields. This catches the most egregious, system-breaking errors. Our JSON Schema example started here, ensuring title, description, reviewer_suggestions, and impact_scope are present and of the correct basic type.
2. Increase Coverage (More Files, More Structures): Extend validators to all generated artifacts. If you generate 5 types of JSON, write 5 schemas. If you generate 10 Python files, lint and type-check all 10.
3. Increase Strictness (More Granular Rules): Once basic structure is guaranteed, start adding more specific rules. For example, in our pr_summary_v1.json, we already added an enum for impact_scope ("minor", "medium", "major"). You could add regex patterns for specific fields, minimum/maximum lengths, or ensure values are within a certain range. This catches more subtle, but still critical, semantic errors.
4. Custom Validators (Business Logic): For very specific business rules that can’t be expressed purely in schemas or linters, write custom scripts. For instance, a script that checks if all suggested reviewers (reviewer_suggestions) are actual known users in your system.
Each step up the ladder adds more reliability. The key is to build this incrementally, focusing on the highest-impact checks first, and tightening your grip on “good” over time.
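To make rung 3 concrete, here is a sketch of a tightened fragment of pr_summary_v1.json, expressed as a Python dict. maxItems, minItems, minLength, maxLength, and pattern are standard JSON Schema draft-07 keywords; the specific limits are illustrative, not requirements.

# Sketch: stricter property definitions for pr_summary_v1.json.
# The numeric limits and the username pattern are illustrative assumptions.
STRICTER_FRAGMENT = {
    "title": {"type": "string", "minLength": 8, "maxLength": 72},
    "reviewer_suggestions": {
        "type": "array",
        "items": {"type": "string", "pattern": "^[a-z0-9_]+$"},
        "minItems": 1,
        "maxItems": 3,
    },
}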
Actionable: What you can do this week
Identify a Generated Output: Pick one artifact generated by an LLM in your current workflow (e.g., a config file, a code snippet, a report).
Define a Simple Schema/Rule: For that output, identify one clear, deterministic rule it must follow. Examples:
“It must be valid JSON.”
“It must be valid Python syntax.”
“If it’s a JSON file, it must have a name and version field.”
Implement a Basic Validator:
If it’s JSON, write a simple JSON Schema and a Python script (like validate_pr_summary.py) to validate it.
If it’s Python, write a validate.sh script that runs python -m py_compile <your_generated_file.py> or black --check <your_generated_file.py>.
Integrate and Observe: Add this validator to your existing SDaC Makefile or validation script. Intentionally generate an output that violates your rule and confirm that your build fails deterministically. Then, generate a correct output and confirm it passes.
Expand (Optional): If you’re feeling ambitious, add a second, different type of validator (e.g., if you have a JSON schema, add a linter for a Python script).
Tune one failure mode: Find one false positive or false negative and fix it by splitting schemas, adding a semantic validator, or making the selection logic explicit in
Prep.