Part I – Build It: The Weekend Sprint

Chapter 3 – Define “Good” (Your First Validators)

You’ve built a loop and named the sandwich. Now define “good” so the gate means something.

Validators turn “looks fine” into PASS/FAIL and give you deterministic failure you can act on. Start small: hard physics first (linters, types, schemas), then a few task-specific checks you can tighten over time.

Two practical rules: if a check is not deterministic, it is not a Validator (treat it as a Sensor), and run the cheap hard-physics gates before the task-specific ones.

Where Validators live in real systems

In production SDaC tooling, Validators are rarely “just scripts.” They usually exist as explicit Steps in a Workflow graph so the loop is inspectable and reproducible.

In the engine behind this book, the workflow runtime lives in core/workflow.py, and common Physics checks are implemented as Step classes in core/steps/ (for example LintStep, TypeCheckStep, CoverageStep, and TestStep in core/steps/code_steps.py). A controller wires these Steps together into a deterministic graph with named transitions, then records a trace.

That implementation detail matters: when Validators are first-class nodes, you can tag them, order them, retry them, visualize them, and reuse the same gates across every loop (human-written or machine-written changes).
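
To make that concrete, here is a minimal sketch of validators as steps. The CommandStep and run_workflow names are illustrative, not the engine’s actual API in core/workflow.py; the point is that each gate is a named node whose PASS/FAIL verdict comes from a deterministic command.


# Sketch: Validators as first-class steps in a small workflow.
# CommandStep / run_workflow are illustrative names, not the engine's API.
import subprocess
from dataclasses import dataclass

@dataclass
class CommandStep:
    """A deterministic gate: run a command, PASS means exit code 0."""
    name: str
    command: list[str]

    def run(self) -> bool:
        return subprocess.run(self.command).returncode == 0

def run_workflow(steps: list[CommandStep]) -> list[tuple[str, str]]:
    """Run steps in order, halt at the first failure, and record a trace."""
    trace = []
    for step in steps:
        status = "pass" if step.run() else "fail"
        trace.append((step.name, status))
        if status == "fail":
            break
    return trace

# Wiring that mirrors LintStep, TypeCheckStep, and TestStep:
gates = [
    CommandStep("lint", ["ruff", "check", "generated/"]),
    CommandStep("typecheck", ["mypy", "generated/"]),
    CommandStep("test", ["pytest", "-q"]),
]

if __name__ == "__main__":
    print(run_workflow(gates))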

A useful Validator output shape looks like this:

{
  "validator": "json_schema",
  "status": "fail",
  "artifact": "generated/pr_summary_bad.json",
  "errors": [
    {
      "path": "reviewer_suggestions",
      "message": "'charlie_qa' is not of type 'array'"
    }
  ]
}
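
One way to produce that shape (a sketch, not something the engine prescribes) is to collect jsonschema errors into a plain dict:


# Sketch: map jsonschema errors into the result shape shown above.
from jsonschema import Draft7Validator

def schema_validator_result(schema: dict, data: dict, artifact: str) -> dict:
    errors = [
        {"path": ".".join(str(p) for p in err.path), "message": err.message}
        for err in Draft7Validator(schema).iter_errors(data)
    ]
    return {
        "validator": "json_schema",
        "status": "fail" if errors else "pass",
        "artifact": artifact,
        "errors": errors,
    }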

The Hard Physics of Validation

Forget the philosophical debates about AI truthfulness for a moment. Instead, focus on the “hard physics” of code and data: a file either parses or it doesn’t, a document either conforms to its schema or it doesn’t, a program either type-checks or it doesn’t, and a test either passes or it fails.

These are “hard physics” because they are entirely deterministic. Given an input and a rule, the outcome is always the same: valid or invalid. This is your anchor in the sea of stochastic generation.

Validator Taxonomy: Hard First, Then Semantic

Hard physics gets you admissible artifacts. It does not get you correct artifacts.

In practice, an Immune System blends hard physics (formatters, linters, type checkers, schema validation, tests) with semantic checks: task-specific rules and custom business-logic validators that you add and tighten over time.

Validation vs. evaluation (keep the gate deterministic)

It’s easy to blur two different activities: validation, a deterministic PASS/FAIL gate against explicit rules, and evaluation, a graded judgment of quality that is often produced by a model or a human and is not reproducible from run to run.

You can absolutely use evaluation to help your Judge decide what to do next (refine, revert, escalate). But do not mistake it for Physics.

Rule: if you can’t make it deterministic, it isn’t a Validator. Treat it as a Sensor and keep it downstream of human review or hard policy.
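
One way to keep that boundary visible in code is shown below; the function names and the scoring stub are illustrative:


# Sketch: a Validator gates deterministically; a Sensor only reports.
def validator_impact_scope(summary: dict) -> bool:
    """Deterministic: same input, same verdict. Safe to use as a gate."""
    return summary.get("impact_scope") in {"minor", "medium", "major"}

def sensor_description_quality(summary: dict) -> float:
    """Not deterministic in general (imagine an LLM-scored rubric).
    Record the score for the Judge; never fail the build on it."""
    return 0.5  # placeholder for a model call returning a 0..1 quality score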

Portability Map: Same Physics, Different Tooling

The pattern is stack-agnostic. The tool names change; the deterministic PASS/FAIL signals do not.

Surface                Python                 TypeScript    Rust
Formatting             ruff format / black    prettier      rustfmt
Linting                ruff                   eslint        clippy
Types                  mypy / pyright         tsc           rustc
Immune System suite    pytest                 jest          cargo test

This is not limited to application code. Infrastructure and policy surfaces have hard physics too. For Terraform, a deterministic validator suite might include terraform validate, tflint, and a policy scanner; see Appendix C for a ready-to-copy recipe.
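
Whatever the stack, a runner like the step sketch earlier only needs a different command table. The suites below are illustrative defaults, not a prescribed configuration:


# Sketch: same PASS/FAIL mechanic, different command tables per stack.
SUITES = {
    "python": [["ruff", "check", "."], ["mypy", "."], ["pytest", "-q"]],
    "typescript": [["npx", "eslint", "."], ["npx", "tsc", "--noEmit"], ["npx", "jest"]],
    "rust": [["cargo", "fmt", "--", "--check"], ["cargo", "clippy"], ["cargo", "test"]],
}
# A runner iterates SUITES[stack] and fails on the first non-zero exit code.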

A Validator That Catches a Real Bug

Let’s anchor this with a concrete example. Imagine you’re using an LLM to help summarize pull requests and suggest reviewers. Your system expects a JSON output with specific fields. A common GenAI bug class is structural drift: the model deviates from the requested JSON format by subtly renaming fields, omitting fields, or supplying the wrong data type for a value.

We’ll use a JSON Schema validator to enforce the expected structure.

First, define a simple JSON Schema for our pull_request_summary.json.

File: meta/schemas/pr_summary_v1.json

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Pull Request Summary Schema",
  "description": "Schema for generated pull request summaries.",
  "type": "object",
  "properties": {
    "title": {
      "type": "string",
      "description": "Concise title for the pull request."
    },
    "description": {
      "type": "string",
      "description": "Detailed description of the changes in the pull request."
    },
    "reviewer_suggestions": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "List of suggested reviewers (usernames)."
    },
    "impact_scope": {
      "type": "string",
      "enum": ["minor", "medium", "major"],
      "description": "Estimated impact scope of the changes."
    }
  },
  "required": ["title", "description", "reviewer_suggestions", "impact_scope"]
}

Now, let’s look at two hypothetical GenAI outputs.

Generated output 1 (Correct):

File: generated/pr_summary_good.json

{
  "title": "Refactor User Authentication Flow",
  "description": "Rewrites the user authentication service to improve security and performance. Migrates from JWT to session-based authentication.",
  "reviewer_suggestions": ["alice", "bob_dev"],
  "impact_scope": "major"
}

Generated output 2 (Incorrect - a common error):

In this example, reviewer_suggestions is a string instead of an array.

File: generated/pr_summary_bad.json

{
  "title": "Fix: Login Bug",
  "description": "Corrects an issue where users could not log in if their password contained special characters.",
  "reviewer_suggestions": "charlie_qa",
  "impact_scope": "minor"
}

Here’s a simple Python script (validate_pr_summary.py) that uses the jsonschema library to validate these outputs against our schema:


# Example: validate_pr_summary.py
import json
import sys
from jsonschema import validate, ValidationError

def validate_json_file(schema_path, data_path):
    # Load schema and data; malformed JSON is itself a validation failure.
    try:
        with open(schema_path, 'r') as f:
            schema = json.load(f)
        with open(data_path, 'r') as f:
            data = json.load(f)
    except json.JSONDecodeError as e:
        print(f"ERROR: Invalid JSON in {data_path} or {schema_path}: {e}")
        return False

    try:
        validate(instance=data, schema=schema)
        print(f"Validation PASSED for {data_path}")
        return True
    except ValidationError as e:
        print(f"Validation FAILED for {data_path}:")
        print(e.message)
        print(f"Path: {'.'.join(str(p) for p in e.path)}")
        return False

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python validate_pr_summary.py <schema_path> <data_path>")
        sys.exit(1)

    schema_file = sys.argv[1]
    data_file = sys.argv[2]

    if not validate_json_file(schema_file, data_file):
        sys.exit(1)  # Indicate failure with a non-zero exit code

Running the validator:


# Validate the good output
python validate_pr_summary.py meta/schemas/pr_summary_v1.json generated/pr_summary_good.json

# Expected output: Validation PASSED for generated/pr_summary_good.json

# Validate the bad output
python validate_pr_summary.py meta/schemas/pr_summary_v1.json generated/pr_summary_bad.json

# Expected output:

# Validation FAILED for generated/pr_summary_bad.json:

# 'charlie_qa' is not of type 'array'

# Path: reviewer_suggestions

This is your deterministic gate. The validate_pr_summary.py script exits with 0 for success and 1 for failure. This exit code is the universal signal a build system needs to decide whether to proceed or halt. This simple validator immediately catches a real, common class of GenAI errors—structure violations—before they ever hit a downstream system.

Failure Modes: False Positives and False Negatives

Once you have Validators, you’ll hit the two classic failure modes: false positives, where a Validator rejects an artifact that was actually fine, and false negatives, where a bad artifact passes because no rule covers it.

The fix is not to abandon validation. The fix is to tune it.

Example false positive (too strict)

Imagine you require impact_scope for every PR summary. That’s reasonable for code changes, but maybe your workflow allows docs-only PRs where impact_scope is intentionally omitted. Your JSON Schema would fail an otherwise useful summary.

Tune options:

  • Split the schema: keep pr_summary_v1.json strict for code changes and use a relaxed variant for docs-only PRs.

  • Make the selection logic explicit in Prep, so the Validator always knows which contract the artifact is supposed to meet.
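
For instance, a relaxed variant could simply drop impact_scope from the required fields. The file name and contents below are a hypothetical sketch of that split:


File: meta/schemas/pr_summary_docs_v1.json (hypothetical)

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Docs-Only Pull Request Summary Schema",
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "description": { "type": "string" },
    "reviewer_suggestions": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["title", "description", "reviewer_suggestions"]
}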

Example false negative (too weak)

Our schema doesn’t prove that "reviewer_suggestions" is a good list. This passes schema validation but is still wrong in practice:

{
  "reviewer_suggestions": ["definitely_not_a_user_123"]
}

Tune options:

  • Add a semantic Validator: a custom script that checks every entry in reviewer_suggestions against your system’s list of known users.

  • Tighten the schema with username patterns and item limits so obviously malformed entries fail early.
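
A sketch of that semantic check is shown below; the hard-coded KNOWN_USERS set is a stand-in for a lookup against your real user directory:


# Sketch: semantic validator for reviewer_suggestions.
KNOWN_USERS = {"alice", "bob_dev", "charlie_qa"}  # stand-in for a real lookup

def validate_reviewers(summary: dict) -> list[str]:
    """Return error messages; an empty list means PASS."""
    suggestions = summary.get("reviewer_suggestions", [])
    return [f"Unknown reviewer: {u}" for u in suggestions if u not in KNOWN_USERS]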

Composing Validators: Your Immune System

One validator is good, but multiple validators working together form a more robust “immune system” for your generated artifacts. You can compose them by chaining them together, often using a simple script or a Makefile.

Suppose your generated directory contains not just JSON but also Python code and documentation. Each needs its own kind of validation: JSON artifacts are checked against their schemas, Python files are format-checked and type-checked, and documentation can get link or spell checks.

Your Makefile or validate.sh script could then look like this:


# Example: Makefile
.RECIPEPREFIX := >
.PHONY: validate validate_json validate_python

validate: validate_json validate_python

validate_json:
> echo "--- Validating JSON schemas ---"
> python validate_pr_summary.py meta/schemas/pr_summary_v1.json generated/pr_summary_good.json
> python validate_pr_summary.py meta/schemas/pr_summary_v1.json generated/pr_summary_bad.json || (echo "JSON validation found errors. See above." && exit 1)

validate_python:
> echo "--- Checking Python files ---"
> # This assumes generated/module.py exists.
> black --check generated/module.py || (echo "Black format check failed." && exit 1)
> mypy generated/module.py || (echo "Mypy type checking failed." && exit 1)

In this Makefile, if validate_pr_summary.py exits with a non-zero code for generated/pr_summary_bad.json, the || (echo ... && exit 1) ensures that the validate target will fail immediately, preventing subsequent steps. This creates a powerful, composable error detection system.

At this point, you have seen how to build a deterministic gate for your AI-generated outputs. Any deviation from your explicit definition of “good” (as encoded in schemas, linters, or type systems) will halt your SDaC loop, preventing erroneous outputs from propagating.

Coverage and Strictness: A Ladder, Not a Cliff

Don’t aim for perfect validation from the start. Think of building your validator “immune system” as a ladder:

  1. Start with the basics (Minimal Coverage, Low Strictness): Ensure your outputs are valid JSON/YAML/Python. Check for the absolute minimum required fields. This catches the most egregious, system-breaking errors. Our JSON Schema example started here, ensuring title, description, reviewer_suggestions, and impact_scope are present and of the correct basic type.

  2. Increase Coverage (More Files, More Structures): Extend validators to all generated artifacts. If you generate 5 types of JSON, write 5 schemas. If you generate 10 Python files, lint and type-check all 10.

  3. Increase Strictness (More Granular Rules): Once basic structure is guaranteed, start adding more specific rules. For example, in our pr_summary_v1.json, we already added an enum for impact_scope ("minor", "medium", "major"). You could add regex patterns for specific fields, minimum/maximum lengths, or ensure values are within a certain range (see the sketch after this list). This catches more subtle, but still critical, semantic errors.

  4. Custom Validators (Business Logic): For very specific business rules that can’t be expressed purely in schemas or linters, write custom scripts. For instance, a script that checks if all suggested reviewers (reviewer_suggestions) are actual known users in your system.
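
As a sketch of step 3, the reviewer_suggestions property could be tightened with a username pattern and item limits. These exact constraints are illustrative and are not part of pr_summary_v1.json as shown earlier:


"reviewer_suggestions": {
  "type": "array",
  "items": {
    "type": "string",
    "pattern": "^[a-z0-9_]{3,32}$"
  },
  "minItems": 1,
  "maxItems": 5,
  "description": "List of suggested reviewers (usernames)."
}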

Each step up the ladder adds more reliability. The key is to build this incrementally, focusing on the highest-impact checks first, and tightening your grip on “good” over time.


Actionable: What you can do this week

  1. Identify a Generated Output: Pick one artifact generated by an LLM in your current workflow (e.g., a config file, a code snippet, a report).

  2. Define a Simple Schema/Rule: For that output, identify one clear, deterministic rule it must follow. Examples:

    • “It must be valid JSON.”

    • “It must be valid Python syntax.”

    • “If it’s a JSON file, it must have a name and version field.”

  3. Implement a Basic Validator:

    • If it’s JSON, write a simple JSON Schema and a Python script (like validate_pr_summary.py) to validate it.

    • If it’s Python, write a validate.sh script that runs python -m py_compile <your_generated_file.py> or black --check <your_generated_file.py>.

  4. Integrate and Observe: Add this validator to your existing SDaC Makefile or validation script. Intentionally generate an output that violates your rule and confirm that your build fails deterministically. Then, generate a correct output and confirm it passes.

  5. Expand (Optional): If you’re feeling ambitious, add a second, different type of validator (e.g., if you have a JSON schema, add a linter for a Python script).

  6. Tune one failure mode: Find one false positive or false negative and fix it by splitting schemas, adding a semantic validator, or making the selection logic explicit in Prep.