CW-201a · Module 2
Accuracy Verification Loops
4 min read
AI-generated content has a specific failure mode that most people underestimate: confident inaccuracy. The output reads well, sounds authoritative, and is completely wrong on a specific fact. The confidence is the problem — it makes the error invisible to a casual reader. Accuracy verification loops are the systematic defense.
The trust-but-verify pipeline works like this. A creation agent produces the deliverable. A fact-checking agent extracts every factual claim from the deliverable — names, dates, numbers, quotes, statistics, causal claims — and lists them. Then the fact-checking agent attempts to verify each claim against its available sources. Claims that verify get a green flag. Claims that cannot be verified get a yellow flag. Claims that contradict available evidence get a red flag.
The yellow flags are the most important output. Red flags are obvious — the fact is wrong and needs correction. Green flags need no action. But yellow flags — claims that cannot be verified — require human judgment. Maybe the claim is true but the verification agent could not find a source. Maybe the claim is plausible but unsubstantiated. Maybe the claim is the kind of statement that should not appear in a professional deliverable without a citation. The human decides. The verification loop surfaces the decisions that need to be made.
This is fundamentally different from asking an agent to "make sure everything is accurate." That prompt lets the agent evaluate its own confidence, which is exactly the problem. The verification loop separates the claimer from the checker. The creation agent makes claims. The verification agent challenges them. Structural separation produces structural accountability.
1. Extract Claims: The verification agent reads the deliverable and extracts every factual claim (statistics, dates, names, causal statements, quotations, and assertions of fact). Each claim becomes a line item to verify.
2. Verify Against Sources: For each claim, the verification agent searches for corroborating evidence. It checks web sources, provided reference documents, and its own knowledge. Each claim gets a status: verified, unverified, or contradicted.
3. Flag and Categorize: Green means verified with a source. Yellow means unable to verify (the claim may be true, but no supporting evidence was found). Red means contradicted by available evidence. The flag report goes to the human for review.
4. Human Review and Revision: The human reviews yellow and red flags, makes decisions, and instructs the creation agent to revise. Red flags get corrected. Yellow flags get cited, softened with caveats, or removed. Green flags stay as-is.
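The four steps above can be sketched as one loop. The agent calls are passed in as stubs here; real implementations would wrap your model client and your human-review UI:

```python
def run_verification_loop(deliverable, extract, check, human_decides, revise):
    """One pass of the trust-but-verify pipeline (illustrative sketch).

    extract(deliverable)       -> list of claim strings        (step 1)
    check(claim)               -> "green" | "yellow" | "red"   (step 2)
    human_decides(claim)       -> "cite" | "soften" | "remove" (step 4, yellow only)
    revise(deliverable, fixes) -> revised deliverable
    """
    flags = {claim: check(claim) for claim in extract(deliverable)}  # steps 1-2
    fixes = []
    for claim, flag in flags.items():                                # step 3
        if flag == "red":
            fixes.append((claim, "correct"))             # red: must be corrected
        elif flag == "yellow":
            fixes.append((claim, human_decides(claim)))  # yellow: human judgment
        # green claims need no action
    return revise(deliverable, fixes) if fixes else deliverable      # step 4
```

Note that `human_decides` is only ever called for yellow flags: the loop automates the obvious cases and surfaces only the judgment calls, which is the whole point of the design.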