AT-201a · Module 3
Critic Agent Design
A critic agent has one job: evaluate another agent's output against specific criteria and return a structured judgment. That is the entire scope. The critic does not create. The critic does not rewrite. The critic does not suggest alternatives. The critic evaluates. The moment a critic starts rewriting output, it has become a second worker with the same blind spots as the first, and you have lost the entire value of the external perspective.
The design of a critic agent requires three decisions. First: what dimensions does it evaluate? Accuracy, completeness, clarity, formatting, tone, logical consistency, data integrity — choose 4-6 dimensions relevant to the deliverable type. A proposal critic evaluates different dimensions than a code review critic. Second: what scoring system does it use? Numeric scales work best — "Score each dimension 1-10" — because they are unambiguous and comparable across iterations.
Third — and this is the decision most people skip — what constitutes actionable feedback? A critic that returns "Score: 6/10. Needs improvement." is useless. A critic that returns "Score: 6/10. Accuracy: 8. Completeness: 4. The competitive analysis section is missing three of the five requested competitors. Specifically: Company C, Company D, and Company E have no entries in the comparison table. The data sources section cites only two URLs — the brief requested at least five independent sources." That feedback is actionable. The worker agent knows exactly what to fix and how to fix it.
At Ryan Consulting, every critic agent I deploy follows a template: dimension scores, a 1-2 sentence summary per dimension, a specific list of items to fix, and an overall pass/fail recommendation. The template ensures consistency across review rounds and makes it trivial to track whether scores improve with each iteration.
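That template maps naturally onto a structured record. Here is a minimal sketch in Python; the class and field names are illustrative assumptions, not the exact schema used at Ryan Consulting:

```python
from dataclasses import dataclass, field

@dataclass
class DimensionReview:
    name: str       # e.g. "accuracy", "completeness"
    score: int      # 1-10
    summary: str    # 1-2 sentence judgment for this dimension
    fixes: list[str] = field(default_factory=list)  # required when score < 7

@dataclass
class CriticReport:
    dimensions: list[DimensionReview]
    passed: bool    # overall pass/fail recommendation

    def items_to_fix(self) -> list[str]:
        """Flatten every dimension's fix list for the worker agent."""
        return [fix for d in self.dimensions for fix in d.fixes]

# The "Score: 6/10" example from above, expressed in this structure:
report = CriticReport(
    dimensions=[
        DimensionReview("accuracy", 8, "Claims match the cited sources."),
        DimensionReview(
            "completeness", 4,
            "The competitive analysis covers two of the five requested competitors.",
            ["Add Company C, Company D, and Company E to the comparison table",
             "Cite at least five independent data sources (currently two URLs)"],
        ),
    ],
    passed=False,
)
print(report.items_to_fix())
```

Because the fix list is a field rather than free text, the orchestrator can hand `items_to_fix()` directly to the worker agent and diff dimension scores between review rounds.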
- 1. Choose Evaluation Dimensions. Select 4-6 dimensions appropriate to the deliverable: accuracy, completeness, clarity, formatting, tone, data integrity. Each dimension should be independently scorable.
- 2. Define the Scoring System. Use a numeric scale (1-10) per dimension. Define what each score means: "1-3: fundamentally broken, 4-6: needs significant improvement, 7-8: acceptable with minor fixes, 9-10: production-ready." Consistent scales enable tracking across iterations.
- 3. Require Specific Feedback. For any dimension scoring below 7, the critic must list specific items to fix. Not "improve accuracy" but "Section 3 claims revenue grew 40% but the cited source shows 28%. Section 5 references a product that was discontinued in Q3 2025."
- 4. Enforce the Evaluation-Only Boundary. State explicitly in the critic's prompt: "You are an evaluator. Return scores and specific feedback. Do not rewrite the content. Do not suggest alternative phrasing. Do not produce a revised version." This boundary is the critic's most important constraint.