AT-301c · Module 1
Critic Agent Design
4 min read
A critic agent without explicit scoring dimensions is just an opinion generator. The critic must evaluate on defined, measurable axes — and the axes must be specific to the task type, not generic. For code review: correctness, performance, readability, test coverage. For content: voice accuracy, factual integrity, structural coherence, engagement metrics. For design: faithfulness, hierarchy clarity, accessibility, pixel precision.
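The task-type-to-axes mapping above can be sketched as a simple registry. This is a minimal illustration — the names and structure are hypothetical, not from any specific framework:

```python
# Hypothetical registry mapping each task type to its explicit scoring axes.
# Dimension names mirror the examples in the text; the structure is illustrative.
SCORING_DIMENSIONS: dict[str, list[str]] = {
    "code_review": ["correctness", "performance", "readability", "test_coverage"],
    "content": ["voice_accuracy", "factual_integrity", "structural_coherence", "engagement"],
    "design": ["faithfulness", "hierarchy_clarity", "accessibility", "pixel_precision"],
}

def dimensions_for(task_type: str) -> list[str]:
    """Return the defined axes a critic must score for this task type.

    Raising on unknown task types keeps the critic from inventing
    generic criteria when no rubric exists.
    """
    if task_type not in SCORING_DIMENSIONS:
        raise ValueError(f"No scoring dimensions defined for task type: {task_type}")
    return SCORING_DIMENSIONS[task_type]
```

Making unknown task types an error, rather than falling back to a generic list, enforces the rule that axes are task-specific.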
Each dimension gets a 1-10 score with explicit rubrics. A 7 on "voice accuracy" means the output is identifiable as the correct agent within 3 sentences. A 4 means the voice is generic. The rubric eliminates subjectivity — two different critic instances should produce scores within 0.8 points of each other on the same artifact. If they diverge further, the rubric needs tightening.
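The convergence test described above — two critic instances scoring the same artifact within 0.8 points — can be expressed as a small check. A minimal sketch; the function name and dict-based score format are assumptions for illustration:

```python
def rubric_is_calibrated(
    scores_a: dict[str, float],
    scores_b: dict[str, float],
    tolerance: float = 0.8,
) -> bool:
    """Compare two critics' scores for the same artifact.

    The rubric is tight enough only if every dimension diverges by at
    most `tolerance` points; any wider gap means the rubric needs work.
    """
    if scores_a.keys() != scores_b.keys():
        raise ValueError("Both critics must score the same dimensions")
    return all(abs(scores_a[d] - scores_b[d]) <= tolerance for d in scores_a)
```

Checking per-dimension rather than averaging the divergence matters: an average can mask one dimension where the rubric is ambiguous.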
Do This
- Define 3-5 scoring dimensions per task type with explicit rubrics
- Set pass/fail thresholds per dimension — a 9 in three dimensions does not compensate for a 3 in one
- Calibrate rubrics by testing two critics on the same artifact — scores should converge within 0.8
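The per-dimension threshold rule can be sketched as a gate where a single failing axis fails the whole artifact. Function and parameter names here are hypothetical:

```python
def passes(scores: dict[str, float], thresholds: dict[str, float]) -> bool:
    """Gate an artifact on per-dimension thresholds.

    Every dimension must clear its own bar — a 9 in three dimensions
    does not compensate for a 3 in one, so no averaging is done.
    """
    return all(scores[d] >= thresholds[d] for d in thresholds)
```

Contrast this with a composite score: averaging {9, 9, 9, 3} yields 7.5, which looks like a pass while hiding the failed dimension.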
Avoid This
- Asking the critic "is this good?" — vague questions produce vague answers
- Using a single composite score — it hides the dimension where the work actually failed
- Letting the critic freelance its evaluation criteria — consistency requires structure