AT-301c · Module 1
Critic Agent Design
4 min read
A critic agent without explicit scoring dimensions is just an opinion generator. The critic must evaluate on defined, measurable axes — and the axes must be specific to the task type, not generic. For code review: correctness, performance, readability, test coverage. For content: voice accuracy, factual integrity, structural coherence, engagement metrics. For design: faithfulness, hierarchy clarity, accessibility, pixel precision.
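The task-type-to-axes mapping above can be sketched as a simple registry. This is a minimal illustration — the names and structure are hypothetical, not from any specific framework:

```python
# Hypothetical registry mapping each task type to its explicit scoring axes.
# Dimension names mirror the examples in the text; the structure is illustrative.
SCORING_DIMENSIONS: dict[str, list[str]] = {
    "code_review": ["correctness", "performance", "readability", "test_coverage"],
    "content": ["voice_accuracy", "factual_integrity", "structural_coherence", "engagement"],
    "design": ["faithfulness", "hierarchy_clarity", "accessibility", "pixel_precision"],
}

def dimensions_for(task_type: str) -> list[str]:
    """Return the defined axes a critic must score for this task type.

    Raising on unknown task types keeps the critic from inventing
    generic criteria when no rubric exists.
    """
    if task_type not in SCORING_DIMENSIONS:
        raise ValueError(f"No scoring dimensions defined for task type: {task_type}")
    return SCORING_DIMENSIONS[task_type]
```

Making unknown task types an error, rather than falling back to a generic list, enforces the rule that axes are task-specific.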
Each dimension gets a 1-10 score with explicit rubrics. A 7 on "voice accuracy" means the output is identifiable as the correct agent within 3 sentences. A 4 means the voice is generic. The rubric eliminates subjectivity — two different critic instances should produce scores within 0.8 points of each other on the same artifact. If they diverge further, the rubric needs tightening.
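The convergence test described above — two critic instances scoring the same artifact within 0.8 points — can be expressed as a small check. A minimal sketch; the function name and dict-based score format are assumptions for illustration:

```python
def rubric_is_calibrated(
    scores_a: dict[str, float],
    scores_b: dict[str, float],
    tolerance: float = 0.8,
) -> bool:
    """Compare two critics' scores for the same artifact.

    The rubric is tight enough only if every dimension diverges by at
    most `tolerance` points; any wider gap means the rubric needs work.
    """
    if scores_a.keys() != scores_b.keys():
        raise ValueError("Both critics must score the same dimensions")
    return all(abs(scores_a[d] - scores_b[d]) <= tolerance for d in scores_a)
```

Checking per-dimension rather than averaging the divergence matters: an average can mask one dimension where the rubric is ambiguous.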
Do This
- Define 3-5 scoring dimensions per task type with explicit rubrics
- Set pass/fail thresholds per dimension — a 9 in three dimensions does not compensate for a 3 in one
- Calibrate rubrics by testing two critics on the same artifact — scores should converge within 0.8
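The per-dimension threshold rule can be sketched as a gate where a single failing axis fails the whole artifact. Function and parameter names here are hypothetical:

```python
def passes(scores: dict[str, float], thresholds: dict[str, float]) -> bool:
    """Gate an artifact on per-dimension thresholds.

    Every dimension must clear its own bar — a 9 in three dimensions
    does not compensate for a 3 in one, so no averaging is done.
    """
    return all(scores[d] >= thresholds[d] for d in thresholds)
```

Contrast this with a composite score: averaging {9, 9, 9, 3} yields 7.5, which looks like a pass while hiding the failed dimension.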
Avoid This
- Asking the critic "is this good?" — vague questions produce vague answers
- Using a single composite score — it hides the dimension where the work actually failed
- Letting the critic freelance its evaluation criteria — consistency requires structure