RC-401h · Module 4
Capstone Scenario: Deploy a 3-Agent System with Versioned Prompts
6 min read
This lesson is the capstone scenario — a complete, end-to-end deployment exercise that integrates every concept from Modules 1 through 3. You will architect and deploy a 3-agent system: a lead agent (CLAWMANDER pattern), a specialist agent (FORGE pattern), and a monitor agent. Each agent gets a role-specific system message registered in the prompt library, versioned, validated against a schema contract, and promoted through the dev → staging → prod pipeline. The handoff protocol between lead and specialist is defined, tested, and instrumented. The monitor agent's degradation thresholds are set and wired to the telemetry layer.
This is not a conceptual exercise. By the end of this lesson, you will have a working architecture specification — a document that, if handed to an engineering team, gives them enough precision to build the system without a follow-up meeting. FORGE does not write proposals that require clarification. This capstone does not produce architecture diagrams that require interpretation.
- **Define the System Scope and Agent Roles.** Document the 3-agent system's purpose in one sentence. Assign each agent its role: lead (orchestrates task routing and escalation), specialist (executes a single defined task type), monitor (evaluates specialist output quality and flags degradation). For each role, write a one-paragraph scope definition: what this agent is accountable for and what is explicitly not its responsibility. These scope definitions are the foundation of the system messages. Scope first. Prompt second.
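The three scope definitions can be captured as structured data before any prompt is written. A minimal sketch in TypeScript; the agent IDs and wording are illustrative assumptions, not values prescribed by this lesson:

```typescript
// Illustrative scope records; the IDs and phrasing are assumptions.
type Role = "lead" | "specialist" | "monitor";

interface AgentScope {
  agentId: string;
  role: Role;
  accountableFor: string;    // what this agent owns
  notResponsibleFor: string; // explicitly out of scope
}

const scopes: AgentScope[] = [
  {
    agentId: "lead-01",
    role: "lead",
    accountableFor: "Routing tasks to the specialist and handling escalations.",
    notResponsibleFor: "Executing tasks or evaluating output quality.",
  },
  {
    agentId: "specialist-01",
    role: "specialist",
    accountableFor: "Executing exactly one defined task type end to end.",
    notResponsibleFor: "Task routing, escalation decisions, or self-evaluation.",
  },
  {
    agentId: "monitor-01",
    role: "monitor",
    accountableFor: "Scoring specialist output quality and flagging degradation.",
    notResponsibleFor: "Routing or executing tasks.",
  },
];
```

Writing the "not responsible for" field as data, not prose, makes it hard to skip; an empty string is an immediate review flag.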
- **Write and Register the System Messages.** Author three system messages using the role-specific templates from Module 2. Register each in the prompt library with namespace, owner, model target, token budget, and schema reference. Assign version 1.0.0 to each. The lead agent's system message includes the handoff routing logic referencing the specialist by its agent ID. The monitor agent's system message references the specialist's output schema as its evaluation criterion. Cross-agent references in prompts are first-class library relationships — document them in the library metadata.
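A library entry carrying the required metadata might look like the following. The field names, namespace layout, and model name are assumptions for illustration; only the metadata fields themselves (namespace, owner, model target, token budget, schema reference, version) come from the lesson:

```typescript
// Hypothetical prompt-library entry; field names and values are
// illustrative, not the library's canonical schema.
interface PromptEntry {
  namespace: string;
  owner: string;
  modelTarget: string;        // placeholder model identifier
  tokenBudget: number;
  schemaRef: string;          // pointer to the registered output schema
  version: string;            // semver, starts at 1.0.0
  referencesAgents: string[]; // cross-agent references kept as metadata
}

const leadSystemMessage: PromptEntry = {
  namespace: "capstone/lead/system",
  owner: "agent-platform-team",
  modelTarget: "example-model-v1",
  tokenBudget: 1200,
  schemaRef: "schemas/handoff@1.0.0",
  version: "1.0.0",
  // The lead's routing logic names the specialist by agent ID, so that
  // dependency is recorded as a first-class library relationship.
  referencesAgents: ["specialist-01"],
};
```

Recording `referencesAgents` explicitly means the library can answer "what breaks if the specialist's ID or schema changes?" without grepping prompt text.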
- **Define and Validate the Handoff Schema.** Write the Zod schema for the lead-to-specialist handoff payload. Include all required fields, conditional constraints, and validation error messages. Write a schema test file with 10 cases: 5 valid payloads of varying complexity, 3 invalid payloads that should fail specific validation rules, and 2 edge cases at the boundary of validity. Run the schema tests. All 10 must pass before the schema is registered in the library.
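The lesson calls for Zod; as a dependency-free stand-in, the same contract can be sketched in plain TypeScript. The payload fields (`taskId`, `taskType`, `priority`, `deadline`) and the conditional rule are illustrative assumptions, chosen to show the shape of a conditional constraint with a named error message:

```typescript
// Plain-TypeScript stand-in for the Zod handoff schema; field names
// and rules are illustrative assumptions.
interface HandoffPayload {
  taskId: string;
  taskType: string;
  input: string;
  priority: number;  // 1 (low) .. 5 (urgent)
  deadline?: string; // ISO date; required when priority >= 4
}

// Returns an empty list for a valid payload, otherwise the specific
// validation error messages — mirroring Zod's error reporting.
function validateHandoff(p: HandoffPayload): string[] {
  const errors: string[] = [];
  if (p.taskId.trim() === "") errors.push("taskId must be non-empty");
  if (p.taskType.trim() === "") errors.push("taskType must be non-empty");
  if (p.priority < 1 || p.priority > 5) errors.push("priority must be between 1 and 5");
  // Conditional constraint: urgent tasks must carry a deadline.
  if (p.priority >= 4 && !p.deadline) {
    errors.push("deadline is required when priority >= 4");
  }
  return errors;
}
```

A valid payload returns an empty error list; an urgent payload without a deadline trips the conditional rule, which is exactly the kind of case the 2 boundary tests should probe.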
- **Build the Regression Suite.** Write the golden set for each agent: 20 input cases covering the full behavioral range of the prompt. For the specialist, include adversarial inputs that probe the scope boundary — requests that are almost in scope but should be rejected or escalated. For the monitor, include degraded specialist outputs that should trigger an alert. Run the suite against the 1.0.0 versions. Record the baseline scores. These scores are the benchmark that all future versions must meet or exceed.
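A minimal sketch of the golden-set structure and the meet-or-exceed rule; the case tags and result shape are assumptions, not the lesson's canonical format:

```typescript
// Illustrative golden-set case; the tag vocabulary is an assumption.
interface GoldenCase {
  input: string;
  tag: "core" | "boundary" | "adversarial" | "degraded";
}

// Aggregate result of running the 20-case suite against one version.
interface SuiteResult {
  version: string;
  score: number; // pass rate over the golden set, 0..1
}

// The 1.0.0 baseline is the floor: a candidate version is promotable
// only if it meets or exceeds it.
function meetsBaseline(baseline: SuiteResult, candidate: SuiteResult): boolean {
  return candidate.score >= baseline.score;
}
```

Encoding the rule as `>=` rather than a fixed threshold makes the check ratchet upward automatically as better versions raise the baseline.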
- **Set Up Instrumentation and Promote to Prod.** Configure the telemetry middleware for all three agents: schema validation on 100% of calls, LLM-as-judge on a 5% sample, degradation alert thresholds at a 95% schema-valid rate and a judge score of 75. Promote each prompt from dev → staging (run the regression suite against the staging variants) → prod. Document the promotion in the library metadata with the date, the regression suite results, and the name of the approving reviewer. The system is live. The monitor is watching. The rollback path is clear.
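The two alert thresholds and the 5% sampling rate can be sketched as a degradation check over a telemetry window. The thresholds (95% schema-valid, judge score 75) come from this lesson; the window shape and function names are illustrative assumptions:

```typescript
// Aggregated telemetry for one evaluation window; field names are
// illustrative assumptions.
interface WindowStats {
  schemaValidRate: number; // fraction of calls passing schema validation
  avgJudgeScore: number;   // 0..100, averaged over the judged sample
}

// Degradation thresholds from the lesson: 95% schema-valid, judge score 75.
function degradationAlerts(w: WindowStats): string[] {
  const alerts: string[] = [];
  if (w.schemaValidRate < 0.95) alerts.push("schema-valid rate below 95%");
  if (w.avgJudgeScore < 75) alerts.push("judge score below 75");
  return alerts;
}

// 5% sampling decision for the LLM-as-judge evaluation. Schema
// validation is not sampled: it runs on 100% of calls.
function shouldSampleForJudge(): boolean {
  return Math.random() < 0.05;
}
```

The asymmetry is deliberate: schema validation is cheap and deterministic, so it runs on every call, while the judge is expensive and probabilistic, so it samples.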