MP-301i · Module 3

Canary Deployments

3 min read

Canary deployments route a small percentage of traffic to the new version while the majority continues on the old version. If the canary shows elevated errors, increased latency, or unexpected behavior, you roll it back before the problem affects most users. For MCP servers, the canary percentage should start at 1-5% of sessions, not requests — routing individual requests to different versions within the same session would break stateful interactions. The load balancer assigns new sessions to the canary version based on a percentage, and all requests within that session go to the same version.

Canary analysis must compare the canary's metrics against the baseline's metrics, not against static thresholds. A 2% error rate might be normal (baseline is also 2%) or catastrophic (baseline is 0.1%). Automated canary analysis tools (Kayenta, Flagger) compute a statistical score by comparing latency distributions, error rates, and custom metrics between the canary and baseline populations. If the score falls below a threshold, the canary is automatically rolled back. This removes human judgment from the loop — which is essential at 3 AM when judgment is impaired.

Progressive delivery extends canary deployments with multiple stages: 1% for 10 minutes (catch obvious regressions), 10% for 30 minutes (catch subtle regressions with statistical significance), 50% for 1 hour (catch long-tail issues), then 100% (full rollout). At each stage, automated analysis gates decide whether to proceed, hold, or rollback. Between stages, the deployment pauses and collects metrics. This turns deployment from a binary event (deployed or not) into a gradual process with multiple safety checkpoints.

  1. Configure session-level routing Set the load balancer to route new sessions (not individual requests) to the canary based on a configurable percentage. Start at 1%. Ensure all requests within a session hit the same version.
  2. Define canary success criteria Compare canary vs baseline on: p99 latency (within 10% of baseline), error rate (within 0.5% of baseline), and tool success rate (within 1% of baseline). Automate the comparison.
  3. Implement automated rollback If canary metrics breach the success criteria at any stage, automatically route 100% of traffic back to the baseline version. Alert the on-call engineer. Do not wait for human confirmation to rollback.