SA-301f · Module 3

Circuit Breaker Design

4 min read

The circuit breaker prevents a failing dependency from taking down the caller. When a downstream service fails, the circuit breaker opens: all calls to that service are immediately short-circuited to a fallback response without making the actual call. This keeps the caller from piling up connections that are waiting to time out and exhausting its own resources on a service that is not going to respond. The circuit breaker is the architectural fuse that sacrifices one capability to protect the rest of the system.

Three States

Closed: normal operation, all calls pass through. Open: the dependency has failed, all calls return the fallback immediately. Half-open: a limited number of test calls are allowed through to check if the dependency has recovered. If the test calls succeed, the circuit closes. If they fail, the circuit reopens. The transition thresholds (how many failures trigger open, how long to wait before half-open, how many successes close) are tuning parameters specific to each dependency.
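
The three states and their transitions can be sketched as a small state machine. This is a minimal, single-threaded illustration, not a production implementation: the class name `CircuitBreaker`, its parameters (`failure_threshold`, `open_seconds`, `success_threshold`), and their default values are all assumptions made for the example, and it counts consecutive failures rather than a failure rate.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Illustrative three-state breaker; names and defaults are assumptions."""

    def __init__(self, failure_threshold=5, open_seconds=30.0,
                 success_threshold=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures that trigger open
        self.open_seconds = open_seconds            # wait before half-open
        self.success_threshold = success_threshold  # successes that close
        self.clock = clock                          # injectable for testing
        self.state = State.CLOSED
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def call(self, func, fallback):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.open_seconds:
                return fallback()  # short-circuit: the real call is never made
            # Open window has elapsed: let test calls through.
            self.state = State.HALF_OPEN
            self.successes = 0
        try:
            result = func()
        except Exception:
            self._on_failure()
            return fallback()
        self._on_success()
        return result

    def _on_failure(self):
        if self.state is State.HALF_OPEN:
            self._trip()  # a failed test call reopens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._trip()

    def _on_success(self):
        if self.state is State.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = State.CLOSED
                self.failures = 0
        else:
            self.failures = 0  # a success resets the consecutive-failure count

    def _trip(self):
        self.state = State.OPEN
        self.opened_at = self.clock()
        self.failures = 0
```

A caller would wrap each dependency call, for example `breaker.call(fetch_profile, fallback=lambda: None)`, where `fetch_profile` is a hypothetical function. In this sketch every call made in the half-open state acts as a test call; a concurrent implementation would also need to cap how many test calls are in flight at once.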

Threshold Configuration

Set the failure threshold based on the dependency's error budget, not an arbitrary number. A dependency with a 99.9% availability SLA has an error budget of 0.1%, so the circuit should open when the observed failure rate exceeds that budget. Set the recovery window (how long the circuit stays open before moving to half-open) based on the dependency's typical recovery time. A service that restarts in 30 seconds needs an open window of about 30 seconds. A service that requires manual intervention needs a longer one, for example 5 minutes.
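
As a rough sketch of the arithmetic, the failure threshold can be derived from the SLA and checked against a rolling window of recent call outcomes. The names `error_budget`, `FailureRateWindow`, and `RECOVERY_WINDOWS` are hypothetical. The `window_size` and `min_calls` parameters are assumptions added for the example, so that one failure in a tiny sample does not trip the breaker; the text above does not specify either.

```python
from collections import deque


def error_budget(sla_availability: float) -> float:
    """Allowed failure fraction, e.g. 0.999 availability gives roughly 0.001."""
    return 1.0 - sla_availability


class FailureRateWindow:
    """Tracks the failure rate over the last `window_size` calls (illustrative)."""

    def __init__(self, max_failure_rate, window_size=1000, min_calls=100):
        self.max_failure_rate = max_failure_rate
        self.min_calls = min_calls                 # assumption: minimum sample size
        self.outcomes = deque(maxlen=window_size)  # True means the call failed

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_open(self) -> bool:
        if len(self.outcomes) < self.min_calls:
            return False  # too few calls to judge the rate
        return self.failure_rate() > self.max_failure_rate


# Hypothetical per-dependency recovery windows, in seconds, using the
# figures from the text: 30 s for a self-restarting service, 5 min for
# one that needs manual intervention.
RECOVERY_WINDOWS = {"auto_restart": 30.0, "manual_intervention": 300.0}

window = FailureRateWindow(max_failure_rate=error_budget(0.999))
```

With these settings, `window.should_open()` stays false until at least 100 calls have been recorded and more than about 0.1% of the calls in the window have failed.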

Fallback Design

When the circuit is open, the fallback response determines the caller's behavior. Cached data serves stale-but-available responses. Default values serve degraded-but-functional responses. Error responses force the caller to handle the unavailability explicitly. The fallback is a design decision that defines how the system degrades, so design it intentionally, not as an afterthought.
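
The three fallback styles can be sketched as small factory functions, each returning a zero-argument callable that a breaker could invoke while the circuit is open. All names here (`cached_fallback`, `default_fallback`, `error_fallback`, `DependencyUnavailable`) are hypothetical, and the plain dictionary stands in for whatever cache the caller actually uses.

```python
from typing import Any, Callable, Dict


class DependencyUnavailable(Exception):
    """Raised so the caller must handle the outage explicitly."""


def cached_fallback(cache: Dict[str, Any], key: str) -> Callable[[], Any]:
    """Stale-but-available: serve the last known value if one exists."""
    def fallback() -> Any:
        if key in cache:
            return cache[key]
        raise DependencyUnavailable(f"no cached value for {key!r}")
    return fallback


def default_fallback(value: Any) -> Callable[[], Any]:
    """Degraded-but-functional: serve a fixed default value."""
    return lambda: value


def error_fallback(name: str) -> Callable[[], Any]:
    """Explicit failure: surface the unavailability to the caller."""
    def fallback() -> Any:
        raise DependencyUnavailable(f"{name} is unavailable")
    return fallback
```

Which style fits depends on the capability being protected: an empty recommendations list may be an acceptable default, while a stale or default account balance probably is not, so an explicit error is the safer choice there.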