CDX-301e · Module 3
Auto-Scaling Strategies
Auto-scaling adjusts concurrency limits and warm pool sizes dynamically based on demand signals. The simplest signal is queue depth: when the queue grows beyond a threshold, scale up; when it drains, scale down. More sophisticated signals include time-of-day patterns (higher concurrency during business hours), event-driven spikes (scale up when a batch workflow triggers), and cost-based ceilings (reduce concurrency when monthly spend approaches the budget limit).
Auto-scaling for Codex Cloud has a constraint that traditional compute scaling does not: the human review bottleneck. Scaling from 5 to 50 parallel tasks can produce up to 10x more output, but if only one developer reviews the results, the review queue becomes the bottleneck instead of the execution queue. Effective auto-scaling must account for downstream capacity: how many branches can the team review and merge per hour? Scale execution up to that rate and no further. Beyond that point, you are producing inventory that sits in the review queue, not delivering value faster.
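The "scale up to the review rate and no further" rule can be turned into a rough number. The sketch below is a back-of-the-envelope estimate, not a platform feature: the function name and both parameters are hypothetical, and it assumes every task produces one branch to review and that each concurrency slot runs tasks back to back.

```python
import math


def review_bound_concurrency(reviews_per_hour: float, avg_task_minutes: float) -> int:
    """Estimate the highest concurrency whose output the team can still review.

    Assumption: one task yields one branch, and a slot that runs tasks
    back to back finishes 60 / avg_task_minutes tasks per hour.
    """
    if reviews_per_hour <= 0 or avg_task_minutes <= 0:
        raise ValueError("both inputs must be positive")
    # Output rate = concurrency * 60 / avg_task_minutes; solve for the
    # concurrency at which that rate equals reviews_per_hour.
    bound = math.floor(reviews_per_hour * avg_task_minutes / 60)
    # Keep at least one slot so work can make progress at all; at this floor
    # the output may still exceed what reviewers can absorb.
    return max(1, bound)


# A team that merges 6 branches per hour, with tasks averaging 20 minutes,
# can absorb the output of about 2 parallel tasks.
print(review_bound_concurrency(reviews_per_hour=6, avg_task_minutes=20))
```

Under these assumptions, a configured maximum well above this estimate mostly adds to the review backlog rather than to merged work.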
# Auto-scaling configuration
scaling:
  min_concurrency: 2
  max_concurrency: 20
  scale_up:
    trigger: queue_depth > 10
    increment: 3
    cooldown: 5m
  scale_down:
    trigger: queue_depth == 0 for 10m
    decrement: 2
    cooldown: 10m
  cost_ceiling:
    daily_max: $50
    action: pause_queue  # Stop accepting new tasks

# Time-based scheduling
schedule:
  business_hours:  # 8am-6pm weekdays
    max_concurrency: 20
  off_hours:
    max_concurrency: 5
  weekend:
    max_concurrency: 2

# Review-aware scaling
review_capacity:
  max_pending_reviews: 15  # Pause execution when 15 PRs await review
  resume_threshold: 5      # Resume when backlog drops to 5
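To make the interplay of thresholds, cooldowns, and the review gate concrete, here is a minimal Python sketch of one way such a configuration could be evaluated. It is an illustration, not the platform's actual scheduler. The constants mirror the values above, but `ScalerState`, `step`, and the exact semantics are assumptions: each cooldown is measured from the last scaling action in the same direction, and a review pause is modeled as an effective concurrency of 0.

```python
from dataclasses import dataclass
from typing import Optional

# Values copied from the configuration above; durations are in seconds.
MIN_CONCURRENCY, MAX_CONCURRENCY = 2, 20
SCALE_UP_DEPTH, INCREMENT, UP_COOLDOWN = 10, 3, 5 * 60
EMPTY_FOR, DECREMENT, DOWN_COOLDOWN = 10 * 60, 2, 10 * 60
MAX_PENDING_REVIEWS, RESUME_THRESHOLD = 15, 5


@dataclass
class ScalerState:
    """Hypothetical scaler state; not part of any real API."""
    concurrency: int = MIN_CONCURRENCY
    last_scale_up: float = float("-inf")
    last_scale_down: float = float("-inf")
    queue_empty_since: Optional[float] = None
    paused_for_review: bool = False


def step(state: ScalerState, queue_depth: int, pending_reviews: int, now: float) -> int:
    """Apply one evaluation tick and return the effective concurrency."""
    # Review gate with hysteresis: pause at 15 pending reviews, resume only
    # once the backlog has dropped to 5, so the gate does not flap.
    if pending_reviews >= MAX_PENDING_REVIEWS:
        state.paused_for_review = True
    elif pending_reviews <= RESUME_THRESHOLD:
        state.paused_for_review = False

    # Track how long the queue has been continuously empty.
    if queue_depth == 0:
        if state.queue_empty_since is None:
            state.queue_empty_since = now
    else:
        state.queue_empty_since = None

    if queue_depth > SCALE_UP_DEPTH and now - state.last_scale_up >= UP_COOLDOWN:
        state.concurrency = min(MAX_CONCURRENCY, state.concurrency + INCREMENT)
        state.last_scale_up = now
    elif (
        state.queue_empty_since is not None
        and now - state.queue_empty_since >= EMPTY_FOR
        and now - state.last_scale_down >= DOWN_COOLDOWN
    ):
        state.concurrency = max(MIN_CONCURRENCY, state.concurrency - DECREMENT)
        state.last_scale_down = now

    return 0 if state.paused_for_review else state.concurrency


state = ScalerState()
print(step(state, queue_depth=12, pending_reviews=0, now=0))     # first scale-up
print(step(state, queue_depth=12, pending_reviews=0, now=60))    # held by cooldown
print(step(state, queue_depth=12, pending_reviews=16, now=360))  # review gate closes
```

The gap between the pause threshold (15) and the resume threshold (5) plays the same role for the review gate that cooldowns play for scaling: it keeps the system from switching state on every small fluctuation.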
Do This
- Set cost ceilings to prevent runaway spending during auto-scale events
- Account for downstream review capacity when setting max concurrency
- Use cooldown periods to prevent rapid oscillation between scale-up and scale-down
- Monitor end-to-end throughput (submit to merge), not just execution throughput
Avoid This
- Auto-scale without a cost ceiling — burst traffic can exhaust monthly budgets in hours
- Scale based solely on queue depth without considering review bandwidth
- Set cooldown periods too short — you will oscillate between scaling up and down constantly
- Treat auto-scaling as a set-and-forget configuration — review metrics weekly and adjust
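For the weekly metrics review, one useful comparison is execution throughput against end-to-end (submit to merge) throughput over the same window. The sketch below assumes you can export per-task timestamps; the `TaskRecord` fields and the function name are hypothetical and do not correspond to a documented export format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class TaskRecord:
    """Hypothetical per-task timestamps, in seconds since some epoch."""
    submitted_at: float
    finished_at: Optional[float]  # None if execution has not finished
    merged_at: Optional[float]    # None if the branch has not been merged


def throughput_per_hour(
    records: Sequence[TaskRecord], window_start: float, window_end: float
) -> Tuple[float, float]:
    """Return (executed per hour, merged per hour) within the window."""
    hours = (window_end - window_start) / 3600
    if hours <= 0:
        raise ValueError("window_end must be after window_start")

    def in_window(t: Optional[float]) -> bool:
        return t is not None and window_start <= t < window_end

    executed = sum(1 for r in records if in_window(r.finished_at))
    merged = sum(1 for r in records if in_window(r.merged_at))
    return executed / hours, merged / hours


records = [
    TaskRecord(0, 600, 3000),
    TaskRecord(0, 900, 6500),
    TaskRecord(100, 1800, None),
    TaskRecord(200, 4000, None),
]
executed_rate, merged_rate = throughput_per_hour(records, 0, 7200)
print(executed_rate, merged_rate)
```

A merged rate that stays well below the executed rate week after week suggests the concurrency ceiling is set above what reviewers can absorb.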