CDX-301e · Module 3

Auto-Scaling Strategies

Auto-scaling adjusts concurrency limits and warm pool sizes dynamically based on demand signals. The simplest signal is queue depth: when the queue grows beyond a threshold, scale up; when it drains, scale down. More sophisticated signals include time-of-day patterns (higher concurrency during business hours), event-driven spikes (scale up when a batch workflow triggers), and cost-based ceilings (reduce concurrency when monthly spend approaches the budget limit).
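The queue-depth rule can be sketched in a few lines of Python. This is a minimal illustration, not a Codex Cloud API: the `QueueDepthScaler` class and its field names are hypothetical, and the default values are borrowed from the configuration example in this module. For brevity it uses a single cooldown for both directions and scales down as soon as the queue is empty, rather than waiting for the queue to stay empty for a sustained period.

```python
from dataclasses import dataclass


@dataclass
class QueueDepthScaler:
    """Hypothetical scaler that steps a concurrency limit based on queue depth."""

    min_concurrency: int = 2
    max_concurrency: int = 20
    scale_up_threshold: int = 10   # scale up when queue depth exceeds this
    increment: int = 3
    decrement: int = 2
    cooldown_s: float = 300.0      # minimum seconds between two adjustments
    current: int = 2
    last_change_s: float = float("-inf")

    def step(self, queue_depth: int, now_s: float) -> int:
        """Return the concurrency limit after observing queue_depth at time now_s."""
        if now_s - self.last_change_s < self.cooldown_s:
            return self.current  # still cooling down, so hold the current limit

        if queue_depth > self.scale_up_threshold:
            target = min(self.current + self.increment, self.max_concurrency)
        elif queue_depth == 0:
            target = max(self.current - self.decrement, self.min_concurrency)
        else:
            target = self.current

        if target != self.current:
            self.current = target
            self.last_change_s = now_s  # only real changes restart the cooldown
        return self.current
```

Calling `step` on every polling interval keeps the limit between the configured minimum and maximum, and the cooldown check keeps one noisy reading from triggering back-to-back adjustments.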

Auto-scaling for Codex Cloud has a constraint that traditional compute scaling does not: the human review bottleneck. Scaling from 5 to 50 parallel tasks can produce up to 10x more output, but if only one developer reviews the results, the review queue becomes the bottleneck, not the execution queue. Effective auto-scaling must account for downstream capacity: how many branches can the team review and merge per hour? Scale execution up to that rate and no further. Beyond that point, you are producing inventory that sits in the review queue, not delivering value faster.
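One rough way to turn review capacity into a concurrency ceiling is Little's law: the number of tasks in flight equals the completion rate multiplied by the average task duration. The sketch below assumes each task produces exactly one branch to review and that the team's review rate and the average task duration are values you measure yourself. The function name and parameters are illustrative.

```python
import math


def review_limited_concurrency(
    reviews_per_hour: float,
    avg_task_minutes: float,
    hard_max: int,
) -> int:
    """Estimate how many parallel tasks the team's review rate can absorb.

    Assumes one branch per task and steady-state operation (Little's law):
    tasks in flight = completion rate * average task duration.
    """
    tasks_in_flight = reviews_per_hour * avg_task_minutes / 60.0
    # Round down so execution never outpaces review, but keep at least one slot.
    return max(1, min(hard_max, math.floor(tasks_in_flight)))


# A team that merges 6 branches per hour, with tasks averaging 30 minutes,
# can keep up with about 3 parallel tasks.
print(review_limited_concurrency(reviews_per_hour=6, avg_task_minutes=30, hard_max=20))
```

Under these assumptions, raising concurrency above the returned value only lengthens the review queue.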

# Auto-scaling configuration

scaling:
  min_concurrency: 2
  max_concurrency: 20
  scale_up:
    trigger: queue_depth > 10
    increment: 3
    cooldown: 5m
  scale_down:
    trigger: queue_depth == 0 for 10m
    decrement: 2
    cooldown: 10m
  cost_ceiling:
    daily_max: $50
    action: pause_queue        # Stop accepting new tasks

# Time-based scheduling
schedule:
  business_hours:              # 8am-6pm weekdays
    max_concurrency: 20
  off_hours:
    max_concurrency: 5
  weekend:
    max_concurrency: 2

# Review-aware scaling
review_capacity:
  max_pending_reviews: 15      # Pause execution when 15 PRs await review
  resume_threshold: 5          # Resume when backlog drops to 5
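
The schedule and review_capacity settings above could be evaluated roughly as follows. This is a sketch under stated assumptions: the function and class names are hypothetical, the hour boundaries follow the "8am-6pm weekdays" comment, and a result of 0 is taken to mean "start no new tasks". The gap between the pause and resume thresholds acts as hysteresis, so execution does not flap on and off when the backlog hovers near 15.

```python
from datetime import datetime

# Ceilings copied from the schedule section of the configuration.
BUSINESS_HOURS_MAX = 20
OFF_HOURS_MAX = 5
WEEKEND_MAX = 2


def scheduled_max_concurrency(now: datetime) -> int:
    """Pick the concurrency ceiling for the given local time."""
    if now.weekday() >= 5:      # Saturday is 5, Sunday is 6
        return WEEKEND_MAX
    if 8 <= now.hour < 18:      # 8am up to, but not including, 6pm
        return BUSINESS_HOURS_MAX
    return OFF_HOURS_MAX


class ReviewGate:
    """Pause execution at max_pending_reviews; resume at resume_threshold."""

    def __init__(self, max_pending_reviews: int = 15, resume_threshold: int = 5):
        if resume_threshold >= max_pending_reviews:
            raise ValueError("resume_threshold must be below max_pending_reviews")
        self.max_pending_reviews = max_pending_reviews
        self.resume_threshold = resume_threshold
        self.paused = False

    def update(self, pending_reviews: int) -> bool:
        """Record the current backlog and return True if execution may run."""
        if not self.paused and pending_reviews >= self.max_pending_reviews:
            self.paused = True
        elif self.paused and pending_reviews <= self.resume_threshold:
            self.paused = False
        return not self.paused


def effective_max_concurrency(now: datetime, pending_reviews: int, gate: ReviewGate) -> int:
    """Combine the schedule with the review gate; 0 means start no new tasks."""
    return scheduled_max_concurrency(now) if gate.update(pending_reviews) else 0
```

Note that a backlog of 8 keeps execution paused if the gate tripped earlier, but allows execution if it did not. The gate's state matters, not just the current count.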

Do This

Set cost ceilings to prevent runaway spending during auto-scale events
Account for downstream review capacity when setting max concurrency
Use cooldown periods to prevent rapid oscillation between scale-up and scale-down
Monitor end-to-end throughput (submit to merge), not just execution throughput
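
To make the last point concrete, the sketch below compares execution throughput with merge throughput over a time window. The `TaskRecord` shape is an assumption for illustration. In practice the timestamps would come from whatever task and pull-request data your team already collects.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass
class TaskRecord:
    """Hypothetical record of one task's lifecycle timestamps."""

    submitted_at: datetime
    completed_at: Optional[datetime] = None   # execution finished
    merged_at: Optional[datetime] = None      # branch reviewed and merged


def throughput_per_hour(
    records: Iterable[TaskRecord],
    window_start: datetime,
    window_end: datetime,
) -> Dict[str, float]:
    """Count tasks executed and merged per hour inside [window_start, window_end)."""
    hours = (window_end - window_start).total_seconds() / 3600.0
    if hours <= 0:
        raise ValueError("window_end must be after window_start")

    records = list(records)
    executed = sum(
        1 for r in records
        if r.completed_at is not None and window_start <= r.completed_at < window_end
    )
    merged = sum(
        1 for r in records
        if r.merged_at is not None and window_start <= r.merged_at < window_end
    )
    return {
        "executed_per_hour": executed / hours,
        "merged_per_hour": merged / hours,
    }
```

A persistent gap between the two rates suggests that review, not execution, is the limiting stage.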

Avoid This

Auto-scale without a cost ceiling: burst traffic can exhaust a monthly budget in hours
Scale based solely on queue depth without considering review bandwidth
Set cooldown periods too short: the system will oscillate constantly between scaling up and scaling down
Treat auto-scaling as a set-and-forget configuration: review metrics weekly and adjust instead
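
As a minimal illustration of the cost ceiling, the sketch below tracks spend for the current day and stops accepting tasks once the ceiling is reached, mirroring the daily_max and pause_queue settings in the configuration example. The class and method names are hypothetical, and how spend is actually reported depends on your billing data.

```python
from decimal import Decimal


class DailyCostCeiling:
    """Hypothetical guard that pauses the queue once daily spend hits the ceiling."""

    def __init__(self, daily_max: Decimal = Decimal("50")):
        self.daily_max = daily_max
        self.spent_today = Decimal("0")

    def record_spend(self, amount: Decimal) -> None:
        """Add the cost of a finished task to today's total."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.spent_today += amount

    def accepting_tasks(self) -> bool:
        """Return False once the ceiling is reached (the pause_queue action)."""
        return self.spent_today < self.daily_max

    def reset_for_new_day(self) -> None:
        """Clear the running total at the start of a new billing day."""
        self.spent_today = Decimal("0")
```

Using `Decimal` rather than `float` avoids rounding drift when many small charges are added together.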