CDX-301i · Module 1
Priority Scheduling & SLA Management
SLA management defines response time guarantees for each task priority level. A production incident (critical priority) might have a 5-minute SLA — the agent must begin working on it within 5 minutes of submission. A routine documentation update (low priority) might have a 4-hour SLA. The dispatcher uses these SLAs to make scheduling decisions: if a critical task arrives and all agents are busy with normal tasks, the dispatcher may preempt a normal task to meet the critical SLA.
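The dispatch decision described above can be sketched roughly as follows. This is a minimal illustration, not the dispatcher's actual implementation: the `Agent` class, the `choose_agent` function, and the rule of preempting the least urgent running task are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Lower number = more urgent, matching the priority numbering used in this module.
CRITICAL, HIGH, NORMAL, LOW = 0, 1, 2, 3


@dataclass
class Agent:  # hypothetical stand-in for a worker in the agent pool
    name: str
    running_priority: Optional[int] = None  # None means the agent is idle


def choose_agent(agents: list[Agent], incoming_priority: int,
                 can_preempt: bool) -> Optional[Agent]:
    """Return the agent that should take the incoming task, or None to queue it."""
    # 1. Prefer an idle agent: no preemption needed.
    for agent in agents:
        if agent.running_priority is None:
            return agent
    if not can_preempt or not agents:
        return None
    # 2. Otherwise consider the agent running the least urgent task, and
    #    preempt it only if that task is strictly less urgent than the new one.
    victim = max(agents, key=lambda a: a.running_priority)
    if victim.running_priority > incoming_priority:
        return victim
    return None


pool = [Agent("a1", NORMAL), Agent("a2", HIGH), Agent("a3", LOW)]
print(choose_agent(pool, CRITICAL, can_preempt=True).name)  # a3
print(choose_agent(pool, NORMAL, can_preempt=False))        # None
```

Preempting the least urgent task first keeps the cost of preemption on the work that has the most SLA headroom.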
Starvation is the primary risk in SLA-managed systems. A sustained flood of high-priority tasks can starve low-priority tasks indefinitely: the documentation update never runs because more urgent tasks keep arriving. (This is distinct from priority inversion, where a high-priority task is blocked by a lower-priority one.) Two mechanisms prevent starvation: aging (a task's effective priority increases the longer it waits) and reserved capacity (a percentage of the agent pool is reserved for low-priority tasks and cannot be preempted). Together, these mechanisms ensure that every task eventually runs, even during high-load periods.
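Both anti-starvation mechanisms can be sketched in a few lines. The aging step of 30 minutes, the 20% reserved fraction, and the convention that the last agents in the pool form the reserved slice are assumptions chosen for illustration; `effective_priority` and `may_preempt` are hypothetical names.

```python
from datetime import datetime, timedelta

# Assumed tuning value: a waiting task gains one priority level per 30 minutes.
AGING_STEP_MINUTES = 30


def effective_priority(base_priority: int, submitted_at: datetime,
                       now: datetime) -> int:
    """Aging: lower number = more urgent; waiting raises urgency, floored at 0."""
    waited_minutes = (now - submitted_at).total_seconds() / 60
    levels_gained = int(waited_minutes // AGING_STEP_MINUTES)
    return max(0, base_priority - levels_gained)


def may_preempt(agent_index: int, pool_size: int,
                reserved_fraction: float = 0.2) -> bool:
    """Reserved capacity: agents in the reserved tail of the pool are never preempted."""
    reserved = max(1, round(pool_size * reserved_fraction))
    return agent_index < pool_size - reserved


now = datetime(2024, 1, 1, 12, 0)
# A low-priority task (3) that has waited 65 minutes is treated as priority 1.
print(effective_priority(3, now - timedelta(minutes=65), now))  # 1
# In a pool of 10 with 20% reserved, agents 8 and 9 cannot be preempted.
print(may_preempt(9, pool_size=10))  # False
```

With aging, the queue is ordered by `effective_priority` rather than by the submitted priority, so a long-waiting task eventually outranks newly arrived urgent work.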
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SLAPolicy:
    priority: int
    max_wait_minutes: int      # Must start within this window
    max_duration_minutes: int  # Must complete within this window
    preempt_lower: bool        # Can preempt lower-priority tasks


SLA_POLICIES = {
    "critical": SLAPolicy(0, max_wait_minutes=5,
                          max_duration_minutes=30, preempt_lower=True),
    "high": SLAPolicy(1, max_wait_minutes=15,
                      max_duration_minutes=60, preempt_lower=True),
    "normal": SLAPolicy(2, max_wait_minutes=60,
                        max_duration_minutes=120, preempt_lower=False),
    "low": SLAPolicy(3, max_wait_minutes=240,
                     max_duration_minutes=480, preempt_lower=False),
}


def check_sla_breach(task, now: datetime) -> bool:
    """Check if a queued task is approaching SLA breach."""
    policy = SLA_POLICIES[task.priority_name]
    wait_time = (now - task.submitted_at).total_seconds() / 60
    # Alert at 80% of SLA window
    return wait_time > policy.max_wait_minutes * 0.8
Do This
- Define explicit SLAs per priority level with max wait and max duration
- Implement aging to prevent priority starvation of low-priority tasks
- Reserve 10-20% of agent capacity for low-priority tasks to prevent complete starvation
- Alert on SLA breach risk at 80% of the window — proactive, not reactive
Avoid This
- Run all tasks at the same priority: when everything is urgent, nothing is
- Allow unlimited preemption — a critical task flood can abort half-completed normal work
- Measure SLA compliance by average — P95 and P99 matter more than the mean
- Set SLAs without measuring baseline performance — SLAs must be achievable with current capacity
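To illustrate why averages can hide SLA breaches, here is a small sketch that compares the mean wait time against P95 and P99. The wait-time samples are invented for the example, and the nearest-rank `percentile` helper is one simple definition among several in common use.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical wait times in minutes for 20 critical tasks (SLA: 5 minutes).
waits = [1.0] * 18 + [4.0, 30.0]
mean = sum(waits) / len(waits)

print(f"mean={mean:.1f} p95={percentile(waits, 95)} p99={percentile(waits, 99)}")
# mean=2.6 p95=4.0 p99=30.0
```

In this made-up sample the mean sits well inside the 5-minute SLA, while P99 exposes a task that waited 30 minutes.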