CDX-301i · Module 1

Priority Scheduling & SLA Management

SLA management defines response-time guarantees for each task priority level. A production incident (critical priority) might have a 5-minute SLA: an agent must begin working on it within 5 minutes of submission. A routine documentation update (low priority) might have a 4-hour SLA. The dispatcher uses these SLAs to make scheduling decisions: if a critical task arrives while all agents are busy with normal tasks, the dispatcher may preempt a normal task to meet the critical SLA.
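The preemption decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the dispatcher's actual implementation: `RunningTask`, `CAN_PREEMPT`, `pick_preemption_victim`, and the rule of evicting the least urgent, most recently started task are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Lower number = more urgent (0 = critical ... 3 = low).
# Assumption: only critical and high tasks are allowed to preempt.
CAN_PREEMPT = {0: True, 1: True, 2: False, 3: False}

@dataclass
class RunningTask:
    task_id: str
    priority: int           # 0 = critical ... 3 = low
    minutes_running: float  # how long the agent has worked on it

def pick_preemption_victim(incoming_priority: int,
                           running: list[RunningTask]) -> Optional[RunningTask]:
    """Return the running task to preempt, or None if nothing may be preempted."""
    if not CAN_PREEMPT.get(incoming_priority, False):
        return None
    # Only strictly lower-priority work is eligible.
    candidates = [t for t in running if t.priority > incoming_priority]
    if not candidates:
        return None
    # Assumption: evict the least urgent task, and among those the one that
    # has run the shortest time, so the least completed work is discarded.
    return max(candidates, key=lambda t: (t.priority, -t.minutes_running))

running = [
    RunningTask("deploy-fix", priority=1, minutes_running=12.0),
    RunningTask("refactor", priority=2, minutes_running=40.0),
    RunningTask("lint-pass", priority=2, minutes_running=3.0),
]
victim = pick_preemption_victim(0, running)
print(victim.task_id if victim else "no preemption")
```

With this sample pool, the critical task displaces `lint-pass`: it is a normal-priority task and has only three minutes of work to lose.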

Starvation is the primary risk in SLA-managed systems. A steady flood of higher-priority tasks can starve low-priority tasks indefinitely: the documentation update never runs because more urgent tasks keep arriving. (This is distinct from priority inversion, where a high-priority task is blocked behind a lower-priority one.) Two mechanisms prevent starvation: aging (a task's effective priority increases the longer it waits) and reserved capacity (a percentage of the agent pool is reserved for low-priority tasks and cannot be preempted). Both mechanisms ensure that every task eventually runs, even during high-load periods.
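Aging can be illustrated with a small sketch. The names `QueuedTask` and `effective_priority`, and the rate of one priority level per 60 minutes of waiting, are assumptions chosen for the example; a real dispatcher would tune the rate against its own SLAs.

```python
from dataclasses import dataclass

@dataclass
class QueuedTask:
    task_id: str
    base_priority: int     # 0 = critical ... 3 = low
    waited_minutes: float

# Assumption: a task gains one priority level per 60 minutes of waiting.
AGING_INTERVAL_MINUTES = 60.0

def effective_priority(task: QueuedTask) -> float:
    """Lower value = scheduled sooner. Waiting lowers the value over time."""
    boost = task.waited_minutes / AGING_INTERVAL_MINUTES
    # Clamp at 0 so an aged task never outranks a fresh critical task.
    return max(0.0, task.base_priority - boost)

queue = [
    QueuedTask("docs-update", base_priority=3, waited_minutes=150),
    QueuedTask("code-review", base_priority=2, waited_minutes=10),
    QueuedTask("bug-triage", base_priority=1, waited_minutes=5),
]
for task in sorted(queue, key=effective_priority):
    print(task.task_id, round(effective_priority(task), 2))
```

After 150 minutes in the queue, the low-priority `docs-update` sorts ahead of both newer tasks, which is the behavior aging is meant to produce.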

from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class SLAPolicy:
    priority: int
    max_wait_minutes: int     # Must start within this window
    max_duration_minutes: int # Must complete within this window
    preempt_lower: bool       # Can preempt lower-priority tasks

SLA_POLICIES = {
    "critical": SLAPolicy(0, max_wait_minutes=5,
                          max_duration_minutes=30, preempt_lower=True),
    "high":     SLAPolicy(1, max_wait_minutes=15,
                          max_duration_minutes=60, preempt_lower=True),
    "normal":   SLAPolicy(2, max_wait_minutes=60,
                          max_duration_minutes=120, preempt_lower=False),
    "low":      SLAPolicy(3, max_wait_minutes=240,
                          max_duration_minutes=480, preempt_lower=False),
}

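def check_sla_breach(task, now: datetime) -> bool:
    """Return True once a queued task has used more than 80% of its wait SLA.

    Assumes `task` exposes `priority_name` (a key of SLA_POLICIES) and
    `submitted_at` (a datetime comparable with `now`).
    """
    policy = SLA_POLICIES[task.priority_name]
    wait_minutes = (now - task.submitted_at).total_seconds() / 60
    # Alert at 80% of the SLA window, before the breach actually happens
    return wait_minutes > policy.max_wait_minutes * 0.8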

Do This

Define explicit SLAs per priority level with max wait and max duration
Implement aging to prevent priority starvation of low-priority tasks
Reserve 10-20% of agent capacity for low-priority tasks to prevent complete starvation
Alert on SLA breach risk at 80% of the window — proactive, not reactive
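The reserved-capacity rule can be expressed as a simple admission check. This is a sketch under assumptions: `reserved_slots` and `can_assign` are hypothetical helpers, the 20% default sits at the top of the 10-20% range, and agents are treated as interchangeable slots.

```python
import math

def reserved_slots(pool_size: int, reserve_fraction: float = 0.2) -> int:
    """Agents held back for low-priority work (at least one if the pool is non-empty)."""
    if pool_size <= 0:
        return 0
    return max(1, math.floor(pool_size * reserve_fraction))

def can_assign(is_low_priority: bool, busy_general: int, busy_reserved: int,
               pool_size: int, reserve_fraction: float = 0.2) -> bool:
    """Decide whether a new task may take an agent right now."""
    reserved = reserved_slots(pool_size, reserve_fraction)
    general = pool_size - reserved
    if is_low_priority:
        # Low-priority work may use a reserved slot or any free general slot.
        return busy_reserved < reserved or busy_general < general
    # Higher-priority work is confined to the general slots.
    return busy_general < general

# Pool of 10 agents: 8 general slots (all busy), 2 reserved slots (both free).
print(can_assign(is_low_priority=False, busy_general=8, busy_reserved=0, pool_size=10))
print(can_assign(is_low_priority=True, busy_general=8, busy_reserved=0, pool_size=10))
```

In the sample call, a new high-priority task has to wait or preempt within the general slots, while a low-priority task can still start on a reserved agent.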

Avoid This

Run all tasks at the same priority: when everything is urgent, nothing is urgent
Allow unlimited preemption — a critical task flood can abort half-completed normal work
Measure SLA compliance by average — P95 and P99 matter more than the mean
Set SLAs without measuring baseline performance — SLAs must be achievable with current capacity
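To see why the mean can hide SLA breaches, consider the following sketch. The wait times are made-up sample data, and `percentile` is a hypothetical helper using the nearest-rank method; other percentile definitions interpolate and give slightly different values.

```python
import math
import statistics

def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical wait times in minutes for 20 critical tasks (5-minute SLA).
waits = [1.0] * 18 + [9.0, 14.0]

print("mean:", statistics.mean(waits))
print("p95: ", percentile(waits, 95))
print("p99: ", percentile(waits, 99))
```

In this sample the mean wait is about 2 minutes, comfortably inside the 5-minute SLA, yet the P95 is 9 minutes and the P99 is 14 minutes: two of the twenty tasks breached, and only the tail percentiles show it.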