OC-301g · Module 1
Distributed Tracing for Agent Workflows
A multi-agent workflow spans multiple agents, multiple services, and potentially multiple external APIs. When something goes wrong — the output is late, the quality is low, or an error occurs — you need to trace the request through every component it touched. Distributed tracing assigns a unique trace ID at the workflow entry point and propagates it through every agent handoff, service call, and data access.
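As a rough illustration of the propagation described above, the sketch below creates a trace ID once at the entry point and carries it through two handoffs. The names `TraceContext`, `randomHex`, `startTrace`, and `openSpan` are hypothetical and do not come from any particular tracing library. The ID lengths (16-byte trace ID, 8-byte span ID) follow the W3C Trace Context convention, and `Math.random` is used only to keep the example dependency-free.

```typescript
// Hypothetical context object passed along with every handoff.
interface TraceContext {
  traceId: string;        // fixed for the whole workflow
  parentSpanId?: string;  // span that triggered the current work
}

// Random hex ID. Math.random keeps the sketch dependency-free;
// a real system would likely use a stronger random source.
function randomHex(byteCount: number): string {
  let out = "";
  for (let i = 0; i < byteCount; i++) {
    const byte = Math.floor(Math.random() * 256);
    out += ("0" + byte.toString(16)).slice(-2);
  }
  return out;
}

// Workflow entry point: the trace ID is created exactly once, here.
function startTrace(): TraceContext {
  return { traceId: randomHex(16) };
}

// Open a span for one unit of work and derive the context to hand downstream.
function openSpan(ctx: TraceContext, agentId: string) {
  const span = {
    traceId: ctx.traceId,            // inherited, never regenerated
    spanId: randomHex(8),            // unique to this span
    parentSpanId: ctx.parentSpanId,  // undefined for the root span
    agentId,
  };
  const downstream: TraceContext = {
    traceId: ctx.traceId,
    parentSpanId: span.spanId,
  };
  return { span, downstream };
}

// Two handoffs: entry point -> planner -> researcher.
const root = startTrace();
const planner = openSpan(root, "planner");
const researcher = openSpan(planner.downstream, "researcher");
```

Both spans share `root.traceId`, and the researcher span's `parentSpanId` points at the planner span, which is what lets a trace viewer reassemble the handoff chain afterwards.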
Agent workflow traces add one element that traditional traces lack: decision spans. Traditional traces have spans for service calls (database query: 45ms, API call: 200ms). Agent traces add spans for decision-making (agent deliberation: 1200ms, council vote: 3400ms). A decision span captures not just the duration but the decision record: what the agent considered and what it concluded. The same trace therefore supports both performance analysis (where is the bottleneck?) and quality analysis (which decision step produced the error?).
interface AgentTraceSpan {
  traceId: string;        // propagated through workflow
  spanId: string;         // unique to this span
  parentSpanId?: string;  // for nested spans
  agentId: string;
  spanType: 'task' | 'decision' | 'api_call' | 'council';
  startTime: string;
  endTime: string;
  durationMs: number;
  // Decision-specific fields
  decision?: {
    trigger: string;
    alternativesConsidered: number;
    confidence: number;
    selectedAction: string;
  };
  // Error fields
  error?: {
    code: string;
    message: string;
    recoverable: boolean;
  };
}
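The two questions a trace is meant to answer (where is the bottleneck, and which decision step produced the error) could be sketched as queries over a list of spans. This is a minimal illustration, not a definitive implementation: `SpanSummary` is a hypothetical subset of the `AgentTraceSpan` fields, repeated so the block runs on its own, and `slowestLeaf` and `decisionBehindError` are made-up helper names. It assumes parent links are well-formed (no cycles) and treats the slowest span with no children as the bottleneck, which is only one possible heuristic.

```typescript
// Hypothetical subset of AgentTraceSpan, so the sketch is standalone.
type SpanSummary = {
  spanId: string;
  parentSpanId?: string;
  spanType: 'task' | 'decision' | 'api_call' | 'council';
  durationMs: number;
  decision?: { confidence: number; selectedAction: string };
  error?: { code: string; recoverable: boolean };
};

// Performance question: the slowest span that has no children.
// Parent spans are skipped because their duration includes their children's.
function slowestLeaf(spans: SpanSummary[]): SpanSummary | undefined {
  const hasChild: Record<string, boolean> = {};
  for (const s of spans) {
    if (s.parentSpanId !== undefined) hasChild[s.parentSpanId] = true;
  }
  let worst: SpanSummary | undefined;
  for (const s of spans) {
    if (hasChild[s.spanId]) continue;
    if (worst === undefined || s.durationMs > worst.durationMs) worst = s;
  }
  return worst;
}

// Quality question: walk up from the failing span to the nearest span
// that carries a decision record.
function decisionBehindError(
  spans: SpanSummary[],
  errorSpanId: string,
): SpanSummary | undefined {
  const byId: Record<string, SpanSummary> = {};
  for (const s of spans) byId[s.spanId] = s;
  let current: SpanSummary | undefined = byId[errorSpanId];
  while (current !== undefined) {
    if (current.decision !== undefined) return current;
    current =
      current.parentSpanId === undefined ? undefined : byId[current.parentSpanId];
  }
  return undefined;
}

// Illustrative data only; durations reuse the example figures from the text.
const spans: SpanSummary[] = [
  { spanId: "s1", spanType: "task", durationMs: 5000 },
  { spanId: "s2", parentSpanId: "s1", spanType: "decision", durationMs: 1200,
    decision: { confidence: 0.6, selectedAction: "call_search_api" } },
  { spanId: "s3", parentSpanId: "s2", spanType: "api_call", durationMs: 200,
    error: { code: "TIMEOUT", recoverable: true } },
  { spanId: "s4", parentSpanId: "s1", spanType: "council", durationMs: 3400,
    decision: { confidence: 0.9, selectedAction: "approve" } },
];

const bottleneck = slowestLeaf(spans);
const culprit = decisionBehindError(spans, "s3");
```

With this sample data, `bottleneck` is the council span `s4`, and `culprit` is the decision span `s2`, the decision that led to the failing API call.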