OC-201c · Module 2

Performance Monitoring

3 min read

Performance degradation is slow. Your daily briefing skill used to complete in 8 seconds. This week it takes 22 seconds. Nobody noticed because it still works — the message arrives, the data is correct, the user is happy. But the latency tripled. The weather API started rate-limiting you because the provider shrank its free-tier quota. The calendar API introduced pagination that your code handles, but it doubles the request count. The AI model's response time increased because the provider is under load. No single change broke anything. But the system is slower, more expensive, and closer to a timeout failure than it was a month ago.

Performance monitoring tracks three metrics per module: latency (how long the module takes to execute), cost (how much its API calls cost), and reliability (what percentage of executions complete without errors). Store these metrics as time-series data — one data point per execution. Plot trends weekly. When latency doubles or reliability drops below 95%, investigate before the gradual degradation becomes an outage.
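One lightweight way to capture a data point per execution is a decorator that times each run and appends a record to a log file. This is a minimal sketch, not a prescribed implementation: the `metrics.jsonl` path, the `monitored` decorator name, and the flat per-call cost are all illustrative assumptions.

```python
import json
import time
from datetime import datetime, timezone
from functools import wraps

METRICS_FILE = "metrics.jsonl"  # hypothetical path; one JSON record per line

def monitored(module_name, cost_per_call=0.0):
    """Record latency, cost, and success/failure for every run of a module."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = True
            try:
                return fn(*args, **kwargs)
            except Exception:
                ok = False
                raise  # record the failure, then let the error propagate
            finally:
                record = {
                    "module": module_name,
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                    "latency_s": round(time.perf_counter() - start, 4),
                    "cost_usd": cost_per_call,  # assumed flat cost per call
                    "success": ok,
                }
                with open(METRICS_FILE, "a") as f:
                    f.write(json.dumps(record) + "\n")
        return wrapper
    return decorator

@monitored("daily_briefing", cost_per_call=0.002)
def daily_briefing():
    time.sleep(0.01)  # stand-in for the weather, calendar, and AI calls
    return "briefing sent"
```

Because the record is written in a `finally` block, failures are logged too — which is what makes the reliability percentage meaningful.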

Do This

  • Record execution time, API cost, and success/failure for every module run
  • Store metrics as time-series data and plot weekly trends
  • Set alert thresholds: latency > 2x baseline, reliability < 95%, cost > 2x average
  • Investigate gradual degradation before it becomes a sudden failure
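The alert thresholds above can be sketched as a comparison between a baseline window and a recent window of the per-execution records. The `check_alerts` function name and the dict record shape are assumptions for illustration; medians are used for latency to resist outliers.

```python
import statistics

def check_alerts(baseline, recent, min_reliability=0.95):
    """Compare a recent window of metric records against a baseline window.

    Each record is a dict with latency_s, cost_usd, and success keys,
    one per module execution. Returns a list of alert strings.
    """
    alerts = []

    # Latency: alert when the recent median exceeds 2x the baseline median.
    base_latency = statistics.median(r["latency_s"] for r in baseline)
    recent_latency = statistics.median(r["latency_s"] for r in recent)
    if recent_latency > 2 * base_latency:
        alerts.append(f"latency {recent_latency:.1f}s > 2x baseline {base_latency:.1f}s")

    # Reliability: alert when the recent success rate drops below 95%.
    reliability = sum(r["success"] for r in recent) / len(recent)
    if reliability < min_reliability:
        alerts.append(f"reliability {reliability:.0%} below {min_reliability:.0%}")

    # Cost: alert when the recent average exceeds 2x the baseline average.
    base_cost = statistics.mean(r["cost_usd"] for r in baseline)
    recent_cost = statistics.mean(r["cost_usd"] for r in recent)
    if recent_cost > 2 * base_cost:
        alerts.append(f"cost ${recent_cost:.4f} > 2x average ${base_cost:.4f}")

    return alerts
```

Run weekly against the stored time series: an empty list means the module is within its thresholds; anything else is a signal to investigate before the drift becomes an outage.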

Avoid This

  • Measure performance only when something feels slow
  • Store only the most recent metric — you need the trend, not the snapshot
  • Ignore cost metrics because "it is only a few dollars" — small leaks compound
  • Wait for a timeout failure to discover that latency has been climbing for weeks