CC-301g · Module 2

CI Failure Diagnosis


CI failures fall into three categories, and each calls for a different remediation. Build failures (TypeScript errors, compilation errors) are deterministic: the same code fails the same way every time. Claude fixes these reliably because the error messages are precise and the fix is usually mechanical. Test failures are semi-deterministic: a failing test may reveal a real bug, or it may be flaky, failing intermittently on unchanged code. Claude has to tell the two apart before changing anything. Infrastructure failures (network timeouts, out-of-memory errors, container issues) are not code problems at all, so Claude should not try to fix code in response to one.
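One common heuristic for separating a flaky test from a real failure is to re-run the failing test several times on the same commit. Below is a minimal Python sketch of that idea, not a prescribed method. It assumes a hypothetical `run_test` hook that runs the single failing test and returns `True` on a pass. The default of five reruns is an arbitrary assumption. A consistent failure is a strong signal of a real bug, not proof of one.

```python
from typing import Callable


def triage_test_failure(run_test: Callable[[], bool], reruns: int = 5) -> str:
    """Re-run a failing test on unchanged code and label the failure.

    `run_test` is a hypothetical hook that runs the single failing test and
    returns True if it passed. The default of 5 reruns is an arbitrary choice.
    """
    if reruns < 1:
        raise ValueError("reruns must be at least 1")
    passes = sum(1 for _ in range(reruns) if run_test())
    if passes == 0:
        # Failed every time on the same code: treat as a likely real bug.
        return "consistent"
    # Passed at least once with no code change: the outcome is nondeterministic.
    return "flaky"


if __name__ == "__main__":
    import itertools

    alternating = itertools.cycle([False, True])  # fails, passes, fails, ...
    print(triage_test_failure(lambda: next(alternating)))  # flaky
    print(triage_test_failure(lambda: False))              # consistent
```

A "flaky" label narrows the investigation rather than closing it. Intermittent failures can come from a race condition in the code under test as easily as from the test itself.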

The remediation workflow starts with classification. When a CI job fails, the first step is not "fix it" but "classify it": parse the CI output and decide whether the failure is a build error, a test failure, or an infrastructure issue. The category determines the remediation path. Build errors go directly to Claude for fixing, test failures go to Claude for investigation, and infrastructure failures go to the on-call engineer.
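The classification step can be sketched as a small pattern matcher over the CI log. This is a minimal Python illustration, not a production classifier. The regular expressions are assumptions loosely modeled on tsc output (`error TS2322: ...`), Jest-style test messages, POSIX-style error codes such as `ECONNREFUSED`, and the Kubernetes `OOMKilled` reason. The `UNKNOWN` fallback is an added assumption for logs that match nothing. Infrastructure patterns are checked first, so a test run killed by `ENOMEM` is routed to the on-call engineer rather than to Claude. A test-level timeout ("Exceeded timeout") is kept separate from a network `ETIMEDOUT`.

```python
import re
from enum import Enum


class Category(Enum):
    BUILD = "build"
    TEST = "test"
    INFRA = "infrastructure"
    UNKNOWN = "unknown"  # assumption: anything unrecognized goes to a human


# Illustrative patterns only; tune them against your own CI logs.
# Order matters: infrastructure is checked first so that an OOM-killed
# or disconnected job is never treated as a code problem.
PATTERNS = [
    (Category.INFRA, re.compile(r"ENOMEM|ECONNREFUSED|ETIMEDOUT|out of memory|OOMKilled", re.I)),
    (Category.BUILD, re.compile(r"error TS\d+|compilation failed", re.I)),
    (Category.TEST, re.compile(r"AssertionError|Exceeded timeout|Tests?:\s+\d+ failed", re.I)),
]

# Remediation path for each category, per the classification workflow.
ROUTES = {
    Category.BUILD: "Claude: fix",
    Category.TEST: "Claude: investigate (real bug or flaky?)",
    Category.INFRA: "on-call engineer: no code changes",
    Category.UNKNOWN: "human: triage manually",
}


def classify(ci_log: str) -> Category:
    """Return the first category whose pattern appears in the CI log."""
    for category, pattern in PATTERNS:
        if pattern.search(ci_log):
            return category
    return Category.UNKNOWN


def route(ci_log: str) -> str:
    """Map a CI log to the remediation path for its category."""
    return ROUTES[classify(ci_log)]


if __name__ == "__main__":
    log = "src/app.ts(12,7): error TS2322: Type 'string' is not assignable to type 'number'."
    print(classify(log).value, "->", route(log))  # build -> Claude: fix
```

First-match ordering keeps the rules easy to read, at the cost of needing care when one log line could plausibly match two categories.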

1. Classify the Failure
   Parse the CI output for error categories. Build errors (tsc, compilation) are deterministic, so Claude can fix them. Test failures (assertion errors, test timeouts) need investigation before any fix. Infrastructure failures (ENOMEM, ECONNREFUSED, network or job timeouts) are escalated to ops, with no code fix attempted.
2. Feed Context to Claude
   For build and test failures, give Claude three things: the error output, the commit that triggered the failure, and the diff from the last passing build. This triple (error, commit, diff) is usually enough for Claude to diagnose the failure.
3. Fix and Re-trigger
   Claude fixes the issue, commits the fix, and pushes it. CI re-runs automatically on the new commit. If the run passes, the loop is closed. If it fails again, classify the new failure from scratch, because the error may have moved to a different category.
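Steps 2 and 3 can be combined into a small driver loop. The Python sketch below is an illustration under stated assumptions, not a reference implementation. `FailureContext`, `build_prompt`, and `remediation_loop` are hypothetical names. `fix_and_push` stands in for whatever hands the prompt to Claude, commits, pushes, and waits for the new CI run. The cap of three attempts is an added safeguard that the workflow does not specify. `git_diff` uses the standard `git diff <from>..<to>` form to get the changes since the last passing build.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailureContext:
    error_output: str  # the failing job's log
    commit: str        # SHA of the commit that triggered the failure
    diff: str          # changes since the last passing build


def git_diff(last_green: str, failing: str, repo: str = ".") -> str:
    """Diff from the last passing build's commit to the failing commit."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", f"{last_green}..{failing}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def build_prompt(ctx: FailureContext) -> str:
    """Package the error/commit/diff triple into a single prompt for Claude."""
    return (
        f"CI failed on commit {ctx.commit}.\n\n"
        f"Error output:\n{ctx.error_output}\n\n"
        f"Diff since the last passing build:\n{ctx.diff}\n"
    )


def remediation_loop(
    ci_log: str,
    classify: Callable[[str], str],
    fix_and_push: Callable[[str], str],
    max_attempts: int = 3,
) -> str:
    """Classify, fix, re-trigger, and re-classify until CI passes or we give up.

    `classify` returns "build", "test", or "infrastructure" (anything else is
    escalated too). `fix_and_push` is a hypothetical hook that has Claude fix
    (or, for tests, first investigate) the failure, commits and pushes, waits
    for CI on the new commit, and returns its log ("" if the run passed).
    """
    for _ in range(max_attempts):
        category = classify(ci_log)  # re-classify every time: the category can shift
        if category not in ("build", "test"):
            return f"escalated: {category} failure"
        ci_log = fix_and_push(ci_log)
        if not ci_log:
            return "passed"
    return "escalated: attempt limit reached"
```

In practice, `fix_and_push` would assemble `FailureContext(ci_log, sha, git_diff(last_green_sha, sha))` and send `build_prompt(ctx)` to Claude. `last_green_sha` and `sha` are placeholders for values your CI system exposes.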