CC-301b · Module 3

Testing & Debugging Skills

4 min read

Untested skills are unreliable skills. The challenge with testing skills is that the output is non-deterministic — Claude does not produce identical output from identical inputs. You cannot write a unit test asserting that the skill produces one exact string. But you can test the properties of the output: does it contain the required sections? Does it follow the specified format? Does it reference the correct files? Does it handle the edge cases described in the core instructions?
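To make the distinction concrete, here is a minimal sketch in Python. The two outputs are invented stand-ins for two runs of the same skill:

```python
# Two hypothetical outputs from identical runs of the same skill:
# the wording differs, but the structure is stable.
run_a = "## Summary\nRevenue grew 12% quarter over quarter."
run_b = "## Summary\nQuarterly revenue increased by 12%."

# An exact-match assertion is useless: the two runs are not identical.
assert run_a != run_b

# A property assertion holds for both runs.
for output in (run_a, run_b):
    assert output.startswith("## Summary")
```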

Property-based testing is the correct framework for skill validation. Define the properties that a correct output must satisfy, then run the skill multiple times and verify the properties hold. "The output must contain a Summary section." "The output must reference at least 3 data sources." "The output must not contain the word TODO." "The output file must be valid JSON." These assertions are deterministic even when the content is not.
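The four example properties above can be collected into a single checker that each run of the skill must pass. This is a sketch, not a fixed API: the `[source:` citation marker and the shape of the two outputs are hypothetical conventions for illustration.

```python
import json

def check_properties(report_text: str, report_json: str) -> list[str]:
    """Return the list of violated properties (empty means the run passes)."""
    failures = []
    if "## Summary" not in report_text:
        failures.append("missing Summary section")
    if report_text.count("[source:") < 3:
        failures.append("fewer than 3 data sources referenced")
    if "TODO" in report_text:
        failures.append("contains TODO")
    try:
        json.loads(report_json)
    except json.JSONDecodeError:
        failures.append("output file is not valid JSON")
    return failures

# A passing run: all four properties hold.
good_text = "## Summary\nSee [source:a], [source:b], [source:c]."
assert check_properties(good_text, '{"status": "ok"}') == []

# A failing run: TODO present and the JSON output is malformed.
bad_text = "## Summary\nTODO [source:a], [source:b], [source:c]."
assert check_properties(bad_text, "not json") == [
    "contains TODO", "output file is not valid JSON"]
```

In practice you would invoke the skill several times and require `check_properties` to return an empty list for every run; the content varies, but the verdict is deterministic.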

Debugging skills requires a systematic approach because the failure modes are subtle. A skill that produces "mostly correct" output is harder to debug than a skill that fails completely, because the partial success masks the underlying issue. The debugging protocol has three steps:

1. Isolate the layer. Is the problem in the front matter (skill not triggering), the core instructions (wrong execution path), or the linked files (wrong template or example)?
2. Test the layer in isolation. For front matter issues, verify trigger activation in a fresh session. For core instruction issues, manually step through the instructions and identify where Claude deviates. For linked file issues, read the linked files and verify they contain what the instructions expect.
3. Fix and regression test. After fixing the issue, run the original failing case and five additional cases to verify the fix did not break other paths.
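The fix-and-regression step can be sketched as a small harness. Everything named here is illustrative: `run_skill` is a hypothetical stand-in for invoking the skill once, and the case names are invented placeholders for your own test inputs.

```python
def run_skill(case: str) -> str:
    # Hypothetical stand-in: a real harness would invoke the skill
    # with the named input and capture its output.
    return f"## Summary\nHandled case: {case}"

# The original failing case plus five additional cases covering
# other execution paths.
REGRESSION_CASES = [
    "original-failing-case",  # the case that exposed the bug
    "empty-input",
    "single-source",
    "many-sources",
    "non-ascii-input",
    "large-input",
]

# Re-run every case after the fix and check a shared property.
results = {case: run_skill(case) for case in REGRESSION_CASES}
broken = [case for case, out in results.items()
          if "## Summary" not in out]
assert broken == [], f"fix regressed these cases: {broken}"
```

The key discipline is that the original failing case stays in the list permanently, so the same bug cannot silently return later.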