OC-301f · Module 1

Test Data Management


Agent tests need realistic data — contrived examples miss the edge cases that production data contains. But production data contains sensitive information that should not exist in test suites. Test data management is the discipline of creating test data that is realistic enough to catch real bugs without exposing real data.

The data strategy has three tiers. Tier one: synthetic data generated from production schemas, with realistic distributions but no real values. Names, emails, and financial figures are fabricated but structurally valid. Tier two: anonymized production data, meaning real data with PII removed, dates shifted, and identifying characteristics randomized. It retains the structural complexity of production without the sensitivity. Tier three: curated edge cases, meaning specific data configurations that have caused bugs in the past, preserved as test fixtures. Each bug fix adds a new fixture to the edge-case collection.
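Tier one can be sketched in a few lines of Python. The `Customer` schema, the name pools, and `synthetic_customers` are hypothetical stand-ins for a real production schema, and the log-normal balance distribution is an assumption chosen to mimic skewed financial data; in practice you would fit the distribution to production statistics. Emails use `example.com`, a domain reserved for documentation by RFC 2606, so no generated address can reach a real mailbox.

```python
import random
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical schema standing in for a production "customers" table.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str
    email: str
    balance: Decimal

# Illustrative name pools; a real generator would draw from larger lists.
FIRST_NAMES = ["Ada", "Kofi", "Mei", "Lars", "Priya", "Omar"]
LAST_NAMES = ["Okafor", "Lindqvist", "Tanaka", "Moreau", "Silva"]

def synthetic_customers(n: int, seed: int = 42) -> list[Customer]:
    """Fabricate n customers with production-like structure but no real values."""
    rng = random.Random(seed)  # seeded: every test run sees the same data
    rows = []
    for i in range(n):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        # example.com is reserved (RFC 2606), so these addresses are valid but unreachable.
        email = f"{first.lower()}.{last.lower()}{i}@example.com"
        # Assumed log-normal: most balances small, with a long tail of large ones.
        balance = Decimal(rng.lognormvariate(6.0, 1.5)).quantize(Decimal("0.01"))
        rows.append(Customer(f"CUST-{i:06d}", f"{first} {last}", email, balance))
    return rows

if __name__ == "__main__":
    for row in synthetic_customers(3):
        print(row)
```

Seeding the generator keeps failures reproducible: a test that breaks on one run breaks on every run, with the same data.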

Do This

  • Use synthetic data for happy path tests — realistic structure without sensitivity concerns
  • Anonymize production data for integration tests — real complexity without real risk
  • Curate edge case fixtures from every bug report — the test suite grows from production experience
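A minimal sketch of tier-two anonymization, assuming records arrive as dicts with a `customer_id` key. The field names, `PII_FIELDS`, and `PSEUDONYM_KEY` are hypothetical. Identifiers are replaced with keyed HMAC-SHA256 tokens so joins across tables still line up, and every date belonging to one customer shifts by the same pseudo-random offset, so intervals (for example, days between account opening and closing) survive while absolute dates do not.

```python
import hashlib
import hmac
import random
from datetime import date, timedelta

# Hypothetical key: load it from a secret store in practice, never commit it.
PSEUDONYM_KEY = b"replace-with-a-real-secret"

# Assumed direct identifiers; these are dropped outright.
PII_FIELDS = {"name", "email", "phone", "address"}

def pseudonymize(value: str) -> str:
    """Replace an identifier with a stable keyed token so foreign keys still join."""
    digest = hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"anon-{digest[:16]}"

def date_shift(customer_id: str, max_days: int = 180) -> timedelta:
    """One pseudo-random offset per customer, so intervals between their dates are preserved."""
    rng = random.Random(pseudonymize(customer_id))  # seeded by the token, not the raw ID
    return timedelta(days=rng.randint(-max_days, max_days))

def anonymize(record: dict) -> dict:
    """Return a copy of record with PII dropped, IDs tokenized, and dates shifted."""
    shift = date_shift(record["customer_id"])
    out = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            continue
        if key == "customer_id":
            out[key] = pseudonymize(value)
        elif isinstance(value, date):  # also matches datetime, a subclass of date
            out[key] = value + shift
        else:
            out[key] = value
    return out
```

Dropping PII fields is the simplest choice; if downstream code expects those fields to exist, substitute tier-one synthetic values instead of deleting them.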

Avoid This

  • Use production data directly in tests — PII in test suites is a compliance violation waiting to happen
  • Use trivial test data that does not represent production complexity — the tests pass but production fails
  • Discard the data from bug reports after fixing — each bug is a test case you earned the hard way
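A minimal sketch of tier-three curation: each fixed bug leaves behind a JSON file holding the (already anonymized) input that triggered it and the behavior the fix guarantees. The directory layout, bug IDs, and function names are hypothetical. Refusing to overwrite an existing fixture is a deliberate choice, so a later change cannot silently erase a test case.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical location for the edge-case collection.
FIXTURE_DIR = Path("tests/fixtures/edge_cases")

def save_edge_case(bug_id: str, payload: dict, expected: dict,
                   fixture_dir: Path = FIXTURE_DIR) -> Path:
    """Preserve the anonymized input behind a bug plus the behavior its fix guarantees."""
    fixture_dir.mkdir(parents=True, exist_ok=True)
    path = fixture_dir / f"{bug_id}.json"
    if path.exists():
        # Refuse to overwrite: an existing test case should never be silently replaced.
        raise FileExistsError(f"fixture for {bug_id} already exists: {path}")
    record = {"bug_id": bug_id, "input": payload, "expected": expected}
    path.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return path

def load_edge_cases(fixture_dir: Path = FIXTURE_DIR) -> list[dict]:
    """Load every preserved case, sorted by file name for a stable test order."""
    return [json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(fixture_dir.glob("*.json"))]

if __name__ == "__main__":
    # Demo in a throwaway directory rather than the real fixture tree.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        save_edge_case("BUG-17", {"amount": "-0.00"}, {"status": "rejected"}, root)
        print([case["bug_id"] for case in load_edge_cases(root)])
```

With pytest, the loaded cases could drive a single parametrized test, for example `@pytest.mark.parametrize("case", load_edge_cases(), ids=lambda c: c["bug_id"])`, so each preserved bug appears by ID in the test report.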