MP-301f · Module 2

Retention Policies & Compliance


Data retention policies define how long data may be kept, when it must be deleted, and how it must be handled during the retention period. In an MCP context, retention applies to three layers: the source data (governed by the data owner), the MCP server cache (governed by the server operator), and the AI conversation context (governed by the AI platform). Each layer needs its own retention rules, and the rules must be aligned: no downstream layer may hold data longer than the source allows. Caching data for 30 days when the source requires deletion after 7 days is a compliance violation.
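The alignment rule can be checked mechanically. The following is a minimal sketch, assuming each layer can report its maximum retention in seconds; the `Layer` and `LayerRetention` types and the `findMisalignedLayers` function are hypothetical names chosen for illustration, not part of any MCP SDK.

```typescript
// Hypothetical types for illustration; not part of any MCP SDK.
type Layer = "source" | "mcpCache" | "aiContext";

interface LayerRetention {
  layer: Layer;
  maxRetentionSeconds: number; // Longest time this layer may hold the data
}

const DAY_SECONDS = 24 * 60 * 60;

// Returns every downstream layer that keeps data longer than the source allows.
function findMisalignedLayers(layers: LayerRetention[]): LayerRetention[] {
  const source = layers.find((l) => l.layer === "source");
  if (!source) {
    throw new Error("Source retention must be defined before checking alignment");
  }
  return layers.filter(
    (l) => l.layer !== "source" && l.maxRetentionSeconds > source.maxRetentionSeconds
  );
}

// The example from the text: a 30-day cache over a 7-day source.
const misaligned = findMisalignedLayers([
  { layer: "source", maxRetentionSeconds: 7 * DAY_SECONDS },
  { layer: "mcpCache", maxRetentionSeconds: 30 * DAY_SECONDS },
  { layer: "aiContext", maxRetentionSeconds: 1 * DAY_SECONDS },
]);
console.log(misaligned.map((l) => l.layer)); // only the cache layer is flagged
```

A check like this could run at server startup or in CI, so that a misaligned cache configuration fails early instead of surfacing in an audit.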

Compliance mapping connects data classifications to regulatory and audit requirements. GDPR grants a right to erasure for the personal data of people in the EU. HIPAA requires access logs for protected health information. PCI DSS requires encryption for cardholder data. SOC 2, an audit framework rather than a law, requires evidence of access controls over customer data. The MCP server must know which requirements apply to each resource and enforce the corresponding controls: masking, access logging, cache TTLs, and deletion schedules.

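// Assumed four-tier scheme, matching the keys of RETENTION_POLICIES below
type Classification = "public" | "internal" | "confidential" | "restricted";

interface RetentionPolicy {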
  classification: Classification;
  cacheTtlSeconds: number;      // Max time in MCP server cache
  auditRetentionDays: number;   // How long to keep access logs
  regulations: string[];         // Applicable regulations
  deletionStrategy: "hard" | "soft" | "anonymize";
}

const RETENTION_POLICIES: Record<Classification, RetentionPolicy> = {
  public: {
    classification: "public",
    cacheTtlSeconds: 3600,        // 1 hour cache OK
    auditRetentionDays: 90,
    regulations: [],
    deletionStrategy: "hard",
  },
  internal: {
    classification: "internal",
    cacheTtlSeconds: 900,         // 15 min cache
    auditRetentionDays: 365,
    regulations: ["SOC2"],
    deletionStrategy: "soft",
  },
  confidential: {
    classification: "confidential",
    cacheTtlSeconds: 300,         // 5 min cache
    auditRetentionDays: 730,      // 2 years
    regulations: ["SOC2", "GDPR"],
    deletionStrategy: "anonymize",
  },
  restricted: {
    classification: "restricted",
    cacheTtlSeconds: 0,           // No caching
    auditRetentionDays: 2555,     // 7 years (regulatory)
    regulations: ["SOC2", "GDPR", "HIPAA", "PCI-DSS"],
    deletionStrategy: "hard",     // Must be fully purged
  },
};

// Look up the cache TTL for a classification, converted from seconds to milliseconds
function getCacheTtl(classification: Classification): number {
  return RETENTION_POLICIES[classification].cacheTtlSeconds * 1000;
}

// Validate that a cache entry is still within its retention window.
// cachedAt is an epoch timestamp in milliseconds, as returned by Date.now().
function isCacheCompliant(
  cachedAt: number,
  classification: Classification
): boolean {
  const ttlMs = getCacheTtl(classification);
  if (ttlMs === 0) return false; // A TTL of 0 means the data must never be cached
  return Date.now() - cachedAt < ttlMs;
}
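The compliance mapping described earlier, from regulation to enforced control, could be resolved in a similar table-driven way. This is a sketch under stated assumptions: the `Control` names and the `CONTROLS_BY_REGULATION` table are hypothetical and only restate the one-line requirements given in the text, so a real mapping would need review against the regulations themselves.

```typescript
// Hypothetical names for illustration; each entry restates the requirement from the text.
type Regulation = "SOC2" | "GDPR" | "HIPAA" | "PCI-DSS";
type Control = "erasure" | "accessLogging" | "encryption" | "accessControlEvidence";

const CONTROLS_BY_REGULATION: Record<Regulation, Control[]> = {
  GDPR: ["erasure"],               // Right to erasure for EU personal data
  HIPAA: ["accessLogging"],        // Access logs for protected health information
  "PCI-DSS": ["encryption"],       // Encryption for cardholder data
  SOC2: ["accessControlEvidence"], // Access control evidence for customer data
};

// Collects the distinct controls required by a list of regulations,
// in the order they are first encountered.
function requiredControls(regulations: Regulation[]): Control[] {
  const controls: Control[] = [];
  for (const regulation of regulations) {
    for (const control of CONTROLS_BY_REGULATION[regulation]) {
      if (controls.indexOf(control) === -1) controls.push(control);
    }
  }
  return controls;
}

// A resource subject to SOC 2 and GDPR, like the "confidential" tier above.
const confidentialControls = requiredControls(["SOC2", "GDPR"]);
console.log(confidentialControls);
```

Keeping the mapping in one table means a new regulation is added in a single place, and every resource tagged with it picks up the controls automatically.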

Do This

  • Align cache TTLs with data classification — shorter TTLs for more sensitive data
  • Map each classification tier to specific regulatory requirements
  • Enforce deletion schedules at the cache layer independently of the source system
  • Raise alerts for retention policy violations instead of only writing audit entries
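The last two points can be combined in a periodic cache sweep. The sketch below assumes an in-memory `Map` cache; the `CacheEntry` shape, the `sweepCache` function, the `graceMs` parameter, and the `alert` callback are all hypothetical. Entries that expire within the grace window count as routine eviction, while entries held well past their TTL, or cached despite a TTL of 0, trigger an alert.

```typescript
// Hypothetical cache entry shape; ttlMs would come from the classification policy.
interface CacheEntry {
  value: unknown;
  cachedAt: number; // Epoch milliseconds when the entry was stored
  ttlMs: number;    // 0 means the entry should never have been cached
}

type AlertFn = (message: string) => void;

// Purges every entry past its TTL and returns the purged keys.
// Alerts only on real violations, not on routine expiry within graceMs.
function sweepCache(
  cache: Map<string, CacheEntry>,
  now: number,
  graceMs: number,
  alert: AlertFn
): string[] {
  const purged: string[] = [];
  cache.forEach((entry, key) => {
    const age = now - entry.cachedAt;
    if (entry.ttlMs > 0 && age < entry.ttlMs) return; // Still within retention
    purged.push(key);
    if (entry.ttlMs === 0 || age - entry.ttlMs > graceMs) {
      alert(`Retention violation: cache entry "${key}" was held past its allowed TTL`);
    }
  });
  purged.forEach((key) => cache.delete(key)); // Delete after the scan completes
  return purged;
}

// Example run with a fixed clock so the outcome does not depend on wall time.
const now = 1000000;
const cache = new Map<string, CacheEntry>([
  ["fresh", { value: 1, cachedAt: now - 1000, ttlMs: 5000 }],
  ["justExpired", { value: 2, cachedAt: now - 6000, ttlMs: 5000 }],
  ["longOverdue", { value: 3, cachedAt: now - 20000, ttlMs: 5000 }],
  ["restricted", { value: 4, cachedAt: now - 10, ttlMs: 0 }],
]);
const alerts: string[] = [];
const purgedKeys = sweepCache(cache, now, 2000, (message) => alerts.push(message));
```

Because the sweep depends only on the cache's own timestamps and TTLs, it keeps working when the source system is unreachable or slow to propagate deletions.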

Avoid This

  • Cache restricted data "just for performance" — it creates a shadow copy outside governance controls
  • Assume the source system handles all retention — your MCP cache is a separate data store
  • Apply a single retention policy to all data — public and restricted data have different requirements
  • Forget about AI conversation logs — the model context is another place where data persists