MP-201b · Module 3
Caching & Performance
4 min read
MCP resource reads are synchronous request-response calls, and every read hits the underlying data source unless you cache. In an AI conversation where the model reads the same resource multiple times — re-checking a database table, re-reading a configuration file, re-fetching an API response — caching eliminates redundant I/O and keeps response latency low. The MCP protocol does not prescribe a caching mechanism; it is the server's responsibility to implement one that fits the data source.
TTL-based caching is the simplest strategy: cache the result of a resource read for a fixed duration, serve from cache on subsequent reads, and invalidate when the TTL expires. This works well for data that changes on a known schedule — daily reports, hourly metrics, configuration files. For data that changes unpredictably, event-driven invalidation is better: the cache is invalidated when the server detects a change (via filesystem watcher, database notification, or webhook). Hybrid strategies use TTL as a floor and event-driven invalidation for immediate freshness.
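A hybrid strategy can be sketched in a few lines. This is an illustrative sketch, not part of the MCP protocol: the `HybridCache` class and its `markStale` method are hypothetical names, and the change events would come from whatever watcher or webhook the server wires up.

```typescript
// Hybrid invalidation sketch: entries expire after a TTL floor, but an
// external change event (watcher, webhook) can mark them stale sooner.
interface HybridEntry {
  data: string;
  cachedAt: number;
  stale: boolean; // flipped by event-driven invalidation
}

class HybridCache {
  private entries = new Map<string, HybridEntry>();

  constructor(private ttlMs: number) {}

  get(uri: string, now: number = Date.now()): string | null {
    const e = this.entries.get(uri);
    if (!e) return null;
    // Serve only if the entry is within TTL *and* no change event arrived.
    if (e.stale || now - e.cachedAt > this.ttlMs) {
      this.entries.delete(uri);
      return null;
    }
    return e.data;
  }

  set(uri: string, data: string, now: number = Date.now()): void {
    this.entries.set(uri, { data, cachedAt: now, stale: false });
  }

  // Called from a filesystem watcher, database notification, or webhook.
  markStale(uri: string): void {
    const e = this.entries.get(uri);
    if (e) e.stale = true;
  }
}
```

The TTL acts as a safety net: even if a change event is missed, the entry can never be served past the TTL floor.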
Large datasets need special handling. A resource that returns 10,000 rows consumes significant context window when the model reads it. Server-side strategies include pagination (expose page/limit parameters), projection (return only requested columns), and summarization (return aggregates instead of raw rows). Client-side, the model should be guided to read summary resources first and drill into detail resources only when needed. The MCP server can enforce this by limiting row counts and providing explicit navigation to detail views.
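Pagination and projection can be combined in one helper. A minimal sketch, assuming rows are already in memory; the `paginate` function, its parameters, and the 100-row hard cap are illustrative choices, and a real server would push the paging down into its data-source query.

```typescript
// Server-side pagination + projection sketch for a tabular resource.
type Row = Record<string, unknown>;

function paginate(
  rows: Row[],
  page: number,       // 1-based page index
  limit: number,      // requested rows per page
  columns?: string[], // projection: return only these columns
): { rows: Row[]; page: number; totalPages: number } {
  const cappedLimit = Math.min(limit, 100); // enforce a hard row cap
  const totalPages = Math.max(1, Math.ceil(rows.length / cappedLimit));
  const start = (page - 1) * cappedLimit;
  let slice = rows.slice(start, start + cappedLimit);
  if (columns) {
    // Keep only the requested columns to shrink the payload.
    slice = slice.map((r) =>
      Object.fromEntries(columns.map((c) => [c, r[c]])),
    );
  }
  return { rows: slice, page, totalPages };
}
```

Returning `totalPages` alongside the slice gives the model the explicit navigation it needs to drill into detail views without guessing.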
interface CacheEntry {
  data: string;
  cachedAt: number;
  ttlMs: number;
}

class ResourceCache {
  private cache = new Map<string, CacheEntry>();

  get(uri: string): string | null {
    const entry = this.cache.get(uri);
    if (!entry) return null;
    if (Date.now() - entry.cachedAt > entry.ttlMs) {
      this.cache.delete(uri); // TTL expired; drop the stale entry
      return null;
    }
    return entry.data;
  }

  set(uri: string, data: string, ttlMs: number): void {
    this.cache.set(uri, { data, cachedAt: Date.now(), ttlMs });
  }

  invalidate(uri: string): void {
    this.cache.delete(uri);
    // Also notify subscribed clients; `server` is the MCP server
    // instance from the surrounding module scope.
    server.notification({
      method: "notifications/resources/updated",
      params: { uri },
    });
  }

  // Invalidate all entries matching a prefix (e.g. every cached page of a table)
  invalidatePrefix(prefix: string): void {
    for (const key of this.cache.keys()) {
      if (key.startsWith(prefix)) this.invalidate(key);
    }
  }
}
const cache = new ResourceCache();

// Usage in a resource handler
server.resource("metrics", "api://metrics/daily", async (uri) => {
  const cached = cache.get(uri.href);
  if (cached) return { contents: [{ uri: uri.href, text: cached }] };

  const data = await fetchMetrics();
  const text = JSON.stringify(data);
  cache.set(uri.href, text, 5 * 60 * 1000); // 5-minute TTL
  return { contents: [{ uri: uri.href, text }] };
});
- Profile your read patterns: log which resources are read most frequently during AI sessions. Those are your caching priorities. A resource read 10 times in one conversation with identical results is pure waste without caching.
- Choose TTL by volatility: static config, 1 hour; daily reports, 15 minutes; live metrics, 30 seconds; real-time data, no cache (use subscriptions instead).
- Set size limits: cap your cache at a reasonable memory budget (e.g., 100 MB) and evict least-recently-used entries when the cap is reached. A cache that grows unbounded will eventually consume all server memory.
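LRU eviction falls out of JavaScript's `Map`, which iterates keys in insertion order. A minimal sketch of the idea, capped by entry count rather than bytes for simplicity; the `LruCache` name is illustrative, and a byte budget would track `data.length` instead.

```typescript
// LRU eviction sketch: get() re-inserts the entry, so the first key in
// the Map's insertion order is always the least recently used.
class LruCache {
  private entries = new Map<string, string>();

  constructor(private maxEntries: number) {}

  get(uri: string): string | null {
    const v = this.entries.get(uri);
    if (v === undefined) return null;
    this.entries.delete(uri);
    this.entries.set(uri, v); // move to most-recently-used position
    return v;
  }

  set(uri: string, data: string): void {
    this.entries.delete(uri); // refresh position if already present
    this.entries.set(uri, data);
    if (this.entries.size > this.maxEntries) {
      // Evict the least-recently-used entry: the first key in iteration order.
      const lru = this.entries.keys().next().value as string;
      this.entries.delete(lru);
    }
  }
}
```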