MP-301e · Module 2
Chunked Responses
4 min read
MCP resources return complete responses — there is no native streaming in the protocol. But complete does not mean unbounded. When a data source produces continuous output (log streams, event feeds, real-time metrics), the MCP server must chunk the data into discrete, bounded responses. Each chunk is a valid resource read that contains a time-bounded or size-bounded slice of the stream, plus a cursor pointing to the next chunk.
The chunking strategy depends on the data source. Time-bounded chunks work for log streams: each read returns events from the last N seconds, with a timestamp cursor for the next read. Size-bounded chunks work for large data transfers: each read returns up to N items, with a sequence cursor. Count-bounded chunks work for event feeds: each read returns up to N events starting from a sequence number. The key constraint is that each chunk must be independently useful — the model should be able to act on any single chunk without needing the full stream history.
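The three cursor shapes can be sketched as plain types, with a hypothetical count-bounded chunker (the type and function names here are illustrative, not part of the MCP spec):

```typescript
// Cursor shapes for the three strategies (illustrative, not part of the MCP spec)
type TimeCursor = { since: string };    // ISO timestamp: "events after this instant"
type SizeCursor = { offset: number };   // item offset into a bulk transfer
type CountCursor = { fromSeq: number }; // next sequence number to read

interface Event { seq: number; payload: string }

// Hypothetical count-bounded chunker: return up to `limit` events
// starting at sequence number `fromSeq`.
function chunkBySequence(events: Event[], fromSeq: number, limit: number) {
  const matching = events
    .filter((e) => e.seq >= fromSeq)
    .sort((a, b) => a.seq - b.seq);
  const items = matching.slice(0, limit);
  const hasMore = matching.length > limit;
  // Cursor points at the first sequence number NOT yet returned
  const nextCursor = items.length > 0 ? items[items.length - 1].seq + 1 : fromSeq;
  return { items, hasMore, nextCursor };
}
```

Note that the cursor names the first unseen item, so a repeated read with the same cursor is idempotent: no events are skipped or duplicated.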
Chunk size directly affects AI model performance. Too large and the chunk consumes excessive context window, leaving less room for the model to reason about the data. Too small and the model makes many sequential reads, each with round-trip latency. The sweet spot depends on the data type: 50-100 log lines, 500 metric data points, or 20 structured events per chunk. Monitor the model's re-read frequency — if it consistently requests multiple consecutive chunks, increase the chunk size.
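The re-read-frequency feedback described above can be turned into a simple tuning rule. This is a minimal sketch with hypothetical names, assuming you already measure how many consecutive chunk reads the model issues per task:

```typescript
// Hypothetical tuner: grow the chunk size when the model habitually reads
// several consecutive chunks, shrink it when a single chunk already suffices.
function tuneChunkSize(
  current: number,
  avgConsecutiveReads: number, // measured over recent sessions (assumed input)
  min = 20,
  max = 500,
): number {
  if (avgConsecutiveReads >= 3) {
    return Math.min(current * 2, max); // model is starved: double, up to the cap
  }
  if (avgConsecutiveReads <= 1 && current > min) {
    return Math.max(Math.floor(current / 2), min); // oversized: halve, down to the floor
  }
  return current; // ~2 reads per task: roughly right
}
```

The doubling/halving constants are arbitrary; the point is to close the loop between observed model behavior and the chunk bound rather than hard-coding one size.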
// Log stream — time-bounded chunks
server.resource(
  "app-logs",
  new ResourceTemplate("logs://app{?since,limit}", { list: undefined }),
  async (uri, params) => {
    const since = params.since
      ? new Date(params.since)
      : new Date(Date.now() - 30_000); // Default: last 30 seconds
    const limit = Math.min(Number(params.limit) || 100, 200); // Hard cap at 200 rows
    // Fetch one extra row so we can tell whether more data exists
    const logs = await pool.query(
      `SELECT timestamp, level, message, metadata
         FROM app_logs
        WHERE timestamp > $1
        ORDER BY timestamp ASC
        LIMIT $2`,
      [since.toISOString(), limit + 1]
    );
    const hasMore = logs.rows.length > limit;
    const items = hasMore ? logs.rows.slice(0, limit) : logs.rows;
    // The last returned timestamp becomes the cursor for the next read
    const nextCursor = items.length > 0
      ? items[items.length - 1].timestamp
      : null;
    return {
      contents: [{
        uri: uri.href,
        mimeType: "application/json",
        text: JSON.stringify({
          items,
          since: since.toISOString(),
          nextCursor,
          hasMore,
          count: items.length,
        }),
      }],
    };
  }
);
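On the consuming side, draining the stream is a loop that carries the cursor forward until the server reports hasMore: false. A minimal sketch, assuming a `fetchChunk` function that wraps the actual MCP resource read (its signature is an assumption, not SDK API):

```typescript
interface Chunk<T> {
  items: T[];
  nextCursor: string | null;
  hasMore: boolean;
}

// Read chunks until the server reports no more data, threading the
// cursor from each response into the next request.
async function drainChunks<T>(
  fetchChunk: (cursor: string | null) => Promise<Chunk<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  let hasMore = true;
  while (hasMore) {
    const chunk = await fetchChunk(cursor);
    all.push(...chunk.items);
    cursor = chunk.nextCursor;
    hasMore = chunk.hasMore;
  }
  return all;
}
```

In practice an AI client rarely drains the whole stream; it reads one chunk, reasons over it, and only continues if the question demands more history — which is exactly why each chunk must stand alone.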
- Choose a chunking strategy: time-bounded for continuous streams (logs, metrics), size-bounded for bulk transfers, count-bounded for discrete events. Match the strategy to how the data is produced.
- Set chunk size limits: cap each chunk at a size that fits comfortably in the model's context window (50-500 items, depending on item size), and return hasMore and a cursor for continuation.
- Test chunk independence: verify that each chunk is useful in isolation. A model should be able to answer questions from a single chunk without needing previous or subsequent chunks.
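The independence check can be partially automated with a structural validator — a sketch under the assumption that chunks follow the response shape used in the example above (items, hasMore, count, nextCursor):

```typescript
// Hypothetical validator: a chunk is self-contained only if it carries
// everything a model needs to interpret it standalone — its items, an
// accurate count, a continuation flag, and a cursor field.
function isSelfContained(chunk: unknown): boolean {
  if (typeof chunk !== "object" || chunk === null) return false;
  const c = chunk as Record<string, unknown>;
  return (
    Array.isArray(c.items) &&
    typeof c.hasMore === "boolean" &&
    typeof c.count === "number" &&
    c.count === (c.items as unknown[]).length &&
    "nextCursor" in c
  );
}
```

A structural check like this catches missing metadata, but semantic independence — whether the model can actually answer from one chunk — still needs evaluation against real queries.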