MP-301a · Module 3

Large Payload Strategies

3 min read

MCP tool responses go into the LLM's context window. A tool that returns a 50,000-token JSON blob does not just slow down processing — it can exhaust the context budget for the entire conversation, leaving no room for the LLM to reason about the result or call additional tools. The fundamental rule for large payloads: the tool must summarize, paginate, or truncate. Never return raw large data and expect the LLM to figure it out.

Pagination is the standard pattern for large result sets. Return the first page with a cursor or offset, and include metadata about the total count and how to fetch the next page. The LLM reads page 1, decides if it has enough, and calls again with the cursor for more. This keeps each response small while giving the LLM full access to the data over multiple calls. The key detail: your first page should include a summary of the full result set (total count, aggregate statistics) so the LLM can answer questions like "how many results are there?" without paging through everything.

For inherently large single objects (log files, database dumps, full documents), summarization beats pagination. A tool that returns a 10,000-line log file is useless — a tool that returns "10,247 lines, 3 errors detected, 12 warnings, most recent error at line 8,934: ConnectionTimeoutError" is immediately actionable. Let the LLM request specific sections by line range or keyword filter when it needs detail. You are building a drill-down interface, not a dump pipe.

// Paginated results with summary metadata
const PAGE_SIZE = 20;
const MAX_RESPONSE_CHARS = 8000;

async function paginatedSearch(
  query: string,
  cursor?: string,
) {
  const offset = cursor ? parseInt(cursor, 10) || 0 : 0; // fall back to 0 on a malformed cursor
  const { total, rows } = await db.search(query, {
    offset,
    limit: PAGE_SIZE + 1, // fetch one extra to detect hasMore
  });

  const hasMore = rows.length > PAGE_SIZE;
  const page = rows.slice(0, PAGE_SIZE);

  // Truncate individual records if response would exceed budget
  const serialized = JSON.stringify(page);
  const records = serialized.length > MAX_RESPONSE_CHARS
    ? page.map(r => ({ id: r.id, title: r.title, score: r.score })) // slim version
    : page;

  return {
    content: [{ type: "text" as const, text: JSON.stringify({
      query,
      total,
      page: Math.floor(offset / PAGE_SIZE) + 1,
      pageSize: PAGE_SIZE,
      returned: records.length,
      hasMore,
      nextCursor: hasMore ? String(offset + PAGE_SIZE) : null,
      results: records,
    }, null, 2) }],
  };
}
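The drill-down pattern for large single objects can be sketched in the same style. This is a minimal illustration, not part of any MCP API: `summarizeLog`, `sliceLog`, and the ERROR/WARN line conventions are assumptions chosen for the example.

```typescript
// Summarize-first interface for a large log: the tool returns counts and
// the most recent error, and exposes a separate line-range drill-down.

interface LogSummary {
  totalLines: number;
  errorCount: number;
  warningCount: number;
  lastError: { line: number; text: string } | null;
}

function summarizeLog(log: string): LogSummary {
  const lines = log.split("\n");
  let errorCount = 0;
  let warningCount = 0;
  let lastError: LogSummary["lastError"] = null;

  lines.forEach((text, i) => {
    if (/\bERROR\b/.test(text)) {
      errorCount++;
      lastError = { line: i + 1, text }; // keep the most recent error
    } else if (/\bWARN(ING)?\b/.test(text)) {
      warningCount++;
    }
  });

  return { totalLines: lines.length, errorCount, warningCount, lastError };
}

// Drill-down: return only the requested 1-indexed line range, never the whole file.
function sliceLog(log: string, startLine: number, endLine: number): string {
  return log.split("\n").slice(startLine - 1, endLine).join("\n");
}
```

The LLM reads the summary, then calls the slice tool with the line number the summary surfaced, so each response stays small no matter how large the underlying file is.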

  1. Set a response size budget. Define a MAX_RESPONSE_CHARS constant (e.g., 8,000). If the serialized result exceeds it, switch to a summary or slim format automatically.
  2. Paginate result sets. Default to returning 20 results with a cursor. Include total count and hasMore so the LLM knows the full scope without fetching it.
  3. Summarize large objects. For logs, documents, and dumps, return a structured summary (counts, key findings, line references) with a way to drill into specific sections.
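The size-budget check from point 1 generalizes into a small guard that any tool handler can reuse. A sketch, assuming the same MAX_RESPONSE_CHARS budget as the pagination example; `withinBudget` and the slim-fallback callback are hypothetical names:

```typescript
const MAX_RESPONSE_CHARS = 8000;

// Return the full payload if it fits the budget; otherwise build and
// return the caller-supplied slim version (summary, truncated fields, etc.).
function withinBudget<T, S>(full: T, slim: () => S): T | S {
  return JSON.stringify(full).length <= MAX_RESPONSE_CHARS ? full : slim();
}
```

A handler would call `withinBudget(rows, () => ({ total: rows.length, sample: rows.slice(0, 5) }))` just before serializing, so the fallback is decided in one place rather than ad hoc per tool.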