MP-301c · Module 1
Profiling & Bottleneck Analysis
4 min read
MCP tool latency has a fixed cost and a variable cost. The fixed cost is protocol overhead — JSON-RPC serialization, transport framing, and SDK routing — typically under 5ms. The variable cost is your handler: database queries, API calls, file I/O, and computation. In practice, protocol overhead is rarely the bottleneck. If your tool is slow, the problem is almost always in your handler. Profiling confirms this by measuring where time actually goes, replacing speculation with data.
Node.js provides two profiling approaches for MCP servers. The built-in --prof flag generates a V8 profiling log that shows function-level CPU time — useful for finding hot loops and expensive computations. For I/O-bound tools (the common case), use the performance.mark/measure API to instrument your handler at the operation level: how long did the database query take? The API call? The JSON serialization? This gives you a waterfall view of each tool call, showing exactly which operation dominates the total latency.
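A minimal sketch of the mark/measure pattern, assuming Node 18+. The names `fetchRows` and `lookupHandler` are hypothetical stand-ins for a real tool handler and its database call:

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical I/O call standing in for a real database query
async function fetchRows(query: string): Promise<string[]> {
  return new Promise((resolve) => setTimeout(() => resolve([query]), 20));
}

async function lookupHandler(query: string): Promise<string> {
  performance.mark("db:start");
  const rows = await fetchRows(query);
  performance.measure("db_query", "db:start"); // measures from the mark to now

  performance.mark("ser:start");
  const text = JSON.stringify(rows);
  performance.measure("serialize", "ser:start");

  // Dump the waterfall: each measure is one operation's share of the call
  for (const m of performance.getEntriesByType("measure")) {
    console.error(`${m.name}: ${m.duration.toFixed(1)}ms`);
  }
  return text;
}
```

Calling `performance.clearMarks()` and `performance.clearMeasures()` between invocations keeps each call's timeline separate.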
The 80/20 rule applies aggressively to MCP tool performance. One database query without an index, one API call without a timeout, or one synchronous file read will dominate the latency budget. Profile first, optimize the single biggest bottleneck, re-profile. Do not optimize code that takes 2ms when a database query takes 800ms — the code optimization is invisible in the total. Focus on the operation that would make the biggest absolute difference.
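The arithmetic behind that rule is worth making concrete. The numbers below are illustrative, not measurements:

```typescript
// Illustrative latency breakdown for one slow tool call (milliseconds)
const breakdown = { db_query: 800, serialize: 2 };
const totalMs = breakdown.db_query + breakdown.serialize; // 802ms

// Optimizing serialization to zero improves the total by about 0.25%
const serializeGainPct = (breakdown.serialize / totalMs) * 100;

// Indexing the query (say, 800ms down to 40ms) improves it by about 95%
const indexedGainPct = ((breakdown.db_query - 40) / totalMs) * 100;
```

Absolute milliseconds saved, not relative speedup of one function, is what the caller experiences.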
// Operation-level profiling middleware
type Span = { op: string; durationMs: number };
type ProfilingContext = { span(op: string): { end(): void } };
type ToolHandler = (args: any, ctx: ProfilingContext) => Promise<any>;

function withProfiling(toolName: string, handler: ToolHandler): ToolHandler {
  return async (args) => {
    const spans: Span[] = [];
    const ctx: ProfilingContext = {
      span: (op: string) => {
        const start = performance.now();
        return {
          end: () => spans.push({
            op,
            durationMs: Math.round((performance.now() - start) * 100) / 100,
          }),
        };
      },
    };
    const totalStart = performance.now();
    try {
      return await handler(args, ctx);
    } finally {
      // Log the waterfall even when the handler throws; use stderr,
      // since stdout is reserved for the stdio transport
      const totalMs = Math.round((performance.now() - totalStart) * 100) / 100;
      console.error(JSON.stringify({
        event: "tool_profile",
        tool: toolName,
        totalMs,
        spans,
        // Copy before sorting so the logged span order stays chronological
        slowest: [...spans].sort((a, b) => b.durationMs - a.durationMs)[0]?.op,
      }));
    }
  };
}

// Usage in a handler (`db` is an assumed database client):
declare const db: { search(query: string): Promise<unknown[]> };

async function searchHandler(args: { query: string }, ctx: ProfilingContext) {
  const dbSpan = ctx.span("db_query");
  const rows = await db.search(args.query);
  dbSpan.end();
  const serSpan = ctx.span("serialize");
  const text = JSON.stringify(rows, null, 2);
  serSpan.end();
  return { content: [{ type: "text", text }] };
}
- Instrument every I/O operation: wrap database queries, API calls, and file reads with span markers. Every operation that leaves the process boundary gets a span.
- Establish a latency budget: set a target per tool (200ms for lookups, 500ms for searches, 2s for aggregations). Tools that exceed their budget trigger investigation.
- Profile under realistic load: run profiling with concurrent tool calls, not just sequential. Contention under load reveals bottlenecks that single-call profiling misses.
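A load-profiling harness can be as simple as firing concurrent calls and reporting percentiles. This is a sketch; `measureUnderLoad` and its percentile math are illustrative, not part of any SDK:

```typescript
import { performance } from "node:perf_hooks";

// Fire `concurrency` simultaneous calls at a handler and report the latency spread
async function measureUnderLoad(
  handler: (args: { query: string }) => Promise<unknown>,
  concurrency: number,
): Promise<{ p50: number; p95: number }> {
  const latencies: number[] = [];
  await Promise.all(
    Array.from({ length: concurrency }, async (_, i) => {
      const start = performance.now();
      await handler({ query: `q${i}` });
      latencies.push(performance.now() - start);
    }),
  );
  latencies.sort((a, b) => a - b);
  const pick = (p: number) =>
    latencies[Math.min(latencies.length - 1, Math.floor(p * latencies.length))];
  return { p50: pick(0.5), p95: pick(0.95) };
}
```

Compare results at concurrency 1 versus 20: a starved connection pool or a per-process lock shows up as a p95 that grows with concurrency while p50 stays flat.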