MP-301b · Module 3

Regression Suites


A regression suite captures the exact request-response pairs from real MCP conversations and replays them against new server versions. When you fix a bug, the first step is to write a test that reproduces it — the bug's tool call with the exact arguments that triggered the failure, and an assertion on the correct behavior. This test prevents the bug from recurring and documents the fix for future maintainers. Over time, your regression suite becomes a catalog of every production failure mode your server has encountered.

Snapshot testing is a lightweight alternative for regression detection. Serialize the complete tool catalog (names, descriptions, schemas) and the responses to a standard set of test calls, then compare against stored snapshots. Any change triggers a diff review. This catches unintentional regressions — a refactor that accidentally changes an error message, a dependency update that alters response formatting, or a merged PR that adds a field to a schema without updating tests. Snapshot tests are low-effort to write but require discipline to review: do not blindly update snapshots without understanding the diff.

import { describe, it, expect, beforeAll, afterAll } from "vitest";
import { createTestPair } from "../helpers/fixture-server.js";

// Regression cases: each represents a bug that was found and fixed
const regressionCases = [
  {
    id: "BUG-042",
    description: "Empty query string caused unhandled exception",
    tool: "search_customers",
    args: { query: "" },
    assert: (result: { isError?: boolean; content: { text: string }[] }) => {
      expect(result.isError).toBe(true);
      expect(result.content[0].text).toContain("non-empty");
    },
  },
  {
    id: "BUG-057",
    description: "Unicode in customer name broke JSON serialization",
    tool: "get_customer",
    args: { customer_id: "CUS-004" }, // has non-breaking space
    assert: (result: { isError?: boolean; content: { text: string }[] }) => {
      expect(result.isError).toBeFalsy();
      expect(() => JSON.parse(result.content[0].text)).not.toThrow();
    },
  },
  {
    id: "BUG-063",
    description: "Missing isError flag on validation failures",
    tool: "update_customer",
    args: { customer_id: "NOPE", status: "invalid-status" },
    assert: (result: { isError?: boolean }) => {
      expect(result.isError).toBe(true);
    },
  },
];

describe("regression suite", () => {
  let client: Awaited<ReturnType<typeof createTestPair>>["client"];
  let cleanup: () => Promise<void>;

  beforeAll(async () => {
    ({ client, cleanup } = await createTestPair());
  });
  afterAll(() => cleanup());

  for (const tc of regressionCases) {
    it(`${tc.id}: ${tc.description}`, async () => {
      const result = await client.callTool({ name: tc.tool, arguments: tc.args });
      tc.assert(result as { isError?: boolean; content: { text: string }[] });
    });
  }
});

Do This

  • Write a regression test for every production bug before fixing it
  • Label regression tests with bug IDs and descriptions for traceability
  • Use snapshot tests for tool catalog regression — catch unintentional schema changes
  • Review snapshot diffs carefully before updating — each diff is a potential breaking change

Avoid This

  • Fix bugs without adding a test — the same bug will return within 3 months
  • Delete regression tests when they "haven't failed in a while" — that means they are working
  • Auto-update snapshots in CI — this defeats the purpose of regression detection
  • Mix regression tests with unit tests — regressions need their own suite for visibility
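One way to keep regressions in their own suite, as the last point suggests, is to separate them at the directory level so they can be run and reported independently. A possible vitest configuration (the directory names here are illustrative, not prescribed):

```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Unit tests and regression tests live side by side but in
    // separate directories, so `vitest run tests/regression` can
    // target the regression suite on its own.
    include: ["tests/unit/**/*.test.ts", "tests/regression/**/*.test.ts"],
  },
});
```

With this layout, a CI pipeline can report regression failures as a distinct job, which keeps bug IDs like BUG-042 visible instead of buried among unit test output.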