MP-301b · Module 2
End-to-End Test Flows
End-to-end tests simulate realistic multi-tool conversations. Instead of testing each tool in isolation, you write a test that mirrors what an LLM would do: call tool A, use data from A's response to call tool B, then verify the final state. This catches integration bugs that single-tool tests miss: data format mismatches between tools, missing cross-references in tool descriptions, and ordering dependencies that break when tools are called in unexpected sequences.
The most valuable e2e pattern is the "golden path" test: a complete user workflow exercised through tool calls. For a CRM server, the golden path might be search_customers → get_customer → update_customer → get_customer (verify the update persisted). For a knowledge base, it might be index_document → search → get_document → delete_document → search (verify the deletion). Each golden path test validates that the tools work together as a system, not just that each one responds correctly in isolation.
import { describe, it, expect, beforeAll, afterAll } from "vitest";
import { createTestPair } from "../helpers/fixture-server.js";

describe("CRM workflow e2e", () => {
  let client: Awaited<ReturnType<typeof createTestPair>>["client"];
  let cleanup: () => Promise<void>;

  // One connected client/server pair is shared by every test in this suite.
  beforeAll(async () => {
    ({ client, cleanup } = await createTestPair());
  });

  afterAll(() => cleanup());

  it("search → get → update → verify flow", async () => {
    // Step 1: Search for a customer
    const searchResult = await client.callTool({
      name: "search_customers",
      arguments: { query: "Acme" },
    });
    expect(searchResult.isError).toBeFalsy();
    const searchData = JSON.parse(searchResult.content[0].text);
    expect(searchData.results.length).toBeGreaterThan(0);

    // Step 2: Get full details using ID from search
    const customerId = searchData.results[0].id;
    const getResult = await client.callTool({
      name: "get_customer",
      arguments: { customer_id: customerId },
    });
    expect(getResult.isError).toBeFalsy();
    const customer = JSON.parse(getResult.content[0].text);
    expect(customer.id).toBe(customerId);

    // Step 3: Update using the same ID
    const updateResult = await client.callTool({
      name: "update_customer",
      arguments: { customer_id: customerId, status: "vip" },
    });
    expect(updateResult.isError).toBeFalsy();

    // Step 4: Verify update persisted by reading the record back
    const verifyResult = await client.callTool({
      name: "get_customer",
      arguments: { customer_id: customerId },
    });
    expect(verifyResult.isError).toBeFalsy();
    const updated = JSON.parse(verifyResult.content[0].text);
    expect(updated.status).toBe("vip");
  });
});
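The knowledge-base golden path mentioned earlier (index_document → search → get_document → delete_document → search) can be sketched the same way. The version below is a rough illustration only: it replaces the real server with a hypothetical in-memory stand-in (`createFakeKnowledgeBase`), and the argument names (`text`, `query`, `document_id`), the response shapes, and the `payload` helper are all assumptions made so the example runs without any test framework.

```typescript
// Assumed result shape, mirroring the fields the CRM test reads.
interface ToolResult {
  isError?: boolean;
  content: { type: "text"; text: string }[];
}

interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Hypothetical in-memory stand-in for a knowledge-base server.
function createFakeKnowledgeBase() {
  const docs = new Map<string, string>();
  let nextId = 1;
  const ok = (data: unknown): ToolResult => ({
    content: [{ type: "text", text: JSON.stringify(data) }],
  });
  const fail = (message: string): ToolResult => ({
    isError: true,
    content: [{ type: "text", text: message }],
  });

  return {
    async callTool({ name, arguments: args }: ToolCall): Promise<ToolResult> {
      switch (name) {
        case "index_document": {
          const id = `doc-${nextId++}`;
          docs.set(id, String(args.text));
          return ok({ id });
        }
        case "search": {
          const query = String(args.query).toLowerCase();
          const results = Array.from(docs.entries())
            .filter(([, text]) => text.toLowerCase().includes(query))
            .map(([id]) => ({ id }));
          return ok({ results });
        }
        case "get_document": {
          const id = String(args.document_id);
          const text = docs.get(id);
          return text === undefined ? fail(`not found: ${id}`) : ok({ id, text });
        }
        case "delete_document":
          return docs.delete(String(args.document_id))
            ? ok({ deleted: true })
            : fail("not found");
        default:
          return fail(`unknown tool: ${name}`);
      }
    },
  };
}

// Parse the JSON payload of a successful call; throw if the tool reported an error.
function payload(result: ToolResult): any {
  if (result.isError) throw new Error(result.content[0].text);
  return JSON.parse(result.content[0].text);
}

// Each step feeds data from the previous response into the next call.
async function runKnowledgeBaseGoldenPath() {
  const client = createFakeKnowledgeBase();
  const indexed = payload(
    await client.callTool({ name: "index_document", arguments: { text: "Quarterly roadmap" } }),
  );
  const found = payload(
    await client.callTool({ name: "search", arguments: { query: "roadmap" } }),
  );
  const fetched = payload(
    await client.callTool({ name: "get_document", arguments: { document_id: found.results[0].id } }),
  );
  payload(
    await client.callTool({ name: "delete_document", arguments: { document_id: fetched.id } }),
  );
  const after = payload(
    await client.callTool({ name: "search", arguments: { query: "roadmap" } }),
  );
  return {
    indexedId: indexed.id,
    fetchedId: fetched.id,
    hitsBefore: found.results.length,
    hitsAfter: after.results.length,
  };
}
```

In a real suite the fake would be replaced by the client from `createTestPair()`, and the returned summary would become `expect` assertions: the fetched ID should match the indexed one, and the final search should no longer return the deleted document.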
Do This
- Test multi-tool workflows that mirror real LLM conversation patterns
- Use data from one tool's response as input to the next — this catches format mismatches
- Verify state changes persist across tool calls within a session
- Limit e2e tests to 3-5 critical golden paths to keep the suite maintainable
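One way to make format mismatches fail loudly at the handoff between tools, rather than as a confusing error further down the flow, is to check the value before passing it on. The helper below is a hypothetical sketch, not part of any library; it assumes IDs are non-empty strings, which may not hold for every server.

```typescript
// Hypothetical helper: read a field from one tool's parsed response and
// confirm it is a non-empty string before it becomes the next tool's input.
function requireStringField(payload: unknown, field: string, tool: string): string {
  if (typeof payload !== "object" || payload === null) {
    throw new Error(`${tool}: expected an object, got ${JSON.stringify(payload)}`);
  }
  const value = (payload as Record<string, unknown>)[field];
  if (typeof value !== "string" || value.length === 0) {
    throw new Error(
      `${tool}: expected "${field}" to be a non-empty string, got ${JSON.stringify(value)}`,
    );
  }
  return value;
}

// Usage with an assumed search hit; the sample data is illustrative only.
const searchHit = JSON.parse('{"id":"cust_42","name":"Acme"}');
const customerId = requireStringField(searchHit, "id", "search_customers");
```

If search_customers started returning numeric IDs while get_customer still expected strings, the error message would name the tool and the field at the point where the two formats diverge.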
Avoid This
- Duplicate unit test coverage in e2e tests — each layer tests different things
- Hardcode expected values that depend on fixture data ordering
- Skip cleanup between e2e tests — state leakage causes cascading failures
- Write e2e tests for every possible tool combination — focus on real user flows
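To illustrate the point about fixture ordering: selecting a record by a stable attribute keeps a test valid even if the fixture data is reordered. The sketch below assumes search results carry `id` and `name` fields; the `CustomerSummary` type, the `findByName` helper, and the sample data are hypothetical.

```typescript
// Assumed shape of a search hit (illustrative only).
interface CustomerSummary {
  id: string;
  name: string;
}

// Select a fixture record by a stable attribute rather than by position.
// Throws unless exactly one record matches, so ambiguous fixtures are caught too.
function findByName(results: CustomerSummary[], name: string): CustomerSummary {
  const matches = results.filter((r) => r.name === name);
  if (matches.length !== 1) {
    throw new Error(`expected exactly one "${name}", found ${matches.length}`);
  }
  return matches[0];
}

const results: CustomerSummary[] = [
  { id: "c2", name: "Acme Labs" },
  { id: "c1", name: "Acme Corp" },
];

// Order-independent: still resolves to the same record if the array is reordered.
const acme = findByName(results, "Acme Corp");
// Brittle alternative: expecting results[0].id to equal "c1" breaks when ordering changes.
```

Taking `results[0]` is fine when the test only needs some valid record, as in the CRM flow; the problem arises when an assertion expects a specific value at a specific position.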