MP-301a · Module 1

Fan-Out / Fan-In Orchestration

4 min read

Fan-out/fan-in is the pattern where a tool splits a request into parallel sub-operations, executes them concurrently, and merges the results. The classic use case: "search across five data sources simultaneously and return a unified result." Without server-side fan-out, the LLM would call five separate tools sequentially, consuming five round-trips and five times the token overhead for tool-call framing. A fan-out tool does the same work in one call with parallel execution inside the server.
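Stripped to its core, server-side fan-out is a single `Promise.allSettled` over the sub-operations, with a merge over the settled results. A minimal sketch — `fetchers` is a hypothetical map from source name to an async search function, not a fixed API:

```typescript
// Minimal fan-out/fan-in core: run every sub-search concurrently, then
// merge. allSettled (unlike Promise.all) keeps one failing source from
// rejecting the whole batch.
async function fanOut(
  query: string,
  fetchers: Record<string, (q: string) => Promise<unknown>>,
) {
  const names = Object.keys(fetchers);
  const settled = await Promise.allSettled(names.map(n => fetchers[n](query)));
  return settled.map((s, i) => ({
    source: names[i],
    status: s.status === "fulfilled" ? "ok" : "failed",
    data: s.status === "fulfilled" ? s.value : (s.reason as Error).message,
  }));
}
```

The production version later in this module adds the two things this sketch omits: a concurrency cap and per-source timeouts.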

The fan-in merge is where the real complexity lives. When three of five sources return results and two time out, what do you return? A strict strategy fails the entire operation. A lenient strategy returns partial results with metadata about which sources failed. A quorum strategy requires N of M sources to succeed before returning. The right choice depends on your use case: financial data needs all sources (strict), search results are useful even if partial (lenient), and consensus-based decisions need a majority (quorum).
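The three strategies can be expressed as one merge function over the collected sub-results. A sketch — the `SourceResult` shape and the default quorum of a simple majority are assumptions for illustration:

```typescript
// Fan-in merge under the three strategies from the text.
// SourceResult is an assumed shape for a completed sub-operation.
type SourceResult = { source: string; ok: boolean; data?: unknown };

type MergeStrategy = "strict" | "lenient" | "quorum";

function mergeResults(
  results: SourceResult[],
  strategy: MergeStrategy,
  quorumN = Math.floor(results.length / 2) + 1, // default: simple majority
) {
  const succeeded = results.filter(r => r.ok);
  switch (strategy) {
    case "strict": // all-or-nothing: any failure fails the whole call
      if (succeeded.length < results.length) {
        throw new Error(`strict merge: ${results.length - succeeded.length} source(s) failed`);
      }
      return succeeded.map(r => r.data);
    case "lenient": // partial results are fine; caller sees what survived
      return succeeded.map(r => r.data);
    case "quorum": // require N of M successes before returning anything
      if (succeeded.length < quorumN) {
        throw new Error(`quorum merge: only ${succeeded.length} of required ${quorumN} succeeded`);
      }
      return succeeded.map(r => r.data);
  }
}
```

Whichever strategy you pick, state it in the tool description so the LLM knows whether a returned result set may be partial.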

Concurrency limits are essential in production fan-out. If the LLM calls your "search everything" tool and it fans out to 50 data sources simultaneously, you may exhaust connection pools, hit API rate limits, or overwhelm downstream services. Use a concurrency limiter (semaphore pattern) to cap the number of in-flight sub-operations. A fan-out of 50 with a concurrency limit of 5 still completes faster than sequential, without the thundering-herd risk.

// Fan-out with concurrency limit and partial-result tolerance
async function multiSearch(query: string, sources: string[]) {
  const CONCURRENCY = 5;
  const TIMEOUT_MS = 3000;
  const results: { source: string; status: string; data?: unknown }[] = [];

  // Worker pool (semaphore-style cap): at most CONCURRENCY workers
  // pull sources from a shared queue until it is empty
  const queue = [...sources];

  async function worker(): Promise<void> {
    while (queue.length > 0) {
      const source = queue.shift()!;
      try {
        const data = await Promise.race([
          searchSource(source, query),
          new Promise<never>((_, reject) =>
            setTimeout(() => reject(new Error("timeout")), TIMEOUT_MS)
          ),
        ]);
        results.push({ source, status: "ok", data });
      } catch (err) {
        results.push({ source, status: "failed", data: { error: (err as Error).message } });
      }
    }
  }

  // Launch up to CONCURRENCY workers; together they drain the queue
  await Promise.all(
    Array.from({ length: Math.min(CONCURRENCY, sources.length) }, () => worker())
  );

  const succeeded = results.filter(r => r.status === "ok");
  const failed = results.filter(r => r.status === "failed");

  return {
    content: [{ type: "text" as const, text: JSON.stringify({
      query,
      totalSources: sources.length,
      succeeded: succeeded.length,
      failed: failed.length,
      results: succeeded.map(r => r.data),
      failures: failed.map(r => ({ source: r.source, error: (r.data as { error: string }).error })),
    }, null, 2) }],
  };
}
  1. Set a concurrency ceiling. Cap parallel sub-operations with a semaphore or worker pool. Start with 5 and tune based on downstream capacity. Never fan out unbounded.
  2. Choose a merge strategy. Decide upfront: strict (all-or-nothing), lenient (partial results OK), or quorum (N of M required). Document it in the tool description.
  3. Add per-source timeouts. Wrap each sub-operation in a Promise.race with a timeout. One slow source should not block the entire fan-out. Report timeouts as failures in the result summary.
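One refinement to the Promise.race wrapper in the example above: as written, the timeout timer keeps running even after the source responds, which can keep a Node process alive or pile up pending timers under load. A small helper that clears the timer once the race settles avoids this — `withTimeout` is a name chosen here for illustration, not a standard API:

```typescript
// Wrap any promise with a timeout whose timer is cleaned up as soon as
// the race settles, so fast sources don't leave timers pending.
function withTimeout<T>(p: Promise<T>, ms: number, label = "operation"): Promise<T> {
  let timer!: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}
```

In the fan-out example, the inline `Promise.race` block would become `await withTimeout(searchSource(source, query), TIMEOUT_MS, source)`.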