MP-301a · Module 1
Fan-Out / Fan-In Orchestration
4 min read
Fan-out/fan-in is the pattern where a tool splits a request into parallel sub-operations, executes them concurrently, and merges the results. The classic use case: "search across five data sources simultaneously and return a unified result." Without server-side fan-out, the LLM would call five separate tools sequentially, consuming five round-trips and five times the token overhead for tool-call framing. A fan-out tool does the same work in one call with parallel execution inside the server.
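The shape of the pattern can be sketched in a few lines; here `fetchFromSource` is an illustrative stand-in for a real per-source call, and `Promise.allSettled` does the fan-in without letting one failure sink the batch:

```typescript
// Minimal fan-out/fan-in sketch. fetchFromSource is a hypothetical
// per-source search call used only for illustration.
async function fetchFromSource(source: string): Promise<string> {
  if (source === "flaky") throw new Error("unavailable");
  return `results from ${source}`;
}

async function fanOutFanIn(sources: string[]) {
  // Fan-out: start every sub-operation concurrently. allSettled never
  // rejects, so one failed source cannot fail the whole batch.
  const settled = await Promise.allSettled(sources.map(fetchFromSource));
  // Fan-in: merge successes and failures into a single unified result.
  return settled.map((r, i) =>
    r.status === "fulfilled"
      ? { source: sources[i], ok: true, data: r.value }
      : { source: sources[i], ok: false, error: (r.reason as Error).message }
  );
}
```

One call from the LLM, N concurrent operations inside the server, one merged response back.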
The fan-in merge is where the real complexity lives. When three of five sources return results and two time out, what do you return? A strict strategy fails the entire operation. A lenient strategy returns partial results with metadata about which sources failed. A quorum strategy requires N of M sources to succeed before returning. The right choice depends on your use case: financial data needs all sources (strict), search results are useful even if partial (lenient), and consensus-based decisions need a majority (quorum).
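The three strategies can be captured in one merge function. This is a sketch: the `Outcome` shape and the default quorum of a simple majority are assumptions, not a prescribed API.

```typescript
// Applying a merge strategy to collected fan-out outcomes.
// Outcome is an assumed shape for whatever the fan-in collected per source.
type Outcome = { source: string; ok: boolean; data?: unknown };
type Strategy = "strict" | "lenient" | "quorum";

function mergeResults(
  outcomes: Outcome[],
  strategy: Strategy,
  quorumN = Math.ceil(outcomes.length / 2) // assumed default: simple majority
) {
  const ok = outcomes.filter(o => o.ok);
  const failed = outcomes.filter(o => !o.ok);
  switch (strategy) {
    case "strict":
      // All-or-nothing: any failure fails the entire operation.
      if (failed.length > 0) throw new Error(`strict merge: ${failed.length} source(s) failed`);
      return { results: ok.map(o => o.data), partial: false };
    case "lenient":
      // Return whatever succeeded, with metadata about what did not.
      return {
        results: ok.map(o => o.data),
        partial: failed.length > 0,
        failedSources: failed.map(o => o.source),
      };
    case "quorum":
      // Require at least N of M successes before returning anything.
      if (ok.length < quorumN) throw new Error(`quorum not met: ${ok.length}/${quorumN}`);
      return { results: ok.map(o => o.data), partial: failed.length > 0 };
  }
}
```

With two of three sources succeeding, `lenient` and `quorum` both return partial results, while `strict` throws.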
Concurrency limits are essential in production fan-out. If the LLM calls your "search everything" tool and it fans out to 50 data sources simultaneously, you may exhaust connection pools, hit API rate limits, or overwhelm downstream services. Use a concurrency limiter (semaphore pattern) to cap the number of in-flight sub-operations. A fan-out of 50 with a concurrency limit of 5 still completes faster than sequential, without the thundering-herd risk.
```typescript
// Fan-out with concurrency limit and partial-result tolerance.
// searchSource(source, query) is the per-source search call, defined elsewhere.
async function multiSearch(query: string, sources: string[]) {
  const CONCURRENCY = 5;
  const TIMEOUT_MS = 3000;
  const results: { source: string; status: "ok" | "failed"; data?: unknown }[] = [];
  // Worker-pool semaphore: only CONCURRENCY workers run at once; each pulls
  // the next source from the shared queue when it finishes.
  const queue = [...sources];
  async function runNext(): Promise<void> {
    if (queue.length === 0) return;
    const source = queue.shift()!;
    try {
      // Per-source timeout: one slow source fails on its own without
      // stalling the rest of the fan-out.
      const data = await Promise.race([
        searchSource(source, query),
        new Promise((_, reject) =>
          setTimeout(() => reject(new Error("timeout")), TIMEOUT_MS)
        ),
      ]);
      results.push({ source, status: "ok", data });
    } catch (err) {
      results.push({ source, status: "failed", data: { error: (err as Error).message } });
    }
    // Pull the next queued source; recursion ends when the queue is empty.
    await runNext();
  }
  // Launch the initial batch of workers.
  await Promise.all(
    Array.from({ length: Math.min(CONCURRENCY, sources.length) }, () => runNext())
  );
  const succeeded = results.filter(r => r.status === "ok");
  const failed = results.filter(r => r.status === "failed");
  return {
    content: [{ type: "text" as const, text: JSON.stringify({
      query,
      totalSources: sources.length,
      succeeded: succeeded.length,
      failed: failed.length,
      results: succeeded.map(r => r.data),
      failures: failed.map(r => ({ source: r.source, error: (r.data as { error: string }).error })),
    }, null, 2) }],
  };
}
```
- **Set a concurrency ceiling.** Cap parallel sub-operations with a semaphore. Start with 5 and tune based on downstream capacity. Never fan out unbounded.
- **Choose a merge strategy.** Decide upfront: strict (all-or-nothing), lenient (partial results OK), or quorum (N of M required). Document it in the tool description.
- **Add per-source timeouts.** Wrap each sub-operation in a Promise.race with a timeout. One slow source should not block the entire fan-out. Report timeouts as failures in the scorecard.
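The timeout wrapper from the last point is worth factoring into a reusable helper. A sketch, with one refinement over a bare Promise.race: the timer is cleared when the race settles, so no pending handle outlives the call.

```typescript
// Wrap any promise with a timeout. The label parameter is an assumed
// convenience for readable error messages.
function withTimeout<T>(p: Promise<T>, ms: number, label = "operation"): Promise<T> {
  let timer!: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; clearing the timer prevents the losing
  // timeout from keeping the event loop alive.
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}
```

Each sub-operation in the fan-out then becomes `withTimeout(searchSource(source, query), TIMEOUT_MS, source)`, and a timeout surfaces as an ordinary failure in the merged result.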