MP-301f · Module 1

Cross-Resource Joins

4 min read

When data spans multiple MCP servers, the AI model performs joins in its context window. It reads a customer record from Salesforce, reads the customer's open tickets from ServiceNow, and correlates them by account name or ID. This works because LLMs excel at natural language pattern matching — they can identify that "Acme Corp" in Salesforce is the same entity as "ACME Corporation" in ServiceNow, even without a shared identifier. But relying on fuzzy matching is fragile. Production federation requires explicit correlation keys.

Correlation keys are shared identifiers that link records across systems. The most common approach is a canonical entity ID stored in each system — a customer UUID that exists in both Salesforce (as a custom field) and ServiceNow (as a reference field). When the model reads from both systems, it joins on this shared ID. Alternative approaches include email addresses (unique per person, present in most systems), domain names (unique per company), or composite keys (system prefix + native ID like sf:001xx or sn:INC0012345).

For cases where correlation keys do not exist, a mapping resource provides the bridge. A dedicated MCP server — or a resource on an existing server — maintains a cross-system entity map: { salesforceId: "001xx", servicenowId: "INC0012345", sapCustomer: "CUST-9999" }. The model reads the mapping first, then uses it to correlate reads across systems. This centralized mapping is easier to maintain than distributing IDs across every system.

// Cross-system entity mapping resource
interface EntityMapping {
  canonicalId: string;
  systems: Record<string, string>; // system → native ID
  displayName: string;
  entityType: "company" | "contact" | "product";
}

server.resource(
  "entity-map",
  new ResourceTemplate("mapping://entities/{entityType}{?search}", {
    list: undefined,
  }),
  async (uri, { entityType, search }) => {
    let query = `
      SELECT canonical_id, system_ids, display_name, entity_type
      FROM entity_mappings
      WHERE entity_type = $1
    `;
    const params: string[] = [entityType];

    if (search) {
      query += ` AND display_name ILIKE $2`;
      params.push(`%${search}%`);
    }
    query += ` LIMIT 100`;

    const result = await pool.query(query, params);
    const mappings: EntityMapping[] = result.rows.map((r) => ({
      canonicalId: r.canonical_id,
      systems: r.system_ids, // { salesforce: "001xx", sn: "INC..." }
      displayName: r.display_name,
      entityType: r.entity_type,
    }));

    return {
      contents: [{
        uri: uri.href,
        mimeType: "application/json",
        text: JSON.stringify(mappings),
      }],
    };
  }
);

// Usage by AI model:
// 1. Read mapping://entities/company?search=Acme
// 2. Get { salesforce: "001xx", servicenow: "ACME_CORP" }
// 3. Read salesforce://accounts/001xx
// 4. Read servicenow://accounts/ACME_CORP/incidents

Do This

  • Establish canonical entity IDs shared across systems wherever possible
  • Build a mapping resource for systems that cannot share IDs natively
  • Return correlation keys in every resource response so the model can join across reads
  • Document which fields are join keys in your schema resources

Avoid This

  • Rely on the AI model to fuzzy-match entity names across systems — it works 80% of the time, not 100%
  • Build a monolithic aggregation server that pre-joins all systems — it defeats the federation model
  • Use system-specific IDs as the only correlation key — they are opaque across system boundaries
  • Assume email addresses are stable join keys — people change emails, companies have multiple domains