DR-301a · Module 2

Multi-Source Merging

Collection gives you data from twenty sources. Synthesis gives you intelligence from that data. The merge step is where twenty fragmented perspectives become one coherent picture. Multi-source merging takes normalized data about the same entity, event, or trend from different sources and combines them into a single enriched record. Each source adds information the others lack. The merged result contains more signal than any individual source.

The merge pipeline operates in three passes. Pass one: deduplication. Identify records from different sources that describe the same underlying event or entity. This is not trivial: the same acquisition might appear as "Acme acquires Beta" in one source and "Beta sold to Acme for $50M" in another. Both describe the same event, but with different framing and different details. Pass two: enrichment. Combine the unique information from each source into a single record. Source A has the acquisition price, Source B has the strategic rationale, and Source C has the timeline; the merged record has all three. Pass three: conflict flagging. When sources disagree on a specific data point (different prices, different dates, different participants), the pipeline flags the conflict for downstream resolution rather than silently picking one.
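A minimal sketch of the deduplication pass for the Acme/Beta example, in Python. It assumes entity resolution has already run upstream, so each record carries canonical entity names. Event fingerprinting is approximated with a small, hypothetical `EVENT_TERMS` vocabulary that maps different framings ("acquires", "sold") onto one event type. Two records are linked when they were published within 24 hours of each other, share at least half of their entities (a Jaccard threshold chosen only for illustration), and map to a common event type. `Record`, `EVENT_TERMS`, `is_match`, and `dedup_groups` are illustrative names, not part of any existing library.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical vocabulary mapping different framings of the same event
# onto one canonical event type ("acquires" and "sold to" -> acquisition).
EVENT_TERMS = {
    "acquires": "acquisition",
    "acquired": "acquisition",
    "buys": "acquisition",
    "sold": "acquisition",
}


@dataclass(frozen=True)
class Record:
    source: str
    headline: str
    entities: frozenset  # canonical names from an assumed upstream entity-resolution step
    published: datetime


def event_types(rec: Record) -> set:
    """Crude event fingerprint: canonical event types named in the headline."""
    words = re.findall(r"[a-z]+", rec.headline.lower())
    return {EVENT_TERMS[w] for w in words if w in EVENT_TERMS}


def entity_overlap(a: Record, b: Record) -> float:
    """Jaccard similarity of the two records' resolved entity sets."""
    union = a.entities | b.entities
    return len(a.entities & b.entities) / len(union) if union else 0.0


def is_match(a: Record, b: Record, window=timedelta(hours=24), min_overlap=0.5) -> bool:
    """Temporal proximity + entity overlap + shared event type."""
    return (
        abs(a.published - b.published) <= window
        and entity_overlap(a, b) >= min_overlap
        and bool(event_types(a) & event_types(b))
    )


def dedup_groups(records: list) -> list:
    """Link matching records into groups with union-find; nothing is deleted."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if is_match(records[i], records[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec)
    return list(groups.values())


# Hypothetical sample records.
records = [
    Record("wire", "Acme acquires Beta", frozenset({"Acme", "Beta"}), datetime(2024, 3, 1, 9)),
    Record("blog", "Beta sold to Acme for $50M", frozenset({"Acme", "Beta"}), datetime(2024, 3, 1, 20)),
    Record("digest", "Acme acquires Delta", frozenset({"Acme", "Delta"}), datetime(2024, 3, 1, 10)),
]
groups = dedup_groups(records)
for g in groups:
    print([r.source for r in g])
```

The two Acme/Beta records land in one group despite their different framing, while the Acme/Delta record stays separate because it shares only one of three entities. Union-find makes links transitive and keeps every record, so the enrichment pass still sees all of them. Comparing every pair is quadratic; a larger pipeline would likely bucket candidates by entity or time window first.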

  1. Pass 1: Deduplication. Match records across sources using entity resolution, event fingerprinting, and temporal proximity. Two records about the same company, published within 24 hours of each other and sharing keywords, are likely describing the same event. Deduplication links them without deleting either: you want the merged result, not the survivor.
  2. Pass 2: Enrichment. For each group of matched records, combine unique fields from each source into a single enriched record. Prefer specific data over vague data. Prefer recent data over stale data. Prefer primary sources over secondary sources. Document which source contributed each field.
  3. Pass 3: Conflict Flagging. When sources disagree on a specific data point, do not silently choose one. Flag the conflict with both values, their sources, and their confidence scores. Conflicts are often where the most interesting intelligence hides: a disagreement between sources often signals that the situation is more complex than any single source captured.
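Passes 2 and 3 can be sketched together for one deduplicated group, again in Python and under stated assumptions. `SourceRecord`, `MergedRecord`, `merge`, and the `VAGUE` placeholder list are hypothetical names; the per-source `confidence` score and `primary` flag are assumed to be assigned upstream. Candidates for each field are ranked lexicographically by the three preferences in the order the list gives them (specific, then recent, then primary). The top-ranked value is kept as a provisional value with its source recorded, but whenever two or more specific values disagree, every one of them is written to `conflicts` with its source and confidence, so the choice is never silent. Field values are assumed to be hashable.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

# Hypothetical placeholders treated as vague rather than as real values.
VAGUE = {"", "undisclosed", "unknown", "n/a"}


@dataclass
class SourceRecord:
    source: str
    primary: bool        # e.g. a filing vs. a rewrite of a filing (assumed upstream)
    confidence: float    # 0..1 score assigned upstream (assumed)
    published: datetime
    fields: dict


@dataclass
class MergedRecord:
    fields: dict = field(default_factory=dict)
    provenance: dict = field(default_factory=dict)   # field name -> contributing source
    conflicts: list = field(default_factory=list)


def is_specific(value: Any) -> bool:
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip().lower() not in VAGUE
    return True


def merge(group: list) -> MergedRecord:
    merged = MergedRecord()
    names = sorted({name for rec in group for name in rec.fields})
    for name in names:
        candidates = [(rec, rec.fields[name]) for rec in group if name in rec.fields]
        # Enrichment: rank by specific > recent > primary and record provenance.
        best_rec, best_val = max(
            candidates, key=lambda c: (is_specific(c[1]), c[0].published, c[0].primary)
        )
        merged.fields[name] = best_val
        merged.provenance[name] = best_rec.source
        # Conflict flagging: distinct specific values are all reported.
        specific = [(rec, val) for rec, val in candidates if is_specific(val)]
        if len({val for _, val in specific}) > 1:
            merged.conflicts.append({
                "field": name,
                "values": [
                    {"value": val, "source": rec.source, "confidence": rec.confidence}
                    for rec, val in specific
                ],
            })
    return merged


# Hypothetical sample group: A has a price, B the rationale, C the timeline.
group = [
    SourceRecord("wire", False, 0.7, datetime(2024, 3, 1, 9), {"price": "$50M"}),
    SourceRecord("blog", False, 0.5, datetime(2024, 3, 1, 20),
                 {"price": "undisclosed", "rationale": "expand into payments"}),
    SourceRecord("filing", True, 0.95, datetime(2024, 3, 2, 8),
                 {"price": "$48M", "close_date": "2024-06-30"}),
]
merged = merge(group)
print(merged.fields)
print(merged.provenance)
print(merged.conflicts)
```

In this sample the merged record picks up the rationale from the blog and the close date from the filing. The blog's "undisclosed" price is vague, so it loses to the specific values without raising a conflict, while the wire's $50M and the filing's $48M disagree and are both flagged for downstream resolution.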