PE-201b · Module 1

Deduplication Strategies

Duplicates are among the most common and most insidious data quality problems. When a company exists as two records, the deals, contacts, and activities that belong to it are split between them. Your pipeline value is inflated. Your activity history is incomplete on both records. Your reports show two accounts where one exists. And the problem compounds: every new interaction attached to the wrong record makes the split worse.

Fuzzy Matching Rules

Exact matching pairs "Acme Corp" with "Acme Corp" but misses "Acme Corporation" and "ACME Corp." Fuzzy matching uses algorithms such as Levenshtein distance and phonetic matching to identify records that are probably the same entity despite minor spelling variations. Configure fuzzy matching on company name, domain, and phone number.
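
To make the idea concrete, here is a minimal sketch of Levenshtein-based name matching in plain Python. The suffix list, the 0.85 threshold, and all function names are assumptions chosen for illustration, not settings from any particular CRM, and phonetic matching is left out.

```python
import re

# Assumed list of legal suffixes to ignore when comparing names (illustrative only).
LEGAL_SUFFIXES = {"corp", "corporation", "inc", "incorporated", "llc", "ltd", "co", "company"}


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions from a to b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1.0 means identical after normalization."""
    na, nb = normalize_name(a), normalize_name(b)
    longest = max(len(na), len(nb))
    if longest == 0:
        return 0.0  # nothing left to compare, so do not claim a match
    return 1.0 - levenshtein(na, nb) / longest


def is_probable_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag a pair for review when similarity meets the (assumed) threshold."""
    return name_similarity(a, b) >= threshold
```

With these assumptions, "Acme Corp", "Acme Corporation", and "ACME Corp." all normalize to the same string and are flagged as probable duplicates, while "Acme Corp" and "Apex Corp" are not. A ratio-based threshold is strict on very short names, so the cutoff would need tuning against real data.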

Domain-Based Deduplication

The company's website domain is usually the most reliable identifier. "acme.com" is "acme.com" regardless of how the company name is spelled. Use domain as the primary matching key and company name as the secondary. This catches the vast majority of duplicates with minimal false positives.
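
A rough sketch of domain as the primary matching key, using only the Python standard library. The record shape (dicts with a "website" field) and the function names are assumptions for illustration. The sketch strips the scheme, path, port, and a leading "www." but does not try to collapse other subdomains.

```python
from collections import defaultdict
from urllib.parse import urlparse


def normalize_domain(website: str) -> str:
    """Reduce a URL or bare host to a lowercase host without a leading 'www.'."""
    value = website.strip().lower()
    if not value:
        return ""
    # urlparse only fills in hostname when a scheme or "//" prefix is present.
    if "//" not in value:
        value = "//" + value
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def group_by_domain(records: list) -> dict:
    """Return {domain: [records]} for every domain shared by two or more records."""
    groups = defaultdict(list)
    for record in records:
        domain = normalize_domain(record.get("website") or "")
        if domain:  # records without a website cannot be keyed by domain
            groups[domain].append(record)
    return {domain: recs for domain, recs in groups.items() if len(recs) > 1}
```

Records that land in the same group are duplicate candidates. Comparing company names within each group serves as the secondary check before anything is merged.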

Merge Workflows

When duplicates are identified, the merge must preserve data from both records. The record with the most recent activity becomes the survivor, all deals and contacts from the losing record are transferred to it, and a log entry records what was merged. Automating these steps as a merge workflow helps prevent data loss during deduplication.
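
The merge rules described here can be sketched as a single function. The field names ("id", "last_activity", "deals", "contacts") and the shape of the log entry are assumptions for illustration. A real CRM would run the merge through its own API and inside a transaction.

```python
from datetime import datetime, timezone


def merge_records(a: dict, b: dict, merge_log: list) -> dict:
    """Merge two duplicate company records and append an audit entry to merge_log."""
    # The record with the most recent activity survives (ties keep the first record).
    survivor, loser = (a, b) if a["last_activity"] >= b["last_activity"] else (b, a)

    # Build a new record so the inputs are left untouched.
    merged = dict(survivor)
    merged["deals"] = survivor["deals"] + loser["deals"]
    merged["contacts"] = survivor["contacts"] + loser["contacts"]

    # Record what was merged so the operation can be audited later.
    merge_log.append({
        "survivor_id": survivor["id"],
        "merged_id": loser["id"],
        "deals_moved": list(loser["deals"]),
        "contacts_moved": list(loser["contacts"]),
        "merged_at": datetime.now(timezone.utc).isoformat(),
    })
    return merged
```

Returning a new record instead of mutating the survivor keeps both originals available until the merge has been verified, which is one way to guard against data loss.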