The art of deduplication

Deduplicating data effectively is a key part of building a single customer view, allowing a complete picture to be gained of everything known about your contacts. This can be more difficult than it sounds, often requiring a complex mix of both ‘science’ and ‘art’ to achieve the best results. Here we list our top 5 tips for successful ‘merging and purging’:

1. Use email address – with caution. Since email addresses are normally unique to each individual, they can be used as a ‘key’ to deduplicate and integrate contacts between separate, unrelated systems. However, many organisations use shared email addresses for internal purposes: for example, a staff member at an agency may register hundreds of different customers under the same ‘admin’ email account. It is therefore important to maintain an exception list of email addresses that should not be used for joining.

2. Generate keys. Where email address is unavailable, and there are no other reliable ID fields to match on, contacts can often be deduplicated based on keys generated from a combination of other fields, such as first name, surname and postcode. The challenge is to find the combination of fields that comes closest to identifying each individual uniquely in the data set in question, since no combination can guarantee uniqueness in every case. Where free text fields are used for joining, fuzzy matching techniques can be helpful to ensure that minor variations in spelling (e.g. ‘St’/‘Street’ in address data) do not prevent a successful match.

3. Use multiple keys. Working with multiple keys at the same time is technically complex, but is essential for the best results. For example, a single customer can have more than one email address (work/home), name (before/after marriage) or address (previous/current) and may have provided you with various combinations of these over time whilst completing different online forms. Effective deduplication of this contact’s data should be able to pull all of this information together.

4. Exception reporting. The ability to create effective reports quickly is important, as this enables special cases to be spotted, reviewed, and added to exception lists in order to resolve them. A setup which allows fast review of results, flexible changes to keying rules, and rapid re-testing is usually much more successful than one which requires you to try to get everything right first time.

5. Expert knowledge. Managing a deduplication project requires a mix of both technical and ‘editorial’ skills, plus the experience to know which approaches are likely to be the most successful for a given set of data. Deduplication is often underestimated and undertaken as an in-house project or ‘minor’ task prior to loading data into a new system. In practice, however, the work can be highly complex, and you may achieve better (and quicker) results by outsourcing the task.
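
To make tips 1 to 3 more concrete, here is a minimal Python sketch of how they might fit together. It is an illustration under stated assumptions rather than a recommended implementation: the field names (email, first_name, surname, postcode), the EMAIL_EXCEPTIONS list and the function names are all hypothetical, and the normalisation shown is a simple stand-in for real fuzzy matching. Each contact offers every key it can (an email address unless it is on the exception list, plus a generated name-and-postcode key), and contacts that share any key, directly or through a chain of other contacts, end up in the same group.

```python
import re
from collections import defaultdict

# Hypothetical exception list: shared mailboxes that must never be used for joining.
EMAIL_EXCEPTIONS = {"admin@agency.example"}


def normalise(text):
    """Lower-case the text, replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def keys_for(contact):
    """Return every join key this contact can offer (tips 1 and 2)."""
    keys = []
    email = contact.get("email", "").strip().lower()
    if email and email not in EMAIL_EXCEPTIONS:
        keys.append(("email", email))
    name = normalise(contact.get("first_name", "") + " " + contact.get("surname", ""))
    postcode = normalise(contact.get("postcode", "")).replace(" ", "")
    if name and postcode:
        keys.append(("name_postcode", name + "|" + postcode))
    return keys


def deduplicate(contacts):
    """Group contact indexes that share at least one key, directly or transitively (tip 3)."""
    parent = list(range(len(contacts)))  # union-find forest, one node per contact

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # key -> index of the first contact that offered it
    for i, contact in enumerate(contacts):
        for key in keys_for(contact):
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # merge the two groups
            else:
                first_seen[key] = i

    groups = defaultdict(list)
    for i in range(len(contacts)):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

Using more than one key is what allows a contact who changed surname and address to be linked back to an earlier record through a shared home email address, while the exception list keeps the agency’s ‘admin’ mailbox from merging unrelated customers. Reviewing the resulting groups, and adding any further problem addresses to the exception list before re-running, is the kind of fast feedback loop described in tip 4.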
