The 7 deadly sins of data quality

One of the side effects of creating an integrated single customer view with a system like MasterVision is that it can bring to light issues of data quality which may previously have been hidden from view in various separate source systems. Since data quality is a tricky concept, let’s take a quick tour of the different types of problem which can arise:

1. Gaps. You’ve captured a customer’s details, but certain crucial pieces of information have been left blank. That might be their country, or their email, or even their name. For new sign-ups, this can be solved by striking the right balance between keeping your registration form quick and easy and still requiring users to fill in certain key fields. For existing customers, you can fill any gaps with ‘progressive profiling’ – i.e. configuring your systems to ask the customer for a little more info next time they log in.
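As a rough illustration, a gap check can be as simple as listing which key fields are blank on each record. The sketch below assumes customer records are plain dictionaries; the field names in REQUIRED_FIELDS are hypothetical and would depend on your own systems.

```python
# Hypothetical list of key fields; adjust to suit your own data.
REQUIRED_FIELDS = ["name", "email", "country"]

def missing_fields(record):
    """Return the required fields that are absent, None, or blank."""
    gaps = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            gaps.append(field)
    return gaps

# Illustrative record with two gaps.
customer = {"name": "Ann Lee", "email": "", "country": None}
gaps = missing_fields(customer)
print(gaps)
```

A list like this could then drive progressive profiling, by asking for one of the missing fields at the customer's next login.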

2. Duplicates. There’s also the challenge of tidying up multiple records for the same customer. De-duplication is difficult to get right, because the aim is to find and merge all of the duplicates, while avoiding ‘false positives’, whereby similar but different customers are wrongly merged together. The best approach will vary depending on the nature of the source data, but often involves a mixture of email, phone, name, and address info. It’s important to get this right, otherwise any analysis will overstate the number of unique customers you really have.
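To give a flavour of the simplest case, the sketch below groups records that share the same email address once case and surrounding whitespace are ignored. It is a minimal example under stated assumptions: the record layout and the function names are made up for illustration, and a real de-duplication process would usually combine email with phone, name, and address matching as described above.

```python
from collections import defaultdict

def normalise_email(email):
    """Lower-case the email and trim surrounding whitespace."""
    return email.strip().lower()

def find_duplicate_groups(records):
    """Map each normalised email to the ids sharing it, keeping only groups of 2+."""
    groups = defaultdict(list)
    for rec in records:
        email = rec.get("email")
        if email:  # records with no email cannot be matched this way
            groups[normalise_email(email)].append(rec["id"])
    return {email: ids for email, ids in groups.items() if len(ids) > 1}

# Illustrative data: records 1 and 2 are the same customer, record 3 is not.
customers = [
    {"id": 1, "name": "Ann Lee", "email": "Ann.Lee@example.com"},
    {"id": 2, "name": "A. Lee", "email": " ann.lee@example.com "},
    {"id": 3, "name": "Ann Lee", "email": "ann.lee@other.example"},
]
duplicate_groups = find_duplicate_groups(customers)
print(duplicate_groups)
```

Note that record 3 shares a name with record 1 but is not merged, which is the kind of false positive that matching on name alone would risk.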

3. Errors. These can take many forms, but typical examples include incorrect info such as a country value of ‘London’, or plain ‘junk’ info such as a country value of ‘zzzz’. Those might be reasonably easy to find in a field like country (because there’s a finite list of correct values), but harder to find in a personal name field. And even having found incorrect and ‘junk’ values, those fields will then often need to be blanked (in the absence of any better values), so producing more ‘gaps’ to be filled.
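For a field with a finite list of correct values, this check can be sketched in a few lines. The example below is illustrative only: VALID_COUNTRIES is a tiny made-up subset, and a real system would use a full reference list.

```python
# Illustrative subset only; a real check would use a complete country list.
VALID_COUNTRIES = {"United Kingdom", "France", "Germany"}

def clean_country(value):
    """Return the trimmed value if it is a known country, otherwise None (a gap)."""
    if value is None:
        return None
    value = value.strip()
    return value if value in VALID_COUNTRIES else None

raw_values = ["France", "London", "zzzz", " Germany "]
cleaned = [clean_country(v) for v in raw_values]
print(cleaned)
```

Both the incorrect value (‘London’) and the junk value (‘zzzz’) end up blanked, so they become gaps to be filled later.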

4. Inconsistencies. A different type of issue arises where perfectly valid info has been entered inconsistently. One example might be a title entered as ‘Mr’ by one customer, ‘Mister’ by another, and ‘MR’ by a third. These all clearly mean the same thing, but the various forms will make it harder to analyse the data cleanly. This same issue often arises for product info too, where different codes and names are used in different systems, even when referring to exactly the same product. With the right tools, inconsistencies like these can usually be cleaned up very effectively.
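One common way to clean up this kind of inconsistency is a lookup table that maps each known variant to a single standard form. The sketch below assumes a small hand-built mapping (TITLE_MAP is hypothetical) and leaves unrecognised values unchanged rather than guessing.

```python
# Hypothetical mapping from lower-cased variants to one standard form.
TITLE_MAP = {
    "mr": "Mr",
    "mister": "Mr",
    "mrs": "Mrs",
    "ms": "Ms",
    "dr": "Dr",
    "doctor": "Dr",
}

def normalise_title(raw):
    """Map a title to its standard form; unknown titles are returned trimmed."""
    trimmed = raw.strip()
    key = trimmed.rstrip(".").lower()
    return TITLE_MAP.get(key, trimmed)

titles = ["Mr", "Mister", "MR", "Mr.", "Prof"]
normalised = [normalise_title(t) for t in titles]
print(normalised)
```

The same pattern could be applied to product codes, with the mapping table listing each system's code against a shared master code.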

5. Missed connections. When dealing with hierarchical customer data, missed connections can arise where an individual isn’t linked up to their parent organisation. This will then reduce the quality of the organisation’s profile, since the activities of any unlinked individuals won’t be taken into account. If referencing a third-party market database (such as Ringgold), then similar quality issues will also arise if your organisational customers are not correctly linked to it. Both of these problems can be fixed either with manual auditing or with automated linking tools.

6. Old information. A more subtle data quality problem can arise if you have old data about a given customer. Their key information may all be present, but if it was provided many years ago, then there is a much higher risk that it is no longer accurate. Almost every piece of info you might hold about a customer can change over time, including their address, email, phone number, interests, and even their name. Customers may also have died, and marketing to them in those circumstances can be very distressing for relatives. For all these reasons, trying to keep your customer info reasonably up-to-date adds another dimension to the data quality challenge.
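A simple starting point is to flag records that have not been updated within some chosen period. The sketch below assumes each record carries a last-updated date; the three-year threshold is an arbitrary example rather than a recommendation.

```python
from datetime import date, timedelta

def is_stale(last_updated, today, max_age_days=3 * 365):
    """Return True if the record is older than the allowed age (assumed threshold)."""
    return (today - last_updated) > timedelta(days=max_age_days)

# A fixed reference date keeps the example repeatable.
reference_day = date(2024, 1, 1)
old_record_is_stale = is_stale(date(2015, 1, 1), reference_day)
recent_record_is_stale = is_stale(date(2023, 6, 1), reference_day)
print(old_record_is_stale, recent_record_is_stale)
```

Flagged records could then be prioritised for re-confirmation, for example by prompting the customer to check their details at next login.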

7. Conflicts. The seventh and final deadly sin of data quality relates to conflicts. What to do if the same customer has opted in to marketing in one of your databases, and opted out of all contact in another? And if you have more than one postal address for a customer, which one do you choose for your next mailing? And if you have more than one email address, which is the right one to use? These types of conflicts are a natural part of creating a single customer view. They can all be resolved by putting in place the right business rules to choose the ‘best’ option in each case – which might be based on the most recent record, or on treating one source database as ‘more trusted’ than others, or on some other criterion.
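A business rule of this kind can be sketched as a small function that ranks the competing values. The example below is one possible rule, not a prescribed one: it prefers the most recently updated value and uses source trust only to break ties. The source names and trust scores in SOURCE_TRUST are hypothetical.

```python
from datetime import date

# Hypothetical source systems; a higher score means 'more trusted'.
SOURCE_TRUST = {"crm": 2, "webshop": 1}

def pick_best(candidates):
    """Choose the most recent candidate, breaking ties by source trust."""
    return max(
        candidates,
        key=lambda c: (c["updated"], SOURCE_TRUST.get(c["source"], 0)),
    )

emails = [
    {"value": "old@example.com", "source": "crm", "updated": date(2019, 5, 1)},
    {"value": "new@example.com", "source": "webshop", "updated": date(2023, 2, 10)},
]
best_email = pick_best(emails)["value"]
print(best_email)
```

Swapping the order of the two items in the sort key would give the opposite policy, where the more trusted source always wins regardless of date.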

So, in conclusion, data quality can present a lot of different headaches, but each of these areas needs to be tackled with care in order to create a reliable and accurate overview of your whole business which you can really trust.