When you hear the phrase ‘data quality’, you may immediately think of fixing errors, and obviously that’s a fundamental part of improving the quality of your data. But there’s much more to a comprehensive data quality solution than that. DataSalon DQ doesn’t just offer clean-up rules for existing fields, but also allows you to create and populate brand new fields. Some examples will show just how powerful this can be.
A new field may provide an additional validation tool. Say a subscription period is always expected to be a year. The difference between the subscription start and end dates can be placed in a new field, and any record whose value is not 365 or 366 days can then be picked out as a problem record.
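To make the subscription-length check concrete, here is a minimal sketch in plain Python. It is not DataSalon DQ rule syntax, and the record layout and the `period_days` field name are assumptions made purely for illustration.

```python
from datetime import date

# Hypothetical subscription records; the field names are illustrative only.
subscriptions = [
    {"id": "A1", "start": date(2023, 1, 1), "end": date(2024, 1, 1)},
    {"id": "A2", "start": date(2024, 1, 1), "end": date(2024, 12, 31)},
    {"id": "A3", "start": date(2023, 3, 1), "end": date(2023, 9, 1)},
]

def add_period_days(record):
    """Return a copy of the record with a new 'period_days' field."""
    return {**record, "period_days": (record["end"] - record["start"]).days}

with_period = [add_period_days(r) for r in subscriptions]

# Keep only the records whose period is not a full year (365 or 366 days).
problems = [r for r in with_period if r["period_days"] not in (365, 366)]

for r in problems:
    print(r["id"], r["period_days"])
```

Only the six-month subscription (`A3`) is reported here. The original start and end dates are left untouched, so the new field acts purely as a check.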
Adding information from another file
A particular dataset may lack information that is available in another dataset. Provided the two share some kind of identifying field, it’s easy to pull in that missing info. So a usage report that contains only a customer ID could have full contact details pulled in from a separate list of customers, or book sales records could have publication details added from their ISBNs.
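The usage-report example might look something like the sketch below, again in plain Python rather than any particular tool's syntax. The customer IDs, field names and contact details are all invented for illustration.

```python
# Hypothetical customer list, keyed by the shared identifying field.
customers = {
    "C001": {"name": "Example University Library", "email": "library@example.org"},
    "C002": {"name": "Sample Institute", "email": "info@example.com"},
}

# Hypothetical usage report that carries only the customer ID.
usage = [
    {"customer_id": "C001", "downloads": 120},
    {"customer_id": "C003", "downloads": 7},
]

def enrich(row, lookup):
    """Return a copy of the row with contact fields pulled in from the lookup."""
    details = lookup.get(row["customer_id"], {})
    return {**row, "name": details.get("name"), "email": details.get("email")}

enriched = [enrich(row, customers) for row in usage]
```

Rows with no match in the customer list (`C003` here) simply get empty contact fields. Those gaps are worth reviewing in their own right, since they may point to a missing or mistyped ID.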
Information in a dataset may not be formatted in the most helpful way for the end-user. Country information is useful for many reasons but is sometimes tagged onto the end of the address – it’s often possible to pull out this information into a separate field. In other cases, you may want to group rather than split information: if there are a number of separate subject fields, searching can be simplified by concatenating them into a single field.
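Both directions, splitting and grouping, can be sketched in a few lines of plain Python. The sketch assumes the country is the last comma-separated part of the address and appears in a known list. The list itself, the `subject_1` to `subject_3` field names and the `; ` separator are all hypothetical.

```python
# Illustrative list only; a real one would be far longer.
KNOWN_COUNTRIES = {"United Kingdom", "France", "Japan"}

def split_country(address):
    """Return (address_without_country, country); country is None if not recognised."""
    head, sep, tail = address.rpartition(",")
    candidate = tail.strip()
    if sep and candidate in KNOWN_COUNTRIES:
        return head.strip(), candidate
    return address, None

def combine_subjects(record, fields=("subject_1", "subject_2", "subject_3")):
    """Join the non-empty subject fields into one searchable string."""
    values = [(record.get(f) or "").strip() for f in fields]
    return "; ".join(v for v in values if v)

street, country = split_country("12 High Street, Oxford, United Kingdom")
subjects = combine_subjects({"subject_1": "Physics", "subject_2": "", "subject_3": "Optics"})
```

An address that does not end in a recognised country is returned unchanged, which is why the phrase 'often possible' matters: real addresses are messy, and unmatched records still need a human look.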
Presenting information in alternative ways
Sometimes the same file will be used differently by different users, and adding extra fields can provide the information in a format to suit each of those users. Revenue could be converted into different currencies, or groupings created to allow users to work at different levels of detail (e.g. general subject areas could be inferred from specific subjects, or continents or sales regions from countries).
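As a rough illustration of the currency and grouping ideas, the plain-Python sketch below adds converted revenue fields and a region field alongside the originals. The exchange rates are placeholders rather than real figures, and the country-to-region mapping and field names are assumptions.

```python
from decimal import Decimal

# Placeholder rates for illustration only; real rates would come from a trusted source.
RATES_FROM_GBP = {"USD": Decimal("1.25"), "EUR": Decimal("1.15")}

# Hypothetical mapping from country to a coarser grouping.
REGION_BY_COUNTRY = {"France": "Europe", "Japan": "Asia", "Brazil": "South America"}

def add_views(record):
    """Return a copy of the record with converted revenue fields and a region field."""
    out = dict(record)
    for code, rate in RATES_FROM_GBP.items():
        converted = record["revenue_gbp"] * rate
        out[f"revenue_{code.lower()}"] = converted.quantize(Decimal("0.01"))
    out["region"] = REGION_BY_COUNTRY.get(record["country"], "Unknown")
    return out

sale = add_views({"country": "Japan", "revenue_gbp": Decimal("200.00")})
```

`Decimal` is used rather than floating point so that monetary values round predictably. Each user can then work from whichever field suits them while the original revenue figure stays in place.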
Inferring additional information
The values in existing fields may be used to infer extra information, making it easier to segment and analyse the data. A subscription end date could be checked against the current date in order to give each record an ‘active’ or ‘expired’ status field, while mathematical functions make it simple to calculate further numerical information about transactions, such as total prices or percentage discounts.
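The status and pricing examples could be sketched as follows in plain Python. The field names are hypothetical, and treating a subscription that ends today as still 'active' is an assumption, as is the choice to leave the discount empty when the total price is zero.

```python
from datetime import date

def add_inferred_fields(record, today=None):
    """Return a copy of the record with status, total_price and discount_pct fields."""
    today = today or date.today()
    out = dict(record)
    # Assumption: a subscription ending today still counts as active.
    out["status"] = "active" if record["end"] >= today else "expired"
    total = record["unit_price"] * record["quantity"]
    out["total_price"] = total
    # Percentage discount relative to the list total; left empty if the total is zero.
    out["discount_pct"] = round(100 * (total - record["paid"]) / total, 1) if total else None
    return out

order = add_inferred_fields(
    {"end": date(2024, 6, 30), "unit_price": 50.0, "quantity": 4, "paid": 180.0},
    today=date(2024, 7, 1),
)
```

Passing `today` in explicitly keeps the example repeatable. In day-to-day use the default of the current date would make the status field update itself each time the rule is run.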
By using a data quality toolkit with these capabilities, you can not only increase confidence in the accuracy of your data, but also make that data work for all your users and maximise its potential for analytic purposes.