The magic of automatching

Over the last 10 years, we’ve been working with scholarly publishers to help them clean, link and de-duplicate their institutional customer data, creating a single unique record from the many different ways each customer is recorded in their databases. In that time, we’ve developed a set of tools that can automatically match (‘automatch’) customers to a distinct reference record.

There are various reference sets of institutional data available, against which customer data can be matched. Some, such as Ringgold, are commercial and require a licence to use them, but there are also open datasets, such as GRID and OrgRef, which can be downloaded for free.

Automatching has many benefits, both in improving the quality of your data and extending your customer list:

  • Removing duplicates – by identifying records that are in fact for the same customer, despite having different names.
  • Standardising data – alternative name forms can be replaced so that the same form is used consistently across all your data.
  • Correcting errors – reports of unmatched records may highlight issues such as incorrect information in the country or state field.
  • Identifying sales leads – matching against a reference dataset with good coverage of your particular sector can highlight institutions that are not yet customers.
  • Pulling back IDs – adding the reference record’s identifier to each matched customer record, to help improve supply chain communications.

In addition to matching institutional customers to a reference record, automatching can also be used to link individuals to their parent organisation. Knowing how individuals relate to their institution, and in what role (author, editor, librarian, etc.), allows for much more targeted sales and marketing communications.

For all these reasons, automatching has always been an important part of MasterVision’s customer-centric approach.

Now, by popular request, we’ve also added it to our brand new DataSalon DQ service.

Matching to Ringgold (licence permitting), GRID and OrgRef is supported ‘out of the box’ by DataSalon DQ, and it couldn’t be simpler: you specify which fields in your data contain the relevant information (org name, country, etc.), add the matched IDs to your dataset, and view full reports of matches, misses and ambiguities.
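To make that workflow concrete, here is a minimal Python sketch of the three steps: mapping fields, adding matched IDs, and reporting matches, misses and ambiguities. It is not DataSalon DQ's actual interface. The reference records, the column names in FIELD_MAP and the functions match_row and run_report are all hypothetical, and the matching is a deliberately simple exact comparison.

```python
from collections import Counter

# Hypothetical reference records, invented purely for illustration.
REFERENCE = [
    {"id": "R1", "name": "Kings College", "country": "GB"},
    {"id": "R2", "name": "Kings College", "country": "GB"},  # distinct org, same name
    {"id": "R3", "name": "Example University", "country": "US"},
]

# Step 1: say which columns of the customer data hold the relevant information.
# These column names are assumptions about the input file.
FIELD_MAP = {"org_name": "Institution", "country": "Country Code"}


def match_row(row):
    """Step 2: return a copy of the row with a matched ID and status added."""
    name = row[FIELD_MAP["org_name"]].strip().lower()
    country = row[FIELD_MAP["country"]].strip().upper()
    hits = [
        ref["id"]
        for ref in REFERENCE
        if ref["name"].lower() == name and ref["country"] == country
    ]
    if len(hits) == 1:
        return {**row, "matched_id": hits[0], "status": "match"}
    # Several candidates means ambiguous; none at all means a miss.
    status = "ambiguous" if hits else "miss"
    return {**row, "matched_id": None, "status": status}


def run_report(rows):
    """Step 3: match every row and count matches, misses and ambiguities."""
    results = [match_row(row) for row in rows]
    summary = Counter(result["status"] for result in results)
    return results, summary


customers = [
    {"Institution": "example university ", "Country Code": "us"},
    {"Institution": "Kings College", "Country Code": "GB"},
    {"Institution": "Nowhere Institute", "Country Code": "FR"},
]
results, summary = run_report(customers)
```

In this toy data the first customer matches a single reference record, the second is ambiguous because two reference records share its name and country, and the third is a miss.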

Of course, automatching isn’t always a straightforward task – with variations in name forms, different data structures across sources, and errors in data entry, it’s more complex than simply checking that a name and location match up. However, our experience in this area has enabled us to develop a set of strategies for improving match rates, including fuzzy searching, maintaining extensive lists of abbreviations and synonyms, and using email and web domains.
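As a rough illustration of how these strategies can combine, the Python sketch below tries an email domain first, then falls back to a fuzzy name comparison after expanding abbreviations. It is an assumption-laden toy, not our production matching logic: the reference records, the abbreviation list, the 0.85 threshold and the automatch function are all hypothetical, and the fuzzy comparison uses the standard library's difflib.SequenceMatcher as a stand-in.

```python
import difflib
import re

# Hypothetical reference records, invented purely for illustration.
REFERENCE = [
    {"id": "R1", "name": "University of Oxford", "country": "GB", "domain": "ox.ac.uk"},
    {"id": "R2", "name": "University of Cambridge", "country": "GB", "domain": "cam.ac.uk"},
    {"id": "R3", "name": "Massachusetts Institute of Technology", "country": "US", "domain": "mit.edu"},
]

# A tiny abbreviation list; a real one would be far more extensive.
ABBREVIATIONS = {"univ": "university", "inst": "institute", "tech": "technology"}


def normalise(name):
    """Lower-case, strip punctuation and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def automatch(name, country, email=None, threshold=0.85):
    """Return (reference_id, method), or (None, 'unmatched')."""
    # Strategy 1: an email domain equal to, or a subdomain of, a known domain.
    if email and "@" in email:
        domain = email.rsplit("@", 1)[1].lower()
        for ref in REFERENCE:
            if domain == ref["domain"] or domain.endswith("." + ref["domain"]):
                return ref["id"], "domain"

    # Strategy 2: fuzzy comparison of normalised names within the same country.
    target = normalise(name)
    best_id, best_score = None, 0.0
    for ref in REFERENCE:
        if ref["country"] != country:
            continue
        score = difflib.SequenceMatcher(None, target, normalise(ref["name"])).ratio()
        if score > best_score:
            best_id, best_score = ref["id"], score
    if best_score >= threshold:
        return best_id, "name"
    return None, "unmatched"
```

With this toy data, automatch("Univ. of Oxford", "GB") resolves to R1 by name once the abbreviation is expanded, while a record carrying only an address such as jane@libraries.mit.edu resolves to R3 by domain, whatever its name field says. The threshold is a judgment call: set it too low and different institutions with similar names will be merged.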

Once you have a clear reference record available, piecing together a customer’s hierarchy (or family tree) becomes a whole lot easier. Customer insight is significantly enhanced by seeing how a departmental library relates to a paying top-level institution, and this picture allows for more detailed market share reporting and sales prospecting.
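For the curious, the roll-up from a departmental library to its top-level institution can be sketched in a few lines of Python. The parent links, IDs and spend figures below are invented for illustration, and top_level and rollup are hypothetical helper names rather than part of any product.

```python
from collections import defaultdict

# Hypothetical parent links between reference records (child -> parent).
# None marks a top-level institution.
PARENTS = {
    "MED_LIB": "MED_SCHOOL",
    "MED_SCHOOL": "UNI",
    "LAW_LIB": "UNI",
    "UNI": None,
    "INDEPENDENT": None,
}


def top_level(org_id, parents):
    """Follow parent links upwards until a record with no parent is reached."""
    seen = set()
    while parents.get(org_id) is not None:
        if org_id in seen:
            raise ValueError(f"cycle in hierarchy at {org_id}")
        seen.add(org_id)
        org_id = parents[org_id]
    return org_id


def rollup(spend_by_org, parents):
    """Total the spend of every record under its top-level institution."""
    totals = defaultdict(float)
    for org_id, amount in spend_by_org.items():
        totals[top_level(org_id, parents)] += amount
    return dict(totals)


# Invented spend figures per matched record.
spend = {"MED_LIB": 1200.0, "LAW_LIB": 800.0, "UNI": 500.0, "INDEPENDENT": 300.0}
totals = rollup(spend, PARENTS)
```

Here the two libraries and the university itself are reported together under UNI, which is the kind of consolidated view that market share reporting relies on.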

If you’d like to know more, and discuss how automatching could add value to your data, do get in touch with us to arrange a demo.