A key part of the service we offer (through MasterVision) is helping publishers identify different versions of the same customer across various databases, and then joining those together into a single, unique record. This could mean untangling the multiplicity of ways organization names have been recorded in different source systems, or linking individuals to their author records and article submissions.
Varying levels of detail captured for each customer, combined with problems such as data entry errors, mean this can be far from straightforward, so we’ve developed a number of ways of identifying and confirming possible matches. In the case of authors and article submissions, we’re helped by widely accepted standard identifiers: DOIs for articles, ISSNs for journals, and ORCID iDs for individuals.
But as there is not yet a free and open industry standard for identifying organizations, a single solution for them is less clear-cut.
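To illustrate why organization names are hard to match without a shared identifier, here is a minimal sketch of one common approach: reducing each name to a normalized key and grouping names that share a key. This is an illustration only, not a description of how MasterVision works; the function names, the abbreviation table and the stopword list are all assumptions made for the example.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical lookup tables for this example; real ones would be far larger.
ABBREVIATIONS = {"univ": "university", "inst": "institute", "dept": "department"}
STOPWORDS = {"the", "of", "at", "for", "and"}


def match_key(name: str) -> str:
    """Reduce an organization name to a crude, order-independent key."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace("&", " and ")
    # Keep alphanumeric tokens only, which also discards punctuation.
    tokens = re.findall(r"[a-z0-9]+", text)
    tokens = [ABBREVIATIONS.get(token, token) for token in tokens]
    tokens = [token for token in tokens if token not in STOPWORDS]
    # Sorting makes the key independent of word order.
    return " ".join(sorted(tokens))


def group_variants(names):
    """Group raw names by key; each group is a set of candidate matches."""
    groups = defaultdict(list)
    for name in names:
        groups[match_key(name)].append(name)
    return dict(groups)


variants = group_variants(
    ["Univ. of Oxford", "The University of Oxford", "Oxford University"]
)
```

A key like this only suggests candidates. Ignoring word order would also group "University of York" with "York University", which are different institutions, so any candidate match still needs confirming against other evidence. That is exactly the gap a shared organization identifier would close.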
The issues
There are a number of different services currently available for identifying organizations, but each has its shortcomings, and none completely addresses all the challenges:
Limited scope: Many of the services available are focused on specific areas and often don’t provide the coverage needed by publishers. For example, the Crossref Funder Registry concentrates specifically on funding sources.
Commercial cost: In order for a reference data set to be widely adopted as an industry standard, it should ultimately be free, open and accessible to all, rather than a commercial service available only to those who can afford it.
Inaccessible data: Restrictions may apply to obtaining data for integration – for example, one currently needs to be a member of ISNI in order to access its data in machine-readable format.
Governance: OrgRef and GRID both provide free and open reference datasets, but they are run by private companies, which may not have the transparent governance needed to be truly independent.
Working party
Crossref, DataCite and ORCID are collaborating to explore the current landscape and investigate ways of solving this issue, and have released three papers for community review and feedback:
- ‘Organization Identifier Project: A Way Forward’ discusses the need for an independent, open, non-profit organization identifier registry.
- ‘Organization Identifier Provider Landscape’ surveys the key current providers.
- ‘Technical Considerations for an Organization Identifier Registry’ outlines technical use cases and best practice.
In summary, the working party proposes that organization identifiers should be:
- Unique (across space and time)
- Persistent
- Reliable
- Easy to find and use
- Free and open
- Mappable to other schemes
Several key challenges are also noted:
- How to include alternate forms of a name?
- How to express hierarchies, and what level(s) within the hierarchy should it be possible to link with?
- How to deal with name changes and mergers?
- How to handle affiliations between institutions?
- How to validate the information, and what mechanisms are provided for correcting it?
These questions are by no means easy to answer, and here at DataSalon we’ll be watching the output of the working party with interest.
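To make the requirements and challenges above more concrete, here is a minimal sketch of what a registry record might need to hold. It is not taken from the working party's papers: the field names, the `org:` identifier format and the example organizations are all hypothetical, chosen only to show where alternate names, hierarchies, mergers and mappings to other schemes could live.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OrgRecord:
    org_id: str                           # unique, persistent ID (hypothetical format)
    name: str                             # preferred name
    alt_names: List[str] = field(default_factory=list)  # alternate forms of the name
    parent_id: Optional[str] = None       # position in a hierarchy
    superseded_by: Optional[str] = None   # set after a merger or replacement
    external_ids: Dict[str, str] = field(default_factory=dict)  # mappings to other schemes


def resolve_current(registry: Dict[str, OrgRecord], org_id: str) -> OrgRecord:
    """Follow superseded_by links so that old IDs keep resolving."""
    seen = set()
    record = registry[org_id]
    while record.superseded_by is not None:
        if record.org_id in seen:
            raise ValueError(f"cycle in superseded_by chain at {record.org_id}")
        seen.add(record.org_id)
        record = registry[record.superseded_by]
    return record


def ancestors(registry: Dict[str, OrgRecord], org_id: str) -> List[OrgRecord]:
    """Return the chain of parents, nearest first."""
    chain = []
    seen = {org_id}
    parent_id = registry[org_id].parent_id
    while parent_id is not None:
        if parent_id in seen:
            raise ValueError(f"cycle in hierarchy at {parent_id}")
        seen.add(parent_id)
        chain.append(registry[parent_id])
        parent_id = registry[parent_id].parent_id
    return chain


# Made-up records, for illustration only.
registry = {
    record.org_id: record
    for record in [
        OrgRecord("org:1", "Example University",
                  alt_names=["Example Univ."],
                  external_ids={"other-scheme": "X123"}),
        OrgRecord("org:2", "Department of Chemistry", parent_id="org:1"),
        OrgRecord("org:3", "Example Polytechnic", superseded_by="org:1"),
    ]
}
```

In this sketch a merged organization keeps its old identifier, which simply points at its successor, so the ID stays persistent while still resolving to a current record. Whether a real registry would model mergers, hierarchies and affiliations this way is one of the open questions listed above.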