Overlap between different services providing citation data

Hello - I’m looking for an open source of citation-count data, so I’m very excited to use Crossref. Is there data somewhere that shows the number or fraction of publishers/journals/papers in Crossref that include reference data? When I spot-check various DOIs at Crossref, Google Scholar, and Web of Science, I see some variation. I suspect Google Scholar’s counts are inflated since they count citations from things that wouldn’t have DOIs, but what overlap should we expect between Crossref and Google Scholar?

Also, am I right that individual researchers can only get the number of citations a paper receives and not the actual DOIs of the citing papers?

Thanks,
Pat

Hi @pschloss,

Thanks for your message, and welcome to the community forum. Glad you found us and are finding our metadata useful!

First off, we don’t have a formal relationship with Google.

We do find that Crossref DOIs and metadata are indexed by Google’s crawlers and appear in search engine results, but because their approach isn’t public, we unfortunately have no insight into their processes or timelines.

We do know that our REST API is open to all, so it may be part of their process.

As for your question about reference metadata, our REST API is a great place to get this information:

Which members are registering references and how many DOIs have references registered (by member): https://api.crossref.org/works?filter=has-references:true&facet=publisher-name:*&rows=0&mailto=support@crossref.org

Conversely, which members are NOT registering references and how many DOIs have no references registered (by member): https://api.crossref.org/works?filter=has-references:false&facet=publisher-name:*&rows=0&mailto=support@crossref.org
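
If you would rather script these two queries than paste them into a browser, here is a minimal Python sketch. It assumes the public endpoint is https://api.crossref.org/works and that each facet response can be reduced to a plain mapping of member name to DOI count. The function names are made up for illustration and the counts in the usage lines are toy numbers, not real data.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.crossref.org/works"  # assumed public endpoint


def reference_facet_url(has_references: bool, mailto: str) -> str:
    """Build the facet query that counts DOIs per member, with or without references."""
    params = {
        "filter": f"has-references:{str(has_references).lower()}",
        "facet": "publisher-name:*",
        "rows": 0,  # only the counts are needed, not the records themselves
        "mailto": mailto,
    }
    # Keep ':', '*' and '@' unescaped so the URL matches the hand-written form.
    return f"{API_ROOT}?{urlencode(params, safe=':*@')}"


def share_with_references(with_refs: dict, without_refs: dict) -> dict:
    """Per-member fraction of DOIs that have references, from two count mappings."""
    shares = {}
    for member in set(with_refs) | set(without_refs):
        yes = with_refs.get(member, 0)
        no = without_refs.get(member, 0)
        total = yes + no
        shares[member] = yes / total if total else 0.0
    return shares


# Toy counts, purely illustrative.
print(reference_facet_url(True, "support@crossref.org"))
print(share_with_references({"Member A": 3}, {"Member A": 1, "Member B": 5}))
```

Combining the two facet results this way gives a rough per-member fraction of works with references, which is the number the original question asks about.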

Also, am I right that individual researchers can only get the number of citations a paper receives and not the actual DOIs of the citing papers?

Yes, that’s correct. The cited-by counts are available in the REST API. Look for the is-referenced-by-count in the JSON output; that tells you how many other Crossref DOIs have cited the DOI in question. For instance, this DOI: https://api.crossref.org/works/10.1016/j.bioactmat.2020.11.006?mailto=support@crossref.org has been cited by 80 other Crossref DOIs. You’ll also see that DOI 10.1016/j.bioactmat.2020.11.006 has a references-count of 57. That means that this DOI has 57 references in its metadata record.

However, the details of which DOIs are citing the DOI in question are only available to our members, and only for their own DOIs.
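
As a rough illustration of reading those two counts, here is a minimal Python sketch. It assumes the work record sits under a top-level "message" key in the JSON and that the public endpoint is https://api.crossref.org/works. The helper names are hypothetical, and the sample dictionary simply repeats the figures quoted above, which will drift as new citations are registered.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def work_url(doi: str, mailto: str) -> str:
    """URL of a single work record (assumed endpoint: api.crossref.org)."""
    return (
        f"https://api.crossref.org/works/{quote(doi, safe='/')}"
        f"?{urlencode({'mailto': mailto}, safe='@')}"
    )


def fetch_work(doi: str, mailto: str) -> dict:
    """Download and parse one work record. Needs network access."""
    with urllib.request.urlopen(work_url(doi, mailto)) as response:
        return json.load(response)


def citation_counts(work_response: dict) -> tuple:
    """Return (times cited by other Crossref DOIs, number of references)."""
    message = work_response["message"]
    return message["is-referenced-by-count"], message["references-count"]


# Sample payload trimmed to the two fields discussed above.
sample = {
    "status": "ok",
    "message": {
        "DOI": "10.1016/j.bioactmat.2020.11.006",
        "is-referenced-by-count": 80,
        "references-count": 57,
    },
}
print(citation_counts(sample))  # (80, 57)
```

For live data, `citation_counts(fetch_work(doi, mailto))` should do the same thing against the API itself.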

Please let me know if you have any additional questions,
Isaac

Thanks, this is very helpful!


The is-referenced-by-count is very useful, but sometimes I need more flexibility from the REST API.

Are there any plans to make the cited-by DOIs available from the Crossref REST API to non-members, i.e. the general public, like myself? If not, why not? The data is available in Crossref.

  • It would allow an author to track their citations over time, both overall and for a specific article (DOI). Currently that is only possible in, e.g., Clarivate’s Web of Science (not open).
  • Following citations to a particular article DOI would allow a (bibliometric) researcher to spot, e.g., potential trends per journal, research area, affiliation, etc.

Cheers,

Dave

This is really interesting information. I had suspected that many Crossref members were not providing references in their registration data, and this confirms it. Some of these publishers are quite professional (e.g., Oxford University Press, Elsevier, Springer, ACS, AMS). I can only presume that their workflow does not uniformly make it easy to capture this information. In our case we recently built a pipeline based on the assumption that authors write in LaTeX. This makes it really easy to extract bibliographic references in a structured format, and I’m surprised that other STEM publishers have not also done this. Even if they use a workflow based on something like Microsoft Word => PDF, there are still tools like CERMINE that can be used to extract bibliographic references. Such tools are only approximate, however, which is perhaps why publishers don’t use them. It’s unfortunate because this sounds like another case of “perfect is the enemy of good”. As it turns out, the docx format also has a semi-structured representation for bibliographic references, but I’m not sure whether popular author tools support it.

Another problem I was reminded of with the cited-by service is that Crossref is not the only DOI issuer, and apparently the cited-by service only has coverage for DOIs that they issue. As an example, the European Association for Theoretical Computer Science (EATCS) uses the prefix 10.4230 that is serviced by DataCite. I presume that this means that Crossref would not see references to Crossref DOIs from DataCite DOIs. I was unable to find any API from DataCite that is comparable to the cited-by service of Crossref.

Crossref is doing a great job of advancing the POSI (Principles of Open Scholarly Infrastructure) but we have a long way to go. There will probably always be publications outside of the DOI world (e.g., USENIX, ARKs, stable URLs, etc.), and it would probably help if we could standardize a way to embed XMP metadata for bibliographic references into the PDF. That’s probably outside the scope of Crossref’s mission, but anything we can do to make bibliographic information more widely available will benefit scholarly publishing.

Thank you @mccurley. Let me just interject here that all of the publishers you mentioned are providing references for some of their works (as you can see from the links to their Participation Reports below).

All of those above have registered more than half of their current content with references and most of them are near or above 90% of works with references registered with Crossref.

I can only presume that their workflow does not uniformly make it easy to capture this information.

Yes, that is right.

Another problem I was reminded of with the cited-by service is that Crossref is not the only DOI issuer, and apparently the cited-by service only has coverage for DOIs that they issue. As an example, the European Association for Theoretical Computer Science (EATCS) uses the prefix 10.4230 that is serviced by DataCite. I presume that this means that Crossref would not see references to Crossref DOIs from DataCite DOIs.

Yes, also correct. Since the reference metadata for a DataCite DOI is registered with DataCite, we won’t match to that DOI. It’s simply not in our corpus.
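
If you want to check which registration agency a DOI belongs to before expecting cited-by coverage for it, here is a minimal Python sketch. It assumes the REST API's /works/{doi}/agency route and a response shaped like {"message": {"agency": {"id": ...}}}; treat that shape as an assumption to verify against a live response. The helper names and the DOI in the sample payload are hypothetical placeholders.

```python
from urllib.parse import quote


def agency_url(doi: str) -> str:
    """URL of the agency lookup for one DOI (assumed route: /works/{doi}/agency)."""
    return f"https://api.crossref.org/works/{quote(doi, safe='/')}/agency"


def registered_with_crossref(agency_response: dict) -> bool:
    """True when the parsed agency lookup reports Crossref as the registration agency."""
    agency_id = agency_response["message"]["agency"]["id"]
    return agency_id.lower() == "crossref"


# Hypothetical payload for a DOI under the 10.4230 prefix; the DOI is a placeholder.
sample = {
    "status": "ok",
    "message": {
        "DOI": "10.4230/EXAMPLE",
        "agency": {"id": "datacite", "label": "DataCite"},
    },
}
print(registered_with_crossref(sample))  # False
```

A DOI that comes back as not registered with Crossref is one whose reference list would not feed into the cited-by matching described above.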

That’s probably outside the scope of Crossref’s mission, but anything we can do to make bibliographic information more widely available will benefit scholarly publishing.

It is something we’re asked about and have certainly talked about internally, but it wouldn’t be an easy addition. If it’s good for the scholarly community, I wouldn’t classify it as outside our mission. We are focused on enabling connections between research outputs throughout the ecosystem. I don’t think we want to limit that to only the Crossref ecosystem (outputs registered with us), but there are practical hurdles we haven’t solved yet (i.e., partnering with a number of other registration agencies).

-Isaac