Towards the long-overdue open graph of citations

April 7, 2017

It’s baffled me for years that there is no open graph of scholarly citations — a set of machine-readable statements that (for example) Taylor et al. 2009 cites Stevens and Parrish 1999, which cites Alexander 1985 and Hatcher 1901.

With such a graph, you would be able to answer questions like “what subsequent publications have cited my 2005 paper but not my 2007 paper?” and, of course, “Has paper X been rebutted in print, or do I need to do it?”
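To make that concrete, here is a toy sketch in Python of what such a graph amounts to: a set of machine-readable (citing, cited) statements that simple questions can be run against. The short keys stand in for real identifiers such as DOIs, and the “Me2005”/“Me2007” entries are hypothetical.

    # A citation graph, reduced to its essentials: a set of
    # (citing, cited) statements. Short keys stand in for DOIs;
    # the "Me..." and "Someone..." entries are hypothetical.
    citations = {
        ("Taylor2009", "StevensParrish1999"),
        ("StevensParrish1999", "Alexander1985"),
        ("StevensParrish1999", "Hatcher1901"),
        ("SomeoneElse2010", "Me2005"),
        ("SomeoneElse2010", "Me2007"),
        ("AnotherAuthor2012", "Me2005"),
    }

    def citing_papers(paper):
        """Every paper in the graph that cites the given paper."""
        return {citing for citing, cited in citations if cited == paper}

    # "What subsequent publications have cited my 2005 paper
    # but not my 2007 paper?"
    print(citing_papers("Me2005") - citing_papers("Me2007"))
    # -> {'AnotherAuthor2012'}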

At a more basic level, it’s ridiculous that every one of us maintains our own citation database for our own work. It makes no sense that there isn’t a single, global, universally accessible citation database which all of us can draw from for our bibliographies.

Today we welcome the Initiative for Open Citations (I4OC), which is going to fix that. I’m delighted that someone is stepping up to the plate. It’s been a critical missing piece of scholarly infrastructure.

As far as I can see, I4OC is starting out by encouraging publishers to sign up for CrossRef’s existing Cited-by service. This is a great way to capture citation information going forward; but I hope they also have plans for back-filling the last few centuries’ citations. There are a lot of ways this could be done, but one would be crowdsourcing contributions. They have good people involved, so I’m optimistic that they’ll get on this.
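Where publishers do already deposit their reference lists, some of this data is queryable today through CrossRef’s public REST API. Here is a minimal sketch: the api.crossref.org endpoint and the “reference” field are CrossRef’s, the example DOI is just illustrative, and many works will come back empty until publishers open their reference data (which is exactly what I4OC is asking for).

    # Fetch the deposited reference list for a DOI from CrossRef's
    # public REST API. The "reference" field is present only where
    # the publisher has deposited its citation data.
    import json
    import urllib.parse
    import urllib.request

    def crossref_references(doi):
        url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
        with urllib.request.urlopen(url) as response:
            return json.load(response)["message"].get("reference", [])

    # Each reference may carry a DOI, or only unstructured text.
    for ref in crossref_references("10.7717/peerj.36"):
        print(ref.get("DOI") or ref.get("unstructured", "?"))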

By the way, this kind of thing — machine-readable data — is one area where preprints genuinely lose out compared to publisher-mediated versions of articles. Publishers on the whole don’t do nearly enough to earn their very high fees, but one very real contribution they do make is the process that is still, for historical reasons, known as “typesetting” — transforming a human-readable manuscript into a machine-readable one from which useful data can be extracted. I wonder whether preprint repositories of the future will find ways to match this function.
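To illustrate what “machine-readable” buys you: once a reference is marked up in the JATS XML that publisher typesetting typically produces, extracting the cited DOI is trivial. The snippet below is hand-made for illustration, not taken from any real article.

    # Pull the cited DOI out of a JATS-style reference element.
    import xml.etree.ElementTree as ET

    jats_ref = """
    <ref id="B1">
      <element-citation publication-type="journal">
        <article-title>An example article</article-title>
        <source>An Example Journal</source>
        <year>1999</year>
        <pub-id pub-type="doi">10.1234/example.doi</pub-id>
      </element-citation>
    </ref>
    """

    ref = ET.fromstring(jats_ref)
    print(ref.findtext(".//pub-id[@pub-type='doi']"))  # 10.1234/example.doi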

6 Responses to “Towards the long-overdue open graph of citations”

  1. Stuart Taylor Says:

    Mike – just re your final sentence: the “Central Service” project from ASAPBio intends to do just that, i.e., turn the predominantly-PDF corpus of preprints into XML.

    http://asapbio.org/rfa

  2. Mike Taylor Says:

    That is a great aspiration on ASAPBio’s part, but it strikes me as quite orthogonal to the primary goal of their Central Services. (“We do the work; you do the pleasure … pleasure”)

  3. Andrew Stuck Says:

    The “Buried Treasure” series Dave Hone is doing on his blog really highlights the downside to the current state of affairs. The majority of guest papers he’s featured so far aren’t insignificant, and really should get more traction. I assume relevant researchers who could’ve cited said papers simply didn’t cast a wide enough net to find these important studies.

  4. Fair Miles Says:

    As was discussed in the comments under your post at https://svpow.com/2015/06/11/how-much-does-typesetting-cost/, it would be much easier if the document were already structured (e.g., in JATS XML) when submitted as a preprint. The preprint service/server could provide specific software (desktop or web-based) to do this before or during submission. I guess that in the future such tools will usefully replace our writing tools (e.g., Overleaf, https://www.overleaf.com), but in the meantime they can help with the transformation from text or HTML (e.g., Marcalyc, http://marcalyc.redalyc.org/ayuda/).
    The markup could be done by the authors themselves, or by their institutions (e.g., libraries) as a service. A big database would be needed to identify and mark up the references (e.g., by DOI); see the sketch at the end of this comment.
    You then link preprint/institutional servers with open peer-review services operating on machine-readable (and machine-printable) documents, and… journals? What for?
    [Though I imagine they’ll still be of great service if they transform from pre-communication filters (validation at the entry point) into post-communication filters (picking up, commenting on, enriching, and connecting highlights, in the “Faculty of 1000” style).]
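    [A rough sketch of that reference-identifying step, using CrossRef’s bibliographic search; the query.bibliographic parameter is CrossRef’s, though the top hit would still need a human check:]

        # Guess the DOI for a free-text reference via CrossRef's
        # bibliographic search. The best match can still be wrong,
        # so treat the result as a suggestion, not an answer.
        import json
        import urllib.parse
        import urllib.request

        def guess_doi(reference_text):
            url = ("https://api.crossref.org/works?rows=1"
                   "&query.bibliographic="
                   + urllib.parse.quote(reference_text))
            with urllib.request.urlopen(url) as response:
                items = json.load(response)["message"]["items"]
            return items[0]["DOI"] if items else None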

  5. Mike Taylor Says:

    Of course, if preprints are submitted as JATS XML, everything can work nicely. But I have certainly never submitted a preprint in that format, and I doubt that anyone in palaeo ever has. Have you?

  6. Fair Miles Says:

    No, in fact I have never submitted a preprint at all… But if, in order to produce a manuscript, I first learnt to use a Commodore 64, then a PC with DOS, then Windows, then a word processor, then a WYSIWYG word processor, and then everything all over again in open-source software (on Linux), I guess I can learn any new software that produces manuscripts from scratch or from my preferred (usual) format!
    Anyway, I do have some experience with markup for producing machine-readable documents. When I did it the process was rather cumbersome (macros in Word97 for the SciELO format), but it can certainly be made much more user-friendly (as Redalyc is doing nowadays with Marcalyc to support its publications). So JATS XML, or anything like it, is just as available as open repositories, publication software, and open review on validation/rewards platforms (think of a mix of StackExchange and ResearchGate). With some perspective (I already admitted to starting on a C64!) you can see it all moving pretty fast. If only the scientific and publishing system were really interested…

