Where do formatted references come from?
December 12, 2012
It’s an oddity to me that when publishers try to justify their existence with long lists of the valuable services they provide, they usually skip lightly over one of the few really big ones. For example, Kent Anderson’s exhausting 60-element list omitted it, and it had to be pointed out in a comment by Carol Anne Meyer:
One to add: Enhanced content linking, including CrossREF DOI reference linking, author name linking cited-by linking, related content linking, updates and corrections linking.
(Anderson’s list sidles up to this issue in his #28, “XML generation and DTD migration” and #29, “Tagging”, but doesn’t come right out and say it.)
Although there are a few journals whose PDFs just contain references formatted as in the manuscript — as we did for our arXiv PDF — nearly all mainstream publishers go through a more elaborate process that yields more information and enables the linking that Meyer is talking about. (This is true of the new kids on the block as well as the legacy publishers.)
The reference-formatting pipeline
When I submit a manuscript with formatted reference like:
Taylor, M.P., Hone, D.W.E., Wedel, M.J. and Naish, D. 2011. The long necks of sauropods did not evolve primarily through sexual selection. Journal of Zoology 285(2):150–161. doi:10.1111/j.1469-7998.2011.00824.x
(as indeed I did in that arXiv paper), the publisher will take that reference and break it down into structured data describing the specific paper I was referring to. It does this for various reasons: among them, it needs to provide this information for services like the Web Of Knowledge.
Once it has this structured representation of the reference, the publication process plays it out in whatever format the journal prefers: for example, had our paper appeared in JVP, Taylor and Francis’s publication pipeline would have rendered it:
Taylor, M. P., D. W. E. Hone, M. J. Wedel, and D. Naish. 2011. The long necks of sauropods did not evolve primarily through sexual selection. Journal of Zoology 285:150–161.
(With spaces between multiple initials, initials preceding surnames for all authors except the first, an “Oxford comma” before the last author, no italics for the journal name, no bold for the volume number, the issue number omitted altogether, and the DOI inexplicably removed.)
What’s needed in a submitted reference
Here’s the key point: so long as all the relevant information is included in some format (authors, year, article title, journal title, volume, page-range), it makes no difference how it’s formatted. Because the publication process involves breaking the reference down into its component fields, thus losing all the formatting, before reassembling it in the preferred format.
And this leads us the key question: why do journals insist that authors format their references in journal style at all? All the work that authors do to achieve this is thrown away anyway, when the reference is broken down into fields, so why do it?
And the answer of course is “there is no good reason”. Which is why several journals, including PeerJ, eLife, PLOS ONE and certain Elsevier journals have abandoned the requirement completely. (At the other end of the scale, JVP has been known to reject papers without review for such offences as using the wrong kind of dash in a page-range.)
Like so much of how we do things in scholarly publishing, requiring journal-style formatting at the submission stage is a relic of how things used to be done and makes no sense whatsoever in 2012. Before we had citation databases, the publication pipeline was much more straight-through, and the author’s references could be used “as is” in the final publication. Not any more.
How far can we go?
All of this leads me to wonder how far we can go in cutting down the author burden of referencing. Do we actually need to give all the author/title/etc. information for each reference?
In the case of references that have a DOI, I think not (though I’ve not yet discussed this with any publishers). I think that it suffices to give only the DOI. Because once you have a DOI, you can look up all the reference data. Go try it yourself: go to http://www.crossref.org/guestquery/ and paste my DOI “10.1111/j.1469-7998.2011.00824.x” into the DOI Query box at the bottom of the page. Select the “unixref” radio button and hit the Search button. Scroll down to the bottom of the results page, and voila! — an XML document containing everything you could wish to know about the referenced paper.
And the data in that structured document is of course what the publication process uses to render out the reference in the journal’s preferred style.
Am I missing something? Or is this really all we need?