Short post today. Go and read this paper: Academic urban legends (Rekdal 2014). It’s open access, and an easy and fascinating read. It unfolds a tale of good intentions gone wrong, a chain of failure that illustrates a single crucial point of academic behaviour: read what you cite.

References

Rekdal, Ole Bjørn. 2014. Academic urban legends. Social Studies of Science 44(4):638-654. doi: 10.1177/0306312714535679

 

The LSE Impact blog has a new post, Berlin 11 satellite conference encourages students and early stage researchers to influence shift towards Open Access. Thinking about this,  Jon Tennant (@Protohedgehog) just tweeted this important idea:

Would be nice to see a breakdown of OA vs non-OA publications based on career-stage of first author. Might be a wake-up call.

It would be very useful. It makes me think of Zen Faulkes’s important 2011 blog-post, What have you done lately that needed tenure?. We should be seeing the big push towards open access coming from senior academics, who are established in their roles and don’t need to scrabble around for jobs like early-career researchers. Yet my impression is that in fact early-career researchers are doing a lot of the pro-open heavy lifting.

Is that impression true?

We should find out.

Here’s one possible experimental design: take a random sample of 100 Ph.D students, 100 post-docs, 100 early-career researchers in tenure-track jobs and 100 tenured researchers. For each of them, analyse their last ten years of publications and determine what proportion are paywalled, what proportion are free to read (e.g. on arXiv or in an all-rights-reserved IR), and what proportion are true (BOAI-compliant) open access.
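The tallying step of that design is simple enough to sketch. What follows is a minimal illustration, not a worked-out method: the function name `access_breakdown`, the stage labels and the three access categories are all my own assumptions, and the hard part (deciding each paper's access status and each author's career stage) is assumed to have been done already.

```python
from collections import Counter, defaultdict

# Hypothetical category labels for the three kinds of access described above.
ACCESS_TYPES = ("paywalled", "free_to_read", "open_access")


def access_breakdown(records):
    """Return, per career stage, the proportion of papers in each access type.

    `records` is an iterable of (career_stage, access_type) pairs,
    one pair per publication.
    """
    counts = defaultdict(Counter)
    for stage, access in records:
        if access not in ACCESS_TYPES:
            raise ValueError(f"unknown access type: {access!r}")
        counts[stage][access] += 1

    breakdown = {}
    for stage, counter in counts.items():
        total = sum(counter.values())
        breakdown[stage] = {a: counter[a] / total for a in ACCESS_TYPES}
    return breakdown


# Made-up toy data, purely to show the shape of the input and output.
sample = [
    ("phd", "open_access"),
    ("phd", "paywalled"),
    ("tenured", "paywalled"),
    ("tenured", "paywalled"),
]
result = access_breakdown(sample)
```

With real data, comparing `result["phd"]["open_access"]` against `result["tenured"]["open_access"]` would be the headline number.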

An alternative approach would be to randomly sample 1000 open-access papers (from PLOS and BMC journals, for example), and 1000 paywalled papers (from Elsevier and Springer, say) and find the career-stage of their authors. I’m not sure which approach would be better.

Who is going to do this?

I think it would be a nice, tractable first project for someone who wants to get into academic research but hasn’t previously published. It would be hugely useful, and I’m guessing widely cited. Does anyone fancy it?

Update

Georg Walther has started a hackpad about this nascent project. Since Jon “Protohedgehog” Tennant has now tweeted about it, I assume it’s OK to publicise. If you’re interested, feel free to leap in!

I was astonished yesterday to read Understanding and addressing research misconduct, written by Linda Lavelle, Elsevier’s General Counsel, and apparently a specialist in publication ethics:

While uncredited text constitutes copyright infringement (plagiarism) in most cases, it is not copyright infringement to use the ideas of another. The amount of text that constitutes plagiarism versus ‘fair use’ is also uncertain — under the copyright law, this is a multi-prong test.

So here (right in the first paragraph of Lavelle’s article) we see copyright infringement equated with plagiarism. And then, for good measure, the confusion is hammered home by depicting fair use (a defence against accusations of copyright violation) as a defence against accusations of plagiarism.

This is flatly wrong. Plagiarism and copyright violation are not the same thing. Not even close.

First, plagiarism is a violation of academic norms but not illegal; copyright violation is illegal, but in truth pretty ubiquitous in academia. (Where did you get that PDF?)

Second, plagiarism is an offence against the author, while copyright violation is an offence against the copyright holder. In traditional academic publishing, they are usually not the same person, due to the ubiquity of copyright transfer agreements (CTAs).

Third, plagiarism applies when ideas are copied, whereas copyright violation occurs only when a specific fixed expression (e.g. sequence of words) is copied.

Fourth, avoiding plagiarism is about properly apportioning intellectual credit, whereas copyright is about maintaining revenue streams.

Let’s consider four cases (with good outcomes in green and bad ones in red):

  1. I copy big chunks of Jeff Wilson’s (2002) sauropod phylogeny paper (which is copyright the Linnean Society of London) and paste it into my own new paper without attribution. This is both plagiarism against Wilson and copyright violation against the Linnean Society.
  2. I copy big chunks of Wilson’s paper and paste it into mine, attributing it to him. This is not plagiarism, but copyright violation against the Linnean Society.
  3. I copy big chunks of Riggs’s (1904) Brachiosaurus monograph (which is out of copyright and in the public domain) into my own new paper without attribution. This is plagiarism against Riggs, but not copyright violation.
  4. I copy big chunks of Riggs’s paper and paste it into mine with attribution. This is neither plagiarism nor copyright violation.

Plagiarism is about the failure to properly attribute the authorship of copied material (whether copies of ideas or of text or images). Copyright violation is about failure to pay for the use of the material.
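The four cases form a two-by-two table, which can be written down as a tiny function. This is only a sketch of the distinction as drawn in this post: the name `classify_copying` is made up, and it assumes the copying is of large chunks of text, done without permission or payment, so that neither a licence nor fair use comes into it.

```python
def classify_copying(attributed: bool, in_copyright: bool) -> dict:
    """Classify an act of copying big chunks of someone else's text.

    Assumption: the copying is unlicensed and too extensive for fair use,
    so the only things that vary are attribution and copyright status.
    """
    return {
        # Plagiarism depends only on whether the author is credited.
        "plagiarism": not attributed,
        # Copyright violation depends only on whether the work is in copyright.
        "copyright_violation": in_copyright,
    }


# The four cases from the list above, in order.
cases = [
    classify_copying(attributed=False, in_copyright=True),   # Wilson, no credit
    classify_copying(attributed=True, in_copyright=True),    # Wilson, credited
    classify_copying(attributed=False, in_copyright=False),  # Riggs, no credit
    classify_copying(attributed=True, in_copyright=False),   # Riggs, credited
]
```

The two outputs depend on different inputs, which is the whole point: the offences are independent of each other.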

Which of the two issues you care more about will depend on whether you’re in a situation where intellectual credit or money is more important — in other words, whether you’re an author or a copyright holder. For this reason, researchers tend to care deeply when someone plagiarises their work but to be perfectly happy for people to violate copyright by distributing copies of their papers. Whereas publishers, who have no authorship contribution to defend, care deeply about copyright violation.

One of the great things about the Creative Commons Attribution Licence (CC By) is that it effectively makes plagiarism illegal. It requires that attribution be maintained as a condition of the licence; so if attribution is absent, the licence does not pertain; which means the plagiariser’s use of the work is not covered by it. And that means it’s copyright violation. It’s a neat bit of legal ju-jitsu.

References

  • Riggs, Elmer S. 1904. Structure and relationships of opisthocoelian dinosaurs. Part II, the Brachiosauridae. Field Columbian Museum, Geological Series 2:229-247, plus plates LXXI-LXXV.
  • Wilson, Jeffrey A. 2002. Sauropod dinosaur phylogeny: critique and cladistic analysis. Zoological Journal of the Linnean Society 136:217-276.

Schachner et al. (2013: Figure 13): Diagrammatic representations of the crocodilian (A) and avian (B) lungs in left lateral view with colors identifying proposed homologous characters within the bronchial tree and air sac system of both groups. The image of the bird is modified from Duncker (1971). Abbreviations: AAS, abdominal air sac; CAS, cervical air sac; CRTS, cranial thoracic air sac; CSS, caudal sac-like structure; CTS, caudal thoracic air sac; d, dorsobronchi; GL, gas-exchanging lung; HS, horizontal septum; IAS, interclavicular air sac; L, laterobronchi; NGL, non-gas-exchanging lung; ObS, oblique septum; P, parabronchi; Pb, primary bronchus; Tr, trachea; v, ventrobronchi.

Gah! No time, no time. I am overdue on some things, so this is a short pointer post, not the thorough breakdown this paper deserves. The short, short version: Schachner et al. (2013) is out in PeerJ, describing airflow in the lungs of Nile crocs, and showing how surprisingly birdlike croc lungs actually are. If you’re reading this, you’re probably aware of the papers by Colleen Farmer and Kent Sanders a couple of years ago describing unidirectional airflow in alligator lungs. Hang on to your hat, because this new work is even more surprising.

I care about this not only because dinosaurian respiration is near and dear to my heart but also because I was a reviewer on this paper, and I am extremely happy to say that Schachner et al. elected to publish the review history alongside the finished paper. I am also pleasantly surprised, because as you’ll see when you read the reviews and responses, the process was a little…tense. But it all worked out well in the end, with a beautiful, solid paper by Schachner et al., and a totally transparent review process available for the world to see. Kudos to Emma, John, and Colleen on a fantastic, important paper, and for opting for maximal transparency in publishing!

UPDATE the next morning: Today’s PeerJ Blog post is an interview with lead author Emma Schachner, where it emerges that open review was one of the major selling points of PeerJ for her:

Once I was made aware of the transparent peer review process, along with the fact that the journal is both open access and very inexpensive to publish in, I was completely sold. [...] The review process was fantastic. It was transparent and fast. The open review system allowed for direct communication between the authors and reviewers, generating a more refined final manuscript. I think that having open reviews is a great first step towards fixing the peer review system.

That post also links to this one, so now the link cycle is complete.

Reference

Schachner, E.R., Hutchinson, J.R., and Farmer, C.G. 2013. Pulmonary anatomy in the Nile crocodile and the evolution of unidirectional airflow in Archosauria. PeerJ 1:e60 http://dx.doi.org/10.7717/peerj.60

It’s an oddity to me that when publishers try to justify their existence with long lists of the valuable services they provide, they usually skip lightly over one of the few really big ones. For example, Kent Anderson’s exhausting 60-element list omitted it, and it had to be pointed out in a comment by Carol Anne Meyer:

One to add: Enhanced content linking, including CrossREF DOI reference linking, author name linking cited-by linking, related content linking, updates and corrections linking.

(Anderson’s list sidles up to this issue in his #28, “XML generation and DTD migration” and #29, “Tagging”, but doesn’t come right out and say it.)

Although there are a few journals whose PDFs just contain references formatted as in the manuscript — as we did for our arXiv PDF — nearly all mainstream publishers go through a more elaborate process that yields more information and enables the linking that Meyer is talking about. (This is true of the new kids on the block as well as the legacy publishers.)

The reference-formatting pipeline

When I submit a manuscript with a formatted reference like:

Taylor, M.P., Hone, D.W.E., Wedel, M.J. and Naish, D. 2011. The long necks of sauropods did not evolve primarily through sexual selection. Journal of Zoology 285(2):150–161. doi:10.1111/j.1469-7998.2011.00824.x

(as indeed I did in that arXiv paper), the publisher will take that reference and break it down into structured data describing the specific paper I was referring to. It does this for various reasons: among them, it needs to provide this information for services like the Web Of Knowledge.

Once it has this structured representation of the reference, the publication process plays it out in whatever format the journal prefers: for example, had our paper appeared in JVP, Taylor and Francis’s publication pipeline would have rendered it:

Taylor, M. P., D. W. E. Hone, M. J. Wedel, and D. Naish. 2011. The long necks of sauropods did not evolve primarily through sexual selection. Journal of Zoology 285:150–161.

(With spaces between multiple initials, initials preceding surnames for all authors except the first, an “Oxford comma” before the last author, no italics for the journal name, no bold for the volume number, the issue number omitted altogether, and the DOI inexplicably removed.)
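To make the pipeline concrete, here is a minimal sketch of the second half of it: one structured record, played out in two different styles. It is emphatically not any publisher's actual code. The `Reference` class and both rendering functions are names I have invented, and the "JVP-like" rules are inferred only from the single example above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reference:
    """Structured form of a journal-article reference (hypothetical schema)."""
    authors: List[Tuple[str, str]]  # (surname, initials), e.g. ("Hone", "DWE")
    year: int
    title: str
    journal: str
    volume: str
    pages: str
    issue: Optional[str] = None
    doi: Optional[str] = None


def join_names(names: List[str], last_separator: str) -> str:
    """Join names with commas, using `last_separator` before the final one."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + last_separator + names[-1]


def render_as_submitted(ref: Reference) -> str:
    """Render in the style of the manuscript reference shown earlier."""
    names = [f"{s}, {'.'.join(i)}." for s, i in ref.authors]
    text = f"{join_names(names, ' and ')} {ref.year}. {ref.title}. {ref.journal} {ref.volume}"
    if ref.issue:
        text += f"({ref.issue})"
    text += f":{ref.pages}."
    if ref.doi:
        text += f" doi:{ref.doi}"
    return text


def render_jvp_like(ref: Reference) -> str:
    """Render in a JVP-like style: spaced initials, Oxford comma, no issue, no DOI."""
    def spaced(initials: str) -> str:
        return " ".join(f"{letter}." for letter in initials)

    surname, initials = ref.authors[0]
    names = [f"{surname}, {spaced(initials)}"]               # first author: surname first
    names += [f"{spaced(i)} {s}" for s, i in ref.authors[1:]]  # others: initials first
    authors = join_names(names, ", and ").rstrip(".")
    return f"{authors}. {ref.year}. {ref.title}. {ref.journal} {ref.volume}:{ref.pages}."


ref = Reference(
    authors=[("Taylor", "MP"), ("Hone", "DWE"), ("Wedel", "MJ"), ("Naish", "D")],
    year=2011,
    title="The long necks of sauropods did not evolve primarily through sexual selection",
    journal="Journal of Zoology",
    volume="285",
    pages="150–161",
    issue="2",
    doi="10.1111/j.1469-7998.2011.00824.x",
)
```

Calling `render_as_submitted(ref)` and `render_jvp_like(ref)` reproduces the two reference strings quoted above from the same underlying data. Nothing about the formatting of the input survives into the output.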

What’s needed in a submitted reference

Here’s the key point: so long as all the relevant information is included in some format (authors, year, article title, journal title, volume, page-range), it makes no difference how it’s formatted. Because the publication process involves breaking the reference down into its component fields, thus losing all the formatting, before reassembling it in the preferred format.

And this leads us to the key question: why do journals insist that authors format their references in journal style at all? All the work that authors do to achieve this is thrown away anyway, when the reference is broken down into fields, so why do it?

And the answer of course is “there is no good reason”. Which is why several journals, including PeerJ, eLife, PLOS ONE and certain Elsevier journals, have abandoned the requirement completely. (At the other end of the scale, JVP has been known to reject papers without review for such offences as using the wrong kind of dash in a page-range.)

Like so much of how we do things in scholarly publishing, requiring journal-style formatting at the submission stage is a relic of how things used to be done and makes no sense whatsoever in 2012. Before we had citation databases, the publication pipeline was much more straight-through, and the author’s references could be used “as is” in the final publication. Not any more.

How far can we go?

All of this leads me to wonder how far we can go in cutting down the author burden of referencing. Do we actually need to give all the author/title/etc. information for each reference?

In the case of references that have a DOI, I think not (though I’ve not yet discussed this with any publishers). I think that it suffices to give only the DOI. Because once you have a DOI, you can look up all the reference data. Go try it yourself: go to http://www.crossref.org/guestquery/ and paste my DOI “10.1111/j.1469-7998.2011.00824.x” into the DOI Query box at the bottom of the page. Select the “unixref” radio button and hit the Search button. Scroll down to the bottom of the results page, and voila! — an XML document containing everything you could wish to know about the referenced paper.

And the data in that structured document is of course what the publication process uses to render out the reference in the journal’s preferred style.
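The same lookup can be scripted. The sketch below swaps in Crossref's JSON REST API (`https://api.crossref.org/works/<DOI>`) in place of the guest-query form and its unixref XML, simply because JSON is easier to handle in a few lines. The function names are mine, and the `sample_message` dictionary is a hand-written stand-in with the shape I expect from that API, not a captured response.

```python
import json
import urllib.parse
import urllib.request


def crossref_url(doi: str) -> str:
    """Build the Crossref REST API URL for one DOI."""
    return "https://api.crossref.org/works/" + urllib.parse.quote(doi)


def fetch_crossref_message(doi: str) -> dict:
    """Fetch the metadata record for a DOI (needs network access)."""
    with urllib.request.urlopen(crossref_url(doi)) as response:
        return json.load(response)["message"]


def fields_from_crossref(message: dict) -> dict:
    """Pull out the fields a reference list needs from a Crossref record."""
    return {
        "authors": [(a.get("family", ""), a.get("given", ""))
                    for a in message.get("author", [])],
        "year": message["issued"]["date-parts"][0][0],
        "title": (message.get("title") or [""])[0],
        "journal": (message.get("container-title") or [""])[0],
        "volume": message.get("volume"),
        "pages": message.get("page"),
    }


# Hand-written sample in the assumed shape of a Crossref record (abridged).
sample_message = {
    "author": [{"family": "Taylor", "given": "M. P."},
               {"family": "Naish", "given": "D."}],
    "issued": {"date-parts": [[2011]]},
    "title": ["The long necks of sauropods did not evolve primarily through sexual selection"],
    "container-title": ["Journal of Zoology"],
    "volume": "285",
    "page": "150-161",
}
fields = fields_from_crossref(sample_message)
```

In real use you would call `fields_from_crossref(fetch_crossref_message("10.1111/j.1469-7998.2011.00824.x"))` and hand the result to whatever renders the journal's house style.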

Am I missing something? Or is this really all we need?

I just saw this tweet from palaeohistologist Sarah Werning, and it summed up what science is all about so well that I wanted to give it wider and more permanent coverage:

This is exactly right. Kudos to Sarah for saying it so beautifully.

(Sarah’s work can most recently be seen in Nesbitt et al.’s (2012) paper on a newly recognised ancient dinosaur or near dinosaur relative, and especially in the high-resolution supplementary images that she deposited at MorphoBank.)



In a third “open letter to the mathematics community”, Elsevier have announced that, for “the primary mathematics journals”, they now offer free access to all articles over four years old. The details page shows that 53 journals are involved.

I like to give credit where it’s due, and this is a significant move. It’s much more important than the initiatives we hear of from time to time when access to various journals is offered for a limited window: it means there is a substantial body of work that will now be freely and permanently available.

In a comment on John Baez’s Google+ post, Joerg Fliege writes:

One should also mention that opening up access to a handful of older issues of math journals will not affect the bottom line of Elsevier’s revenue much. They are giving something away that, in the greater scheme of things, has essentially a business value near 0.

How kind of them.

But I think this is unnecessarily cynical and negative. A move like this should be judged not on what it costs Elsevier to do, but on the benefit that it gives the research community. If they can find things to do that cost them little or nothing but provide a real benefit, then that’s all to the good — as I argued in the How Elsevier Can Save Itself posts [part 0, part 1, part 2, part 3]. They should not be criticised for that!

That said, Baez does raise a crucial question in that Google+ post:

Why just math journals? Because we’re the ones who are making the most noise! Folks from many other sciences have joined the boycott – but you need some leaders in your field to get aggressive if you want to get Elsevier to do you a favor like this.

An important challenge for Elsevier right now is to prove that they are really making an effort to contribute to the progress of research across the board, rather than just trying to buy off the mathematical community which has caused them the most irritation up to this point.

Can they meet that challenge?
