In a comment on a recent Guardian piece (not mine, but a response to it), Peter Morgan asked:

A separate concern is whether the OA business model is sustainable in the long term of decades or even centuries. By contrast, OA content has almost no commercial value, unless it is re-published in a for-profit volume. How confident can we be that the content of an OA journal that goes bankrupt will be preserved in an openly accessible way?

Don’t worry — you can be very confident. Reputable open-access journals arrange for their content to be archived in well-trusted third-party archives such as PubMed Central and CLOCKSS. See for example PeerJ’s blog about the arrangements they’re making or this statement from PLOS ONE.

A much more serious problem is this: what happens to the content of a non-OA journal when it goes bankrupt? In general, copyright for the content of such journals is owned by the publisher. This not only means that informal archive arrangements such as BioTorrents and The Disks Of Millions can’t be used — worse, it means that content archived in PubMed Central or CLOCKSS may never become available. If a failing publisher sells its assets, that will include the copyrights — and since literally any unethical corporation might sniff an asset-stripping opportunity, that could be disastrous.

In short, you can be much more confident that PLOS’s content will still be around in 10, 20 and 100 years than you can that Elsevier’s will.

My new article is up at the Guardian. This time, I have taken off the Conciliatory Hat, and I’m saying it how I honestly believe it is: publishing your science behind a paywall is immoral. And the reasons we use to persuade ourselves it’s acceptable really don’t hold up.

Read Choose open access: publishing your science behind a paywall is immoral

Because for all that we rightly talk about the financial efficiencies of open access, when it comes right down to it OA is primarily a moral, or if you prefer ideological, issue. It’s not really about saving money, though that’s a welcome side-effect. It’s about doing what’s right.

I’m expecting some kick-back on this one. Fire away; I’ll enjoy the discussion.

After the authors’ own work, the biggest contribution to a published paper is the reviews provided, gratis, by peers. When peer-review works as it’s supposed to, reviewers add significant value to the final paper. But the actual reviews are never seen by anyone except the authors and the handling editor.

This is bad for several reasons.

First, good reviewers don’t get the credit they deserve. That’s unfair on those who do a good job — who generously invest a lot of time and effort in others’ work.

Second, bad reviewers don’t get the blame they deserve. That leaves them free to act in bad faith: blocking papers by people they don’t like, or whose work is critical of their own; or just doing a completely inadequate job. Because there are no negative consequences for doing a bad job, such people have no external incentive to straighten up and fly right.

Third, the effort that goes into reviewing is largely wasted. Often the reviews themselves are significant pieces of work (that’s certainly true when I’m the one giving the review) and the wider community could benefit from seeing them. Frequently reviews contain extended discussion, not only of the paper’s subject matter but of scientific philosophy such as approaches to taxonomy or narrative structure.

Fourth, editors’ decisions remain unexplained. Most editors handle manuscripts efficiently and fairly, but there are exceptions, as for example when I was one of three reviewers who wholeheartedly recommended acceptance but the editor rejected the paper. Even discussing that situation was difficult, because the reviews in question were not available for the world to read.

Fifth, and more general than any of the above, the reviewing process is opaque to the world. In times past, logistical reasons such as lack of space in printed journals meant that the sausage-machine approach to the review process was the only feasible one: no-one wants to see what goes into the machine or what goes on inside, we only want the final product. But we live in an increasingly open world, and the consensus is that pretty much all processes benefit from openness.

There are various initiatives under way to change the legacy system of reviewing, including F1000 Research and the eLife decision-letter system. But at the moment only a small minority of papers are submitted to such venues.

What to do about the others?

And so I found myself wondering … what would happen if I just unilaterally posted the reviews I receive? I already make pages on this site for each of my published papers (example): it would be easy to extend those pages by also adding:

  • The submitted version of the manuscript
  • All the reviews I received
  • The editor’s decision letter
  • My response letter to the editor
  • The final published paper.

I know this is “not done”. My question is: why not? Is there an actual reason, other than inertia? Wouldn’t we all be better off if this was standard operating procedure?

[Note that this is orthogonal to reviewer anonymity. As it happens, I think that is also a bad thing, but it's independent of what I'm proposing here. I could post an unsigned review as-is, without revealing who wrote it even if I knew.]

It’s an oddity to me that when publishers try to justify their existence with long lists of the valuable services they provide, they usually skip lightly over one of the few really big ones. For example, Kent Anderson’s exhausting 60-element list omitted it, and it had to be pointed out in a comment by Carol Anne Meyer:

One to add: Enhanced content linking, including CrossREF DOI reference linking, author name linking cited-by linking, related content linking, updates and corrections linking.

(Anderson’s list sidles up to this issue in his #28, “XML generation and DTD migration” and #29, “Tagging”, but doesn’t come right out and say it.)

Although there are a few journals whose PDFs just contain references formatted as in the manuscript — as we did for our arXiv PDF — nearly all mainstream publishers go through a more elaborate process that yields more information and enables the linking that Meyer is talking about. (This is true of the new kids on the block as well as the legacy publishers.)

The reference-formatting pipeline

When I submit a manuscript with a formatted reference like:

Taylor, M.P., Hone, D.W.E., Wedel, M.J. and Naish, D. 2011. The long necks of sauropods did not evolve primarily through sexual selection. Journal of Zoology 285(2):150–161. doi:10.1111/j.1469-7998.2011.00824.x

(as indeed I did in that arXiv paper), the publisher will take that reference and break it down into structured data describing the specific paper I was referring to. It does this for various reasons: among them, it needs to provide this information for services like the Web of Knowledge.

Once it has this structured representation of the reference, the publication process plays it out in whatever format the journal prefers: for example, had our paper appeared in JVP, Taylor and Francis’s publication pipeline would have rendered it:

Taylor, M. P., D. W. E. Hone, M. J. Wedel, and D. Naish. 2011. The long necks of sauropods did not evolve primarily through sexual selection. Journal of Zoology 285:150–161.

(With spaces between multiple initials, initials preceding surnames for all authors except the first, an “Oxford comma” before the last author, no italics for the journal name, no bold for the volume number, the issue number omitted altogether, and the DOI inexplicably removed.)
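Because only the fields matter, one structured record can be played out in either style. The Python sketch below is illustrative only. The `Reference` record and the `render_submitted`/`render_jvp` functions are hypothetical names, not any publisher’s actual pipeline. The "JVP" rules it encodes are just the ones listed above, and plain text can’t show italics or bold.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reference:
    """Hypothetical style-independent record of one reference."""
    authors: List[Tuple[str, str]]  # (surname, initials without dots), e.g. ("Taylor", "MP")
    year: int
    title: str
    journal: str
    volume: str
    first_page: str
    last_page: str
    issue: Optional[str] = None
    doi: Optional[str] = None


def dotted(initials: str, sep: str = "") -> str:
    # Turn "MP" into "M.P." (sep="") or "M. P." (sep=" ").
    return sep.join(ch + "." for ch in initials)


def join_authors(names: List[str], last_sep: str) -> str:
    # Join names with commas, using last_sep before the final name.
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + last_sep + names[-1]


def render_submitted(ref: Reference) -> str:
    # The manuscript's style: "Surname, M.P." for every author, "and" before the
    # last one, issue number in parentheses, DOI kept.
    names = [f"{surname}, {dotted(initials)}" for surname, initials in ref.authors]
    issue = f"({ref.issue})" if ref.issue else ""
    text = (f"{join_authors(names, ' and ')} {ref.year}. {ref.title}. {ref.journal} "
            f"{ref.volume}{issue}:{ref.first_page}–{ref.last_page}.")
    if ref.doi:
        text += f" doi:{ref.doi}"
    return text


def render_jvp(ref: Reference) -> str:
    # JVP-like rules from the post: spaced initials, initials before surname after
    # the first author, Oxford comma, issue number and DOI dropped.
    surname, initials = ref.authors[0]
    names = [f"{surname}, {dotted(initials, ' ')}"]
    names += [f"{dotted(i, ' ')} {s}" for s, i in ref.authors[1:]]
    authors = join_authors(names, ", and ")
    if not authors.endswith("."):
        authors += "."
    return (f"{authors} {ref.year}. {ref.title}. {ref.journal} "
            f"{ref.volume}:{ref.first_page}–{ref.last_page}.")


ref = Reference(
    authors=[("Taylor", "MP"), ("Hone", "DWE"), ("Wedel", "MJ"), ("Naish", "D")],
    year=2011,
    title="The long necks of sauropods did not evolve primarily through sexual selection",
    journal="Journal of Zoology",
    volume="285", issue="2", first_page="150", last_page="161",
    doi="10.1111/j.1469-7998.2011.00824.x",
)
print(render_submitted(ref))  # the style of the submitted manuscript
print(render_jvp(ref))        # the same record, played out in a JVP-like style
```

The renderers never look at how the reference was originally typed. They only see the fields, which is the point of the paragraphs that follow.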

What’s needed in a submitted reference

Here’s the key point: so long as all the relevant information is included in some format (authors, year, article title, journal title, volume, page-range), it makes no difference how it’s formatted. Because the publication process involves breaking the reference down into its component fields, thus losing all the formatting, before reassembling it in the preferred format.
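The "breaking down" step can be sketched in a few lines of Python. This assumes references arrive in exactly the author-year layout of the example above. `parse_reference` and its regular expression are hypothetical, and real publisher tooling has to cope with far messier input, such as surnames containing " and ".

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical pattern for references laid out like the Taylor et al. 2011 example:
#   Authors YEAR. Title. Journal VOLUME(ISSUE):FIRST–LAST. doi:DOI
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?) (?P<year>\d{4})\. "
    r"(?P<title>.+?)\. "
    r"(?P<journal>.+?) (?P<volume>\d+)(?:\((?P<issue>\d+)\))?:"
    r"(?P<first_page>\d+)[–-](?P<last_page>\d+)\."
    r"(?: doi:(?P<doi>\S+))?$"
)


def split_authors(text: str) -> List[Tuple[str, str]]:
    # "Taylor, M.P., Hone, D.W.E. and Naish, D." -> [("Taylor", "MP"), ...]
    parts = text.replace(" and ", ", ").split(", ")
    # After splitting, parts alternate surname, initials, surname, initials, ...
    return [(parts[i], parts[i + 1].replace(".", ""))
            for i in range(0, len(parts) - 1, 2)]


def parse_reference(text: str) -> Optional[Dict[str, object]]:
    # Break a formatted reference into fields; the formatting itself is discarded.
    match = REFERENCE_PATTERN.match(text.strip())
    if match is None:
        return None
    fields: Dict[str, object] = dict(match.groupdict())
    fields["authors"] = split_authors(match.group("authors"))
    return fields


example = ("Taylor, M.P., Hone, D.W.E., Wedel, M.J. and Naish, D. 2011. The long necks "
           "of sauropods did not evolve primarily through sexual selection. Journal of "
           "Zoology 285(2):150–161. doi:10.1111/j.1469-7998.2011.00824.x")
fields = parse_reference(example)
print(fields)
```

Once the string has gone through this step, nothing about its punctuation, dashes or initials survives. Only the field values are carried forward.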

And this leads us to the key question: why do journals insist that authors format their references in journal style at all? All the work that authors do to achieve this is thrown away anyway, when the reference is broken down into fields, so why do it?

And the answer of course is “there is no good reason”. Which is why several journals, including PeerJ, eLife, PLOS ONE and certain Elsevier journals have abandoned the requirement completely. (At the other end of the scale, JVP has been known to reject papers without review for such offences as using the wrong kind of dash in a page-range.)

Like so much of how we do things in scholarly publishing, requiring journal-style formatting at the submission stage is a relic of how things used to be done and makes no sense whatsoever in 2012. Before we had citation databases, the publication pipeline was much more straight-through, and the author’s references could be used “as is” in the final publication. Not any more.

How far can we go?

All of this leads me to wonder how far we can go in cutting down the author burden of referencing. Do we actually need to give all the author/title/etc. information for each reference?

In the case of references that have a DOI, I think not (though I’ve not yet discussed this with any publishers). I think that it suffices to give only the DOI. Because once you have a DOI, you can look up all the reference data. Go try it yourself: go to http://www.crossref.org/guestquery/ and paste my DOI “10.1111/j.1469-7998.2011.00824.x” into the DOI Query box at the bottom of the page. Select the “unixref” radio button and hit the Search button. Scroll down to the bottom of the results page, and voila! — an XML document containing everything you could wish to know about the referenced paper.

And the data in that structured document is of course what the publication process uses to render out the reference in the journal’s preferred style.
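The guest-query form returns unixref XML. For a scriptable alternative, the public Crossref REST API (`https://api.crossref.org/works/{DOI}`) returns the same kind of metadata as JSON. The Python sketch below swaps in that JSON API instead of the unixref route described above. `fetch_crossref_metadata` and `fields_from_crossref` are hypothetical helper names, and only the field-extraction step runs without a network connection.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_crossref_metadata(doi: str) -> Dict[str, Any]:
    # Needs network access. The JSON response wraps the record in a "message" object.
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def fields_from_crossref(work: Dict[str, Any]) -> Dict[str, Any]:
    # Pull out the fields a reference needs; Crossref gives pages as "first-last".
    first_page, _, last_page = work.get("page", "").partition("-")
    return {
        "authors": [(a.get("family", ""), a.get("given", "")) for a in work.get("author", [])],
        "year": work["issued"]["date-parts"][0][0],
        "title": (work.get("title") or [""])[0],
        "journal": (work.get("container-title") or [""])[0],
        "volume": work.get("volume"),
        "issue": work.get("issue"),
        "first_page": first_page,
        "last_page": last_page,
        "doi": work.get("DOI"),
    }


# Usage (network required):
# fields = fields_from_crossref(fetch_crossref_metadata("10.1111/j.1469-7998.2011.00824.x"))
```

The DOI is the only input. Everything else a renderer needs comes back from the lookup.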

Am I missing something? Or is this really all we need?

Today, PeerJ announced that it will open for submissions on December 3rd — next Monday. That’s great news for anyone who cares about the future of academic publishing: it’s out to make dramatic changes to the publishing workflow, including an integrated preprint server so that people can read your work while it’s in review. And it has every chance of succeeding because it’s run by people with an astonishing track record who know more about how to make open-access publishing successful than anyone in the world, and it has a stellar editorial board.

Oh, and it’s free to publish in forever once you’ve paid a one-off membership fee.

But that’s not why I’m writing. I’m writing because today they also released the instructions for authors, and they contain the following glorious passage:

Formatting tip!

We want authors spending their time doing science, not formatting.

We include reference formatting as a guide to make it easier for editors, reviewers, and PrePrint readers, but will not strictly enforce the specific formatting rules as long as the full citation is clear.

Styles will be normalized by us if your manuscript is accepted.

Having previously ranted extensively about the submission-time reference-formatting burden of every other journal, I can hardly overstate how happy this makes me. I am a scientist, not a secretary. And in 2012, PeerJ is the first journal to acknowledge that.

#tearsOfJoy

Update 1 (an hour later)

Ian Mulvaney pointed out that eLife also does not require a specific style at submission.

And an anonymous commenter pointed me to Free Radical Biology & Medicine’s “Your Paper, Your Way” approach, which apparently is being piloted before expansion to other Elsevier journals.

So my apologies to both earlier examples that I missed, and kudos to both eLife and Elsevier. What I’d love to see now is the PLOS journals, and others, following the fine examples of these pioneers.

[See part 1, part 2 and part 3 from a few months ago.]

I’m horrified, but not as surprised as I would like to be, by a new paper (Welch 2012) which analyses peer-reviewer recommendations for eight prestigious journals in the field of economics.

The principal finding is that the reviewers’ recommendations were made up of 1/3 signal (i.e. consistent judgements on the quality of the manuscript) and 2/3 noise (i.e. randomness). Of that 2/3 noise, 1/3 was down to reviewer bias (some are nicer, some are nastier) and 2/3 seemed to be purely random.
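Multiplying those fractions through gives each component’s share of the whole. This is simple arithmetic on the proportions quoted above, not a figure taken separately from the paper:

```latex
\begin{aligned}
\text{signal} &= \tfrac{1}{3} = \tfrac{3}{9}, \\
\text{reviewer bias} &= \tfrac{2}{3} \times \tfrac{1}{3} = \tfrac{2}{9}, \\
\text{pure randomness} &= \tfrac{2}{3} \times \tfrac{2}{3} = \tfrac{4}{9}, \\
\text{total} &= \tfrac{3}{9} + \tfrac{2}{9} + \tfrac{4}{9} = 1.
\end{aligned}
```

On those numbers, the purely random component (4/9) outweighs the signal (3/9) on its own, before reviewer bias is even counted.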

And to quote directly from the study:

The bias measured by average generosity of the referee on other papers is about as important in predicting a referee’s recommendation as the opinion of another referee on the same paper.

What this means is that the likelihood of a submission being accepted depends more on a coin-toss than it does on how good your work is. Which seems to validate my earlier speculation that

The best analogy for our current system of pre-publication peer-review is that it’s a hazing ritual. It doesn’t exist because of any intrinsic value it has, and it certainly isn’t there for the benefit of the recipient. It’s basically a way to draw a line between In and Out. Something for the inductee to endure as a way of proving he’s made of the Right Stuff.

So: the principal value of peer-review is that it provides an opportunity for authors to demonstrate that they are prepared to undergo peer-review.

There’s more discussion of this over on the Dynamic Ecology blog.

It’s also well worth reading Brian McGill’s comment on that post: he quotes multiple reviewers of a manuscript that he submitted, completely contradicting each other. Yes, this is merely anecdote, not data; but I have to admit that it chimes with my own experience.

If this research is correct, and if it applies to science as much as it does to economics, then here is one horrible consequence: it suggests that the best way to get your papers into the high-impact journals that make a career (Science, Nature, etc.) is not necessarily to do great research, but just to be very persistent in submitting everything to them. Keep rolling the dice till you get a double six. I would hate to think that prestige is allocated, and fields are shaped, on that basis.
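To put numbers on the dice metaphor, suppose each submission were an independent trial accepted with probability p. That is a deliberately crude assumption, and 1/36 is simply the chance of a double six, not a measured acceptance rate. Under that assumption, the chance of at least one acceptance after n tries is 1 − (1 − p)^n, and the expected number of tries is 1/p. With p = 1/36, that means 36 submissions on average, and about 25 submissions to reach even odds.

```python
def chance_of_acceptance(p: float, submissions: int) -> float:
    # Probability of at least one acceptance in `submissions` independent tries,
    # each accepted with probability p.
    return 1.0 - (1.0 - p) ** submissions


def expected_submissions(p: float) -> float:
    # Mean number of tries up to and including the first success (geometric distribution).
    return 1.0 / p


DOUBLE_SIX = 1 / 36  # the dice metaphor from the post; not a real acceptance rate

for n in (1, 10, 25, 50):
    print(f"{n:>2} submissions: {chance_of_acceptance(DOUBLE_SIX, n):.1%}")
print(f"expected submissions per acceptance: {expected_submissions(DOUBLE_SIX):.0f}")
```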

I’d be really interested to know, from those of you who’ve had papers published in Science or Nature, roughly how many submissions you’ve made for each acceptance in those venues; and to what extent you feel that the ones that were accepted represent your best work.

References

Welch, Ivo. 12 October, 2012. Referee Recommendations. Social Science Research Network. doi:10.2139/ssrn.2137119.

Four things:

1. From the start of 2013, the Royal Society is abandoning issues for its journals (Proc. B, Phil. Trans., Biology Letters and more) and moving to a continuing publishing model — as already used for their open-access journal Open Biology. Excellent news: in a post-print world, issues achieve nothing but the imposition of arbitrary delays. As of next year, the first (online) published version of each Royal Society paper will be the Version Of Record.

2. IEEE, the Institute of Electrical and Electronics Engineers, is launching its own open-access megajournal. This is welcome news, because up till now IEEE has been one of the more access-hostile publishers. (For some reason, the new journal will come out in monthly issues rather than using the PLOS-like continuous publishing model that the Royal Society is adopting. But still.)

3. I really need to get around to writing about why CC BY is the right open-access licence for scholarship, especially given the comments on the last post. But until I do, this post by Claire Redhead, on the Open Access Scholarly Publishers Association site, is a good read.

4. Peter Suber reports that Belgium is following the UK’s lead in converting to open access as the default infrastructure for dissemination of research. Signatories “express their determination to be amongst the frontrunners in this evolution, both at European and worldwide level”.

It’s great to see the gathering momentum around the shift to open access (including the Royal Society’s shift to a less subscription-focussed schedule). What’s most encouraging is that it’s coming from all kinds of stakeholders: governments, other funders, scholarly societies, enlightened publishers, and of course researchers.

Excelsior!

When you start a blog, the natural thing is to want to feel that you’re in control of it, and that means controlling what can be posted there.  But that’s a mistake.  Moderation means that people can’t see their own comments, which is alienating; but more importantly, it means other people can’t see them, which in turn means that all discussion grinds to a halt until such time as you happen to moderate.

What that means is that the site is only really alive when you’re at the keyboard, constantly checking your inbox, so that you notice moderation requests as soon as they come in.  It means you’ll never have the experience of waking up in the morning and finding that a discussion has broken out on your blog.


But what about spam?  On a good platform, it’s not a problem.  Since we started SV-POW!, 6,539 comments have been posted, and 3,552 spam comments have been automatically detected and held for moderation.  Matt’s and my manual moderation of those suspected-spam comments shows that detection has been 99.92% accurate: there have been only three false negatives in five and a half years.  There have been 63 false positives, i.e. comments that looked like spam but weren’t.  Those were held for moderation, and passed.
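The 99.92% figure can be reproduced as the share of actual spam that the filter caught. This assumes that the 3,552 held comments were all genuine spam and that the 63 false positives are counted separately. The second line of output also assumes that the 6,539 posted comments include the 63 that were passed. Both are assumptions, not statements from the post.

```python
CAUGHT_SPAM = 3552     # spam automatically detected and held for moderation
MISSED_SPAM = 3        # false negatives: spam that got through the filter
FALSE_POSITIVES = 63   # legitimate comments held, then passed
POSTED = 6539          # comments posted (assumed to include the 63 passed ones)


def detection_rate(caught: int, missed: int) -> float:
    # Share of all actual spam that the filter flagged.
    return caught / (caught + missed)


print(f"spam caught: {detection_rate(CAUGHT_SPAM, MISSED_SPAM):.2%}")
print(f"posted comments first held as spam: {FALSE_POSITIVES / POSTED:.2%}")
```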

So.  You don’t need to moderate to filter spam, and you don’t want to moderate to control discussion.  Just open it up. (If you’re using a platform with bad spam-filtering, you may have to move. We’re on WordPress.com, and very happy with it, but other platforms may be just as good or better.)

[Note. This is a re-post of the most important part of Tutorial 18: how to have fruitful discussions in your blog’s comments. I'm posting this bit separately so that I can link to this most important part without the distraction of the other parts.]

“But Mike”, you say, “What’s wrong with publishers making a profit?”

Nothing is wrong with publishers making a profit.

PLOS made an operating profit of 21.5% in 2011 (though they plough it back into their mission “to accelerate progress in science and medicine by leading a transformation in research communication”.) BioMed Central also makes a profit, and since they are a for-profit company they get to keep it, distribute it to shareholders, or what have you. Good on them.

If you can make money by publishing research, that’s great.

The issue is not publishers who make money. The issue is corporations that go by the title “publishers”, but which in fact make money by preventing publication.

Because “publish” means “make public”. The whole point of a publisher is to make things public. The reason the scientists of 30 years ago sent their papers to a publisher was because having a publisher print them on paper and ship them around the world was the most effective way to make them public. And subscriptions were the obvious way to pay for that work. But now that anything can be made public instantly (“Publishing is not a job any more, it’s a button”), giving papers to a “publisher” that locks them behind a firewall is the opposite of publishing. It’s privating.

Yesterday we saw an appalling demonstration of why this is so important. The barrier-based textbook publisher Pearson found that in 2007 a teacher had posted a copy of the Beck Hopelessness Scale on his blog. It’s a 20-question list, intended to help prevent suicide, and totals 279 words. It was published in 1974, and Pearson holds the copyright, selling copies for $120: $6 per question, or 43¢ per word.

So naturally Pearson saw their profits being eaten into by the free availability of the Beck Scale. Naturally, rather than contacting the blog author, or the network that it’s part of, they sent a DMCA takedown notice to ServerBeach, who host the web server that the blog was on. And naturally ServerBeach shut down the entire site twelve hours later.

This site, Edublogs, is home to 1,451,943 teacher and student blogs. Yes, you read that right. One and a half million blogs.

So to recap: because a teacher five years ago posted a copy of a 279-word, 38-year-old questionnaire that costs $120, the publisher shut down 1.5 million blogs. That works out at 0.008¢ per blog.
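The per-unit figures check out. Here is a quick Python sketch of the arithmetic, using only the numbers quoted above. Decimal avoids binary-float surprises in the money values.

```python
from decimal import Decimal

PRICE_DOLLARS = Decimal("120")   # price of the Beck Hopelessness Scale
QUESTIONS = 20
WORDS = 279
BLOGS_SHUT_DOWN = 1_451_943      # Edublogs sites taken offline

per_question = PRICE_DOLLARS / QUESTIONS                 # dollars per question
per_word_cents = PRICE_DOLLARS * 100 / WORDS             # cents per word
per_blog_cents = PRICE_DOLLARS * 100 / BLOGS_SHUT_DOWN   # cents per blog

print(f"${per_question} per question")
print(f"{per_word_cents:.0f}¢ per word")
print(f"{per_blog_cents:.3f}¢ per blog")
```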

We could talk all day about all the things that went wrong here — the ludicrously unbalanced DMCA (“half a DeMoCrAcy”), the idiot response of ServerBeach — but I want to focus on one issue. The reason Pearson issued a DMCA takedown is because they make their money by preventing access. It’s the nature of the beast. If your business model is to prevent people from making things public, then this kind of thing is inevitable. Whereas it is literally impossible for PLOS or BMC ever to perpetrate this kind of idiocy because their business model is to make things public. When someone else takes a thing that they have made public and makes it more public, then great! No-one has to issue any DMCA takedowns!

And this is why there is a fundamental, unbridgeable divide between open-access publishers and barrier-based publishers. It’s why no amount of special programmes, limited-time zero-cost access options, reductions in subscription rates, access to back-issues and so on will ever really make any difference. The bottom line is that we want one thing — access to research — and barrier-based “publishers” want the exact opposite.

However nice they are, however much their hearts are in the right place, they want one thing and we want the opposite. And that just won’t do.

They’re going to have to go. All of them.

As things stand there are two principal types of written communication in science: papers and blog posts. We’ve discussed the relative merits of formally published papers and more informal publications such as blog-posts a couple of times, but perhaps never really dug into what the differences are between them.

Matt and I have been discussing this offline, and at one point Matt suggested that authorial intent is one of the key differences. When we write and submit a paper, we are sending a different message from when we post on a blog.

That’s true — at least in general, although there are edge-cases such as the formal research paper that Zen Faulkes recently posted as an entry on his blog. But even when it’s true, I’m not sure it’s relevant. As Matt pointed out, authorial intent ceases to be a factor once something is published. The audience will read it how they like and do with it what they want. So I think we need to consider the paper-vs.-blog-post question in terms of the artifact itself, and discount what the author intended.

When we do that, what differences do we see? Generalising, we find that:

  • Papers are PDF while blog-posts are HTML. (That’s not quite a trivial distinction: PDFs have less clutter.)
  • Blog-posts allow and invite comments, but papers do not.
  • Blog-posts are part of an ongoing discussion whereas papers are stand-alone.
  • Papers are archived on publisher sites, whereas blog-posts are on blogs, which may be more vulnerable or ephemeral.
  • Papers are immutable once published, whereas blog-posts can be edited after initial publication.
  • Papers are peer-reviewed, while blog-posts are not.
  • Blog-posts are fast, but papers are slow.

Which of these are important? Which count as wins for papers and which as wins for blog-posts? Which of them are tied together with each other? Which are fundamentally properties of the medium, and which are associated with it only by tradition?

Comments, please!
