October 4, 2015
Preprints are in the air! A few weeks ago, Stephen Curry had a piece about them in the Guardian (Peer review, preprints and the speed of science) and pterosaur palaeontologist Liz Martin published Preprints in science on her blog Musings of a Clumsy Palaeontologist. The latter in particular has spawned a prolific and fascinating comment stream. Then SV-POW!’s favourite journal, PeerJ, weighed in on its own blog with A PeerJ PrePrint – so just what is that exactly?
Following on from that, I was invited to contribute a guest-post to the PeerJ blog: they’re asking several people about their experiences with PeerJ Preprints, and publishing the results in a series. I started to write my answers in an email, but they soon got long enough that I concluded it made more sense to write my own post instead. This is that post.
As a matter of fact, I’ve submitted four PeerJ preprints, and all of them for quite different reasons.
1. Barosaurus neck. Matt and I submitted the Barosaurus manuscript as a preprint because we wanted to get feedback as quickly as possible. We certainly got it: four very long, detailed comments that were more helpful than most formally solicited peer reviews I’ve had. (It’s to our discredit that we didn’t then turn the manuscript around immediately, taking those reviews into account. We do still plan to do this, but other things happened.)
2. Dinosaur diversity. Back in 2004 I submitted my first ever scientific paper, a survey of dinosaur diversity broken down in various ways. It was rejected (for what I thought were spurious reasons, but let that pass). The more time that passed, the more out of date the statistics became. As my interests progressed in other directions, I reached the point of realising that I was never going to get around to bringing that paper up to date and resubmitting it to a journal. Rather than let it be lost to the world (I think it still contains much that is of interest), I published it as a preprint (although it’s not pre-anything: what’s posted is the final version).
3. Cartilage angles. Matt and I had a paper published in PLOS ONE in 2013, on the effect that intervertebral cartilage had on sauropod neck posture. Only after it was published did I realise that there was a very simple way to quantify the geometric effect. I wrote what was intended to be a one-pager on that, planning to issue it as a sort of erratum. It ended up much longer than expected, but because I considered it material that should really have been in the original PLOS ONE paper, I wanted to get it out as soon as possible. So as soon as the manuscript was ready, I submitted it simultaneously as a preprint and onto the peer-review track at PeerJ. (It was published seven weeks later.)
4. Apatosaurine necks. Finally, I gave a talk at this year’s SVPCA (Symposium on Vertebrate Palaeontology and Comparative Anatomy), based on an in-progress manuscript in which I am second author to Matt. The proceedings of the symposium are emerging as a PeerJ Collection, and I and the other authors wanted our paper to be a part of that collection. So I submitted the abstract of the talk I gave, with the slide-deck as supplementary information. In time, this version of the preprint will be superseded by the completed manuscript, and eventually (we hope) by the peer-reviewed paper.
So the thing to take away from this is that there are lots of reasons to publish preprints. They open up different ways of thinking about the publication process.
September 10, 2015
Wouldn’t it be great if, after a meeting like the 2015 SVPCA, there was a published set of proceedings? A special issue of a journal, perhaps, that collected papers that emerge from the work presented there.
Of course the problem with special issues, and edited volumes in general, is that they take forever to come out. After the Dinosaurs: A Historical Perspective conference on 6 May 2008, I got my talk on the history of sauropod research written up and submitted on 7 August, just over three months later. It took another five and a half months to make it through peer-review to acceptance. And then … nothing. It sat in limbo for a year and nine months before it was finally published, because of course the book couldn’t be finalised until the slowest of the 50 or so authors, editors and reviewers had done their jobs.
There has to be a better way, doesn’t there?
Rhetorical question, there. There is a better way, and unsurprisingly to regular readers, it’s PeerJ that has pioneered it. In PeerJ Collections, papers can be added at any time, and each one is published as it’s ready. Better still, the whole lifecycle of the paper can (if the authors wish) be visible from the collection. You can start by posting the talk abstract, then replace it with a preprint of the complete manuscript when it’s ready, and finally replace that with the published version of the paper once it’s been through peer-review.
Take a look, for example, at the collection for the 3rd International Whale Shark Conference (which, by the way, was held at the Georgia Aquarium in Atlanta, home to some awesome whale sharks).
As you can see from the collection (at the time of writing), only one of the constituent papers — Laser photogrammetry improves size and demographic estimates for whale sharks — has actually been published so far. But a dozen other papers exist in preprint form. That means that the people who attended the conference, saw the talks and want to refer to them in their work have something to cite.
The hot news is that Mark Young and the other SVPCA 2015 organisers have arranged for PeerJ to set up an SPPC/SVPCA 2015 Collection. I think this is just marvellous — the best possible way to make a permanent record of an important event.
The collection is very new: at the time of writing, it hosts only five abstracts (one of them ours). We’re looking forward to seeing others added. Some of the abstracts (including ours) have the slides of the talk attached as supplementary information.
Although I’m lead author on the talk (because I prepared the slides and delivered the presentation), this project is really Matt’s baby. There is a Wedel et al. manuscript in prep already, so we hope that within a month or two we’ll be able to replace the abstract with a complete manuscript. Then of course we’ll put it through peer-review.
I hope plenty of other SVPCA 2015 speakers will do the same. Even those who, for whatever reason, don’t want to publish their work in PeerJ, can use the collection as a home for their abstracts and preprints, then go off and submit the final manuscript elsewhere.
June 11, 2015
We as a community often ask ourselves how much it should cost to publish an open-access paper. (We know how much it does cost, roughly: typically $3000 with a legacy publisher, or an average of $900 with a born-open publisher, or nothing at all for many journals.)
We know that peer review is essentially free to publishers, being provided free of charge by scholars. We know that most handling editors also work for free or for peanuts. We know that hosting things on the Web is cheap (“publishing [in this sense] is just a button”).
Publishers have costs associated with rejecting manuscripts — checking that they’re by real people at real institutions, scanning for obvious pseudo-scholarship, etc. But let’s ignore those costs for now, as being primarily for the benefit of the publishers rather than the author. (When I pay a publisher an APC, they’re not serving me directly by running plagiarism checks.)
The tendency of many discussions I’ve been involved with has been that the main technical contribution of publishers is the process that is still, for historical reasons, known as “typesetting” — that is, the transformation of the manuscript from an opaque form like an MS-Word file (or indeed a stack of hand-written sheets) into a semantically rich representation such as JATS XML. From there, actual typesetting into HTML or a pretty PDF can be largely automated.
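As a toy illustration of what “semantically rich” means here (not real JATS, whose schema is far richer; the element names below merely echo JATS conventions), an article skeleton in which every piece of content is labelled by what it *is* can be built with nothing more than Python’s standard library — and it’s exactly that labelling that makes automated rendering to HTML or PDF tractable:

```python
import xml.etree.ElementTree as ET

# A toy article skeleton in the spirit of JATS: title, contributor and
# body sections are all explicitly tagged, so a downstream renderer
# never has to guess what a given chunk of text means.
article = ET.Element("article")

front = ET.SubElement(article, "front")
meta = ET.SubElement(front, "article-meta")
ET.SubElement(meta, "article-title").text = "A hypothetical sauropod paper"
contrib = ET.SubElement(meta, "contrib")
ET.SubElement(contrib, "surname").text = "Taylor"

body = ET.SubElement(article, "body")
sec = ET.SubElement(body, "sec")
ET.SubElement(sec, "title").text = "Introduction"
ET.SubElement(sec, "p").text = "Sauropod necks were remarkable structures."

print(ET.tostring(article, encoding="unicode"))
```

From a tree like this, generating HTML is a mechanical tree-walk; generating the Word-to-XML tree in the first place is the labour-intensive step that publishers charge for.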
So: what does it cost to typeset a manuscript?
First data point: I have heard that Kaveh Bazargan’s River Valley Technologies (the typesetter that PeerJ and many more mainstream publishers use) charges between £3.50 and £9 per page, including XML, graphics, PDF generation and proof correction.
Second data point: in a Scholarly Kitchen post that Kent Anderson intended as a criticism of PubMed Central but which in fact makes a great case for what good value it provides, he quotes an email from Kent A. Smith, a former Deputy Director of the NLM:
Under the % basis I am using here $47 per article. John [Mullican, a program analyst at NCBI] and I looked at this yesterday and based the number on a sampling of a few months billings. It consists on the average of about $34-35 per tagged article plus $10-11 for Q/A plus administrative fees of $2-3, where applicable.
Using the quoted figure of $47 per PMC article and the £6.25 midpoint of River Valley’s range of per-page prices (= $9.68 per page), that would be consistent with typical PMC articles being a bit under five pages long. The true figure is probably somewhat higher — maybe twice as long or more — but this seems to be at least in the same ballpark.
Third data point: Charles H. E. Ault, in a comment on that Scholarly Kitchen post, wrote:
As a production director at a small-to-middling university press that publishes no journals, I’m a bit reluctant to jump into this fray. But I must say that I am astonished at how much PMC is paying for XML tagging. Most vendors looking for the small amount of business my press can offer (say, maybe 10,000 pages a year at most) charge considerably less than $0.50 per page for XML tagging. Assuming a journal article is about 30 pages long, it should cost no more than $15 for XML tagging. Add another few bucks for quality assurance, and you might cross the $20 threshold. Does PMC have to pay a federally mandated minimum rate, like bridge construction projects? Where can I submit a bid?
I find the idea of 50-cent-per-page typesetting hard to swallow — it’s more than an order of magnitude cheaper than the River Valley/PMC level, and I’d like to know more about Ault’s operation. Is what they’re doing really comparable with what the others are doing?
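For what it’s worth, the back-of-envelope arithmetic behind these comparisons is easy to check; the exchange rate below is simply the one implied by the post’s own £6.25-to-$9.68 conversion:

```python
# Back-of-envelope comparison of the three per-page typesetting estimates.
gbp_to_usd = 9.68 / 6.25                  # ~1.55, implied by the conversion above
river_valley_mid = 6.25 * gbp_to_usd      # $9.68 per page (midpoint of GBP 3.50-9.00)

# First vs second data point: what article length does $47/article imply?
pmc_per_article = 47.00
implied_pmc_pages = pmc_per_article / river_valley_mid
print(f"Implied PMC article length: {implied_pmc_pages:.1f} pages")

# Second vs third data point: how far apart are the per-page figures?
ault_per_page = 0.50
ratio = river_valley_mid / ault_per_page
print(f"River Valley midpoint is ~{ratio:.0f}x Ault's 50-cent figure")
```

The implied article length comes out at a bit under five pages, and the gap between the River Valley midpoint and Ault’s figure at roughly nineteen-fold — which is why “more than an order of magnitude” is the right way to describe it.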
Are there other estimates out there?
May 26, 2015
Provoked by Mike Eisen’s post today, The inevitable failure of parasitic green open access, I want to briefly lay out the possible futures of scholarly publishing as I see them. There are two: one based on what we now think of as Gold OA, and one on what we think of as Green OA.
Eisen is of course quite right that the legacy publishers only ever gave their blessing to Green OA (self-archiving) so long as they didn’t see it as a threat, so the end of that blessing isn’t a surprise. (I think it’s this observation that Richard Poynder misread as “an OA advocate legitimising Elsevier’s action”!) It was inevitable that this blessing would be withdrawn as Green started to become more mainstream — and that’s exactly what we’ve seen, with Elsevier responding to the global growth in Green OA mandates with a regressive new policy that has rightly attracted the ire of everyone who’s looked closely at it.
So I agree with him that what he terms “parasitic Green OA” — self-archiving alongside the established journal system — is ultimately doomed. The bottom line is that while we as a community continue to give control of our work to the legacy publishers — follow closely here — legacy publishers will control our work. We know that these corporations’ interests are directly opposed to those of authors, science, customers, libraries, and indeed everyone but themselves. So leaving them in control of the scholarly record is unacceptable.
What are our possible futures?
We may find that in ten years’ time, all subscription journals are gone (except perhaps a handful of boutique journals that a few people like, just as a few people prefer the sound of vinyl over CDs or MP3s).
We may find that essentially all new scholarship is published in open-access journals such as those of BioMed Central, PLOS, Frontiers and PeerJ. That is a financially sustainable path, in that publishers will be paid for the services they provide through APCs. (No doubt, volunteer-run and subsidised zero-APC journals will continue to thrive alongside them, as they do today.)
We may even find that some of the Gold OA journals of the future are run by organisations that are presently barrier-based publishers. I don’t think it’s impossible that some rump of Elsevier, Springer et al. will survive the coming subscription-journals crash, and go on to compete on the level playing-field of Gold OA publishing. (I think they will struggle to compete, and certainly won’t be able to make anything like the kind of money they do now, but that’s OK.)
This is the Gold-OA future that Mike Eisen is pinning his hopes on — and which he has done as much as anyone alive to bring into existence. I would be very happy with that outcome.
While I agree with Eisen that what he terms “parasitic Green” can’t last — legacy publishers will stamp down on it as soon as it starts to be truly useful — I do think there is a possible Green-based future. It just doesn’t involve traditional journals.
One of the striking things about the Royal Society’s recent Future of Scholarly Scientific Communication meetings was that during the day-two breakout session, so many of the groups independently came up with more or less the same proposal. The system that Dorothy Bishop expounded in the Guardian after the second meeting is also pretty similar — and since she wasn’t at the first meeting, I have to conclude that she also came up with it independently, further corroborating the sense that it’s an approach whose time has come.
(In fact, I started drafting an SV-POW! post myself at that meeting, describing the system that our break-out group came up with. But that was before all the other groups revealed their proposals, and it became apparent that ours was part of a blizzard, rather than a unique snowflake.)
Here are the features characterising the various systems that people came up with. (Not all of these features were in all versions of the system, but they all cropped up more than once.)
- It’s based around a preprint archive: as with arXiv, authors can publish manuscripts there after only basic editorial checks: is this a legitimate attempt at scholarship, rather than spam or a political opinion?
- Authors solicit reviews, as we did for the Barosaurus preprint, and interested others can offer unsolicited reviews.
- Reviewers assign numeric scores to manuscripts as well as giving opinions in prose.
- The weight given to review scores is affected by the reputation of reviewers.
- The reputation of reviewers is affected by other users’ judgements about their comments, and also by their reputation as authors.
- A stable user reputation emerges using a pagerank-like feedback algorithm.
- Users can acquire reputation by authoring, reviewing or both.
- Manuscripts have a reputation based on their score.
- There is no single moment of certification, when a manuscript is awarded a “this is now peer-reviewed” bit.
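As a very rough sketch of the “pagerank-like feedback” idea — with invented toy data, and none of the hard questions about gaming, score weighting or authorship credit addressed — reputation can be computed as the fixed point of endorsements flowing between users:

```python
# Minimal pagerank-style reputation loop over invented endorsement data.
# endorsements[u] lists the users whose reviews u found helpful; reputation
# flows along those endorsements until it stabilises.
endorsements = {
    "alice": ["bob", "carol"],
    "bob":   ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave":  ["alice", "bob", "carol"],
}
users = list(endorsements)
n = len(users)
damping = 0.85                      # standard PageRank damping factor
reputation = {u: 1.0 / n for u in users}

for _ in range(100):
    incoming = {u: 0.0 for u in users}
    for voter, targets in endorsements.items():
        share = reputation[voter] / len(targets)   # split the voter's weight
        for t in targets:
            incoming[t] += share
    reputation = {u: (1 - damping) / n + damping * incoming[u]
                  for u in users}

# Reputation is a probability distribution: it sums to 1 and concentrates
# on users endorsed by other high-reputation users.
for u, r in sorted(reputation.items(), key=lambda kv: -kv[1]):
    print(f"{u}: {r:.3f}")
```

A real system would have to fold in review scores, author reputation, and defences against collusion rings, but the core of every version of the proposal was this kind of self-referential fixed point rather than a one-off editorial verdict.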
I think it’s very possible that, instead of the all-Gold future outlined above, we’ll land up with something like this. Not every detail will work out the way I suggested here, of course, but we may well get something along these lines, where the emphasis is on very rapid initial publication and continuously acquired reputation, and not on a mythical and misleading “this paper is peer-reviewed” stamp.
(There are a hundred questions to be asked and answered about such systems: do we want one big system, or a network of many? If the latter, how will they share reputation data? How will the page-rank-like reputation algorithm work? Will it need to be different in different fields of scholarship? I don’t want to get sidetracked by such issues at this point, but I do want to acknowledge that they exist.)
Is this “Green open access”? It’s not what we usually mean by the term; but in as much as it’s about scholars depositing their own work in archives, yes, it’s Green OA in a broader sense.
(I think some confusion arises because we’ve got into the habit of calling deposited manuscripts “preprints”. That’s a misnomer on two counts: they’re not printed, and they needn’t be pre-anything. Manuscripts in arXiv may go on to be published in journals, but that’s not necessary for them to be useful in advancing scholarship.)
So where now? We have two possible open-access futures, one based on open-access publishing and one based on open-access self-archiving. For myself, I would be perfectly happy with either of these futures — I’m not particularly clear in my own mind which is best, but they’re both enormously better than what we have today.
A case can be made that the Green-based future is maybe a better place to arrive, but that the Gold-based future makes for an easier transition. It doesn’t require researchers to do anything fundamentally different from what they do today, only to do it in open-access journals; whereas the workflow in the Green-based approach outlined above would be a more radical departure. (Ironically, this is the opposite of what has often been said in the past: that the advantage of Green is that it offers a more painless upgrade path for researchers not sold on the importance of OA. That’s only true so long as Green is, in Eisen’s terms, “parasitic” — that is, so long as the repositories contain only second-class versions of papers that have been published conventionally behind paywalls.)
In my own open-access advocacy, then, I’m always unsure whether to push Gold or Green. In my Richard Poynder interview, when asked “What should be the respective roles of Green and Gold OA?” I replied:
This actually isn’t an issue that I get very excited about: Open is so much more important than Green or Gold. I suppose I slightly prefer Gold in that it’s better to have one single definitive version of each article; but then we could do that with Green as well if only we’d stop thinking of it as a stopgap solution while the “real” article remains behind paywalls.
Two and a half years on, I pretty much stand by that (and also by the caveats regarding the RCUK policy’s handling of Gold and Green that followed this quote in the interview).
But I’m increasingly persuaded that the variety of Green OA that we only get by the grace and favour of the legacy publishers is not a viable long-term strategy. Elsevier’s new regressive policy was always going to come along eventually, and it won’t be the last shot fired in this war. If Green is going to win the world, it will be by pulling away from conventional journals and establishing itself as a valid mode of publication in its own right. (Again, much as arXiv has done.)
Here’s my concern, though. Paul Royser’s response to Eisen’s post was “Distressing to see the tone and rancor of OA advocates in disagreement. My IR is a ‘parasite’? Really?” Now, I think that comment was based on a misunderstanding of Eisen’s post (and maybe only on a reading of its title), but the very fact that such a misunderstanding was possible should give us pause.
Richard Poynder’s reading later in the same thread was also cautionary: “Elsevier will hope that the push back will get side-tracked by in-fighting … I think it will take comfort if the OA movement starts in-fighting instead of pushing back.”
Folks, let’s not fall for that.
We all know that Stevan Harnad, among many others, is committed to Green; and that Mike Eisen, among many others, has a huge investment in Gold. We can, and should, have rigorous discussions about the strengths and weaknesses of both approaches. We should expect that OA advocates who share the same goal but have different backgrounds will differ over tactics, and sometimes differ robustly.
But there’s a world of difference between differing robustly and differing rancorously. Let’s all (me included) be sure we stay on the right side of that line. Let’s keep it clear in our minds who the enemy is: not people who want to use a different strategy to free scholarship, but those who want to keep it locked up.
And here ends my uncharacteristic attempt to position myself as The Nice, Reasonable One in this discussion — a role much better suited to Peter Suber or Stephen Curry, but it looks like I got mine written first :-)
May 19, 2015
Somehow this seems to have slipped under the radar: National Science Foundation announces plan for comprehensive public access to research results. They put it up on 18 March, two whole months ago, so our apologies for not having said anything until now!
This is the NSF’s rather belated response to the OSTP memo on Open Access, back in January 2013. This memo required all Federal agencies that spend $100 million in research and development each year to develop OA policies, broadly in line with the existing one of the NIH which gave us PubMed Central. Various agencies have been turning up with policies, but for those of us in palaeo, the NSF’s the big one — I imagine it funds more palaeo research than all the others put together.
So far, so awesome. But what exactly is the new policy? The press release says papers must “be deposited in a public access compliant repository and be available for download, reading and analysis within one year of publication”, but says nothing about what repository should be used. It’s lamentable that a full year’s embargo has been allowed, but at least the publishers’ CHORUS land-grab hasn’t been allowed to hobble the whole thing.
There’s a bit more detail here, but again it’s oddly coy about where the open-access works will be placed: it just says they must be “deposited in a public access compliant repository designated by NSF”. The executive summary of the actual plan also refers only to “a designated repository”.
Only in the full 31-page plan itself does the detail emerge. From page 5:
In the initial implementation, NSF has identified the Department of Energy’s PAGES (Public Access Gateway for Energy and Science) system as its designated repository and will require NSF-funded authors to upload a copy of their journal articles or juried conference paper to the DOE PAGES repository in the PDF/A format, an open, non-proprietary standard (ISO 19005-1:2005). Either the final accepted version or the version of record may be submitted. NSF’s award terms already require authors to make available copies of publications to the Cognizant Program Officers as part of the current reporting requirements. As described more fully in Sections 7.8 and 8.2, NSF will extend the current reporting system to enable automated compliance.
Future expansions, described in Section 7.3.1, may provide additional repository services. The capabilities offered by the PAGES system may also be augmented by services offered by third parties.
So what is good and bad about this?
Good. It makes sense to me that they’re re-using an existing system rather than wasting resources and increasing fragmentation by building one of their own.
Bad. It’s a real shame that they mandate the use of PDF, “the hamburger that we want to turn back into a cow”. It’s a terrible format for automated analysis, greatly inferior to the JATS XML format used by PubMed Central. I don’t understand this decision at all.
Then on page 9:
In the initial implementation, NSF has identified the DOE PAGES system to support managing journal articles and juried conference papers. In the future, NSF may add additional partners and repository services in a federated system.
I’m not sure where this points. In an ideal world, it would mean some kind of unifying structure between PAGES and PubMed Central and whatever other repositories the various agencies decide to use.
Anyone else have thoughts?
Over on Google+, Peter Suber comments on this post. With his permission, I reproduce his observations here:
My short take on the policy’s weaknesses:
- will use Dept of Energy PAGES, which at least for DOE is a dark archive pointing to live versions at publisher web sites
- plans to use CHORUS (p. 13) in addition to DOE PAGES
- requires PDF
- silent on open licensing
- only mentions reuse for data (pp. v, 18), not articles, and only says it will explore reuse
- silent on reuse for articles even tho it has a license (p. 10) authorizing reuse
- silent on the timing of deposits
I agree with you that a 12 month embargo is too long. But that’s the White House recommended default. So I blame the White House for this, not NSF.
To be more precise, PAGES favors publisher-controlled OA in one way, and CHORUS does it in another way. Both decisions show the effect of publisher lobbying on the NSF, and its preference for OA editions hosted by publishers, not OA editions hosted by sites independent of publishers.
So all in all, the NSF policy is much less impressive than I’d initially thought and hoped.
In response to my post Copyright from the lens of reality and other rebuttals of his original post, Elsevier’s General Counsel Mark Seeley has provided a lengthy comment. Here’s my response (also posted as a comment on the original article, though I’m waiting for it to be moderated).
Hi, Mark, thanks for engaging. You write:
With respect to the societal bargain, I would simply note that, in my view, the framers believed that by providing rights they would encourage creative works, and that this benefits society as a whole.
Here, at least, we are in complete agreement. Where we part company is that in my view the Eldred v. Ashcroft decision (essentially that copyright terms can be increased indefinitely) was a travesty of the original intent of copyright, and clearly intended for the benefit of copyright holders rather than that of society in general. (I further note in passing that those copyright holders are only rarely the creative people themselves, but rights-holding corporations whose creative contribution is negligible.)
[Journal] services and competencies need to be supported through a business model, however, and in the mixed economy that we have at the moment, this means that many journals will continue to need subscription and purchase models.
This is a circular argument. It comes down to “we use restrictive copyright on scholarly works at present, so we therefore need to continue to do so”. In fact, this is not an argument at all, merely an assertion. If you want it to stick, you need to demonstrate that the present “mixed economy” is a good thing — something that is very far from evident.
The alternatives to a sound business model rooted in copyright are in my view unsustainable. I worry about government funding, patronage from foundations, or funding by selling t-shirts—I am not sure that these are viable, consistent or durable. Governments and foundations can change their priorities, for example.
If governments and foundations decide to stop funding research, we’re all screwed, and retention of copyright on the papers we’re no longer able to research and write will be the least of our problems. The reality is that virtually everyone in research is already dependent on governments and foundations for the 99% of their funding that covers all the work before the final step of publication. Taking the additional step of relying on those same sources for the last 1% of funding is eminently sensible.
On Creative Commons licences, I don’t think we have any material disagreement.
Now we come to the crucial question of copyright terms (already alluded to via Eldred v. Ashcroft above). You contend:
Copyright law was most likely an important spur for the author or publisher to produce and distribute the work [that is now in the public domain] in the first place.
In principle, I agree — as of course did the framers of the US Constitution and other lawmakers who have passed copyright laws. But as you well know, the US’s original copyright act of 1790, which stated its purpose as “encouragement of learning”, offered a term of 14 years, with an optional renewal of a further 14 years if the author was still alive at the end of the initial term. This 14-year term was considered quite sufficient to incentivise the creation of new works. The intent of the present law seems to be that authors who have been dead for 70 years still need to receive royalties for their works, and in the absence of such royalties would not have created them in the first place. This is self-evident nonsense. No author in the history of the world ever said “I would have written a novel if I’d continued to receive royalties until 70 years after my death, but since royalties will only last 28 years I’m not going to bother”.
But — and this can’t be stated strongly enough — even if there were some justification for the present ridiculous copyright terms in the area of creative works, it would still say nothing whatsoever about the need to copyright scientific writing. No scientific researcher ever wrote a paper who would not have written it in the absence of copyright. That’s what we’re talking about here. One of the tragedies of copyright is that it’s been extruded from a domain where it has some legitimate purpose into a domain where it has none.
The Budapest Open Access Initiative said it best and most clearly: “the only role for copyright in this domain [scholarly research] should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited”. (And several of the BOAI signatories have expressed regret over even the controlling-integrity-of-the-work part of this.)
See also David Roberts’ response to Seeley’s posting.