June 30, 2015
You know what’s wrong with scholarly publishing?
Wait, scrub that question. We’ll be here all day. Let me cut straight to the chase and tell you the specific problem with scholarly publishing that I’m thinking of.
There’s nowhere to go to find all open-access papers, to download their metadata, to access it via an open API, to find out what’s new, to act as a platform for the development of new tools. Yes, there’s PubMed Central, but that’s only for work funded by the NIH. Yes, there’s Google Scholar, but that has no API, and at any moment could go the way of Google Wave and Google Reader when Google loses interest.
Instead, we have something like 4000 repositories out there, balkanised by institution, by geographical region, and by subject area. They have different UIs, different underlying data models, different APIs (if any). They’re built on different software platforms. It’s a jungle out there!
As researchers, we don’t need 4000 repos. You know what we need? One Repo.
Hey! That would be a good name for a project!
I’ve mentioned before how awesome and pro-open my employers, Index Data, are. (For those who are not regular readers, I’m a palaeontologist only in my spare time. By day, I’m a software engineer.) Now we’re working on an index of green/gold OA publishing. Metadata of every article across every repository and publisher. We want it to be complete, in the sense that we will be going aggressively for the long tail as opposed to focusing on some region or speciality, or things that are easily harvestable by OAI-PMH or other standards. We want it to be of a high, consistent quality in terms of metadata. We want it to be up to date. And most importantly, we want it to be fully open for all and any kind of re-use, by any other actor. This will include downloadable data files, OAI-PMH access, search-retrieve web services, embeddable widgets and more. We also envisage a Linked Data representation with a CRUD interface that allows third parties to contribute supplemental information, entity reconciliation, tagging, etc.
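For readers who haven’t met OAI-PMH, here is a minimal sketch of what harvesting a repository over it involves. You request `ListRecords` with a metadata format (Dublin Core, `oai_dc`, is the one every OAI-PMH repository is required to support), read the records, and then follow the `resumptionToken` until the repository sends back an empty one. This is a hypothetical Python illustration, not Index Data’s harvester. The base URL and sample response are made up, and network access and error handling are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, resumption_token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request. A resumptionToken is an exclusive
    argument, so follow-up requests must not also send metadataPrefix."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, token). token is None when the list is complete."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI + "record"):
        identifier = record.findtext(OAI + "header/" + OAI + "identifier")
        titles = [t.text for t in record.iter(DC + "title")]
        records.append({"identifier": identifier, "titles": titles})
    token_el = root.find(OAI + "ListRecords/" + OAI + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token


# A made-up, minimal ListRecords response with one record and a next-page token.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
print(records, token)
print(list_records_url("https://repo.example.org/oai", token))
```

Part of the “jungle” problem is that many repositories don’t expose a clean endpoint like this at all. That is why the long tail needs more than a standard harvester.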
Instead of 4000 fragments, one big, meaty chunk of data.
Because we at Index Data have spent the last ten years helping aggregators and publishers and others get access to difficult-to-access information through all kinds of crazy mechanisms, we have a unique combination of the skills, the tools, and the desire to pursue this venture.
So The One Repo is born. At the moment, we have:
- Harvesting set up for an initial set of 20 repositories.
- A demonstrator of one possible UI.
- A whitepaper describing the motivation and some of the technical aspects.
- A blog about the project’s progress.
- An advisory board of some of the brightest, most experienced and wisest people in the world of open access.
We’ve been flying under the radar for the last month and a bit. Now we’re ready for the world to know what we’re up to.
The One Repo is go!
May 29, 2015
[I am using the term “megajournal” here to mean “journal that practices PLOS ONE-style peer-review for correctness only, ignoring guesses at possible impact”. It’s not a great term for this class of journals, but it seems to be becoming established as the default.]
Bo-Christer Björk’s (2015) new paper in PeerJ asks the question “Have the ‘mega-journals’ reached the limits to growth?”, and suggests that the answer may be yes. (Although, frustratingly, you can’t tell from the abstract that this is the conclusion.)
I was a bit disappointed that the paper didn’t include a graph showing its conclusion, and asked about this (thanks to PeerJ’s lightweight commenting system). Björk’s response acknowledged that a graph would have been helpful, and invited me to go ahead and make one, since the underlying data is freely available. So using OpenOffice’s cumbersome but adequate graphing facilities, I plotted the numbers from Björk’s table 3.
As we can see, the result for total megajournal publications upholds the conclusion that megajournals have peaked and started to decline. But PLOS ONE (the dark blue line) enormously dominates all the other megajournals, with Nature’s Scientific Reports the only other publication to even be meaningfully visible on the graph. Since Scientific Reports seems to be still in the exponential phase of its growth and everything else is too low-volume to register, what we’re really seeing here is just a decline in PLOS ONE volume.
It’s interesting to think about what the fall-off in PLOS ONE volume means, but it’s certainly not the same thing as megajournals having topped out.
What do we see when we expand the lower part of the graph by taking out PLOS ONE and Scientific Reports?
Here, the picture is more confused. The numbers are dominated by BMJ Open, which is still growing, but its growth has levelled off. Springer Plus grew quickly, but seems to be falling away — perhaps reflecting an initial push, followed by author apathy for a megajournal run by a legacy publisher. AIP Advances (which I admit I’d not heard of) and SAGE Open both seem to have modest but healthy year-on-year growth. And of course PeerJ is growing fast, but it’s too young for us to have a meaningful sense of the trend.
What does it all mean?
The STM Report for 2015 (Ware and Mabe 2015) estimates that 2.5 million scholarly articles were published in English-language journals in 2014 (page 6). Björk’s data tells us that only 38 thousand of those were in megajournals — that’s less than 1/65th of all the articles. I find it very hard to believe that 1.5% of the total scholarly article market represents saturation for megajournals.
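The two figures quoted above (2.5 million articles in total, 38 thousand of them in megajournals) give the megajournal share directly. The calculation below shows that “less than 1/65th” and “1.5%” describe the same proportion.

```latex
\frac{38\,000}{2\,500\,000} = 0.0152 \approx 1.5\%,
\qquad
\frac{1}{65} \approx 0.0154,
\qquad\text{so}\qquad
0.0152 < \frac{1}{65}
```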
I suspect that what this study really shows us — and I’m sure the PLOS people would be the first to agree with this — is that we need a lot more megajournals out there than just PLOS ONE. Specifically:
- It’s well established that pure-OA journals offer better value for their APCs than hybrid ones.
- It’s at least strongly suspected (has there been a study?) that OA megajournals offer better value than selective OA journals.
- We want to get the APCs of OA megajournals down.
- PLOS ONE needs competition on price, to force down its increasingly unjustifiable APC of $1350.
- It’s a real shame that the eLIFE people have fallen into the impact-chasing trap and show no interest in running an eLIFE megajournal.
- I think the usually reliable Zen Faulkes is dead wrong when he writes off what he calls “Zune journals”.
So the establishment of new megajournals is very much a good thing, and their growth is to be encouraged. Many of the newer megajournals may well find (and I hate to admit this) that their submission rates increase when they’re handed their first impact factor, as happened with PLOS ONE.
- Björk, Bo-Christer. 2015. Have the “mega-journals” reached the limits to growth? PeerJ 3:e981. doi:10.7717/peerj.981
- Ware, Mark, and Michael Mabe. 2015. The STM Report: an overview of scientific and scholarly journal publishing. Fourth Edition, March 2015. International Association of Scientific, Technical and Medical Publishers, The Hague, Netherlands. 180 pages.
May 26, 2015
Provoked by Mike Eisen’s post today, The inevitable failure of parasitic green open access, I want to briefly lay out the possible futures of scholarly publishing as I see them. There are two: one based on what we now think of as Gold OA, and one on what we think of as Green OA.
Eisen is of course quite right that the legacy publishers only ever gave their blessing to Green OA (self-archiving) so long as they didn’t see it as a threat, so the end of that blessing isn’t a surprise. (I think it’s this observation that Richard Poynder misread as “an OA advocate legitimising Elsevier’s action”!) It was inevitable that this blessing would be withdrawn as Green started to become more mainstream — and that’s exactly what we’ve seen, with Elsevier responding to the global growth in Green OA mandates with a regressive new policy that has rightly attracted the ire of everyone who’s looked closely at it.
So I agree with him that what he terms “parasitic Green OA” — self-archiving alongside the established journal system — is ultimately doomed. The bottom line is that while we as a community continue to give control of our work to the legacy publishers — follow closely here — legacy publishers will control our work. We know that these corporations’ interests are directly opposed to those of authors, science, customers, libraries, and indeed everyone but themselves. So leaving them in control of the scholarly record is unacceptable.
What are our possible futures?
We may find that in ten years’ time, all subscription journals are gone (perhaps except for a handful of boutique journals that a few people like, just as a few people prefer the sound of vinyl over CDs or MP3s).
We may find that essentially all new scholarship is published in open-access journals such as those of BioMed Central, PLOS, Frontiers and PeerJ. That is a financially sustainable path, in that publishers will be paid for the services they provide through APCs. (No doubt, volunteer-run and subsidised zero-APC journals will continue to thrive alongside them, as they do today.)
We may even find that some of the Gold OA journals of the future are run by organisations that are presently barrier-based publishers. I don’t think it’s impossible that some rump of Elsevier, Springer et al. will survive the coming subscription-journals crash, and go on to compete on the level playing-field of Gold OA publishing. (I think they will struggle to compete, and certainly won’t be able to make anything like the kind of money they do now, but that’s OK.)
This is the Gold-OA future that Mike Eisen is pinning his hopes on — and which he has done as much as anyone alive to bring into existence. I would be very happy with that outcome.
While I agree with Eisen that what he terms “parasitic Green” can’t last (legacy publishers will clamp down on it as soon as it starts to be truly useful), I do think there is a possible Green-based future. It just doesn’t involve traditional journals.
One of the striking things about the Royal Society’s recent Future of Scholarly Scientific Communication meetings was that during the day-two breakout session, so many of the groups independently came up with more or less the same proposal. The system that Dorothy Bishop expounded in the Guardian after the second meeting is also pretty similar — and since she wasn’t at the first meeting, I have to conclude that she also came up with it independently, further corroborating the sense that it’s an approach whose time has come.
(In fact, I started drafting an SV-POW! post myself at that meeting, describing the system that our break-out group came up with. But that was before all the other groups revealed their proposals, and it became apparent that ours was part of a blizzard, rather than a unique snowflake.)
Here are the features characterising the various systems that people came up with. (Not all of these features were in all versions of the system, but they all cropped up more than once.)
- It’s based around a preprint archive: as with arXiv, authors can publish manuscripts there after only basic editorial checks: is this a legitimate attempt at scholarship, rather than spam or a political opinion?
- Authors solicit reviews, as we did for our Barosaurus preprint, and interested others can offer unsolicited reviews.
- Reviewers assign numeric scores to manuscripts as well as giving opinions in prose.
- The weight given to review scores is affected by the reputation of reviewers.
- The reputation of reviewers is affected by other users’ judgements about their comments, and also by their reputation as authors.
- A stable user reputation emerges using a PageRank-like feedback algorithm.
- Users can acquire reputation by authoring, reviewing or both.
- Manuscripts have a reputation based on their score.
- There is no single moment of certification, when a manuscript is awarded a “this is now peer-reviewed” bit.
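To make the reputation idea more concrete, here is a toy Python sketch. It is not any break-out group’s actual proposal, and the details are assumptions. Reputation flows along weighted “endorsements” between users (hypothetically, upvotes on someone’s reviews or high scores on their manuscripts), and a stable reputation is found by PageRank-style power iteration with a damping factor. A manuscript’s score is then the reputation-weighted mean of its review scores. The function names, the damping value of 0.85 and the example users are all illustrative.

```python
def reputation(endorsements, damping=0.85, iterations=200, tol=1e-12):
    """Toy PageRank-style reputation.

    endorsements maps each user to {other_user: weight}; the weights are a
    hypothetical measure of how strongly one user endorses another.
    Returns {user: reputation}, with reputations summing to 1.
    """
    users = set(endorsements)
    for targets in endorsements.values():
        users.update(targets)
    users = sorted(users)
    if not users:
        return {}
    n = len(users)
    rep = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Every user gets a small baseline share (the "teleport" term).
        new = {u: (1.0 - damping) / n for u in users}
        dangling = 0.0
        for u in users:
            out = endorsements.get(u, {})
            total = sum(out.values())
            if total <= 0:
                # A user who endorses nobody has their weight spread evenly.
                dangling += rep[u]
                continue
            for v, w in out.items():
                new[v] += damping * rep[u] * w / total
        for u in users:
            new[u] += damping * dangling / n
        change = sum(abs(new[u] - rep[u]) for u in users)
        rep = new
        if change < tol:
            break
    return rep


def manuscript_score(reviews, rep):
    """Reputation-weighted mean of (reviewer, score) pairs."""
    total = sum(rep.get(r, 0.0) for r, _ in reviews)
    if total == 0:
        raise ValueError("no reviewer has any reputation")
    return sum(rep.get(r, 0.0) * s for r, s in reviews) / total


# Hypothetical example: who has endorsed whom, and by how much.
ENDORSEMENTS = {
    "alice": {"bob": 2.0, "carol": 1.0},
    "bob": {"alice": 1.0},
    "carol": {"alice": 1.0, "bob": 1.0},
}
rep = reputation(ENDORSEMENTS)
print(rep)
print(manuscript_score([("alice", 8.0), ("carol", 4.0)], rep))
```

In this example Alice receives the most endorsement, so she ends up with the highest reputation. Her review therefore pulls the manuscript’s score towards her rating of 8, and away from Carol’s 4.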
I think it’s very possible that, instead of the all-Gold future outlined above, we’ll land up with something like this. Not every detail will work out the way I suggested here, of course, but we may well get something along these lines, where the emphasis is on very rapid initial publication and continuously acquired reputation, and not on a mythical and misleading “this paper is peer-reviewed” stamp.
(There are a hundred questions to be asked and answered about such systems: do we want one big system, or a network of many? If the latter, how will they share reputation data? How will the PageRank-like reputation algorithm work? Will it need to be different in different fields of scholarship? I don’t want to get sidetracked by such issues at this point, but I do want to acknowledge that they exist.)
Is this “Green open access”? It’s not what we usually mean by the term; but in as much as it’s about scholars depositing their own work in archives, yes, it’s Green OA in a broader sense.
(I think some confusion arises because we’ve got into the habit of calling deposited manuscripts “preprints”. That’s a misnomer on two counts: they’re not printed, and they needn’t be pre-anything. Manuscripts in arXiv may go on to be published in journals, but that’s not necessary for them to be useful in advancing scholarship.)
So where now? We have two possible open-access futures, one based on open-access publishing and one based on open-access self-archiving. For myself, I would be perfectly happy with either of these futures — I’m not particularly clear in my own mind which is best, but they’re both enormously better than what we have today.
A case can be made that the Green-based future is maybe a better place to arrive, but that the Gold-based future makes for an easier transition. It doesn’t require researchers to do anything fundamentally different from what they do today, only to do it in open-access journals; whereas the workflow in the Green-based approach outlined above would be a more radical departure. (Ironically, this is the opposite of what has often been said in the past: that the advantage of Green is that it offers a more painless upgrade path for researchers not sold on the importance of OA. That’s only true so long as Green is, in Eisen’s terms, “parasitic” — that is, so long as the repositories contain only second-class versions of papers that have been published conventionally behind paywalls.)
In my own open-access advocacy, then, I’m always unsure whether to push Gold or Green. In my Richard Poynder interview, when asked “What should be the respective roles of Green and Gold OA?” I replied:
This actually isn’t an issue that I get very excited about: Open is so much more important than Green or Gold. I suppose I slightly prefer Gold in that it’s better to have one single definitive version of each article; but then we could do that with Green as well if only we’d stop thinking of it as a stopgap solution while the “real” article remains behind paywalls.
Two and a half years on, I pretty much stand by that (and also by the caveats regarding the RCUK policy’s handling of Gold and Green that followed this quote in the interview).
But I’m increasingly persuaded that the variety of Green OA that we only get by the grace and favour of the legacy publishers is not a viable long-term strategy. Elsevier’s new regressive policy was always going to come along eventually, and it won’t be the last shot fired in this war. If Green is going to win the world, it will be by pulling away from conventional journals and establishing itself as a valid mode of publication in its own right. (Again, much as arXiv has done.)
Here’s my concern, though. Paul Royster’s response to Eisen’s post was “Distressing to see the tone and rancor of OA advocates in disagreement. My IR is a ‘parasite’? Really?” Now, I think that comment was based on a misunderstanding of Eisen’s post (and maybe only on reading the title), but the very fact that such a misunderstanding was possible should give us pause.
Richard Poynder’s reading later in the same thread was also cautionary: “Elsevier will hope that the push back will get side-tracked by in-fighting … I think it will take comfort if the OA movement starts in-fighting instead of pushing back.”
Folks, let’s not fall for that.
We all know that Stevan Harnad, among many others, is committed to Green; and that Mike Eisen, among many others, has a huge investment in Gold. We can, and should, have rigorous discussions about the strengths and weaknesses of both approaches. We should expect that OA advocates who share the same goal but have different backgrounds will differ over tactics, and sometimes differ robustly.
But there’s a world of difference between differing robustly and differing rancorously. Let’s all (me included) be sure we stay on the right side of that line. Let’s keep it clear in our minds who the enemy is: not people who want to use a different strategy to free scholarship, but those who want to keep it locked up.
And here ends my uncharacteristic attempt to position myself as The Nice, Reasonable One in this discussion — a role much better suited to Peter Suber or Stephen Curry, but it looks like I got mine written first :-)
May 19, 2015
Somehow this seems to have slipped under the radar: National Science Foundation announces plan for comprehensive public access to research results. They put it up on 18 March, two whole months ago, so our apologies for not having said anything until now!
This is the NSF’s rather belated response to the OSTP memo on Open Access, back in January 2013. That memo required every Federal agency spending more than $100 million a year on research and development to develop an OA policy, broadly in line with the NIH’s existing one, which gave us PubMed Central. Various agencies have been turning up with policies, but for those of us in palaeo, the NSF’s the big one: I imagine it funds more palaeo research than all the others put together.
So far, so awesome. But what exactly is the new policy? The press release says papers must “be deposited in a public access compliant repository and be available for download, reading and analysis within one year of publication”, but says nothing about what repository should be used. It’s lamentable that a full year’s embargo has been allowed, but at least the publishers’ CHORUS land-grab hasn’t been allowed to hobble the whole thing.
There’s a bit more detail here, but again it’s oddly coy about where the open-access works will be placed: it just says they must be “deposited in a public access compliant repository designated by NSF”. The executive summary of the actual plan also refers only to “a designated repository”.
Only in the full 31-page plan itself does the detail emerge. From page 5:
In the initial implementation, NSF has identified the Department of Energy’s PAGES (Public Access Gateway for Energy and Science) system as its designated repository and will require NSF-funded authors to upload a copy of their journal articles or juried conference paper to the DOE PAGES repository in the PDF/A format, an open, non-proprietary standard (ISO 19005-1:2005). Either the final accepted version or the version of record may be submitted. NSF’s award terms already require authors to make available copies of publications to the Cognizant Program Officers as part of the current reporting requirements. As described more fully in Sections 7.8 and 8.2, NSF will extend the current reporting system to enable automated compliance.
Future expansions, described in Section 7.3.1, may provide additional repository services. The capabilities offered by the PAGES system may also be augmented by services offered by third parties.
So what is good and bad about this?
Good. It makes sense to me that they’re re-using an existing system rather than wasting resources and increasing fragmentation by building one of their own.
Bad. It’s a real shame that they mandate the use of PDF, “the hamburger that we want to turn back into a cow”. It’s a terrible format for automated analysis, greatly inferior to the JATS XML format used by PubMed Central. I don’t understand this decision at all.
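Here is a minimal Python sketch showing why structured XML suits automated analysis better than PDF. Using only the standard library, it extracts the title and author names from a JATS-style fragment. The fragment is simplified and hypothetical: real JATS files from PubMed Central carry a DOCTYPE, namespaces such as xlink, and much richer metadata. The point is that every field is explicitly labelled. Getting the same information out of a PDF means guessing from page layout.

```python
import xml.etree.ElementTree as ET

# A simplified, made-up fragment using JATS element names.
JATS_SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>A hypothetical sauropod paper</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Smith</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Jones</surname><given-names>Alex</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>"""


def jats_metadata(xml_text):
    """Pull the article title and author names out of JATS front matter."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    title = meta.findtext("title-group/article-title")
    authors = [
        f"{c.findtext('name/given-names')} {c.findtext('name/surname')}"
        for c in meta.findall("contrib-group/contrib[@contrib-type='author']")
    ]
    return {"title": title, "authors": authors}


print(jats_metadata(JATS_SAMPLE))
```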
Then on page 9:
In the initial implementation, NSF has identified the DOE PAGES system to support managing journal articles and juried conference papers. In the future, NSF may add additional partners and repository services in a federated system.
I’m not sure where this points. In an ideal world, it would mean some kind of unifying structure between PAGES and PubMed Central and whatever other repositories the various agencies decide to use.
Anyone else have thoughts?
Over on Google+, Peter Suber comments on this post. With his permission, I reproduce his observations here:
My short take on the policy’s weaknesses:
- will use Dept of Energy PAGES, which at least for DOE is a dark archive pointing to live versions at publisher web sites
- plans to use CHORUS (p. 13) in addition to DOE PAGES
- requires PDF
- silent on open licensing
- only mentions reuse for data (pp. v, 18), not articles, and only says it will explore reuse
- silent on reuse for articles even tho it has a license (p. 10) authorizing reuse
- silent on the timing of deposits
I agree with you that a 12 month embargo is too long. But that’s the White House recommended default. So I blame the White House for this, not NSF.
To be more precise, PAGES favors publisher-controlled OA in one way, and CHORUS does it in another way. Both decisions show the effect of publisher lobbying on the NSF, and its preference for OA editions hosted by publishers, not OA editions hosted by sites independent of publishers.
So all in all, the NSF policy is much less impressive than I’d initially thought and hoped.
May 18, 2015
Matt drew my attention to an old paper I’d not seen before: Riggs (1903) on the vertebral column of Brontosaurus. The page I linked there shows only the first page (which in fact is half a page, since Riggs’ work is only in the right column).
Why only the first page? As Matt put it, “It’s been 110 years, just give us the PDF already. And they wonder (do they wonder?) why people don’t rush to embrace their stumbling broken halting limping steps toward OA.”
That’s exactly right. AAAS allows anyone to read the old Science papers anyway (good for them, as far as it goes), so why all the poxing about with registration? Just make it actual open access, as if you were good guys.
So, two observations, as promised.
First, here’s Matt’s observation: even making users register betrays a way of thinking wrongly about the material. It says, “This is ours but you can see it if you’ll jump through our hoops. Because it is ours.” Whereas real OA outlets say, “Hey, this is yours now, do what you want.”
And here’s mine: I sometimes wonder whether we’re headed for a world where the meaningful scientific literature runs from 1660 to 1923 and from 2010 onwards, with a big gap from 1924 to 2009 that just gets ignored. That is the literature that is not old enough to be out of copyright but not new enough to be OA.
April 28, 2015
[This is a guest-post by Richard Poynder, a long-time observer and analyst of academic publishing now perhaps best known for the very detailed posts on his Open and Shut blog. It was originally part of a much longer post on that blog, the introduction to an interview with the publisher MDPI. I’m pleased to reproduce it here with Richard’s kind permission — Mike.]
In light of the current lack of information available to enable us to adequately judge the activities of scholarly publishers, or to evaluate the rigour of the publication process that research papers undergo, should not both scholarly publishers and the research community be committing themselves to much greater transparency than we see today?
For instance, should not open peer review now be the norm? Should not the reviews and the names of reviewers be routinely published alongside papers? Should not the eligibility criteria and application procedures for obtaining APC waivers be routinely published on a journal’s web site, along with regularly updated data on how many waivers are being granted? Should not publishers be willing to declare the nature and extent of the unsolicited email campaigns they engage in in order to recruit submissions?
Should not the full details of “big deals” and hybrid OA “offsetting agreements” be made publicly available? Should not publishers be more transparent about why they charge what they charge for APCs? Should not publishers be more transparent about their revenues and profits? For instance, should not privately owned publishers make their accounts available online (even where there is no legal obligation to do so), and should not public companies provide more detailed information about the money they earn from publicly-funded research and exactly how it was earned? And should not publishers whose revenue comes primarily from the public purse be entirely open about who owns the company, and where it is based?
Should not the research community refuse to deal with publishers unwilling to do all the above? Did not US Justice Louis D. Brandeis have a point when he said, “Sunlight is said to be the best of disinfectants; electric light the most efficient policeman”?
Copied from an email exchange.
Did we know about the Royal Society’s PLOS ONE-clone?
I am in favour of this. I might well send them my next paper while the universal waiver is still in place.
Did not know about it. Their post-waiver APC is insane. How can they possibly justify $1600?
Well, I am obviously not a big fan of a $1600 APC; but it’s not a great deal more than PLOS ONE, and much less than PLOS Biology/Medicine.
But I think we’re converging on the idea that you can make a living running journals that charge $500 — see Ubiquity Press at http://www.ubiquitypress.com/site/publish/ – so I think anyone charging more than that has to explain why. In the case of the Royal Society, I assume it’s to fund their other activities; I am assured that I could get a waiver anyway, since I lack funding.
But are you saying you definitely won’t publish there even during the $0 phase?
Matt (with Mike’s previous post quoted):
Well, I am obviously not a big fan of a $1600 APC; but it’s not a great deal more than PLOS ONE
and much less than PLOS Biology/Medicine.
But I think we’re converging on the idea that you can make a living running journals that charge $500 – see Ubiquity Press at http://www.ubiquitypress.com/site/publish/ – so I think anyone charging more than that has to explain why. In the case of the Royal Society, I assume it’s to fund their other activities;
I am assured that I could get a waiver anyway, since I lack funding.
But are you saying you definitely won’t publish there even during the $0 phase?
Yes, APCs should be pushing downwards all the time now. I agree that the Royal Society coming in at a level above PLOS ONE doesn’t look good — indeed PLOS ONE’s own $1350 is also looking increasingly unfashionable in the light of (A) Ubiquity providing essentially the same service for 37% of the price, and (B) the fact that PLOS now runs at an operating surplus of 27%. To my mind, it’s well past time that PLOS ONE found a way to wind its APC down — really, down into triple figures ($999 would do), though even a nominal reduction of say $50 would send a good message.
You’re absolutely right that Royal Society Open Science is, by design, a PLOS ONE rather than a PLOS Biology: it reviews on correctness alone, not on guesswork about likely impact. So, yes, it’s PLOS ONE’s price-point that’s the correct comparison here.
Where you’re mistaken, though, is in assuming that the Royal Society has shareholders who might be skimming off the cream from the APC. There are none: the Society has nothing else to spend publishing profits on but furthering its scientific mission. (Of course, it doesn’t follow from this that it ought to be seeking to make a profit from publishing at all. It has other sources of income, and presently only 8% of its income is from publishing profits.)
But I hear you on the message sent by acquiescing to a $1600-APC journal, even if that APC is waived. We both want to shift towards a world where there are no journals that charge that kind of money — or at least, that if they do, it’s because they’re the kind of “selective” journal that thinks there’s something praiseworthy about rejecting most scientifically sound submissions. Journals of that kind don’t concern me one way or another, because I just don’t play that game.