Open-access journalist Richard Poynder posted a really good interview today with the Gates Foundation’s Associate Officer of Knowledge & Research Services, Ashley Farley. I feel bad about picking on one fragment of it, but I really can’t let this bit pass:

RP: As you said, Gates-funded research publications must now have a CC BY licence attached. They must also be made OA immediately. Does this imply that the Gates foundation sees no role for green OA? If it does see a role for green OA what is that role?

AF: I wouldn’t say that the foundation doesn’t see value or a role for green open access. However, the policy requires immediate access, reuse and copyright arrangements that green open access does not necessarily provide.

Before I get into this, let me say again that I have enormous admiration for what Ashley Farley and the Gates Foundation are doing for open access, and for open scholarship more widely. But:

The (excellent) Gates policy requires immediate access, reuse and copyright arrangements that gold open access does not necessarily provide, either. It provides them only because the Gates Foundation has quite rightly twisted publishers’ arms, and said you can only have our APCs if you meet our requirements.

And if green open access doesn’t provide immediate access and reuse, then that is because funders have not twisted publishers’ arms to allow this.

It’s perfectly possible to have a Green OA repository in which all the deposited papers are available immediately and licensed under CC BY. It’s perfectly possible for a funder, university or other body to have a green OA policy that mandates this.

But it’s true that no-one seems to have a green OA policy that does this.

Why not?


In a recent blog-post, Kevin Smith tells it like it is: legacy publishers are tightening their grip in an attempt to control scholarly communications. “The same five or six major publishers who dominate the market for scholarly journals are engaged in a race to capture the terms of and platforms for scholarly sharing”, says Smith. “This is a serious threat to academic freedom.”

People can legitimately have different ideas about precisely what it is that Elsevier intends to do with SSRN, now that it’s acquired it. But as we discuss the possible outcomes, we need to keep one principle in mind: it’s simply unrealistic to imagine that Elsevier, in controlling Mendeley and SSRN, will do anything other than what is best for Elsevier.

That’s not a criticism, or even a complaint. It’s a statement of what a for-profit corporation does. It’s in its nature. There’s no need for us to blame Elsevier for this, any more than we blame a fox when it eats a chicken. That’s what it does.

The appropriate response is simply to prevent any more of this kind of thing happening, by taking control of our own scholarly infrastructure.

The big problem with SSRN is the same as the big problem with Mendeley: both are privately owned and for-profit, so their owners were always going to be susceptible to a good enough offer. People who start private companies are looking to make money from them, and a big enough offer from a corporation is an exit strategy that’s difficult to resist. When we entrusted preprints to SSRN, they were always vulnerable to being taken hostage, in a way that arXiv preprints are not.

Again: I am not blaming private companies’ owners for this. It’s in the nature of what a private company is. I recognise that and accept it. The thing is, I interpret it as damage and want to route around it.

So what is the solution?


It’s simple. We, the community, need to own our own infrastructure.

On one level, this is easy. We, the community, know how to do it. We have experience of good and bad infrastructure, and we know the difference. We have excellent, clearly articulated principles for open scholarly infrastructure. We have top-quality software engineers, interaction designers, UI experts and more.

What we don’t have is funding. And that is crippling.

We can’t build and maintain community-owned infrastructure without funding; and (to a first approximation, anyway) no-one is funding it. It’s truly disgraceful that even such a crucial piece of infrastructure as arXiv is constantly struggling for funding. arXiv serves about a million articles per week, and is the primary source of publications in many scientific subfields, yet every year it struggles to bring in the less than a million dollars it costs to run. It’s ridiculous that the Gates Foundation or someone hasn’t come along with a few tens of millions of dollars and set up a long-term endowment to make arXiv secure.

And when even something as proven as arXiv struggles for funding, what chance does anything else have?

The problem seems to be this: funders have a blind spot when it comes to funding infrastructure. That’s why we have no UK national repository; it’s why there is no longer an independent subject repository for social sciences; it’s why the two main preprint archives for bio-medicine (PeerJ Preprints and BioRxiv) are privately owned, and potentially vulnerable to the offer-you-can’t-refuse from Elsevier or one of the other legacy publishers in the oligopoly(*).


When you think about funders — RCUK, Wellcome, NIH, Gates, all of them — they are great at funding research, and terrible at funding the infrastructure that allows it to have actual benefit. Most funders even seem to have specific policies that they won’t fund infrastructure; those that don’t simply lack a way to apply for infrastructure funding. It’s a horribly short-sighted approach, and we’re seeing its inevitable fruit in Elsevier’s accumulation of infrastructure.

We’ll look back at funding bodies in 10 or 20 years and say their single biggest mistake was failing to see the need to fund infrastructure.

Please, funders. Fix this. Make whatever changes you need to make, to ensure that the scholarly community owns and controls its own preprint archives, subject repositories, aggregators, text-mining tools, citation graphs, metrics tools and what have you. We’ve already seen what happens when we cede control of the scholarly record to corporations: spiralling prices, poor-quality product, arbitrary barriers, and the retardation of all progress. Let’s not make the same mistake again with infrastructure.



(*) Actually, I don’t believe PeerJ’s owners would sell their preprint server to Elsevier for any amount of money — and the same may be true of the BioRxiv for all I know, I’ve never spoken with the owners. But who can tell what might happen?

A quick note to say that I got an email today — the University of Bristol Staff Bulletin — announcing some extremely welcome news:


(Admittedly it was only the third item on the bulletin, coming in just after “Staff Parking – application deadline Friday 18 September”, but you can’t have everything.)

This is excellent, and the nitty-gritty details are encouraging, too. Although HEFCE recently wound back its own policy, as a transition-period concession, to requiring deposit only at the time of publication, Bristol has quite properly gone with the more rigorous requirement that accepted manuscripts be deposited at the time of acceptance. This is wise for the university — it’s future-proofed against HEFCE’s eventual move back towards the deposit-on-acceptance policy that it wanted — and it’s good for the wider world, too.



You know what’s wrong with scholarly publishing?

Wait, scrub that question. We’ll be here all day. Let me jump straight to the chase and tell you the specific problem with scholarly publishing that I’m thinking of.

There’s nowhere to go to find all open-access papers, to download their metadata, to access it via an open API, to find out what’s new, to act as a platform for the development of new tools. Yes, there’s PubMed Central, but that’s only for work funded by the NIH. Yes, there’s Google Scholar, but that has no API, and at any moment could go the way of Google Wave and Google Reader when Google loses interest.

Instead, we have something like 4000 repositories out there, balkanised by institution, by geographical region, and by subject area. They have different UIs, different underlying data models, different APIs (if any). They’re built on different software platforms. It’s a jungle out there!


As researchers, we don’t need 4000 repos. You know what we need? One Repo.

Hey! That would be a good name for a project!

I’ve mentioned before how awesome and pro-open my employers, Index Data, are. (For those who are not regular readers, I’m a palaeontologist only in my spare time. By day, I’m a software engineer.) Now we’re working on an index of green/gold OA publishing. Metadata of every article across every repository and publisher. We want it to be complete, in the sense that we will be going aggressively for the long tail as opposed to focusing on some region or speciality, or things that are easily harvestable by OAI-PMH or other standards. We want it to be of a high, consistent quality in terms of metadata. We want it to be up to date. And most importantly, we want it to be fully open for all and any kind of re-use, by any other actor. This will include downloadable data files, OAI-PMH access, search-retrieve web services, embeddable widgets and more. We also envisage a Linked Data representation with a CRUD interface that allows third parties to contribute supplemental information, entity reconciliation, tagging, etc.

Instead of 4000 fragments, one big, meaty chunk of data.
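To give a flavour of the harvesting involved, here’s a minimal sketch of pulling Dublin Core metadata out of a standard OAI-PMH ListRecords response. This is illustrative only, not Index Data’s actual code; the sample record, title and identifier below are made up for the example.

```python
# Minimal sketch of parsing an OAI-PMH ListRecords response for Dublin Core
# metadata. The embedded XML is a hand-made sample, not a real record.
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

SAMPLE_RESPONSE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample article title</dc:title>
          <dc:creator>Taylor, M. P.</dc:creator>
          <dc:identifier>http://example.org/oai/1234</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def parse_records(xml_text):
    """Extract title/creator/identifier lists from a ListRecords payload."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI + "record"):
        records.append({
            "title": [e.text for e in record.iter(DC + "title")],
            "creator": [e.text for e in record.iter(DC + "creator")],
            "identifier": [e.text for e in record.iter(DC + "identifier")],
        })
    return records

if __name__ == "__main__":
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec["title"][0], "-", ", ".join(rec["creator"]))
```

In practice, of course, a harvester also has to page through resumption tokens, cope with repositories whose OAI-PMH endpoints are broken or absent, and normalise wildly inconsistent metadata — which is exactly why the long tail is hard.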


Because we at Index Data have spent the last ten years helping aggregators, publishers and others get access to difficult-to-access information through all kinds of crazy mechanisms, we have a unique combination of the skills, the tools and the desire to pursue this venture.

So The One Repo is born. At the moment, we have:

  • Harvesting set up for an initial set of 20 repositories.
  • A demonstrator of one possible UI.
  • A whitepaper describing the motivation and some of the technical aspects.
  • A blog about the project’s progress.
  • An advisory board of some of the brightest, most experienced and wisest people in the world of open access.

We’ve been flying under the radar for the last month and a bit. Now we’re ready for the world to know what we’re up to.

The One Repo is go!

Provoked by Mike Eisen’s post today, The inevitable failure of parasitic green open access, I want to briefly lay out the possible futures of scholarly publishing as I see them. There are two: one based on what we now think of as Gold OA, and one on what we think of as Green OA.

Eisen is of course quite right that the legacy publishers only ever gave their blessing to Green OA (self-archiving) so long as they didn’t see it as a threat, so the end of that blessing isn’t a surprise. (I think it’s this observation that Richard Poynder misread as “an OA advocate legitimising Elsevier’s action”!) It was inevitable that this blessing would be withdrawn as Green started to become more mainstream — and that’s exactly what we’ve seen, with Elsevier responding to the global growth in Green OA mandates with a regressive new policy that has rightly attracted the ire of everyone who’s looked closely at it.

So I agree with him that what he terms “parasitic Green OA” — self-archiving alongside the established journal system — is ultimately doomed. The bottom line is that while we as a community continue to give control of our work to the legacy publishers — follow closely here — legacy publishers will control our work. We know that these corporations’ interests are directly opposed to those of authors, science, customers, libraries, and indeed everyone but themselves. So leaving them in control of the scholarly record is unacceptable.

What are our possible futures?

Gold bars

We may find that in ten years’ time, all subscription journals are gone (except perhaps for a handful of boutique journals that a few people like, just as a few people prefer the sound of vinyl over CDs or MP3s).

We may find that essentially all new scholarship is published in open-access journals such as those of BioMed Central, PLOS, Frontiers and PeerJ. That is a financially sustainable path, in that publishers will be paid for the services they provide through APCs. (No doubt, volunteer-run and subsidised zero-APC journals will continue to thrive alongside them, as they do today.)

We may even find that some of the Gold OA journals of the future are run by organisations that are presently barrier-based publishers. I don’t think it’s impossible that some rump of Elsevier, Springer et al. will survive the coming subscription-journals crash, and go on to compete on the level playing-field of Gold OA publishing. (I think they will struggle to compete, and certainly won’t be able to make anything like the kind of money they do now, but that’s OK.)

This is the Gold-OA future that Mike Eisen is pinning his hopes on — and which he has done as much as anyone alive to bring into existence. I would be very happy with that outcome.


While I agree with Eisen that what he terms “parasitic Green” can’t last — legacy publishers will stamp down on it as soon as it starts to be truly useful — I do think there is a possible Green-based future. It just doesn’t involve traditional journals.

One of the striking things about the Royal Society’s recent Future of Scholarly Scientific Communication meetings was that during the day-two breakout session, so many of the groups independently came up with more or less the same proposal. The system that Dorothy Bishop expounded in the Guardian after the second meeting is also pretty similar — and since she wasn’t at the first meeting, I have to conclude that she also came up with it independently, further corroborating the sense that it’s an approach whose time has come.

(In fact, I started drafting an SV-POW! post myself at that meeting, describing the system that our break-out group came up with. But that was before all the other groups revealed their proposals, and it became apparent that ours was part of a blizzard rather than a unique snowflake.)

Here are the features characterising the various systems that people came up with. (Not all of these features were in all versions of the system, but they all cropped up more than once.)

  • It’s based around a preprint archive: as with arXiv, authors can publish manuscripts there after only basic editorial checks: is this a legitimate attempt at scholarship, rather than spam or a political opinion?
  • Authors solicit reviews, as we did for our Barosaurus preprint, and interested others can offer unsolicited reviews.
  • Reviewers assign numeric scores to manuscripts as well as giving opinions in prose.
  • The weight given to review scores is affected by the reputation of reviewers.
  • The reputation of reviewers is affected by other users’ judgements about their comments, and also by their reputation as authors.
  • A stable user reputation emerges using a pagerank-like feedback algorithm.
  • Users can acquire reputation by authoring, reviewing or both.
  • Manuscripts have a reputation based on their score.
  • There is no single moment of certification, when a manuscript is awarded a “this is now peer-reviewed” bit.

I think it’s very possible that, instead of the all-Gold future outlined above, we’ll land up with something like this. Not every detail will work out the way I suggested here, of course, but we may well get something along these lines, where the emphasis is on very rapid initial publication and continuously acquired reputation, and not on a mythical and misleading “this paper is peer-reviewed” stamp.

(There are a hundred questions to be asked and answered about such systems: do we want one big system, or a network of many? If the latter, how will they share reputation data? How will the page-rank-like reputation algorithm work? Will it need to be different in different fields of scholarship? I don’t want to get sidetracked by such issues at this point, but I do want to acknowledge that they exist.)

Is this “Green open access”? It’s not what we usually mean by the term; but in as much as it’s about scholars depositing their own work in archives, yes, it’s Green OA in a broader sense.

(I think some confusion arises because we’ve got into the habit of calling deposited manuscripts “preprints”. That’s a misnomer on two counts: they’re not printed, and they needn’t be pre-anything. Manuscripts in arXiv may go on to be published in journals, but that’s not necessary for them to be useful in advancing scholarship.)


So where now? We have two possible open-access futures, one based on open-access publishing and one based on open-access self-archiving. For myself, I would be perfectly happy with either of these futures — I’m not particularly clear in my own mind which is best, but they’re both enormously better than what we have today.

A case can be made that the Green-based future is maybe a better place to arrive, but that the Gold-based future makes for an easier transition. It doesn’t require researchers to do anything fundamentally different from what they do today, only to do it in open-access journals; whereas the workflow in the Green-based approach outlined above would be a more radical departure. (Ironically, this is the opposite of what has often been said in the past: that the advantage of Green is that it offers a more painless upgrade path for researchers not sold on the importance of OA. That’s only true so long as Green is, in Eisen’s terms, “parasitic” — that is, so long as the repositories contain only second-class versions of papers that have been published conventionally behind paywalls.)

In my own open-access advocacy, then, I’m always unsure whether to push Gold or Green. In my Richard Poynder interview, when asked “What should be the respective roles of Green and Gold OA?” I replied:

This actually isn’t an issue that I get very excited about: Open is so much more important than Green or Gold. I suppose I slightly prefer Gold in that it’s better to have one single definitive version of each article; but then we could do that with Green as well if only we’d stop thinking of it as a stopgap solution while the “real” article remains behind paywalls.

Two and a half years on, I pretty much stand by that (and also by the caveats regarding the RCUK policy’s handling of Gold and Green that followed this quote in the interview).

But I’m increasingly persuaded that the variety of Green OA that we only get by the grace and favour of the legacy publishers is not a viable long-term strategy. Elsevier’s new regressive policy was always going to come along eventually, and it won’t be the last shot fired in this war. If Green is going to win the world, it will be by pulling away from conventional journals and establishing itself as a valid mode of publication in its own right. (Again, much as arXiv has done.)


Here’s my concern, though. Paul Royser’s response to Eisen’s post was “Distressing to see the tone and rancor of OA advocates in disagreement. My IR is a ‘parasite’? Really?” Now, I think that comment was based on a misunderstanding of Eisen’s post (and maybe only on a reading of its title), but the very fact that such a misunderstanding was possible should give us pause.

Richard Poynder’s reading later in the same thread was also cautionary: “Elsevier will hope that the push back will get side-tracked by in-fighting … I think it will take comfort if the OA movement starts in-fighting instead of pushing back.”

Folks, let’s not fall for that.

We all know that Stevan Harnad, among many others, is committed to Green; and that Mike Eisen, among many others, has huge investment in Gold. We can, and should, have rigorous discussions about the strengths and weaknesses of both approaches. We should expect that OA advocates who share the same goal but have different backgrounds will differ over tactics, and sometimes differ robustly.

But there’s a world of difference between differing robustly and differing rancorously. Let’s all (me included) be sure we stay on the right side of that line. Let’s keep it clear in our minds who the enemy is: not people who want to use a different strategy to free scholarship, but those who want to keep it locked up.

And here ends my uncharacteristic attempt to position myself as The Nice, Reasonable One in this discussion — a role much better suited to Peter Suber or Stephen Curry, but it looks like I got mine written first :-)

Somehow this seems to have slipped under the radar: National Science Foundation announces plan for comprehensive public access to research results. They put it up on 18 March, two whole months ago, so our apologies for not having said anything until now!

This is the NSF’s rather belated response to the OSTP memo on Open Access, back in January 2013. This memo required all Federal agencies that spend $100 million in research and development each year to develop OA policies, broadly in line with the existing one of the NIH which gave us PubMed Central. Various agencies have been turning up with policies, but for those of us in palaeo, the NSF’s the big one — I imagine it funds more palaeo research than all the others put together.

So far, so awesome. But what exactly is the new policy? The press release says papers must “be deposited in a public access compliant repository and be available for download, reading and analysis within one year of publication”, but says nothing about what repository should be used. It’s lamentable that a full year’s embargo has been allowed, but at least the publishers’ CHORUS land-grab hasn’t been allowed to hobble the whole thing.

There’s a bit more detail here, but again it’s oddly coy about where the open-access works will be placed: it just says they must be “deposited in a public access compliant repository designated by NSF”. The executive summary of the actual plan also refers only to “a designated repository”.

Only in the full 31-page plan itself does the detail emerge. From page 5:

In the initial implementation, NSF has identified the Department of Energy’s PAGES (Public Access Gateway for Energy and Science) system as its designated repository and will require NSF-funded authors to upload a copy of their journal articles or juried conference paper to the DOE PAGES repository in the PDF/A format, an open, non-proprietary standard (ISO 19005-1:2005). Either the final accepted version or the version of record may be submitted. NSF’s award terms already require authors to make available copies of publications to the Cognizant Program Officers as part of the current reporting requirements. As described more fully in Sections 7.8 and 8.2, NSF will extend the current reporting system to enable automated compliance.

Future expansions, described in Section 7.3.1, may provide additional repository services. The capabilities offered by the PAGES system may also be augmented by services offered by third parties.

So what is good and bad about this?

Good. It makes sense to me that they’re re-using an existing system rather than wasting resources and increasing fragmentation by building one of their own.

Bad. It’s a real shame that they mandate the use of PDF, “the hamburger that we want to turn back into a cow”. It’s a terrible format for automated analysis, greatly inferior to the JATS XML format used by PubMed Central. I don’t understand this decision at all.

Then on page 9:

In the initial implementation, NSF has identified the DOE PAGES system to support managing journal articles and juried conference papers. In the future, NSF may add additional partners and repository services in a federated system.

I’m not sure where this points. In an ideal world, it would mean some kind of unifying structure between PAGES and PubMed Central and whatever other repositories the various agencies decide to use.

Anyone else have thoughts?

Update from Peter Suber, later that day

Over on Google+, Peter Suber comments on this post. With his permission, I reproduce his observations here:

My short take on the policy’s weaknesses:

  • will use Dept of Energy PAGES, which at least for DOE is a dark archive pointing to live versions at publisher web sites
  • plans to use CHORUS (p. 13) in addition to DOE PAGES
  • requires PDF
  • silent on open licensing
  • only mentions reuse for data (pp. v, 18), not articles, and only says it will explore reuse
  • silent on reuse for articles even tho it has a license (p. 10) authorizing reuse
  • silent on the timing of deposits

I agree with you that a 12 month embargo is too long. But that’s the White House recommended default. So I blame the White House for this, not NSF.

To be more precise, PAGES favors publisher-controlled OA in one way, and CHORUS does it in another way. Both decisions show the effect of publisher lobbying on the NSF, and its preference for OA editions hosted by publishers, not OA editions hosted by sites independent of publishers.

So all in all, the NSF policy is much less impressive than I’d initially thought and hoped.

Just a quick post today, to refute an incorrect idea about open access that has unfortunately been propagated from time to time. That is the idea that if (say) PLOS were acquired by a barrier-based publisher such as Taylor and Francis, then its papers could be hidden behind paywalls and effectively lost to the world. For example, in Glyn Moody’s article The Open Access Schism, Heather Morrison is quoted as follows:

A major concern about the current move towards CC-BY is that it might allow re-enclosure by companies […] This is a scenario suggested by assistant professor in the School of Information Studies at the University of Ottawa Heather Morrison. As she explains, “There is nothing in the CC BY license that would stop a business from taking all of the works, with attribution, and selling them under a more restrictive license—not only a more restrictive CC-type license (STM’s license is a good indication of what could happen here), but even behind a paywall, then buying out the OA publisher and taking down the OA content.”

This is flatly incorrect.

Reputable open-access publishers not only publish papers on their own sites but also place them in third-party archives, precisely to guard against doomsday scenarios. If (say) PeerJ were made an offer they couldn’t refuse by Elsevier, then the new owners could certainly shut down the PeerJ site; but there’s nothing they could do about the copies of PeerJ articles on PubMed Central, in CLOCKSS and elsewhere. And of course everyone who already has copies of the articles would always be free to distribute them in any way, including posting complete archives on their own websites.

Let’s not accept this kind of scaremongering.