Freeing digitised journal archives

May 31, 2012

1. Publishing economics 101

Although publishing journal articles is now much less costly than it used to be (thanks to machine-readable submissions, paperless electronic distribution, etc.), it still costs some money to get a research paper from manuscript to its published form. So publishers, unless supported by grants, by government agencies or similar, need a revenue stream.

There are, roughly, two ways for academic journal publishers to make money. The traditional way is to lock up the papers and prohibit access to them unless a fee is paid. (I will not now recapitulate all the reasons why this is a terrible idea.) The other way, known as “Gold Open Access”, is for the author (or the author’s project, department or funding body) to pay the publisher a one-off fee, after which the publisher releases the final version of the article to the world.

So those are the two choices: pay for access, or pay to publish.

2. How this works for old papers

To my mind, it’s morally unjustifiable to lock up new research behind paywalls, and we’re working on making sure that stops happening. But publishers do more than just publishing new papers. They also digitise old ones. For example, Elsevier has scans of Cretaceous Research going all the way back to 1980, long predating the current digital publication pipeline.  So they must have scanned the old issues — a non-trivial process when done well, and one that will have cost them something to do.

Of course, I want all those archives to be open, just like the new papers, and for the usual reason: locking them away impedes the progress of science. But how can the publisher recover the costs of scanning? The problem here is that there is no model analogous to Gold Open Access: the authors are not now, years after the event, going to pay for their works to be made OA.

Does that mean the dreaded paywall is the only option?

3. A possible solution

It occurred to me, in a tweet earlier today, that it might be possible to crowd-source a one-off payment to copyright holders, in exchange for which they would release a journal’s archives into the public domain (or as CC BY).

I thought I’d been very clever and inventive, until Ross Mounce pointed out that a very similar initiative already exists — one that was launched only a fortnight ago!

I feel particularly stupid about not having spotted this similarity because I (slightly) know Eric Hellman, the founder of Unglue.it, and I read his blog Go To Hellman pretty consistently. So I’ve actually known about Unglue.it ever since the idea was first floated, long before it was called Unglue.it.  So, it turns out I am a doofus.

4. Can it work?

This is not quite on-mission for Unglue.it, which allows you to “pledge toward creating ebooks that will be legally free, worldwide”. But it’s obviously in the same spirit. I don’t yet know whether Unglue.it would accept a campaign to free the archives of a journal, but even if it won’t, Kickstarter presumably will. So there are mechanisms that can be used.

One obvious roadblock would be if the publishers demanded silly money. One could imagine, for example, Elsevier starting from the price of their Sponsored Article option, $3000 per article. Volume 19 (1998) of Cretaceous Research has 41 articles spread across six issues, so they could conceivably try to set a price of 41 × $3000 = $123,000. If that volume were representative (I have no idea whether it is), then the price for the whole run from 1980 till 2011 would be 32 times as much, which is nearly $4M. And clearly no-one’s going to pay that.
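
As a rough check on that arithmetic, here is a minimal Python sketch. The function names are mine, and it assumes (as the paragraph above does) one volume per year from 1980 to 2011, with volume 19's 41 articles taken as typical.

```python
# Figures from the post: Elsevier's Sponsored Article fee and the size of volume 19.
FEE_PER_ARTICLE = 3000                       # USD per article
ARTICLES_IN_VOLUME_19 = 41
VOLUMES_1980_TO_2011 = 2011 - 1980 + 1       # 32, assuming one volume per year


def volume_price(articles: int, fee: int = FEE_PER_ARTICLE) -> int:
    """Price of one volume if every article is charged the sponsored-article fee."""
    return articles * fee


def full_run_price(articles_per_volume: int,
                   volumes: int = VOLUMES_1980_TO_2011,
                   fee: int = FEE_PER_ARTICLE) -> int:
    """Price of the whole run, assuming every volume has the same number of articles."""
    return volumes * volume_price(articles_per_volume, fee)


print(f"Volume 19: ${volume_price(ARTICLES_IN_VOLUME_19):,}")
print(f"Full run:  ${full_run_price(ARTICLES_IN_VOLUME_19):,}")
```

The exact product for the whole run is $3,936,000, a little under $4M, and that is only as good as the assumption that every volume looks like volume 19.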

But I assume that publishers have people with a reasonable notion of the true commercial value of a title. Finance people presumably know how much Elsevier make from Cretaceous Research historical archives in a year, and would be satisfied to sell the property for an up-front payment of, say, ten years’ worth. It would be interesting to know what such a price would come to … and unfortunately (I bet) very, very difficult to find out.

5. Now what?

I’m not sure whether to try to pursue this. Eric, can you comment on whether this seems Unglue.it-friendly to you? Can anyone who works for a publisher give a ballpark estimate (even order-of-magnitude) of what kind of price you imagine might be acceptable? Can anyone else volunteer an educated guess?

16 Responses to “Freeing digitised journal archives”

  1. spammer Says:

    I have no professional involvement with publishers, so my comment is rambling insomniac spouting based on two assumptions: firstly, that they’d price articles based on an alternative path of retaining rights in perpetuity; secondly, that they’d price them according to some construction of Present Value resembling that applied to simple financial instruments. Assumption 2b is that I can remember this stuff rightly.

    This way, the shortcut to find the present value of a perpetual annuity is PV = C/r, where C is anticipated annual cash flow, and r is the required rate of return.

    To make a very sketchy example, if it is assumed that one particular $37.95 article is purchased and cited five times a year, on average, and that Elsevier will require a base return of 36% (from http://www.guardian.co.uk/commentisfree/2011/aug/29/academic-publishers-murdoch-socialist), the price to buy it out from their ownership today would be

    $189.75/0.36 = $527.08.

    While this is unlikely to be representative of all articles in reality (I think…), as an example, then, following from your stated assumption above and the associated figures, volume 19 would cost $21,610.42, and the full run would cost $691,533.33.
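
    The sums above can be reproduced with a minimal Python sketch. The function and variable names are illustrative, and every input (five purchases a year at $37.95, a 36% required return, 41 articles per volume, 32 volumes) is the assumption stated in this comment and in the post, not a known Elsevier figure.

```python
def perpetuity_pv(annual_cash_flow: float, required_return: float) -> float:
    """Present value of a level perpetual annuity: PV = C / r."""
    if required_return <= 0:
        raise ValueError("required_return must be positive")
    return annual_cash_flow / required_return


ARTICLE_PRICE = 37.95        # USD per purchase (assumed)
PURCHASES_PER_YEAR = 5       # assumed average
REQUIRED_RETURN = 0.36       # assumed base return
ARTICLES_PER_VOLUME = 41     # volume 19, from the post
VOLUMES = 32                 # 1980 to 2011, from the post

annual_cash_flow = ARTICLE_PRICE * PURCHASES_PER_YEAR      # C
per_article = perpetuity_pv(annual_cash_flow, REQUIRED_RETURN)
volume_19 = per_article * ARTICLES_PER_VOLUME
full_run = volume_19 * VOLUMES

print(f"Per article: ${per_article:,.2f}")
print(f"Volume 19:   ${volume_19:,.2f}")
print(f"Full run:    ${full_run:,.2f}")
```

    Rounded to the cent, the per-article figure is $527.08; the volume and full-run totals are computed from the unrounded value, which is why they come out at $21,610.42 and $691,533.33.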

    From this, if Elsevier were looking to either discourage, or maximise their gain from, any buy-out of the archive, seeking the sponsored-article price would be the way to do it.

    On the other hand, the formula C/(r-g) would be appropriate for use where growth in value is anticipated. It would be fully possible to set an excessively high price by this model, just by making some daft assumptions about future changes in demand for articles, viable changes in individual article pricing, or future profit levels amongst academic publishers.
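
    To show how sensitive that second formula is, here is a small Python sketch of PV = C/(r-g). The growth rates in the loop are hypothetical values chosen purely for illustration; only the $189.75 cash flow and 36% return come from the example above.

```python
def growing_perpetuity_pv(annual_cash_flow: float,
                          required_return: float,
                          growth: float) -> float:
    """Present value of a perpetuity whose cash flow grows at a constant rate:
    PV = C / (r - g). Only meaningful when g < r."""
    if growth >= required_return:
        raise ValueError("formula only holds when growth < required_return")
    return annual_cash_flow / (required_return - growth)


ANNUAL_CASH_FLOW = 189.75    # USD, from the worked example above
REQUIRED_RETURN = 0.36       # from the worked example above

# Hypothetical growth assumptions, from none to almost as high as r.
for growth in (0.0, 0.10, 0.30, 0.35):
    pv = growing_perpetuity_pv(ANNUAL_CASH_FLOW, REQUIRED_RETURN, growth)
    print(f"g = {growth:.0%}: ${pv:,.2f} per article")
```

    As g approaches r the denominator shrinks towards zero, so the asking price can be pushed almost arbitrarily high just by choosing an optimistic growth rate.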

    … Err, yeah. Apologies are probably due for this, for both length and limited value. Sorry.

  2. gluejar Says:

    I’ve rather admired Elsevier for their business acumen. I expect that if the numbers work, they would go for an Unglue.it campaign; and all indications are that the numbers that work for them are $3000 per article, because that’s the number at which gold open access works enough in their favor that the decision is easy for them.

    From the scholars’ point of view, I expect that $3000 per article works for 10% of the articles. I was a scientist once, and my view was that 90% of what was published was crap, but there were articles I would evangelize. A totally different type of peer review.

    The business model of the journal is to aggregate a lot of articles, each of which has a very small audience, but which together have a sufficient audience to make publication of the aggregate worthwhile. We’ve not focused unglue.it on journal articles because the audiences are so small. For monographs, the aggregating entity is the library rather than the journal; and we’ve thought that it’s rational for libraries to act collectively to make monographs open to all. In addition to our method, there’s the initiative by Frances Pinter (KnowledgeUnlatched http://www.knowledgeunlatched.org/) to support academic monographs; we consider ourselves on the same team.

    One interesting question is this: at what granularity would it make sense to unglue journal archives? If at the article level, you could focus on ungluing the important stuff. If at the volume level, you should expect to get some sort of discount from the publisher, and as I mentioned, expect Elsevier to be rational but not generous.

    The unglue.it market dynamics are designed to allow the rights holder to reduce their price (but not increase it) during a campaign. So I would expect volume 19 to have an asking price of $123,000. But suppose that a campaign was run and on the last day of the campaign only $12,300 had been pledged. The rights holder would have the choice of taking $12,300 to make it free or $0 to keep it toll access. And then we’d find out what the bean counters really think. Who knows if libraries are capable of rational economic analysis.

    Another difficulty with running a campaign for Cretaceous Research’s archive is that supporters will assess the fund-worthiness of the rights holder. Assuming Elsevier is the rights holder, how many libraries and scholars would be willing to contribute knowing the money goes to Elsevier? Elsevier would need to burnish its goodwill in a community with different priorities.

    My suggestion would be to start as small as possible. Ungluing an article is both easily within reach and not threatening to a publisher. A smart publisher will view the exercise as a risk-free way to build community. And publishing needs to be increasingly about connecting content with a community.

  3. Henry Cohn Says:

    I think Elsevier is setting an important precedent by opening up the archives of more than 40 of their math journals for free. (Only back to 1995 so far, unfortunately, but I think they will eventually have to go all the way back.) If we work at this, I think we’ll be able to establish a consensus that all journals should be open access after a suitable time window (perhaps five years), and we’ll end up getting all the back issues for free since no reputable publisher will want to defy this consensus. So I’d be reluctant to explore buying access to something I expect we’ll get anyway, especially because if the first people to try this negotiate poorly, it will set a terrible precedent and convince publishers that they’ll be able to get a lot of money for the back issues if they hold out long enough.

    As for costs of scanning, my understanding (from talking to a couple of nonprofit publishers with large scanning programs) is that in many cases this has already paid for itself.


  4. Henry is certainly right about digitising back issues of journals having paid for itself. I saw one figure on what one publisher spent, and yes, it was a lot of money. But it was less than one year’s worth of profit from Elsevier.

    I certainly agree that pushing to get older material (i.e. pre-1995) opened up is the best course of action. There is the old ‘but the old owners still want income’ excuse, but this cannot be true in all cases, where very often the bigger publisher simply bought the old publisher.

  5. Mike Taylor Says:

    Thanks, “spammer” (what an unflattering pseudonym to have chosen!) — very helpful, a completely different model for estimating candidate prices.

  6. Mike Taylor Says:

    Henry Cohn says: “I’d be reluctant to explore buying access to something I expect we’ll get anyway, especially because if the first people to try this negotiate poorly, it will set a terrible precedent and convince publishers that they’ll be able to get a lot of money for the back issues if they hold out long enough.”

    This is an excellent point, and makes me half wish I’d never suggested the idea. Especially when juxtaposed with Eric’s (fascinating) comment suggesting that Elsevier might hold out for $3000 per article.

    These are strange and turbid waters.

  7. anon Says:

    Scan everything, mass release it without concern for the publishers (they don’t deserve it). You can’t put that kind of thing back in the box once it’s out there.

  8. Mike Taylor Says:

    [Administrative note: SV-POW! does not endorse the strategy suggested by “anon” in the previous comment. But we allowed the comment to be posted in accordance with our policy of allowing anything that’s not spam or personal abuse.]

  9. spammer Says:

    Dr. Taylor: as I understand it, that’s the most basic consideration that they’d make when deciding whether or not to make an investment in a financial instrument, or a project with a physical product. If Elsevier do not intend to use the research for other purposes beyond deriving an income from the rights (for example, they’re not, as far as I know, sponsoring any research which follows directly from papers in that journal), then it would seem plausible that they treat it purely as a perpetual source of income, and so that PV/NPV models would give a likely minimum price they’d be looking for.

    …It’s just occurred to me that, for articles produced multiple years ago, they may be more inclined to require the sponsored-article price plus whatever they reckon that would have gained in interest in the intervening period.

    From what I’ve seen of Elsevier’s position, and those of other publishers, and academics, a substantial part of the problem is that the publishers are thinking as corporate investors (theoretical responsibility to maximise profit/shareholder value; junk journals & bundling ensure an income from less popular products; 3-stage resistance to regulation), while the academics are, to a large extent, thinking in terms of corporate responsibility (responsibility to make past research accessible so that it can be used and built upon; desire to use new technology efficiently, despite journal opposition; attempts to find a mutually acceptable publishing model). Between that, the content and length of my posts, and the type of email address I default to when commenting on sites I am not registered with, “spammer” seems a reasonable description, if only because such explicit self-labelling is uncommon.

    Anon: you may be right that pulling back from that is logistically impossible, but do you really want them to go film-and-music-industry upon our collective a-s, with all that may involve in terms of SOPA equivalents, statutory fines and non-financial penalties?


  10. I’ve found that reference to what a ‘big publisher’ spent on digitising their entire catalogue, and yes! it was Elsevier.

    In this article of the Notices of the AMS (comm-toped-web.pdf) there is a reproduction of a letter from Robert Ross of Elsevier (itself originally printed in the European Mathematical Society Newsletter). He states:

    “Elsevier has invested US$160 million in digitizing and maintaining the digital archive of our entire journal program. This investment facilitates and assures electronic access and distribution of the research record, allowing instant access throughout the world or wherever and whenever the Internet is available.”

    A measly $160 million! Less than 3 months’ worth of last year’s *profit*. Never again listen to Elsevier (or indeed any other of the big four publishers) when they say they need to recoup the costs of scanning old articles.

  11. Mike Taylor Says:

    Excellent work, David, many thanks for that link!

  12. Henry Cohn Says:

    Comparing scanning costs to yearly profit isn’t really the right calculation, since it’s asking whether the rest of Elsevier could subsidize the scanning. What I’d love to know is whether they have broken even already through fees (http://www.info.sciverse.com/sciencedirect/content/backfiles) or other business that could be attributed to the scanning. That’s a much closer call, and I don’t know of publicly available information that would settle it, but I suspect the answer is either yes or near to yes. If that’s true, then they really have no excuse.

  13. Mike Taylor Says:

    That’s a good point, Henry. Anyone have any idea how we might go about trying to discover what Elsevier’s revenues from their digitised back-issues are?


  14. Well to cut it down a bit, we could estimate their total revenue from journals (say for the past 5 years). Access to the backfiles is of course a ‘one off fee’, but only continues as long as one has a current ScienceDirect subscription, and one needs a ScienceDirect subscription to access them in the first place. I would take this as saying the general income from journals subsidises the scanning project, otherwise Elsevier would be confident that people could just pay once for permanent access without conditions on other subscriptions.

    Who was it in the blogosphere that had the most reliable figures on Elsevier’s income from journals? I’ve seen several figures bandied around. If we can get the most recent figures, we can extrapolate backwards using a 5% deflation for each year for 5 years to arrive at an approximate figure for journal income.

    As far as finding out how much access to the backfiles themselves cost, we may have to ask some librarians at universities that have made public their deals with Elsevier, or else ask the people who like tracking down that sort of thing.

    From the point of view of mathematics, there are roughly 2250 journals in the pre-1995 backfiles (see http://www.info.sciverse.com/sciencedirect/content/backfiles/collections, with much more detail available from http://www.info.sciverse.com/sciencedirect/content/journals/titles) with 63 of them being mathematics. Even if one takes into account the fact there are mathematics journals stretching back longer than most subject areas, less than 5% of all these journals–and probably more like less than 3%–are mathematics. Taking the smaller figure, this means that it cost on the order of $5 million to scan the mathematics literature that Elsevier owns.

    Individual subject estimates like this might make it easier to estimate income from universities purchasing access to the backfiles, because the subject collections are sold individually.

    I emphasise mathematics (although this is a general conversation), because I hold that mathematicians, more than any other subject, need access to the old literature.


  15. Actually, looking at lists like http://www.info.sciverse.com/techsupport/journals/bfmath.htm, on the whole mathematics journals _don’t_ go back further than other journals (there was one outlier, and Elsevier apparently don’t publish any of the really old journals). I think I could conservatively say that mathematics is on the order of 2% or less of Elsevier’s backfiles. That’s a cost of around $3 million, and less than Harvard pays each year for their subscriptions. Of course, Henry’s comment about the rest of Elsevier subsidising the scanning still applies, but with several thousand universities out there … (looks up figures) … apparently the total is over 17k, but let’s say 3000 buy access to the mathematics collection. This means that at $1000 per university, the scanning costs for the mathematics journals held by Elsevier are paid for.
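
    That estimate can be sketched in a few lines of Python. It assumes, as this comment does, that scanning cost scales with a subject's share of the backfiles; the 2% share and the 3000 buying universities are guesses from the text above, and the function names are just for illustration.

```python
# Elsevier's stated spend on digitising and maintaining its archive (USD),
# as quoted from Robert Ross's letter earlier in this thread.
TOTAL_SCANNING_COST = 160_000_000


def subject_scanning_cost(share: float, total: float = TOTAL_SCANNING_COST) -> float:
    """Scanning cost attributed to one subject, assuming cost is
    proportional to that subject's share of the backfiles."""
    return share * total


def break_even_fee(cost: float, buyers: int) -> float:
    """One-off fee per buyer at which the given cost is exactly recovered."""
    return cost / buyers


maths_share = 0.02           # guessed share of the backfiles
buying_universities = 3000   # guessed number of buyers

maths_cost = subject_scanning_cost(maths_share)
fee = break_even_fee(maths_cost, buying_universities)

print(f"Mathematics share of scanning cost: ${maths_cost:,.0f}")
print(f"Break-even fee per university:      ${fee:,.2f}")
```

    With an exact 2% share the cost is $3.2 million and the break-even fee a little over $1000, which matches the rounded figures above.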

    Let’s scale that back up. Now let’s say 5000 universities buy access to the whole backfiles (averaging out those who buy bits and pieces). Then $32k per university (on average) will pay for all the scanning for all subjects. To get an idea if this is a reasonable estimate, Elsevier charge for single titles in the back issue at 2.5% of current list price per year accessed (subject to having current subscription). Let us say 20 years per journal over 1500 journals, at an average list price of $2000. This comes to $1.5 million. And that would be per university, if they bought titles in the backfiles one by one.

    Clearly Elsevier are padding the individual-access cost, but even if the full backfiles package worked out at 0.5% of current list price per journal per year, this would be $300k per university (of my hypothetical 5000) to pay for all of Elsevier’s scanning. Now this sort of figure looks reasonable. I could imagine a university paying something on the order of this for access to electronic copies of all they owned in paper form. If we could find a few sympathetic librarians at universities who refuse to sign secret deals, then we could verify this sort of figure. Of course, the real trick would be to find out how many universities have bought access to the collection, but finding the figures on research libraries in the US, Europe and elsewhere in the world would be a start. Then there would be the larger number of libraries that have just bought access to subject collections….
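
    For anyone who wants to vary the guesses, here is a minimal Python sketch of the backfile sums in the last two paragraphs. All the inputs (1500 journals, 20 years each, a $2000 average list price, 5000 universities) are my hypothetical figures from above, and the function name is only illustrative.

```python
def backfile_cost(rate: float, list_price: float, years: int, journals: int) -> float:
    """Cost of backfile access when each journal-year is charged at a
    fixed fraction (rate) of the journal's current list price."""
    return rate * list_price * years * journals


AVERAGE_LIST_PRICE = 2000        # USD per journal per year (hypothetical)
YEARS_PER_JOURNAL = 20           # hypothetical
JOURNALS = 1500                  # hypothetical
UNIVERSITIES = 5000              # hypothetical number of buyers
TOTAL_SCANNING_COST = 160_000_000    # USD, from Robert Ross's letter

# Title-by-title purchase at the quoted 2.5% rate.
itemised = backfile_cost(0.025, AVERAGE_LIST_PRICE, YEARS_PER_JOURNAL, JOURNALS)
# Whole-package purchase at a guessed 0.5% rate.
package = backfile_cost(0.005, AVERAGE_LIST_PRICE, YEARS_PER_JOURNAL, JOURNALS)
# Fee per university at which the stated scanning cost is exactly recovered.
break_even = TOTAL_SCANNING_COST / UNIVERSITIES

print(f"Title by title (2.5%): ${itemised:,.0f} per university")
print(f"Full package (0.5%):   ${package:,.0f} per university")
print(f"Break-even fee:        ${break_even:,.0f} per university")
```

    On these guesses even the discounted package price sits well above the $32k break-even figure, so the conclusion depends far more on how many universities actually buy than on the exact rate.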


  16. Hmm, this article:

    http://www.reuters.com/article/2012/06/12/us-science-publishing-open-access-idUSBRE85B0SH20120612

    says that “[Reed Elsevier] has also spent 600 million pounds over the last 12 years on digitizing the research in its archives.”

    There are several options here:

    * Robert Ross of Elsevier had his details wrong in his open letter,
    * The Reuters article has its details wrong,
    * The Reuters article is using a monetary figure based on different assumptions to the earlier article,
    * Elsevier has spent a huge amount of money in the time between the two articles.

    I find the last reason reasonable(ish), but conflicting with the 2006 statement that Elsevier spent the lesser figure on “digitizing and maintaining the digital archive of our entire journal program”. Also, spending $160 million (presumably USD) in the first 7 years of the program but 600 million pounds overall (12 years) seems a bit asymmetric.

    The second-to-last reason is also reasonable, but unless someone from Elsevier lets us know, we are a bit stuck. I could imagine that “digitizing the research in its archives” includes publishing current research and investment in developing their current systems, but I would prefer to see that figure separate to the cost of scanning the old, pre-1995 articles.

