“We don’t need OA in our field, everything is on arXiv”. Nope.

June 9, 2016

In discussions of open access, it’s pretty common for us biologists to suffer from arXiv envy: the sense that mathematicians and physicists have the access problem solved, because they all put their work on arXiv.

That’s a widespread idea, which is why we see tweets like this one, which floated past in my stream today:

Turns out, not so much. A preprint by Larivière et al. (2013) looked at various aspects of the relationship between papers on arXiv and their corresponding versions in journals, as indexed by the Web of Science. They were interested in several other things (like the average delay between arXiv publication and journal publication) but the aspect of their work that struck me was this:


Even in mathematics, the field that is most committed to arXiv, only a feeble 21.5% of published papers are also available on arXiv! In physics, it’s 20%, and in “Earth and Space” it’s a smidge under 12%. For everything else, it’s virtually nothing.

Does that come as a shock to anyone else? I’ve not seen figures before, but I always thought the numbers were more like 90-95% in maths, physics and astronomy.

Even within the most arXiv-aware subfields, the numbers are disappointing:


Even the very best subfield manages to get only about 72% of its publications into arXiv. After the second best (69%), no other subfield does better than a frankly abject 31%.

So unless I am badly misunderstanding this study, it seems the old idea that you don’t need open-access journals in maths and physics because everything’s on arXiv is way off base.

How very disappointing.


  • Larivière, Vincent, Cassidy R. Sugimoto, Benoit Macaluso, Staša Milojević, Blaise Cronin, Mike Thelwall. 2013. arXiv e-prints and the journal of record: An analysis of roles and relationships. arXiv 1306.3261.


20 Responses to ““We don’t need OA in our field, everything is on arXiv”. Nope.”

  1. gowers Says:

    In mathematics at least, there’s a strong correlation between the quality of a paper and whether it is put on arXiv. So the statistic that matters to me is not the percentage of papers that appear, but the percentage of papers of interest to me that appear. And that does seem to be very high these days in my subfield — in fact, so high that it’s effectively become 100% because I probably won’t look at a paper that isn’t put on arXiv. So I’m with Dr Marvin on this one.

  2. Mike Taylor Says:

    “… in fact, so high that it’s effectively become 100% because I probably won’t look at a paper that isn’t put on arXiv.”

    Wait a minute, Tim — you seem to be saying that you will only look at papers that are on arXiv, and that 100% of the papers you do look at are on arXiv. How could it be otherwise?

  3. Mark C. Wilson Says:

    As they say, YMMV. Tim may well be right about his particular sub(sub)field. I subscribe to daily mailings from arXiv and there are a lot of papers to look at, perhaps more than I can even scan. But the lack of coverage is a problem for two reasons: it makes the job of algorithmic recommender systems harder, and makes it less likely that the long tail of papers, each of which may not be exciting enough for Tim to look at but which may in totality be very important for the advancement of science, receives appropriate readership. On balance I agree that this lack of coverage is a real shame. I wonder how many of the non-arXiv papers were available in any open repository.

  4. Mark, this is why Scott Morrison’s Mathematics Literature Project was underway, and it’s a pity it has evaporated from his website. We could see the numbers of papers on a) the arXiv b) webpages/non-arXiv repos and c) only legally available from the publisher.

  5. dale Says:

    From an outsider’s perspective, I don’t see where it states how much a paper costs, how deep the backlog is, or for that matter whether it’s easier to go elsewhere if the editors have such high standards that it makes it worthwhile to publish elsewhere. Time is also important here. Can a single editorial staff manage such an influx of papers? Going OA is almost irrelevant unless the other ducks are down, isn’t it? My 2 cents.

  6. brembs Says:

    Given that essentially every single one of my N=4-5 math/phys acquaintances states that they don’t ever read journals and exclusively use arXiv (i.e., mirroring the quoted statements at the outset of Mike’s post), I wonder what the citation advantage of arXiv papers is? If arXiv really is what most people use for their daily literature consumption, journal articles must be vastly under-cited.
    In other words, if there isn’t a massive citation advantage of arXiv papers, I’d not only question the statement that “everything is on arXiv”, but also begin to question the often cited statements that people only use arXiv.

  7. Mike Taylor Says:

    Dale, I am not sure I understand your comment. “… whether it’s easier to go elsewhere if the editors have such high standards that it makes it worth while to publish elsewhere.” <– What editors? arXiv has no editors, only a very basic is-this-an-attempt-at-science? filter.

  8. gowers Says:

    Things would be different if I thought that I was missing out by sticking to papers that are easy to find online (I’m talking about recent papers — for older papers it’s more complicated). But I don’t.

  9. Philipp Zumstein Says:

    There are IMO two use cases here:
    i) Find a paper, which I already know exists, e.g. I have seen it in the bibliography of another paper.
    ii) Find out what is going on in one research field.

    For i), one also wants to make sure that one has access to it and therefore can read it. If a version of the paper is freely available (either on arXiv or somewhere else as Open Access), then it is normally not hard to find it through some suitable search engine. Moreover, it might be possible just to ask the authors for a copy of the paper by email.

    For ii), it is not (immediately) important whether the paper is OA, on arXiv or published in a journal. The title, authors and an abstract should be enough to see what new papers are published in a given research field. Afterwards only, if I want to access the interesting papers, then the availability matters. The natural candidates for ii), I would say, are indexing services like MathSciNet or zbMath. Moreover, you can for example play with alerting services in GoogleScholar.

    I think the better question is maybe: how much do paywalled journals help us in use cases i) and ii)?

  10. Olivia Says:

    If you don’t want to waste time on submitting to a preprint archive, you can just write on one from the get-go – http://www.authorea.com
    Check them out

  11. […] and the journal of record: An analysis of roles and relationships”. It seems to indicate that not all articles are free on arXiv, and that much is slipping through the sieve, even for fields that profess to adore […]

  12. Valerie Says:

    Additional consideration beyond sharing the information – In the UK there are some concerns that some papers on arXiv are not the specified final agreed text version to satisfy REF OA requirements. Some publishers do not permit peer reviewed text to be posted on arXiv.

  13. Mike Taylor Says:

    Every time I read a sentence that begins “Some publishers do not permit”, something inside me dies.

    Like I said: these publishers are not our friends. They should be part of the solution; but they have instead chosen to make themselves part of the problem.

  14. Nima Says:

    Indeed, Mike. It is painful to just read it. I see so many beautiful papers that exist only behind paywalls or as tiny thumbnails of titanosaur dorsal photos on google images… only to find a $60 fee for a 4 page paper. With a very ‘respectable’ and ‘official’ Wiley or Elsevier logo conspicuously visible.

    I have yet to find any of the papers I was looking for on arXiv (even the few times I actually have looked for maths and physics papers). For paleo papers, I just happened to get very lucky in having met some people at SVP (who shall remain anonymous) as well as through DA and one of my Polish friends at Dinozaury, and thus gain access to papers I needed for some of the skeletals I am working on.

    And as for the whole “we need these entirely reasonable fees to improve access to research” plea … that has to take the cake for self-serving lies. Like a fox assuredly “needs” control of the henhouse to “improve” public access to affordable organic chicken.

  15. Māris Ozols Says:

    In my subfield (which is theory of quantum computing and quantum information, generally listed under quant-ph) the fraction of papers on arXiv is very high.

    Perhaps, one of the contributing factors – at least for the younger crowd of researchers – might be that papers from arXiv automatically appear also on SciRate (https://scirate.com/), a website that lets people “up-vote” papers they like the most (occasionally some papers also get discussed in the comments section).

    People are generally curious whether others “like” their work, so a large fraction of people in my field use this website on a daily basis. In principle it can be used by anyone to rate or comment on any paper on arXiv. It just so happens that for historical reasons it is mostly used only by people in my subfield, because it was created by people from the same community.

    I think that having websites that are well-integrated with arXiv and that provide useful functionality on top of what arXiv provides, could be one way of actually increasing the popularity of arXiv itself.

  16. In theoretical physics, we indeed do not need OA because everything is on arXiv.

    We still have to publish in journals in order to satisfy funders and administrators. We still cite journal versions rather than arXiv versions, because journals and search engines work that way. However, the arXiv version is often scientifically better than the journal version, if only because it is much easier to amend.

    So we almost always read articles on arXiv, and the SCOAP3 idea of paying for OA is a senseless waste of money.

  17. Mike Taylor Says:

    “In theoretical physics, we indeed do not need OA because everything is on arXiv.”

    According to the article cited above, only 20% of physics articles are on arXiv. Even in the most arXiv-rich subfields (Astrophysics, Nuclear & Particle Physics) the rate doesn’t get above 70%.

    (Of course the study could be just plain wrong — but I’ve yet to come across one that contradicts its findings: only anecdote.)

  18. Yes. But physics is vast.

    I suspect there is a sharp transition between communities where arXiv is marginal, and communities where it is practically mandatory. I belong to one of the latter communities.

    In my community, if your paper is not on arXiv, it will not be read. It is a mystery to me how people in other communities follow the literature, if they cannot look up new papers – all new papers – on arXiv every morning.

    The article cited above is probably too coarse-grained to detect these phenomena.

  19. Mike Taylor Says:

    Very interesting. So the lesson here may be not that arXiv’s coverage is less comprehensive than we thought, but that it’s concentrated in smaller patches.

  20. Philipp Zumstein Says:

    “So we almost always read articles on arXiv, and the SCOAP3 idea of paying for OA is a senseless waste of money.”

    There is also a consortium of institutions paying money for arXiv (click at the upper right corner on the website) and actually they have IMO a similar sponsorship model. However, the system with preprints and (costly) journal publications might not be ideal.
