It’s now widely understood among researchers that the impact factor (IF) is a statistically illiterate measure of the quality of a paper. Unfortunately, it’s not yet universally understood among administrators, who in many places continue to judge authors on the impact factors of the journals they publish in. They presumably do this on the assumption that impact factor is a proxy for, or predictor of, citation count, which in turn is assumed to correlate with influence.

As shown by Lozano et al. (2012), the correlation between IF and citations is in fact very weak — r2 is about 0.2 — and has been progressively weakening since the dawn of the Internet era and the consequent decoupling of papers from the physical journal that they appear in. This is a counter-intuitive finding: given that the impact factor is calculated from citation counts you’d expect it to correlate much more strongly. But the enormous skew of citation rates towards a few big winners renders the average used by the IF meaningless.

To bring this home, I plotted my own personal impact-factor/citation-count graph. I used Google Scholar’s citation counts of my articles, which recognises 17 of my papers; then I looked up the impact factors of the venues they appeared in, plotted citation count against impact factor, and calculated a best-fit line through my data-points. Here’s the result (taken from a slide in my Berlin 11 satellite conference talk):

[Figure: citation count plotted against impact factor for my papers, with best-fit line, from my Berlin 11 satellite conference talk.]

I was delighted to see that the regression slope is actually negative: in my case at least, the higher the impact factor of the venue I publish in, the fewer citations I get.
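(If you want to draw the equivalent graph for your own papers, here’s a minimal sketch of the procedure in Python. The numbers below are made-up placeholders, not my actual data: swap in your own Google Scholar citation counts and the impact factors of the venues.)

```python
import numpy as np

# Hypothetical (impact factor, citation count) pairs -- NOT real data.
# Replace with your own citation counts and hand-looked-up impact factors.
papers = [(0.0, 12), (0.0, 3), (1.58, 40), (2.21, 25), (2.21, 8), (36.0, 1)]

impact_factors = np.array([p[0] for p in papers])
citations = np.array([p[1] for p in papers])

# Least-squares best-fit line: citations ~ slope * IF + intercept
slope, intercept = np.polyfit(impact_factors, citations, 1)
r = np.corrcoef(impact_factors, citations)[0, 1]

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r^2 = {r**2:.2f}")
# A negative slope means higher-IF venues go with fewer citations in this (made-up) sample.
```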

There are a few things worth unpacking on that graph.

First, note the proud cluster on the left margin: publications in venues with impact factor zero (i.e. no impact factor at all). These include papers in new journals like PeerJ, in perfectly respectable established journals like PaleoBios, edited-volume chapters, papers in conference proceedings, and an arXiv preprint.

My most-cited paper, by some distance, is Head and neck posture in sauropod dinosaurs inferred from extant animals (Taylor et al. 2009, a collaboration between all three SV-POW!sketeers). That appeared in Acta Palaeontologica Polonica, a journal that is very well respected in the palaeontology community but has a modest impact factor of 1.58.

My next most-cited paper, the Brachiosaurus revision (Taylor 2009), is in the Journal of Vertebrate Paleontology – unquestionably the flagship journal of our discipline, despite its also unspectacular impact factor of 2.21. (For what it’s worth, I seem to recall it was about half that when my paper came out.)

In fact, none of my publications have appeared in venues with an impact factor greater than 2.21, with one trifling exception. That is what Andy Farke, Matt and I ironically refer to as our Nature monograph (Farke et al. 2009). It’s a 250-word letter to the editor on the subject of the Open Dinosaur Project. (It’s a subject that we now find profoundly embarrassing given how dreadfully slowly the project has progressed.)

Google Scholar says that our Nature note has been cited just once. But the truth is even better: that one citation is in fact from an in-prep manuscript that Google has dug up prematurely — one that we ourselves put on Google Docs, as part of the slooow progress of the Open Dinosaur Project. Remove that, and our Nature note has been cited exactly zero times. I am very proud of that record, and will try to preserve it by persuading Andy and Matt to remove the citation from the in-prep paper before we submit. (And please, folks: don’t spoil my record by citing it in your own work!)

What does all this mean? Admittedly, not much. It’s anecdote rather than data, and I’m posting it more because it amuses me than because it’s particularly persuasive. In fact if you remove the anomalous data point that is our Nature monograph, the slope becomes positive — although it’s basically meaningless, given that all my publications cluster in the 0–2.21 range. But then that’s the point: pretty much any data based on impact factors is meaningless.


Let’s take another look at that Giraffatitan cervical. MB.R.2180:C5, from a few days ago:

[Figure: Giraffatitan cervical vertebra MB.R.2180:C5.]

That’s a pretty elongate vertebra, right? But how elongate, exactly? How can we quantify whether it’s more or less elongate than some other vertebra?

The traditional answer is that we quantify elongation using the elongation index, or EI. This was originally defined by Upchurch (1998:47) as “the length of a vertebral centrum divided by the width across its caudal face”. Measuring from the full-resolution version of the image above, I make that 1779/529 pixels, or 3.36.

But then those doofuses Wedel et al. (2000:346) came along and said:

When discussing vertebral proportions Upchurch (1998) used the term elongation index (EI), defined as the length of the centrum divided by the width of the cotyle. Although they did not suggest a term for the proportion, Wilson & Sereno (1998) used centrum length divided by the height of the cotyle as a character in their analysis. We prefer the latter definition of this proportion, as the height of the cotyle is directly related to the range of motion of the intervertebral joint in the dorsoventral plane. For the purposes of the following discussion, we therefore redefine the EI of Upchurch (1998) as the anteroposterior length of the centrum divided by the midline height of the cotyle.

Since then, the term EI has mostly been used in this redefined sense — but I think we all agree now that it would have been better for Wedel et al. to have given a new name to Wilson and Sereno’s ratio rather than apply Upchurch’s name to it.

Aaaanyway, measuring from the image again, I give that vertebra an EI (sensu Wedel et al. 2000) of 1779/334 = 5.33. Which is 58% more elongate than when using the Upchurch definition! This of course follows directly from the cotyle being 58% wider than tall (529/334 pixels).

So one of the principal factors determining how elongate a vertebra seems to be is the shape of its cotyle. And that’s troublesome, because the cotyle is particularly subject to crushing — and it’s not unusual for even consecutive vertebrae from the same column to be crushed in opposite directions, giving them (apparently) wildly different EIs.

Here’s an example (though not at all an extreme one): cervicals 4 and 6 of the same specimen, MB.R.2180 (formerly HM SI), as in the multi-view photo above:

[Figure: cervicals 4 and 6 of MB.R.2180 (formerly HM SI) in posterior view.]

Measuring from the photos as before, I make the width:height ratio of C4 683/722 pixels = 0.95, and that of C6 1190/820 pixels = 1.45. So these two vertebrae — from the same neck, and with only one other vertebra coming in between them — differ in preserved cotyle shape by a factor of 1.53.

And by the way, this is one of the best preserved of all sauropod neck series.

Let’s take a look at the canonical well-preserved sauropod neck: the Carnegie Diplodocus, CM 84. Here are the adjacent cervicals 13 and 14, in posterior view, from Hatcher (1901: plate VI):

[Figure: CM 84 cervicals 13 and 14 in posterior view, from Hatcher (1901: plate VI).]

For C14 (on the left), I get a width:height ratio of 342/245 pixels = 1.40. For C13 (on the right), I get 264/256 pixels = 1.03. So C14 is apparently 35% broader than its immediate predecessor. I absolutely don’t buy that this represents how the vertebrae were in life.

FOR EXTRA CREDIT: what does this tell us about the reliability of computer models that purport to tell us about neck posture and flexibility, based on the preserved shapes of their constituent vertebrae?

So what’s to be done?

The first thing, as always in science, is to be explicit about what statements we’re making. Whenever we report an elongation index, we need to clearly state whether it’s EI sensu Upchurch 1998 or EI sensu Wedel et al. 2000. Since that’s so cumbersome, I’m going to propose that we introduce two new abbreviations: EIH (Elongation Index Horizontal), which is Upchurch’s original measure (length over horizontal width of cotyle), and EIV (Elongation Index Vertical), which is Wilson and Sereno’s measure (length over vertical height of cotyle). If we’re careful to report EIH or EIV (or better still both) rather than an unspecified EI, then at least we can avoid comparing apples with oranges.

But I think we can do better, by combining the horizontal and vertical cotyle measurements in some way, and dividing the length by that composite. This would give us an EIA (Elongation Index Average), which we could reasonably expect to preserve the original cotyle size, and so to give a more reliable indication of “true” elongation.

The question is, how to combine the cotyle width and height? There are two obvious candidates: either take the arithmetic mean (half the sum) or the geometric mean (the square root of the product). Note that for round cotyles, both these methods will give the same result as each other and as EIH and EIV — which is what we want.
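To make the competing definitions concrete, here’s a minimal sketch (in Python, with function names of my own invention) that computes EIH, EIV and both candidate versions of EIA from the pixel measurements of the Giraffatitan C5 image above:

```python
from math import sqrt

def elongation_indices(length, cotyle_width, cotyle_height):
    """Return (EIH, EIV, EIA_arithmetic, EIA_geometric) for a centrum."""
    eih = length / cotyle_width                                   # EI sensu Upchurch (1998)
    eiv = length / cotyle_height                                  # EI sensu Wedel et al. (2000)
    eia_arithmetic = length / ((cotyle_width + cotyle_height) / 2)  # arithmetic-mean composite
    eia_geometric = length / sqrt(cotyle_width * cotyle_height)     # geometric-mean composite
    return eih, eiv, eia_arithmetic, eia_geometric

# Giraffatitan MB.R.2180:C5, measured in pixels from the figure above.
print(elongation_indices(1779, 529, 334))
# -> roughly (3.36, 5.33, 4.12, 4.23)
```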

Which mean should we use for EIA? To my mind, it depends which is best preserved when a vertebra is crushed. If a 20 cm circular cotyle is crushed vertically to 10 cm, does it tend to smoosh outwards to 30 cm (so that 10+30 = the original 20+20) or to 40 cm (so that 10 x 40 = the original 20 x 20)? If the former, then we should use the arithmetic mean; if the latter, then the geometric mean.

Does anyone know how crushing works in practice? Which of these models most closely approximates reality? Or can we do better than either?

Update (8:48am): thanks to Emanuel Tschopp for pointing out (below) what I should have remembered: that Chure et al.’s (2010) description of Abydosaurus introduces “aEI”, which is the same as one of my proposed definitions of EIA. So we should ignore the last four paragraphs of this post and just use aEI. (Their abbreviation is better, too.)

 

References

  • Hatcher, John Bell. 1901. Diplodocus (Marsh): its osteology, taxonomy and probable habits, with a restoration of the skeleton. Memoirs of the Carnegie Museum 1:1-63 and plates I-XIII.
  • Upchurch, Paul. 1998. The phylogenetic relationships of sauropod dinosaurs. Zoological Journal of the Linnean Society 124:43-103.
  • Wedel, Mathew J., Richard L. Cifelli and R. Kent Sanders. 2000b. Osteology, paleobiology, and relationships of the sauropod dinosaur Sauroposeidon. Acta Palaeontologica Polonica 45(4):343-388.
  • Wilson, Jeffrey A. and Paul C. Sereno. 1998. Early evolution and higher-level phylogeny of sauropod dinosaurs. Society of Vertebrate Paleontology, Memoir 5:1-68.

Figure 1 from Currey and Alexander (1985)

This post pulls together information on basic parameters of tubular bones from Currey & Alexander (1985), on ASP from Wedel (2005), and on calculating the densities of bones from Wedel (2009: Appendix). It’s all stuff we’ve covered at one point or another, I just wanted to have it all in one convenient place.

Definitions:

  • R = outer radius = r + t
  • r = inner radius = R – t
  • t = bone wall thickness = R – r

Cross-sectional properties of tubular bones are commonly expressed as R/t or K (so that r = KR). K is defined as the inner radius divided by the outer radius (r/R). For bones with elliptical or irregular cross-sections, it’s best to measure two radii at right angles to each other, or use a different measure of cross-sectional geometry (like second moment of area, which I’m not getting into here).

R/t and K can be converted like so:

  • R/t = 1/(1-K)
  • K = 1 – (1/(R/t))

ASP (air space proportion) and MSP (marrow space proportion) measure the proportion of an element’s cross-sectional area not taken up by bone tissue. ASP and MSP are the same measurement–the amount of non-bone space in a bony element divided by the total–we just use ASP for air-filled bones and MSP for marrow-filled bones. See Tutorial 6 and these posts: one, two, three.

For tubular bones, ASP (or MSP) can be calculated from K:

  • ASP = πr^2/πR^2 = r^2/R^2 = (r/R)^2 = K^2

Obviously R/t and K don’t work for bones like vertebrae that depart significantly from a tubular shape. But if you had a vertebra or other irregular bone with a given ASP and you wanted to see what the equivalent tubular bone would look like, you could take the square root of ASP to get K and then use that to draw out the cross-section of that hypothetical tubular bone.
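For anyone who wants these conversions in executable form, here’s a minimal sketch that just restates the formulas above:

```python
from math import sqrt

def rt_from_k(k):
    """R/t from K, using R/t = 1/(1 - K)."""
    return 1.0 / (1.0 - k)

def k_from_rt(rt):
    """K from R/t, using K = 1 - 1/(R/t)."""
    return 1.0 - 1.0 / rt

def asp_from_k(k):
    """ASP (or MSP): the non-bone fraction of a tubular cross-section, = K^2."""
    return k ** 2

def k_from_asp(asp):
    """K of the tubular bone equivalent to an element with the given ASP."""
    return sqrt(asp)

print(rt_from_k(0.57))   # ~2.33
print(asp_from_k(0.57))  # ~0.32 (the camel tibia below)
print(k_from_asp(0.83))  # ~0.91
```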

To estimate the density of an element (at least near the point of a given cross-section), multiply the proportional areas of bone and air, or bone and marrow, by the specific gravities of those materials. According to Currey and Alexander (1985: 455), the specific gravities of fatty marrow and bone tissue are 0.93 and 2.1, respectively.

For a marrow-filled bone, the density of the element (or at least of the part of the shaft the section goes through) is:

  • 0.93MSP + 2.1(1-MSP)

Air is matter and therefore has mass and density, but it is so light (0.0012-0.0013 g/mL) that we can effectively ignore it in these calculations. So the density of a pneumatic element is:

  • 2.1(1-ASP)

For the three examples in the figure at the top of the post, the ASP/MSP values and densities are as follows (a worked sketch follows the list):

  • (b) alligator femur (marrow-filled), K = 0.35, MSP = K^2 = 0.12, density = (0.93 x 0.12) + (2.1 x 0.88) = 1.96 g/mL
  • (c) camel tibia (marrow-filled), K = 0.57, MSP = K^2 = 0.32, density = (0.93 x 0.32) + (2.1 x 0.68) = 1.73 g/mL
  • (d) Pteranodon first phalanx (air-filled), K = 0.91, ASP = K^2 = 0.83, density = (2.1 x 0.17) = 0.36 g/mL
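Here’s the worked sketch: a minimal bit of Python that reproduces the densities above from the rounded ASP/MSP values and the Currey and Alexander specific gravities.

```python
SG_BONE = 2.1     # specific gravity of bone tissue (Currey & Alexander 1985)
SG_MARROW = 0.93  # specific gravity of fatty marrow
SG_AIR = 0.0      # air is light enough to ignore in these calculations

def density(non_bone_fraction, filler_sg):
    """Density (g/mL) of an element whose non-bone fraction is filled by a material
    with the given specific gravity (marrow or air)."""
    return filler_sg * non_bone_fraction + SG_BONE * (1.0 - non_bone_fraction)

# Values from the worked examples above (K^2 rounded to two decimals, as in the post):
print(density(0.12, SG_MARROW))  # alligator femur, marrow-filled  -> ~1.96
print(density(0.32, SG_MARROW))  # camel tibia, marrow-filled      -> ~1.73
print(density(0.83, SG_AIR))     # Pteranodon phalanx, air-filled  -> ~0.36
```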

What if we switched things up, and imagined that the alligator and camel bones were pneumatic and the Pteranodon phalanx was marrow-filled? The results would then be:

  • (b) alligator femur (hypothetical air-filled), K = 0.35, ASP = K^2 = 0.12, density = (2.1 x 0.88) = 1.85 g/mL
  • (c) camel tibia (hypothetical air-filled), K = 0.57, ASP = K^2 = 0.32, density = (2.1 x 0.68) = 1.43 g/mL
  • (d) Pteranodon first phalanx (hypothetical marrow-filled), K = 0.91, MSP = K^2 = 0.83, density = (0.93 x 0.83) + (2.1 x 0.17) = 1.13 g/mL

In the alligator femur, the amount of non-bone space is so small that it does not much matter whether that space is filled by air or marrow–replacing the marrow with air only lowers the density of the element by 5-6%. The Pteranodon phalanx is a lot less dense than the alligator femur for two reasons. First, there is much less bony tissue–the hypothetical marrow-filled phalanx is 42% less dense than the alligator femur. Second, the marrow is replaced by air, which reduces the density by an additional 40% relative to the alligator.

Next time: how to write punchier endings for tutorial posts.


I recently reread Dubach (1981), “Quantitative analysis of the respiratory system of the house sparrow, budgerigar and violet-eared hummingbird”, and realized that she reported both body masses and volumes in her Table 1. For each of the three species, here are the sample sizes, mean total body masses, and mean total body volumes, along with mean densities I calculated from those values.

  • House sparrow, Passer domesticus, n = 16, mass = 23.56 g, volume = 34.05 mL, density = 0.692 g/mL
  • Budgerigar, Melopsittacus undulatus, n = 19, mass = 38.16 g, volume = 46.08 mL, density = 0.828 g/mL
  • Sparkling violetear,* Colibri coruscans, n = 12, mass = 7.28 g, volume = 9.29 mL, density = 0.784 g/mL

* This is the species examined by Dubach (1981), although not specified in her title; there are four currently-recognized species of violetears. And apparently ‘violetear’ has overtaken ‘violet-eared hummingbird’ as the preferred common name. And as long as we’re technically on a digression,  I’m almost certain those volumes do not include feathers. Every volumetric thing I’ve seen on bird masses assumes plucked birds (read on).

This is pretty darned interesting to me, partly because I’m always interested in how dense animals are, and partly because of how the results compare to other published data on whole-body densities for birds. The other results I am most familiar with are those of Hazlehurst and Rayner (1992) who had this to say:

There are relatively few values for bird density. Welty (1962) cited 0.9 g/mL for a duck, and Alexander (1983) 0.937 g/mL for a domestic goose, but those values may not take account of the air sacs. Paul (1988) noted 0.8 g/mL for unspecified bird(s). To provide more reliable estimates, the density of 25 birds of 12 species was measured by using the volume displacement method. In a dead, plucked bird the air-sac system was reinflated (Saunder and Manton 1979). The average density was 0.73 g/mL, suggesting that the lungs and air sacs occupy some quarter of the body.

That result has cast a long shadow over discussions of sauropod masses, as in this paper and these posts, so it’s nice to see similar results from an independent analysis. If you’re curious, the weighted mean of the densities calculated from Dubach’s (1981) data is 0.77. I’d love to see the raw data from Hazlehurst and Rayner (1992) to see how much spread they got in their density measurements. Unfortunately, they did not say which birds they used or give the raw data in the paper (MYDD!), and I have not asked them for it because doing so only just occurred to me as I was writing this post.
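For completeness, here’s a minimal sketch of those density calculations, including the sample-size-weighted mean, using the Table 1 values quoted above:

```python
# (species, n, mean body mass in g, mean body volume in mL) from Dubach (1981), Table 1
birds = [
    ("House sparrow",       16, 23.56, 34.05),
    ("Budgerigar",          19, 38.16, 46.08),
    ("Sparkling violetear", 12,  7.28,  9.29),
]

densities = {name: mass / volume for name, n, mass, volume in birds}
print(densities)  # ~0.692, ~0.828, ~0.784 g/mL

# Weighted mean density, weighting each species' mean density by its sample size.
total_n = sum(n for _, n, _, _ in birds)
weighted_mean = sum(n * (mass / volume) for _, n, mass, volume in birds) / total_n
print(round(weighted_mean, 2))  # ~0.77 g/mL
```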

There will be more news about hummingbirds here in the hopefully not-too-distant future. Here’s a teaser:

[Figure: full skeletal model of the hummingbird, generated from microCT data.]

Yes, those are its hyoids wrapped around the back of its head–they go all the way around to just in front of the eyes, as in woodpeckers and other birds that need hyper-long tongue muscles. There are LOADS of other interesting things to talk about here, but it will be faster and more productive if I just go write the paper like I’m supposed to be doing.

Oh, all right, I’ll say a little more. This is a  young adult female Anna’s hummingbird, Calypte anna, who was found by then-fellow-grad-student Chris Clark at a residential address in Berkeley in 2005. She was unable to fly and died of unknown causes just a few minutes after being found. She is now specimen 182041 in the ornithology collection at the Museum of Vertebrate Zoology at Berkeley. Chris Clark and I had her microCTed back in 2005, and that data will finally see the light of day thanks to my current grad student, Chris Michaels, who generated the above model.

This bird’s skull is a hair over an inch long, and she had a body mass of 3.85 grams at the time of her death. For comparison, those little ketchup packets you get at fast-food burger joints each contain 8-9 grams of ketchup, more than twice the mass of this entire bird when it was alive!

References

  • Dubach, M. 1981. Quantitative analysis of the respiratory system of the house sparrow, budgerigar and violet-eared hummingbird. Respiration Physiology 46(1): 43-60.
  • Hazlehurst, G.A., and Rayner, J.M. 1992. Flight characteristics of Triassic and Jurassic Pterosauria: an appraisal based on wing shape. Paleobiology 18(4): 447-463.

 

It’s well worth reading this story about Thomas Herndon, a graduate student who as part of his training set out to replicate a well-known study in his field.

The work he chose, Growth in a Time of Debt by Reinhart and Rogoff, claims to show that “median growth rates for countries with public debt over roughly 90 percent of GDP are about one percent lower than otherwise; average (mean) growth rates are several percent lower.” It has been influential in guiding the economic policy of several countries, reaffirming an austerity-based approach.

So here is Lesson zero, for policy makers: correlation is not causation.

To skip ahead to the punchline, it turned out that Reinhart and Rogoff made a trivial but important mechanical mistake in their working: they meant to average values from 19 rows of their spreadsheet, but got the formula wrong and missed out the last five. Those five included three countries which had experienced high growth while deep in debt, and which if included would have undermined the conclusions.

Therefore, Lesson one, for researchers: check your calculations. (Note to myself and Matt: when we revise the recently submitted Taylor and Wedel paper, we should be careful to check the SUM() and AVG() ranges in our own spreadsheet!)

Herndon was able to discover this mistake only because he repeatedly hassled the authors of the original study for the underlying data. He was ignored several times, but eventually one of the authors did send the spreadsheet. Which is just as well. But of course he should never have had to go chasing the authors for the spreadsheet because it should have been published alongside the paper.

Lesson two, for researchers: submit your data alongside the paper that uses it. (Note to myself and Matt: when we submit the revisions of that paper, submit the spreadsheets as supplementary files.)

Meanwhile, governments around the world were allowing policy to be influenced by the original paper without checking it — policies that affect the disposition of billions of pounds. Yet the paper only got its post-publication review because of a post-grad student’s exercise. That’s insane. It should be standard practice to have someone spend a day or two analysing a paper in detail before letting it have such a profound effect.

And so Lesson three, for policy makers: replicate studies before trusting them.

Ironically, this may be a case where the peer-review system inadvertently did actual harm. It seems that policy makers may have shared the widespread superstition that peer-reviewed publications are “authoritative”, or “quality stamped”, or “trustworthy”. That would certainly explain their allowing it to affect multi-billion-pound policies without further validation. [UPDATE: the paper wasn't peer-reviewed after all! See the comment below.]

Of course, anyone who’s actually been through peer-review a few times knows how hit-and-miss the process is. Only someone who’s never experienced it directly could retain blind faith in it. (In this respect, it’s a lot like cladistics.)

If a paper has successfully made it through peer-review, we should afford it a bit more respect than one that hasn’t. But that should never translate to blind trust.

In fact, let’s promote that to Lesson four: don’t blindly trust studies just because they’re peer-reviewed.

There’s been a lot of concern in some corners of the world about the Finch Report’s preference for Gold open access, and the RCUK policy’s similar leaning. Much of the complaining has focussed on the cost of Gold OA publishing: Article Processing Charges (APCs) are very off-putting to researchers with limited budgets. I thought it would be useful to provide a page that I (and you) can link to when facing such concerns.

This is long and (frankly) a bit boring. But I think it’s important and needs saying.

1. How much does the Finch Report suggest APCs cost?

Worries about high publishing costs are exacerbated by the widely reported estimate of £2000 for a typical APC, attributed to the Finch Report. In fact, that is not quite what the report (page 61) says:

Subsequent reports also suggest that the costs for open access journals average between £1.5k and £2k, which is broadly in line with the average level of APCs paid by the Wellcome Trust in 2010, at just under £1.5k.

Still, the midpoint of Finch’s “£1.5k-£2k” range is £1750, a hefty amount. Where does it come from? A footnote elucidates:

Houghton J et al, op cit; Heading for the Open Road: costs and benefits of transitions in scholarly communications, RIN, PRC, Wellcome Trust, JISC, RLUK, 2011. See also Solomon, D. and Björk, B-Christer. A study of Open Access Journals using article processing charges. Journal of the American Society for Information Science and Technology, which suggests an average level of APCs for open access journals (including those published at very low cost in developing countries) of just over $900. It is difficult to judge – opinions differ – whether costs for open access journals are on average likely to rise as higher status journals join the open access ranks; or to fall as new entrants come into the market.

[An aside: these details would probably be better known, and the details of the Finch report would be discussed in a more informed way, if the report were available on the Web in a form where individual sections could be linked, rather than only as a PDF.]

The first two cited sources look good and authoritative, being from JISC and a combination of well-respected research organisations. Nevertheless, the high figure that they cite is misleading, and unnecessarily alarming, for several reasons.

2. Why the Finch estimate is misleading

2.1. It ignores free-to-the-author journals.

The Solomon and Björk analysis that the Finch Report rather brushes over is the only one of the three to have attempted any rigorous numerical analysis, and it found as follows (citing an earlier study, subsequently written up):

Almost 23,000 authors who had published an article in an OA journal were asked about how much they had paid. Half of the authors had not paid any fee at all, and only 10% had paid fees exceeding 1,000 Euros [= £812, less than half of the midpoint of Finch's range].

And the proportion of journals that charge no APC (as opposed to authors who paid no fee) is even higher — nearly three quarters:

As of August 2011 there were 1,825 journals listed in the Directory of Open Access Journals (DOAJ) that, at least by self-report, charge APCs. These represent just over 26% of all DOAJ journals.

So there are a lot of zero-cost options. And these are by no means all low-quality journals: they include, for example, Acta Palaeontologica Polonica and Palaeontologia Electronica in our own field of palaeontology, the Journal of Machine Learning Research in computer science, and Theory and Applications of Categories in maths.

2.2. It ignores the low average price found by the Solomon and Björk analysis.

The Solomon and Björk paper is full of useful information and well worth detailed consideration. They make it clear in their methodology section that their sample was limited only to those journals that charge a non-zero APC, and their analysis concluded:

[We studied] 1,370 journals that published 100,697 articles in 2010. The average APC was 906 US Dollars (USD) calculated over journals and 904 USD calculated over articles.

(The closeness of the average across journals and articles is important: it shows that the average-by-journals is not being artificially depressed by a large number of very low-volume journals that have low APCs.)

2.3. It focusses on authors who are spending Other People’s Money.

Recall that Finch’s “£1.5k-£2k” estimate is justified in part by the observation that the APC paid by the Wellcome Trust in 2010 was just under £1.5k. But it’s well established that people spending Other People’s Money get less good value than when they spend their own: that’s why travellers who fly business class when their employer is paying go coach when they’re paying for themselves. (This is an example of the principal-agent problem.)

It’s great that the Wellcome Trust, and some other funders, pay Gold OA fees. For researchers in this situation, APCs should not be a problem; but for the rest of us (and, yes, that includes me — I’ve never had a grant in my life) there are plenty of excellent lower-cost options.

And as noted above, lower cost, or even no cost, does not need to mean lower quality.

2.4. It ignores the world’s leading open-access journal.

PLOS ONE publishes more articles than any other journal in the world, has very high production values, and for those who care about such things has a higher impact-factor than almost any specialist palaeontology journal. Its APC is $1350, which is currently about £839 — less than half of the midpoint of Finch’s “£1.5k-£2k” range.

Even PLOS’s flagship journal, PLOS Biology, which is ranked top in the JCR’s biology section, charges $2900 (about £1802), which is well within the Finch range.

Meanwhile, over in the humanities (where much of the negative reaction to Finch and RCUK is to be found), the leading open-access megajournal is much cheaper even than PLOS ONE: SAGE Open currently offers an introductory APC of $195 (discounted from the regular price of $695).

2.5. It ignores waivers.

The most important, and most consistently overlooked, fact among those who complain about how they don’t have any funds for Gold-OA publishing is that many Gold-OA journals offer waivers.

For example, PLOS co-founder Michael Eisen affirms (pers. comm.) that it’s explicitly part of the PLOS philosophy that no-one should be prevented from publishing in a PLOS journal by financial issues. And that philosophy is implemented in the PLOS policy of offering waivers to anyone who asks for one. (For example, my old University of Portsmouth colleagues, Mark Witton and Darren Naish certainly had no funds from UoP to support publication of their azhdarchid palaeobiology paper in PLOS ONE; they asked for a waiver and got it, no questions asked.)

Other major open-access publishers have similar policies.

2.6. It doesn’t recognise how the publishing landscape is changing.

It’s not really a criticism of the Finch Report — at least, not a fair one — that its coverage of eLife and PeerJ is limited to a single passing mention on page 58. Neither of these initiatives had come into existence when the report was drafted. Nevertheless, they have quickly become hugely important in shaping the world of publishing — it’s not a stretch to say that they have already joined BMC and PLOS in defining the shape of the open access world.

For the first few years of operation, eLife is waiving all APCs. It remains to be seen what will happen after that, but I think there are signs that their goal may be to retain the no-APC model indefinitely. PeerJ does charge, but is ridiculously cheap: a one-off payment of $99 pays for a publication every year for life; or $299 for any number of publications at any time. Those numbers are going to skew the average APC way, way down even from their current low levels.

2.7. I suspect it concentrates on hybrid-OA journals.

There are all sorts of reasons to mistrust hybrid journals, including the difficulty of finding the open articles; the very high APCs that they charge are only one.

Why do people use hybrid journals when they are more expensive than fully OA journals and offer so much less (e.g. limits on length, colour, and number of figures)? I suspect hybrid OA is the lazy option for researchers who have to conform to an OA mandate but don’t want to invest any time or effort in thinking about open-access options. It’s easy to imagine such researchers just shoving their work into the traditional paywalled journal, and letting the Wellcome grant pick up the tab. After all, it’s Other People’s Money.

If grant-money for funding APCs becomes more scarce as it’s required to stretch further, then researchers who’ve been taking this sort of box-checking approach to fulfilling OA mandates are going to be forced to think more about what they’re doing. And that’s a good thing.

3. What is the true average cost?

If we put all this together, and assume that researchers working from RCUK funds will make some kind of effort to find good-value open-access journals for their work instead of blindly throwing it at traditional subscription journals and expecting RCUK to pick up the fee, here’s where we land up.

  • About half of authors currently pay no fee at all.
  • Among those that do pay a fee, the average is $906.
  • So the overall average fee is about $453.
  • That’s about £283, which is less than one sixth of what Finch suggests.

4. What are we comparing with?

It’s one thing to find a more realistic cost for an average open-access article. But we also need to realise that we’re not comparing with zero. Authors have always paid publication fees in certain circumstances — subscription journals have levied page charges, extra costs for going past a certain length, for colour figures, etc. For example, Elsevier’s American Journal of Pathology charges authors “$550 per color figure, $50 per black & white or grayscale figure, and $50 per composed table, per printed page”. So a single colour figure in that journal costs more than the whole of a typical OA article.

But that’s not the real cost to compare with.

The real cost is what the world at large pays for each paywalled article. As we discussed here in some detail, the aggregate subscription paid to access an average paywalled article is about $5333. That’s as much as it costs to publish nearly twelve average open-access articles — and for that, you get much less: people outside of universities can’t get it even after the $5333 has been paid.

5. Directing our anger properly

Now think about this: the Big Four academic publishers have profit-margins between 32.4% and 42%. Let’s pick a typical profit margin of 37% — a little below the middle of that range. Assuming this is pretty representative across all subscription publishers — and it will be, since the Big Four control so much of the volume of subscription publishing — that means that 37% of the $5333 of an average paywalled article’s subscription money is pure profit. So $1973 is leaving academia every time a paper is “published” behind a paywall.

So every time a university sends a paper behind a paywall, the $1973 that it burns could have funded four average-priced Gold-OA APCs. Heck, even if you want to discount all the small publishers and put everything in PLOS — never taking a waiver — it would pay for one and a half PLOS ONE articles.
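For anyone who wants to check the arithmetic in the last two paragraphs, here’s a minimal sketch; the $453 blended APC comes from section 3 above, and the $5333 per-article subscription figure from the calculation discussed elsewhere on this blog.

```python
per_article_subscription = 5333  # average aggregate subscription spend per paywalled article, USD
profit_margin = 0.37             # representative big-publisher profit margin

profit_per_article = per_article_subscription * profit_margin
print(round(profit_per_article))         # ~1973 USD leaving academia per paywalled paper

average_apc = 453                        # blended average Gold-OA fee from section 3
print(profit_per_article / average_apc)  # ~4.4 average-priced APCs
print(profit_per_article / 1350)         # ~1.5 PLOS ONE articles
```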

So let me leave you with this. In recent weeks, I’ve seen a fair bit of anger directed at the Finch Report and the RCUK policy. Some researchers have been up in arms at the prospect of having to “pay to say“. I want to suggest that this anger is misdirected. Rather than being angry with a policy that says you need to find $453 when you publish, direct your anger at publishers who remove $1973 from academia every time you give them a paper.

Folks, we have to have the vision to look beyond what is happening right now in our departments. Gold OA does, for sure, mean a small amount of short-term pain. It also means a massive long-term win for us all.

A couple of weeks ago we tried to work out what it costs the global academic community when you publish a paper behind an Elsevier paywall instead of making it open access. The tentative conclusion was that it’s somewhere between £3112 and £6224 (or about $4846-9692), which is about 3.6-7.2 times the cost of publishing in PLoS ONE.

That calculation was fraught with uncertainty, because it’s so difficult to get solid numbers out of Elsevier. So let’s try a simpler one.

In 2009, The STM report: an overview of scientific and scholarly journal publishing reported (page 5) that:

The annual revenues generated from English-language STM journal publishing are estimated at about $8 billion in 2008, up by 6-7% compared to 2007, within a broader STM publishing market worth some $16 billion.
[...]
There were about 25,400 active scholarly peer-reviewed journals in early 2009, collectively publishing about 1.5 million articles a year.

8 billion dollars divided by 1.5 million articles yields a per-article revenue to the STM industry of $5333. And since publisher revenue is the same as academia’s expenditure on publishing, that is the per-article cost to Academia.

(What about the articles currently published as gold open access? Don’t they cut down the number that are being bought through subscriptions, and so raise the average price of a paywalled article? Yes, but not by much: according to page 7 of the report, “about 2% of articles are published in full open access journals” — a small enough proportion that we can ignore it for the purposes of this calculation.)

What can we make of this $5333 figure? For a start, it’s towards the bottom of the $4846-9692 Elsevier range — only 10% of the way up that range. So the balance of probability strongly suggests that Elsevier’s prices are above the industry-wide average, but not hugely above — somewhere between 10% below and 80% above the average.

More importantly, each paywalled article costs the world as much as four PLoS ONE articles. In other words, if we all stopped submitting to paywalled journals today and sent all our work to PLoS ONE instead, the total scholarly publishing bill would fall by 75%, from $8 billion to $2 billion.
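The arithmetic behind that 75% figure, as a minimal sketch:

```python
stm_revenue = 8.0e9        # annual English-language STM journal revenue, USD (2008)
articles_per_year = 1.5e6  # peer-reviewed articles published per year

per_article = stm_revenue / articles_per_year
print(round(per_article))               # ~5333 USD per paywalled article

plos_one_apc = 1350
all_plos_bill = articles_per_year * plos_one_apc
print(all_plos_bill / 1e9)              # ~2.0 (billion USD)
print(1 - all_plos_bill / stm_revenue)  # ~0.75, i.e. roughly a 75% saving
```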

Why am I comparing with PLoS ONE’s $1350? There are other comparisons I could use — for example, the average cost of $906 calculated by Solomon and Björk across 100,697 open-access articles in 1,370 journals. But that figure is probably propped up by journals that are deliberately being run at a loss in order to gain visibility or prestige. PLoS ONE is a more conservative comparison point because we know its $1350 is enough for it to run at a healthy operating profit. So we know that a switch to PLoS ONE and similar journals would be financially sustainable.

But there’s certainly no reason to think that PLoS ONE’s price of $1350 is as low as you can go and still have good-quality peer-reviewed gold open access. For example, PLoS ONE’s long-time Editor-in-Chief, Pete Binfield, thinks that it can be done, at a profit, for $99 — a staggering 92% price-cut from the $1350 figure we’ve been using. If he’s right — and he’s betting his mortgage that he is — then we could have 54 peer-reviewed articles in PeerJ for every one that goes behind a paywall.

It’s too early to know whether PeerJ will work (and I’ll talk about that more another time). But the very fact that someone as experienced and wily as Binfield thinks it will — and was able to attract venture capital from a disinterested and insightful party — strongly indicates that this price-point is at least in the right ballpark.

Which is more than can be said for the Finch Report’s ludicrous over-estimate of £1500-£2000.

What does it cost to publish a paper in a non-open access Elsevier journal? The immediate cost to the author is often zero (though page charges and fees for colour illustrations mean this is not always true). But readers have to pay to see the paper, either directly in the case of private individuals or through library budgets in the case of university staff and students. What is the total cost to the world?

Previous attempts

It’s a calculation that I’ve taken a couple of stabs at in public forums, but in both cases space constraints meant that I couldn’t lay out the reasoning in the detail I’d like — and as a result I couldn’t get the kind of detailed feedback that would allow me to refine the numbers. So I am trying again here.

The first version of the calculation was in my article Open, moral and pragmatic at Times Higher Education:

According to Elsevier’s annual report for 2010, it publishes about “200,000 new science & technology research articles each year”. The same report reveals revenues for 2010 of £2.026 billion. This works out as £10,130 per article, each made available only to the tiny proportion of the world’s population that has access to a subscribing library.

As Kent Anderson pointed out in an otherwise misleading comment, that calculation was flawed in that I was using the total of Elsevier revenue rather than just the portion that comes from journal subscriptions. Trying to fix this, and using more up-to-date figures, I provided a better estimate in Academic Publishing Is Broken at The Scientist:

To publish in an Elsevier journal … appears to cost some $10,500. In 2011, 78 percent of Elsevier’s total revenue, or £1,605 million, was contributed by journal subscriptions. In the same year, Elsevier published 240,000 articles, making the average cost per article some £6,689, or about $10,500 US.

But this, it turns out, is also an over-estimate, because it’s 78% of Elsevier’s Scientific & Technical revenue that comes from journal subscriptions; the other half of Elsevier, their Health Sciences division, has its own revenues.

The data we have to work with

Here’s what I have right now — using data from 2010, the last complete year for which numbers are available.

Bear in mind that Elsevier is a publisher, and Reed Elsevier is a larger company that owns Elsevier and a bunch of other businesses such as LexisNexis. According to the notes from a Reed Elsevier investment seminar that took place on December 6, 2011 in London:

  • Page 2: 34% of Reed Elsevier’s total 2010 revenue of £6,055M (i.e. £2058.7M) was from “Science and Medical”, which I take to mean Elsevier. This is in keeping with the total revenue number from Elsevier’s annual report.
  • Page 8: Elsevier’s revenues are split 50-50 between the Scientific & Technical division and the Health Sciences division. 39% of total Elsevier revenue (i.e. £803M) is from research journals in the S&T sector. No percentage is given for research journal revenue in Health Sciences.
  • Page 18: confirmation that 78% of Scientific & Technical revenue (i.e. 39% of total Elsevier revenue) is from research journals.
  • Page 21: total number of articles published in 2010 seems to be about 258,000 (read off from the graph).
  • Page 22 confirms “>230,000 articles per year”.
  • Page 23, top half, says “>80% of revenue derived from subscriptions, strongly recurring revenues”. Bottom half confirms earlier revenue of 78% for research journals. I suppose that the “subscriptions” amounting to >80% must include database subscriptions.

The other important figure is the proportion of Elsevier journal revenue that comes from Gold OA fees rather than subscriptions. The answer is, almost none. Figures for 2010 are no longer on Elsevier’s Sponsored Articles page, but happily we quoted them in an older SV-POW! post:

691 Elsevier articles across some six hundred journals were sponsored in 2010. Sponsorship revenues from these articles amounted to less than 0.1% of Elsevier’s total revenues.

So for the purposes of these rough-and-ready calculations, we can ignore Elsevier’s Gold-OA revenue completely and assume that all research-journal revenue is from subscriptions.

The data we don’t have

The crucial piece of information we don’t have is this: how much of Elsevier Health Sciences revenue is from journal subscriptions? This information is not included in the investor report, and my attempts to determine it have so far been wholly unsuccessful. Back in March, I contacted Liz Smith (VP/Director of Global Internal Communications), Alicia Wise (Director of Universal Access), Tom Reller (VP of Global Corporate Relations), Ron Mobed (CEO of Scientific & Technical) and Michael Hansen (CEO of Health Sciences). Of these, only Tom Reller got back to me — he was helpful, and pointed me to the investor report that I cite heavily above — but wasn’t able to give me a figure.

If anyone knows the true percentage — or can even narrow the range a bit — I would love to know about it. Please leave a comment.

In the mean time, I will proceed with calculations on two different bases:

  1. That Health Sciences revenue is proportioned the same as Scientific & Technical, i.e. 78% comes from journal subscriptions;
  2. That Health Sciences has no revenue from journal subscriptions. This seems very unrealistic to me, but will at least give us a hard lower bound.

Calculation

It’s pretty simple.

If HS journal-subscription revenue is zero, then Elsevier’s total from journal subscriptions in 2010 was £803M. On the other hand, if HS revenue proportions are about the same as in S&T, then total journal-subscription revenue was twice this, £1606M.

Across the 258,000 or so articles published in 2010, that yields either £803M / 258,000 = £3112 per article, or £1606M / 258,000 = £6224 per article. At current exchange rates, that’s $4816 or $9632. My guess is that the true figure is somewhere between these extremes. If I had to give a single figure, I guess I’d split the difference and go with £4668, which is about $7224.
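Here’s the whole calculation as a minimal sketch; the exchange rate is a rough assumption on my part, so the dollar figures are only approximate.

```python
st_journal_revenue = 803e6  # GBP: S&T research-journal revenue (39% of Elsevier's 2010 total)
articles_2010 = 258_000
gbp_to_usd = 1.55           # rough exchange rate, an assumption

low_gbp = st_journal_revenue / articles_2010       # basis 2: no Health Sciences journal revenue
high_gbp = 2 * st_journal_revenue / articles_2010  # basis 1: Health Sciences mirrors S&T

print(int(low_gbp), int(high_gbp))                 # 3112 and 6224 GBP per article (truncated)
print(int((low_gbp + high_gbp) / 2))               # 4668 GBP, splitting the difference
print(int(low_gbp * gbp_to_usd), int(high_gbp * gbp_to_usd))  # roughly 4800 and 9600 USD
```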

Remember: this is what it costs the academic world to get access to your article when you give it to an Elsevier journal. Those parts of the academic world that have access, that is — don’t forget that many universities and almost everyone outside a university won’t be able to access it at all.

This is less than my previous estimates. It’s still an awful lot.

Why this matters

Over on Tim Gowers’ blog, he’s recently announced the launch of a new open-access maths journal, Forum of Mathematics, to be published by Cambridge University Press. The new journal will have an article processing fee of £500 after the first three years, during which all fees will be waived. I’ve been shocked at the vehemence with which a lot of commenters have objected to the idea of any article processing fee.

Here’s the thing. For each maths article that’s sent to an Elsevier journal, costing the worldwide maths community between £3112 and £6224, that same worldwide maths community could instead pay for six to twelve open-access articles in the new journal. And those articles would then be available to anyone who wanted them, not only people affiliated with subscribing institutions.

To me, the purely economic argument for open access is unanswerable. Even if you leave aside the moral argument, the text-mining argument, and so on, you’re left with a very stark financial equation. It’s madness to give research to subscription publishers.

In the middle of February, Times Higher Education ran a piece by Elsevier boycott originator Tim Gowers, entitled Occupy publishing.  A week ago, they published a letter in response, written by Elsevier Senior VP David Clark, under the title If it ain’t broke, don’t bin it, in which he argued that “there is little merit in throwing away a system that works in favour of one that has not even been developed yet”.

Seeing the current journal system, with its arbitrary barriers, economic inefficiencies and distorted perspective on impact, described as “a system that works” was more than I could bear.  So I sent a letter in response, and it’s published in today’s issue as Open, moral and pragmatic.

Space limitations of THE letters meant that I was only able to address one aspect — the economics.  Based on numbers in their own annual report, I show that the cost of each article that Elsevier makes available to subscribers is twelve times the cost of each article that PLoS makes available to the world.  And since Elsevier’s 200,000 articles per year are about a seventh of the total global output, the money paid to Elsevier alone would easily pay for every single paper to be published as open access.  Easily.

No doubt there are errors in some of the numbers, which are necessarily estimates; and the calculation is overly simplistic.  But even allowing for that, there is plenty enough slop in the figures that the conclusion stands.  If we stopped paying Elsevier subscriptions alone — we can keep Wiley, Springer and the rest — the money we save would pay for all our work to be available to the whole world, with hundreds of millions of pounds left over to fund more research.

Worried about the lack of jobs in palaeontology?  Concerned that universities are reducing the number of tenure-track positions?  Disturbed by the elimination of curators and preparators from museums?  We need to cut the inefficient, profiteering publishers out of the loop.

How many open-access papers are getting published these days?  And who’s doing it?  Inspired by a tweet from @labroides (link at the end so as not to give away the punchline), I went looking for numbers.

We’ll start with our old friends Elsevier, since they are the world’s largest academic publisher by volume and by revenue.  One often reads statements such as “Elsevier is committed to Universal Access, Quality and Sustainability … Elsevier wants to enable the broadest possible access to quality research content in sustainable ways that meet our many constituents’ needs” (from their page Elsevier’s position on Access).  Even their submission to the OSTP call for comments begins by saying “One of Elsevier’s primary missions is to work towards providing universal access to high-quality scientific information in sustainable ways. We are committed to providing the broadest possible access to our publications.”

The most important way Elsevier does this is by allowing authors to pay a fee, currently $3000, to “sponsor” their articles, so that they are made freely available to readers (though we still don’t know under what specific licence!).  While that fee is more than twice the $1350 that PLoS ONE charges, it’s comparable to the $2900 PLoS Biology fee and identical to Springer’s $3000 fee.  Elsevier have rather a good policy in connection with their “sponsored article” fee: “Authors can only select this option after receiving notification that their article has been accepted for publication. This prevents a potential conflict of interest where a journal would have a financial incentive to accept an article.”

According to the page linked above, “691 Elsevier articles across some six hundred journals were sponsored in 2010. Sponsorship revenues from these articles amounted to less than 0.1% of Elsevier’s total revenues.”  (And indeed, 691 × $3000 = $2.073 M, which is about 0.065% of their 2010 revenue of £2026 M ≈ $3208 M.)  As Elsevier publishes 2639 journals in all, that amounts to just over a quarter of one open-access article per journal across the year.
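Here’s that back-of-envelope check as a minimal sketch; the exchange rate is a rough assumption on my part.

```python
sponsored_articles = 691
sponsorship_fee = 3000          # USD per sponsored article
elsevier_revenue_gbp = 2026e6   # Elsevier's 2010 revenue, GBP
gbp_to_usd = 1.58               # rough 2010 exchange rate, an assumption
journals = 2639

sponsorship_revenue_usd = sponsored_articles * sponsorship_fee
total_revenue_usd = elsevier_revenue_gbp * gbp_to_usd

print(sponsorship_revenue_usd / 1e6)                      # ~2.07 million USD
print(100 * sponsorship_revenue_usd / total_revenue_usd)  # ~0.065 (% of total revenue)
print(sponsored_articles / journals)                      # ~0.26 sponsored articles per journal
```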

I find that disappointing.

In the other corner (I won’t call it red or blue because of the political implications of those colours, which by the way are the opposite way around on different sides of the Atlantic.  Anyway …)  In the other corner, we have PLoS ONE.  According to its Advanced Search engine, this journal alone published 6750 open-access articles in 2010 — about ten times as many as all Elsevier journals combined.  Indeed, in the last month of that year alone, PLoS ONE’s 847 articles comfortably exceeded Elsevier’s output for the year.  That’s one journal, in one month, up against a stable of 2639 journals across a whole year.

What can we take away from this?  Maybe not very much: Elsevier offer their sponsored-article option to all authors, after all, and they can hardly be blamed if the authors don’t take them up on it.

But why don’t they?  Tune in next time for some thoughts on that.

And, finally, here is the tweet that started this line of thought:

@labroides Joshua Drew

@PublicAccessYAY @PLoSOne published more #OA articles in Dec ’10 than ALL of #elsevier’s journals had the entire year

Food for thought.
