Correlation of impact factor and citations: a personal case-study
December 13, 2013
It’s now widely understood among researchers that the impact factor (IF) is a statistically illiterate measure of the quality of a paper. Unfortunately, it’s not yet universally understood among administrators, who in many places continue to judge authors on the impact factors of the journals they publish in. They presumably do this on the assumption that impact factor is a proxy for, or predictor of, citation count, which is turn is assumed to correlate with influence.
As shown by Lozano et al. (2012), the correlation between IF and citations is in fact very weak — r2 is about 0.2 — and has been progressively weakening since the dawn of the Internet era and the consequent decoupling of papers from the physical journal that they appear in. This is a counter-intuitive finding: given that the impact factor is calculated from citation counts you’d expect it to correlate much more strongly. But the enormous skew of citation rates towards a few big winners renders the average used by the IF meaningless.
To bring this home, I plotted my own personal impact-factor/citation-count graph. I used Google Scholar’s citation counts of my articles, which recognises 17 of my papers; then I looked up the impact factors of the venues they appeared in, plotted citation count against impact factor, and calculated a best-fit line through my data-points. Here’s the result (taken from a slide in my Berlin 11 satellite conference talk):
I was delighted to see that the regression slope is actually negative: in my case at least, the higher the impact factor of the venue I publish in, the fewer citations I get.
There are a few things worth unpacking on that graph.
First, note the proud cluster on the left margin: publications in venues with impact factor zero (i.e. no impact factor at all). These include papers in new journals like PeerJ, in perfectly respectable established journals like PaleoBios, edited-volume chapters, papers in conference proceedings, and an arXiv preprint.
My most-cited paper, by some distance, is Head and neck posture in sauropod dinosaurs inferred from extant animals (Taylor et al. 2009, a collaboration between all three SV-POW!sketeers). That appeared in Acta Palaeontologia Polonica, a very well-respected journal in the palaeontology community but which has a modest impact factor of 1.58.
My next most-cited paper, the Brachiosaurus revision (Taylor 2009), is in the Journal of Vertebrate Palaeontology — unquestionably the flagship journal of our discipline, despite its also unspectacular impact factor of 2.21. (For what it’s worth, I seem to recall it was about half that when my paper came out.)
In fact, none of my publications have appeared in venues with an impact factor greater than 2.21, with one trifling exception. That is what Andy Farke, Matt and I ironically refer to as our Nature monograph (Farke et al. 2009). It’s a 250-word letter to the editor on the subject of the Open Dinosaur Project. (It’ a subject that we now find profoundly embarrassing given how dreadfully slowly the project has progressed.)
Google Scholar says that our Nature note has been cited just once. But the truth is even better: that one citation is in fact from an in-prep manuscript that Google has dug up prematurely — one that we ourselves put on Google Docs, as part of the slooow progress of the Open Dinosaur Project. Remove that, and our Nature note has been cited exactly zero times. I am very proud of that record, and will try to preserve it by persuading Andy and Matt to remove the citation from the in-prep paper before we submit. (And please, folks: don’t spoil my record by citing it in your own work!)
What does all this mean? Admittedly, not much. It’s anecdote rather than data, and I’m posting it more because it amuses me than because it’s particularly persuasive. In fact if you remove the anomalous data point that is our Nature monograph, the slope becomes positive — although it’s basically meaningless, given that all my publications cluster in the 0–2.21 range. But then that’s the point: pretty much any data based on impact factors is meaningless.
- Farke, Andrew A., Michael P. Taylor and Mathew J. Wedel. 2009. Sharing: public databases combat mistrust and secrecy. Nature 461:1053.
- Lozano, George A., Vincent Larivière and Yves Gingras. 2012. The weakening relationship between the impact factor and papers’ citations in the digital age. Journal of the American Society for Information Science and Technology 63(11):2140-2145. doi:10.1002/asi.22731 [arXiv preprint]
- Taylor, Michael P. 2009. A re-evaluation of Brachiosaurus altithorax Riggs 1903 (Dinosauria, Sauropoda) and its generic separation from Giraffatitan brancai (Janensch 1914). Journal of Vertebrae Paleontology 29(3):787-806.
- Taylor, Michael P., Mathew J. Wedel and Darren Naish. 2009. Head and neck posture in sauropod dinosaurs inferred from extant animals. Acta Palaeontologica Polonica 54(2):213-230.