March 22, 2017
The previous post (Every attempt to manage academia makes it worse) has been a surprise hit, and is now by far the most-read post in this blog’s nearly-ten-year history. It evidently struck a chord with a lot of people, and I’ve been surprised — amazed, really — at how nearly unanimously people have agreed with it, both in the comments here and on Twitter.
But I was brought up short by this tweet from Thomas Koenig:
That is the question, isn’t it? Why do we keep doing this?
I don’t know enough about the history of academia to discuss the specific route we took to the place we now find ourselves in. (If others do, I’d be fascinated to hear.) But I think we can fruitfully speculate on the underlying problem.
Let’s start with the famous true story of the Hanoi rat epidemic of 1902. In a town overrun by rats, the authorities tried to reduce the population by offering a bounty on rat tails. Enterprising members of the populace responded by catching live rats, cutting off their tails to collect the bounty, then releasing the rats to breed, so more tails would be available in future. Some people even took to breeding rats for their tails.
Why did this go wrong? For one very simple reason: because the measure optimised for was not the one that mattered. What the authorities wanted to do was reduce the number of rats in Hanoi. For reasons that we will come to shortly, the proxy that they provided an incentive for was the number of rat tails collected. These are not the same thing — optimising for the latter did not help the former.
The badness of the proxy measure applies in two ways.
First, consider those who caught rats, cut their tails off and released them. They stand as counter-examples to the assumption that harvesting a rat-tail is equivalent to killing the rat. The proxy was bad because it assumed a false equivalence. It was possible to satisfy the proxy without advancing the actual goal.
Second, consider those who bred rats for their tails. They stand as counter-examples to the assumption that killing a rat is equivalent to decreasing the total number of live rats. Worse, if the breeders released their de-tailed captive-bred progeny into the city, their harvests of tails not only didn’t represent any decrease in the feral population, they represented an increase. So the proxy was worse than neutral because satisfying it could actively harm the actual goal.
So far, so analogous to the perverse academic incentives we looked at last time. Where this gets really interesting is when we consider why the Hanoi authorities chose such a terribly counter-productive proxy for their real goal. Recall their object was to reduce the feral rat population. There were two problems with that goal.
First, the feral rat population is hard to measure. It’s so much easier to measure the number of tails people hand in. A metric is seductive if it’s easy to measure. In the same way, it’s appealing to look for your dropped car-keys under the street-lamp, where the light is good, rather than over in the darkness where you dropped them. But it’s equally futile.
Second — and this is crucial — it’s hard to properly reward people for reducing the feral rat population because you can’t tell who has done what. If an upstanding citizen leaves poison in the sewers and kills a thousand rats, there’s no way to know what he has achieved, and to reward him for it. The rat-tail proxy is appealing because it’s easy to reward.
The application of all this to academia is pretty obvious.
First, the things we really care about are hard to measure. The reason we do science — or, at least, the reason societies fund science — is to achieve breakthroughs that benefit society. That means important new insights, findings that enable new technology, ways of creating new medicines, and so on. But all these things take time to happen. It’s difficult to look at what a lab is doing now and say “Yes, this will yield valuable results in twenty years”. Yet that may be what is required: trying to evaluate it using a proxy of how many papers it gets into high-IF journals this year will most certainly militate against its doing careful work with long-term goals.
Second, we have no good way to reward the right individuals or labs. What we as a society care about is the advance of science as a whole. We want to reward the people and groups whose work contributes to the global project of science — but those are not necessarily the people who have found ways to shine under the present system of rewards: publishing lots of papers, shooting for the high-IF journals, skimping on sample-sizes to get spectacular results, searching through big data-sets for whatever correlations they can find, and so on.
In fact, when a scientist who is optimising for what gets rewarded slices up a study into multiple small papers, each with a single sensational result, and shops them around Science and Nature, all they are really doing is breeding rats.
If we want people to stop behaving this way, we need to stop rewarding them for it. (Side-effect: when people are rewarded for bad behaviour, people who behave well get penalised, lose heart, and leave the field. They lose out, and so does society.)
Q. “Well, that’s great, Mike. What do you suggest?”
A. Ah, ha ha, I’d been hoping you wouldn’t bring that up.
No-one will be surprised to hear that I don’t have a silver bullet. But I think the place to start is by being very aware of the pitfalls of the kinds of metrics that managers (including us, when wearing certain hats) like to use. Managers want metrics that are easy to calculate, easy to understand, and quick to yield a value. That’s why articles are judged by the impact factor of the journal they appear in: the calculation of the article’s worth is easy (copy the journal’s IF out of Wikipedia); it’s easy to understand (or, at least, it’s easy for people to think they understand what an IF is); and best of all, it’s available immediately. No need for any of that tedious waiting around five years to see how often the article is cited, or waiting ten years to see what impact it has on the development of the field.
Wise managers (and again, that means us when wearing certain hats) will face up to the unwelcome fact that metrics with these desirable properties are almost always worse than useless. Coming up with better metrics, if we’re determined to use metrics at all, is real work and will require an enormous educational effort.
One thing we can usefully do, whenever considering a proposed metric, is actively consider how it can and will be hacked. Black-hat it. Invest a day imagining you are a rational, selfish researcher in a regime that uses the metric, and plan how you’re going to exploit it to give yourself the best possible score. Now consider whether the course of action you mapped out is one that will benefit the field and society. If not, dump the metric and start again.
Q. “Are you saying we should get rid of metrics completely?”
A. Not yet; but I’m open to the possibility.
Given metrics’ terrible track-record of hackability, I think we’re now at the stage where the null hypothesis should be that any metric will make things worse. There may well be exceptions, but the burden of proof should be on those who want to use them: they must show that they will help, not just assume that they will.
And what if we find that every metric makes things worse? Then the only rational thing to do would be not to use any metrics at all. Some managers will hate this, because their jobs depend on putting numbers into boxes and adding them up. But we’re talking about the progress of research to benefit society, here.
We have to go where the evidence leads. Dammit, Jim, we’re scientists.
March 17, 2017
I’ve been on Twitter since April 2011 — nearly six years. A few weeks ago, for the first time, something I tweeted broke the thousand-retweets barrier. And I am really unhappy about it. For two reasons.
First, it’s not my own content — it’s a screen-shot of Table 1 from Edwards and Roy (2017):
And second, it’s so darned depressing.
The problem is a well-known one, and indeed one we have discussed here before: as soon as you try to measure how well people are doing, they will switch to optimising for whatever you’re measuring, rather than putting their best efforts into actually doing good work.
In fact, this phenomenon is so very well known and understood that it’s been given at least three different names by different people:
- Goodhart’s Law is most succinct: “When a measure becomes a target, it ceases to be a good measure.”
- Campbell’s Law is the most explicit: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
- The Cobra Effect refers to the way that measures taken to improve a situation can directly make it worse.
As I say, this is well known. There’s even a term for it in social theory: reflexivity. And yet we persist in doing idiot things that can only possibly have this result:
- Assessing school-teachers on the improvement their kids show in tests between the start and end of the year (which obviously results in their doing all they can to depress the start-of-year tests).
- Assessing researchers by the number of their papers (which can only result in slicing into minimal publishable units).
- Assessing them — heaven help us — on the impact factors of the journals their papers appear in (which feeds the brand-name fetish that is crippling scholarly communication).
- Assessing researchers on whether their experiments are “successful”, i.e. whether they find statistically significant results (which inevitably results in p-hacking and HARKing).
What’s the solution, then?
I’ve been reading the excellent blog of economist Tim Harford for a while. That arose from reading his even more excellent book The Undercover Economist (Harford 2007), which gave me a crash-course in the basics of how economies work, how markets help, how they can go wrong, and much more. I really can’t say enough good things about this book: it’s one of those that I feel everyone should read, because the issues are so important and pervasive, and Harford’s explanations are so clear.
In a recent post, Why central bankers shouldn’t have skin in the game, he makes this point:
The basic principle for any incentive scheme is this: can you measure everything that matters? If you can’t, then high-powered financial incentives will simply produce short-sightedness, narrow-mindedness or outright fraud. If a job is complex, multifaceted and involves subtle trade-offs, the best approach is to hire good people, pay them the going rate and tell them to do the job to the best of their ability.
I think that last part is pretty much how academia used to be run a few decades ago. Now I don’t want to get all misty-eyed and rose-tinted and nostalgic — especially since I wasn’t even involved in academia back then, and don’t know from experience what it was like. But could it be … could it possibly be … that the best way to get good research and publications out of scholars is to hire good people, pay them the going rate and tell them to do the job to the best of their ability?
[Read on to Why do we manage academia so badly?]
- Edwards, Marc A., and Siddhartha Roy. 2017. Academic Research in the 21st Century: Maintaining Scientific Integrity in a Climate of Perverse Incentives and Hypercompetition. Environmental Engineering Science 34(1):51-61.
- Harford, Tim. 2007. The Undercover Economist. Abacus (Little, Brown). 384 pages. [Amazon US, Amazon UK]
Here is a nicely formatted full-page version of the Edwards and Roy table, for you to print out and stick on all the walls of your university. My thanks to David Roberts for preparing it.
January 26, 2017
It’s now been widely discussed that Jeffrey Beall’s list of predatory and questionable open-access publishers — Beall’s List for short — has suddenly and abruptly gone away. No-one really knows why, but there are rumblings that he has been hit with a legal threat that he doesn’t want to defend.
To get this out of the way: it’s always a bad thing when legal threats make information quietly disappear; to that extent, at least, Beall has my sympathy.
That said — over all, I think making Beall’s List was probably not a good thing to do in the first place, being an essentially negative approach, as opposed to DOAJ’s more constructive whitelisting approach. But under Beall’s sole stewardship it was a disaster, due to his well-known ideological opposition to all open access. So I think it’s a net win that the list is gone.
But, more than that, I would prefer that it not be replaced.
Researchers need to learn the very very basic research skills required to tell a real journal from a fake one. Giving them a blacklist or a whitelist only conceals the real issue, which is that you need those skills if you’re going to be a researcher.
Finally, and I’m sorry if this is harsh, I have very little sympathy with anyone who is caught by a predatory journal. Why would you be so stupid? How can you expect to have a future as a researcher if your critical thinking skills are that lame? Think Check Submit is all the guidance that anyone needs; and frankly much more than people really need.
Here is the only thing you need to know, in order to avoid predatory journals, whether open-access or subscription-based: if you are not already familiar with a journal — because it’s published research you respect, or colleagues who you respect have published in it or are on the editorial board — then do not submit your work to that journal.
It really is that simple.
So what should we do now Beall’s List has gone? Nothing. Don’t replace it. Just teach researchers how to do research. (And supervisors who are not doing that already are not doing their jobs.)
March 11, 2014
I hate to keep flogging a dead horse, but since this issue won’t go away I guess I can’t, either.
1. Two years ago, I wrote about how you have to pay to download Elsevier’s “open access” articles. I showed how their open-access articles claimed “all rights reserved”, and how when you use the site’s facilities to ask about giving one electronic copy to a student, the price is £10.88. As I summarised at the time: “Free” means “we take the author’s copyright, all rights are reserved, but you can buy downloads at a 45% discount from what they would otherwise cost.” No-one from Elsevier commented.
2. Eight months ago, Peter Murray-Rust explained that Elsevier charges to read #openaccess articles. He showed how all three of the randomly selected open-access articles he looked at had download fees of $31.50. No-one from Elsevier commented (although see below).
3. A couple of days ago, Peter revisited this issue, and found that Elsevier are still charging THOUSANDS of pounds for CC-BY articles. IMMORAL, UNETHICAL, maybe even ILLEGAL. This time he picked another Elsevier OA article at random, and was quoted £8000 for permission to print 100 copies. The one he looked at says “Open Access” in gold at the top and “All rights reserved” at the bottom. Its “Get rights and content” link takes me to RightsLink, where I was quoted £1.66 to supply a single electronic copy to a student on a course at the University of Bristol:
(Why was I quoted a wildly different price from Peter? I don’t know. Could be to do with the different university, or because he proposed printing copies instead of using an electronic one.)
On Peter’s last article, an Elsevier representative commented:
Alicia Wise says:
March 10, 2014 at 4:20 pm
As noted in the comment thread to your blog back in August we are improving the clarity of our OA license labelling (eg on ScienceDirect) and metadata feeds (eg to Rightslink). This is work in progress and should be completed by summer. I am working with the internal team to get a more clear understanding of the detailed plan and key milestones, and will tweet about these in due course.
With kind wishes,
Dr Alicia Wise
Director of Access and Policy
(Oddly, I don’t see the referenced comment in the August blog-entry, but perhaps it was on a different article.)
Now here is my problem with this.
First of all, either this is deliberate fraud on Elsevier’s part — charging for the use of something that is free to use — or it’s a bug. Following Hanlon’s razor, I prefer the latter explanation. But assuming it’s a bug, why has it taken two years to address? And why is it still not fixed?
Elsevier, remember, are a company with an annual revenue exceeding £2bn. That’s £2,000,000,000. (Rather pathetically, their site’s link to the most recent annual report is broken, but that’s a different bug for a different day.) Is it unreasonable to expect that two years should be long enough for them to fix a trivial bug?
All that’s necessary is to change the “All rights reserved” message and the “Get rights and content” link to say “This is an open-access article, and is free to re-use”. We know that the necessary metadata is there because of the “Open Access” caption at the top of the article. So speaking from my perspective as a professional software developer of more than thirty years’ standing, this seems like a ten-line fix that should take maybe a man-hour; at most a man-day. A man-day of programmer time would cost Elsevier maybe £500 — that is, 0.000025% of a single year’s revenue, let alone the two years’ worth they’ve taken since this bug was reported. Is it really too much to ask?
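For anyone who wants to check the percentage, here is a minimal sketch of the arithmetic. It uses only the two figures quoted above (a £500 programmer-day and annual revenue of £2bn, which is a lower bound since the text says "exceeding"); the function name is mine, purely for illustration.

```python
# Figures quoted in the text above; both are rough estimates.
ANNUAL_REVENUE_GBP = 2_000_000_000  # "annual revenue exceeding £2bn"
FIX_COST_GBP = 500                  # one programmer-day, as estimated above


def cost_as_percent_of_revenue(cost: float, revenue: float) -> float:
    """Return cost expressed as a percentage of revenue."""
    return 100.0 * cost / revenue


# Against one year of revenue, and against the two years since the report.
one_year = cost_as_percent_of_revenue(FIX_COST_GBP, ANNUAL_REVENUE_GBP)
two_years = cost_as_percent_of_revenue(FIX_COST_GBP, 2 * ANNUAL_REVENUE_GBP)

print(f"{one_year:.7f}% of one year's revenue")
print(f"{two_years:.7f}% of two years' revenue")
```

Measured against one year of revenue the fix comes to 0.000025%; measured against the two years since the bug was reported, it is half that.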
(One can hardly help comparing this performance with that of PeerJ, who have maybe a ten-thousandth of Elsevier’s income and resources. When I reported three bugs to them in the course of a couple of days, they fixed them all with an average report-to-fix time of less than 21 hours.)
Now here’s where it turns sinister.
The PeerJ bugs I mentioned above cost them — not money, directly, but a certain amount of reputation. By fixing them quickly, they fixed that reputation damage (and indeed gained reputation by responding so quickly). By contrast, the Elsevier bug we’re discussing here doesn’t cost them anything. It makes them money, by misleading people into paying for permissions that they already have. In short, not fixing this bug is making money for Elsevier. It’s hard not to wonder: would it have remained unfixed for two years if it was costing them money?
But instead of a rush to fix the bug, we have this kind of thing:
I find that very hard to accept. However complex your publishing platform is, however many different modules interoperate, however much legacy code there is — it’s not that hard to take the conditional that emits “Open Access” in gold at the top of the article, and make the same test in the other relevant places.
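To make concrete what "make the same test in the other relevant places" might look like, here is a minimal sketch. I have no knowledge of Elsevier's actual code, so every name in it (`Article`, `is_open_access`, and the three functions) is hypothetical; the only point is that one flag, already known to the page, can drive all three pieces of text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    title: str
    is_open_access: bool  # hypothetical flag: whatever already drives the gold caption


def header_caption(article: Article) -> str:
    """The existing behaviour: show the gold caption for open-access articles."""
    return "Open Access" if article.is_open_access else ""


def rights_notice(article: Article) -> str:
    """Footer text, making the same test as the header."""
    if article.is_open_access:
        return "This is an open-access article, and is free to re-use"
    return "All rights reserved"


def rights_link_label(article: Article) -> Optional[str]:
    """Offer the paid-permissions link only when permission is actually needed."""
    return None if article.is_open_access else "Get rights and content"


oa = Article("An open-access paper", is_open_access=True)
print(header_caption(oa), "|", rights_notice(oa), "|", rights_link_label(oa))
```

A real platform will no doubt have the equivalent logic spread across several systems, which is presumably where the difficulty lies; the sketch only shows the shape of the change, not its cost.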
As John Mark Ockerbloom observes:
Come on, Elsevier. You’re better than this. Step up. Get this done.
Ten days later, Elsevier have finally responded. To give credit where it’s due, it’s actually pretty good: it notes how many customers made payments they needn’t have made (about 50), how much they paid in total (about $4000) and says that they are actively refunding these payments.
It would have been nice, mind you, had this statement contained an actual apology: the words “sorry”, “regret” and “apologise” are all notably absent.
And I remain baffled that the answer to “So when will this all be reliable?” is “by the summer of 2014”. As noted above, the pages in question already have the information that the articles are open access, as noted in the gold “Open Access” text at top right of the pages. Why it’s going to take several more months to use that information elsewhere in the same pages is a mystery to me.
As noted by Alicia in a comment below, Elsevier employee Chris Shillum has posted a long comment on Elsevier’s response, explaining in more detail what the technical issues are. Unfortunately there seems to be no way to link directly to the comment, but it’s the fifth one.
December 13, 2013
It’s now widely understood among researchers that the impact factor (IF) is a statistically illiterate measure of the quality of a paper. Unfortunately, it’s not yet universally understood among administrators, who in many places continue to judge authors on the impact factors of the journals they publish in. They presumably do this on the assumption that impact factor is a proxy for, or predictor of, citation count, which in turn is assumed to correlate with influence.
As shown by Lozano et al. (2012), the correlation between IF and citations is in fact very weak — r² is about 0.2 — and has been progressively weakening since the dawn of the Internet era and the consequent decoupling of papers from the physical journal that they appear in. This is a counter-intuitive finding: given that the impact factor is calculated from citation counts you’d expect it to correlate much more strongly. But the enormous skew of citation rates towards a few big winners renders the average used by the IF meaningless.
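The skew argument is easy to see with a toy example. The citation counts below are invented purely for illustration (they are not from Lozano et al. or any real journal), and treating the impact factor as a plain mean of per-paper citations is a simplification of how it is really calculated.

```python
from statistics import mean, median

# Invented citation counts for ten papers in one hypothetical journal:
# nine ordinary papers and a single big winner.
citations = [0, 0, 1, 1, 2, 2, 3, 4, 7, 180]

journal_mean = mean(citations)      # the IF-style average
journal_median = median(citations)  # what a typical paper actually gets
below_mean = sum(c < journal_mean for c in citations)

print(f"mean = {journal_mean}, median = {journal_median}")
print(f"{below_mean} of {len(citations)} papers are cited less than the mean")
```

With these made-up numbers, one paper drags the average up to ten times the median, and nine of the ten papers sit below the "average" that would be used to judge them.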
To bring this home, I plotted my own personal impact-factor/citation-count graph. I took citation counts from Google Scholar, which recognises 17 of my papers; then I looked up the impact factors of the venues they appeared in, plotted citation count against impact factor, and calculated a best-fit line through my data-points. Here’s the result (taken from a slide in my Berlin 11 satellite conference talk):
I was delighted to see that the regression slope is actually negative: in my case at least, the higher the impact factor of the venue I publish in, the fewer citations I get.
There are a few things worth unpacking on that graph.
First, note the proud cluster on the left margin: publications in venues with impact factor zero (i.e. no impact factor at all). These include papers in new journals like PeerJ, in perfectly respectable established journals like PaleoBios, edited-volume chapters, papers in conference proceedings, and an arXiv preprint.
My most-cited paper, by some distance, is Head and neck posture in sauropod dinosaurs inferred from extant animals (Taylor et al. 2009, a collaboration between all three SV-POW!sketeers). That appeared in Acta Palaeontologica Polonica, a very well-respected journal in the palaeontology community but which has a modest impact factor of 1.58.
My next most-cited paper, the Brachiosaurus revision (Taylor 2009), is in the Journal of Vertebrate Paleontology — unquestionably the flagship journal of our discipline, despite its also unspectacular impact factor of 2.21. (For what it’s worth, I seem to recall it was about half that when my paper came out.)
In fact, none of my publications have appeared in venues with an impact factor greater than 2.21, with one trifling exception. That is what Andy Farke, Matt and I ironically refer to as our Nature monograph (Farke et al. 2009). It’s a 250-word letter to the editor on the subject of the Open Dinosaur Project. (It’s a subject that we now find profoundly embarrassing given how dreadfully slowly the project has progressed.)
Google Scholar says that our Nature note has been cited just once. But the truth is even better: that one citation is in fact from an in-prep manuscript that Google has dug up prematurely — one that we ourselves put on Google Docs, as part of the slooow progress of the Open Dinosaur Project. Remove that, and our Nature note has been cited exactly zero times. I am very proud of that record, and will try to preserve it by persuading Andy and Matt to remove the citation from the in-prep paper before we submit. (And please, folks: don’t spoil my record by citing it in your own work!)
What does all this mean? Admittedly, not much. It’s anecdote rather than data, and I’m posting it more because it amuses me than because it’s particularly persuasive. In fact if you remove the anomalous data point that is our Nature monograph, the slope becomes positive — although it’s basically meaningless, given that all my publications cluster in the 0–2.21 range. But then that’s the point: pretty much any data based on impact factors is meaningless.
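The way a single far-out point can flip the sign of a best-fit line is worth seeing in miniature. The sketch below uses invented numbers, not my real citation data, and a hand-rolled least-squares slope so that it needs nothing beyond the standard library; the function name `ols_slope` is my own.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Invented data: five papers in low-IF venues, citations rising gently with IF.
impact_factors = [0.0, 0.5, 1.0, 1.5, 2.0]
citation_counts = [4, 6, 9, 12, 14]

# One invented outlier: a very high-IF venue and no citations at all.
outlier_if, outlier_citations = 30.0, 0

slope_without = ols_slope(impact_factors, citation_counts)
slope_with = ols_slope(impact_factors + [outlier_if],
                       citation_counts + [outlier_citations])

print(f"slope without outlier: {slope_without:+.2f}")
print(f"slope with outlier:    {slope_with:+.2f}")
```

Because the outlier sits so far along the horizontal axis, it has enormous leverage: the five clustered points trend upwards, but adding the one distant point is enough to tip the fitted line downwards.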
- Farke, Andrew A., Michael P. Taylor and Mathew J. Wedel. 2009. Sharing: public databases combat mistrust and secrecy. Nature 461:1053.
- Lozano, George A., Vincent Larivière and Yves Gingras. 2012. The weakening relationship between the impact factor and papers’ citations in the digital age. Journal of the American Society for Information Science and Technology 63(11):2140-2145. doi:10.1002/asi.22731 [arXiv preprint]
- Taylor, Michael P. 2009. A re-evaluation of Brachiosaurus altithorax Riggs 1903 (Dinosauria, Sauropoda) and its generic separation from Giraffatitan brancai (Janensch 1914). Journal of Vertebrate Paleontology 29(3):787-806.
- Taylor, Michael P., Mathew J. Wedel and Darren Naish. 2009. Head and neck posture in sauropod dinosaurs inferred from extant animals. Acta Palaeontologica Polonica 54(2):213-230.
September 20, 2013
I was astonished yesterday to read Understanding and addressing research misconduct, written by Linda Lavelle, Elsevier’s General Counsel, and apparently a specialist in publication ethics:
While uncredited text constitutes copyright infringement (plagiarism) in most cases, it is not copyright infringement to use the ideas of another. The amount of text that constitutes plagiarism versus ‘fair use’ is also uncertain — under the copyright law, this is a multi-prong test.
So here (right in the first paragraph of Lavelle’s article) we see copyright infringement equated with plagiarism. And then, for good measure, the confusion is hammered home by the depiction of fair use (a defence against accusations of copyright violation) as a defence against accusations of plagiarism.
This is flatly wrong. Plagiarism and copyright violation are not the same thing. Not even close.
First, plagiarism is a violation of academic norms but not illegal; copyright violation is illegal, but in truth pretty ubiquitous in academia. (Where did you get that PDF?)
Second, plagiarism is an offence against the author, while copyright violation is an offence against the copyright holder. In traditional academic publishing, they are usually not the same person, due to the ubiquity of copyright transfer agreements (CTAs).
Third, plagiarism applies when ideas are copied, whereas copyright violation occurs only when a specific fixed expression (e.g. sequence of words) is copied.
Fourth, avoiding plagiarism is about properly apportioning intellectual credit, whereas copyright is about maintaining revenue streams.
Let’s consider four cases (with good outcomes in green and bad ones in red):
- I copy big chunks of Jeff Wilson’s (2002) sauropod phylogeny paper (which is copyright the Linnean Society of London) and paste it into my own new paper without attribution. This is both plagiarism against Wilson and copyright violation against the Linnean Society.
- I copy big chunks of Wilson’s paper and paste it into mine, attributing it to him. This is not plagiarism, but copyright violation against the Linnean Society.
- I copy big chunks of Riggs’s (1904) Brachiosaurus monograph (which is out of copyright and in the public domain) into my own new paper without attribution. This is plagiarism against Riggs, but not copyright violation.
- I copy big chunks of Riggs’s paper and paste it into mine with attribution. This is neither plagiarism nor copyright violation.
Plagiarism is about the failure to properly attribute the authorship of copied material (whether copies of ideas or of text or images). Copyright violation is about failure to pay for the use of the material.
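The four cases above boil down to two independent yes/no questions, which can be sketched as a tiny function. This is a deliberate simplification and not legal advice: it ignores fair use, licences and permissions, and the names `Verdict` and `assess_copying` are my own inventions for the purpose of illustration.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    plagiarism: bool
    copyright_violation: bool


def assess_copying(attributed: bool, in_copyright: bool) -> Verdict:
    """Classify unlicensed copying of big chunks of someone else's paper.

    Simplifying assumptions: no permission has been obtained, and fair use
    does not apply. Plagiarism depends only on attribution; copyright
    violation depends only on whether the source is still in copyright.
    """
    return Verdict(plagiarism=not attributed, copyright_violation=in_copyright)


# The four cases from the list above, in the same order.
print(assess_copying(attributed=False, in_copyright=True))   # Wilson, unattributed
print(assess_copying(attributed=True, in_copyright=True))    # Wilson, attributed
print(assess_copying(attributed=False, in_copyright=False))  # Riggs, unattributed
print(assess_copying(attributed=True, in_copyright=False))   # Riggs, attributed
```

The two outputs never look at each other's input, which is the whole point: attribution cannot cure a copyright violation, and the public domain cannot cure plagiarism.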
Which of the two issues you care more about will depend on whether you’re in a situation where intellectual credit or money is more important — in other words, whether you’re an author or a copyright holder. For this reason, researchers tend to care deeply when someone plagiarises their work but to be perfectly happy for people to violate copyright by distributing copies of their papers. Whereas publishers, who have no authorship contribution to defend, care deeply about copyright violation.
One of the great things about the Creative Commons Attribution Licence (CC BY) is that it effectively makes plagiarism illegal. It requires that attribution be maintained as a condition of the licence; so if attribution is absent, the licence does not pertain; which means the plagiariser’s use of the work is not covered by it. And that means it’s copyright violation. It’s a neat bit of legal ju-jitsu.
- Riggs, Elmer S. 1904. Structure and relationships of opisthocoelian dinosaurs. Part II, the Brachiosauridae. Field Columbian Museum, Geological Series 2:229-247, plus plates LXXI-LXXV.
- Wilson, Jeffrey A. 2002. Sauropod dinosaur phylogeny: critique and cladistic analysis. Zoological Journal of the Linnean Society 136:217-276.