THIS POST IS RETRACTED. The reasons are explained in the next post. I wish I had never posted this, but you can’t undo what is done, especially on the Internet, so I am not deleting it but marking it as retracted. I suggest you don’t bother reading on, but it’s here if you want to.

 


Neil Brocklehurst, Elsa Panciroli, Gemma Louise Benevento and Roger Benson have a new paper out (Brocklehurst et al. 2021, natch), showing that the post-Cretaceous radiation of modern mammals was not primarily due to the removal of dinosaurs, as everyone assumed, but of more primitive mammal-relatives. Interesting stuff, and it’s open access. Congratulations to everyone involved!

Neil Brocklehurst’s “poster” explaining the new paper in broad detail. From the tweet linked below.

Neil summarised the new paper in a thread of twelve tweets, but it was the last one in the thread that caught my eye:

Thanks to all my co-authors for their tireless work on this, pushing it through eight rounds of review (my personal best)

I’m impressed that Neil has maintained his equanimity about this — in public at least — but if he is not going to be furious about it then we, the community, need to be furious on his behalf. Pushed to explain, Neil laid it out in a further tweet:

Was just one reviewer who really didn’t seem to like certain aspects, esp the use of discrete character matrices. Fair enough, can’t please everyone, but the editor just kept sending it back even when two others said our responses to this reviewer should be fine.

Again, somehow this tweet is free of cursing. He is a better man than I would be in that situation. He also doesn’t call out the reviewer by name, nor the spineless handling editor, which again shows great restraint — though I am not at all sure it’s the right way to go.

There is so, so much to hate about this story:

  • The obstructive peer reviewer, who seems to have got away with his reputation unblemished by these repeated acts of vandalism. (I’m assuming he was one of the two anonymous reviewers, not the one who identified himself.)
  • The handling editor who had half a dozen opportunities to put an end to the round-and-round, and passed on at least five of them. Do your job! Handle the manuscript! Don’t just keep kicking it back to a reviewer who you know by this stage is not acting in good faith.
  • The failure of the rest of the journal’s editorial board to step in and bring some sanity to the situation.
  • The normalization of this kind of thing — arguably not helped by Neil’s level-headed recounting of the story as though it’s basically reasonable — as something authors should expect, and just have to put up with.
  • The time wasted: the other research not done while the authors were pithering around back and forth with the hostile reviewer.

It’s the last of these that pains me the most. Of all the comforting lies we tell ourselves about conventional peer review, the worst is that it’s worth all the extra time and effort because it makes the paper better.

It’s not worth it, is it?

Maybe Brocklehurst et al. 2021 is a bit better for having gone through the 3rd, 4th, 5th, 6th, 7th and 8th rounds of peer review. But if it is, then it’s a marginal difference, and my guess is that in fact it’s no better and no worse than what they submitted after the second round. All that time, they could have been looking at specimens, generating hypotheses, writing descriptions, gathering data, plotting graphs, writing blogs, drafting papers — instead they have been frittering away their time in a pointless and destructive conflict with someone whose only goal was to prevent the advancement of science because an aspect of the paper happened to conflict with a bee he had in his bonnet. We have to stop this waste.

This incident has reinforced my growing conviction that venues like Qeios, Peer Community in Paleontology and bioRxiv (now that it’s moving towards support for reviewing) are the way to go. Our own experience at Qeios has been very good — if it works this well the next time we use it, I think it’s a keeper. Crucially, I don’t believe our paper (Taylor and Wedel 2021) would have been stronger if it had gone through the traditional peer-review gauntlet; instead, I think it’s stronger than it would have been, because it’s received reviews from more pairs of eyes, and each of them with a constructive approach. Quicker publication, less work for everyone involved, more collegial process, better final result — what’s not to like?

References

Cool URIs don’t change

November 26, 2020

It’s now 22 years since Tim Berners-Lee, inventor of the World Wide Web, wrote the classic document Cool URIs don’t change [1]. Its core message is simple, and the title summarises it. Once an organization brings a URI into existence, it should keep it working forever. If the document at that URI moves, then the old URI should become a redirect to the new. This really is Web 101 — absolute basics.

So imagine my irritation when I went to point a friend to Matt’s and my 2013 paper on whether neural-spine bifurcation is an ontogenetic character (spoiler: no), only to find that the paper no longer exists.

Wedel and Taylor (2013b: figure 15). An isolated cervical of cf. Diplodocus MOR 790 8-10-96-204 (A) compared to D. carnegii CM 84/94 C5 (B), C9 (C), and C12 (D), all scaled to the same centrum length. Actual centrum lengths are 280 mm, 372 mm, 525 mm, and 627 mm for A-D respectively. MOR 790 8-10-96-204 modified from Woodruff & Fowler (2012: figure 2B), reversed left to right for ease of comparison; D. carnegii vertebrae from Hatcher (1901: plate 3).

Well — it’s not quite that bad. I was able to go to the web-site’s home page, navigate to the relevant volume and issue, and find the new location of our paper. So it does still exist, and I was able to update my online list of publications accordingly.

But seriously — this is a really bad thing to do. How many other links might be out there to our paper? All of them are now broken. Every time someone out there follows a link to a PalArch paper — maybe wondering whether that journal would be a good match for their own work — they are going to run into a 404 that says “We can’t run our website properly and can’t be trusted with your work”.

“But Mike, we need to re-organise our site, and —” Ut! No. Let’s allow Sir Tim to explain:

We just reorganized our website to make it better.

Do you really feel that the old URIs cannot be kept running? If so, you chose them very badly. Think of your new ones so that you will be able to keep then running after the next redesign.

Well, we found we had to move the files…

This is one of the lamest excuses. A lot of people don’t know that servers such as Apache give you a lot of control over a flexible relationship between the URI of an object and where a file which represents it actually is in a file system. Think of the URI space as an abstract space, perfectly organized. Then, make a mapping onto whatever reality you actually use to implement it. Then, tell your server.

If you are a responsible organization, then one of the things you are responsible for is ensuring that you don’t break inbound links. If you want to reorganize, fine — but add the redirects.
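To make “add the redirects” concrete, here is a minimal sketch of the idea in Python. It is not how PalArch’s site (or any real server) is implemented: the two paths, the `REDIRECTS` table and the `resolve` function are all invented for illustration. The point is only that the old address keeps answering, with a permanent redirect to wherever the file now lives.

```python
# Hypothetical table mapping retired paths to their new homes.
# Both paths are made up for this example; they are not real journal URLs.
REDIRECTS = {
    "/old/2013/wedel-taylor.pdf": "/archive/vol10/issue1/wedel-taylor.pdf",
}


def resolve(path, existing_paths):
    """Return (status, location) for a requested path.

    existing_paths is the set of paths the site currently serves.
    """
    if path in existing_paths:
        return 200, path
    if path in REDIRECTS:
        # 301 means "Moved Permanently": the old link still works,
        # and clients learn the new address.
        return 301, REDIRECTS[path]
    # Only paths that never existed should end up here.
    return 404, None


live = {"/archive/vol10/issue1/wedel-taylor.pdf"}
print(resolve("/old/2013/wedel-taylor.pdf", live))
print(resolve("/no/such/page", live))
```

Real servers such as Apache let you express the same table as configuration rather than code; the sketch just shows how little logic is involved.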

And look, I’m sorry, I really don’t want to pick on PalArch, which is an important journal. Our field really needs diamond OA journals: that is, venues where vertebrate paleontology articles are free to read and also free to authors. It’s a community-run journal that is not skimming money out of academia for shareholders, and Matt’s and my experience with their editorial handling was nothing but good. I recommend them, and will probably publish there again (despite my current irritation). But seriously, folks.

And by the way, there are much worse offenders than PalArch. Remember Aetogate, the plagiarism-and-claim-jumping scandal in New Mexico that the SVP comprehensively fudged its investigation of? The documents that the SVP Ethics Committee produced, such as they were, were posted on the SVP website in early 2008, and my blog-post linked to them. By July, they had moved, and I updated my links. By July 2013, they had moved again, and I updated my links again. By October 2015 they had moved for a third time: I both updated my links, and made my own copy in case they vanished. Sure enough, by February 2019 they had gone again — either moved for a fourth time or just quietly discarded. This is atrocious stewardship by the flagship society of our discipline, and they should be heartily ashamed that in 2020, anyone who wants to know what they concluded about the Aetogate affair has to go and find their documents on a third-party blog.

Seriously, people! We need to up our game on this!

Cool URIs don’t change.

 

 


[1] Why is this about URIs instead of URLs? In the end, no reason. Technically, URIs are a broader category than URLs, and include URNs. But since no-one anywhere in the universe has ever used a URN, in practice URL and URI are synonymous; and since TBL wrote his article in 1998, “URL” has clearly won the battle for hearts and minds and “URI” has diminished and gone into the West. If you like, mentally retitle the article “Cool URLs don’t change”.

Here’s an odd thing. Over and over again, when a researcher is mistreated by a journal or publisher, we see them telling their story but redacting the name of the journal or publisher involved. Here are a couple of recent examples.

First, Daniel A. González-Padilla’s experience with a journal engaging in flagrant citation-pumping, but which he declines to name:

Interesting highlight after rejecting a paper I submitted.
Is this even legal/ethical?
EDITOR-IN-CHIEF’S COMMENT REGARDING THE INCLUSION OF REFERENCES TO ARTICLES IN [REDACTED]
Please note that if you wish to submit a manuscript to [REDACTED] in future, we would prefer that you cite at least TWO articles published in our journal WITHIN THE LAST TWO YEARS. This is a policy adopted by several journals in the urology field. Your current article contains only ONE reference to recent articles in [REDACTED].

We know from a subsequent tweet that the journal is published by Springer Nature, but we don’t know the name of the journal itself.

And here is Waheed Imran’s experience of editorial dereliction:

I submitted my manuscript to a journal back in September 2017, and it is rejected by the journal on September 6, 2020. The reason of rejection is “reviewers declined to review”, they just told me this after 3 years, this is how we live with rejections. @AcademicChatter
@PhDForum

My question is, why in such situations do we protect the journals in question? In this case, I wrote to Waheed urging him to name the journal, and he replied saying that he will do so once an investigation is complete. But I find myself wondering why we have this tendency to protect guilty journals in the first place.

Thing is, I’ve done this myself. For example, back in 2012, I wrote about having a paper rejected from “a mid-to-low ranked palaeo journal” for what I considered (and still consider) spurious reasons. Why didn’t I name the journal? I’m not really sure. (It was Palaeontologia Electronica, BTW.)

In cases like my unhelpful peer-review, it’s not really a big deal either way. In cases like those mentioned in the tweets above, it’s a much bigger issue, because those (unlike PE) are journals to avoid. Whichever journal sat on a submission for three years before rejecting it because it couldn’t find reviewers is not one that other researchers should waste their time on in the future — but how can they avoid it if they don’t know what journal it is?

So what’s going on? Why do we have this widespread tendency to protect the guilty?

Black Lives Matter

June 9, 2020

Mark Witton says this better than I could:

Like many white folks, I have traditionally assumed that simply not being racist was doing my part, and that the actions of others would eventually convert society at large to seeing race as the non-issue it should be. I have also felt that, as a white, straight male from a middle-class background, my voice would add nothing to this conversation or – worse – be seen as patronising or virtue signalling.

I now realise that this view was incorrect. The fact that people of colour are still fighting against global systemic marginalisation and persecution shows that being non-racist isn’t enough, and that we must be outspokenly anti-racist, even if we have never experienced racial discrimination ourselves. Some may accuse me of jumping on a bandwagon with this. That’s accurate, but I don’t care. This is a wagon we should all be on, and I’m ashamed for not being on-board earlier.

Go and read his post. I endorse it.

And remember: the opposite of Black Lives Matter is not All Lives Matter; it’s Black Lives Don’t Matter. Don’t be That Guy.

In the last post, I catalogued some of the reasons why Scientific Reports, in its cargo-cult attempts to ape print journals such as its stablemate Nature, is an objectively bad journal that removes value from the papers submitted to it: the unnatural shortening that relegates important material into supplementary information, the downplaying of methods, the tiny figures that ram unrelated illustrations into compound images, the pointless abbreviating of author names and journal titles.

This is particularly odd when you consider the prices of the obvious alternative megajournals:

So to have your paper published in Scientific Reports costs 10% more than in PLOS ONE, or 56% more than in PeerJ; and results in an objectively worse product that slices the paper up and dumps chunks of it in the back lot, compresses and combines the illustrations, and messes up the narrative.

So why would anyone choose to publish in it?

Well, the answer is depressingly obvious. As a colleague once expressed it to me, “until I have a more stable job I’ll need the highest IFs I can pull off to secure a position somewhere”.

It’s as simple as that. PeerJ’s impact factor at the time of writing is 2.353; PLOS ONE’s is 2.776; that of Scientific Reports is 4.525. And so, in the idiotic world we live in, it’s better for an author’s career to pay more for a worse version of his article in Scientific Reports than it is to pay less for a better version in PeerJ or PLOS ONE. Because it looks better to have got into Scientific Reports.

BUT WAIT A MINUTE. These three journals are all “megajournals”. They all have the exact same editorial criteria, which is that they accept any paper that is scientifically sound. They make no judgement about novelty, perceived importance or likely significance of the work. They are all completely up front about this. It’s how they work.

In other words, “getting into” Scientific Reports instead of PeerJ says absolutely nothing about the quality of your work, only that you paid a bigger APC.

Can we agree it’s insane that our system rewards researchers for paying a bigger APC to get a less scientifically useful version of their work?

Let me say in closing that I intend absolutely no criticism of Daniel Vidal or his co-authors for placing their Spinophorosaurus posture paper in Scientific Reports. He is playing the ball where it lies. We live, apparently, in a world where spending an extra $675 and accepting a scientifically worse result is good for your career. I can’t criticise Daniel for doing what it takes to get on in that world.

The situation is in every respect analogous to the following: before you attend a job interview, you are told by a respected senior colleague that your chances of getting the post are higher if you are wearing designer clothing. So you take $675 and buy a super-expensive shirt with a prominent label. If you get the job, you’ll consider it a bargain.

But you will never have much respect for the search committee that judged you on such idiotic criteria.

As I was figuring out what I thought about the new paper on sauropod posture (Vidal et al. 2020) I found the paper uncommonly difficult to parse. And I quickly came to realise that this was not due to any failure on the authors’ part, but to the journal it was published in: Nature’s Scientific Reports.

A catalogue of pointless whining

A big part of the problem is that the journal inexplicably insists on moving important parts of the manuscript out of the main paper and into supplementary information. So for example, as I read the paper, I didn’t really know what Vidal et al. meant by describing a sacrum as wedged: did it mean non-parallel anterior and posterior articular surfaces, or just that those surfaces are not at right angles to the long axis of the sacrum? It turns out to be the former, but I only found that out by reading the supplementary information:

The term describes marked trapezoidal shape in the centrum of a platycoelous vertebrae in lateral view or in the rims of a condyle-cotyle (procoelous or opisthocoelous) centrum type.

This crucial information is nowhere in the paper itself: you could read the whole thing and not understand what the core point of the paper is due to not understanding the key piece of terminology.

And the relegation of important material to second-class, unformatted, maybe un-reviewed supplementary information doesn’t end there, by a long way. The SI includes crucial information, and a lot of it:

  • A terminology section of which “wedged vertebrae” is just one of ten sub-sections, including a crucial discussion of different interpretations of what ONP means.
  • All the information about the actual specimens the work is based on.
  • All the meat of the methods, including how the specimens were digitized, retro-deformed and digitally separated.
  • How the missing forelimbs, so important to the posture, were interpreted.
  • How the virtual skeleton was assembled.
  • How the range of motion of the neck was assessed.
  • Comparisons of the sacra of different sauropods.

And lots more. All this stuff is essential to properly understanding the work that was done and the conclusions that were reached.

And there’s more: as well as the supplementary information, which contains six supplementary figures and three supplementary tables, there is an additional supplementary supplementary table, which could quite reasonably have gone into the supplementary information.

In a similar vein, even within the highly compressed actual paper, the Materials and Methods are hidden away at the back, after the Results, Discussion and Conclusion — as though they are something to be ashamed of; or, at best, an unwelcome necessity that can’t quite be omitted altogether, but need not be on display.

Then we have the disappointingly small illustrations: even the “full size” version of the crucial Figure 1 (which contains both the full skeleton and callout illustrations of key bones) is only 1000×871 pixels. (That’s why the illustration of the sacrum that I pulled out of the paper for the previous post was so inadequate.)

Compare that with, for example, the 3750×3098 Figure 1 of my own recent Xenoposeidon paper in PeerJ (Taylor 2018) — that has more than thirteen times as much visual information. And the thing is, you can bet that Vidal et al. submitted their illustration in much higher resolution than 1000×871. The journal scaled it down to that size. In 2020. That’s just crazy.
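For anyone who wants to check the “more than thirteen times” claim, the arithmetic is just a ratio of pixel counts. A quick sketch, using only the two image sizes quoted above (the variable names are mine):

```python
# Pixel dimensions quoted in the text.
xenoposeidon_px = 3750 * 3098   # Taylor 2018, Figure 1 (PeerJ)
vidal_px = 1000 * 871           # Vidal et al. 2020, Figure 1 "full size"

ratio = xenoposeidon_px / vidal_px
print(f"{xenoposeidon_px:,} vs {vidal_px:,} pixels: {ratio:.1f} times as many")
```

Pixel count is of course only a crude stand-in for “visual information”, but it is the comparison being made here.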

And to make things even worse, unrelated images are shoved into multi-part illustrations. Consider the ridiculousness of figure 2:

Vidal et al. (2020: figure 2). The verticalization of sauropod feeding envelopes. (A) Increased neck range of motion in Spinophorosaurus in the dorso-ventral plane, with the first dorsal vertebra as the vertex and 0° marking the ground. Poses shown: (1) maximum dorsiflexion; (2) highest vertical reach of the head (7.16 m from the ground), with the neck 90° deflected; (3) alert pose sensu Taylor Wedel and Naish13; (4) osteological neutral pose sensu Stevens14; (5) lowest vertical reach of the head (0.72 m from the ground at 0°), with the head as close to the ground without flexing the appendicular elements; (6) maximum ventriflexion. Blue indicates the arc described between maximum and minimum head heights. Grey indicates the arc described between maximum dorsiflexion and ventriflexion. (B) Bivariant plot comparing femur/humerus proportion with sacrum angle. The proportion of humerus and femur are compared as a ratio of femur maximum length/humerus maximum length. Sacrum angle measures the angle the presacral vertebral series are deflected from the caudal series by sacrum geometry in osteologically neutral pose. Measurements and taxa on Table 1. Scale = 1000 mm.

It’s perfectly clear that parts A and B of this figure have nothing to do with each other. It would be far more sensible for them to appear as two separate figures — which would allow part B enough space to convey its point much more clearly. (And would save us from a disconcertingly inflated caption).

And there are other, less important irritants. Authors’ given names not divulged, only initials. I happen to know that D. Vidal is Daniel, and that J. L. Sanz is José Luis Sanz; but I have no idea what the P in P. Mocho, the A in A. Aberasturi or the F in F. Ortega stand for. Journal names in the bibliography are abbreviated, in confusing and sometimes ludicrous ways: is there really any point in abbreviating Palaeogeography Palaeoclimatology Palaeoecology to Palaeogeogr. Palaeoclimatol. Palaeoecol?

The common theme

All of these problems — the unnatural shortening that relegates important material into supplementary information, the downplaying of methods, the tiny figures that ram unrelated illustrations into compound images, even the abbreviating of author names and journal titles — have this in common: that they are aping how Science ‘n’ Nature appear in print.

They present a sort of cargo cult: a superstitious belief that extreme space pressures (such as print journals legitimately wrestle with) are somehow an indicator of quality. The assumption that copying the form of prestigious journals will mean that the content is equally revered.

And this is simply idiotic. Scientific Reports is an open-access web-only journal that has no print edition. It has no rational reason to compress space like a print journal does. In omitting the “aniel” from “Daniel Vidal” it is saving nothing. All it’s doing is landing itself with the limitations of print journals in exchange for nothing. Nothing at all.

Why does this matter?

This squeezing of a web-based journal into a print-sized pot matters because it’s apparent that a tremendous amount of brainwork has gone into Vidal et al.’s research; but much of that is obscured by the glam-chasing presentation of Scientific Reports. It reduces a Pinter play to a soap-opera episode. The work deserved better; and so do readers.

References

 

“But wait, Matt”, I hear you thinking. “Every news agency in the world is tripping over themselves declaring Patagotitan the biggest dinosaur of all time. Why are you going in the other direction?”

Because I’ve been through this a few times now. But mostly because I can friggin’ read.

Maximum dorsal centrum diameter in Argentinosaurus is 60cm (specimen MCF-PVPH-1, Bonaparte and Coria 1993). In Puertasaurus it is also 60cm (MPM 10002, Novas et al. 2005). In Patagotitan it is 59cm (MPEF-PV 3400/5, Carballido et al. 2017). (For more big centra, see this post.)

Femoral midshaft circumference is 118cm in an incomplete femur of Argentinosaurus estimated to be 2.5m long when complete (Mazzetta et al. 2004). A smaller Argentinosaurus femur is 2.25m long with a circumference of 111.4cm (Benson et al. 2014). The largest reported femur of Patagotitan, MPEF-PV 3399/44, is 2.38m long and has a circumference of either 101cm (as reported in the Electronic Supplementary Materials to Carballido et al 2017) or 110cm (as reported in the media in 2014*).

TL;DR: 60>59, and 118>111>110>101, and in both cases Argentinosaurus > Patagotitan, at least a little bit.
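If you’d rather see that comparison laid out than take the inequality on trust, here is a small sketch that ranks the numbers quoted above. Nothing here is new data: the values are the ones cited in the previous two paragraphs, and the labels are just my shorthand for which specimen or source each comes from.

```python
# Maximum dorsal centrum diameter, in cm, as cited above.
centrum_cm = {
    "Argentinosaurus": 60,
    "Puertasaurus": 60,
    "Patagotitan": 59,
}

# Femoral midshaft circumference, in cm, as cited above.
femur_circumference_cm = [
    ("Argentinosaurus (incomplete femur, Mazzetta et al. 2004)", 118.0),
    ("Argentinosaurus (smaller femur, Benson et al. 2014)", 111.4),
    ("Patagotitan (2014 media reports)", 110.0),
    ("Patagotitan (Carballido et al. 2017 ESM)", 101.0),
]

# Rank both sets of measurements from largest to smallest.
for name, cm in sorted(centrum_cm.items(), key=lambda kv: kv[1], reverse=True):
    print(f"centrum  {cm:6.1f} cm  {name}")
for name, cm in sorted(femur_circumference_cm, key=lambda kv: kv[1], reverse=True):
    print(f"femur    {cm:6.1f} cm  {name}")
```

Whichever of the two Patagotitan femur figures you prefer, both Argentinosaurus femora sit above it in the ranking.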

Now, Carballido et al (2017) estimated that Patagotitan was sliiiiightly more massive than Argentinosaurus and Puertasaurus by doing a sort of 2D minimum convex hull dorsal vertebra area thingy, which the Patagotitan vertebra “wins” because it has a taller neural spine than either Argentinosaurus or Puertasaurus, and slightly wider transverse processes than Argentinosaurus (138cm vs 128cm) – but way narrower transverse processes than Puertasaurus (138cm vs 168cm). But vertebrae with taller or wider sticky-out bits do not a more massive dinosaur make, otherwise Rebbachisaurus would outweigh Giraffatitan.

Now, in truth, it’s basically a three-way tie between Argentinosaurus, Puertasaurus, and Patagotitan. Given how little we have of the first two, and how large the error bars are on any legit size comparison, there is no real way to tell which of them was the longest or the most massive. Still, to get to the conclusion that Patagotitan was in any sense larger than Argentinosaurus you have to physically drag yourself over the following jaggedly awkward facts:

  1. The weight-bearing parts of the anterior dorsal vertebrae are larger in diameter in both Argentinosaurus and Puertasaurus than in Patagotitan. Very slightly, but still, Patagotitan is the smallest of the three.
  2. The femora of Argentinosaurus are fatter than those of Patagotitan, even at shorter length. The biggest femora of Argentinosaurus are longer, too.

So all of the measurements of body parts that have to do with supporting mass are still larger in Argentinosaurus than in Patagotitan.

Now, it is very cool that we now have a decent chunk of the skeleton of a super-giant titanosaur, instead of little bits and bobs. And it’s nice to know that the numbers reported in the media back in 2014 turned out to be accurate. But Patagotitan is not the “world’s largest dinosaur”. At best, it’s the third-largest contender among near equals.

Parting shot to all the science reporters who didn’t report the same numbers I did here: instead of getting hype-notized by assumption-laden estimates, how about doing an hour’s worth of research making the most obvious possible comparisons?

Almost immediate UPDATE: Okay, that parting shot wasn’t entirely fair. As far as I know, the measurements of Patagotitan were not available until the embargo lifted. Which is in itself odd – if someone claims to have the world’s largest dinosaur, but doesn’t put any measurements in the paper, doesn’t that make your antennae twitch? Either demand some measurements so you can make those obvious comparisons, or approach with extreme skepticism – especially if the “world’s largest dino” claim was pre-debunked three years ago!

* From this article in the Boston Globe:

Paleobiologist Paul Upchurch of University College London believes size estimates are more reliable when extrapolated from the circumference of bones.

He said this femur is a whopping 43.3 inches around, about the same as the Argentinosaurus’ thigh bone.

‘‘Whether or not the new animal really will be the largest sauropod we know remains to be seen,’’ said Upchurch, who was not involved in this discovery but has seen the bones first-hand.

Some prophetically appropriate caution from Paul Upchurch there, who has also lived through a few of these “biggest dinosaur ever” bubbles.
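For the record, the 43.3 inches quoted in the Globe and the 110cm used above are the same measurement. A one-line check, using the standard 2.54 cm per inch:

```python
CM_PER_INCH = 2.54  # exact, by definition of the inch

circumference_in = 43.3  # figure quoted in the Boston Globe article
circumference_cm = circumference_in * CM_PER_INCH
print(f"{circumference_in} in = {circumference_cm:.1f} cm")
```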

References

The previous post (Every attempt to manage academia makes it worse) has been a surprise hit, and is now by far the most-read post in this blog’s nearly-ten-year history. It evidently struck a chord with a lot of people, and I’ve been surprised — amazed, really — at how nearly unanimously people have agreed with it, both in the comments here and on Twitter.

But I was brought up short by this tweet from Thomas Koenig:

That is the question, isn’t it? Why do we keep doing this?

I don’t know enough about the history of academia to discuss the specific route we took to the place we now find ourselves in. (If others do, I’d be fascinated to hear.) But I think we can fruitfully speculate on the underlying problem.

Let’s start with the famous true story of the Hanoi rat epidemic of 1902. In a town overrun by rats, the authorities tried to reduce the population by offering a bounty on rat tails. Enterprising members of the populace responded by catching live rats, cutting off their tails to collect the bounty, then releasing the rats to breed, so more tails would be available in future. Some people even took to breeding rats for their tails.

Why did this go wrong? For one very simple reason: because the measure optimised for was not the one that mattered. What the authorities wanted to do was reduce the number of rats in Hanoi. For reasons that we will come to shortly, the proxy that they provided an incentive for was the number of rat tails collected. These are not the same thing — optimising for the latter did not help the former.

The badness of the proxy measure applies in two ways.

First, consider those who caught rats, cut their tails off and released them. They stand as counter-examples to the assumption that harvesting a rat-tail is equivalent to killing the rat. The proxy was bad because it assumed a false equivalence. It was possible to satisfy the proxy without advancing the actual goal.

Second, consider those who bred rats for their tails. They stand as counter-examples to the assumption that killing a rat is equivalent to decreasing the total number of live rats. Worse, if the breeders released their de-tailed captive-bred progeny into the city, their harvests of tails not only didn’t represent any decrease in the feral population, they represented an increase. So the proxy was worse than neutral because satisfying it could actively harm the actual goal.
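The two failure modes can be put side by side in a toy model. This is purely illustrative: the three strategies are the ones described above, but the per-rat numbers are the simplest possible assumption (one tail handed in each time, and a population change of minus one, zero or plus one), not historical data.

```python
# Toy model: for each strategy, (tails handed in, change in feral rat population).
# The numbers are assumptions chosen for illustration only.
STRATEGIES = {
    "kill the rat": (1, -1),
    "cut off the tail and release": (1, 0),
    "breed, de-tail and release": (1, +1),
}

BOUNTY_PER_TAIL = 1  # arbitrary unit of reward

for name, (tails, population_change) in STRATEGIES.items():
    reward = tails * BOUNTY_PER_TAIL
    print(f"{name:30s} reward={reward}  feral population change={population_change:+d}")
```

All three strategies earn exactly the same reward, so the proxy cannot tell them apart, even though only the first advances the real goal and the last works against it.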

So far, so analogous to the perverse academic incentives we looked at last time. Where this gets really interesting is when we consider why the Hanoi authorities chose such a terribly counter-productive proxy for their real goal. Recall their object was to reduce the feral rat population. There were two problems with that goal.

First, the feral rat population is hard to measure. It’s so much easier to measure the number of tails people hand in. A metric is seductive if it’s easy to measure. In the same way, it’s appealing to look for your dropped car-keys under the street-lamp, where the light is good, rather than over in the darkness where you dropped them. But it’s equally futile.

Second — and this is crucial — it’s hard to properly reward people for reducing the feral rat population because you can’t tell who has done what. If an upstanding citizen leaves poison in the sewers and kills a thousand rats, there’s no way to know what he has achieved, and to reward him for it. The rat-tail proxy is appealing because it’s easy to reward.

The application of all this to academia is pretty obvious.

First, the things we really care about are hard to measure. The reason we do science — or, at least, the reason societies fund science — is to achieve breakthroughs that benefit society. That means important new insights, findings that enable new technology, ways of creating new medicines, and so on. But all these things take time to happen. It’s difficult to look at what a lab is doing now and say “Yes, this will yield valuable results in twenty years”. Yet that may be what is required: trying to evaluate it using a proxy of how many papers it gets into high-IF journals this year will most certainly militate against its doing careful work with long-term goals.

Second, we have no good way to reward the right individuals or labs. What we as a society care about is the advance of science as a whole. We want to reward the people and groups whose work contributes to the global project of science — but those are not necessarily the people who have found ways to shine under the present system of rewards: publishing lots of papers, shooting for the high-IF journals, skimping on sample-sizes to get spectacular results, searching through big data-sets for whatever correlations they can find, and so on.

In fact, when a scientist who is optimising for what gets rewarded slices up a study into multiple small papers, each with a single sensational result, and shops them around Science and Nature, all they are really doing is breeding rats.

If we want people to stop behaving this way, we need to stop rewarding them for it. (Side-effect: when people are rewarded for bad behaviour, people who behave well get penalised, lose heart, and leave the field. They lose out, and so does society.)

Q. “Well, that’s great, Mike. What do you suggest?”

A. Ah, ha ha, I’d been hoping you wouldn’t bring that up.

No-one will be surprised to hear that I don’t have a silver bullet. But I think the place to start is by being very aware of the pitfalls of the kinds of metrics that managers (including us, when wearing certain hats) like to use. Managers want metrics that are easy to calculate, easy to understand, and quick to yield a value. That’s why articles are judged by the impact factor of the journal they appear in: the calculation of the article’s worth is easy (copy the journal’s IF out of Wikipedia); it’s easy to understand (or, at least, it’s easy for people to think they understand what an IF is); and best of all, it’s available immediately. No need for any of that tedious waiting around five years to see how often the article is cited, or waiting ten years to see what impact it has on the development of the field.

Wise managers (and again, that means us when wearing certain hats) will face up to the unwelcome fact that metrics with these desirable properties are almost always worse than useless. Coming up with better metrics, if we’re determined to use metrics at all, is real work and will require an enormous educational effort.

One thing we can usefully do, whenever considering a proposed metric, is actively consider how it can and will be hacked. Black-hat it. Invest a day imagining you are a rational, selfish researcher in a regime that uses the metric, and plan how you’re going to exploit it to give yourself the best possible score. Now consider whether the course of action you mapped out is one that will benefit the field and society. If not, dump the metric and start again.

Q. “Are you saying we should get rid of metrics completely?”

A. Not yet; but I’m open to the possibility.

Given metrics’ terrible track-record of hackability, I think we’re now at the stage where the null hypothesis should be that any metric will make things worse. There may well be exceptions, but the burden of proof should be on those who want to use them: they must show that they will help, not just assume that they will.

And what if we find that every metric makes things worse? Then the only rational thing to do would be not to use any metrics at all. Some managers will hate this, because their jobs depend on putting numbers into boxes and adding them up. But we’re talking about the progress of research to benefit society, here.

We have to go where the evidence leads. Dammit, Jim, we’re scientists.

I’ve been on Twitter since April 2011 — nearly six years. A few weeks ago, for the first time, something I tweeted broke the thousand-retweets barrier. And I am really unhappy about it. For two reasons.

First, it’s not my own content — it’s a screen-shot of Table 1 from Edwards and Roy (2017):

[Image: Table 1 from Edwards and Roy (2017)]

And second, it’s so darned depressing.

The problem is a well-known one, and indeed one we have discussed here before: as soon as you try to measure how well people are doing, they will switch to optimising for whatever you’re measuring, rather than putting their best efforts into actually doing good work.

In fact, this phenomenon is so very well known and understood that it’s been given at least three different names by different people:

  • Goodhart’s Law is most succinct: “When a measure becomes a target, it ceases to be a good measure.”
  • Campbell’s Law is the most explicit: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
  • The Cobra Effect refers to the way that measures taken to improve a situation can directly make it worse.

As I say, this is well known. There’s even a term for it in social theory: reflexivity. And yet we persist in doing idiot things that can only possibly have this result:

  • Assessing school-teachers on the improvement their kids show in tests between the start and end of the year (which obviously results in their doing all they can to depress the start-of-year tests).
  • Assessing researchers by the number of their papers (which can only result in slicing into minimal publishable units).
  • Assessing them — heaven help us — on the impact factors of the journals their papers appear in (which feeds the brand-name fetish that is crippling scholarly communication).
  • Assessing researchers on whether their experiments are “successful”, i.e. whether they find statistically significant results (which inevitably results in p-hacking and HARKing).

What’s the solution, then?

I’ve been reading the excellent blog of economist Tim Harford for a while. That arose from reading his even more excellent book The Undercover Economist (Harford 2007), which gave me a crash-course in the basics of how economies work, how markets help, how they can go wrong, and much more. I really can’t say enough good things about this book: it’s one of those that I feel everyone should read, because the issues are so important and pervasive, and Harford’s explanations are so clear.

In a recent post, Why central bankers shouldn’t have skin in the game, he makes this point:

The basic principle for any incentive scheme is this: can you measure everything that matters? If you can’t, then high-powered financial incentives will simply produce short-sightedness, narrow-mindedness or outright fraud. If a job is complex, multifaceted and involves subtle trade-offs, the best approach is to hire good people, pay them the going rate and tell them to do the job to the best of their ability.

I think that last part is pretty much how academia used to be run a few decades ago. Now I don’t want to get all misty-eyed and rose-tinted and nostalgic — especially since I wasn’t even involved in academia back then, and don’t know from experience what it was like. But could it be … could it possibly be … that the best way to get good research and publications out of scholars is to hire good people, pay them the going rate and tell them to do the job to the best of their ability?

[Read on to Why do we manage academia so badly?]

References

Bonus

Here is a nicely formatted full-page version of the Edwards and Roy table, for you to print out and stick on all the walls of your university. My thanks to David Roberts for preparing it.

It’s now been widely discussed that Jeffrey Beall’s list of predatory and questionable open-access publishers — Beall’s List for short — has suddenly and abruptly gone away. No-one really knows why, but there are rumblings that he has been hit with a legal threat that he doesn’t want to defend.

To get this out of the way: it’s always a bad thing when legal threats make information quietly disappear; to that extent, at least, Beall has my sympathy.

That said, overall I think making Beall’s List was probably not a good thing to do in the first place, being an essentially negative approach, as opposed to DOAJ’s more constructive whitelisting approach. But under Beall’s sole stewardship it was a disaster, due to his well-known ideological opposition to all open access. So I think it’s a net win that the list is gone.

But, more than that, I would prefer that it not be replaced.

Researchers need to learn the very, very basic research skills required to tell a real journal from a fake one. Giving them a blacklist or a whitelist only conceals the real issue, which is that you need those skills if you’re going to be a researcher.

Finally, and I’m sorry if this is harsh, I have very little sympathy with anyone who is caught by a predatory journal. Why would you be so stupid? How can you expect to have a future as a researcher if your critical thinking skills are that lame? Think Check Submit is all the guidance that anyone needs; and frankly much more than people really need.

Here is the only thing you need to know, in order to avoid predatory journals, whether open-access or subscription-based: if you are not already familiar with a journal — because it’s published research you respect, or colleagues who you respect have published in it or are on the editorial board — then do not submit your work to that journal.

It really is that simple.

So what should we do now Beall’s List has gone? Nothing. Don’t replace it. Just teach researchers how to do research. (And supervisors who are not doing that already are not doing their jobs.)