When peer-review goes bad … really bad.
May 19, 2021
THIS POST IS RETRACTED. The reasons are explained in the next post. I wish I had never posted this, but you can’t undo what is done, especially on the Internet, so I am not deleting it but marking it as retracted. I suggest you don’t bother reading on, but it’s here if you want to.
Neil Brocklehurst, Elsa Panciroli, Gemma Louise Benevento and Roger Benson have a new paper out (Brocklehurst et al. 2021, natch), showing that the post-Cretaceous radiation of modern mammals was not primarily due to the removal of dinosaurs, as everyone assumed, but of more primitive mammal-relatives. Interesting stuff, and it’s open access. Congratulations to everyone involved!

Neil summarised the new paper in a thread of twelve tweets, but it was the last one in the thread that caught my eye:
Thanks to all my co-authors for their tireless work on this, pushing it through eight rounds of review (my personal best)
I’m impressed that Neil has maintained his equanimity about this — in public at least — but if he is not going to be furious about it then we, the community, need to be furious on his behalf. Pushed to explain, Neil laid it out in a further tweet:
Was just one reviewer who really didn’t seem to like certain aspects, esp the use of discrete character matrices. Fair enough, can’t please everyone, but the editor just kept sending it back even when two others said our responses to this reviewer should be fine.
Again, somehow this tweet is free of cursing. He is a better man than I would be in that situation. He also doesn’t call out the reviewer by name, nor the spineless handling editor, which again shows great restraint — though I am not at all sure it’s the right way to go.
There is so, so much to hate about this story:
- The obstructive peer reviewer, who seems to have got away with his reputation unblemished by these repeated acts of vandalism. (I’m assuming he was one of the two anonymous reviewers, not the one who identified himself.)
- The handling editor who had half a dozen opportunities to put an end to the round-and-round, and passed on at least five of them. Do your job! Handle the manuscript! Don’t just keep kicking it back to a reviewer who you know by this stage is not acting in good faith.
- The failure of the rest of the journal’s editorial board to step in and bring some sanity to the situation.
- The normalization of this kind of thing — arguably not helped by Neil’s level-headed recounting of the story as though it’s basically reasonable — as something authors should expect, and just have to put up with.
- The time wasted: the other research not done while the authors were pithering around back and forth with the hostile reviewer.
It’s the last of these that pains me the most. Of all the comforting lies we tell ourselves about conventional peer review, the worst is that it’s worth all the extra time and effort because it makes the paper better.
It’s not worth it, is it?
Maybe Brocklehurst et al. 2021 is a bit better for having gone through the 3rd, 4th, 5th, 6th, 7th and 8th rounds of peer review. But if it is, then it’s a marginal difference, and my guess is that in fact it’s no better and no worse than what they submitted after the second round. All that time, they could have been looking at specimens, generating hypotheses, writing descriptions, gathering data, plotting graphs, writing blogs, drafting papers — instead they have been frittering away their time in a pointless and destructive conflict with someone whose only goal was to prevent the advancement of science because an aspect of the paper happened to conflict with a bee he had in his bonnet. We have to stop this waste.
This incident has reinforced my growing conviction that venues like Qeios, Peer Community in Paleontology and bioRxiv (now that it’s moving towards support for reviewing) are the way to go. Our own experience at Qeios has been very good — if it works this well the next time we use it, I think it’s a keeper. Crucially, I don’t believe our paper (Taylor and Wedel 2021) would have been stronger if it had gone through the traditional peer-review gauntlet; instead, I think it’s stronger than it would have been, because it’s received reviews from more pairs of eyes, and each of them with a constructive approach. Quicker publication, less work for everyone involved, more collegial process, better final result — what’s not to like?
References
- Brocklehurst, Neil, Elsa Panciroli, Gemma Louise Benevento and Roger Benson. 2021. Mammaliaform extinctions as a driver of the morphological radiation of Cenozoic mammals. Current Biology. doi:10.1016/j.cub.2021.04.044
- Taylor, Michael P., and Mathew J. Wedel. 2021. Why is vertebral pneumaticity in sauropod dinosaurs so variable? Qeios 1G6J3Q. doi:10.32388/1G6J3Q
May 19, 2021 at 11:39 pm
For those of us in the peanut gallery, is there anything to objectively not like about discrete character matrices? E.g. tendency to false positives or false negatives, so you like to see an independent confirmation? Or is it just a method that an older reviewer would not have used in grad school, so just wants to see something more familiar used instead? Or, what?
Have to agree that ultimately the editor is at fault, and they should have published elsewhere instead. There is no merit in wasting good papers on propping up bad journals.
May 19, 2021 at 11:45 pm
I don’t know enough about the methods to have an opinion on whether there really are drawbacks to using discrete characters — hopefully Neil will pop up here and comment. But given that the continuous data this reviewer presumably wanted simply isn’t out there, it strikes me as a fatuous complaint. If someone wants to come along later, re-examine all the specimens to score them for continuous characters, and redo the analysis with that presumably better data-set, fine — let ’em. But the possibility of that one day maybe happening is no reason to delay the publication of the present paper.
May 20, 2021 at 4:58 am
So in reply to Nathan, there are a fair number of people who do not like the use of discrete character matrices for use in analyses of morphological evolution. Some objections raised include: that they are intended to resolve phylogenetic relationships, not to represent ecologically or functionally relevant traits; they include characters to resolve a specific set of taxa rather than an overarching survey of the morphology of a clade; they don’t include autapomorphies; and they discretise continuous traits sometimes in a somewhat arbitrary way.
Now I obviously don’t agree with these objections, or at least don’t agree that they are fatal, but I do understand people’s objections and I don’t begrudge the reviewer their opinion. I think the main problem was the reviewer was basically using the review as an opportunity to debate these points rather than make sure we acknowledge them, and then the editor didn’t seem to want to put a stop to it.
May 20, 2021 at 8:32 am
Hi Mike,
Thanks for writing about our paper on your blog! And I can see your point of view regarding the peer review process, to some extent. It’s obvious that authors will find it frustrating when referees give them lots of commentary and ask them to re-think aspects of their approach, and how they’ve written about it. That’s just human nature.
However… I have a different perspective on this, and I think it is useful to voice that, because everyone should be thinking about the costs and benefits of the peer review process.
Over the years I’ve had lots of papers reviewed by all sorts of different people. I’ve always tried to suggest referees who I think will be critical, without worrying that they might give me a hard time in review. Sometimes this has meant that we had to do a lot more work than expected when the paper was first submitted! That’s the cost.
The benefits, however, have been enormous. Being exposed to a different point of view is a great thing! The review that you’ve discussed in your blog – and various others that I received over the years – was hugely thought-provoking (and sure, at times frustrating, but we ‘got over ourselves’ in the end). The referee challenged us to really think about what we were saying, and how we should interpret patterns in the data. Sure, we may not always have agreed with all aspects of it. But there was a lot of merit in what was said. In consequence, the paper is much better than it would have been. So in the end, we have a lot to say thanks for. All other things aside – no referee is ‘alone’. There will be others out there who share their view. And our job as authors is not simply to get papers published. It is to convince a wide and inclusive audience. So we need to use this valuable information from the peer review process as evidence of what people think when they read the words that we wrote.
So for me, it’s a big thumbs up to the referee, who did an excellent job of hauling us over the coals on this occasion. And the paper is greatly improved for it, and perhaps more likely to stand the test of time (but we’ll see – it’s possible to be wrong even with the best of intentions).
Thanks,
Roger
May 20, 2021 at 8:34 am
That’s very gracious, Roger. I completely accept that the first round of review materially improved your paper. My question to you is whether the eighth round did, too?
May 20, 2021 at 8:54 am
So I am going to comment again, having now read the full blog post, and having had my attention drawn by the reviewer to the academic vandalism comment: I do not consider the reviewer committed an act of “academic vandalism”. As I said in my above comment and on Twitter, I do not begrudge the reviewer different opinions on certain aspects of the paper. While I cannot deny that I found the review process for this paper frustrating, the reviewer is perfectly entitled to disagree with us, and I don’t think at any point he was deliberately holding the paper up with bad faith arguments. He was clearly making a genuine effort with the reviews (after all, while I mention having to create 9 drafts of the paper, he had to read it eight times and provide detailed comments, and he definitely did not shirk his duty). I am genuinely sorry that his work was described like that.
May 20, 2021 at 9:27 am
The eighth round (which was the ?fourth round for that particular referee – and note: I think Neil meant ‘eight submissions’ rather than ‘eight rounds of review’) caused us to include some simulations. I think they were an improvement because they put bounds on how wrong you might expect us to be based on a particular issue about uneven sampling of characters.
Accounting for unevenness in sampling is something I approve of. So from my perspective that was an improvement.
Anyway, it’s a complicated paper in some ways that makes various different claims. When you add those claims up it’s not surprising perhaps that we ended up looking at it so many times. For some types of papers of course I can see this perspective that the first round of review is the ‘main’ one. That doesn’t apply in exactly the same way to this specific paper.
Of course – we’re all human and at times we were frustrated. But never for very long. That’s just to say that I’m not being just gracious. I genuinely feel that the paper is much better as a result of the rounds of review.
May 20, 2021 at 9:37 am
I should also say – I forgot to state this explicitly.
Just because referees are challenging doesn’t automatically mean that they’re being unfair (Mike – I think you’d agree with that, but this comment is for others out there who might be reading this thread).
Receiving challenging reviews is fundamental to the growth of good scientists. It certainly has been formative for me, and I’m sure that it also has for others out there.
We can’t be expected to get it right first time, after all. And no-one can see all the angles and possibilities on their own.
May 20, 2021 at 9:37 am
Thanks, Neil and Roger: I have added a preface note to the blog-post pointing readers to your comments.
May 20, 2021 at 9:47 am
Roger: yes, of course, not all challenging reviews are unfair! Having been on the end of both kinds — tough but fair and (less often!) tough and obstructive — I certainly recognise the difference.
May 20, 2021 at 9:48 am
I completely agree regarding the misuse of peer review here, and you have my sympathies for that.
Regarding the paper itself though, isn’t the conclusion basically guaranteed based on the way you divided the cladogram into three groups? Your first group which keeps up the high morphological dissimilarity with increasing patristic distance (the character steps between taxa) is all of the non-theriimorph mammaliform clades that diverged back in the Triassic and Early Jurassic (docodonts, stem-monotremes, gondwanatheres and their ancestors, etc.). Your second group which has the medium slope is basically just the stem of a single clade from that early time period (theriimorphs), which itself sprouts some successful clades (eutriconodonts, multituberculates, dryolestids, etc.) in the Middle Jurassic to Early Cretaceous that often last to the K-T boundary. Then your final group with the most constrained evolution is just one of those theriimorph groups emerging in the Early Cretaceous, the therians.
It’s like comparing (Group 1) non-sauropod dinosaurs, (Group 2) non-titanosaurian sauropods, and (Group 3) titanosaurs. Of course titanosaurs will be most constrained because you’re just dealing with one clade that diverged comparatively recently. They’re going to be convergently evolving the same characters a lot, while the abelisaurs, nodosaurids, hadrosaurs and troodontids in Group 1 aren’t going to be evolving the same characters as each other as often because they have bodyplans which have been diverging since the Early Jurassic.
But I bet if you compared representative samples of groups with similar temporal history, like australosphenids, dryolestids or gondwanatheres to therians, the latter wouldn’t be significantly more constrained. You almost did this when checking the effect of sample size when you compared haramyids with eutherians, but the former goes back to the Triassic so would be expected to be less constrained by default.
May 20, 2021 at 2:09 pm
Hi Mickey, thanks for your comment.
In answer to your point, I have three things to highlight. First and most crucially, the method accounts for the issue you raise. The key aspect isn’t just measuring observed morphological diversity (which would depend on how long the group has been around), it’s inferring whether the group has reached the actual limit of the morphospace available to it. This is assessed, not by whether one group has more morphologies than another, but by whether further evolution is continuing to produce new morphologies or whether it’s producing the same morphologies over and over again by convergence. In the graphs we show in the figures what we are looking at is whether further increases along the X axis (‘amount of evolution’) are still producing increases along the Y axis (morphological diversity) or whether the curve has reached an asymptote (further evolution produces no new morphologies).
So in the example you give, yes, titanosaurs may not have accumulated so many different morphologies cos they’ve not been around so long, but we would see this represented by the fact that their curve has not yet reached an asymptote; evolution is still producing new morphologies, so they have not yet reached the limits of their constraint. However, Mesozoic crown therians, in spite of having not been around very long, have already reached character saturation. Their curve has reached an asymptote; further evolution is failing to produce new morphologies, but is instead producing the same traits over and over again by convergence. And this limit of constraint is at a much lower level than is seen in stem and non-therians.
Next, we account for this in another way: via the null simulations. Under equal rates of character evolution across the tree, how much morphological diversity would we expect each group to accumulate? Is the observed morphological diversity more or less than this? This is the basis of our statements of significant strengthening or relaxation of constraint. And Mesozoic crown therians are found to be significantly constrained relative to stem and non-therians.
Finally, we do subdivide each group by divergence time (figure 2) so we’re able to compare crown therians to stem and non-therians that diverged within the same amount of time as crown therians.
Hope this answers your question
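To make the saturation idea above concrete, here is a minimal Python sketch. It is not the method of Brocklehurst et al. 2021; it is a toy model under stated assumptions: hypothetical binary characters change at random along a single lineage, and "constraint" is modelled as only some characters (`n_free`) being able to change at all. The function names are hypothetical. Dissimilarity from the starting state rises with the number of changes and then levels off, and the closed-form null expectation under equal rates gives the curve that simulated values can be compared against.

```python
import random


def simulate_dissimilarity(n_chars, n_free, n_steps, rng):
    """Apply n_steps random changes to one lineage and return the fraction
    of all n_chars binary characters that differ from the starting state.

    Hypothetical constraint model: only the first n_free characters can
    change; the rest stay fixed at their starting state."""
    state = [0] * n_chars  # starting morphology: every character in state 0
    for _ in range(n_steps):
        i = rng.randrange(n_free)  # pick one unconstrained character
        state[i] ^= 1              # flip it between states 0 and 1
    return sum(state) / n_chars    # characters now in state 1 differ from start


def mean_dissimilarity(n_chars, n_free, n_steps, reps=200, seed=0):
    """Average simulated dissimilarity over many independent replicates."""
    rng = random.Random(seed)
    total = sum(simulate_dissimilarity(n_chars, n_free, n_steps, rng)
                for _ in range(reps))
    return total / reps


def expected_dissimilarity(n_chars, n_free, n_steps):
    """Null expectation under equal rates: a free character differs from its
    start if it was hit an odd number of times, which has probability
    (1 - (1 - 2/n_free) ** n_steps) / 2."""
    p_changed = 0.5 * (1 - (1 - 2 / n_free) ** n_steps)
    return p_changed * n_free / n_chars


if __name__ == "__main__":
    # Compare an unconstrained lineage (all 100 characters free) with a
    # constrained one (only 30 of 100 characters free to change).
    for k in (10, 50, 100, 200, 500, 1000):
        print(f"{k:5d} steps  "
              f"free=100: {mean_dissimilarity(100, 100, k):.3f}  "
              f"free=30: {mean_dissimilarity(100, 30, k):.3f}")
```

In this toy model both curves flatten as changes accumulate: the expectation approaches 0.5 when all 100 characters are free and 0.15 when only 30 are, because once the free characters are saturated, further changes mostly flip them back and forth rather than producing new combinations, which is the convergence described above.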
May 20, 2021 at 4:41 pm
Nothing new here: ideas that are outside the current paradigm of a field get rejected a lot. For a 100-year-old example in geology, consider Harlan Bretz and the glacial Lake Missoula floods. Bretz presented a paper in 1927, but because it involved processes that appeared catastrophic, it was rejected on the grounds that it was catastrophist. 40 years later his theory was accepted, and Bretz did at least live to see his theory accepted as valid.
So, as The Structure of Scientific Revolutions points out, if you propose an idea that flies in the face of the current paradigm, which at that time appears to be more or less working, you can expect the idea to be rejected.
(See Wegener and continental drift for another example.)
May 20, 2021 at 11:19 pm
As the reviewer accused in this blog post of obstruction, hostility, and vandalism, I figure it’s worth commenting here. I’ve emailed Mike (and Neil and his co-authors) separately with additional comments. I appreciate that Mike has added the disclaimer at the top of this post, and I thank Neil and Roger for clarifying this situation in their comments. I strongly agree with Roger’s comments on the benefits of peer-review. That said, I believe it’s still worth pointing out specific statements in this blog post that are false or misleading:
1) Contrary to Mike’s assumption, I signed all reviews. I did not hide behind anonymity in order to maintain an “unblemished reputation.” I gave honest, constructive reviews that I strongly stand behind.
2) Contrary to the implication of Mike’s comments (and Neil’s comment above), I only reviewed 4 full drafts (and was asked to comment on some specific revisions of another draft), and one of those was for a separate journal other than Current Biology. Thus, I did NOT review 8 drafts of the paper, and I don’t believe that 8 or 9 drafts were submitted to Current Biology, unless I was not asked to review some drafts. (Earlier drafts that I did not review were probably submitted to other journals.) I would never review 8 drafts of a manuscript for one journal.
3) Contrary to Mike’s comments, I was acting in good faith – my reviews were tough, but they were honest assessments. I believe that each review helped the authors to improve the paper. See comments above from Roger and Neil. It is extremely frustrating to put considerable time and thought into reviews, and then see those reviews described as “repeated acts of vandalism.” I hope that in the future Mike (and others) will refrain from jumping to conclusions about reviewers based on very limited information.
4) Contrary to Mike’s comments, the goal of my reviews was not to obstruct the paper from publication. Prior to sending the manuscript out for one final review, the editor contacted me to ask what I thought of the authors’ latest changes, and my email response included: “Not surprisingly, I continue to disagree with the authors on some points. But I appreciate that they continue to make changes based on my comments, and I know that the other reviewers may disagree with me, so I understand if you decide to accept their manuscript at this point.” (sent March 4, 2021). In my final review, I remained critical of some aspects of the paper, but I concluded my review with only two relatively minor suggestions.
I understand that repeated critiques can be frustrating for authors, and the editor certainly could have done a better job of handling the paper. But the actual situation in this case is a far cry from what is described in this blog post, and this situation should not be used as an example of “peer-review gone bad”.
Dave Grossnickle
May 21, 2021 at 9:12 am
[…] days ago, I wrote about what seemed to be an instance of peer review gone very wrong. I’ve now heard from two of the four authors of the paper and from the reviewer in question […]