The R2R debate, part 5: what I actually think
May 1, 2020
I’ve written four posts about the R2R debate on the proposition “the venue of its publication tells us nothing useful about the quality of a paper”:
- part 1: opening statement in support
- part 2: opening statement against the motion
- part 3: my response for the motion
- part 4: the video!
A debate of this kind is partly intended to persuade and inform, but is primarily entertainment — and so it’s necessary to stick to the position you’ve been assigned. But I don’t mind admitting, once the votes have been counted, that the statement goes a bit further than I would go in real life.
It took me a while to figure out exactly what I did think about the proposition, and the process of the debate was helpful in getting me to the point where I felt able to articulate it clearly. Here is where I landed shortly after the debate:
The venue of its publication can tell us something useful about a paper’s quality; but the quality of publication venues is not correlated with their prestige (or Impact Factor).
I’m fairly happy with this formulation: and in fact, on revisiting my speech in support of the original proposition, it’s apparent that I was really speaking in support of this modified version. I make no secret of the fact that I think some journals are objectively better than others, but that those with higher impact factors are often worse, not better.
What are the things that make a journal good? Here are a few:
- Coherent narrative order, with methods preceding results.
- All relevant information in one place, not split between a main document and a supplement.
- Explicit methods.
- Large, clear illustrations that can be downloaded at full resolution as prepared by the authors.
- All data available, including specimen photos, 3D models, etc.
- Open peer review: availability of the full history of submissions, reviews, editorial responses, rebuttal letters, etc.
- Well designed experiment capable of replication.
- Honesty (i.e. no fabricated or cherry-picked data).
- Sample sizes big enough to show a real statistical effect.
- Realistic assessment of the significance of the work.
And the more I look at such lists, the more I realise that these quality indicators appear less often in “prestige” venues such as Science, Nature and Cell than they do in good, honest, working journals like PeerJ, Acta Palaeontologica Polonica or even our old friend the Journal of Vertebrate Paleontology. (Note: I am aware that the replication and statistical power criteria listed above generally don’t apply directly to vertebrate palaeontology papers.)
So where are we left?
I think — and I admit that I find this surprising — the upshot is this:
The venue of its publication can tell us something useful about a paper’s quality; but the quality of publication venues is inversely correlated with their prestige (or Impact Factor).
I honestly didn’t see that coming.
The R2R debate, part 4: the video!
April 7, 2020
It’s been a while, but to be fair the world has caught fire since I first started posting about the Researcher to Reader conference. Stay safe, folks. Don’t meet people. Stay indoors; or go outdoors where there’s no-one else. You know how it’s done by now. This is not a drill.
Anyway — I am delighted to announce that the R2R conference has now made available the video of the debate — as part of a playlist that is slowly filling up with videos of all the conference’s sessions and workshops.
So here it is!
Here’s how the timeline breaks down:
- 0:18 — Mark Carden (pre-introduction)
- 0:46 — Rick Anderson (introduction and initial vote)
- 5:12 — Toby Green (proposing the motion)
- 15:50 — Pippa Smart (opposing the motion)
- 25:01 — Mike Taylor (responding for the motion)
- 28:31 — Niall Boyce (responding for the opposition)
- 31:34 — discussion
- 32:09 — Tasha Mellins-Cohen; response from Pippa
- 33:20 — Anthony Watkinson; Pippa
- 35:15 — Catriona McCallum; Niall
- 39:19 — anonymous online question; Mike
- 39:56 — anonymous online question; Mike; Niall; Toby; Pippa; Mike
- 46:27 — Robert Harrington; Mike; Toby
- 47:38 — Kaveh Bazargan; Niall; Mike; Niall; Pippa; Mike; Pippa
- 52:30 — Jennifer Smith; Pippa; Mike
- 58:32 — Rick Anderson (wrap up and final vote)
- 1:00:45 — Mark Carden (closing remarks)
A notable quality of the discussion that makes up the second half of this hour is that the two teams become gradually more conciliatory as it progresses.
Anyway, enjoy! And let us know whether you found the argument for or against the proposition compelling!
The R2R debate, part 3: my response for the motion
February 29, 2020
The Researcher to Reader (R2R) conference at the start of this week featured a debate on the proposition “The venue of its publication tells us nothing useful about the quality of a paper”. I’ve already posted Toby Green’s opening statement for the proposition and Pippa Smart’s opening statement against it.
Now here is my (shorter) response in favour of the motion, which is supposed to be a response specifically to Pippa’s opening statement against. As with Toby’s piece, I mistimed mine and ran into my (rather niggardly) three-minute limit, so I didn’t quite get to the end. But here’s the whole thing.

Here I am giving a talk on the subject “Should science always be open” back at ESOF 2014. (I don’t have any photos of me at the R2R debate, so this is the closest thing I could find.)
Like the Brexit debate, this is one where it’s going to be difficult to shift people’s opinions. Most of you will have come here already entrenched on one side or other of this issue. Unlike the Brexit debate, I hope this is one where evidence will be effective.
And that, really, is the issue. All of our intuition tells us, as our colleagues have argued, that high-prestige journals carry intrinsically better papers, or at least more highly cited ones — but the actual data tells us that this is not true: papers in these journals are no more statistically powerful, and are more prone to inflated claims or even outright fraud. In the last few days, news has broken of a “paper mill” that has successfully seen more than 400 fake papers pass peer review at reputable mainstream publishers despite having absolutely no underlying data. Evidently the venue of its publication tells us nothing useful about the quality of a paper.
It is nevertheless true that many scientists, especially early career researchers, spend an inordinate proportion of their time and effort desperately trying to get their work into Science and Nature, slicing and dicing substantial projects into the sparsely illustrated extended-abstract format that these journals demand, in the belief that this will help their careers. Worse, it is also true that they are often correct: publications in these venues do help careers. But that is not because of any inherent quality in the papers published there, which in many cases are of lower quality than they would have been in a different journal. Witness the many two-page descriptions of new dinosaurs that merit hundred-page monographic treatments — which they would have got in less flashy but more serious journals like PLOS ONE.
If we are scientists, or indeed humanities scholars, then we have to respect evidence ahead of our preconceptions. And once you start looking for actual data about the quality of papers in different venues, you find that there is a lot of it — and more emerging all the time. Only two days ago I heard of a new preprint by Carneiro at el. It defines an “overall reporting score”, which it describes as “an objective dimension of quality that is readily measurable [as] completeness of reporting”. When they plotted this score against the impact factor of journals they found no correlation.
We don’t expect this kind of result, so we are in danger of writing it off — just as Brexiteers write off stories about economic damage and companies moving out of Britain as “project fear”. The challenge for us is to do what Daily Mail readers perhaps can’t: to rise above our preconceptions, and to view the evidence about our publishing regimen with the same rigour and objectivity that we view the evidence in our own specialist fields.
Different journals certainly do have useful roles: as Toby explained in his opening statement, they can guide us to articles that are relevant to our subject area, pertain to our geographical area, or relate to the work of a society of interest. What they can’t guide us to is intrinsically better papers.
In The Adventure of the Copper Beeches, Arthur Conan Doyle tells us that Sherlock Holmes cries out “Data! Data! Data! I can’t make bricks without clay.” And yet in our attempts to understand the scholarly publishing system that we all interact with so extensively, we all too easily ignore the clay that is readily to hand. We can, and must, do better.
And what does the data say? It tells us clearly, consistently and unambiguously that the venue of its publication tells us nothing useful about the quality of a paper.
References
- Carneiro, Clarissa F. D., Victor G. S. Queiroz, Thiago C. Moulin, Carlos A. M. Carvalho, Clarissa B. Haas, Danielle Rayêe, David E. Henshall, Evandro A. De-Souza, Felippe Espinelli, Flávia Z. Boos, Gerson D. Guercio, Igor R. Costa, Karina L. Hajdu, Martin Modrák, Pedro B. Tan, Steven J. Burgess, Sylvia F. S. Guerra, Vanessa T. Bortoluzzi, Olavo B. Amaral. Comparing quality of reporting between preprints and peer-reviewed articles in the biomedical literature. bioRxiv 581892. doi:10.1101/581892
The R2R debate, part 2: opening statement against the motion
February 28, 2020
Yesterday I told you all about the Researcher to Reader (R2R) conference and its debate on the proposition “The venue of its publication tells us nothing useful about the quality of a paper”. I posted the opening statement for the proposition, which was co-written by Toby Green and me.
Now here is the opening statement against the proposition, presented by Pippa Smart of Learned Publishing, having been co-written by her and Niall Boyce of The Lancet Psychiatry.
(I’m sure it goes without saying that there is much in here that I disagree with. But I will let Pippa speak for herself and Niall without interruption for now, and discuss her argument in a later post.)

The debate in progress. I couldn’t find a photo of Pippa giving her opening statement, so here instead is her team-mate Niall giving his closing statement.
The proposal is that all articles or papers, from any venue, must be evaluated on their own merit and the venue of publication gives me no indicator of quality. We disagree with this assertion. To start our argument we’d like to ask, what is quality? Good quality research provides evidence that is robust, ethical, stands up to scrutiny and adheres to accepted principles of professionalism, transparency, accountability and auditability. These features not only apply to the underlying research, but also to the presentation. In addition, quality includes an element of relevance and timeliness which will make an article useful or not. And finally, quality is about standards and consistency – for example requiring authors to assert that they are all authors according to the ICMJE guidelines.
And once we agree what constitutes quality, the next question is: what quality assurance do the different venues place on their content? There is a lot of content out there. Currently there are 110,479,348 DOIs registered, and the 2018 STM report states that article growth is in the region of 5% per annum with over three million articles published each year. And of course, articles can be published anywhere. In addition to journal articles, they can appear on preprint servers, on personal blogs, and on social networking sites. Each different venue places its own quality standards on what they publish. Authors usually only place their “good stuff” on their personal sites; reputable journals only include items that have passed their quality assurance standards, including peer review. Preprint archives only include materials that pass their criteria for inclusion.
Currently there are about 23,000 articles on bioRxiv, of which approximately a third will not be published (according to Kent Anderson’s research). This may be due to quality problems, or perhaps the authors never sought publication. So they may or may not be “quality” to me – I’d have to read every one to check. The two thirds that are published are likely to have been revised after peer review, changing the original article that exists on bioRxiv (perhaps with extra experiments or reanalysis), so again, I would have to read and compare every version on bioRxiv and in the final journal to check its usefulness and quality.
A reputable journal promises me that what it publishes is of some value to the community that it serves by applying a level of independent validation. We therefore argue that the venue does provide important information about the quality of what they publish, and in particular that the journal model imposes some order on the chaos of available information. Journal selectivity answers the most basic question: “Is this worth bothering with?”
What would I have to do if I believed that the venue of publication tells me nothing useful about their publications? I could use my own judgement to check the quality of everything that has been published, but there are two problems with this: (1) I don’t have time to read every article, and (2) surely it is better to have the judgement of several people (reviewers and editors) rather than simply relying on my own bias and ability to mis-read an article.
What do journals do to make us trust their quality assurance?
1. Peer review – The use of independent experts may be flawed but it still provides a safety net that is able to discover problems. High impact journals find it somewhat easier to obtain reviews from reputable scientists. A friend of mine who works in biomedical research says that she expects to spend about two hours per article reviewing — unless it is for Nature, in which case she would spend longer, about 4–5 hours on each article, and do more checking. Assuming she is not the only reviewer to take this position, it follows that Nature articles come under a higher level of pre-publication scrutiny than those in some other journals.
2. Editorial judgement – Editors select for the vision and mission of their journal, providing a measure of relevance and quality for their readers. For example, at Learned Publishing we are interested in articles about peer review research. But we are not interested in articles which simply describe what peer review is: this is too simplistic for our audience and would be viewed as a poor quality article. In another journal it might be useful to their community and be viewed as a high quality article. At the Lancet, in-house editors check accepted articles — checking their data and removing inflated claims of importance — adding an extra level of quality assurance for their community.
3. Corrections – Good journals correct the scholarly record with errata and retractions. And high impact journals have higher rates of retraction caused by greater visibility and scrutiny, which can be assumed to result in a “cleaner” list of publications than in journals which receive less attention — therefore making their overall content more trustworthy because it is regularly evaluated and corrected.
And quality becomes a virtuous circle. High impact journals attract more authors keen to publish in them, which allows for more selectivity — choosing only the best, most relevant and most impactful science, rather than having to accept poorer-quality work (smaller studies, for example) to fill the issues.
So we believe that journals do provide order out of the information tsunami, and a stamp of quality assurance for their own communities. Editorial judgement attempts to find the sweet spot: research that is both topical and of good quality, which is then moderated so that minor findings are not made to appear revolutionary. The combination of peer review and editorial judgement works to filter content, to select only articles that are useful to their community, and to moderate excessive claims. We don’t assume that all journals get it right all the time. But some sort of quality control is surely better than none. The psychiatrist Winnicott came up with the idea of the “good enough” mother. We propose that there is a “good enough” editorial process that means readers can use these editorially-approved articles to make clinical, professional or research decisions. Of course, not every journal delivers the same level of quality assurance. Therefore there are journals I trust more than others to publish good-quality work – the venue of publication informs me so that I can make a judgement about the likelihood of usefulness.
In summary, we believe that it is wrong to say that the venue tells us nothing useful about the quality of research. Unfiltered venues tell us that there is no guarantee of quality. Filtered venues tell us that there is some guarantee of reasonable quality. Filtered venues that I trust (because they have a good reputation in my community) tell me that the quality of their content is likely to match my expectations for validity, ethical standards, topicality, integrity, relevance and usefulness.
The R2R debate, part 1: opening statement in support
February 27, 2020
This Monday and Tuesday, I was at the R2R (Researcher to Reader) conference at BMA House in London. It’s the first time I’ve been to this, and I was there at the invitation of my old sparring partner Rick Anderson, who was organizing this year’s debate, on the proposition “The venue of its publication tells us nothing useful about the quality of a paper”.
I was one half of the team arguing in favour of the proposition, along with Toby Green, currently managing director at Coherent Digital and previously head of publishing at the OECD for twenty years. Our opponents were Pippa Smart, publishing consultant and editor of Learned Publishing; and Niall Boyce, editor of The Lancet Psychiatry.
I’m going to blog three of the four statements that were made. (The fourth, that of Niall Boyce, is not available, as he spoke from handwritten notes.) I’ll finish this series with a fourth post summarising how the debate went, and discussing what I now think about the proposition.
But now, here is the opening statement for the proposition, co-written by Toby and me, and delivered by him.

The backs of the heads of the four R2R debaters as we watch the initial polling on the proposition. From left to right: me, Toby, Pippa, Niall.
What is the most significant piece of published research in recent history? One strong candidate is a paper called “Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children” published in 1998. It was written by Andrew Wakefield et al., and postulated a link between the MMR vaccine and autism. This article became the launching point for the anti-vax movement, which has resulted in (among other things) 142,000 deaths from measles in 2019 alone. It has also contributed to the general decline of trust in expertise and the rise of fake news.
This article is now recognised as “not just poor science, [but] outright fraud” (BMJ). It was eventually retracted — but it did take its venue of publication 12 years to do so. Where did it appear? In The Lancet, one of the world’s most established and prestigious medical journals, its prestige quantified by a stellar Impact Factor of 59.1.
How could such a terrible paper be published by such a respected journal? Because the venue of its publication tells us nothing useful about the quality of a paper.
Retractions from prestigious venues are not restricted to rogues like Wakefield. Last month, Nobel Prize winner Frances Arnold said she was “bummed” to have to retract her 2019 paper on enzymatic synthesis of beta-lactams because the results were not reproducible. “Careful examination of the first author’s lab notebook then revealed missing contemporaneous entries and raw data for key experiments.” she explained. I.e. “oops, we prepared the paper sloppily, sooorry!”
Prof. Arnold is the first woman to be elected to all three National Academies in the USA and has been lauded by institutions as diverse as the White House, the BBC and the Vatican. She even appeared as herself in the TV series The Big Bang Theory. She received widespread praise for being so open about having to retract this work — yet what does it say of the paper’s venue of publication, Science? Plainly the quality of this paper was not in the least assured by its venue of publication. Or to put it another way, the venue of its publication tells us nothing useful about the quality of a paper.
If we’re going to talk about high- and low-prestige venues, we’ll need a ranking system of some sort. The obvious ranking system is the Impact Factor — which, as Clarivate says, “can be used to provide a gross approximation of the prestige of journals”. Love it or hate it, the IF has become ubiquitous, and we will reluctantly use it here as a proxy for journal prestige.
So, then: what does “quality” really mean for a research paper? And how does it relate to journal prestige?
One answer would be that a paper’s quality is to do with its methodological soundness: adherence to best practices that make its findings reliable and reproducible. One important aspect of this is statistical power: are enough observations made, and are the correlations significant enough and strong enough for the results to carry weight? We would hope that all reputable journals would consider this crucially important. Yet Brembs et al. (2013) found no association between statistical power and journal impact factor. So it seems the venue of its publication tells us nothing useful about the quality of a paper.
Or perhaps we can define “quality” operationally, something like how frequently a paper is cited — more being good, less being less good, right? Astonishingly, given that Impact Factor is derived from citation counts, Lozano et al. (2012) showed that the citation count of an individual paper is correlated only very weakly with the Impact Factor of the journal it’s published in — and that correlation has been growing yet weaker since 1990, as the rise of the WWW has made discovery of papers easier irrespective of their venue. In other words, the venue of its publication tells us nothing useful about the quality of a paper.
We might at this point ask ourselves whether there is any measurable aspect of individual papers that correlates strongly with the Impact Factor of the journal they appear in. There is: Fang et al. (2012) showed that Impact Factor has a highly significant correlation with the number of retractions for fraud or suspected fraud. Wakefield’s paper has been cited 3336 times — did the Lancet know what it was doing by delaying this paper’s retraction for so long?[1] So maybe the venue of its publication does tell us something about the quality of a paper!
Imagine if we asked 1000 random scholars to rank journals on a “degree of excellence” scale. Science and The Lancet would, I’m sure you’ll agree — like Liverpool’s football team or that one from the “great state of Kansas” recently celebrated by Trump — be placed in the journal Premier League. Yet the evidence shows — both from anecdote and hard data — that papers published in these venues are at least as vulnerable to error, poor experimental design and even outright fraud as those in less exalted venues.
But let’s look beyond journals — perhaps we’ll find a link between quality and venue elsewhere.
I’d like to tell you two stories about another venue of publication, this time, the World Bank.
In 2016, the Bill & Melinda Gates Foundation pledged $5BN to fight AIDS in Africa. Why? Well, it was all down to someone at the World Bank having the bright idea to take a copy of their latest report on AIDS in Africa to Seattle and pitch the findings and recommendations directly to Mr Gates. I often tell this story as an example of impact. I think we can agree that the quality of this report must have been pretty high. After all, it unlocked $5BN for a good cause. But, of course, you’re thinking — D’oh! It’s a World Bank report, it must be high-quality. Really?
Consider also this story: in 2014, headlines like this lit up around the world: “Literally a Third of World Bank Policy Reports Have Never, Ever Been Read Online, By Anyone” (Slate) and “World Bank learns most PDFs it produces go unread” (Sydney Morning Herald). These headlines were triggered by a working paper, written by two economists from the World Bank and published on its website. The punchline? They were wrong; the paper was very wrong. Like Prof. Arnold’s paper, they were “missing contemporaneous entries and raw data”, in this case data from the World Bank’s official repository. They’d pulled the data from an old repository. If they had also used data from the Bank’s new repository, they’d have found that every Bank report, however niche, had been downloaded many times. How do I know? Because I called the one guy who would know the truth, the Bank’s Publisher, Carlos Rossel, and once he’d calmed down, he told me.
So, we have two reports from the same venue: one plainly exhibiting a degree of excellence, the other painfully embarrassing (and, by the way, it still hasn’t been retracted).
Now, I bet you’re thinking, the latter is a working paper, therefore it hasn’t been peer-reviewed and so it doesn’t count. Well, the AIDS in Africa report wasn’t “peer reviewed” either — in the sense we all understand — but that didn’t stop Gates reaching for his Foundation’s wallet. What about all the preprints being posted on bioRxiv and elsewhere about the coronavirus: do they “not count”? This reminds me of a lovely headline when CERN’s paper on the discovery of the Higgs boson finally made it into a journal some months after the results had been revealed at a packed seminar, and weeks after the paper had been posted on arXiv: “Higgs boson discovery passes peer review, becomes actual science”. Quite apart from the irony expressed by the headline writer, here’s a puzzler for you. Was the quality of this paper assured by finally being published in a journal (with an impact factor one-tenth of Science’s), or when it was posted on arXiv, or when it was presented at a seminar? Which venue assured the quality of this work?
Of course, none of them did because the venue of its publication tells us nothing about the quality of the paper. The quality is inherent in the paper itself, not in the venue where it is made public.
The Wakefield paper’s lack of quality was also inherent in the paper itself; the fact that it was published in The Lancet (and is still available on more than seventy websites) did not make it high quality. Or to put it another way, the venue of its publication tells us nothing useful about the quality of a paper.
So what are different venues good for? Today’s scholarly publishing system is still essentially the same as the one that Oldenburg et al. started in the 17th Century. This system evolved in an environment where publishing costs were significant and grew with increased dissemination (increased demand meant higher print and delivery costs). This meant that editors had to make choices to keep costs under control — to select what to publish and what to reject. The selection criteria varied: some used geography to segment the market (The Chinese Journal of X, The European Journal of Y); some set up societies (Operational Research Society Journal); and others segmented the market by discipline (The International Journal of Neurology). These were genuinely useful distinctions to make, helping guide authors, readers and librarians to solutions for their authoring, reading and archiving needs.
Most journals pretend to use quality as a criterion to select within their niche — but isn’t it funny that there isn’t a Quality Journal of Chemistry or a Higher-Quality Journal of Physics? The real reasons for selection and rejection are of course to do with building brands and meeting business targets in terms of the number of pages published. If quality were the overarching criterion, why, like the wine harvest, don’t journals fluctuate in output each year? Down when there’s a poor season and up when the sun shines?
If quality were the principal reason for acceptance and rejection, why is it absent from the list of most common reasons for rejection? According to Editage, one of the most common reasons is that the paper didn’t fit the aims and scope of the journal. Not because the paper is of poor quality. The current publishing process isn’t a system for weeding out weak papers from prestige journals, leaving them with only the best. It’s a system for sorting stuff into “houses” which is as opaque, unaccountable and random as the Sorting Hat which confronted Harry Potter at Hogwarts. This paper to the Journal of Hufflepuff; that one to the Journal of Slytherin!
So the venue of its publication can tell us useful things about a paper: its geographical origin, its field of study, the society that endorses it. The one thing it can’t tell us is anything useful about the quality of a paper.
Note
[1] We regret this phrasing. We asked “did the Lancet know what it was doing” in the usual colloquial sense of implying a lack of competence (“he doesn’t know what he’s doing”); but as Niall Boyce rightly pointed out, it can be read as snidely implying that The Lancet knew exactly what it was doing, and deliberately delayed the retraction in order to accumulate more citations. For avoidance of doubt, that is not what we meant; we apologise for not having written more clearly.
References
We were of course not able to give references during the debate. But since our statement included several citations, we can remedy that deficiency here.
- Brembs, Björn, Katherine Button and Marcus Munafò. 2013. Deep impact: unintended consequences of journal rank. Frontiers in Human Neuroscience, 24 June 2013. doi:10.3389/fnhum.2013.00291
- Cho, Inha, Zhi-Jun Jia and Frances H. Arnold. 2019. Site-selective enzymatic C‒H amidation for synthesis of diverse lactams. Science 364(6440):575-578. doi:10.1126/science.aaw9068
- Fang, F. C., R. G. Steen and A. Casadevall. 2012. Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences 109:17028–17033. doi:10.1073/pnas.1212247109
- Lozano, G. A., V. Larivière and Y. Gingras. 2012. The weakening relationship between the impact factor and papers’ citations in the digital age. Journal of the American Society for Information Science and Technology 63(11):2140–2145. doi:10.1002/asi.22731
- Wakefield, A. J., S. H. Murch, A. Anthony, J. Linnell, D. M. Casson, M. Malik, M. Berelowitz and A. P. Dhillon. 1998. Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. The Lancet 351(9103):637–641. doi:10.1016/S0140-6736(97)11096-0 [RETRACTED]