How can the next REF more strongly emphasise the unimportance of Impact Factor?

July 10, 2015

I spent much of yesterday morning at the launch meeting of HEFCE’s new report on the use of metrics, The Metric Tide: Report of the Independent Review of the Role of Metrics in Research Assessment and Management. (Actually, thanks to the combination of a tube strike and a train strike, I spent most of the day stationary in traffic jams, but that’s not important right now.)

There’s a lot to like about the report, which is a fantastically detailed piece of work. (It weighs in at 178 pages for the main report, plus 200 pages for Supplement I and another 85 for Supplement II. I suspect that most people, including me, will content themselves with the Executive Summary, which is itself no lightweight at 12 pages.) Much has been written about it elsewhere — see the LSE’s link farm — but I want to focus on one issue that came up in the discussion.

As we’ve noted here a couple of times before, the REF (Research Excellence Framework) is explicit in disavowing impact factors and other rankings in its assessments. See the answer to the question “How will journal impact factors, rankings or lists, or the perceived standing of publishers be used to inform the assessment of research outputs?”, which is:

No sub-panel will make any use of journal impact factors, rankings, lists or the perceived standing of publishers in assessing the quality of research outputs. An underpinning principle of the REF is that all types of research and all forms of research outputs across all disciplines shall be assessed on a fair and equal basis.

The problem is, people tend not to believe it. Universities continue to select which papers to submit to the REF on the basis of what journals they were in. And this propagates all the problems of journal rank and the absurdly disproportionate influence that two or three scientifically weak journals have on the whole field of scholarship.

As Richard Butler said in a comment on an earlier post:

Most people inside UK universities that I have talked to say that journal reputation is being considered by departments when preparing their REF submissions, and this has been documented by various articles in The Guardian and the THE.

So that’s the background.

Then at the Metric Tide launch, the question was asked: what more can HEFCE do to convey that they’re looking for good work, not work from high-IF journals?

That was the only point in the meeting where I stuck my hand up — I had things to say, but at that point the chairman of the panel chipped in with a different question, and the moment had passed. So rather than yank the discussion back to that point, I decided it would be better to blog about it.

There are two possible reasons why universities depend on journal rank in general, and impact factor in particular, when deciding on what papers to submit to the REF.

1. They simply don’t believe what the REF says about not caring what venue a paper is in; or
2. They believe the REF is telling the truth, but think that Impact Factor is a good proxy for the qualities that the REF does care about.

The solution to #1 is just a bigger, bolder statement in the 2020 REF. Instead of being somewhat buried, the 2020 documents should begin with the following statement in 20-point bold font:

Submitted works will be assessed according to their intrinsic quality (clarity, replicability, statistical power, significance) and not according to the venue they appear in. If you use Impact Factors to assess works, you are statistically illiterate.

The solution to #2 is a little more complicated. What it comes down to is education: helping administrators to see and understand that in fact the Impact Factor of the journal that work appears in is not a good proxy for any of the things that we care about.

In short, part of the job that the 2020 REF needs to do is to demonstrate to administrators that submitting high-IF papers is not a good strategy for them. It won’t optimise their REF results. Selecting by IF will give them papers that have no particular tendency to be highly cited or statistically powerful, but which are more likely to be retracted.

10 Responses to “How can the next REF more strongly emphasise the unimportance of Impact Factor?”

  1. Coincidentally I’m reading about Elon Musk and Tesla:

    Musk had told me earlier in the day: “The moment the person leading a company thinks numbers have value in themselves, the company’s done. The moment the CFO becomes CEO—it’s done. Game over.”


  2. Kniffler Says:

    A higher retraction rate is probably a good thing though, right? More papers deserve retraction than are actually retracted[citation needed] – so surviving in a more retract-happy journal is an indicator of higher paper quality.

  3. Mike Taylor Says:

    Well, Kniffler, there are (at least) two forces at play here. One is that, in order to have the best shot at getting their studies into these high-profile journals, researchers are tempted to falsify, or at least massage or over-interpret data. That of course is an actively harmful unintended consequence of our present system of journal rank. The second is that, as you suggest, articles in high-profile journals may attract more attention and therefore be less able to get away with such behaviours than they would be able to in other venues.

    I seem to recall reading somewhere that a study had been done that somehow disentangled these two effects, and found that when controlling for the latter, the former was still significant. I don’t remember the details, though: perhaps someone else can remind us?

  4. Kenneth Carpenter Says:

    Well, no surprise here. I have long said that IF is really a marketing tool that benefits the journal. The fact that a journal brags about its IF should cause skepticism: you can bet the datum is skewed to impress. Given that administrators often rise through the ranks of academia (e.g., department chair), they probably know better than to believe the hyperbole about IF, so I don’t fully accept the excuses given for administrator usage.

  5. brembs Says:

    We have looked at journal rank and the methodological quality of their non-retracted articles, and the data suggest that high-ranking journals publish weaker papers.

    So direct evidence suggests that top journals publish bottom research, independent of retractions.
    Indirect evidence suggests the scrutiny/visibility effect for retracted papers may be small.

    Anecdote and speculation suggest that top journals are, if anything, *less* likely to retract than lower journals.

    Taken together, it seems that the published correlation of journal rank with decreasing quality is conservative, potentially underestimating the actual effect.

  6. brembs Says:

    Commenting on your post: one might write in the instructions:

    “Submissions from high IF journals will be subjected to additional scrutiny as high IF journals compromise on quality control and hence have a track record of unreliability and irreproducibility”.

    Or something along those lines…

  7. Matt Wedel Says:

    Kniffler wrote:

    A higher retraction rate is probably a good thing though, right? More papers deserve retraction than are actually retracted[citation needed] – so surviving in a more retract-happy journal is an indicator of higher paper quality.

    Doubtful. Fang et al. (2012) found that “Misconduct accounts for the majority of retracted scientific publications”. So it may not be that the high-IF journals are better at catching bad science, but rather that the IF rat race causes people to fudge their data to get into the weeklies in the first place. The pool of manuscripts submitted to high-IF journals may be more inherently corrupt than the pool of manuscripts submitted elsewhere, in which case more retractions will occur even if the level of scrutiny is no higher.

  8. Samuel Says:

    This is a fine goal: “Submitted works will be assessed according to their intrinsic quality (clarity, replicability, statistical power, significance)”

    But, how am I going to do this? I’m not an expert in every sub-sub-field of science.

    Hmm… How about I get some experts to read the paper and tell me what they think? Perhaps I could find peers of the scientist who wrote it. We could call them… um… assessors, yes, let’s go with that.

    So I need good assessors – how will I find them? Maybe I could ask a respected leader in the field to pick them? Someone who knows the field and the people in it…

    This is great, I think I’m on to something here. But I’m not the only one who has to do this. Other institutions need to assess articles too. Maybe we could jointly hire a respected leader and he could pick assessors and tell us all the results.

    Hmm… Maybe he could take all the good papers and bundle them together. That would make it easy to identify papers his community thinks are good. We could call it a… uh… newsletter? OK, that works.

    But hang on – how do I know my respected leader is any good? Maybe he’s a great scientist and crap at picking assessors.

    Huh. Oh, I know – I could look at his track record. Was he historically good at picking papers that proved to be important? I’m not expecting a perfect record, but is he better than average? So I just have to compare the track record of the newsletters that different respected leaders have produced, and that could give me some guidance on who is good at identifying important papers.

    Hmm… I wonder if this would work for picking which news to read (NY Times?) or which person to hire (you went to Harvard?). I mean, they’re not definitive of course – plenty of crap in both of those places – but I’m sure these are not meaningless signals.

    I don’t know. Your ideas interest me, though, maybe I should read some studies about this stuff. Do you maybe have a newsletter I could subscribe to?

  9. Rosanna Says:

    Well, you’ve made a fair case against IF, and I’m delighted to read it.

    Re REF, there’s still the question of how “statistical power” should be interpreted: in absolute or in relative terms. For example, my field is medieval Catalan literature. There are about forty of us on the whole planet, doing research on an area as wide as medieval Portuguese or even medieval German. Therefore our statistical power is absolutely insignificant, and in a totally different league from that of researchers in those other areas.

    My most cited publications today date from twenty years ago, even though I’m a full professor publishing regularly: it will probably take years for one of my few colleagues to treat the same topic I published on today, and therefore to (maybe) cite it.

    How, then, to apply the “statistical power” idea in minority areas of research? Isn’t it like welcoming back IF again?

  10. Mike Taylor Says:

    Rosanna, you are absolutely right about statistical power being something that necessarily varies hugely between fields. As a sauropod palaeontologist, the population sample of most of the species I deal with is one. And when I say one, I mean “a handful of bones from a single specimen”.

    When we wrote our paper on (lack of) sexual selection in sauropod necks, we got dinged by a reviewer for conducting a correlation analysis of leg length to neck length on only a dozen sauropods. We had to explain to that reviewer (who was a neontologist) that this sample was all the sauropods for which there are reasonable estimates of both leg and neck lengths.

    The arguments about statistical power apply most strongly to medical trials, of course.
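    The point about sample size can be made concrete with a quick power calculation. This is only a sketch, using the standard Fisher z-transformation approximation for a two-sided Pearson correlation test; the true correlation of r = 0.5, the alpha of 0.05, and the comparison sample of 50 are illustrative assumptions, not figures from the neck paper.

    ```python
    # Approximate power of a two-sided test for a Pearson correlation,
    # via the Fisher z-transformation (a standard large-sample
    # approximation), using only the Python standard library.
    from math import atanh, sqrt
    from statistics import NormalDist

    def correlation_power(r: float, n: int, alpha: float = 0.05) -> float:
        """Approximate power to detect a true correlation r with n pairs."""
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        # Mean of the test statistic when the true correlation is r:
        noncentrality = atanh(r) * sqrt(n - 3)
        return 1 - NormalDist().cdf(z_crit - noncentrality)

    # A genuinely strong correlation (r = 0.5) with only a dozen sauropods:
    print(round(correlation_power(0.5, 12), 2))  # 0.38 (well under even odds)
    # The same effect with a sample a neontologist might take for granted:
    print(round(correlation_power(0.5, 50), 2))  # 0.96
    ```

    With a dozen specimens, even a strong real effect is more likely to be missed than found; that is what “low statistical power” means in a field where the sample is every fossil that exists.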
