January 29, 2016
You’ll remember that in the last installment (before Matt got distracted and wrote about archosaur urine), I proposed a general schema for aggregating scores in several metrics, terming the result an LWM or Less Wrong Metric. Given a set of n metrics that we have scores for, we introduce a set of n exponents $e_i$ which determine how we scale each kind of score as it increases, and a set of n factors $k_i$ which determine how heavily we weight each scaled score. Then we sum the scaled results:
$$\mathrm{LWM} = k_1 x_1^{e_1} + k_2 x_2^{e_2} + \dots + k_n x_n^{e_n}$$
“That’s all very well”, you may ask, “But how do we choose the parameters?”
Here’s what I proposed in the paper:
One approach would be to start with subjective assessments of the scores of a body of researchers – perhaps derived from the faculty of a university confidentially assessing each other. Given a good-sized set of such assessments, together with the known values of the metrics x1, x2 … xn for each researcher, techniques such as simulated annealing can be used to derive the values of the parameters k1, k2 … kn and e1, e2 … en that yield an LWM formula best matching the subjective assessments.
Where the results of such an exercise yield a formula whose results seem subjectively wrong, this might flag a need to add new metrics to the LWM formula: for example, a researcher might be more highly regarded than her LWM score indicates because of her fine record of supervising doctoral students who go on to do well, indicating that some measure of this quality should be included in the LWM calculation.
I think as a general approach that is OK: start with a corpus of well understood researchers, or papers, whose value we’ve already judged a priori by some means; then pick the parameters that best approximate that judgement; and let those parameters control future automated judgements.
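That fitting step can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the procedure from the paper: the function names (`lwm`, `mismatch`, `anneal`), the squared-error measure of "best matching", the linear cooling schedule, the step size and the synthetic data are all made up for the example.

```python
import math
import random


def lwm(scores, factors, exponents):
    """k1*x1**e1 + k2*x2**e2 + ... + kn*xn**en for one researcher."""
    return sum(k * x ** e for x, k, e in zip(scores, factors, exponents))


def mismatch(params, researchers, judgements):
    """Sum of squared differences between LWM scores and subjective scores.

    params holds the n factors followed by the n exponents.
    """
    n = len(params) // 2
    factors, exponents = params[:n], params[n:]
    return sum((lwm(x, factors, exponents) - y) ** 2
               for x, y in zip(researchers, judgements))


def anneal(researchers, judgements, steps=20000, seed=0):
    """Toy simulated annealing over the 2n LWM parameters (assumed schedule)."""
    rng = random.Random(seed)
    n = len(researchers[0])
    current = [1.0] * (2 * n)              # start from the naive sum
    cost = mismatch(current, researchers, judgements)
    best, best_cost = current[:], cost
    for step in range(steps):
        temperature = 1.0 - step / steps   # linear cooling, always > 0
        candidate = current[:]
        i = rng.randrange(2 * n)           # nudge one parameter at a time
        candidate[i] = max(0.0, candidate[i] + rng.gauss(0, 0.1))
        new_cost = mismatch(candidate, researchers, judgements)
        # Always accept an improvement; accept a worse candidate with
        # probability exp(-increase / temperature).
        if new_cost < cost or rng.random() < math.exp((cost - new_cost) / temperature):
            current, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = current[:], cost
    return best, best_cost


# Synthetic data for illustration only: (H-index, Twitter followers) per
# researcher, with "subjective" scores generated from hidden parameters.
researchers = [(4, 100), (9, 400), (16, 25), (1, 900), (25, 0)]
true_factors, true_exponents = [2.0, 0.5], [1.5, 0.5]
judgements = [lwm(x, true_factors, true_exponents) for x in researchers]

naive_cost = mismatch([1.0] * 4, researchers, judgements)
fitted, fitted_cost = anneal(researchers, judgements)
print(f"naive mismatch {naive_cost:.1f}, fitted mismatch {fitted_cost:.1f}")
print("factors", fitted[:2], "exponents", fitted[2:])
```

With real data, the judgements would be the confidential scores rather than numbers generated from a known formula, and a large residual mismatch for particular researchers would be the cue to look for a missing metric.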
The problem, really, is how we make that initial judgement. In the scenario I originally proposed, where say the 50 members of a department each assign a confidential numeric score to all the others, you can rely to some degree on the wisdom of crowds to give a reasonable judgement. But I don’t know how politically difficult it would be to conduct such an exercise. Even if the individual scorers were anonymised, the person collating the data would know the total scores awarded to each person, and it’s not hard to imagine that data being abused. In fact, it’s hard to imagine it not being abused.
In other situations, the value of the subjective judgement may be close to zero anyway. Suppose we wanted to come up with an LWM that indicates how good a given piece of research is. We choose LWM parameters based on the scores that a panel of experts assign to a corpus of existing papers, and derive our parameters from that. But we know that experts are really bad at assessing the quality of research. So what would our carefully parameterised LWM be approximating? Only the flawed judgement of flawed experts.
Perhaps this points to an even more fundamental problem: do we even know what “good research” looks like?
It’s a serious question. We all know that “research published in high-Impact Factor journals” is not the same thing as good research. We know that “research with a lot of citations” is not the same thing as good research. For that matter, “research that results in a medical breakthrough” is not necessarily the same thing as good research. As the new paper points out:
If two researchers run equally replicable tests of similar rigour and statistical power on two sets of compounds, but one of them happens to have in her batch a compound that turns out to have useful properties, should her work be credited more highly than the similar work of her colleague?
What, then? Are we left only with completely objective measurements, such as statistical power, adherence to the COPE code of conduct, open-access status, or indeed correctness of spelling?
If we accept that (and I am not arguing that we should, at least not yet), then I suppose we don’t even need an LWM for research papers. We can just count these objective measures and call it done.
I really don’t know what my conclusions are here. Can anyone help me out?
The Less Wrong Metric (LWM): towards a not wholly inadequate way of quantifying the value of research
January 26, 2016
I said last time that my new paper on Better ways to evaluate research and researchers proposes a family of Less Wrong Metrics, or LWMs for short, which I think would at least be an improvement on the present ubiquitous use of impact factors and H-indexes.
What is an LWM? Let me quote the paper:
The Altmetrics Manifesto envisages no single replacement for any of the metrics presently in use, but instead a palette of different metrics laid out together. Administrators are invited to consider all of them in concert. For example, in evaluating a researcher for tenure, one might consider H-index alongside other metrics such as number of trials registered, number of manuscripts handled as an editor, number of peer-reviews submitted, total hit-count of posts on academic blogs, number of Twitter followers and Facebook friends, invited conference presentations, and potentially many other dimensions.
In practice, it may be inevitable that overworked administrators will seek the simplicity of a single metric that summarises all of these.
This is a key problem of the world we actually live in. We often bemoan the fact that people evaluating research will apparently do almost anything other than actually read the research. (To paraphrase Dave Barry, these are important, busy people who can’t afford to fritter away their time in competently and diligently doing their job.) There may be good reasons for this; there may only be bad reasons. But what we know for sure is that, for good reasons or bad, administrators often do want a single number. They want it so badly that they will seize on the first number that comes their way, even if it’s as horribly flawed as an impact factor or an H-index.
What to do? There are two options. One is to change the way these overworked administrators function, to force them to read papers and consider a broad range of metrics — in other words, to change human nature. Yeah, it might work. But it’s not where the smart money is.
So perhaps the way to go is to give these people a better single number. A less wrong metric. An LWM.
Here’s what I propose in the paper.
In practice, it may be inevitable that overworked administrators will seek the simplicity of a single metric that summarises all of these. Given a range of metrics x1, x2 … xn, there will be a temptation to simply add them all up to yield a “super-metric”, x1 + x2 + … + xn. Such a simply derived value will certainly be misleading: no-one would want a candidate with 5,000 Twitter followers and no publications to appear a hundred times stronger than one with an H-index of 50 and no Twitter account.
A first step towards refinement, then, would weight each of the individual metrics using a set of constant parameters k1, k2 … kn to be determined by judgement and experiment. This yields another metric, k1·x1 + k2·x2 + … + kn·xn. It allows the down-weighting of less important metrics and the up-weighting of more important ones.
However, even with well-chosen $k_i$ parameters, this better metric has problems. Is it really a hundred times as good to have 10,000 Twitter followers than 100? Perhaps we might decide that it’s only ten times as good – that the value of a Twitter following scales with the square root of the count. Conversely, in some contexts at least, an H-index of 40 might be more than twice as good as one of 20. In a search for a candidate for a senior role, one might decide that the value of an H-index scales with the square of the value; or perhaps it scales somewhere between linearly and quadratically – with H-index^1.5, say. So for full generality, the calculation of the “Less Wrong Metric”, or LWM for short, would be configured by two sets of parameters: factors k1, k2 … kn, and exponents e1, e2 … en. Then the formula would be:
$$\mathrm{LWM} = k_1 x_1^{e_1} + k_2 x_2^{e_2} + \dots + k_n x_n^{e_n}$$
So that’s the idea of the LWM — and you can see now why I refer to this as a family of metrics. Given n metrics that you’re interested in, you pick 2n parameters to combine them with, and get a number that to some degree measures what you care about.
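As a minimal sketch of that calculation in Python: the function name `lwm` and every factor and exponent below are illustrative assumptions, not values proposed in the paper. The example reuses the paper's two hypothetical candidates to show how the naive sum and a parameterised LWM can rank them differently.

```python
from typing import Sequence


def lwm(scores: Sequence[float],
        factors: Sequence[float],
        exponents: Sequence[float]) -> float:
    """Return k1*x1**e1 + k2*x2**e2 + ... + kn*xn**en."""
    if not (len(scores) == len(factors) == len(exponents)):
        raise ValueError("need one factor and one exponent per metric")
    return sum(k * x ** e for x, k, e in zip(scores, factors, exponents))


# Metrics, in order: H-index, Twitter followers.
tweeter = [0, 5000]   # no publications, 5,000 followers
scholar = [50, 0]     # H-index of 50, no Twitter account

# Naive "super-metric": every factor and exponent is 1.
naive = ([1, 1], [1, 1])
# Hypothetical LWM: H-index raised to the power 1.5; followers scaled by a
# square root and weighted down. These numbers are invented for illustration.
tuned = ([1.0, 0.1], [1.5, 0.5])

print("naive:", lwm(tweeter, *naive), lwm(scholar, *naive))
print("tuned:", lwm(tweeter, *tuned), lwm(scholar, *tuned))
```

Under the naive sum the Twitter-only candidate scores a hundred times higher; under the made-up parameters the ordering reverses. Which parameters are right is, of course, exactly the open question.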
January 25, 2016
Like Stephen Curry, we at SV-POW! are sick of impact factors. That’s not news. Everyone now knows what a total disaster they are: how they are significantly correlated with retraction rate but not with citation count; how they are higher for journals whose studies are less statistically powerful; how they incentivise bad behaviour including p-hacking and over-hyping. (Anyone who didn’t know all that is invited to read Brembs et al.’s 2013 paper Deep impact: unintended consequences of journal rank, and weep.)
It’s 2016. Everyone who’s been paying attention knows that impact factor is a terrible, terrible metric for the quality of a journal, a worse one for the quality of a paper, and not even in the park as a metric for the quality of a researcher.
Unfortunately, “everyone who’s been paying attention” doesn’t seem to include such figures as search committees picking people for jobs, department heads overseeing promotion, tenure committees deciding on researchers’ job security, and I guess granting bodies. In the comments on this blog, we’ve been told time and time and time again — by people who we like and respect — that, however much we wish it weren’t so, scientists do need to publish in high-IF journals for their careers.
What to do?
It’s a complex problem, not well suited to discussion on Twitter. Here’s what I wrote about it recently:
The most striking aspect of the recent series of Royal Society meetings on the Future of Scholarly Scientific Communication was that almost every discussion returned to the same core issue: how researchers are evaluated for the purposes of recruitment, promotion, tenure and grants. Every problem that was discussed – the disproportionate influence of brand-name journals, failure to move to more efficient models of peer-review, sensationalism of reporting, lack of replicability, under-population of data repositories, prevalence of fraud – was traced back to the issue of how we assess works and their authors.
It is no exaggeration to say that improving assessment is literally the most important challenge facing academia.
This is from the introduction to a new paper which came out today: Taylor (2016), Better ways to evaluate research and researchers. In eight short pages — six, really, if you ignore the appendix — I try to get to grips with the historical background that got us to where we are, I discuss some of the many dimensions we should be using to evaluate research and researchers, and I propose a family of what I call Less Wrong Metrics — LWMs — that administrators could use if they really absolutely have to put a single number on things.
(I was solicited to write this by SPARC Europe, I think in large part because of things I have written around this subject here on SV-POW! My thanks to them: this paper becomes part of their Briefing Papers series.)
Next time I’ll talk about the LWM and how to calculate it. Those of you who are impatient might want to read the actual paper first!
November 19, 2015
I got back on Tuesday from OpenCon 2015 — the most astonishing conference on open scholarship. Logistically, it works very differently from most conferences: students have their expenses paid, but established scholars have to pay a registration fee and cover their own expenses. That inversion of how things are usually done captures much of what’s unique about OpenCon: its focus on the next generation is laser-sharp.
They say you should never meet your heroes, but OpenCon demonstrated that that’s not always a good rule. Here I am with Erin McKiernan — the epitome of a fully open early-career researcher — and Mike Eisen, who needs no introduction:
(This photo was supposed to be Erin and me posing in our PeerJ T-shirts, but Mike crashed it with his PLOS shirt. Thanks to Geoff Bilder for taking the photo.)
It was striking that the opening session, on Saturday morning, consisted of consecutive keynotes from Mike and then Erin. Both are now free to watch, and I can’t overstate how highly I recommend them. Seriously, make time. Next time you’re going to watch a movie, skip it and watch Mike and Erin instead.
Much of Mike’s talk was history: how he and others first became convinced of the importance of openness, how E-biomed nearly happened and then didn’t, how PLOS started with a declaration and became a publisher, and so on. What’s striking about this is just how much brutal opposition and painful discouragement Mike and his colleagues had to go through to get us to where we are now. The E-biomed proposal that would have freed all biomedical papers was opposed powerfully by publishers (big surprise, huh?) and eventually watered down into PubMed Central. The PLOS declaration collected 34,000 signatures, but most signatories didn’t follow through. PLOS as a publisher was met with scepticism; and PLOS ONE with derision. It takes a certain strength of mind and spirit to keep on truckin’ through that kind of setback, and we can all be grateful that Mike’s was one of the hands on the wheel.
At a much earlier stage in her career, Erin’s pledge to extreme openness reflects Mike’s. It’s good to see that so far, it’s helping rather than harming her career.
(And how is it going? Watch her talk, which follows Mike’s, to find out. You won’t regret it.)
There is so, so much more that I could say about OpenCon. Listing all the inspiring people that I met, alone, would be too much for one blog-post. I will just briefly mention some of those that I have known by email/blog/Twitter for some time, but met in the flesh for the first time: Mike Eisen and Erin McKiernan both fall into that category; so do Björn Brembs, Melissa Hagemann, Geoff Bilder and Danny Kingsley. I could have had an amazing time just talking to people even if I’d missed all the sessions. (Apologies to everyone I’ve not mentioned.)
Oh, and how often do you get to rub shoulders with Jimmy Wales?
(That’s Jon Tennant in between Jimmy and me, and Mike Eisen trying, but not quite succeeding, to photobomb us from behind.)
And yet, even with global superstars around, the part of the weekend that impressed me the most was a small breakout session where I found myself in a room with a dozen people I’d never met before, didn’t recognise, and hadn’t heard of. As we went around the room and did introductions, every single one of them was doing something awesome. They were helping a scholarly society to switch to OA publishing, or funding open projects in the developing world, or driving a university’s adoption of an OA policy, or creating a new repository for unpublished papers, or something. (I really wish I’d written them all down.)
The sheer amount of innovation and hard work that’s going on just blew me away. So: OpenCon 2015 community, I salute you! May we meet again!
Update (Saturday 21 November 2015)
Here is the conference photo, taken by Slobodan Radicev, CC by:
And here’s a close-up of the bit with me, honoured to be sandwiched between the founders of Public Library of Science and the Open Library of Humanities! (That’s Mike Eisen to the left, and Martin Eve to the right.)
Many SV-POW! readers will already be aware that the entire editorial staff of the Elsevier journal Lingua has resigned over the journal’s high price and lack of open access. As soon as they have worked out their contracts, they will leave en bloc and start a new open access journal, Glossa — which will in fact be the old journal under a new name. (Whether Elsevier tries to keep the Lingua ghost-ship afloat under new editors remains to be seen.)
Today I saw Elsevier’s official response, “Addressing the resignation of the Lingua editorial board“. I just want to pick out one tiny part of this, which reads as follows:
The article publishing charge at Lingua for open access articles is 1800 USD. The editor had requested a price of 400 euros, an APC that is not sustainable. Had we made the journal open access only and at the suggested price point, it would have rendered the journal no longer viable – something that would serve nobody, least of which the linguistics community.
The new Lingua will be hosted at Ubiquity Press, a well-established open-access publisher that started out as UCL’s in-house OA publishing arm and has spun off into a private company. The APC at Ubiquity journals is typically £300 (€375, $500), which is less than the level that Elsevier describe as “not sustainable” (and a little over a quarter of what Elsevier currently charge).
Evidently Ubiquity Press finds it sustainable.
You know what’s not sustainable? Dragging around the carcass of a legacy barrier-based publisher, with all its expensive paywalls, authentication systems, Shibboleth/Athens/Kerberos integration, lawyers, PR departments, spin-doctors, lobbyists, bribes to politicians, and of course 37.3% profit margins.
The biggest problem with legacy publishers? They’re just a waste of money.
Copyright: promoting the Progress of Science and useful Arts by preventing access to 105-year-old quarry maps
October 11, 2015
In my recent preprint on the incompleteness and distortion of sauropod neck specimens, I discuss three well-known sauropod specimens in detail, and show that they are not as well known as we think they are. One of them is the Giraffatitan brancai lectotype MB.R.2181 (more widely known by its older designation HMN SII), the specimen that provides the bulk of the mighty mounted skeleton in Berlin.
That photo is from this post, which is why it’s disfigured by red arrows pointing at its epipophyses. But the vertebra in question — the eighth cervical of MB.R.2181 — is a very old friend: in fact, it was the subject of the first ever SV-POW! post, back in 2007.
In the preprint, to help make the point that this specimen was found extremely disarticulated, I reproduce Heinrich (1999:figure 16), which is Wolf-Dieter Heinrich’s redrawing of Janensch’s original sketch map of Quarry S, made in 1909 or 1910. Here it is again:
For the preprint, as for this blog-post (and indeed the previous one), I just went right ahead and included it. But the formal version of the paper (assuming it passes peer-review) will be very explicitly under a CC By licence, so the right thing to do is get formal permission to include it under those terms. So I’ve been trying to get that permission.
What a stupid, stupid waste of time.
Heinrich’s paper appeared in the somewhat cumbersomely titled Mitteilungen aus dem Museum für Naturkunde in Berlin, Geowissenschaftliche Reihe, published as a subscription journal by Wiley. Happily, that journal is now open access, published by Pensoft as The Fossil Record. So I wrote to the Fossil Record editors to request permission. They wrote back, saying:
We are not the right persons for your question. The Wiley Company holds the copyright and should therefore be asked. Unfortunately, I do not know who is the correct person.
So I wrote to Wiley, and got this automated response:
Thank you for your enquiry.
We are currently experiencing a large volume of email traffic and will deal with your request within the next 15 working days.
We are pleased to advise that permission for the majority of our journal content, and for an increasing number of book publications, may be cleared more quickly by using the RightsLink service via Wiley’s websites http://onlinelibrary.wiley.com and www.wiley.com.
Within the next fifteen working days? That is, in the next three weeks? How can it possibly take that long? Are they engraving their response on a corundum block?
So, OK, let’s follow the automated suggestion and try RightsLink. I went to the Wiley Online Library, and searched for journals whose names contain “naturkunde”. Only one comes up, and it’s not the right one. So Wiley doesn’t admit the existence of the journal.
Well, there’s lots to enjoy here, isn’t there? First, and most important, it doesn’t actually work: “Permission to reproduce this content cannot be granted via the RightsLink service.” Then there’s that cute little registered-trademark symbol “®” on the name RightsLink, because it’s important to remind me not to accidentally set up my own rights-management service with the same name. In the same vein, there’s the “Copyright © 2015 Copyright Clearance Center, Inc. All Rights Reserved” notice at the bottom — copyright not on the content that I want to reuse, but on the RightsLink popup itself. (Which I guess means I am in violation for including the screenshot above.) Oh, and there’s the misrendering of “Museum für Naturkunde” as “Museum fÃ¼r Naturkunde”.
All of this gets me precisely nowhere. As far as I can tell, my only recourse now is to wait three weeks for Wiley to get in touch with me, and hope that they turn out to be in favour of science.
It’s Sunday afternoon. I could be watching Ireland play France in the Rugby World Cup. I could be out at Staverton, seeing (and hearing) the world’s last flying Avro Vulcan overfly Gloucester Airport for the last time. I could be watching Return of the Jedi with the boys, in preparation for the forthcoming Episode VII. Instead, here I am, wrestling with copyright.
How absolutely pointless. What a terrible waste of my life.
Is this what we want researchers to be spending their time on?
Update (13 October 2015): a happy outcome (this time)
I was delighted, on logging in this morning, to find I had email from RIGHTS-and-LICENCES@wiley-vch.de with the subject “Permission to reproduce Heinrich (1999:fig. 16) under CC By licence” — a full thirteen working days earlier than expected. They were apologetic and helpful. Here is the key part of what they said:
We are of course happy to handle your request directly from our office – please find the requested permission here:
We hereby grant permission for the requested use expected that due credit is given to the original source.
If material appears within our work with credit to another source, authorisation from that source must be obtained.
Credit must include the following components:
– Journals: Author(s) Name(s): Title of the Article. Name of the Journal. Publication year. Volume. Page(s). Copyright Wiley-VCH Verlag GmbH & Co. KGaA. Reproduced with permission.
So this is excellent. I would of course have included all those elements in the attribution anyway, with the exception that it might not have occurred to me to state who the copyright holder is. But there is no reason to object to that.
So, two cheers for Wiley on this occasion. I had to waste some time, but at least none of it was due to deliberate obstructiveness, and most importantly they are happy for their figure to be reproduced under CC By.
- Heinrich, Wolf-Dieter. 1999. The taphonomy of dinosaurs from the Upper Jurassic of Tendaguru, Tanzania (East Africa), based on field sketches of the German Tendaguru expedition (1909-1913). Mitteilungen aus dem Museum für Naturkunde in Berlin, Geowissenschaftliche Reihe 2:25-61.
October 4, 2015
Preprints are in the air! A few weeks ago, Stephen Curry had a piece about them in the Guardian (Peer review, preprints and the speed of science) and pterosaur palaeontologist Liz Martin published Preprints in science on her blog Musings of a Clumsy Palaeontologist. The latter in particular has spawned a prolific and fascinating comment stream. Then SV-POW!’s favourite journal, PeerJ, weighed in on its own blog with A PeerJ PrePrint – so just what is that exactly?.
Following on from that, I was invited to contribute a guest-post to the PeerJ blog: they’re asking several people about their experiences with PeerJ Preprints, and publishing the results in a series. I started to write my answers in an email, but they soon got long enough that I concluded it made more sense to write my own post instead. This is that post.
As a matter of fact, I’ve submitted four PeerJ preprints, and all of them for quite different reasons.
1. Barosaurus neck. I and Matt submitted the Barosaurus manuscript as a preprint because we wanted to get feedback as quickly as possible. We certainly got it: four very long detailed comments that were more helpful than most formally solicited peer-reviews that I’ve had. (It’s to our discredit that we didn’t then turn the manuscript around immediately, taking those reviews into account. We do still plan to do this, but other things happened.)
2. Dinosaur diversity. Back in 2004 I submitted my first ever scientific paper, a survey of dinosaur diversity broken down in various ways. It was rejected (for what I thought were spurious reasons, but let it pass). The more time that passed, the more out of date the statistics became. As my interests progressed in other directions, I reached the point of realising that I was never going to get around to bringing that paper up to date and resubmitting it to a journal. Rather than let it be lost to the world, when I think it still contains much that is of interest, I published it as a pre-print (although it’s not pre- anything: what’s posted is the final version).
3. Cartilage angles. Matt and I had a paper published on PLOS ONE in 2013, on the effect that intervertebral cartilage had on sauropod neck posture. Only after it was published did I realise that there was a very simple way to quantify the geometric effect. I wrote what was intended to be a one-pager on that, planning to issue it as a sort of erratum. It ended up much longer than expected, but because I considered it to be material that should really have been in the original PLOS ONE paper, I wanted to get it out as soon as possible. So as soon as the manuscript was ready, I submitted it simultaneously as a preprint and onto the peer-review track at PeerJ. (It was published seven weeks later.)
4. Apatosaurine necks. Finally, I gave a talk at this year’s SVPCA (Symposium on Vertebrate Palaeontology and Comparative Anatomy), based on an in-progress manuscript in which I am second author to Matt. The proceedings of the symposium are emerging as a PeerJ Collection, and I and the other authors wanted our paper to be a part of that collection. So I submitted the abstract of the talk I gave, with the slide-deck as supplementary information. In time, this version of the preprint will be superseded by the completed manuscript, and eventually (we hope) by the peer-reviewed paper.
So the thing to take away from this is that there are lots of reasons to publish preprints. They open up different ways of thinking about the publication process.