Back in 2005, three years before their paper on the WDC Supersaurus known as Jimbo was published, Lovelace et al. presented their work as a poster at the annual SVP meeting. The abstract for that poster appeared, as usual, in the abstracts book that came as a supplement to JVP 25 issue 3. But the poster itself was never published — which is a shame, as it contains some useful images that didn’t make it into the descriptive paper (Lovelace et al. 2008).

With Dave and Scott’s blessing, here it is! Click through for full resolution, of course.

And here’s the abstract as it appeared in print (Lovelace et al. 2005):

REVISED OSTEOLOGY OF SUPERSAURUS VIVIANAE

LOVELACE, David, HARTMAN, Scott, WAHL, William, Wyoming Dinosaur Center, Thermopolis, WY

A second, and more complete, associated specimen of Supersaurus vivianae (WDC-DMJ021) was discovered in the Morrison Formation of east-central Wyoming in a single sauropod locality. The skeleton provides a more complete picture of the osteology of S. vivianae, including a surprising number of apatosaurine characteristics. The caudals have heart shaped centra that lack a ventral longitudinal hollow, and the rectangular distal neural spines of the anterior caudals are mediolaterally expanded similar to Apatosaurus excelsus. The centra of the anterior caudals are procoelous as in other diplodocids, but the posterior ball is very weakly pronounced. The robusticity of the tibiae and fibulae are intermediate between Apatosaurus and diplodocines. The cervical vertebrae demonstrate classic diplodocine elongation with an elongation index ranging from 4 to 7.5. All 7 of the new cervicals have a centrum length that exceeds 1 meter. Mid-posterior cervicals are semicamellate at mid-centra near the pneumatic foramina. The dorsal vertebrae exhibit a high degree of elaboration on laminae, and extremely rugose pre and postspinal laminae. Costal elements are robust, with complex pneumatic innervations in the rib head. Although unknown in other diplodocids, early reports described pneumatic ribs in an A. excelsus; unfortunately the described specimen is unavailable.

Inclusion of lesser-known North American diplodocids such as Supersaurus, Seismosaurus and Suuwassea in phylogenetic studies may provide a framework for better understanding North American diplodocid evolution.

Many thanks to Dave and Scott for permission to share this important poster more widely. (Publish your posters, people! That option didn’t exist in 2005, but it does now!)

References

  • Lovelace, David M., Scott A. Hartman and William R. Wahl. 2005. Revised Osteology of Supersaurus vivianae (SVP poster). Journal of Vertebrate Paleontology 25(3):84A–85A.
  • Lovelace, David M., Scott A. Hartman and William R. Wahl. 2008. Morphology of a specimen of Supersaurus (Dinosauria, Sauropoda) from the Morrison Formation of Wyoming, and a re-evaluation of diplodocid phylogeny. Arquivos do Museu Nacional, Rio de Janeiro 65(4):527–544.

If you don’t get to give a talk at a meeting, you get bumped down to a poster. That’s what’s happened to Matt, Darren and me at this year’s SVPCA, which is coming up next week. My poster is about a weird specimen that Matt and I have been informally calling “Biconcavoposeidon” (which I remind you is not a formal taxonomic name).

Here it is, for those of you who won’t be at the meeting (or who just want a preview):

But wait — there’s more. The poster is now also formally published (Taylor and Wedel 2017) as part of the PeerJ preprint containing the conference abstract. It has a DOI and everything. I’m happy enough about it that I’m now citing it in my CV.

Do scientific posters usually get published? Well, no. But why not? I can’t offhand think of a single example of a published poster, though there must be some out there. They are, after all, legitimate research artifacts, and typically contain more information than published abstracts. So I’m happy to violate that norm.

Folks: it’s 2017. Publish your posters.

References

  • Taylor, Michael P., and Mathew J. Wedel. 2017. A unique Morrison-Formation sauropod specimen with biconcave dorsal vertebrae. p. 78 in: Abstract Volume: The 65th Symposium on Vertebrate Palaeontology and Comparative Anatomy & The 26th Symposium on Palaeontological Preparation and Conservation. University of Birmingham: 12th–15th September 2017. 79 pp. PeerJ preprint 3144v2. doi:10.7287/peerj.preprints.3144v2/supp-1

[Note: Mike asked me to scrape a couple of comments on his last post – this one and this one – and turn them into a post of their own. I’ve edited them lightly to hopefully improve the flow, but I’ve tried not to tinker with the guts.]

This is the fourth in a series of posts on how researchers might better be evaluated and compared. In the first post, Mike introduced his new paper and described the scope and importance of the problem. Then in the next post, he introduced the idea of the LWM, or Less Wrong Metric, and the basic mathematical framework for calculating LWMs. Most recently, Mike talked about choosing parameters for the LWM, and drilled down to a fundamental question: (how) do we identify good research?

Let me say up front that I am fully convinced of the seriousness of the problem of evaluating researchers fairly. It is a question of direct and timely importance to me. I serve on the Promotion & Tenure committees of two colleges at Western University of Health Sciences, and I want to make good decisions that can be backed up with evidence. But anyone who has been in academia for long knows of people who have had their careers mangled by getting caught in institutional machinery that is not well-suited for fairly evaluating scholarship. So I desperately want better metrics to catch on, to improve my own situation and those of researchers everywhere.

For all of those reasons and more, I admire the work that Mike has done in conceiving the LWM. But I’m pretty pessimistic about its future.

I think there is a widespread misapprehension that we got here because people and institutions were looking for good metrics, like the LWM, and we ended up with things like impact factors and citation counts because no-one had thought up anything better. Implying a temporal sequence of:

1. Deliberately looking for metrics to evaluate researchers.
2. Finding some.
3. Trying to improve those metrics, or replace them with better ones.

I’m pretty sure this is exactly backwards: the metrics that we use to evaluate researchers are mostly simple – easy to explain, easy to count (the hanky-panky behind impact factors notwithstanding) – and therefore they spread like wildfire, and therefore they became used in evaluation. Implying a very different sequence:

1. A metric is invented, often for a reason completely unrelated to evaluating researchers (impact factors started out as a way for librarians to rank journals, not for administration to rank faculty!).
2. Because a metric is simple, it becomes widespread.
3. Because a metric is both simple and widespread, it makes it easy to compare people in wildly different circumstances (whether or not that comparison is valid or defensible!), so it rapidly evolves from being trivia about a researcher, to being a defining character of a researcher – at least when it comes to institutional evaluation.

If that’s true, then any metric aimed for wide-scale adoption needs to be as simple as possible. I can explain the h-index or i10 index in one sentence. “Citation count” is self-explanatory. The fundamentals of the impact factor can be grasped in about 30 seconds, and even the complicated backstory can be conveyed in about 5 minutes.
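To underline how little computation these metrics demand, here is the h-index in a few lines of Python (a minimal sketch; the citation counts are invented for illustration):

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Invented citation counts for one researcher's papers.
print(h_index([52, 31, 9, 8, 6, 4, 4, 1, 0]))  # -> 5
```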

In addition to being simple, the metric needs to work the same way across institutions and disciplines. I can compare my h-index with that of an endowed chair at Cambridge, a curator at a small regional museum, and a postdoc at Podunk State, and it Just Works without any tinkering or subjective decisions on the part of the user (other than What Counts – but that affects all metrics dealing with publications, so no one metric is better off than any other on that score).

I fear that the LWM as conceived in Taylor (2016) is doomed, for the following reasons:

  • It’s too complex. It would probably be doomed if it had just a single term with a constant and an exponent (which I realize would defeat the purpose of having either a constant or an exponent), because that’s more math than either an impact factor or an h-index requires (perceptually, anyway – in the real world, most people’s eyes glaze over when the exponents come out).
  • Worse, it requires loads of subjective decisions and assigning importance on the part of the users.
  • And fatally, it would require a mountain of committee work to sort that out. I doubt if I could get the faculty in just one department to agree on a set of terms, constants, and exponents for the LWM, much less a college, much less a university, much less all of the universities, museums, government and private labs, and other places where research is done. And without the promise of universal applicability, there’s no incentive for any institution to put itself through the hell of work it would take to implement.

Really, the only way I think the LWM could get into place is by fiat, by a government body. If the EPA comes up with a more complicated but also more accurate way to measure, say, airborne particle output from car exhausts, they can theoretically say to the auto industry, “Meet this standard or stop selling cars in the US” (I know there’s a lot more legislative and legal push and pull than that, but it’s at least possible). And such a standard might be adopted globally, either because it’s a good idea so it spreads, or because the US strong-arms other countries into following suit.

Even if I trusted the US Department of Education to fill in all of the blanks for an LWM, I don’t know that they’d have the same leverage to get it adopted. I doubt that the DofE has enough sway to get it adopted even across all of the educational institutions. Who would want that fight, for such a nebulous pay-off? And even if it could be successfully inflicted on educational institutions (which sounds negative, but that’s precisely how the institutions would see it), what about the numerous and in some cases well-funded research labs and museums that don’t fall under the DofE’s purview? And that’s just in the US. The culture of higher education and scholarship varies a lot among countries. Which may be why the one-size-fits-all solutions suck – I am starting to wonder if a metric needs to be broken, to be globally applicable.

The problem here is that the user base is so diverse that the only way metrics get adopted is voluntarily. So the challenge for any LWM is to be:

  1. Better than existing metrics – this is the easy part – and,
  2. Simple enough to be both easily grasped and applied with minimal effort. In Malcolm Gladwell’s Tipping Point terms, it needs to be “sticky”. Although a better adjective for passage through the intestines of academia might be “smooth” – that is, having no rough edges, like exponents or overtly subjective decisions*, that would cause it to snag.

* Calculating an impact factor involves plenty of subjective decisions, but it has the advantages that (a) the users can pretend otherwise, because (b) ISI does the ‘work’ for them.

At least from my point of view, the LWM as Mike has conceived it is awesome and possibly unimprovable on the first point (in that practically any other metric could be seen as a degenerate case of the LWM), but dismal and possibly pessimal on the second one, in that it requires mounds of subjective decision-making to work at all. You can’t even get a default number and then iteratively improve it without investing heavily in advance.

An interesting thought experiment would be to approach the problem from the other side: invent as many new simple metrics as possible, and then see if any of them offer advantages over the existing ones. Although I have a feeling that people are already working on that, and have been for some time.

Simple, broken metrics like impact factor are the prions of scholarship. Yes, viruses are more versatile and cells more versatile still, by orders of magnitude, but compared to prions, cells take an awesome amount of effort to build and maintain. If you just want to infect someone and you don’t care how, prions are very hard to beat. And they’re so subtle in their machinations that we only became aware of them comparatively recently – much like the emerging problems with “classical” (i.e., non-alt) metrics.

I’d love to be wrong about all of this. I proposed the strongest criticism of the LWM I could think of, in hopes that someone would come along and tear it down. Please start swinging.

You’ll remember that in the last installment (before Matt got distracted and wrote about archosaur urine), I proposed a general schema for aggregating scores in several metrics, terming the result an LWM or Less Wrong Metric. Given a set of n metrics that we have scores for, we introduce a set of n exponents ei which determine how we scale each kind of score as it increases, and a set of n factors ki which determine how heavily we weight each scaled score. Then we sum the scaled results:

LWM = k1·x1^e1 + k2·x2^e2 + … + kn·xn^en
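In code, the whole formula is just a weighted sum of powered scores. Here is a minimal sketch, with every metric value, weight and exponent invented purely for illustration:

```python
def lwm(scores, ks, es):
    """Less Wrong Metric: sum over all metrics of k_i * x_i ** e_i."""
    assert len(scores) == len(ks) == len(es)
    return sum(k * x ** e for x, k, e in zip(scores, ks, es))

# Hypothetical metrics for one researcher: (H-index, citation count, Twitter followers).
scores = [20, 1500, 5000]
ks = [2.0, 0.05, 0.1]    # weights -- made up
es = [1.5, 1.0, 0.5]     # exponents -- made up
print(round(lwm(scores, ks, es), 1))
```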

“That’s all very well”, you may ask, “But how do we choose the parameters?”

Here’s what I proposed in the paper:

One approach would be to start with subjective assessments of the scores of a body of researchers – perhaps derived from the faculty of a university confidentially assessing each other. Given a good-sized set of such assessments, together with the known values of the metrics x1, x2, …, xn for each researcher, techniques such as simulated annealing can be used to derive the values of the parameters k1, k2, …, kn and e1, e2, …, en that yield an LWM formula best matching the subjective assessments.

Where the results of such an exercise yield a formula whose results seem subjectively wrong, this might flag a need to add new metrics to the LWM formula: for example, a researcher might be more highly regarded than her LWM score indicates because of her fine record of supervising doctoral students who go on to do well, indicating that some measure of this quality should be included in the LWM calculation.

I think as a general approach that is OK: start with a corpus of well understood researchers, or papers, whose value we’ve already judged a priori by some means; then pick the parameters that best approximate that judgement; and let those parameters control future automated judgements.
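As a concrete sketch of that fitting step: the quoted passage mentions simulated annealing, and scipy’s dual_annealing is one off-the-shelf version of that idea. Every number below is invented, and the least-squares error function is my choice, not the paper’s:

```python
import numpy as np
from scipy.optimize import dual_annealing

# Invented data: each row is one researcher's raw metrics
# (say H-index, citation count, Twitter followers) ...
X = np.array([[30.0, 2500.0,  120.0],
              [12.0,  400.0, 9000.0],
              [45.0, 6000.0,    0.0],
              [ 8.0,  150.0,  300.0]])
# ... and the confidential subjective scores we want the LWM to approximate.
target = np.array([7.5, 3.0, 9.0, 2.0])

n = X.shape[1]

def mismatch(params):
    ks, es = params[:n], params[n:]
    lwm = (ks * X ** es).sum(axis=1)      # LWM for every researcher
    return ((lwm - target) ** 2).sum()    # squared error against the panel's judgements

# Arbitrary search ranges: the first n parameters are the k_i, the last n the e_i.
bounds = [(0.0, 5.0)] * n + [(0.1, 2.0)] * n
fit = dual_annealing(mismatch, bounds)
print("fitted k:", fit.x[:n].round(3), "fitted e:", fit.x[n:].round(3))
```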

The problem, really, is how we make that initial judgement. In the scenario I originally proposed, where say the 50 members of a department each assign a confidential numeric score to all the others, you can rely to some degree on the wisdom of crowds to give a reasonable judgement. But I don’t know how politically difficult it would be to conduct such an exercise. Even if the individual scorers were anonymised, the person collating the data would know the total scores awarded to each person, and it’s not hard to imagine that data being abused. In fact, it’s hard to imagine it not being abused.

In other situations, the value of the subjective judgement may be close to zero anyway. Suppose we wanted to come up with an LWM that indicates how good a given piece of research is. We choose LWM parameters based on the scores that a panel of experts assign to a corpus of existing papers, and derive our parameters from that. But we know that experts are really bad at assessing the quality of research. So what would our carefully parameterised LWM be approximating? Only the flawed judgement of flawed experts.

Perhaps this points to an even more fundamental problem: do we even know what “good research” looks like?

It’s a serious question. We all know that “research published in high-Impact Factor journals” is not the same thing as good research. We know that “research with a lot of citations” is not the same thing as good research. For that matter, “research that results in a medical breakthrough” is not necessarily the same thing as good research. As the new paper points out:

If two researchers run equally replicable tests of similar rigour and statistical power on two sets of compounds, but one of them happens to have in her batch a compound that turns out to have useful properties, should her work be credited more highly than the similar work of her colleague?

What, then? Are we left only with completely objective measurements, such as statistical power, adherence to the COPE code of conduct, open-access status, or indeed correctness of spelling?

If we accept that (and I am not arguing that we should, at least not yet), then I suppose we don’t even need an LWM for research papers. We can just count these objective measures and call it done.

I really don’t know what my conclusions are here. Can anyone help me out?

I said last time that my new paper on Better ways to evaluate research and researchers proposes a family of Less Wrong Metrics, or LWMs for short, which I think would at least be an improvement on the present ubiquitous use of impact factors and H-indexes.

What is an LWM? Let me quote the paper:

The Altmetrics Manifesto envisages no single replacement for any of the metrics presently in use, but instead a palette of different metrics laid out together. Administrators are invited to consider all of them in concert. For example, in evaluating a researcher for tenure, one might consider H-index alongside other metrics such as number of trials registered, number of manuscripts handled as an editor, number of peer-reviews submitted, total hit-count of posts on academic blogs, number of Twitter followers and Facebook friends, invited conference presentations, and potentially many other dimensions.

In practice, it may be inevitable that overworked administrators will seek the simplicity of a single metric that summarises all of these.

This is a key problem of the world we actually live in. We often bemoan the fact that people evaluating research will apparently do almost anything other than actually read the research. (To paraphrase Dave Barry, these are important, busy people who can’t afford to fritter away their time in competently and diligently doing their job.) There may be good reasons for this; there may only be bad reasons. But what we know for sure is that, for good reasons or bad, administrators often do want a single number. They want it so badly that they will seize on the first number that comes their way, even if it’s as horribly flawed as an impact factor or an H-index.

What to do? There are two options. One is to change the way these overworked administrators function, to force them to read papers and consider a broad range of metrics — in other words, to change human nature. Yeah, it might work. But it’s not where the smart money is.

So perhaps the way to go is to give these people a better single number. A less wrong metric. An LWM.

Here’s what I propose in the paper.

In practice, it may be inevitable that overworked administrators will seek the simplicity of a single metric that summarises all of these. Given a range of metrics x1, x2, …, xn, there will be a temptation to simply add them all up to yield a “super-metric”, x1 + x2 + … + xn. Such a simply derived value will certainly be misleading: no-one would want a candidate with 5,000 Twitter followers and no publications to appear a hundred times stronger than one with an H-index of 50 and no Twitter account.

A first step towards refinement, then, would weight each of the individual metrics using a set of constant parameters k1, k2, …, kn to be determined by judgement and experiment. This yields another metric, k1·x1 + k2·x2 + … + kn·xn. It allows the down-weighting of less important metrics and the up-weighting of more important ones.

However, even with well-chosen ki parameters, this better metric has problems. Is it really a hundred times as good to have 10,000 Twitter followers than 100? Perhaps we might decide that it’s only ten times as good – that the value of a Twitter following scales with the square root of the count. Conversely, in some contexts at least, an H-index of 40 might be more than twice as good as one of 20. In a search for a candidate for a senior role, one might decide that the value of an H-index scales with the square of the value; or perhaps it scales somewhere between linearly and quadratically – with H-index^1.5, say. So for full generality, the calculation of the “Less Wrong Metric”, or LWM for short, would be configured by two sets of parameters: factors k1, k2, …, kn, and exponents e1, e2, …, en. Then the formula would be:

LWM = k1·x1^e1 + k2·x2^e2 + … + kn·xn^en

So that’s the idea of the LWM — and you can see now why I refer to this as a family of metrics. Given n metrics that you’re interested in, you pick 2n parameters to combine them with, and get a number that to some degree measures what you care about.
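To make the quoted scaling choices concrete, here is the arithmetic worked through (the exponents are just the ones floated above, not recommendations):

```python
# Square-root scaling of Twitter followers: 100x the followers is only 10x the credit.
print(10_000 ** 0.5 / 100 ** 0.5)   # 10.0

# H-index scaling with exponent 1.5: doubling the H-index nearly triples the credit.
print(40 ** 1.5 / 20 ** 1.5)        # ~2.83
```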

(How do you choose your 2n parameters? That’s the subject of the next post. Or, as before, you can skip ahead and read the paper.)


Like Stephen Curry, we at SV-POW! are sick of impact factors. That’s not news. Everyone now knows what a total disaster they are: how they are significantly correlated with retraction rate but not with citation count; how they are higher for journals whose studies are less statistically powerful; how they incentivise bad behaviour including p-hacking and over-hyping. (Anyone who didn’t know all that is invited to read Brembs et al.’s 2013 paper Deep impact: unintended consequences of journal rank, and weep.)

It’s 2016. Everyone who’s been paying attention knows that impact factor is a terrible, terrible metric for the quality of a journal, a worse one for the quality of a paper, and not even in the ballpark as a metric for the quality of a researcher.

Unfortunately, “everyone who’s been paying attention” doesn’t seem to include such figures as search committees picking people for jobs, department heads overseeing promotion, tenure committees deciding on researchers’ job security, and I guess granting bodies. In the comments on this blog, we’ve been told time and time and time again — by people who we like and respect — that, however much we wish it weren’t so, scientists do need to publish in high-IF journals for their careers.

What to do?

It’s a complex problem, not well suited to discussion on Twitter. Here’s what I wrote about it recently:

The most striking aspect of the recent series of Royal Society meetings on the Future of Scholarly Scientific Communication was that almost every discussion returned to the same core issue: how researchers are evaluated for the purposes of recruitment, promotion, tenure and grants. Every problem that was discussed – the disproportionate influence of brand-name journals, failure to move to more efficient models of peer-review, sensationalism of reporting, lack of replicability, under-population of data repositories, prevalence of fraud – was traced back to the issue of how we assess works and their authors.

It is no exaggeration to say that improving assessment is literally the most important challenge facing academia.

This is from the introduction to a new paper which came out today: Taylor (2016), Better ways to evaluate research and researchers. In eight short pages — six, really, if you ignore the appendix — I try to get to grips with the historical background that got us to where we are, I discuss some of the many dimensions we should be using to evaluate research and researchers, and I propose a family of what I call Less Wrong Metrics — LWMs — that administrators could use if they really absolutely have to put a single number on things.

(I was solicited to write this by SPARC Europe, I think in large part because of things I have written around this subject here on SV-POW! My thanks to them: this paper becomes part of their Briefing Papers series.)

Next time I’ll talk about the LWM and how to calculate it. Those of you who are impatient might want to read the actual paper first!


Re-reading an email that Matt sent me back in January, I see this:

One quick point about [an interesting sauropod specimen]. I can envision writing that up as a short descriptive paper, basically to say, “Hey, look at this weird thing we found! Morrison sauropod diversity is still underestimated!” But I honestly doubt that we’ll ever get to it — we have literally years of other, more pressing work in front of us. So maybe we should just do an SV-POW! post about the weirdness of [that specimen], so that the World Will Know.

Although as soon as I write that, I think, “Screw that, I’m going to wait until I’m not busy* and then just take a single week* and rock out a wiper* on it.”

I realize that this way of thinking represents a profound and possibly psychotic break with reality. *Thrice! But it still creeps up on me.

(For anyone not familiar with the “wiper”, it refers to a short paper of only one or two pages. The etymology is left as an exercise to the reader.)

It’s just amazing how we keep on and on falling for this delusion that we can get a paper out quickly, even when we know perfectly well, going into the project, that it’s not going to work out that way. To pick a recent example, my paper on quantifying the effect of intervertebral cartilage on neutral posture was intended to be literally one page, an addendum to the earlier paper on cartilage: title, one paragraph of intro, diagram, equation, single reference, DONE! Instead, it landed up being 11 pages long with five illustrations and two tables.

I think it’s a reasonable approximation to say that any given project will require about an order of magnitude more work than we expect at the outset.

Even as I write this, the top of my palaeo-work priority list is a paper that I’m working on with Matt and two other colleagues, which he kicked off on 6 May, writing:

I really, really want to kill this off absolutely ASAP. Like, seriously, within a week or two. Is that cool? Is that doable?

To which I idiotically replied:

IT SHALL BE SO!

A month and a bit later, the answers to Matt’s questions are clear. Yes, it’s cool; and no, it’s not doable.

The thing is, I think that’s … kind of OK. The upshot is that we end up writing reasonably substantial papers, which is after all what we’re meant to be trying to do. If the reasonably substantial papers that end up getting written aren’t necessarily the ones we thought they were going to be, well, that’s not a problem. After all, as I’ve noted before, my entire Ph.D dissertation was composed of side-projects, and I never got around to doing the main project. That’s fine.

In 2011, Matt’s tutorial on how to find problems to work on discussed in detail how projects grow and mutate and anastomose. I’m giving up on thinking that this is a bad thing, abandoning the idea that I ought to be in control of my own research program. I’m just going to keep chasing whatever rabbits look good to me at the time, and see what happens.

Onwards!

I’ll try to live-blog the first day of part 2 of the Royal Society’s Future of Scholarly Scientific Communication meeting, as I did for the first day of part 1. We’ll see how it goes.

Here’s the schedule for today and tomorrow.

Session 1: the reproducibility problem

Chair: Alex Halliday, vice-president of the Royal Society

Introduction to reproducibility. What it means, how to achieve it, what role funding organisations and publishers might play.

For an introduction/overview, see #FSSC – The role of openness and publishers in reproducible research.

Michele Dougherty, planetary scientist

It’s very humbling being at this meeting, when it’s so full of people who have done astonishing things. For example, Dougherty discovered an atmosphere around one of Saturn’s moons by an innovative use of magnetic field data. So many awesome people.

Her work is largely to do with very long-term projects involving planetary probes, e.g. the Cassini-Huygens probe. It’s going to be interesting to know what can be said about reproducibility of experiments that take decades and cost billions.

“The best science output you can obtain is as a result of collaboration with lots of different teams.”

Application of reproducibility here is about making the data from the probes available to the scientific community — and the general public — so that the result of analysis can be reproduced. So not experimental replication.

Such data often has a proprietary period (essentially an embargo) before its public release, partly because it’s taken 20 years to obtain and the team that did this should get the first crack at it. But it all has to be made publicly available.

Dorothy Bishop, chair of Academy of Medical Sciences group on replicability

The Royal Society is very much not the first to be talking about replicability — these discussions have been going on for years.

About 50% of studies in Bishop’s field are capable of replication. Numbers are even worse in some fields. Replication of drug trials is particularly important, as false results kill people.

Journals cause awful problems with impact-chasing: e.g. high-impact journals will publish sexy-looking autism studies with tiny samples, which no reputable medical journal would publish.

Statistical illiteracy is very widespread. Authors can give the impression of being statistically aware but in a superficial way.

Too much HARKing going on (Hypothesising After Results Known — searching a dataset for anything that looks statistically significant in the shallow p < 0.05 sense.)
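As a toy illustration of why HARKing is so corrosive, here is a quick simulation on pure noise (entirely made up, not based on anything Bishop presented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 40 subjects, 50 random "measurements": by construction there are no real effects.
data = rng.normal(size=(40, 50))
group = np.arange(40) % 2           # arbitrary split into two groups of 20

# Trawl every measurement for a "significant" group difference.
pvals = [stats.ttest_ind(col[group == 0], col[group == 1]).pvalue for col in data.T]
hits = sum(p < 0.05 for p in pvals)
print(hits, "of 50 pure-noise variables come out 'significant' at p < 0.05")
```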

“It’s just assumed that people doing research, know what they are doing. Often that’s just not the case.”

Many more criticisms of how the journal system encourages bad research. They’re coming much faster than I can type them. This is a storming talk; I wish the record would be made available.

Employers are also to blame for prioritising expensive research proposals (= large grants) over good ones.

All of this causes non-replicable science.

Floor discussion

Lots of great stuff here that I just can’t capture, sorry. Best follow the tweet stream for the fast-moving stuff.

One highlight: Pat Brown thinks it’s not necessarily a problem if lots of statistically underpowered studies are performed, so long as they’re recognised as such. Dorothy Bishop politely but emphatically disagrees: they waste resources, and produce results that are not merely useless but actively wrong and harmful.

David Colhoun comments from the floor: while physical sciences consider “significant results” to be five sigmas (p < 0.000001), biomed is satisfied with slightly less than two sigmas (p < 0.05) which really should be interpreted only as “worth another look”.
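For reference, the sigma-to-p conversion behind that comparison is easy to check (two-sided normal tail areas; a rough sketch):

```python
from scipy.stats import norm

for sigmas in (2, 5):
    p = 2 * norm.sf(sigmas)   # chance of a result at least this extreme under the null
    print(f"{sigmas} sigma -> p ~ {p:.1g}")
# 2 sigma -> p ~ 0.05 (the biomed convention)
# 5 sigma -> p ~ 6e-07 (the physics "discovery" threshold)
```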

Dorothy Bishop on publishing data, and authors’ reluctance to do so: “It should be accepted as a cultural norm that mistakes in data do happen, rather than shaming people who make data open.”

Coffee break

Nothing to report :-)

Session 2: what can be done to improve reproducibility?

Iain Hrynaszkiewicz, head of data, Nature

In an analysis of retractions of papers in PubMed Central, 2/3 were due to fraud and 20% due to error.

Access to methods and data is a prerequisite for replicability.

Pre-registration, sharing of data, reporting guidelines all help.

“Open access is important, but it’s only part of the solution. Openness is a means to an end.”

Hrynaszkiewicz says text-miners are a small minority of researchers. [That is true now, but I and others are confident this will change rapidly as the legal and technical barriers are removed: it has to, since automated reading is the only real solution to the problem of keeping up with an exponentially growing literature. — Ed.]

Floor discussion

I’m at the Royal Society today and tomorrow as part of the Future of Scholarly Scientific Communication conference. Here’s the programme.

I’m making some notes for my own benefit, and I thought I might as well do them in the form of a blog-post, which I will continuously update, in case anyone else is interested.

I stupidly didn’t make notes on the first two speakers, but let’s pick up from the third:

Deborah Shorley, ex-librarian of Imperial College London

Started out by saying that she feels her opinion, as a librarian, is irrelevant, because librarians are becoming irrelevant. A pretty incendiary opening!

Important observations:

“Scientific communication in itself doesn’t matter; what matters is that good science be communicated well.”

And regarding the model of giving papers to publishers gratis, then paying them for the privilege of reading them:

“I can’t think of any other area where such a dopey business model pertains.”

(On which, see Scott Aaronson’s brilliant take on this in his review of The Access Principle — the article that first woke me up to the importance of open access.)

Shorley wants to bring publishing skills back in-house, to the universities and their libraries, and do it all themselves. As far as I can make out, she simply sees no need for specialist publishers. (Note: I do not necessarily endorse all these views.)

“If we don’t seize the opportunity, market forces will prevail. And market forces in this case are not pretty.”

Robert Parker, ex-head of publishing, Royal Society of Chemistry

Feels that society publishers allowed themselves to be overtaken by commercial publishers. Notes that when he started working for the RSC’s publishing arm, it was “positively Dickensian”, using technology that would mostly have been familiar to Gutenberg. Failure to engage with authors and with technology allowed the commercial publishers to get ahead — something that is only now being redressed.

He’s talking an awful lot about the impact factors of their various journals.

My overall impression is that his perspective is much less radical than that of Deborah Shorley, wanting learned-society publishers to be better able to compete with the commercial publishers.

Gary Evoniuk, policy director at GlaxoSmithKline

GSK submits 300-400 scientific studies for publication each year.

Although the rise of online-only journals means there is no good reason to not publish any finding, they still find that negative results are harder to get published.

“The paper journal, and the paper article, will soon be dead. This makes me a little bit sad.”

He goes further and wonders whether we need journal articles at all? When actual results are often available long before the article, is the context and interpretation that it provides valuable enough to be worth all the effort that’s expended on it? [My answer: yes — Ed.]

Discussion now follows. I probably won’t attempt to blog it (not least because I will want to participate). Better check out the twitter stream.

Nigel Shadbolt, Open Data Institute

Begins by reflecting on a meeting ten years ago, convened at Southampton by Stevan Harnad, on … the future of scholarly scientific communication.

Still optimistic about the Semantic Web, as I guess we more or less have to be. [At least, about many separate small-sw semantic webs — Ed.] We’re starting to see regular search-engines like Google taking advantage of available machine-readable data to return better results.

Archiving data is important, of course; but it’s also going to be increasingly important to archive algorithms. GitHub is a useful prototype of this.

David Lambert, president/CEO, Internet2

Given how the digital revolution has transformed so many fields (shopping, auctions, newspapers, movies), why has scholarly communication been so slow to follow? [Because the incumbents with a vested interest in keeping things as they are have disproportionate influence due to their monopoly ownership of content and brands — Ed.]

Current publication models are not good at handling data. So we have to build a new model to handle data. In which case, why not build a new model to handle everything?

New “born-digital” researchers are influenced by the models of social networks: that is going to push them towards SN-like approaches of communicating more stuff, more often, in smaller units. This is going to affect how scholarly communication is done.

Along with this goes an increasing level of comfort with collaboration. [I’m not sure I see that — Ed.]

Bonus section: tweets from Stephen Curry

He posted these during the previous talk. Very important:

Ritu Dhand, Nature

[A disappointing and unconvincing apologia for the continuing existence and importance of traditional publishers, and especially Nature. You would think that they, and they alone, guard the gates of academia from the barbarians. *sigh*. — Ed.]

Lunch

Georgina Mace, UCL

[A defence of classical peer-review. Largely an overview of how peer-review is supposed to work.]

“It’s not perfect, it has its challenges, but it’s not broken yet.”

Richard Smith, ex-editor of BMJ

[An attack on classical peer-review.]

“Peer review is faith-, not evidence-based; ineffective; a lottery; slow; expensive; wasteful; ineffective; easily abused; biased; doesn’t detect fraud; irrelevant.

Apart from that, it’s perfect.”

He doesn’t want to reform peer-review, he wants to get rid of it. Publish, let the world decide. That’s the real peer-review.

He cites studies supporting his assertions. Cochrane review concluded there is no evidence that peer-review is effective. The Ioannidis paper shows that most published findings are false.

Someone should be recording this talk. It’s solid gold.

Annual cost of peer-review is $1.9 billion.

[There is much, much more. I can’t get it down quickly enough.]

Georgina Mace’s rebuttal

… amounts to contradicting Richard Smith’s evidence-supported statements, but she provides no evidence in support of her position.

Richard Smith’s counter-counter rebuttal

… cites a bunch more studies. This is solid. Solid.

For those who missed out, see Smith’s equally brutal paper Classical peer review: an empty gun. I find his conclusion (that we should just dump peer-review) emotionally hard to accept, but extremely compelling based on actual, you know, evidence.

Fascinating to hear the level of denial in the room. People really, really want to keep believing in peer-review, in spite of evidence. I understand that impulse, but I think it’s unbecoming in scientists.

The challenge for peer-review advocates is: produce evidence that it has value. No-one has responded to that.

Richard Sever, Cold Spring Harbor Press

Richard presents the bioRxiv preprint server. Turns out it’s pronounced “bio-archive”, not “bye-orx-iv”.

Nothing in this talk will be new to regular SV-POW! readers, but he makes good, compelling points in favour of preprinting (which we of course agree with!)

Elizabeth Marincola, CEO, PLOS

PLOS is taking steps towards improving peer-review:

  • Use of article-level metrics
  • Moves towards open review
  • Move toward papers evolving over time, not being frozen at the point of publication
  • Better recognition of different kinds of contribution to papers
  • Intention to make submitted papers available to view before peer-review has been carried out, subject only to checks on ethical and technical standards: they aim to make papers available in “a matter of days”.

She notes that much of this is not original: elements of these approaches are in F1000 Research, bioRxiv, etc.

Jan Velterop, science publisher with everyone at some point.

“I’m basically with Richard Smith when it comes to abolishing peer review, but I have a feeling it won’t happen in the next few weeks.”

The situation of publishers:

“Academia throws money at you. What do you do? You pick it up.”

Velterop gets a BIG laugh for this:

“Does peer-review benefit science? I think it does; and it also benefits many other journals.”

He quotes a Scholarly Kitchen blog-post[citation needed] as putting the cost of technical preparation at PubMed Central — translating from an MS-Word manuscript to valid JATS XML — at $47. So why do we pay $3000 APCs? Surely the peer-review phase doesn’t cost $2953?

Update: here is that Scholarly Kitchen article.

Velterop’s plan is to streamline the review-and-publish process as follows:

  • Author writes manuscript.
  • She solicits reviews from two experts, using her own knowledge of the field to determine who is suitably skilled.
  • They eventually sign off (perhaps after multiple rounds of revisions).
  • The author submits the manuscript, along with the endorsements.
  • The editor checks with the endorsers that they really have given endorsement.
  • The article is posted.

Bam, done!

And at that point in the proceedings, my battery was running dangerously low. I typed a tweet: “low battery may finally force me to shut up! #RSSC”, but literally between typing it and hitting the Tweet button, my laptop shut down. So that’s it for day 1. I’ll do a separate post for the second and final day.

In a comment on the last post, Mark Robinson asked an important question:

You linked to the preprint of your The neck of Barosaurus was not only longer but also wider than those of Diplodocus and other diplodocines submission – does this mean that it has not yet been formally published?

As so often in these discussions, it depends what we mean by our terms. The Barosaurus paper, like this one on neck cartilage, is “published” in the sense that it’s been released to the public, and has a stable home at a well known location maintained by a reputable journal. It’s open for public comment, and can be cited in other publications. (I notice that it’s been cited in Wikipedia). It’s been made public, which after all is the root meaning of the term “publish”.

On the other hand, it’s not yet “published” in the sense of having been through a pre-publication peer-review process, and perhaps more importantly it’s not yet been made available via other channels such as PubMed Central — so, unlike say our previous PeerJ paper on sauropod neck anatomy, it would in some sense go away if PeerJ folded or were acquired by a hostile entity. But then the practical truth is of course that we’d just make it directly available here on SV-POW!, where any search would find it.

In short, the definition of what it means for a paper to be “published” is rather fluid, and is presently in the process of drifting. More than that, conventions vary hugely between fields. In maths and astronomy, posting a preprint on arXiv (their equivalent of PeerJ Preprints, roughly) pretty much is publication. No-one in those fields would dream of not citing a paper that had been published in that way, and reputations in those fields are made on the basis of arXiv preprints. [Note: I was mistaken about this, or at least oversimplified. See David Roberts’ and Michael Richmond’s comments below.]

Maybe the most practical question to ask about the published-ness or otherwise of a paper is, how does it affect the author’s job prospects? When it comes to evaluation by a job-search panel, or a promotion committee, or a tenure board, what counts? And that is a very hard question to answer, as it depends largely on the institution in question, the individuals on the committee, and the particular academic field. My gut feeling is that if I were looking for a job in palaeo, the Barosaurus preprint and this cartilage paper would both count for very little, if anything. But, candidly, I consider that a bug in evaluation methods, not a problem with pre-printing per se. But then again, it’s very easy for me to say that, as I’m in the privileged position of not needing to look for a job in palaeo.

For Matt and me, at least as things stand right now, we do feel that we have unfinished business with these papers. In their present state, they represent real work and a real (if small) advance in the field; but we don’t feel that our work here is done. That’s why I submitted the cartilage paper for peer-review at the same time as posting it as a preprint (it’s great that PeerJ lets you do both together); and it’s why one of Matt’s jobs in the very near future will be getting the Barosaurus preprint revised in accordance with the very helpful reviews that we received, and then also submitted for peer-review. We do still want that “we went through review” badge on our work (without believing it means more than it really does) and the archiving in PubMed Central and CLOCKSS, and the removal of any reason for anyone to be unsure whether those papers “really count”.

But I don’t know whether in ten years, or even five, our attitude will be the same. After all, it changed long ago in maths and astronomy, where — glory be! — papers are judged primarily on their content rather than on where they end up published.