As recently noted, it was my pleasure and privilege on 25 June to give a talk at the ESOF2014 conference in Copenhagen (the EuroScience Open Forum). My talk was one of four, followed by a panel discussion, in a session on the subject “Should science always be open?“.

 


I had just ten minutes to lay out the background and the problem, so it was perhaps a bit rushed. But you can judge for yourself, because the whole session was recorded on video. The image is not the greatest (it’s hard to make out the slides) and the audio is also not all it could be (the crowd noise is rather loud). But it’s not too bad, and I’ve embedded it below. (I hope the conference organisers will eventually put out a better version, cleaned up by video professionals.)

Subbiah Arunachalam (from Arun, Chennai, India) asked me whether the full text of the talk was available — the echoey audio is difficult for non-native English speakers. It wasn’t, but I’ve since typed out a transcript of what I said (editing only to remove “er”s and “um”s), and that is below. Finally, you may wish to follow the slides rather than the video: if so, they’re available in PowerPoint format and as a PDF.

Enjoy!

It’s very gracious of you all to hold this conference in English; I deeply appreciate it.

“Should science always be open?” is our question, and I’d like to open with one of the greatest scientists there’s ever been, Isaac Newton, who humility didn’t come naturally to. But he did manage to say this brilliant humble thing: “If I have seen further, it’s by standing on the shoulders of giants.”

And the reason I love this quote is not just because it’s insightful in itself, but because he stole it from something John of Salisbury said right back in 1159. “Bernard of Chartres used to say that we were like dwarfs seated on the shoulders of giants. If we see more and further than they, it is not due to our own clear eyes or tall bodies, but because we are raised on high and upborne by their gigantic bigness.”

Well, so Newton — I say he stole this quote, but of course he did more than that: he improved it. The original is long-winded, it goes around the houses. But Newton took that, and from that he made something better and more memorable. So in doing that, he was in fact standing on the shoulders of giants, and seeing further.

And this is consistently where progress comes from. It’s very rare that someone who’s locked in a room on his own thinking about something will have great insights. It’s always about free exchange of ideas. And we see this happening in lots of different fields.

Over the last ten or fifteen years, enormous advances in the kinds of things computers working in networks can do. And that’s come from the culture of openness in APIs and protocols, in Silicon Valley and elsewhere, where these things are designed.

Going back further and in a completely different field, the Impressionist painters of Paris lived in a community where they were constantly — not exactly working together, but certainly nicking each other’s ideas, improving each other’s techniques, feeding back into this developing sense of what could be done. Resulting in this fantastic art.

And looking back yet further, Florence in the Renaissance was a seat of all sorts of advances in the arts and the sciences. And again, because of this culture of many minds working together, and yielding insights and creativity that would not have been possible with any one of them alone.

And this is because of network effects; or Metcalfe’s Law expresses this by saying that the value of a network is proportional to the square of the number of nodes in that network. So in terms of scientific research, what that means is that if you have a corpus of published research output, of papers, then the value of that goes — it doesn’t just increase with the number of papers, it goes up with the square of the number of papers. Because the value isn’t so much in the individual bits of research, but in the connections between them. That’s where great ideas come from. One researcher will read one paper from here and one from here, and see where the connection or the contradiction is; and from that comes the new idea.

So it’s very important to increase the size of the network of what’s available. And that’s why we have a very natural tendency, I think among scientists particularly, but I think we can say researchers in other areas as well, have a natural tendency to share.

Now until recently, the big difficulty we’ve had with sharing has been logistical. It was just difficult to make and distribute copies of pieces of research. So this [picture of a printing press] is how we made copies, this [picture of stacks of paper] was what we stored them on, and this was how we transmitted them from one researcher to another.

And they were not the most efficient means, or at least not as efficient as what we now have available. And because of that, and because of the importance of communication and the links between research, I would argue that maybe the most important invention of the last hundred years is the Internet in general and the World Wide Web in particular. And the purpose of the Web, as it was initially articulated in the first public post that Tim Berners-Lee made in 1991 — he explained not just what the Web was but what it was for, and he said: “The project started with the philosophy that much academic information should be freely available to anyone. It aims to allow information sharing within internationally dispersed teams, and the dissemination of information by support groups.”

So that’s what the Web is for; and here’s why it’s important. I’m quoting here from Cameron Neylon, who’s great at this kind of thing. And again it comes down to connections, and I’m just going to read out loud from his blog: “Like all developments of new communication networks, SMS, fixed telephones, the telegraph, the railways, and writing itself, the internet doesn’t just change how well we can do things, it qualitatively changes what we can do.” And then later on in the same post: “At network scale the system ensures that resources get used in unexpected ways. At scale you can have serendipity by design, not by blind luck.”

Now that’s a paradox; it’s almost a contradiction, isn’t it? Serendipity by definition is what you get by blind luck. But the point is, when you have enough connections — enough papers floating around the same open ecosystem — all the collisions happening between them, it’s inevitable that you’re going to get interesting things coming out. And that’s what we’re aiming towards.

And of course it’s never been more important, with health crises, new diseases, the diminishing effectiveness of antibiotics, the difficulties of feeding a world of many billions of people, and the results of climate change. It’s not as though we’re short of significant problems to deal with.

So I love this Jon Foley quote. He said, “Your job” — as a researcher — “Your job is not to get tenure! Your job is to change the world”. Tenure is a means to an end, it’s not what you’re there for.

So this is the importance of publishing. Of course the word “publish” comes from the same root as the word “public”: to publish a piece of research means to make that piece of research public. And the purpose of publishing is to open research up to the world, and so open up the world itself.

And that’s why it’s so tragic when we run into this [picture of a paywalled paper]. I think we’ve all seen this at various times. You go to read a piece of research that’s valuable, that’s relevant to either the research you’re doing, or the job you’re doing in your company, or whatever it might be. And you run into this paywall. Thirty five dollars and 95 cents to read this paper. It’s a disaster. Because what’s happened is we’ve got a whole industry whose existence is to make things public, and who because of accidents of history have found themselves doing the exact opposite. Now no-one goes into publishing with the intent of doing this. But this is the unfortunate outcome.

So what we end up with is a situation where we’re re-imposing on the research community barriers that were necessarily imposed by the inadequate technology of 20 or 30 years ago, but which we’ve now transcended in technological terms but we’re still struggling with for, frankly, commercial reasons. This is why we’re struggling with this.

And I don’t like to be critical, but I think we have to just face the fact that there is a real problem when organisations, for many years have been making extremely high profits — these [36%, 32%, 34%, 42%] are the profit margins of the “big four” academic publishers which together hugely dominate the scholarly publishing market — and as you can see they’re in the range 32% to 42% of revenue, is sheer profit. So every time your university library spends a dollar on subscriptions, 40% of that goes straight out of the system to nowhere.

And it’s not surprising that these companies are hanging on desperately to the business model that allows them to do that.

Now the problem we have in advocating for open access is that when we stand against publishers who have an existing very profitable business model, they can complain to governments and say, “Look, we have a market that’s economically significant, it’s worth somewhere in the region of 10-15 billion US dollars a year.” And they will say to governments, “You shouldn’t do anything that might damage this.” And that sounds effective. And we struggle to argue against that because we’re talking about an opportunity cost, which is so much harder to measure.

You know, I can stand here — as I have done — and wave my hands around, and talk about innovation and opportunity, and networks and connections, but it’s very hard to quantify in a way that can be persuasive to people in a numeric way. Say, they have a 15 billion dollar business, we’re talking about saving three trillion’s worth of economic value (and I pulled that number out of thin air). So I would love, if we can, when we get to the discussions, to brainstorm some way to quantify the opportunity cost of not being open. But this is what it looks like [picture of flooding due to climate change]. Economically I don’t know what it’s worth. But in terms of the world we live in, it’s just essential.

So we’ve got to remember the mission that we’re on. We’re not just trying to save costs by going to open access publishing. We’re trying to transform what research is, and what it’s for.

So should science always be open? Of course, the name of the session should have been “Of course science should always be open”.

 

Today, available for the first time, you can read my 2004 paper A survey of dinosaur diversity by clade, age, place of discovery and year of description. It’s freely available (CC By 4.0) as a PeerJ Preprint. It’s one of those papers that does exactly what it says on the tin — you should be able to find some interesting patterns in the diversity of your own favourite dinosaur group.


Taylor (2014 for 2004), Figure 1. Breakdown of dinosaur diversity by phylogeny. The number of genera included in each clade is indicated in parentheses. Non-terminal clades additionally have, in square brackets, the number of included genera that are not also included in one of the figured subclades. For example, there are 63 theropods that are neither carnosaurs nor coelurosaurs. The thickness of the lines is proportional to the number of genera in the clades they represent.

“But Mike”, you say, “you wrote this thing ten years ago?”

Yes. It’s actually the first scientific paper I ever wrote (bar some scraps of computer science) beginning in 2003. It’s so old that all the illustrations are grey-scale. I submitted it to Acta Palaeontologica Polonica way back on 24 October 2004 (three double-spaced hard-copies in the post!), but it was rejected without review. I was subsequently able to publish a greatly truncated version (Taylor 2006) in the proceedings of the 2006 Symposium on Mesozoic Terrestrial Ecosystems, but that was only one tenth the length of the full manuscript — much potentially valuable information was lost.

My finally posting this comes (as so many things seem to) from a conversation with Matt. Off work sick, he’d been amusing himself by re-reading old SV-POW! posts (yes, we do this). He was struck by my exhortation in Tutorial 14: “do not ever give a conference talk without immediately transcribing your slides into a manuscript”. He bemoaned how bad he’s been at following that advice, and I had to admit I’ve done no better, listing a sequence of my old SVPCA talks that have still never been published as papers.

The oldest of these was my 2004 presentation on dinosaur diversity. Commenting on this, I wrote in email: “OK, I got the MTE four-pager out of this, but the talk was distilled from a 40ish-page manuscript that was never published and never will be.” Quick as a flash, Matt replied:

If I had written this and sent it to you, you’d tell me to put it online and blog about how I went from idea to long paper to talk to short paper, to illuminate the process of science.

And of course he was right — hence this preprint.


Taylor (2014 for 2004), Figure 2. Breakdown of dinosaurian diversity by high-level taxa. “Other sauropodomorphs” are the “prosauropods” sensu lato. “Other theropods” include coelophysoids, neoceratosaurs, torvosaurs (= megalosaurs) and spinosaurs. “Other ornithischians” are basal forms, including heterodontosaurs and those that fall into Marginocephalia or Thyreophora but not into a figured subclade.

I will never update this manuscript, as it’s based on a now wildly outdated database and I have too much else happening. (For one thing, I really ought to get around to finishing up the paper based on my 2005 SVPCA talk!) So in a sense it’s odd to call it a “pre-print” — it’s not pre anything.

Despite the data being well out of date, this manuscript still contains much that is (I think) of interest, and my sense is that the ratios of taxon counts, if not the absolute numbers, are still pretty accurate.

I don’t expect ever to submit a version of this to a journal, so this can be considered the final and definitive version.


 

I think it’s fair to say that this “bifurcation heat-map”, from Wedel and Taylor (2013a: figure 9), has been one of the best-received illustrations that we’ve prepared:

Wedel and Taylor 2013 bifurcation Figure 9 - bifurcatogram

(See comments from Jaime and from Mark Robinson.)

Back when the paper came out, Matt rashly said “Stand by for a post by Mike explaining how it came to be” — a post which has not materialised. Until now!

This illustration was (apart from some minor tweaking) produced by a program that I wrote for that purpose, snappily named “vcd2svg“. That name is because it converts a vertebral column description (VCD) into a scalable vector graphics (SVG) file, which you can look at with a web-browser or load into an image editor for further processing.

The vertebral column description is in a format designed for this purpose, and I think it’s fairly intuitive. Here, for example, is the fragment describing the first three lines of the figure above:

Taxon: Apatosaurus louisae
Specimen: CM 3018
Data: -----YVVVVVVVVV|VVVuuunnn-

Taxon: Apatosaurus parvus
Specimen: UWGM 155556/CM 563
Data: --nnn-VVV---V-V|VVVu------

Taxon: Apatosaurus ajax
Specimen: NMST-PV 20375
Data: --n--VVVVVVVVVV|VVVVYunnnn

Basically, you draw little ASCII pictures of the vertebral column. Other directives in the file explain how to draw the various glyphs represented by (in this case) “Y”, “V”, “u”, and “n”.
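To give a feel for how simple the stanza layout is to work with, here is a minimal sketch in Python of reading a fragment like the one above. It is not the vcd2svg code (which is Perl) and it knows nothing about the glyph-drawing directives. The function names are made up for illustration, and it assumes that the runs of dashes in the Data lines are plain hyphens, one character per position.

```python
from collections import Counter

# Two stanzas in the layout shown above. The plain hyphens are an assumption
# about how the empty positions are written in the real file.
SAMPLE = """\
Taxon: Apatosaurus louisae
Specimen: CM 3018
Data: -----YVVVVVVVVV|VVVuuunnn-

Taxon: Apatosaurus ajax
Specimen: NMST-PV 20375
Data: --n--VVVVVVVVVV|VVVVYunnnn
"""


def parse_stanzas(text):
    """Split text into blank-line-separated stanzas of 'Key: value' lines.

    Hypothetical helper for illustration, not part of vcd2svg.
    """
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the stanza collected so far.
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def glyph_counts(record):
    """Count how often each character occurs in a stanza's Data line."""
    return Counter(record["Data"])


records = parse_stanzas(SAMPLE)
for record in records:
    print(record["Taxon"], record["Specimen"], dict(glyph_counts(record)))
```

The real program goes on to turn each character into a drawn glyph; the sketch stops at the point where the ASCII picture has become data you can loop over.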

It’s pretty flexible. We used the same program to generate the right-hand side (though not the phylogenetic tree) of Wedel and Taylor (2013b: figure 2):

Wedel and Taylor (2013b: Figure 2).


The reason I mention this is because I released the software today under the GNU General Public Licence v3.0, which is kind of like CC By-SA. It’s free for anyone to download, use, modify and redistribute either verbatim or in modified form, subject only to attribution and the requirement that the same licence be used for modified versions.

vcd2svg is written in Perl, and implemented in part by the SVG::VCD module, which is included in the package. It’s available as a CPAN module and on GitHub. There’s documentation of the command-line vcd2svg program, and of the VCD file format. Also included in the distribution are two documented examples: the bifurcation heat-map and the caudal pneumaticity diagram.

Folks, please use it! And feel free to contribute, too: as the change-log notes, there’s work still to be done, and I’ll be happy to take pull requests from those of you who are programmers. And whether you’re a programmer or not, if you find a bug, or want a new feature, feel free to file an issue.

A final thought: in academia, you don’t really get credit for writing software. So to convert the work that went into this release into some kind of coin, I’ll probably have to write a short paper describing it, and let that stand as a proxy for the actual program. Hopefully people will cite that paper when they generate a figure using the software, the way we all reflexively cite Swofford every time we use PAUP*.

Update (12 April 2014)

On Vertebrat’s suggestion, I have renamed the program VertFigure.


I just read this on Zen Faulkes’ NeuroDojo blog:

How should scientists, and reporters, discuss work that has failed to replicate? The original Barr and colleagues article remains in the scientific literature; failed replication alone is not grounds for retraction.

He’s right, of course: we certainly don’t want to retract every paper whose conclusions can’t be replicated, for all sorts of reasons: they may subsequently be replicated after all; the paper may contain other useful information even if the experiment in question was flawed; the replication studies themselves probably rely on the original’s Methods section; authors should not be punished for unfortunate outcomes unless they were fraudulently obtained.

What we want is for that Barr et al paper, whenever anyone looks at it, to be displayed with a prominent header that says “The following studies attempted to replicate this finding but failed:”, and a list of references/links. And, for that matter, another header saying that the following other studies did replicate it.

For web-sites to automatically produce that kind of annotation, they need articles that cite the original to include an additional piece of metadata, along with the author/year/title/journal/etc. metadata that identifies the cited paper. That additional ingredient is the citation’s type, which should be one of a small set of defined values.

What values are relevant? I won’t try to come up with an exhaustive list at this point, but obvious ones include:

  • Replicates — the current paper replicates work done in the cited paper (and so provides evidence, though not proof, that the cited paper’s conclusion is correct).
  • FailsToReplicate — the current paper attempts to replicate work done in the cited paper, but fails (and so provides evidence that the cited paper is mistaken).
  • Falsifies — the current paper shows definitively that the cited paper is wrong. This is a stronger statement than FailsToReplicate, and would be used for example when the new work shows conclusively that the experimental protocol of the original was critically flawed.
  • DependsOn — the current paper depends on information from the cited paper, such as the phylogeny that it proposes or the vertebral formula that it gives. For these purposes, the cited paper is treated as an authoritative source.
  • Acknowledges — the current paper uses ideas proposed in the cited paper, and gives credit to the original.

(We discussed the distinction between those last two previously.)
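To make the idea concrete, here is a rough sketch in Python of how a web-site might use typed citations to build those headers. Everything in it is an assumption made for illustration: the class and function names, the paper identifiers, and the wording of the "did replicate" header are all invented, and no existing metadata standard is implied.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class CitationType(Enum):
    """The citation types listed above (not an exhaustive taxonomy)."""
    REPLICATES = "Replicates"
    FAILS_TO_REPLICATE = "FailsToReplicate"
    FALSIFIES = "Falsifies"
    DEPENDS_ON = "DependsOn"
    ACKNOWLEDGES = "Acknowledges"


@dataclass(frozen=True)
class Citation:
    """One typed citation: `citing` cites `cited` in the given way."""
    citing: str
    cited: str
    type: CitationType


# Header text shown above a paper for the types that warrant a notice.
# The second wording is a guess; the first comes from the text above.
HEADERS = {
    CitationType.FAILS_TO_REPLICATE:
        "The following studies attempted to replicate this finding but failed:",
    CitationType.REPLICATES:
        "The following studies replicated this finding:",
}


def annotations_for(paper, citations):
    """Return {header text: sorted list of citing papers} for one paper."""
    grouped = defaultdict(list)
    for citation in citations:
        if citation.cited == paper and citation.type in HEADERS:
            grouped[citation.type].append(citation.citing)
    return {HEADERS[t]: sorted(citing) for t, citing in grouped.items()}


# Hypothetical identifiers, standing in for real bibliographic records.
citations = [
    Citation("study-A", "original", CitationType.FAILS_TO_REPLICATE),
    Citation("study-B", "original", CitationType.REPLICATES),
    Citation("study-C", "original", CitationType.DEPENDS_ON),
    Citation("study-D", "other-paper", CitationType.FAILS_TO_REPLICATE),
]

for header, studies in annotations_for("original", citations).items():
    print(header, ", ".join(studies))
```

The point of the sketch is only that the type has to travel with the citation itself: once it does, producing the headers is a trivial grouping exercise.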

There are all sorts of practical issues that will impede the adoption of this idea (not least the idiot fact that the citation graph is a trade secret rather than a freely available database), but let’s ignore those for now, and figure out what taxonomy of citation-types we want.

Stop what you’re doing and go read Cameron Neylon’s blog. Specifically, read his new post, Improving on “Access to Research”.

Regular readers of SV-POW! might legitimately complain that my so-called advocacy consists mostly of whining about how rubbish things are. If you find that wearying (and I won’t blame you if you do), then read Cameron instead: he goes beyond critiquing what is, and sees what could be. Here is a key quote from this new post:

I did this on a rainy Saturday afternoon because I could, because it helped me learn a few things, and because it was fun. I’m one of tens or hundreds of thousands who could have done this, who might apply those skills to cleaning up the geocoding of species in research articles, or extracting chemical names, or phylogenetic trees, or finding new ways to understand the networks of influence in the research literature. I’m not going to ask for permission, I’m not going to go out of my way to get access, and I’m not going to build something I’m not allowed to share. A few dedicated individuals will tackle the permissions issues and the politics. The rest will just move on to the next interesting, and more accessible, puzzle.

Right! Open access is not about reducing subscription costs to libraries, or about slicing away the absurd profits of the legacy publishers, or about a change to business models. It’s about doing new and exciting things that simply weren’t possible before.

 

From the files of J. K. Rowling.

Dear Ms. Rowling,

Thank you for submitting your manuscript Harry Potter and the Half-Blood Prince. We will be happy to consider it for publication. However we have some concerns about the excessive length of this manuscript. We usually handle works of 5-20 pages, sometimes as much as 30 pages. Your 1337-page manuscript exceeds these limits, and requires some trimming.

We suggest that this rather wide-ranging work could usefully be split into a number of smaller, more tightly focussed, papers. In particular, we feel that the “magic” theme is not appropriate for our venue, and should be excised from the current submission.

Assuming you are happy to make these changes, we will be pleased to work with you on this project.

Correspondence ends.

Esteemed Joenne Kay Rowling,

We are delightful to recieve your manuscript Harry Potter and the Half-Blood Prince and we look forword to publish it in our highly prestigious International Journal of Story Peer Reviewed which in 2013 is awarded an impact factor of 0.024.

Before we can progression this mutually benefit work, we require you to send a cheque for $5,000 US Dollars to the above address.

Correspondence ends.

Dear J.R.R. Rowling,

We are in receipt of your manuscript Harry Potter and the Half-Blood Prince. Unfortunately, after a discussion with the editorial board, we concluded that it is insufficiently novel to warrant publication in our journal, which is one of the leading venues in its field. Although your work is well executed, it does not represent a significant advance in scholarship.

That is not to say that minor studies such as yours are of no value, however! Have you considered one of the smaller society journals?

Correspondence ends.

Dear Dr. Rowling

Your submission Harry Potter and the Half-Blood Prince has passed initial editorial checks and will now be sent to two peer-reviewers. We will contact you when we have their reports and are able to make a decision.

Dear Dr. Rowling

Re: Harry Potter and the Half-Blood Prince.

We agree that eighteen months is too long for a manuscript to spend in review. On making inquiries, we find that we are unfortunately no longer able to contact the editor who was handling your submission.

We have appointed a new handling editor, who will send your submission to two new reviewers. We will contact you as soon as the new editor has made a decision.

Dear Dr. Rowling

Re: Harry Potter and the Half-Blood Prince.

Your complaint is quite justified. We will chase the reviewers.

Dear Dr. Rowling

I am pleased to say that the reviewers have returned their reports on your submission Harry Potter and the Half-Blood Prince and we are able to make an editorial decision, which is ACCEPT WITH MAJOR REVISION.

Reviewer 1 felt that the core point of your contribution could be made much more succinctly, and recommended that you remove the characters of Ron, Hermione, Draco, Hagrid and Snape. I concur with his assessment that the final version will be tighter and stronger for these cuts, and am confident that you can make them in a way that does not compromise the plot.

Reviewer 2 was positive overall, but did not like being surprised by the ending, and felt that it should have been outlined in the abstract. She also felt that citation of earlier works including Lewis (1950, 1951, 1952, 1953, 1954, 1955, 1956) and Pullman (1995, 1997, 2000) would be appropriate, and noted an over-use of constructions such as “… said Hermione, warningly”.

Dear Dr. Rowling

Thank you for your revised manuscript of Harry Potter and the Half-Blood Prince, which it is our pleasure to accept. We now ask you to sign the attached copyright transfer form, so we can proceed with publication.

Dear Dr. Rowling

I am sorry that you are unhappy about this, but transfer of copyright is our standard procedure, and we must insist on it as a prerequisite for publication. None of our other authors have complained.

Dear Dr. Rowling

Thank you for the signed copyright transfer form.

In answer to your query, no, we do not pay royalties.

Dear Dr. Rowling

Sadly, no, we are unable to make an exception in the matter of royalties.

Dear Dr. Rowling

Your book has now been formatted. We attach a proof PDF. Please read this very carefully as this is the last chance to spot errors.

You will readily appreciate that publishing is an expensive business. In order to remain competitive we have had to reduce costs, and as a result we are no longer able to offer proof-reading or copy-editing. Therefore you are responsible for ensuring the copy is clean.

At this stage, changes should be kept as small as possible, otherwise a charge may be incurred for re-typesetting.

Dear Dr. Rowling

Many thanks for returning the corrected proofs of Harry Potter and the Half-Blood Prince. We will proceed with publication.

Now that the final length of your contribution is known, we are able to assess page charges. At 607 pages, this work exceeds our standard twenty free pages by 587. At $140 US per page, this comes to $82,180. We would be grateful if you would forward us a cheque for this amount at your convenience.

Dear Dr. Rowling

Thank you for your prompt payment of the page charges. We agree that these are regrettable, but sadly they are part of the reality of the publishing business.

We are delighted to inform you that Harry Potter and the Half-Blood Prince is now published online, and has been assigned the DOI 10.123.45678.

We thank you for working on this fine contribution with us, and hope you will consider us for your future publications.

Dear Dr. Rowling

You are correct, your book is not freely downloadable. As we explained earlier in this correspondence, publishing is an expensive business. We recover our substantial costs by means of subscriptions and paid downloads.

In our experience, those with the most need to read your book will probably have institutional access. As for those who do not: if your readers are as keen as you say, they will no doubt find the customary download fee of $37.95 more than reasonable. Alternatively, readers can rent online access at the convenient price of $9.95 per 24 hours.

Dear Dr. Rowling

I am sorry that you feel the need to take that tone. I must reiterate, as already stated, that the revenues from download charges are not sufficient for us to be able to pay royalties. The $37.95 goes to cover our own costs.

If you wish for your book to be available as “open access”, then you may take advantage of our Freedom Through Slavery option. This will attract a further charge of $3,000, which can be paid by cheque as previously.

Dr. Rowling

Your attitude is really quite difficult to understand. All of this was quite clearly set out on our web-site, and should have been understood by you before you made your submission.

As stated in the copyright transfer form that you signed, you do not retain the right to post freely downloadable copies of your work, since you are no longer the copyright holder.

Dr. Rowling

We must ask you not to contact your handling editor directly. He was quite shaken by your latest outburst. If you feel you must write to us again, we must ask you to moderate your language, which is quite unsuitable for a lady. Meanwhile, we remind you that our publishing agreement follows industry best practice. It’s too late to complain about it now.

Correspondence ends.

Dear Pyramid Web-Hosting,

Copyright claim

We write on behalf of our client, Ancient Monolith Scholarly Publishing, who we assert are the copyright holders of Harry Potter and the Half-Blood Prince. It has come to our attention that a copy of this copyrighted work has been posted on a site hosted by you at the URL below.

This letter is official notification under the provisions of Section 512(c) of the Digital Millennium Copyright Act (“DMCA”) to effect removal of the above-reported infringement. We request that you immediately issue a cancellation message as specified in RFC 1036 for the specified posting and prevent the infringer, Ms. J. K. Rowling, from posting the infringing material to your servers in the future. Please be advised that law requires you, as a service provider, to “expeditiously remove or disable access to” the infringing material upon receiving this notice. Noncompliance may result in a loss of immunity for liability under the DMCA.

Please send us at the address above a prompt response indicating the actions you have taken to resolve this matter.

Correspondence ends.

Historical Note

Examination of Ms. Rowling’s personal effects established that she had written most of a seventh book, Harry Potter and the Deathly Hallows. However, Rowling never sought to publish this final book in the series.

Regular readers will remember Jennifer Raff’s guest post on the PeerJ blog, How To Become Good At Peer-Review; and my response to it, Three points of disagreement. Today I read a very different take on this piece by Chorasimilarity, who is a frequent commenter here at SV-POW!: Two pieces of all too obvious propaganda.

Chorasimilarity starts by taking the original piece to task — fairly, I think — for its opening statement that “peer review is at the heart of the scientific method”. It’s true that the scientific method is something rather different. But as I argued in Science is enforced humility, peer-review is part of the scaffolding that prevents individual scientists from running away with their own ideas, unchecked by consensus wisdom.

Chorasimilarity then goes on to make a stronger criticism of peer-review:

Peer review is an idea based on authority, not on science [...] the quote mentions that “one’s research must survive the scrutiny of experts before it is presented to the larger scientific community as worthy of serious consideration”, which would be just sad, dinosaurish speaking, if it would come from an old person who did not understood that today there is, or there should be, free access to information.

As we’ve discussed here before, having been through peer review certainly does not mean we can trust a published paper. People do sometimes talk as though this is the case, and it’s an absolute fallacy that we should be quick to rebut whenever we encounter it.

But it does have a weaker, yet still non-negligible, value.

The real value of peer-review is not as a mark of correctness, but of seriousness. Back in the original SV-POW! series on peer-review (Where peer-review went wrong, Some more of peer-review’s greatest mistakes, What is this peer-review process anyway?, Well, that about wraps it up for peer-review), I likened peer-review to hazing:

The best analogy for our current system of pre-publication peer-review is that it’s a hazing ritual. It doesn’t exist because of any intrinsic value it has, and it certainly isn’t there for the benefit of the recipient. It’s basically a way to draw a line between In and Out. Something for the inductee to endure as a way of proving he’s made of the Right Stuff.

So: the principal value of peer-review is that it provides an opportunity for authors to demonstrate that they are prepared to undergo peer-review.

When I first wrote that, I wrote it in a spirit of cynicism and in frustration that so much of the effort that goes into the process is thrown away and that the results are so arbitrary. Those are real and serious complaints, but I’ve since come around to the idea that peer-review is useful in that the hazing aspect enables it to clear a much lower bar. Being prepared to undergo peer-review really is a mark of seriousness.

I would imagine that everyone involved in dinosaur research occasionally gets unsolicited emails from cranks and from as-yet unpublished amateurs. One of the most reliable ways to distinguish the two groups is this: serious amateurs are trying to figure out how to get their work into peer-review, while cranks are either actively avoiding it or not even aware of it. That’s why the web is full of sites like Dinosaur Home, with all its fine pictures of pebbles, which can continue on their merry way free of scrutiny.

I do think that the benefits of traditional peer-review are usually greatly overstated and the costs (both direct and indirect) underestimated. But I’m coming down on the side that its barrier-to-cranks effect might just tip the balance in favour of retaining it.

Think your work has scientific value? Good. Prove it, by showing it to professionals. If you won’t do that, then the rest of us don’t need to expend mental energy on taking you seriously.

 

Jennifer Raff wrote a useful guest post on the PeerJ Blog: How To Become Good At Peer-Review. Most of its advice is excellent, and I’d heartily recommend it to anyone starting out on reviewing. But there are three points where I disagree with it. Here are the three things Jennifer said, and my counter-points.

1. Communicating with authors

“Don’t communicate with the authors about their manuscript. All thoughts and comments on it should only go to the editor.”

This may be different in different academic fields, but I’ve been contacted by reviewers of my material, and contacted the authors of papers I’m reviewing, too. Palaeo may be less formal in this respect than fields such as medical research. It’s often useful, for example, to get the authors to send higher resolution versions of the specimen photographs than the downscaled ones the journal passes on; or to get the manuscript in a read-write format that lets you more easily add notes and corrections. Most importantly, I’ve sometimes had to send my marked-up copy of the manuscript directly to the corresponding author because the journal’s automated system has no way to attach it to the formal response.

Perhaps the idea that you shouldn’t communicate with authors comes from confidentiality concerns. But I know who the authors are. (There are no palaeo journals that do double-blind reviewing, and it would be impossible anyway in a field small enough that you pretty much know who everyone is and what they work on.) And since I never review anonymously, I don’t mind them knowing who I am while I am still doing the review.

In the end, one of the main goals of peer-review — I would say the main goal — is to help the authors make their work the best it can be. Often contacting them directly is the more effective way to do that.

2. Novelty

“Ask yourself whether the questions the authors are addressing are really advancing the field in a meaningful way. This does not mean that an article has to be completely novel, but it does mean that the work contributes to the sum of knowledge in the field and does not, for example, simply repeat well known results.”

I only agree with this for certain values of “well known”. In experimental sciences, replication is hugely important, and it’s one of the worst consequences of the prestige-obsessed journal system that it’s so hard to get a replication published. You could almost say that an experimental result that’s only been published once is worthless.

Equally important, or maybe even more important than replication, is the failed replication. When Doyen et al. (2012) tried and failed to replicate the findings of Bargh et al. (1996) on psychological priming, it was an important check on the influence of an article that has been cited more than 2,500 times. Bargh himself was not happy about it, but to quote a much-loved SV-POW! maxim due to Tom Holtz, “Sorry if that makes some people feel bad, but I’m not in the ‘make people feel good’ business; I’m a scientist.”

So a reviewer should only complain about lack of novelty if the experiment has already been replicated several times. (There’s no value in a research paper showing that large and small cannonballs fall at the same speed from the top of the leaning tower of Pisa.)

3. Changing the subject

“Can you think of a better way to address the research questions than what the authors did?” … “You have every right to ask the authors to do a different experiment.”

Ugh. I just hate this. There is literally nothing I detest more in a review than “You should have written this different paper instead”. Please reviewers, review what’s in front of you, not what you would have done instead.

If you think of another approach that you think is promising, by all means suggest it as a followup project. But please in the name of all that we hold dear, don’t let it be a roadblock that delays this work from being published.

So, I’ve been thinking a lot about this interesting situation with Elsevier, which David Tempest’s remarks at the Oxford Evolution or Revolution debate highlighted: they can’t afford (literally or figuratively) to tell us how much they charge different institutions for the same stuff.

And I had this thought, which Mike tweeted:

When simply telling the truth can blow up your business model, you need a new business model.

Mash that up with “information wants to be free” and “if all else fails, someone will show up to liberate it”, and you get this:

When a single person of good conscience can blow up your business model simply by telling the truth, you need a new business model.

If we’ve learned anything in the past few years, it is that humans are the weak link in any campaign of secrecy.

We know that all of the big barrier-based publishers have these bundling deals with libraries, and that no-one on either side is allowed to say what the terms of those deals are. But there must be a lot of people with access to that information. And at least some of them must know how much libraries are getting screwed, precisely because they have access to that information. Seems unlikely that information will stay secret forever.

So, should we be expecting a Snowden-type leak from one or another barrier-based publisher? It doesn’t have to be Elsevier, but I think if it happens they’re the most likely target, because they are so single-minded about cultivating the ill-will of the people they allegedly serve (most recently with this and this). Sometimes I wonder if the other barrier-based publishers are getting too much of a free pass precisely because Elsevier is so good at tossing grenades and then jumping on them.

Corollary: barrier-based publishers, what are you doing to prepare for such a leak? “More secrecy” and “harsher penalties” will probably not work out well in the long run. But do feel free to keep scoring own goals if you must.

It’s now widely understood among researchers that the impact factor (IF) is a statistically illiterate measure of the quality of a paper. Unfortunately, it’s not yet universally understood among administrators, who in many places continue to judge authors on the impact factors of the journals they publish in. They presumably do this on the assumption that impact factor is a proxy for, or predictor of, citation count, which in turn is assumed to correlate with influence.

As shown by Lozano et al. (2012), the correlation between IF and citations is in fact very weak (r² is about 0.2) and has been progressively weakening since the dawn of the Internet era and the consequent decoupling of papers from the physical journal that they appear in. This is a counter-intuitive finding: given that the impact factor is calculated from citation counts you’d expect it to correlate much more strongly. But the enormous skew of citation rates towards a few big winners renders the average used by the IF meaningless.
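To see why skew matters, here is a minimal sketch using invented numbers. The citation counts are hypothetical, made up purely for illustration and not taken from any real journal. It compares the mean (the kind of average the IF relies on) with the median.

```python
from statistics import mean, median

# Hypothetical citation counts for ten papers in one imaginary journal.
# These numbers are invented for illustration only.
citations = [0, 0, 1, 1, 2, 2, 3, 4, 5, 182]

avg = mean(citations)    # the IF-style average, dragged up by one big winner
mid = median(citations)  # what a typical paper in the set receives

# Count how many papers fall below the journal-wide average.
below_avg = sum(1 for c in citations if c < avg)

print(f"mean = {avg}, median = {mid}, papers below the mean = {below_avg} of {len(citations)}")
```

With a distribution shaped like this, nearly every paper sits below the average, so the average says very little about any individual paper.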

To bring this home, I plotted my own personal impact-factor/citation-count graph. I used Google Scholar’s citation counts of my articles, which recognises 17 of my papers; then I looked up the impact factors of the venues they appeared in, plotted citation count against impact factor, and calculated a best-fit line through my data-points. Here’s the result (taken from a slide in my Berlin 11 satellite conference talk):

[Figure: my citation counts plotted against the impact factor of each venue, with best-fit line]

I was delighted to see that the regression slope is actually negative: in my case at least, the higher the impact factor of the venue I publish in, the fewer citations I get.

There are a few things worth unpacking on that graph.

First, note the proud cluster on the left margin: publications in venues with impact factor zero (i.e. no impact factor at all). These include papers in new journals like PeerJ, in perfectly respectable established journals like PaleoBios, edited-volume chapters, papers in conference proceedings, and an arXiv preprint.

My most-cited paper, by some distance, is Head and neck posture in sauropod dinosaurs inferred from extant animals (Taylor et al. 2009, a collaboration between all three SV-POW!sketeers). That appeared in Acta Palaeontologica Polonica, a very well-respected journal in the palaeontology community, but one with a modest impact factor of 1.58.

My next most-cited paper, the Brachiosaurus revision (Taylor 2009), is in the Journal of Vertebrate Paleontology, unquestionably the flagship journal of our discipline, despite its also unspectacular impact factor of 2.21. (For what it’s worth, I seem to recall it was about half that when my paper came out.)

In fact, none of my publications have appeared in venues with an impact factor greater than 2.21, with one trifling exception. That is what Andy Farke, Matt and I ironically refer to as our Nature monograph (Farke et al. 2009). It’s a 250-word letter to the editor on the subject of the Open Dinosaur Project. (It’s a subject that we now find profoundly embarrassing given how dreadfully slowly the project has progressed.)

Google Scholar says that our Nature note has been cited just once. But the truth is even better: that one citation is in fact from an in-prep manuscript that Google has dug up prematurely — one that we ourselves put on Google Docs, as part of the slooow progress of the Open Dinosaur Project. Remove that, and our Nature note has been cited exactly zero times. I am very proud of that record, and will try to preserve it by persuading Andy and Matt to remove the citation from the in-prep paper before we submit. (And please, folks: don’t spoil my record by citing it in your own work!)

What does all this mean? Admittedly, not much. It’s anecdote rather than data, and I’m posting it more because it amuses me than because it’s particularly persuasive. In fact if you remove the anomalous data point that is our Nature monograph, the slope becomes positive — although it’s basically meaningless, given that all my publications cluster in the 0–2.21 range. But then that’s the point: pretty much any data based on impact factors is meaningless.
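The sign flip is easy to reproduce with a toy dataset. The sketch below uses made-up (impact factor, citation count) pairs, not my real numbers, and a hand-rolled least-squares slope. The helper name ols_slope and the outlier's impact factor of 40 are assumptions chosen for illustration.

```python
def ols_slope(points):
    """Slope of the ordinary least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Hypothetical (impact factor, citations) pairs, invented for illustration.
# The last point stands in for a high-IF venue with zero citations.
papers = [(0.0, 5), (0.0, 12), (0.5, 8), (1.0, 15), (1.58, 40), (2.21, 30), (40.0, 0)]

slope_all = ols_slope(papers)           # outlier included
slope_trimmed = ols_slope(papers[:-1])  # outlier removed

print("slope with outlier:   ", "negative" if slope_all < 0 else "positive")
print("slope without outlier:", "negative" if slope_trimmed < 0 else "positive")
```

A single point far out along the x-axis has enormous leverage on a least-squares fit, which is exactly why a line through a handful of papers tells you so little.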
