Fumbling towards transparency: the Royal Society’s “reject & resubmit” and submitted/published dates
July 31, 2014
Regulars will remember that nearly two years ago, I reviewed a paper for the Royal Society’s journal Biology Letters, recommended acceptance with only trivial changes (as did both other reviewers) and was astonished to see that it was rejected outright. There was an invitation to resubmit, with wording that made it clear that the resubmission would be treated as a brand new manuscript; but when the “resubmission” was made, it was accepted almost immediately without being sent to reviewers at all — proving that it was in fact a minor revision.
What’s worse, the published version gives the dates “Received August 21, 2012. Accepted September 13, 2012”, for a submission-to-acceptance time of just 23 days. But my review was done before August 21. This is a clear falsifying of the true time taken to process the manuscript, a misrepresentation unworthy of the Royal Society, and which provoked Matt and me to declare that we would no longer provide peer-review for the Society until they fix this.
By the way, we should be clear that the Royal Society is not the only publisher that does this. For example, one commenter had had the same experience with Molecular Ecology. Misreporting the submission/revision cycle like this works to publishers’ benefit in two ways: it makes them look faster than they really are, and makes the rejection rate look higher (which a lot of people still use as a proxy for prestige).
To the Society’s credit, they were quick to get in touch, and I had what at the time seemed like a fruitful conversation with Dr Stuart Taylor, their Commercial Director. The result was that they made some changes:
- Editors now have the additional decision option of ‘revise’. This provides a middle way between ‘reject and resubmit’ and ‘accept with minor revisions’. [It’s hard to believe this didn’t exist before, but I guess it’s so.]
- The Society now publicises ‘first decision’ times rather than ‘first acceptance’ times on their website.
As I noted at the time, while this is definitely progress, it doesn’t (yet) fix the problem.
A few days ago, I checked whether things have improved by looking at a recent article, and was disappointed to see that they had not. I posted two tweets:
Again, I want to acknowledge that the Royal Society is taking this seriously: less than a week later I heard from Phil Hurst at the Society:
I was rather surprised to read your recent tweets about us not fixing this bug. I thought it was resolved to your satisfaction.
I replied:
Because newly published articles still only have two dates (submitted and accepted) it’s impossible to tell whether the “submitted” date is that of the original submission (which would be honest) or that of the revision, styled “a new submission” even though it’s not, that follows a “reject and resubmit” verdict.
Also: if the journals are still issuing “reject and resubmit” and then accepting the supposed new submissions without sending them out for peer-review (I can’t tell whether this is the case) then that is also wrong.
Sorry to be so hard to satisfy :-) I hope you will see and agree that it comes from a desire to have the world’s oldest scientific society also be one that leads the way in transparency and honesty.
And Phil’s response (which I quote with his kind permission):
I feel the changes we have made provide transparency.
Now that the Editors have the ‘revise’ option, this revision time is now incorporated in the published acceptance times. If on the other hand the ‘reject and resubmit’ option is selected, the paper has clearly been rejected and the author may or may not re-submit. Clearly if a paper had been rejected from another journal and then submitted to us, we would not include the time spent at that journal, so I feel our position is logical.
We only advertise the average ‘receipt to first decision’ time. As stated previously, we feel this is more meaningful as it gives prospective authors an indication of the time, irrespective of decision.
After all that recapitulation, I am finally in a position to lay out what the problems are, as I perceive them, in how things currently stand.
- Even in recently published articles, only two dates are given: “Received May 13, 2014. Accepted July 8, 2014”. It’s impossible to tell whether the first of those dates is that of the original submission, or the “new submission” that is really a minor revision following a reject-and-resubmit verdict.
- It’s also impossible to tell what “receipt to first decision” time is in the journal’s statistics. Is “receipt” the date of the revision?
- We don’t know what the journals’ rejection rates mean. Do they include the rejections of articles that are in fact published a couple of weeks later?
So we have editorials like this one from 2012 that trumpet a rejection rate of 78% (as though wasting the time of 78% of their authors is something to be proud of), but we have no idea what that number represents. Maybe they reject all articles initially, then accept 44% of them immediately on resubmission, and call that a 22% acceptance rate. We just can’t tell.
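To see how much room for ambiguity there is, here is a minimal sketch of one possible accounting. It rests entirely on my own assumptions, not on anything the Society has said: every decision is counted as a separate manuscript, and some fraction of published articles were first rejected exactly once under "reject and resubmit". The function name and parameters are hypothetical.

```python
def article_level_acceptance(reported_acceptance: float,
                             resubmit_fraction: float) -> float:
    """Estimate the per-article acceptance rate behind a per-decision rate.

    Assumptions (illustrative only): each decision counts as one manuscript,
    and `resubmit_fraction` of the articles eventually published were
    rejected exactly once before being accepted on resubmission.
    """
    if not 0 < reported_acceptance <= 1:
        raise ValueError("reported_acceptance must be in (0, 1]")
    if not 0 <= resubmit_fraction <= 1:
        raise ValueError("resubmit_fraction must be in [0, 1]")

    # Total decisions issued per accepted article.
    decisions_per_accept = 1 / reported_acceptance
    # Some of those decisions are rejections of papers that were published
    # anyway, so they do not correspond to distinct articles.
    articles_per_accept = decisions_per_accept - resubmit_fraction
    if articles_per_accept < 1:
        raise ValueError("inputs are inconsistent under this accounting")
    return 1 / articles_per_accept


if __name__ == "__main__":
    # 0.22 is the acceptance rate implied by the advertised 78% rejection rate.
    for fraction in (0.0, 0.5, 1.0):
        rate = article_level_acceptance(0.22, fraction)
        print(f"resubmit fraction {fraction:.1f}: {rate:.1%} of articles accepted")
```

Under these assumptions the same advertised figure is compatible with anything from 22% to about 28% of distinct articles being accepted, and a different counting convention would give different numbers again. Without knowing the convention, the headline rate can't be interpreted.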
All of this uncertainty comes from the same root cause: the use of “reject and resubmit” to mean “accept with minor revisions”.
What can the Royal Society do to fix this? Here is one approach:
- Each article should report three dates instead of two: the date of initial submission, the date of resubmission, and the date of acceptance. Omitting the date of initial submission is actively misleading.
- For each of the statistics they report, add prose that is completely clear on what is being measured. In particular, be clear about what “receipt” means.
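The three-date suggestion can be sketched as a small record type. This is only an illustration: the class and field names are my own invention, and the initial-submission date in the example is a made-up placeholder, since the real one was never published. The resubmission and acceptance dates are the ones printed on the Biology Letters paper discussed above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ManuscriptDates:
    """Hypothetical record holding all three dates for one article."""
    initial_submission: date
    accepted: date
    resubmission: Optional[date] = None  # None if there was no resubmission

    def __post_init__(self) -> None:
        # Dates must run in order: submitted, (resubmitted,) accepted.
        middle = self.resubmission or self.initial_submission
        if not self.initial_submission <= middle <= self.accepted:
            raise ValueError("dates are out of order")

    @property
    def reported_days(self) -> int:
        """Turnaround when only the latest 'received' date is printed."""
        received = self.resubmission or self.initial_submission
        return (self.accepted - received).days

    @property
    def true_days(self) -> int:
        """Turnaround measured from the original submission."""
        return (self.accepted - self.initial_submission).days


# Resubmission and acceptance dates are from the published paper;
# the initial submission date is a PLACEHOLDER chosen for illustration.
example = ManuscriptDates(
    initial_submission=date(2012, 7, 1),
    resubmission=date(2012, 8, 21),
    accepted=date(2012, 9, 13),
)

if __name__ == "__main__":
    print("reported turnaround:", example.reported_days, "days")
    print("true turnaround:", example.true_days, "days")
```

With all three dates on record, both figures can be computed and a reader can see which one is being advertised. With only two dates printed, the 23-day figure is the only one anyone can derive.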
But a much better and simpler and more honest approach is just to stop issuing “reject and resubmit” verdicts for minor revisions. All the problems just go away then.
“Minor revisions” should mean “we expect the editor to be able to make a final decision based on the changes you make”.
“Major revisions” should mean “we expect to send the revised manuscript back out to the reviewers, so they can judge whether you’ve made the necessary changes”.
And “reject and resubmit” should mean “this paper is rejected. If you want to completely retool it and resubmit, feel free”. It is completely inappropriate to accept a resubmitted paper without sending it out to peer review: doing so unambiguously gives the lie to the claim in the decision letter that “The resubmission will be treated as a new manuscript”.
Come on, Royal Society. You’ve been publishing science since 1665. Three hundred and forty-nine years should be long enough to figure out what “reject” means. You’re better than this.
And once the Royal Society gets this fixed, it will become much easier to persuade other publishers who’ve been indulging in this shady practice to mend their ways, too.
July 29, 2014
I wrote this paper in 2003-4, and it was rejected without review when I submitted it back then. (For, I think, specious reasons, but that’s a whole nother discussion. Forget I mentioned it.)
I haven’t touched the manuscript since then (except to single-space it for submission as a preprint). It’s ten years old. That’s a problem because it’s an analysis of a database of dinosaur diversity, and as everyone knows, the rate of recognising new dinosaurs has gone through the roof. That’s the reason I never made any attempt to update and resubmit it: dinosaur diversity is a fast-moving target, and each time through the submit-reject cycle takes long enough for the data to be outdated.
So much for the history. Now the question: how should I cite this paper? Specifically, what date should I give it? If I cite it as from 2004, it will give the misleading impression that the paper has been available for ten years; but if I cite it as from 2014, it will imply that it’s been worked on at some point in the last ten years. Both approaches seem misleading to me.
At the moment, I am citing it as “Taylor (2014 for 2004)”, which seems to more or less capture what’s meant, but I don’t know whether it’s an established convention. Is there an established convention?
Related: where in my publications list should it appear? At present I am sorting it under 2014, since that’s when it came out; but should it be under 2004, when it was written? I guess publication date is the one to go for — after all, it’s not unusual even now for papers to spend a year or more in press, and it’s the later (publication) date that’s cited.
Help me out. How should this be done?
- Taylor, Michael P. 2014 (written in 2004). A survey of dinosaur diversity by clade, age, place of discovery and year of description. PeerJ PrePrints 2:e434v1. doi: 10.7287/peerj.preprints.434v1
I was reading a rant on another site about how pretentious it is for intellectuals and pseudo-intellectuals to tell the world about their “media diets” and it got me thinking–well, angsting–about my scientific media diet.
And then almost immediately I thought, “Hey, what am I afraid of? I should just go tell the truth about this.”
And that truth is this: I can’t tell you what forms of scientific media I keep up with, because I don’t feel like I am actually keeping up with any of them.
Papers – I have no systematic method of finding them. I don’t subscribe to any notifications or table of contents updates. Nor, to be honest, am I in the habit of regularly combing the tables of contents of any journals.
Blogs – I don’t follow any in a timely fashion, although I do check in with TetZoo, Laelaps, and a couple of others every month or two. Way back when we started SV-POW!, we made a command decision not to list any sites other than our own on the sideboard. At the time, that was because we didn’t want to have any hurt feelings or drama over who we did and didn’t include. But over time, a strong secondary motive to keep things this way is that we’re not forced to keep up with the whole paleo blogosphere, which long ago outstripped my capacity to even competently survey. Fortunately, those overachievers at Love in the Time of Chasmosaurs have a pretty exhaustive-looking set of links on their sidebar, so globally speaking, someone is already on that.
The contraction in my blog reading is a fairly recent thing. When TetZoo was on ScienceBlogs, I was over there all the time, and there were probably half a dozen SciBlogs that I followed pretty regularly and another dozen or so that I at least kept tabs on. But ScienceBlogs burned down the community I was interested in, and the Scientific American Blog Network is sufficiently ugly (in the UI sense) and reader-unfriendly to not be worth my dealing with it. So I am currently between blog networks–or maybe past my last one.
Social Media – I’m not on Twitter, and I tend to only log into Facebook when I get an interesting notice in my Gmail “Social” folder. Sometimes I’m not on FB for a week or two at a time. So I miss a lot of stuff that goes down there, including notices about new papers. I could probably fix that if I just followed Andy Farke more religiously.
What ends up happening – I mainly find papers relevant to specific projects as I execute those projects; each new project is a new front in my n-dimensional invasion of the literature. My concern is that in doing this, I tend to find the papers that I’m looking for, whereas the papers that have had the most transformative effect on me are the ones I was not looking for at the time.
Beyond that, I find out about new papers because the authors take it on themselves to include me when they email the PDF out to a list of potentially interested colleagues (and many thanks to all of you who are doing that!), or Mike, Darren, or Andy send it to me, or it turns up in the updates to my Google Scholar profile.
So far, this combination of ad hoc and half-assed methods seems to be working, although it does mean that I have unfairly outsourced much of my paper discovery to other people without doing much for them in return. When I say that it’s working, I mean that I don’t get review comments pointing out that I have missed important recent papers. I do get review comments saying that I need to cite more stuff,* but these tend to be papers that I already know of and maybe even cited already, just not in the right ways to satisfy the reviewers.**
* There is a sort of an arrow-of-inevitability thing here, in that reviewers almost always ask you to cite more papers rather than fewer. Only once ever have I been asked to cite fewer sources, and that is when I had submitted my dinosaur nerve paper (Wedel 2012) to a certain nameless anatomy journal that ended up not publishing it. One of the reviewers said that I had cited several textbooks and popular science books and that was poor practice, I should have cited primary literature. Apparently this subgenius did not realize that I was citing all of those popular sources as examples of publications that held up the recurrent laryngeal nerve of giraffes as evidence for evolution, which was part of the point that I was making: giraffe RLNs are overrated.
** My usual sin is that I mentally categorize papers in one or two holes and forget that a given paper also mentioned C and D in addition to saying a lot about A and B. It’s something that vexes me about some of my own papers. I put so much stuff into the second Sauroposeidon paper (Wedel et al. 2000b) that some of it has never been cited–although that paper has been cited plenty, it often does not come up in discussions where some of the data presented therein is relevant, I think because there’s just too much stuff in that paper for anyone (who cares about that paper less than I do) to hold in their heads. But that’s a problem to be explored in another post.
The arborization of science
Part of the problem with keeping up with the literature is just that there is so much more of it than there was even a few years ago. When I first got interested in sauropod pneumaticity back in the late 90s, you were pretty much up to speed if you’d read about half a dozen papers:
- Seeley (1870), who first described pneumaticity in sauropods as such, even if he didn’t know what sauropods were yet;
- Longman (1933), who first realized that sauropod vertebrae could be sorted into two bins based on their internal structures, which are crudely I-beam-shaped or honeycombed;
- Janensch (1947), who wrote the first ever paper that was primarily about pneumaticity in dinosaurs;
- Britt (1993), who first CTed dinosaur bones looking for pneumaticity, independently rediscovered Longman’s two categories, calling them ‘camerate’ and ‘camellate’ respectively, and generally put the whole investigation of dinosaur pneumaticity on its modern footing;
- Witmer (1997), who provided what I think is the first compelling explanation of how and why skeletal pneumaticity works the way it does, using a vast amount of evidence culled from both living and fossil systems;
- Wilson (1999), who IIRC was the first to seriously discuss the interplay of pneumaticity and biomechanics in determining the form of sauropod vertebrae.
Yeah, there you go: up until the year 2000, you could learn pretty much everything important that had been published on pneumaticity in dinosaurs by reading five papers and one dissertation. “Dinosaur pneumaticity” wasn’t a field yet. It feels like it is becoming one now. To get up to speed today, in addition to the above you’d need to read big swaths of the work of Roger Benson, Richard Butler, Leon Claessens, Pat O’Connor (including a growing body of work by his students), Emma Schachner (not on pneumaticity per se, but too closely related [and too awesome] to ignore), Daniela Schwarz, and Jeff Wilson (and his students), plus important singleton papers like Woodward and Lehman (2009), Cerda et al. (2012), Yates et al. (2012), and Fanti et al. (2013). Not to mention my own work, and some of Mike’s and Darren’s. And Andy Farke and the rest of Witmer, if you’re into cranial pneumaticity. And still others if you care about pneumaticity in pterosaurs, which you should if you want to understand how–and, crucially, when–the anatomical underpinnings of ornithodiran pneumaticity evolved. Plus undoubtedly some I’ve forgotten–apologies in advance to the slighted, please prod me in the comments.
You see? If I actually listed all of the relevant papers by just the authors I named above, it would probably run to 50 or so papers. So someone trying to really come to grips with dinosaur pneumaticity now faces a task roughly equal to the one I faced in 1996 when I was first trying to grok sauropods. This is dim memory combined with lots of guesswork and handwaving, but I probably had to read about 50 papers on sauropods before I felt like I really knew the group. Heck, I read about a dozen on blood pressure alone.
(Note to self: this is probably a good argument for writing a review paper on dinosaur pneumaticity, possibly in collaboration with some of the folks mentioned above–sort of a McIntosh (1990) for the next generation.)
When I wrote the first draft of this post, I was casting about for a word to describe what is going on in science, and the first one that came to mind is “fragmentation”. But that’s not the right word–science isn’t getting more fragmented. If anything, it’s getting more interconnected. What it’s really doing is arborizing–branching fractally, like the blood vessels in the image at the top of this post. I think it’s pointless to opine about whether this is a good or bad thing. Like the existence of black holes and fuzzy ornithischians, it’s just a fact now, and we’d better get on with trying to make progress in this new reality.
How do I feel about all this, now that my little capillary of science has grown into an arteriole and threatens to become a full-blown artery? It is simultaneously exhilarating and worrying. Exhilarating because lots of people are discovering lots of cool stuff about my favorite system, and I have a lot more people to bounce ideas around with than I did when I started. Worrying because I feel like I am gradually losing my ability to keep tabs on the whole thing. Sound familiar?
Conclusion: Help a brother out
Having admitted all of this, it seems imperative that I get my act together and establish some kind of systematic new-paper-discovery method, beyond just sponging off my friends and hoping that they’ll continue to deliver everything I need. But it seems inevitable that I am either going to have to become more selective about what I consume–which sounds both stupid and depressing–or lose all of my time just trying to keep up with things.
Hi, I’m Matt. I just arrived here in Toomuchnewscienceistan. How do you find your way around?
- Britt, B. B. 1993. Pneumatic postcranial bones in dinosaurs and other archosaurs. Ph.D. dissertation, University of Calgary, Calgary, 383 pp.
- Cerda, I.A., Salgado, L., and Powell, J.E. 2012. Extreme postcranial pneumaticity in sauropod dinosaurs from South America. Palaeontologische Zeitschrift. DOI 10.1007/s12542-012-0140-6
- Fanti, F., Cau, A., Hassine, M., and Contessi, M. 2013. A new sauropod dinosaur from the Early Cretaceous of Tunisia with extreme avian-like pneumatization. Nature Communications 4:2080. doi:10.1038/ncomms3080
- Longman, H. A. 1933. A new dinosaur from the Queensland Cretaceous. Memoirs of the Queensland Museum 10:131–144.
- McIntosh, John S. 1990. Sauropoda. pp. 345-401 in: D. B. Weishampel, P. Dodson and H. Osmólska (eds.), The Dinosauria. University of California Press, Berkeley and Los Angeles.
- Seeley, H. G. 1870. On Ornithopsis, a gigantic animal of the pterodactyle kind from the Wealden. Annals and Magazine of Natural History, Series 4, 5:279-283.
- Wilson, J. A. 1999. A nomenclature for vertebral laminae in sauropods and other saurischian dinosaurs. Journal of Vertebrate Paleontology 19:639-653.
- Witmer, L.M. 1997. The evolution of the antorbital cavity of archosaurs: a study in soft-tissue reconstruction in the fossil record with an analysis of the function of pneumaticity. Society of Vertebrate Paleontology Memoir 3:1-73.
- Woodward, H.N., and Lehman, T.M. 2009. Bone histology and microanatomy of Alamosaurus sanjuanensis (Sauropoda: Titanosauria) from the Maastrichtian of Big Bend National Park, Texas. Journal of Vertebrate Paleontology 29(3):807-821.
- Yates, A.M., Wedel, M.J., and Bonnan, M.F. 2012. The early evolution of postcranial skeletal pneumaticity in sauropodomorph dinosaurs. Acta Palaeontologica Polonica 57(1):85-100. doi: http://dx.doi.org/10.4202/app.2010.0075
As recently noted, it was my pleasure and privilege on 25 June to give a talk at the ESOF2014 conference in Copenhagen (the EuroScience Open Forum). My talk was one of four, followed by a panel discussion, in a session on the subject “Should science always be open?“.
I had just ten minutes to lay out the background and the problem, so it was perhaps a bit rushed. But you can judge for yourself, because the whole session was recorded on video. The image is not the greatest (it’s hard to make out the slides) and the audio is also not all it could be (the crowd noise is rather loud). But it’s not too bad, and I’ve embedded it below. (I hope the conference organisers will eventually put out a better version, cleaned up by video professionals.)
Subbiah Arunachalam (from Arun, Chennai, India) asked me whether the full text of the talk was available — the echoey audio is difficult for non-native English speakers. It wasn’t, but I’ve since typed out a transcript of what I said (editing only to remove “er”s and “um”s), and that is below. Finally, you may wish to follow the slides rather than the video: if so, they’re available in PowerPoint format and as a PDF.
It’s very gracious of you all to hold this conference in English; I deeply appreciate it.
“Should science always be open?” is our question, and I’d like to open with one of the greatest scientists there’s ever been, Isaac Newton, who humility didn’t come naturally to. But he did manage to say this brilliant humble thing: “If I have seen further, it’s by standing on the shoulders of giants.”
And the reason I love this quote is not just because it’s insightful in itself, but because he stole it from something John of Salisbury said right back in 1159. “Bernard of Chartres used to say that we were like dwarfs seated on the shoulders of giants. If we see more and further than they, it is not due to our own clear eyes or tall bodies, but because we are raised on high and upborne by their gigantic bigness.”
Well, so Newton — I say he stole this quote, but of course he did more than that: he improved it. The original is long-winded, it goes around the houses. But Newton took that, and from that he made something better and more memorable. So in doing that, he was in fact standing on the shoulders of giants, and seeing further.
And this is consistently where progress comes from. It’s very rare that someone who’s locked in a room on his own thinking about something will have great insights. It’s always about free exchange of ideas. And we see this happening in lots of different fields.
Over the last ten or fifteen years, enormous advances in the kinds of things computers working in networks can do. And that’s come from the culture of openness in APIs and protocols, in Silicon Valley and elsewhere, where these things are designed.
Going back further and in a completely different field, the Impressionist painters of Paris lived in a community where they were constantly — not exactly working together, but certainly nicking each other’s ideas, improving each other’s techniques, feeding back into this developing sense of what could be done. Resulting in this fantastic art.
And looking back yet further, Florence in the Renaissance was a seat of all sorts of advances in the arts and the sciences. And again, because of this culture of many minds working together, and yielding insights and creativity that would not have been possible with any one of them alone.
And this is because of network effects; or Metcalfe’s Law expresses this by saying that the value of a network is proportional to the square of the number of nodes in that network. So in terms of scientific research, what that means is that if you have a corpus of published research output, of papers, then the value of that goes — it doesn’t just increase with the number of papers, it goes up with the square of the number of papers. Because the value isn’t so much in the individual bits of research, but in the connections between them. That’s where great ideas come from. One researcher will read one paper from here and one from here, and see where the connection or the contradiction is; and from that comes the new idea.
So it’s very important to increase the size of the network of what’s available. And that’s why we have a very natural tendency, I think among scientists particularly, but I think we can say researchers in other areas as well, have a natural tendency to share.
Now until recently, the big difficulty we’ve had with sharing has been logistical. It was just difficult to make and distribute copies of pieces of research. So this [picture of a printing press] is how we made copies, this [picture of stacks of paper] was what we stored them on, and this was how we transmitted them from one researcher to another.
And they were not the most efficient means, or at least not as efficient as what we now have available. And because of that, and because of the importance of communication and the links between research, I would argue that maybe the most important invention of the last hundred years is the Internet in general and the World Wide Web in particular. And the purpose of the Web, as it was initially articulated in the first public post that Tim Berners-Lee made in 1991 — he explained not just what the Web was but what it was for, and he said: “The project started with the philosophy that much academic information should be freely available to anyone. It aims to allow information sharing within internationally dispersed teams, and the dissemination of information by support groups.”
So that’s what the Web is for; and here’s why it’s important. I’m quoting here from Cameron Neylon, who’s great at this kind of thing. And again it comes down to connections, and I’m just going to read out loud from his blog: “Like all developments of new communication networks, SMS, fixed telephones, the telegraph, the railways, and writing itself, the internet doesn’t just change how well we can do things, it qualitatively changes what we can do.” And then later on in the same post: “At network scale the system ensures that resources get used in unexpected ways. At scale you can have serendipity by design, not by blind luck.”
Now that’s a paradox; it’s almost a contradiction, isn’t it? Serendipity by definition is what you get by blind luck. But the point is, when you have enough connections — enough papers floating around the same open ecosystem — all the collisions happening between them, it’s inevitable that you’re going to get interesting things coming out. And that’s what we’re aiming towards.
And of course it’s never been more important, with health crises, new diseases, the diminishing effectiveness of antibiotics, the difficulties of feeding a world of many billions of people, and the results of climate change. It’s not as though we’re short of significant problems to deal with.
So I love this Jon Foley quote. He said, “Your job” — as a researcher — “Your job is not to get tenure! Your job is to change the world”. Tenure is a means to an end, it’s not what you’re there for.
So this is the importance of publishing. Of course the word “publish” comes from the same root as the word “public”: to publish a piece of research means to make that piece of research public. And the purpose of publishing is to open research up to the world, and so open up the world itself.
And that’s why it’s so tragic when we run into this [picture of a paywalled paper]. I think we’ve all seen this at various times. You go to read a piece of research that’s valuable, that’s relevant to either the research you’re doing, or the job you’re doing in your company, or whatever it might be. And you run into this paywall. Thirty five dollars and 95 cents to read this paper. It’s a disaster. Because what’s happened is we’ve got a whole industry whose existence is to make things public, and who because of accidents of history have found themselves doing the exact opposite. Now no-one goes into publishing with the intent of doing this. But this is the unfortunate outcome.
So what we end up with is a situation where we’re re-imposing on the research community barriers that were necessarily imposed by the inadequate technology of 20 or 30 years ago, but which we’ve now transcended in technological terms but we’re still struggling with for, frankly, commercial reasons. This is why we’re struggling with this.
And I don’t like to be critical, but I think we have to just face the fact that there is a real problem when organisations, for many years have been making extremely high profits — these [36%, 32%, 34%, 42%] are the profit margins of the “big four” academic publishers which together hugely dominate the scholarly publishing market — and as you can see they’re in the range 32% to 42% of revenue, is sheer profit. So every time your university library spends a dollar on subscriptions, 40% of that goes straight out of the system to nowhere.
And it’s not surprising that these companies are hanging on desperately to the business model that allows them to do that.
Now the problem we have in advocating for open access is that when we stand against publishers who have an existing very profitable business model, they can complain to governments and say, “Look, we have a market that’s economically significant, it’s worth somewhere in the region of 10-15 billion US dollars a year.” And they will say to governments, “You shouldn’t do anything that might damage this.” And that sounds effective. And we struggle to argue against that because we’re talking about an opportunity cost, which is so much harder to measure.
You know, I can stand here — as I have done — and wave my hands around, and talk about innovation and opportunity, and networks and connections, but it’s very hard to quantify in a way that can be persuasive to people in a numeric way. Say, they have a 15 billion dollar business, we’re talking about saving three trillion’s worth of economic value (and I pulled that number out of thin air). So I would love, if we can, when we get to the discussions, to brainstorm some way to quantify the opportunity cost of not being open. But this is what it looks like [picture of flooding due to climate change]. Economically I don’t know what it’s worth. But in terms of the world we live in, it’s just essential.
So we’ve got to remember the mission that we’re on. We’re not just trying to save costs by going to open access publishing. We’re trying to transform what research is, and what it’s for.
So should science always be open? Of course, the name of the session should have been “Of course science should always be open”.
New (but very old) preprint: A survey of dinosaur diversity by clade, age, place of discovery and year of description
July 11, 2014
Today, available for the first time, you can read my 2004 paper A survey of dinosaur diversity by clade, age, place of discovery and year of description. It’s freely available (CC BY 4.0) as a PeerJ Preprint. It’s one of those papers that does exactly what it says on the tin — you should be able to find some interesting patterns in the diversity of your own favourite dinosaur group.
“But Mike”, you say, “you wrote this thing ten years ago?”
Yes. It’s actually the first scientific paper I ever wrote (bar some scraps of computer science) beginning in 2003. It’s so old that all the illustrations are grey-scale. I submitted it to Acta Palaeontologica Polonica way back on on 24 October 2004 (three double-spaced hard-copies in the post!) , but it was rejected without review. I was subsequently able to publish a greatly truncated version (Taylor 2006) in the proceedings of the 2006 Symposium on Mesozoic Terrestrial Ecosystems, but that was only one tenth the length of the full manuscript — much potentially valuable information was lost.
My finally posting this comes (as so many things seem to) from a conversation with Matt. Off work sick, he’d been amusing himself by re-reading old SV-POW! posts (yes, we do this). He was struck by my exhortation in Tutorial 14: “do not ever give a conference talk without immediately transcribing your slides into a manuscript”. He bemoaned how bad he’s been at following that advice, and I had to admit I’ve done no better, listing a sequence of my old SVPCA talks that have still never been published as papers.
The oldest of these was my 2004 presentation on dinosaur diversity. Commenting on this, I wrote in email: “OK, I got the MTE four-pager out of this, but the talk was distilled from a 40ish-page manuscript that was never published and never will be.” Quick as a flash, Matt replied:
If I had written this and sent it to you, you’d tell me to put it online and blog about how I went from idea to long paper to talk to short paper, to illuminate the process of science.
And of course he was right — hence this preprint.
I will never update this manuscript, as it’s based on a now wildly outdated database and I have too much else happening. (For one thing, I really ought to get around to finishing up the paper based on my 2005 SVPCA talk!) So in a sense it’s odd to call it a “pre-print” — it’s not pre anything.
Despite the data being well out of date, this manuscript still contains much that is (I think) of interest, and my sense is that the ratios of taxon counts, if not the absolute numbers, are still pretty accurate.
I don’t expect ever to submit a version of this to a journal, so this can be considered the final and definitive version.
- Taylor, Michael P. 2006. Dinosaur diversity analysed by clade, age, place and year of description. pp. 134-138 in Paul M. Barrett and Susan E. Evans (eds.), Ninth international symposium on Mesozoic terrestrial ecosystems and biota, Manchester, UK. Cambridge Publications. Natural History Museum, London, UK. 187 pp.
- Taylor, Michael P. 2014 (written in 2004). A survey of dinosaur diversity by clade, age, place of discovery and year of description. PeerJ PrePrints 2:e434v1. doi:10.7287/peerj.preprints.434v1
I think it’s fair to say that this “bifurcation heat-map”, from Wedel and Taylor (2013a: figure 9), has been one of the best-received illustrations that we’ve prepared:
Back when the paper came out, Matt rashly said “Stand by for a post by Mike explaining how it came to be” — a post which has not materialised. Until now!
This illustration was (apart from some minor tweaking) produced by a program that I wrote for that purpose, snappily named “vcd2svg”. That name is because it converts a vertebral column description (VCD) into a scalable vector graphics (SVG) file, which you can look at with a web-browser or load into an image editor for further processing.
The vertebral column description is in a format designed for this purpose, and I think it’s fairly intuitive. Here, for example, is the fragment describing the first three lines of the figure above:
Taxon: Apatosaurus louisae
Specimen: CM 3018
Taxon: Apatosaurus parvus
Specimen: UWGM 155556/CM 563
Taxon: Apatosaurus ajax
Specimen: NMST-PV 20375
Basically, you draw little ASCII pictures of the vertebral column. Other directives in the file explain how to draw the various glyphs represented by (in this case) “Y”, “V”, “u”, and “n”.
It’s pretty flexible. We used the same program to generate the right-hand side (though not the phylogenetic tree) of Wedel and Taylor (2013b: figure 2):
The reason I mention this is because I released the software today under the GNU General Public Licence v3.0, which is kind of like CC By-SA. It’s free for anyone to download, use, modify and redistribute either verbatim or in modified form, subject only to attribution and the requirement that the same licence be used for modified versions.
vcd2svg is written in Perl, and implemented in part by the SVG::VCD module, which is included in the package. It’s available as a CPAN module and on GitHub. There’s documentation of the command-line vcd2svg program, and of the VCD file format. Also included in the distribution are two documented examples: the bifurcation heat-map and the caudal pneumaticity diagram.
Folks, please use it! And feel free to contribute, too: as the change-log notes, there’s work still to be done, and I’ll be happy to take pull requests from those of you who are programmers. And whether you’re a programmer or not, if you find a bug, or want a new feature, feel free to file an issue.
A final thought: in academia, you don’t really get credit for writing software. So to convert the work that went into this release into some kind of coin, I’ll probably have to write a short paper describing it, and let that stand as a proxy for the actual program. Hopefully people will cite that paper when they generate a figure using the software, the way we all reflexively cite Swofford every time we use PAUP*.
Update (12 April 2014)
- Swofford, D. L. 2002. PAUP*: phylogenetic analysis using parsimony (* and other methods). Sinauer Associates, Sunderland, MA.
- Wedel, Mathew J., and Michael P. Taylor. 2013a. Neural spine bifurcation in sauropod dinosaurs of the Morrison Formation: ontogenetic and phylogenetic implications. Palarch’s Journal of Vertebrate Palaeontology 10(1): 1-34. ISSN 1567-2158.
- Wedel, Mathew J., and Michael P. Taylor. 2013b. Caudal pneumaticity and pneumatic hiatuses in the sauropod dinosaurs Giraffatitan and Apatosaurus. PLOS ONE 8(10):e78213. 14 pages. doi:10.1371/journal.pone.0078213 [PDF]
March 25, 2014
How should scientists, and reporters, discuss work that has failed to replicate? The original Barr and colleagues article remains in the scientific literature; failed replication alone is not grounds for retraction.
He’s right, of course: we certainly don’t want to retract every paper whose conclusions can’t be replicated, for all sorts of reasons: they may subsequently be replicated after all; the paper may contain other useful information even if the experiment in question was flawed; the replication studies themselves probably rely on the original’s Methods section; authors should not be punished for unfortunate outcomes unless they were fraudulently obtained.
What we want is for that Barr et al paper, whenever anyone looks at it, to be displayed with a prominent header that says “The following studies attempted to replicate this finding but failed:”, and a list of references/links. And, for that matter, another header saying that the following other studies did replicate it.
For web-sites to automatically produce that kind of annotation, they need articles that cite the original to include an additional piece of metadata, along with the author/year/title/journal/etc. metadata that identifies the cited paper. That additional ingredient is the citation’s type, which should be one of a small set of defined values.
What values are relevant? I won’t try to come up with an exhaustive list at this point, but obvious ones include:
- Replicates — the current paper replicates work done in the cited paper (and so provides evidence, though not proof, that the cited paper’s conclusion is correct).
- FailsToReplicate — the current paper attempts to replicate work done in the cited paper, but fails (and so provides evidence that the cited paper is mistaken).
- Falsifies — the current paper shows definitively that the cited paper is wrong. This is a stronger statement than FailsToReplicate, and would be used for example when the new work shows conclusively that the experimental protocol of the original was critically flawed.
- DependsOn — the current paper depends on information from the cited paper, such as the phylogeny that it proposes or the vertebral formula that it gives. For these purposes, the cited paper is treated as an authoritative source.
- Acknowledges — the current paper uses ideas proposed in the cited paper, and gives credit to the original.
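To make the idea concrete, here is a minimal sketch in Python of how a web-site might use typed citations to build those headers. Everything in it is an assumption for illustration only: the names `CitationType`, `Citation`, `HEADERS` and `annotations_for` are hypothetical, the paper identifiers are placeholders, and the wording of the "did replicate" header is my own guess at what such a header might say. It is not an existing standard or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class CitationType(Enum):
    """The citation types suggested in the list above (not exhaustive)."""
    REPLICATES = "Replicates"
    FAILS_TO_REPLICATE = "FailsToReplicate"
    FALSIFIES = "Falsifies"
    DEPENDS_ON = "DependsOn"
    ACKNOWLEDGES = "Acknowledges"


@dataclass(frozen=True)
class Citation:
    citing: str  # identifier of the paper that makes the citation
    cited: str   # identifier of the paper being cited
    type: CitationType


# Hypothetical header text for the citation types worth flagging
# on the cited paper's page. Order here is the display order.
HEADERS = {
    CitationType.FAILS_TO_REPLICATE:
        "The following studies attempted to replicate this finding but failed:",
    CitationType.REPLICATES:
        "The following studies replicated this finding:",
}


def annotations_for(paper, citations):
    """Return the header lines to show when someone looks at `paper`."""
    by_type = defaultdict(list)
    for c in citations:
        if c.cited == paper:
            by_type[c.type].append(c.citing)

    lines = []
    for ctype, header in HEADERS.items():
        if by_type[ctype]:
            lines.append(header)
            lines.extend(f"  - {citing}" for citing in sorted(by_type[ctype]))
    return lines


if __name__ == "__main__":
    # Placeholder identifiers, not real papers.
    demo = [
        Citation("study-B", "original", CitationType.FAILS_TO_REPLICATE),
        Citation("study-A", "original", CitationType.FAILS_TO_REPLICATE),
        Citation("study-C", "original", CitationType.REPLICATES),
        Citation("study-E", "original", CitationType.DEPENDS_ON),
    ]
    print("\n".join(annotations_for("original", demo)))
```

The point of the sketch is that once the type travels with each citation, the annotation is a trivial grouping operation. The hard part is getting the type recorded in the first place.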
There are all sorts of practical issues that will impede the adoption of this idea (not least the idiot fact that the citation graph is a trade secret rather than a freely available database), but let’s ignore those for now, and figure out what taxonomy of citation-types we want.