I think I figured out what the core, immutable quality of science is. It’s not formal publication, it’s not peer-review, it’s not “the scientific method” (whatever that means). It’s not replicability, it’s not properly citing sources, it’s not Popperian falsification. Underlying all those things is something more fundamental.


We all know that it’s good to be able to admit when you’ve been wrong about something. We all like to see that quality in others. We all like to think that we possess it ourselves — although, needless to say, in our case it never comes up. And it’s that last part that’s the rub. It goes so, so strongly against the grain for us to admit the possibility of error in our own work.

If science was just a matter of increasing the sum of human knowledge, it would suffice for us all to note our thoughts in blogs and have done. But because we’re not humble by nature — because we need to have humility formally imposed on us — we need the scaffolding of all those other things I mentioned:

  • Formal publication is important so that there’s a permanent record of what we claimed to have found. We can’t weasel out of an earlier mistake by claiming never to have made it.
  • Peer-review helps to prevent us from making mistakes in those formal publications. (That applies to informal pre-submission reviews as well as gatekeeper reviews.)
  • Whatever the scientific method means in detail, it’s a way to keep hypothesis, experiment, result and conclusion separate, so other scientists can clearly see what has been done, what is fact and what is opinion.
  • Replicability is providing enough information to enable others to determine on their own whether we’ve made mistakes.
  • Properly citing sources allows others to check that our assumptions are well supported.
  • Popperian falsification helps prevent us from having too much faith in our own ideas, by leaving them for the community to test.

All these standard parts of how science is done are about helping us to spot our own mistakes, giving opportunity for others to spot them, and providing a means for them to be corrected. (Of course, they have other benefits, too: for example, citing sources is important as a way of giving credit.)

We may not be humble people; but doing science forces us to act humbly.

Counting beans

October 10, 2012

The reason most of my work is in the form of journal articles is that I didn’t know there were other ways to communicate. Now that I know that there are other and in some ways demonstrably better ways (arXiv, etc.), my enthusiasm for sending stuff to journals is flagging. Whereas before I was happy to do it and the tenure beans were a happy side-effect, now I can see that the tenure beans are in fact shackles preventing me from taking a better path.

Posting palaeo papers on arXiv

September 28, 2012

Over on Facebook, where Darren posted a note about our new paper, most of the discussion has not been about its content but about where it was published. We’re not too surprised by that, even though we’d love to be talking about the science. We did choose arXiv with our eyes open, knowing that there’s no tradition of palaeontology being published there, and wanting to start a new tradition of palaeontology being routinely published there. Having now made the step for the first time, I see no reason ever to not post a paper on arXiv, as soon as it’s ready, before — or maybe even instead of — submitting it to a journal.

(Instead of? Maybe. We’ll discuss that below.)

The key issue is this: science isn’t really science until it’s out there where it can be used. We wrote the bulk of the neck-anatomy paper back in 2008 — the year that we first submitted it to a journal. In the four years since then, all the observations and deductions that it contains have been unavailable to the world. And that is stupid. The work might just as well never have been done. Now that it’s on arXiv, that’s over. I was delighted to get an email less than 24 hours after the paper was published, from an author working on a related issue, thanking us for posting the paper, saying that he will now revise his own in-prep manuscript in light of its findings, and cite our paper. Which of course is the whole point: to get our science out there where it can do some damage.

Because the alternative is horrible, really. Horribly wasteful, horribly dispiriting, horribly retarding for science. For example, a couple of weeks ago in his SVPCA talk, David Norman was lamenting again that he never got around to publishing the iguanodont systematic work that was in his dissertation, I-don’t-know-how-many-years-ago. The result of that interminable delay is that others have done other, conflicting iguanodont systematic work, and Norman is now trying belatedly to undo that and bring his own perspective. A terrible and unnecessary slowing of ornithopod science, and a waste of duplicated effort. (Thankfully it’s only ornithopods.)

And of course David Norman is very far from being alone. Pretty much any palaeontologist you talk to will tell you of a handful of papers — many more in some cases — that were finished many years previously but have never seen the light of day. (I still have a couple myself, but there is no point in resurrecting them now because progress has overtaken them.) I wonder what proportion of all Ph.D work ever sees the light of day? Half? Less? It’s crazy.

Figure 8. Sauropod cervical vertebrae showing anteriorly and posteriorly directed spurs projecting from neurapophyses. 1, cervical 5 of Sauroposeidon holotype OMNH 53062 in right lateral view, photograph by MJW. 2, cervical 9 of Mamenchisaurus hochuanensis holotype CCG V 20401 in left lateral view, reversed, from photograph by MPT. 3, cervical 7 or 8 of Omeisaurus junghsiensis Young, 1939 holotype in right lateral view, after Young (1939, figure 2). (No specimen number was assigned to this material, which has since been lost. D. W. E. Hone personal communication, 2008.)

Publish now, publish later

So, please folks: we all need to be posting our work on preprint servers as soon as we consider it finished. It doesn’t mean that the posted versions can’t subsequently be obsoleted by improved versions that have gone through peer-review and been published in conventional journals. But it does mean that the world can know about the work, and build on it, and get the benefit of it, as soon as it’s done.

You see, we have a very fundamental problem in academia: publishing fulfils two completely separate roles. Its primary role (or at least the role that should be primary) is to make work available to the community; the secondary role is to provide a means of keeping score — something that can be used when making decisions about who to appoint to jobs, when to promote, who gets grants, who gets tenure and so on. I am not going to argue that the latter shouldn’t happen at all — clearly a functioning community needs some way to infer the standing of its participants. But I do think it’s ridiculous when the bean-counting function of publication trumps the actual publication role of publication. Yet we’ve all been in a position where we have essentially complete work that could easily go on a blog, or in the PalAss newsletter, or in a minor journal, or somewhere — but we hang onto it because we want to get it into a Big Journal.

Let me say again that I do realise how unusual and privileged my own position is: that a lot of my colleagues do need to play the Publication Prestige game for career reasons (though it terrifies me how much time some colleagues waste squeezing their papers into two-and-a-half-page format in the futile hope of rolling three sixes on the Science ‘n’ Nature 3D6). Let’s admit right now that most palaeontologists do need to try to get their work into Proc B, or Paleobiology, or what have you. Fair enough. They should feel free. But the crucial point is this: that is no reason not to post pre-prints so we can all get on with actually benefitting from your work in the meantime.

Actually, I feel pretty stupid that it’s taken me this long to realise that all my work should go up on arXiv.

Figure 11. Archosaur cervical vertebrae in posterior view, showing muscle attachment points in phylogenetic context. Blue arrows indicate epaxial muscles attaching to neural spines, red arrows indicate epaxial muscles attaching to epipophyses, and green arrows indicate hypaxial muscles attaching to cervical ribs. While hypaxial musculature anchors consistently on the cervical ribs, the principal epaxial muscles migrate from the neural spine in crocodilians to the epipophyses in non-avial theropods and modern birds, with either or both sets of muscles being significant in sauropods. 1, fifth cervical vertebra of Alligator mississippiensis, MCZ 81457, traced from 3D scans by Leon Claessens, courtesy of MCZ. Epipophyses are absent. 2, eighth cervical vertebra of Giraffatitan brancai paralectotype HMN SII, traced from Janensch (1950, figures 43 and 46). 3, eleventh cervical vertebra of Camarasaurus supremus, reconstruction within AMNH 5761/X, “cervical series I”, modified from Osborn and Mook (1921, plate LXVII). 4, fifth cervical vertebra of the abelisaurid theropod Majungasaurus crenatissimus, UA 8678, traced from O’Connor (2007, figures 8 and 20). 5, seventh cervical vertebra of a turkey, Meleagris gallopavo, traced from photographs by MPT.


So are there any special cases? Any kinds of papers that we should keep dry until they make it into actual journals? I can think of two classes that you could argue for — one of them convincingly, the other not.

First, the unconvincing one. When I discussed this with Matt (and half the fun of doing that is that usually neither of us really knows what we think about this stuff until we’re done arguing it through), he suggested to me that we couldn’t have put the Brontomerus paper on arXiv, because that would have leaked the name, creating a nomen nudum. My initial reaction was to agree with him that this is an exception. But when I thought about it a bit more, I realised there’s actually no compelling reason not to post such a paper on arXiv. So you create a nomen nudum? So what? Really: what is the negative consequence of that? I can’t think of one. OK, the name will appear on Wikipedia and mailing lists before the ICZN recognises it — but who does that hurt? No-one that I can think of. The only real argument against posting is that it could invite scooping. But is that a real threat? I doubt it. I can’t think of anyone who would be barefaced enough to scoop a taxon that had already been published on arXiv — and if they did, the whole world would know unambiguously exactly what had happened.

So what is the one real reason not to post a preprint? I think that might be a legitimate choice when publicity needs to be co-ordinated. So while nomenclatural issues should not have stopped us from arXiving the Brontomerus paper, publicity should. In preparation for that paper’s publication day, we did a lot of careful work with the UCL publicity team: writing non-specialist summaries, press-releases and FAQs, soliciting and preparing illustrations and videos, circulating materials under embargo, and so on. In general, mainstream media are only interested in a story if it’s news, and that means you need to make sure it’s new when they first hear about it. Posting the article in advance on a publicly accessible archive would mess that up, and probably damage the work’s coverage in the press, TV and radio.

Publication venues are a continuum

It’s become apparent to us only gradually that there’s really no clear cut-off where a paper becomes “properly published”. There’s a continuum that runs from least to most formal and exclusive:

SV-POW! — arXiv — PLOS ONE — JVP — Nature

1. On SV-POW!, we write what we want and publish it when we want. We can promise you that it won’t go away, but you only have our word for it. But some of what we write here is still science, and has been cited in papers published in more formal venues — though, as far as I know, only by Matt and me so far.

2. On arXiv, there is a bit more of a barrier to clear: you have to get an existing arXiv user to endorse your membership application, and each article you submit is given a cursory check by staff to ensure that it really is a piece of scientific research rather than a diary entry, movie review or spam. Once it’s posted, the paper is guaranteed to remain at the same URL, unchanged, so long as arXiv endures (and it’s supported by Cornell). Crucially, the maths, physics and computer science communities that use arXiv uncontroversially consider this degree of filtering and permanence sufficient to constitute a published, citeable source.

3. At PLOS ONE, your paper only gets published if it’s been through peer-review — but the reviewing criteria pertain only to scientific soundness and do not attempt to evaluate likely impact or importance.

4. At JVP and other conventional journals, your paper has to make it through a two-pronged peer-review process: it has to be judged both sound scientifically (as at PLOS ONE) and also sufficiently on-topic and important to merit appearing in the journal.

5. Finally, at Nature and Science, your paper has to be sound and be judged sexy — someone has to guess that it’s going to prove important and popular.

Where along this continuum does the formal scientific record begin? We could make a case that all of it counts, provided that measures are taken to make the SV-POW! posts permanent and immutable. (This can be done by submitting them to WebCite or to a service such as Nature Precedings used to provide.) But whether or not you accept that, it seems clear that arXiv and upwards is permanent, scientific and citeable.

This raises an interesting question: do we actually need to go ahead and publish our neck-anatomy paper in a more conventional venue? I’m honestly not sure at the moment, and I’d be interested to hear arguments in either direction. In terms of the progress of science, probably not: our actual work is out there, now, for the world to use as it sees fit. But from a career perspective, it’s probably still worth our while to get it into a journal, just so it can sit more neatly on our publication lists and help Matt’s tenure case more. And yet I don’t honestly expect any eventual journal-published version to be better in any meaningful way than the one on arXiv. After all, it’s already benefitted from two rounds of peer-review, three if you count the comments of my dissertation examiners. More likely, a journal will be less useful, as we have to cut length, eliminate illustrations, and so on.

So it seems to me that we have a hard choice ahead of us now. Call that paper done and move on to making more science? Or spend more time and effort on re-publishing it in exchange for prestige? I really don’t know.

For what it’s worth, it seems that standard practice in maths, physics and computer science is to republish arXiv articles in journals. But there are some scientists who routinely do not do this, instead allowing the arXiv version to stand as the only version of record. Perhaps that is a route best left to tenured greybeards rather than bright young things like Matt.

Figure 5. Simplified myology of the sauropod neck, in left lateral view, based primarily on homology with birds, modified from Wedel and Sanders (2002, figure 2). Dashed arrows indicate muscle passing medially behind bone. A, B. Muscles inserting on the epipophyses, shown in red. C, D, E. Muscles inserting on the cervical ribs, shown in green. F, G. Muscles inserting on the neural spine, shown in blue. H. Muscles inserting on the ansa costotransversaria (“cervical rib loop”), shown in brown. Specifically: A. M. longus colli dorsalis. B. M. cervicalis ascendens. C. M. flexor colli lateralis. D. M. flexor colli medialis. E. M. longus colli ventralis. In birds, this muscle originates from the processes carotici, which are absent in the vertebrae of sauropods. F. Mm. intercristales. G. Mm. interspinales. H. Mm. intertransversarii. Vertebrae modified from Gilmore (1936, plate 24).

Citing papers in arXiv

Finally, a practicality: since it’ll likely be a year or more before any journal-published version of our neck-anatomy paper comes out, people wanting to use it in their own work will need to know how to cite a paper in arXiv. Standard procedure seems to be just to use authors, year, title and arXiv ID. But in a conventional-journal citation, I like the way that the page-range gives you a sense of how long the paper is. So I think it’s worth appending page-count to the citations. And while you’re at it, you may as well throw in the figure and table counts, too, yielding the version that we’ve been using:

  • Taylor, Michael P., and Mathew J. Wedel. 2012. Why sauropods had long necks; and why giraffes have short necks. arXiv:1209.5439. 39 pages, 11 figures, 3 tables.
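For anyone who wants to generate citations in this style automatically, here is a minimal Python sketch. The function name `format_arxiv_citation` and its parameters are made up for illustration; the output format simply mirrors the example above, and the page, figure and table counts are optional.

```python
def _count(n, noun):
    """Return e.g. '39 pages' or '1 figure', pluralising the noun as needed."""
    return f"{n} {noun}{'' if n == 1 else 's'}"


def format_arxiv_citation(authors, year, title, arxiv_id,
                          pages=None, figures=None, tables=None):
    """Build a citation string: authors, year, title, arXiv ID, then counts.

    Hypothetical helper: the layout follows the single example in the post,
    not any official arXiv citation standard.
    """
    citation = f"{authors}. {year}. {title}. arXiv:{arxiv_id}."
    counts = []
    if pages is not None:
        counts.append(_count(pages, "page"))
    if figures is not None:
        counts.append(_count(figures, "figure"))
    if tables is not None:
        counts.append(_count(tables, "table"))
    if counts:
        citation += " " + ", ".join(counts) + "."
    return citation


example = format_arxiv_citation(
    "Taylor, Michael P., and Mathew J. Wedel",
    2012,
    "Why sauropods had long necks; and why giraffes have short necks",
    "1209.5439",
    pages=39, figures=11, tables=3,
)
print(example)
```

Leaving out the counts gives the plain authors, year, title and arXiv ID form that seems to be standard procedure.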

An interesting conversation arose in the comments to Matt’s last post — interesting to me, at least, but then since I wrote much of it, I am biased.  I think it merits promotion to its own post, though.  Paul Graham, among many others, has written about how one of the most important reasons to write about a subject is that the process of doing so helps you work through exactly what you think about it.  And that is certainly what’s happening to me in this series of Open Access posts.

Dramatis personae

Liz Smith: Director of Global Internal Communications at Elsevier
Mike Taylor: me, your co-host here at SV-POW!
Andy Farke: palaeontologist, ceratopsian lover, and PLoS ONE volunteer academic editor


In a long and interesting comment, Liz wrote (among much else):

This is where there seems to be deliberate obtuseness. Sticking a single PDF up online is easy. But there are millions of papers published every year. It takes a hell of a lot of people and resources to make that happen. You can’t just sling it online and hope somebody can find it. The internet doesn’t happen by magic.

And I replied:

Actually, you can and I do. That is exactly how the Internet works. I don’t have to do anything special to make sure my papers are found — Google and other search engines pick them up, just like they do everything. So to pick an example at random, if you search for brachiosaurus re-evaluation, the very first hit will be my self-hosted PDF of my 2009 JVP paper on that subject. [Correction: I now see that it’s the third hit; the PDF of the correction is top.] Similarly, search for xenoposeidon pdf and the top hit is — get ready for a shock! — my self-hosted PDF of my 2007 Palaeontology paper on that subject.

So in fact, this is a fine demonstration of just how obsolete much of the work that publishers do has now become — all that indexing, abstracting and aggregation, work that used to be very important, but which is now done much faster, much better, for free, by computers and networks.

Really: what advantages accrue to me in having my Xenoposeidon paper available on Wiley’s site as well as mine? [It’s paywalled on their site, so useless to 99% of potential visitors, but ignore that for now. Let’s pretend it’s freely available.] What else does that get me that Google’s indexing of my self-hosted PDF doesn’t?

Liz is quite rightly taking a break over the weekend, so she’s not yet replied to this; but Andy weighed in with some important points:

To address your final statement, I see three main advantages to having a PDF on a publisher’s site, rather than just a personal web page (this follows some of our Twitter discussion the other day, but I post it here just to have it in an alternative forum):

1) Greater permanence. Personal web pages (even with the best of intentions) have a history of non-permanence; there is no guarantee your site will be around 40 or 50 years from now. Just ask my Geocities page from 1998. Of course, there also is no guarantee that Wiley’s website will be around in 2073 either, but I think it’s safe to say there’s a greater likelihood that it will be around in some incarnation than a personal website.

2) Document security. By putting archiving in the hands of the authors, there is little to prevent them from editing out embarrassing details, or adding in stuff they wanted published but the reviewers told them to take out, or whatever. I’m not saying this is something that most people would do, but it is a risk of not having an “official” copy somewhere.

3) Combating author laziness. You have an excellent track record of making your work available, but most other authors do not, for various reasons.

It is also important to note that none of the above requirements needs a commercial publisher – in fact, they would arguably be better served by taking them out of the commercial sector. My main point is that self-hosting, although a short-term solution for distribution and archival, is not a long-term one.

Finally, just as a minor pedantic note, search results depend greatly on the search engine used. Baidu – probably the most popular search engine in China – doesn’t give your self-hosted PDF anywhere in its three pages of search results (neither does it give Wiley’s version, though).

And now, here is my long reply — the one that, when I’d finished it, made me want to post this as an article:

On permanence, there are a few things to say. One is that with the rate of mergers, web-site “upgrades” and suchlike I am actually far from confident that (say) the Wiley URL for my Xenoposeidon paper will last longer than my own. In fact, let’s make it a challenge! :-) If theirs goes away, you buy me a beer; if mine does, I buy you one! But I admit that, as an IT professional who’s been running a personal website since the 1990s — no Geocities for me! — I am not a typical case.

But the more important point is that it doesn’t matter. The Web doesn’t actually run on permanent addresses, it runs on what gets indexed. If I deleted my Xenoposeidon PDF today and put it up somewhere else — say, directly on SV-POW! — within a few days it would be indexed again, and coming out at or near the top of search-engine results. Librarians and publishers used to have a very important curation role — abstracting and indexing and all that — but the main reason they keep doing these things now is habit.

And that’s because of the wonderful loosely coupled nature of the Internet. Back when people first started posting research papers on the web, there were no search engines — CERN, famously, maintained a list of all the world’s web-sites. Search engines and crawlers as we know them today were never part of the original vision of the web: they were invented and put together from spare parts. And that is the glory of the open web. The people at Yahoo and AltaVista and Google didn’t need anyone’s permission to start crawling and indexing — they didn’t need to sign up to someone’s Developer Partnership Program and sign a non-disclosure form before they were allowed to see the API documentation, and then apply for an API Key that is good for up to 100 accesses per day. All these encumbrances apply when you try to access data in publishers’ silos (trust me: my day-job employers have just spent literally months trying to suck the information out of Elsevier that is necessary to use their crappy 2001-era SOAP-based web services to search metadata. Not even content.) And this is why I can’t get remotely excited about things like ScienceDirect and Scopus. Walled gardens can give us some specific functionality, sure, but they will always be limited by what the vendor thinks of, and what the vendor can turn a profit on. Whereas if you just shove things up on the open web, anyone can do anything with them.

With that said, your point about document security is well made — we do need some system for preventing people from tampering with versions of record. Perhaps something along the lines of the DOI register maintaining an MD5 checksum of the version-of-record PDF?
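To make the checksum idea concrete, here is a minimal Python sketch of how a reader might check a downloaded PDF against a registered digest. It assumes a registry publishes a hex digest alongside each DOI, which as far as I know no DOI register actually does; the function names `pdf_checksum` and `matches_version_of_record` are hypothetical. MD5 is the default only because it is what I suggested above; it is no longer collision-resistant, so a real system would more sensibly use something like SHA-256, which the `algorithm` parameter allows.

```python
import hashlib


def pdf_checksum(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of a PDF's raw bytes using the named hash."""
    return hashlib.new(algorithm, data).hexdigest()


def matches_version_of_record(data: bytes, registered_digest: str,
                              algorithm: str = "md5") -> bool:
    """Check a copy of a paper against the digest held by the (assumed) registry."""
    return pdf_checksum(data, algorithm) == registered_digest.strip().lower()


# Illustration with stand-in bytes rather than a real PDF file.
original = b"%PDF-1.4 version of record"
tampered = b"%PDF-1.4 version of record, with embarrassing details removed"

registered = pdf_checksum(original)  # what the registry would store
print(matches_version_of_record(original, registered))  # the untouched copy
print(matches_version_of_record(tampered, registered))  # the edited copy
```

For a self-hosted file you would read the bytes with `open(path, "rb").read()` and pass them in. Any change to the file, however small, gives a different digest, so an edited copy no longer matches the registered one.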

You are also right that not all authors will bother to post their PDFs — though frankly, heaven alone knows why not, when it takes five minutes to do something that will triple the accessibility of work you’ve spent a year on. This seems like an argument for repositories (whether institutional or subject-based) and mandatory deposition — e.g. as a condition of a grant.

Is that the same as the Green OA route? No, I want to see version-of-record PDFs reposited, not accepted manuscripts — for precisely the anti-tampering reason you mention above, among other reasons. Green OA is much, much better than nothing. But it’s not the real thing.

Finally: if Baidu lists neither my self-hosted Xenoposeidon PDF nor Wiley’s version anywhere in its first three pages of search results, then it is Just Plain Broken. I can’t worry about the existence of broken tools. Someone will make a better one and knock it off its perch, just like Google did to AltaVista.

And there, for the moment, matters stand.  I’m sure that Liz and Andy, and hopefully others, will have more to say.

One of the things I like about this is the way that a discussion that was originally about publisher behaviour mutated into one on the nature of the Open Web — really, where we ended up is nothing to do with Open Access per se.  The bottom line is that free systems (and here I mean free-as-in-freedom, not zero-cost) don’t just open up more opportunities than proprietary ones, they open up more kinds of opportunities, including all kinds of ideas that the original group never even thought of.

And that, really — bringing it all back to where we started — is why I care about Open Access.  Full, BOAI-compliant, Open Access.  Not just so that people can read papers at zero cost (important though that is), but so that we and a million other groups around the world can use them to build things that we haven’t even thought of yet — things as far advanced beyond the current state of the art as Google is over CERN’s old static list of web-sites.