New (but very old) preprint: A survey of dinosaur diversity by clade, age, place of discovery and year of description
July 11, 2014
Today, for the first time, you can read my 2004 paper A survey of dinosaur diversity by clade, age, place of discovery and year of description. It’s freely available (CC BY 4.0) as a PeerJ Preprint. It’s one of those papers that does exactly what it says on the tin: you should be able to find some interesting patterns in the diversity of your own favourite dinosaur group.
“But Mike”, you say, “you wrote this thing ten years ago?”
Yes. It’s actually the first scientific paper I ever wrote (bar some scraps of computer science), beginning in 2003. It’s so old that all the illustrations are grey-scale. I submitted it to Acta Palaeontologica Polonica way back on 24 October 2004 (three double-spaced hard-copies in the post!), but it was rejected without review. I was subsequently able to publish a greatly truncated version (Taylor 2006) in the proceedings of the 2006 Symposium on Mesozoic Terrestrial Ecosystems, but that was only one tenth the length of the full manuscript, and much potentially valuable information was lost.
My finally posting this comes (as so many things seem to) from a conversation with Matt. Off work sick, he’d been amusing himself by re-reading old SV-POW! posts (yes, we do this). He was struck by my exhortation in Tutorial 14: “do not ever give a conference talk without immediately transcribing your slides into a manuscript”. He bemoaned how bad he’s been at following that advice, and I had to admit I’ve done no better, listing a sequence of my old SVPCA talks that have still never been published as papers.
The oldest of these was my 2004 presentation on dinosaur diversity. Commenting on this, I wrote in email: “OK, I got the MTE four-pager out of this, but the talk was distilled from a 40ish-page manuscript that was never published and never will be.” Quick as a flash, Matt replied:
If I had written this and sent it to you, you’d tell me to put it online and blog about how I went from idea to long paper to talk to short paper, to illuminate the process of science.
And of course he was right — hence this preprint.
I will never update this manuscript, as it’s based on a now wildly outdated database and I have too much else happening. (For one thing, I really ought to get around to finishing up the paper based on my 2005 SVPCA talk!) So in a sense it’s odd to call it a “pre-print” — it’s not pre anything.
Despite the data being well out of date, this manuscript still contains much that is (I think) of interest, and my sense is that the ratios of taxon counts, if not the absolute numbers, are still pretty accurate.
I don’t expect ever to submit a version of this to a journal, so this can be considered the final and definitive version.
- Taylor, Michael P. 2006. Dinosaur diversity analysed by clade, age, place and year of description. pp. 134-138 in Paul M. Barrett and Susan E. Evans (eds.), Ninth international symposium on Mesozoic terrestrial ecosystems and biota, Manchester, UK. Cambridge Publications. Natural History Museum, London, UK. 187 pp.
- Taylor, Michael P. 2014 (written in 2004). A survey of dinosaur diversity by clade, age, place of discovery and year of description. PeerJ PrePrints 2:e434v1. doi:10.7287/peerj.preprints.434v1
December 23, 2012
After the authors’ own work, the biggest contribution to a published paper is the reviews provided, gratis, by peers. When peer-review works as it’s supposed to, they add significant value to the final paper. But the actual reviews are never seen by anyone except the authors and the handling editor.
This is bad for several reasons.
First, good reviewers don’t get the credit they deserve. That’s unfair on those who do a good job — who generously invest a lot of time and effort in others’ work.
Second, bad reviewers don’t get the blame they deserve. That leaves them free to act in bad faith: blocking papers by people they don’t like, or whose work is critical of their own; or just doing a completely inadequate job. Because there are no negative consequences for doing a bad job, people have no external incentive to straighten up and fly right.
Third, the effort that goes into reviewing is largely wasted. Often the reviews themselves are significant pieces of work (that’s certainly true when I’m the one giving the review) and the wider community could benefit from seeing them. Frequently reviews contain extended discussion, not only of the paper’s subject matter but of scientific philosophy such as approaches to taxonomy or narrative structure.
Fourth, editors’ decisions remain unexplained. Most editors handle manuscripts efficiently and fairly, but there are exceptions: for example, I was once one of three reviewers who wholeheartedly recommended acceptance, yet the editor rejected the paper. Even discussing that situation was difficult, because the reviews in question were not available for the world to read.
Fifth, and more general than any of the above, the reviewing process is opaque to the world. In times past, logistical reasons such as lack of space in printed journals meant that the sausage-machine approach to the review process was the only feasible one: no-one wants to see what goes into the machine or what goes on inside, we only want the final product. But we live in an increasingly open world, and consensus is that pretty much all processes benefit from openness.
There are various initiatives under way to change the legacy system of reviewing, including F1000 Research and the eLife decision-letter system. But at the moment only a small minority of papers are submitted to such venues.
What to do about the others?
And so I found myself wondering … what would happen if I just unilaterally posted the reviews I receive? I already make pages on this site for each of my published papers (example): it would be easy to extend those pages by also adding:
- The submitted version of the manuscript
- All the reviews I received
- The editor’s decision letter
- My response letter to the editor
- The final published paper.
I know this is “not done”. My question is: why not? Is there an actual reason, other than inertia? Wouldn’t we all be better off if this was standard operating procedure?
[Note that this is orthogonal to reviewer anonymity. As it happens, I think that is also a bad thing, but it's independent of what I'm proposing here. I could post an unsigned review as-is, without revealing who wrote it even if I knew.]
December 13, 2012
We know that most academic journals and edited volumes ask authors to sign a copyright transfer agreement before proceeding with publication. When this is done, the publisher becomes the owner of the paper; the author may retain some rights according to the grace or otherwise of the publisher.
Plenty of authors have rightly railed against this land-grab, which publishers have been quite unable to justify. On occasion we’ve found ways to avoid the transfer, including the excellent structured approach that is the SPARC Author Addendum and my tactic of transferring copyright to my wife.
Works produced by the U.S. Federal Government are not protected by copyright. For example, papers written by Bill Parker as part of his work at Petrified Forest National Park are in the public domain.
Journals know this, and have clauses in their copyright transfer agreements to deal with it. For example, Elsevier’s template agreement has a box to check that says “I am a US Government employee and there is no copyright to transfer”, and the publishing agreement itself reads as follows (emphasis added):
Assignment of publishing rights
I hereby assign to <Copyright owner> the copyright in the manuscript identified above (government authors not electing to transfer agree to assign a non-exclusive licence) and any supplemental tables, illustrations or other information submitted therewith that are intended for publication as part of or as a supplement to the manuscript (the “Article”) in all forms and media (whether now known or hereafter developed), throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication.
So journals and publishers are already set up to deal with public domain works that have no copyright. And that made me wonder why this option should be restricted to U.S. Federal employees.
What would happen if I just unilaterally placed my manuscript in the public domain before submitting it? (This is easy to do: you can use the Creative Commons CC0 tool.)
Once I’d done that, I would be unable to sign a copyright transfer agreement. Not merely unwilling: I wouldn’t need to argue with publishers, “Oh, I don’t want to sign that”. It would be simpler than that. It would just be “There is no copyright to transfer”.
What would publishers say?
What could they say?
“We only publish public-domain works if they were written by U.S. federal employees”?
February 18, 2012
An interesting conversation arose in the comments to Matt’s last post — interesting to me, at least, but then since I wrote much of it, I am biased. I think it merits promotion to its own post, though. Paul Graham, among many others, has written about how one of the most important reasons to write about a subject is that the process of doing so helps you work through exactly what you think about it. And that is certainly what’s happening to me in this series of Open Access posts.
Liz Smith: Director of Global Internal Communications at Elsevier
Mike Taylor: me, your co-host here at SV-POW!
Andy Farke: palaeontologist, ceratopsian lover, and PLoS ONE volunteer academic editor
In a long and interesting comment, Liz wrote (among much else):
This is where there seems to be deliberate obtuseness. Sticking a single PDF up online is easy. But there are millions of papers published every year. It takes a hell of a lot of people and resources to make that happen. You can’t just sling it online and hope somebody can find it. The internet doesn’t happen by magic.
And I replied:
Actually, you can and I do. That is exactly how the Internet works. I don’t have to do anything special to make sure my papers are found — Google and other search engines pick them up, just like they do everything. So to pick an example at random, if you search for brachiosaurus re-evaluation, the very first hit will be my self-hosted PDF of my 2009 JVP paper on that subject. [Correction: I now see that it's the third hit; the PDF of the correction is top.] Similarly, search for xenoposeidon pdf and the top hit is — get ready for a shock! — my self-hosted PDF of my 2007 Palaeontology paper on that subject.
So in fact, this is a fine demonstration of just how obsolete much of the work that publishers do has now become — all that indexing, abstracting and aggregation, work that used to be very important, but which is now done much faster, much better, for free, by computers and networks.
Really: what advantages accrue to me in having my Xenoposeidon paper available on Wiley’s site as well as mine? [It's paywalled on their site, so useless to 99% of potential visitors, but ignore that for now. Let's pretend it's freely available.] What else does that get me that Google’s indexing of my self-hosted PDF doesn’t?
Liz is quite rightly taking a break over the weekend, so she’s not yet replied to this; but Andy weighed in with some important points:
To address your final statement, I see three main advantages to having a PDF on a publisher’s site, rather than just a personal web page (this follows some of our Twitter discussion the other day, but I post it here just to have it in an alternative forum):
1) Greater permanence. Personal web pages (even with the best of intentions) have a history of non-permanence; there is no guarantee your site will be around 40 or 50 years from now. Just ask my Geocities page from 1998. Of course, there also is no guarantee that Wiley’s website will be around in 2073 either, but I think it’s safe to say there’s a greater likelihood that it will be around in some incarnation than a personal website.
2) Document security. By putting archiving in the hands of the authors, there is little to prevent them from editing out embarrassing details, or adding in stuff they wanted published but the reviewers told them to take out, or whatever. I’m not saying this is something that most people would do, but it is a risk of not having an “official” copy somewhere.
3) Combating author laziness. You have an excellent track record of making your work available, but most other authors do not, for various reasons.
It is also important to note that none of the above requirements needs a commercial publisher – in fact, they would arguably be better served by taking them out of the commercial sector. My main point is that self-hosting, although a short-term solution for distribution and archival, is not a long-term one.
Finally, just as a minor pedantic note, search results depend greatly on the search engine used. Baidu – probably the most popular search engine in China – doesn’t give your self-hosted PDF anywhere in its three pages of search results (neither does it give Wiley’s version, though).
And now, here is my long reply — the one that, when I’d finished it, made me want to post this as an article:
On permanence, there are a few things to say. One is that with the rate of mergers, web-site “upgrades” and suchlike I am actually far from confident that (say) the Wiley URL for my Xenoposeidon paper will last longer than my own. In fact, let’s make it a challenge! :-) If theirs goes away, you buy me a beer; if mine does, I buy you one! But I admit that, as an IT professional who’s been running a personal website since the 1990s — no Geocities for me! — I am not a typical case.
But the more important point is that it doesn’t matter. The Web doesn’t actually run on permanent addresses, it runs on what gets indexed. If I deleted my Xenoposeidon PDF today and put it up somewhere else — say, directly on SV-POW! — within a few days it would be indexed again, and coming out at or near the top of search-engine results. Librarians and publishers used to have a very important curation role — abstracting and indexing and all that — but the main reason they keep doing these things now is habit.
And that’s because of the wonderful loosely coupled nature of the Internet. Back when people first started posting research papers on the web, there were no search engines: CERN, famously, maintained a list of all the world’s web-sites. Search engines and crawlers as we know them today were never part of the original vision of the web: they were invented and put together from spare parts. And that is the glory of the open web. The people at Yahoo and AltaVista and Google didn’t need anyone’s permission to start crawling and indexing. They didn’t need to sign up to someone’s Developer Partnership Program and sign a non-disclosure form before they were allowed to see the API documentation, and then apply for an API Key that is good for up to 100 accesses per day. All these encumbrances apply when you try to access data in publishers’ silos. (Trust me: my day-job employers have just spent literally months trying to suck out of Elsevier the information that is necessary to use their crappy 2001-era SOAP-based web services to search metadata. Not even content.) And this is why I can’t get remotely excited about things like ScienceDirect and Scopus. Walled gardens can give us some specific functionality, sure, but they will always be limited by what the vendor thinks of, and what the vendor can turn a profit on. Whereas if you just shove things up on the open web, anyone can do anything with them.
With that said, your point about document security is well made — we do need some system for preventing people from tampering with versions of record. Perhaps something along the lines of the DOI register maintaining an MD5 checksum of the version-of-record PDF?
You are also right that not all authors will bother to post their PDFs — though frankly, heaven alone knows why not, when it takes five minutes to do something that will triple the accessibility of work you’ve spent a year on. This seems like an argument for repositories (whether institutional or subject-based) and mandatory deposition — e.g. as a condition of a grant.
Is that the same as the Green OA route? No, I want to see version-of-record PDFs reposited, not accepted manuscripts — for precisely the anti-tampering reason you mention above, among other reasons. Green OA is much, much better than nothing. But it’s not the real thing.
Finally: if Baidu lists neither my self-hosted Xenoposeidon PDF nor Wiley’s version anywhere in its first three pages of search results, then it is Just Plain Broken. I can’t worry about the existence of broken tools. Someone will make a better one and knock it off its perch, just like Google did to AltaVista.
And there, for the moment, matters stand. I’m sure that Liz and Andy, and hopefully others, will have more to say.
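For what it’s worth, the checksum idea floated in my reply above is easy to sketch. What follows is only an illustration, written under assumptions. No DOI registry is claimed to offer such a service, the function names checksum and matches_record are made up for the example, and the "registered" value simply stands in for whatever a registry might publish alongside a DOI. I use MD5 because that is what I suggested above; it is known to be vulnerable to deliberate collisions, so a real scheme would probably want something like SHA-256, which the same code handles by changing one argument.

```python
import hashlib


def checksum(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of `data` using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def matches_record(data: bytes, registered: str, algorithm: str = "md5") -> bool:
    """True if `data` hashes to the digest registered for the version of record."""
    return checksum(data, algorithm) == registered.lower()


# Stand-in bytes for a version-of-record PDF (hypothetical content).
version_of_record = b"%PDF-1.4 stand-in bytes for the published paper"

# Hypothetical value that a registry would store next to the DOI.
registered = checksum(version_of_record)

# A self-hosted copy that the author has quietly altered.
tampered = version_of_record + b" plus a paragraph the reviewers never saw"

print(matches_record(version_of_record, registered))  # True
print(matches_record(tampered, registered))           # False
```

Anyone downloading a self-hosted PDF could then recompute the digest locally and compare it against the registered one, without having to trust the site the file came from.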
One of the things I like about this is the way that a discussion that was originally about publisher behaviour mutated into one on the nature of the Open Web — really, where we ended up is nothing to do with Open Access per se. The bottom line is that free systems (and here I mean free-as-in-freedom, not zero-cost) don’t just open up more opportunities than proprietary ones, they open up more kinds of opportunities, including all kinds of ideas that the original group never even thought of.
And that, really — bringing it all back to where we started — is why I care about Open Access. Full, BOAI-compliant, Open Access. Not just so that people can read papers at zero cost (important though that is), but so that we and a million other groups around the world can use them to build things that we haven’t even thought of yet — things as far advanced beyond the current state of the art as Google is over CERN’s old static list of web-sites.