Recently, I published an old manuscript of mine as a PeerJ Preprint.

I wrote this paper in 2003-4, and it was rejected without review when I submitted it back then. (For, I think, specious reasons, but that’s a whole nother discussion. Forget I mentioned it.)

I haven’t touched the manuscript since then (except to single-space it for submission as a preprint). It’s ten years old. That’s a problem because it’s an analysis of a database of dinosaur diversity, and as everyone knows, the rate of recognising new dinosaurs has gone through the roof. That’s the reason I never made any attempt to update and resubmit it: dinosaur diversity is a fast-moving target, and each time through the submit-reject cycle takes long enough for the data to be outdated.

So much for the history. Now the question: how should I cite this paper? Specifically, what date should I give it? If I cite it as from 2004, it will give the misleading impression that the paper has been available for ten years; but if I cite it as from 2014, it will imply that it’s been worked on at some point in the last ten years. Both approaches seem misleading to me.

At the moment, I am citing it as “Taylor (2014 for 2004)”, which seems to more or less capture what’s meant, but I don’t know whether it’s an established convention. Is there an established convention?

Related: where in my publications list should it appear? At present I am sorting it under 2014, since that’s when it came out; but should it be under 2004, when it was written? I guess publication date is the one to go for: after all, it’s not unusual even now for papers to spend a year or more in press, and it’s the later (publication) date that’s cited.

Help me out. How should this be done?

As recently noted, it was my pleasure and privilege on 25 June to give a talk at the ESOF2014 conference in Copenhagen (the EuroScience Open Forum). My talk was one of four, followed by a panel discussion, in a session on the subject “Should science always be open?”.

I had just ten minutes to lay out the background and the problem, so it was perhaps a bit rushed. But you can judge for yourself, because the whole session was recorded on video. The image is not the greatest (it’s hard to make out the slides) and the audio is also not all it could be (the crowd noise is rather loud). But it’s not too bad, and I’ve embedded it below. (I hope the conference organisers will eventually put out a better version, cleaned up by video professionals.)

Subbiah Arunachalam (from Arun, Chennai, India) asked me whether the full text of the talk was available, because the echoey audio is difficult for non-native English speakers. It wasn’t, but I’ve since typed out a transcript of what I said (editing only to remove “er”s and “um”s), and that is below. Finally, you may wish to follow the slides rather than the video: if so, they’re available in PowerPoint format and as a PDF.

Enjoy!

It’s very gracious of you all to hold this conference in English; I deeply appreciate it.

“Should science always be open?” is our question, and I’d like to open with one of the greatest scientists there’s ever been, Isaac Newton, who humility didn’t come naturally to. But he did manage to say this brilliant humble thing: “If I have seen further, it’s by standing on the shoulders of giants.”

And the reason I love this quote is not just because it’s insightful in itself, but because he stole it from something John of Salisbury said right back in 1159. “Bernard of Chartres used to say that we were like dwarfs seated on the shoulders of giants. If we see more and further than they, it is not due to our own clear eyes or tall bodies, but because we are raised on high and upborne by their gigantic bigness.”

Well, so Newton — I say he stole this quote, but of course he did more than that: he improved it. The original is long-winded, it goes around the houses. But Newton took that, and from that he made something better and more memorable. So in doing that, he was in fact standing on the shoulders of giants, and seeing further.

And this is consistently where progress comes from. It’s very rare that someone who’s locked in a room on his own thinking about something will have great insights. It’s always about free exchange of ideas. And we see this happening in lots of different fields.

Over the last ten or fifteen years, enormous advances in the kinds of things computers working in networks can do. And that’s come from the culture of openness in APIs and protocols, in Silicon Valley and elsewhere, where these things are designed.

Going back further and in a completely different field, the Impressionist painters of Paris lived in a community where they were constantly — not exactly working together, but certainly nicking each other’s ideas, improving each other’s techniques, feeding back into this developing sense of what could be done. Resulting in this fantastic art.

And looking back yet further, Florence in the Renaissance was a seat of all sorts of advances in the arts and the sciences. And again, because of this culture of many minds working together, and yielding insights and creativity that would not have been possible with any one of them alone.

And this is because of network effects; Metcalfe’s Law expresses this by saying that the value of a network is proportional to the square of the number of nodes in that network. So in terms of scientific research, what that means is that if you have a corpus of published research output, of papers, then the value of that doesn’t just increase with the number of papers, it goes up with the square of the number of papers. Because the value isn’t so much in the individual bits of research, but in the connections between them. That’s where great ideas come from. One researcher will read one paper from here and one from here, and see where the connection or the contradiction is; and from that comes the new idea.
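(A note added to the transcript, not part of the talk: here is a minimal sketch of the arithmetic behind that claim. It assumes that "connections" means distinct pairs of papers, which is one common way of reading Metcalfe's Law; the function names are made up for illustration.)

```python
def pairwise_connections(n_papers: int) -> int:
    """Count the distinct pairs of papers in a corpus: n * (n - 1) / 2."""
    return n_papers * (n_papers - 1) // 2


def metcalfe_value(n_papers: int) -> int:
    """Metcalfe-style value: proportional to n squared (constant taken as 1)."""
    return n_papers ** 2


if __name__ == "__main__":
    # Doubling the corpus roughly quadruples the number of possible
    # paper-to-paper connections, rather than merely doubling it.
    for n in (100, 200, 400):
        print(n, pairwise_connections(n), metcalfe_value(n))
```

The pair count is not exactly the square, but it grows at the same rate, which is all the argument needs.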

So it’s very important to increase the size of the network of what’s available. And that’s why we have a very natural tendency, I think among scientists particularly, but I think we can say researchers in other areas as well, have a natural tendency to share.

Now until recently, the big difficulty we’ve had with sharing has been logistical. It was just difficult to make and distribute copies of pieces of research. So this [picture of a printing press] is how we made copies, this [picture of stacks of paper] was what we stored them on, and this was how we transmitted them from one researcher to another.

And they were not the most efficient means, or at least not as efficient as what we now have available. And because of that, and because of the importance of communication and the links between research, I would argue that maybe the most important invention of the last hundred years is the Internet in general and the World Wide Web in particular. And the purpose of the Web, as it was initially articulated in the first public post that Tim Berners-Lee made in 1991 — he explained not just what the Web was but what it was for, and he said: “The project started with the philosophy that much academic information should be freely available to anyone. It aims to allow information sharing within internationally dispersed teams, and the dissemination of information by support groups.”

So that’s what the Web is for; and here’s why it’s important. I’m quoting here from Cameron Neylon, who’s great at this kind of thing. And again it comes down to connections, and I’m just going to read out loud from his blog: “Like all developments of new communication networks, SMS, fixed telephones, the telegraph, the railways, and writing itself, the internet doesn’t just change how well we can do things, it qualitatively changes what we can do.” And then later on in the same post: “At network scale the system ensures that resources get used in unexpected ways. At scale you can have serendipity by design, not by blind luck.”

Now that’s a paradox; it’s almost a contradiction, isn’t it? Serendipity by definition is what you get by blind luck. But the point is, when you have enough connections — enough papers floating around the same open ecosystem — all the collisions happening between them, it’s inevitable that you’re going to get interesting things coming out. And that’s what we’re aiming towards.

And of course it’s never been more important, with health crises, new diseases, the diminishing effectiveness of antibiotics, the difficulties of feeding a world of many billions of people, and the results of climate change. It’s not as though we’re short of significant problems to deal with.

So I love this Jon Foley quote. He said, “Your job” — as a researcher — “Your job is not to get tenure! Your job is to change the world”. Tenure is a means to an end, it’s not what you’re there for.

So this is the importance of publishing. Of course the word “publish” comes from the same root as the word “public”: to publish a piece of research means to make that piece of research public. And the purpose of publishing is to open research up to the world, and so open up the world itself.

And that’s why it’s so tragic when we run into this [picture of a paywalled paper]. I think we’ve all seen this at various times. You go to read a piece of research that’s valuable, that’s relevant to either the research you’re doing, or the job you’re doing in your company, or whatever it might be. And you run into this paywall. Thirty five dollars and 95 cents to read this paper. It’s a disaster. Because what’s happened is we’ve got a whole industry whose existence is to make things public, and who because of accidents of history have found themselves doing the exact opposite. Now no-one goes into publishing with the intent of doing this. But this is the unfortunate outcome.

So what we end up with is a situation where we’re re-imposing on the research community barriers that were necessarily imposed by the inadequate technology of 20 or 30 years ago, but which we’ve now transcended in technological terms but we’re still struggling with for, frankly, commercial reasons. This is why we’re struggling with this.

And I don’t like to be critical, but I think we have to just face the fact that there is a real problem when organisations, for many years have been making extremely high profits — these [36%, 32%, 34%, 42%] are the profit margins of the “big four” academic publishers which together hugely dominate the scholarly publishing market — and as you can see they’re in the range 32% to 42% of revenue, is sheer profit. So every time your university library spends a dollar on subscriptions, 40% of that goes straight out of the system to nowhere.

And it’s not surprising that these companies are hanging on desperately to the business model that allows them to do that.

Now the problem we have in advocating for open access is that when we stand against publishers who have an existing very profitable business model, they can complain to governments and say, “Look, we have a market that’s economically significant, it’s worth somewhere in the region of 10-15 billion US dollars a year.” And they will say to governments, “You shouldn’t do anything that might damage this.” And that sounds effective. And we struggle to argue against that because we’re talking about an opportunity cost, which is so much harder to measure.

You know, I can stand here — as I have done — and wave my hands around, and talk about innovation and opportunity, and networks and connections, but it’s very hard to quantify in a way that can be persuasive to people in a numeric way. Say, they have a 15 billion dollar business, we’re talking about saving three trillion’s worth of economic value (and I pulled that number out of thin air). So I would love, if we can, when we get to the discussions, to brainstorm some way to quantify the opportunity cost of not being open. But this is what it looks like [picture of flooding due to climate change]. Economically I don’t know what it’s worth. But in terms of the world we live in, it’s just essential.

So we’ve got to remember the mission that we’re on. We’re not just trying to save costs by going to open access publishing. We’re trying to transform what research is, and what it’s for.

So should science always be open? Of course, the name of the session should have been “Of course science should always be open”.

 

Today, available for the first time, you can read my 2004 paper A survey of dinosaur diversity by clade, age, place of discovery and year of description. It’s freely available (CC BY 4.0) as a PeerJ Preprint. It’s one of those papers that does exactly what it says on the tin: you should be able to find some interesting patterns in the diversity of your own favourite dinosaur group.

Taylor (2014 for 2004), Figure 1. Breakdown of dinosaur diversity by phylogeny. The number of genera included in each clade is indicated in parentheses. Non-terminal clades additionally have, in square brackets, the number of included genera that are not also included in one of the figured subclades. For example, there are 63 theropods that are neither carnosaurs nor coelurosaurs. The thickness of the lines is proportional to the number of genera in the clades they represent.

“But Mike”, you say, “you wrote this thing ten years ago?”

Yes. It’s actually the first scientific paper I ever wrote (bar some scraps of computer science) beginning in 2003. It’s so old that all the illustrations are grey-scale. I submitted it to Acta Palaeontologica Polonica way back on 24 October 2004 (three double-spaced hard-copies in the post!), but it was rejected without review. I was subsequently able to publish a greatly truncated version (Taylor 2006) in the proceedings of the 2006 Symposium on Mesozoic Terrestrial Ecosystems, but that was only one tenth the length of the full manuscript, and much potentially valuable information was lost.

My finally posting this comes (as so many things seem to) from a conversation with Matt. Off work sick, he’d been amusing himself by re-reading old SV-POW! posts (yes, we do this). He was struck by my exhortation in Tutorial 14: “do not ever give a conference talk without immediately transcribing your slides into a manuscript”. He bemoaned how bad he’s been at following that advice, and I had to admit I’ve done no better, listing a sequence of my old SVPCA talks that have still never been published as papers.

The oldest of these was my 2004 presentation on dinosaur diversity. Commenting on this, I wrote in email: “OK, I got the MTE four-pager out of this, but the talk was distilled from a 40ish-page manuscript that was never published and never will be.” Quick as a flash, Matt replied:

If I had written this and sent it to you, you’d tell me to put it online and blog about how I went from idea to long paper to talk to short paper, to illuminate the process of science.

And of course he was right — hence this preprint.

Taylor (2014 for 2004), Figure 2. Breakdown of dinosaurian diversity by high-level taxa. “Other sauropodomorphs” are the “prosauropods” sensu lato. “Other theropods” include coelophysoids, neoceratosaurs, torvosaurs (= megalosaurs) and spinosaurs. “Other ornithischians” are basal forms, including heterodontosaurs and those that fall into Marginocephalia or Thyreophora but not into a figured subclade.

I will never update this manuscript, as it’s based on a now wildly outdated database and I have too much else happening. (For one thing, I really ought to get around to finishing up the paper based on my 2005 SVPCA talk!) So in a sense it’s odd to call it a “pre-print” — it’s not pre anything.

Despite the data being well out of date, this manuscript still contains much that is (I think) of interest, and my sense is that the ratios of taxon counts, if not the absolute numbers, are still pretty accurate.

I don’t expect ever to submit a version of this to a journal, so this can be considered the final and definitive version.

 

JZool paleoethology special issue

Got this in my inbox this morning. I presume this means that the 30 days start now. But if you’re interested in this stuff, don’t tarry.

And you should be interested in this stuff. This volume brings together some very active and knowledgeable researchers–including our fellow SV-POW!sketeer, Darren Naish, and sometime coauthor Dave Hone–writing on a broad range of interesting topics under the umbrella of behavior.

Here’s the link.

In a couple of weeks (in the early afternoon of 25 June), I’ll be speaking at ESOF 2014 (the EuroScience Open Forum) in Copenhagen, Denmark. The session I’m part of is entitled “Should science always be open?”, and the irony is not lost on me that, as that page says, “You must be registered and signed in to download session materials.”

So here is the abstract for my talk — one of four in the session, to be followed by an open discussion.

Yes, of course science should always be open!

“If I have seen further it is by standing on the shoulders of giants”, said Isaac Newton. Since the earliest days of science, progress has always been achieved by the free exchange and re-use of ideas. Understanding this, scientists have always leaned in the direction of openness. Science outside of trade secrets and state secrets has a natural tendency to be open.

Until recently, the principal barrier to sharing science has been the logistic difficulty of printing and distributing copies of papers. The World Wide Web was originally designed to solve precisely this problem. By making research freely available worldwide, the Web doesn’t just change how well we can do things, it changes what we can do. As Cameron Neylon has observed, at network scale you achieve serendipity by design, not by blind luck. At a time when the world is in dire need of scientific breakthroughs, the removal of barriers and use of content-mining promises progress in health, climate, agriculture and other crucial areas.

So it’s nothing short of tragic when publishers — whose job it is to make research public — purposely erect barriers that prevent this. The iniquity of paywalls is not just that they prevent citizens from accessing work their taxes pay for. Much more fundamentally, paywalls deliberately destroy the incredible value that the Web creates.

Openness is indispensable simply because the opportunity cost of not being open is appalling and incalculable. Publishers must find business models that don’t break science, or they must go away.

The idea is to present this as slickly as possible in ten minutes, in a “TED-like” format. I might try to make a video of it here at home once I have it all straight in my mind, and all the slides done.

 

“In the public interest” is an article that was published in C&RL News back in July/August 2005. It’s Sharon Terry’s first-person account of being the parent of children with pseudoxanthoma elasticum (PXE), a genetic disease. It recounts her and her husband’s attempts to find out about PXE, and eventually to contribute to the research on it.

Here are the lengths they were driven to early in the process:

We spent hours copying articles from bound journals. But fees gate the research libraries of private medical schools. These fees became too costly for us to manage, and we needed to gain access to the material without paying for entry into the library each time.

We learned that by volunteering at a hospital associated with a research library, we could enter the library for free. After several months of this, policies changed and we resorted to masking our outdated volunteer badge and following a legitimate student (who would distract the guard) into the library. When that became too risky we knew we would have to find a way to access information in a more cost-effective and reasonable manner.

Did the arrival of PubMed change everything?

Today, ten years after our children’s diagnosis, I can use a wonderful, freely accessible tool created by the National Library of Medicine (NLM), called PubMed. I can call up bibliographic information on the hundreds of papers relative to PXE in a few seconds. Further, I can narrow the field to just a dozen papers on which I have been an author. Then, as I click on each article, I am not able to access any of them.

And so things continue much as before:

I am still forced to do end-runs around the system. I travel to libraries and photocopy. I hire students in large medical schools to go to the stacks and copy articles for me, I “borrow” the journal login information from colleagues.

Terry provides a prescient diagnosis of what enables this dysfunctional and exploitative system to continue — the acquiescence of researchers working under perverse incentives:

We see how the barriers to access to publicly funded science are part of a larger system that seems to place a higher value on prestigious publications, tenure, and continued public support than on ensuring the most rapid exchange of knowledge to ease human suffering

Towards the end comes this optimistic projection:

Fortunately, change is in the works. NIH Director Elias Zerhouni confirmed some months ago that the “status quo is unacceptable.” In fact, under his direction and endorsed by the U.S. House of Representatives, NIH has implemented a cost-effective and balanced policy that, for the first time, will make virtually all NIH-funded research free and accessible online to all Americans through the NLM.

Here we are, nine years later. PubMed Central proudly proclaims “3 MILLION Articles are archived in PMC” on its front page, which is great. Yet only in 2012 did its compliance rate reach 75% (having been at only 49% as recently as 2008). Which means that a quarter of NIH-funded research is still not available to the world.

There’s no need for me to add much commentary to this. Please go and read the original article to get the full sense of what it’s like for such parents. And check out the Who Needs Access? site for other (shorter) stories of non-academics who desperately need access to research.

 

[NOTE: see the updates at the bottom. In summary, there's nothing to see here and I was mistaken in posting this in the first place.]

Elsevier’s War On Access was stepped up last year when they started contacting individual universities to prevent them from letting the world read their research. Today I got this message from a librarian at my university:

[screen-grab of the takedown notification email]

The irony that this was sent from the Library’s “Open Access Team” is not lost on me. Added bonus irony: this takedown notification pertains to an article about how openness combats mistrust and secrecy. Well. You’d almost think NPG wants mistrust and secrecy, wouldn’t you?

It’s sometimes been noted that by talking so much about Elsevier on this blog, we can appear to be giving other barrier-based publishers a free ride. If we give that impression, it’s not deliberate. By initiating this takedown, Nature Publishing Group has identified itself as yet another so-called academic publisher that is in fact an enemy of science.

So what next? Anyone who wants a PDF of this (completely trivial) letter can still get one very easily from my own web-site, so in that sense no damage has been done. But it does leave me wondering what the point of the Institutional Repository is. In practice it seems to be a single point of weakness allowing “publishers” to do the maximum amount of damage with a single attack.

But part of me thinks the thing to do is take the accepted manuscript and format it myself in the exact same way as Nature did, and post that. Just because I can. Because the bottom line is that typesetting is the only actual service they offered Andy, Matt and me in exchange for our right to show our work to the world, and that is a trivial service.

The other outcome is that this hardens my determination never to send anything to Nature again. Now it’s not like my research program is likely to turn up tabloid-friendly results anyway, so this is a bit of a null resolution. But you never know: if I happen to stumble across sauropod feather impressions in an overlooked Wealden fossil, then that discovery is going straight to PeerJ, PLOS, BMC, F1000 Research, Frontiers or another open-access publisher, just like all my other work.

And that’s sheer self-interest at work there, just as much as it’s a statement. I will not let my best work be hidden from the world. Why would anyone?

Let’s finish with another outing for this meme-ready image.

Publishers ... You're doing it wrong

Update (four hours later)

David Mainwaring (on Twitter) and James Bisset (in the comment below) both pointed out that I’ve not seen an actual takedown request from NPG — just the takedown notification from my own library. I assumed that the library were doing this in response to hassle from NPG, but of course it’s possible that my own library’s Open Access Team is unilaterally trying to prevent access to the work of its university’s researchers.

I’ve emailed Lyn Duffy to ask for clarification. In the mean time, NPG’s Grace Baynes has tweeted:

So it looks like this may be even more bizarre than I’d realised.

Further bulletins as events warrant.

Update 2 (two more hours later)

OK, consensus is that I read this completely wrong. Matt’s comment below says it best:

I have always understood institutional repositories to be repositories for author’s accepted manuscripts, not for publisher’s formatted versions of record. By that understanding, if you upload the latter, you’re breaking the rules, and basically pitting the repository against the publisher.

Which is, at least, not a nice thing to do to the repository.

So the conclusion is: I was wrong, and there’s nothing to see here apart from me being embarrassed. That’s why I’ve struck through much of the text above. (We try not to actually delete things from this blog, to avoid giving a false history.)

My apologies to Lyn Duffy, who was just doing her job.

Update 3 (another hour later)

This just in from Lyn Duffy, confirming that, as David and James guessed, NPG did not send a takedown notice:

Dear Mike,

This PDF was removed as part of the standard validation work of the Open Access team and was not prompted by communication from Nature Publishing. We validate every full-text document that is uploaded to Pure to make sure that the publisher permits posting of that version in an institutional repository. Only after validation are full-text documents made publicly available.

In this case we were following the regulations as stated in the Nature Publishing policy about confidentiality and pre-publicity. The policy says, ‘The published version — copyedited and in Nature journal format — may not be posted on any website or preprint server’ (http://www.nature.com/authors/policies/confidentiality.html). In the information for authors about ‘Other material published in Nature’ it says, ‘All articles for all sections of Nature are considered according to our usual conditions of publication’ (http://www.nature.com/nature/authors/gta/others.html#correspondence). We took this to mean that material such as correspondence have the same posting restrictions as other material published by Nature Publishing.

If we have made the wrong decision in this case and you do have permission from Nature Publishing to make the PDF of your correspondence publicly available via an institutional repository, we can upload the PDF to the record.

Kind regards,
Open Access Team

Appendix

Here’s the text of the original notification email so search-engines can pick it up. (If you read the screen-grab above, you can ignore this.)

University of Bristol — Pure

Lyn Duffy has added a comment

Sharing: public databases combat mistrust and secrecy
Farke, A. A., Taylor, M. P. & Wedel, M. J. 22 Oct 2009 In : Nature. 461, 7267, p. 1053

Research output: Contribution to journal › Article

Lyn Duffy has added a comment 7/05/14 10:23

Dear Michael, Apologies for the delay in checking your record. It appears that the document you have uploaded alongside this record is the publishers own version/PDF and making this version openly accessible in Pure is prohibited by the publisher, as a result the document has been removed from the record. In this particular instance the publisher would allow you to make accessible the postprint version of the paper, i.e., the article in the form accepted for publication in the journal following the process of peer review. Please upload an acceptable version of the paper if you have one. If you have any questions about this please get back to us, or send an email directly to open-access@bristol.ac.uk Kind regards, Lyn Duffy Library Open Access Team.
