Last night, I did a Twitter interview with Open Access Nigeria (@OpenAccessNG). To make it easy to follow in real time, I created a list whose only members were me and OA Nigeria. But because Twitter lists posts in reverse order, and because each individual tweet is encumbered with so much chrome, it’s rather an awkward way to read a sustained argument.

So here is a transcript of those tweets, only lightly edited. They are in bold; I am in regular font. Enjoy!

So @MikeTaylor Good evening and welcome. Twitterville wants to meet you briefly. Who is Mike Taylor?

In real life, I’m a computer programmer with Index Data, a tiny software house that does a lot of open-source programming. But I’m also a research scientist — a vertebrate palaeontologist, working on sauropods: the biggest and best of the dinosaurs. Somehow I fit that second career into my evenings and weekends, thanks to a very understanding wife (Hi, Fiona!) …

As of a few years ago, I publish all my dinosaur research open access, and I regret ever having let any of my work go behind paywalls. You can find all my papers online, and read much more about them on the blog that I co-write with Matt Wedel. That blog is called Sauropod Vertebra Picture of the Week, or SV-POW! for short, and it is itself open access (CC By).

Sorry for the long answer, I will try to be more concise with the next question!

Ok @MikeTaylor That’s just great! There’s been so much noise around twitter, the orange colour featuring prominently. What’s that about?

Actually, to be honest, I’m not really up to speed with open-access week (which I think is what the orange is all about). I found a while back that I just can’t be properly on Twitter, otherwise it eats all my time. So these days, rather selfishly, I mostly only use Twitter to say things and get into conversations, rather than to monitor the zeitgeist.

That said, orange got established as the colour of open access a long time ago, and is enshrined in the logo:

[Image: the Open Access logo]

In the end I suppose open-access week doesn’t hit my buttons too strongly because I am trying to lead a whole open-access life.

… uh, but thanks for inviting me to do this interview, anyway! :-)

You’re welcome @MikeTaylor. So what is open access?

Open Access, or OA, is the term describing a concept so simple and obvious and naturally right that you’d hardly think it needs a name. It just means making the results of research freely available on the Internet for anyone to read, remix and otherwise use.

You might reasonably ask, why is there any kind of published research other than open access? And the only answer is, historical inertia. For reasons that seemed to make some kind of sense at the time, the whole research ecosystem has got itself locked into this crazy equilibrium where most published research is locked up where almost no-one can see it, and where even the tiny proportion of people who can read published works aren’t allowed to make much use of them.

So to answer the question: the open-access movement is an attempt to undo this damage, and to make the research world sane.

Are there factors perpetuating this inertia you talked about?

Oh, so many factors perpetuating the inertia. Let me list a few …

  1. Old-school researchers who grew up when it was hard to find papers, and don’t see why young whippersnappers should have it easier.
  2. Old-school publishers who have got used to making profits of 30-40% of turnover (they get content donated to them, then charge subscriptions).
  3. University administrators who make hiring/promotion/tenure decisions based on which old-school journals a researcher’s papers are in.
  4. Feeble politicians who think it’s important to keep the publishing sector profitable, even at the expense of crippling research.

I’m sure there are plenty of others who I’ve overlooked for the moment. I always say regarding this that there’s plenty of blame to go round.

(This, by the way, is why I called the current situation an equilibrium. It’s stable. Won’t fix itself, and needs to be disturbed.)

So these publishers who put scholarly articles behind paywalls online, do they pay the researchers for publishing their work?

HAHAHAHAHAHAHAHAHAHA!

Oh, sorry, please excuse me while I wipe the tears of mirth from my eyes. An academic publisher? Paying an author? Hahahahaha! No.

Not only do academic publishers never pay authors, in many cases they also levy page charges — that is, they charge the authors. So they get paid once by the author, in page charges, then again by all the libraries that subscribe to read the paywalled papers. Which of course is why, even with their gross inefficiencies, they’re able to make these 30-40% profit margins.

So @MikeTaylor why do many researchers continue to take their work to these restricted access publishers and what can we do about it?

There are a few reasons that play into this together …

Part of it is just habit, especially among more senior researchers who’ve been using the same journals for 20 or 30 years.

But what’s more pernicious is the tendency of academics — and even worse, academic administrators — to evaluate research not by its inherent quality, but by the prestige of the journal that publishes it. It’s just horrifyingly easy for administrators to say “He got three papers out that year, but they were in journals with low Impact Factors.”

Which is wrong-headed on so many levels.

First of all, they should be looking at the work itself, and making an assessment of how well it was done: rigour, clarity, reproducibility. But it’s much easier just to count citations, and say “Oh, this has been cited 50 times, it must be good!” But of course papers are not always cited because they’re good. Sometimes they’re cited precisely because they’re so bad! For example, no doubt the profoundly flawed Arsenic Life paper has been cited many times — by people pointing out its numerous problems.

But wait, it’s much worse than that! Lazy or impatient administrators won’t count how many times a paper has been cited. Instead they will use a surrogate: the Impact Factor (IF), which is a measure not of papers but of journals.

Roughly, the IF measures the average number of citations received by papers that are published in the journal. So at best it’s a measure of journal quality (and a terrible measure of that, too, but let’s not get into that). The real damage is done when the IF is used to evaluate not journals, but the papers that appear in them.
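[For concreteness, the standard two-year formula used by Journal Citation Reports is

\[
\mathrm{IF}_Y \;=\; \frac{\text{citations received in year } Y \text{ by items published in years } Y{-}1 \text{ and } Y{-}2}{\text{number of citable items published in years } Y{-}1 \text{ and } Y{-}2}.
\]

The numerator counts citations to everything the journal published, while the denominator counts only “citable items” such as research articles and reviews, which is one of the ways the number can be gamed.]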

And because that’s so widespread, researchers are often desperate to get their work into journals that have high IFs, even if they’re not OA. So we have an idiot situation where a selfish, rational researcher is best able to advance her career by doing the worst thing for science.

(And BTW, counter-intuitively, the number of citations an individual paper receives is NOT significantly correlated with the journal’s IF. Björn Brembs has discussed this extensively, and also shows that IF is correlated with retraction rate. So in many respects the high-IF journals are actually the worst ones you can possibly publish your work in. Yet people feel obliged to.)

*pant* *pant* *pant* OK, I had better stop answering this question, and move on to the next. Sorry to go on so long. (But really! :-) )

This is actually all so enlightening. You just criticised Citation Index along with Impact Factor but OA advocates tend to hold up a higher Citation Index as a reason to publish Open Access. What do you think regarding this?

I think that’s realpolitik. To be honest, I am also kind of pleased that the PLOS journals have pretty good Impact Factors: not because I think the IFs mean anything, but because they make those journals attractive to old-school researchers.

In the same way, it is a well-established fact that open-access articles tend to be cited more than paywalled ones — a lot more, in fact. So in trying to bring people across into the OA world, it makes sense to use helpful facts like these. But they’re not where the focus is.

But the last thing to say about this is that even though raw citation-count is a bad measure of a paper’s quality, it is at least badly measuring the right thing. Evaluating a paper by its journal’s IF is like judging someone by the label of their clothes.

So @MikeTaylor Institutions need to stop evaluating research papers based on where they are published? Do you know of any doing it right?

I’m afraid I really don’t know. I’m not privy to how individual institutions do things.

All I know is, in some countries (e.g. France) abuse of IF is much more strongly institutionalised. It’s tough for French researchers.

What are the various ways researchers can make their work available for free online?

Brilliant, very practical question! There are three main answers. (Sorry, this might go on a bit …)

First, you can post your papers on preprint servers. The best known is arXiv, which now accepts papers from quite a broad subject range: for example, a preprint of one of the papers I co-wrote with Matt Wedel is freely available there. Other preprint servers include bioRxiv, PeerJ Preprints, and SSRN (the Social Science Research Network).

You can put your work on a preprint server whatever your subsequent plans are for it — even if (for some reason) it’s going to a paywall. There are only a very few journals left that follow the “Ingelfinger rule” and refuse to publish papers that have been preprinted.

So preprints are option #1. Number 2 is Gold Open Access: publishing in an open-access journal such as PLOS ONE, a BMC journal or eLife. As a matter of principle, I now publish all my own work in open-access journals, and I know lots of other people who do the same — ranging from amateurs like me, via early-career researchers like Erin McKiernan, to lab-leading senior researchers like Michael Eisen.

There are two potential downsides to publishing in an OA journal. One, we already discussed: the OA journals in your field may not be the most prestigious, so depending on how stupid your administrators are you could be penalised for using an OA journal, even though your work gets cited more than it would have done in a paywalled journal.

The other potential reason some people might want to avoid using an OA journal is Article Processing Charges (APCs). Because OA publishers have no subscription revenue, one common business model is to charge authors an APC for publishing services instead. APCs can vary wildly, from $0 up to $5000 in the most extreme case (a not-very-open journal run by the AAAS), so they can be off-putting.

There are three things to say about APCs.

First, remember that lots of paywalled journals demand page charges, which can cost more!

But second, please know that more than half of all OA journals actually charge no APC at all. They run on different models. For example in my own field, Acta Palaeontologica Polonica and Palaeontologia Electronica are well respected OA journals that charge no APC.

And the third thing is APC waivers. These are very common. Most OA publishers have it as a stated goal that no-one should be prevented from publishing with them by lack of funds for APCs. So for example PLOS will nearly always give a waiver when requested. Likewise Ubiquity, and others.

So there are lots of ways to have your work appear in an OA journal without paying for it to be there.

Anyway, all that was about the second way to make your work open access. #1 was preprints, #2 is “Gold OA” in OA journals …

And #3 is “Green OA”, which means publishing in a paywalled journal, but depositing a copy of the paper in an open repository. The details of how this works can be a bit complicated: different paywall-based publishers allow you to do different things, e.g. it’s common to say “you can deposit your peer-reviewed, accepted but unformatted manuscript, but only after 12 months”.

Opinions vary as to how fair or enforceable such rules are. Some OA advocates prefer Green. Others (including me) prefer Gold. Both are good.

See this SV-POW! post on the practicalities of negotiating Green OA if you’re publishing behind a paywall.

So to summarise:

  1. Deposit preprints
  2. Publish in an OA journal (getting a fee waiver if needed)
  3. Deposit postprints

I’ve written absolutely shedloads on these subjects over the last few years, including this introductory batch. If you only read one of my pieces about OA, make it this one: The parable of the farmers & the Teleporting Duplicator.

Last question – Do restricted access publishers pay remuneration to peer reviewers?

I know of no publisher that pays peer reviewers. But actually I am happy with that. Peer review is a service to the community. As soon as you encumber it with direct financial incentives, things get more complicated and there’s more potential for conflict of interest. What I do is, I only perform peer review for open-access journals. And I am happy to put that time and effort in knowing the world will benefit.

And so we bring this edition to a close. We say a big thanks to our special guest @MikeTaylor who’s been totally awesome and instructive.

Thanks, it’s been a privilege.

I am just about out of patience with academic departments putting up endless idiot arguments about open access.

Bottom line: we pay you good money out of the public purse to do a highly desirable job where you get to work on what you love — jobs that have tens or dozens of candidates for every post. That job is: make new knowledge for the world. Not just for you and a few of your mates: for the world. If you’re not prepared to do that, then get the heck out of the job, and vacate a position for someone who will actually do what we pay them for.

Sheesh. I try to be understanding, I really do. But all this “Oh, oh, it’s not like it used to be in the old days” whining has worn me down. No, it’s not like it was in the old days, when you got paid to play, with nothing expected in return. Earn your damned keep, or get out of the road.

(And, yes, this is a toned down version of the comment I originally composed in my head.)

[Originally posted as a comment at The Guardian.]

Regulars will remember that nearly two years ago, I reviewed a paper for the Royal Society’s journal Biology Letters, recommended acceptance with only trivial changes (as did both other reviewers) and was astonished to see that it was rejected outright. There was an invitation to resubmit, with wording that made it clear that the resubmission would be treated as a brand new manuscript; but when the “resubmission” was made, it was accepted almost immediately without being sent to reviewers at all — proving that it was in fact a minor revision.

What’s worse, the published version gives the dates “Received August 21, 2012. Accepted September 13, 2012”, for a submission-to-acceptance time of just 23 days. But my review was done before August 21. This is a clear falsification of the true time taken to process the manuscript, a misrepresentation unworthy of the Royal Society, and one which provoked Matt and me to declare that we would no longer provide peer review for the Society until they fix this.

By the way, we should be clear that the Royal Society is not the only publisher that does this. For example, one commenter had had the same experience with Molecular Ecology. Misreporting the submission/revision cycle like this works to publishers’ benefit in two ways: it makes them look faster than they really are, and makes the rejection rate look higher (which a lot of people still use as a proxy for prestige).

To the Society’s credit, they were quick to get in touch, and I had what at the time seemed like a fruitful conversation with Dr Stuart Taylor, their Commercial Director. The result was that they made some changes:

  • Editors now have the additional decision option of ‘revise’. This provides a middle way between ‘reject and resubmit’ and ‘accept with minor revisions’. [It's hard to believe this didn't exist before, but I guess it's so.]
  • The Society now publicises ‘first decision’ times rather than ‘first acceptance’ times on their website.

As I noted at the time, while this is definitely progress, it doesn’t (yet) fix the problem.

A few days ago, I checked whether things have improved by looking at a recent article, and was disappointed to see that they had not. I posted two tweets pointing this out.

Again, I want to acknowledge that the Royal Society is taking this seriously: less than a week later I heard from Phil Hurst at the Society:

I was rather surprised to read your recent tweets about us not fixing this bug. I thought it was resolved to your satisfaction.

I replied:

Because newly published articles still only have two dates (submitted and accepted), it’s impossible to tell whether the “submitted” date is that of the original submission (which would be honest) or that of the revision, styled “a new submission” even though it’s not, that follows a “reject and resubmit” verdict.

Also: if the journals are still issuing “reject and resubmit” and then accepting the supposed new submissions without sending them out for peer-review (I can’t tell whether this is the case) then that is also wrong.

Sorry to be so hard to satisfy :-) I hope you will see and agree that it comes from a desire to have the world’s oldest scientific society also be one that leads the way in transparency and honesty.

And Phil’s response (which I quote with his kind permission):

I feel the changes we have made provide transparency.

Now that the Editors have the ‘revise’ option, this revision time is now incorporated in the published acceptance times. If on the other hand the ‘reject and resubmit’ option is selected, the paper has clearly been rejected and the author may or may not re-submit. Clearly if a paper had been rejected from another journal and then submitted to us, we would not include the time spent at that journal, so I feel our position is logical.

We only advertise the average ‘receipt to first decision’ time. As stated previously, we feel this is more meaningful as it gives prospective authors an indication of the time, irrespective of decision.

After all that recapitulation, I am finally in a position to lay out what the problems are, as I perceive them, in how things currently stand.

  1. Even in recently published articles, only two dates are given: “Received May 13, 2014. Accepted July 8, 2014”. It’s impossible to tell whether the first of those dates is that of the original submission, or the “new submission” that is really a minor revision following a reject-and-resubmit verdict.
  2. It’s also impossible to tell what “receipt to first decision” time is in the journal’s statistics. Is “receipt” the date of the revision?
  3. We don’t know what the journals’ rejection rates mean. Do they include the rejections of articles that are in fact published a couple of weeks later?

So we have editorials like this one from 2012 that trumpet a rejection rate of 78% (as though wasting the time of 78% of their authors is something to be proud of), but we have no idea what that number represents. Maybe they reject all articles initially, then accept 44% of them immediately on resubmission, and call that a 22% acceptance rate. We just can’t tell.
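To spell out the arithmetic of that hypothetical, writing N for the number of distinct manuscripts: every paper would be counted twice, once as an original submission that gets “rejected” and once as a “new” resubmission, so

\[
\text{acceptance rate} \;=\; \frac{0.44\,N}{N + N} \;=\; 0.22,
\]

and the journal could report a 78% rejection rate even though 44% of the manuscripts it received ended up published.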

All of this uncertainty comes from the same root cause: the use of “reject and resubmit” to mean “accept with minor revisions”.

What can the Royal Society do to fix this? Here is one approach:

  1. Each article should report three dates instead of two: the date of initial submission, the date of resubmission, and the date of acceptance. Omitting the date of initial submission is actively misleading.
  2. For each of the statistics they report, add prose that is completely clear about what is being measured. In particular, be clear about what “receipt” means.

But a much better and simpler and more honest approach is just to stop issuing “reject and resubmit” verdicts for minor revisions. All the problems just go away then.

“Minor revisions” should mean “we expect the editor to be able to make a final decision based on the changes you make”.

“Major revisions” should mean “we expect to send the revised manuscript back out to the reviewers, so they can judge whether you’ve made the necessary changes”.

And “reject and resubmit” should mean “this paper is rejected. If you want to completely retool it and resubmit, feel free”. It is completely inappropriate to accept a resubmitted paper without sending it out to peer review: doing so unambiguously gives the lie to the claim in the decision letter that “The resubmission will be treated as a new manuscript”.

Come on, Royal Society. You’ve been publishing science since 1665. Three hundred and forty-nine years should be long enough to figure out what “reject” means. You’re better than this.

And once the Royal Society gets this fixed, it will become much easier to persuade other publishers who’ve been indulging in this shady practice to mend their ways, too.

Recently, I published an old manuscript of mine as a PeerJ Preprint.

I wrote this paper in 2003-4, and it was rejected without review when I submitted it back then. (For, I think, specious reasons, but that’s a whole nother discussion. Forget I mentioned it.)

I haven’t touched the manuscript since then (except to single-space it for submission as a preprint). It’s ten years old. That’s a problem because it’s an analysis of a database of dinosaur diversity, and as everyone knows, the rate of recognising new dinosaurs has gone through the roof. That’s the reason I never made any attempt to update and resubmit it: dinosaur diversity is a fast-moving target, and each time through the submit-reject cycle takes long enough for the data to be outdated.

So much for the history. Now the question: how should I cite this paper? Specifically, what date should I give it? If I cite it as from 2004, it will give the misleading impression that the paper has been available for ten years; but if I cite it as from 2014, it will imply that it’s been worked on at some point in the last ten years. Both approaches seem misleading to me.

At the moment, I am citing it as “Taylor (2014 for 2004)”, which seems to more or less capture what’s meant, but I don’t know whether it’s an established convention. Is there an established convention?

Related: where in my publications list should it appear? At present I am sorting it under 2014, since that’s when it came out; but should it be under 2004, when it was written? I guess publication date is the one to go for — after all, it’s not unusual even now for papers to spend a year or more in press, and it’s the later (publication) date that’s cited.

Help me out. How should this be done?


[Image: the arborization of science. Modified from an original SEM image of branching blood vessels, borrowed from http://blogs.uoregon.edu/artofnature/2013/12/03/fractal-of-the-week-blood-vessels/.]

I was reading a rant on another site about how pretentious it is for intellectuals and pseudo-intellectuals to tell the world about their “media diets” and it got me thinking–well, angsting–about my scientific media diet.

And then almost immediately I thought, “Hey, what am I afraid of? I should just go tell the truth about this.”

And that truth is this: I can’t tell you what forms of scientific media I keep up with, because I don’t feel like I am actually keeping up with any of them.

Papers – I have no systematic method of finding them. I don’t subscribe to any notifications or table of contents updates. Nor, to be honest, am I in the habit of regularly combing the tables of contents of any journals.

Blogs – I don’t follow any in a timely fashion, although I do check in with TetZoo, Laelaps, and a couple of others every month or two. Way back when we started SV-POW!, we made a command decision not to list any sites other than our own in the sidebar. At the time, that was because we didn’t want to have any hurt feelings or drama over who we did and didn’t include. But over time, a strong secondary motive to keep things this way is that we’re not forced to keep up with the whole paleo blogosphere, which long ago outstripped my capacity to even competently survey. Fortunately, those overachievers at Love in the Time of Chasmosaurs have a pretty exhaustive-looking set of links on their sidebar, so globally speaking, someone is already on that.

The contraction in my blog reading is a fairly recent thing. When TetZoo was on ScienceBlogs, I was over there all the time, and there were probably half a dozen SciBlogs that I followed pretty regularly and another dozen or so that I at least kept tabs on. But ScienceBlogs burned down the community I was interested in, and the Scientific American Blog Network is sufficiently ugly (in the UI sense) and reader-unfriendly to not be worth my dealing with it. So I am currently between blog networks–or maybe past my last one.

Social Media – I’m not on Twitter, and I tend to only log into Facebook when I get an interesting notice in my Gmail “Social” folder. Sometimes I’m not on FB for a week or two at a time. So I miss a lot of stuff that goes down there, including notices about new papers. I could probably fix that if I just followed Andy Farke more religiously.

What ends up happening – I mainly find papers relevant to specific projects as I execute those projects; each new project is a new front in my n-dimensional invasion of the literature. My concern is that in doing this, I tend to find the papers that I’m looking for, whereas the papers that have had the most transformative effect on me are the ones I was not looking for at the time.

Beyond that, I find out about new papers because the authors take it on themselves to include me when they email the PDF out to a list of potentially interested colleagues (and many thanks to all of you who are doing that!), or Mike, Darren, or Andy send it to me, or it turns up in the updates to my Google Scholar profile.

So far, this combination of ad hoc and half-assed methods seems to be working, although it does mean that I have unfairly outsourced much of my paper discovery to other people without doing much for them in return. When I say that it’s working, I mean that I don’t get review comments pointing out that I have missed important recent papers. I do get review comments saying that I need to cite more stuff,* but these tend to be papers that I already know of and maybe even cited already, just not in the right ways to satisfy the reviewers.**

* There is a sort of an arrow-of-inevitability thing here, in that reviewers almost always ask you to cite more papers rather than fewer. Only once ever have I been asked to cite fewer sources, and that is when I had submitted my dinosaur nerve paper (Wedel 2012) to a certain nameless anatomy journal that ended up not publishing it. One of the reviewers said that I had cited several textbooks and popular science books and that was poor practice, I should have cited primary literature. Apparently this subgenius did not realize that I was citing all of those popular sources as examples of publications that held up the recurrent laryngeal nerve of giraffes as evidence for evolution, which was part of the point that I was making: giraffe RLNs are overrated.

** My usual sin is that I mentally categorize papers in one or two pigeonholes and forget that a given paper also mentioned C and D in addition to saying a lot about A and B. It’s something that vexes me about some of my own papers. I put so much stuff into the second Sauroposeidon paper (Wedel et al. 2000b) that some of it has never been cited–although that paper has been cited plenty, it often does not come up in discussions where some of the data presented therein is relevant, I think because there’s just too much stuff in that paper for anyone (who cares about that paper less than I do) to hold in their heads. But that’s a problem to be explored in another post.

The arborization of science

Part of the problem with keeping up with the literature is just that there is so much more of it than there was even a few years ago. When I first got interested in sauropod pneumaticity back in the late 90s, you were pretty much up to speed if you’d read about half a dozen papers:

  • Seeley (1870), who first described pneumaticity in sauropods as such, even if he didn’t know what sauropods were yet;
  • Longman (1933), who first realized that sauropod vertebrae could be sorted into two bins based on their internal structures, which are crudely I-beam-shaped or honeycombed;
  • Janensch (1947), who wrote the first ever paper that was primarily about pneumaticity in dinosaurs;
  • Britt (1993), who first CTed dinosaur bones looking for pneumaticity, independently rediscovered Longman’s two categories, calling them ‘camerate’ and ‘camellate’ respectively, and generally put the whole investigation of dinosaur pneumaticity on its modern footing;
  • Witmer (1997), who provided what I think is the first compelling explanation of how and why skeletal pneumaticity works the way it does, using a vast amount of evidence culled from both living and fossil systems;
  • Wilson (1999), who IIRC was the first to seriously discuss the interplay of pneumaticity and biomechanics in determining the form of sauropod vertebrae.

Yeah, there you go: up until the year 2000, you could learn pretty much everything important that had been published on pneumaticity in dinosaurs by reading five papers and one dissertation. “Dinosaur pneumaticity” wasn’t a field yet. It feels like it is becoming one now. To get up to speed today, in addition to the above you’d need to read big swaths of the work of Roger Benson, Richard Butler, Leon Claessens, Pat O’Connor (including a growing body of work by his students), Emma Schachner (not on pneumaticity per se, but too closely related [and too awesome] to ignore), Daniela Schwarz, and Jeff Wilson (and his students), plus important singleton papers like Woodward and Lehman (2009), Cerda et al. (2012), Yates et al. (2012), and Fanti et al. (2013). Not to mention my own work, and some of Mike’s and Darren’s. And Andy Farke’s, and the rest of Witmer’s, if you’re into cranial pneumaticity. And still others if you care about pneumaticity in pterosaurs, which you should if you want to understand how–and, crucially, when–the anatomical underpinnings of ornithodiran pneumaticity evolved. Plus undoubtedly some I’ve forgotten–apologies in advance to the slighted, please prod me in the comments.

You see? If I actually listed all of the relevant papers by just the authors I named above, it would probably run to 50 or so papers. So someone trying to really come to grips with dinosaur pneumaticity now faces a task roughly equal to the one I faced in 1996 when I was first trying to grok sauropods. This is dim memory combined with lots of guesswork and handwaving, but I probably had to read about 50 papers on sauropods before I felt like I really knew the group. Heck, I read about a dozen on blood pressure alone.

(Note to self: this is probably a good argument for writing a review paper on dinosaur pneumaticity, possibly in collaboration with some of the folks mentioned above–sort of a McIntosh [1990] for the next generation.)

When I wrote the first draft of this post, I was casting about for a word to describe what is going on in science, and the first one that came to mind is “fragmentation”. But that’s not the right word–science isn’t getting more fragmented. If anything, it’s getting more interconnected. What it’s really doing is arborizing–branching fractally, like the blood vessels in the image at the top of this post. I think it’s pointless to opine about whether this is a good or bad thing. Like the existence of black holes and fuzzy ornithischians, it’s just a fact now, and we’d better get on with trying to make progress in this new reality.

How do I feel about all this, now that my little capillary of science has grown into an arteriole and threatens to become a full-blown artery? It is simultaneously exhilarating and worrying. Exhilarating because lots of people are discovering lots of cool stuff about my favorite system, and I have a lot more people to bounce ideas around with than I did when I started. Worrying because I feel like I am gradually losing my ability to keep tabs on the whole thing. Sound familiar?

Conclusion: Help a brother out

Having admitted all of this, it seems imperative that I get my act together and establish some kind of systematic new-paper-discovery method, beyond just sponging off my friends and hoping that they’ll continue to deliver everything I need. But it seems inevitable that I am either going to have to become more selective about what I consume–which sounds both stupid and depressing–or lose all of my time just trying to keep up with things.

Hi, I’m Matt. I just arrived here in Toomuchnewscienceistan. How do you find your way around?


As recently noted, it was my pleasure and privilege on 25 June to give a talk at the ESOF2014 conference in Copenhagen (the EuroScience Open Forum). My talk was one of four, followed by a panel discussion, in a session on the subject “Should science always be open?”.


I had just ten minutes to lay out the background and the problem, so it was perhaps a bit rushed. But you can judge for yourself, because the whole session was recorded on video. The image is not the greatest (it’s hard to make out the slides) and the audio is also not all it could be (the crowd noise is rather loud). But it’s not too bad, and I’ve embedded it below. (I hope the conference organisers will eventually put out a better version, cleaned up by video professionals.)

Subbiah Arunachalam (from Arun, Chennai, India) asked me whether the full text of the talk was available — the echoey audio is difficult for non-native English speakers. It wasn’t, but I’ve since typed out a transcript of what I said (editing only to remove “er”s and “um”s), and that is below. Finally, you may wish to follow the slides rather than the video: if so, they’re available in PowerPoint format and as a PDF.

Enjoy!

It’s very gracious of you all to hold this conference in English; I deeply appreciate it.

“Should science always be open?” is our question, and I’d like to open with one of the greatest scientists there’s ever been, Isaac Newton, who humility didn’t come naturally to. But he did manage to say this brilliant humble thing: “If I have seen further, it’s by standing on the shoulders of giants.”

And the reason I love this quote is not just because it’s insightful in itself, but because he stole it from something John of Salisbury said right back in 1159. “Bernard of Chartres used to say that we were like dwarfs seated on the shoulders of giants. If we see more and further than they, it is not due to our own clear eyes or tall bodies, but because we are raised on high and upborne by their gigantic bigness.”

Well, so Newton — I say he stole this quote, but of course he did more than that: he improved it. The original is long-winded, it goes around the houses. But Newton took that, and from that he made something better and more memorable. So in doing that, he was in fact standing on the shoulders of giants, and seeing further.

And this is consistently where progress comes from. It’s very rare that someone who’s locked in a room on his own thinking about something will have great insights. It’s always about free exchange of ideas. And we see this happening in lots of different fields.

Over the last ten or fifteen years, we’ve seen enormous advances in the kinds of things computers working in networks can do. And that’s come from the culture of openness in APIs and protocols, in Silicon Valley and elsewhere, where these things are designed.

Going back further and in a completely different field, the Impressionist painters of Paris lived in a community where they were constantly — not exactly working together, but certainly nicking each other’s ideas, improving each other’s techniques, feeding back into this developing sense of what could be done. Resulting in this fantastic art.

And looking back yet further, Florence in the Renaissance was a seat of all sorts of advances in the arts and the sciences. And again, because of this culture of many minds working together, and yielding insights and creativity that would not have been possible with any one of them alone.

And this is because of network effects; or Metcalfe’s Law expresses this by saying that the value of a network is proportional to the square of the number of nodes in that network. So in terms of scientific research, what that means is that if you have a corpus of published research output, of papers, then the value of that goes — it doesn’t just increase with the number of papers, it goes up with the square of the number of papers. Because the value isn’t so much in the individual bits of research, but in the connections between them. That’s where great ideas come from. One researcher will read one paper from here and one from here, and see where the connection or the contradiction is; and from that comes the new idea.
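[To gloss the arithmetic, as a note added to the transcript rather than part of the talk as delivered: a network of n nodes has

\[
\binom{n}{2} \;=\; \frac{n(n-1)}{2} \;\approx\; \frac{n^2}{2}
\]

possible pairwise connections, so doubling the corpus of openly available papers roughly quadruples the number of potential connections between them.]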

So it’s very important to increase the size of the network of what’s available. And that’s why scientists particularly, but I think we can say researchers in other areas as well, have a very natural tendency to share.

Now until recently, the big difficulty we’ve had with sharing has been logistical. It was just difficult to make and distribute copies of pieces of research. So this [picture of a printing press] is how we made copies, this [picture of stacks of paper] was what we stored them on, and this was how we transmitted them from one researcher to another.

And they were not the most efficient means, or at least not as efficient as what we now have available. And because of that, and because of the importance of communication and the links between research, I would argue that maybe the most important invention of the last hundred years is the Internet in general and the World Wide Web in particular. And the purpose of the Web, as it was initially articulated in the first public post that Tim Berners-Lee made in 1991 — he explained not just what the Web was but what it was for, and he said: “The project started with the philosophy that much academic information should be freely available to anyone. It aims to allow information sharing within internationally dispersed teams, and the dissemination of information by support groups.”

So that’s what the Web is for; and here’s why it’s important. I’m quoting here from Cameron Neylon, who’s great at this kind of thing. And again it comes down to connections, and I’m just going to read out loud from his blog: “Like all developments of new communication networks, SMS, fixed telephones, the telegraph, the railways, and writing itself, the internet doesn’t just change how well we can do things, it qualitatively changes what we can do.” And then later on in the same post: “At network scale the system ensures that resources get used in unexpected ways. At scale you can have serendipity by design, not by blind luck.”

Now that’s a paradox; it’s almost a contradiction, isn’t it? Serendipity by definition is what you get by blind luck. But the point is, when you have enough connections — enough papers floating around the same open ecosystem — all the collisions happening between them, it’s inevitable that you’re going to get interesting things coming out. And that’s what we’re aiming towards.

And of course it’s never been more important, with health crises, new diseases, the diminishing effectiveness of antibiotics, the difficulties of feeding a world of many billions of people, and the results of climate change. It’s not as though we’re short of significant problems to deal with.

So I love this Jon Foley quote. He said, “Your job” — as a researcher — “Your job is not to get tenure! Your job is to change the world”. Tenure is a means to an end, it’s not what you’re there for.

So this is the importance of publishing. Of course the word “publish” comes from the same root as the word “public”: to publish a piece of research means to make that piece of research public. And the purpose of publishing is to open research up to the world, and so open up the world itself.

And that’s why it’s so tragic when we run into this [picture of a paywalled paper]. I think we’ve all seen this at various times. You go to read a piece of research that’s valuable, that’s relevant to either the research you’re doing, or the job you’re doing in your company, or whatever it might be. And you run into this paywall. Thirty five dollars and 95 cents to read this paper. It’s a disaster. Because what’s happened is we’ve got a whole industry whose existence is to make things public, and who because of accidents of history have found themselves doing the exact opposite. Now no-one goes into publishing with the intent of doing this. But this is the unfortunate outcome.

So what we end up with is a situation where we’re re-imposing on the research community barriers that were necessarily imposed by the inadequate technology of 20 or 30 years ago, but which we’ve now transcended in technological terms, yet are still struggling with for, frankly, commercial reasons. This is why we’re struggling with this.

And I don’t like to be critical, but I think we have to just face the fact that there is a real problem when organisations have for many years been making extremely high profits — these [36%, 32%, 34%, 42%] are the profit margins of the “big four” academic publishers, which together hugely dominate the scholarly publishing market — and as you can see, between 32% and 42% of their revenue is sheer profit. So every time your university library spends a dollar on subscriptions, something like 40% of that goes straight out of the system to nowhere.

And it’s not surprising that these companies are hanging on desperately to the business model that allows them to do that.

Now the problem we have in advocating for open access is that when we stand against publishers who have an existing very profitable business model, they can complain to governments and say, “Look, we have a market that’s economically significant, it’s worth somewhere in the region of 10-15 billion US dollars a year.” And they will say to governments, “You shouldn’t do anything that might damage this.” And that sounds effective. And we struggle to argue against that because we’re talking about an opportunity cost, which is so much harder to measure.

You know, I can stand here — as I have done — and wave my hands around, and talk about innovation and opportunity, and networks and connections, but it’s very hard to quantify in a way that can be persuasive to people in a numeric way. Say, they have a 15 billion dollar business, we’re talking about saving three trillion’s worth of economic value (and I pulled that number out of thin air). So I would love, if we can, when we get to the discussions, to brainstorm some way to quantify the opportunity cost of not being open. But this is what it looks like [picture of flooding due to climate change]. Economically I don’t know what it’s worth. But in terms of the world we live in, it’s just essential.

So we’ve got to remember the mission that we’re on. We’re not just trying to save costs by going to open access publishing. We’re trying to transform what research is, and what it’s for.

So should science always be open? Of course, the name of the session should have been “Of course science should always be open”.


Today, available for the first time, you can read my 2004 paper A survey of dinosaur diversity by clade, age, place of discovery and year of description. It’s freely available (CC By 4.0) as a PeerJ Preprint. It’s one of those papers that does exactly what it says on the tin — you should be able to find some interesting patterns in the diversity of your own favourite dinosaur group.

Taylor (2014 for 2004), Figure 1. Breakdown of dinosaur diversity by phylogeny. The number of genera included in each clade is indicated in parentheses. Non-terminal clades additionally have, in square brackets, the number of included genera that are not also included in one of the figured subclades. For example, there are 63 theropods that are neither carnosaurs nor coelurosaurs. The thickness of the lines is proportional to the number of genera in the clades they represent.

“But Mike”, you say, “you wrote this thing ten years ago?”

Yes. It’s actually the first scientific paper I ever wrote (bar some scraps of computer science), beginning in 2003. It’s so old that all the illustrations are grey-scale. I submitted it to Acta Palaeontologica Polonica way back on 24 October 2004 (three double-spaced hard-copies in the post!), but it was rejected without review. I was subsequently able to publish a greatly truncated version (Taylor 2006) in the proceedings of the 2006 Symposium on Mesozoic Terrestrial Ecosystems, but that was only one tenth the length of the full manuscript — much potentially valuable information was lost.

My finally posting this comes (as so many things seem to) from a conversation with Matt. Off work sick, he’d been amusing himself by re-reading old SV-POW! posts (yes, we do this). He was struck by my exhortation in Tutorial 14: “do not ever give a conference talk without immediately transcribing your slides into a manuscript”. He bemoaned how bad he’s been at following that advice, and I had to admit I’ve done no better, listing a sequence of my old SVPCA talks that have still never been published as papers.

The oldest of these was my 2004 presentation on dinosaur diversity. Commenting on this, I wrote in email: “OK, I got the MTE four-pager out of this, but the talk was distilled from a 40ish-page manuscript that was never published and never will be.” Quick as a flash, Matt replied:

If I had written this and sent it to you, you’d tell me to put it online and blog about how I went from idea to long paper to talk to short paper, to illuminate the process of science.

And of course he was right — hence this preprint.

Taylor (2014 for 2004), Figure 2. Breakdown of dinosaurian diversity by high-level taxa. “Other sauropodomorphs” are the “prosauropods” sensu lato. “Other theropods” include coelophysoids, neoceratosaurs, torvosaurs (= megalosaurs) and spinosaurs. “Other ornithischians” are basal forms, including heterodontosaurs and those that fall into Marginocephalia or Thyreophora but not into a figured subclade.

I will never update this manuscript, as it’s based on a now wildly outdated database and I have too much else happening. (For one thing, I really ought to get around to finishing up the paper based on my 2005 SVPCA talk!) So in a sense it’s odd to call it a “pre-print” — it’s not pre anything.

Despite the data being well out of date, this manuscript still contains much that is (I think) of interest, and my sense is that the ratios of taxon counts, if not the absolute numbers, are still pretty accurate.

I don’t expect ever to submit a version of this to a journal, so this can be considered the final and definitive version.
