Recently, I published an old manuscript of mine as a PeerJ Preprint.

I wrote this paper in 2003-4, and it was rejected without review when I submitted it back then. (For, I think, specious reasons, but that’s a whole nother discussion. Forget I mentioned it.)

I haven’t touched the manuscript since then (except to single-space it for submission as a preprint). It’s ten years old. That’s a problem because it’s an analysis of a database of dinosaur diversity, and as everyone knows, the rate of recognising new dinosaurs has gone through the roof. That’s the reason I never made any attempt to update and resubmit it: dinosaur diversity is a fast-moving target, and each time through the submit-reject cycle takes long enough for the data to be outdated.

So much for the history. Now the question: how should I cite this paper? Specifically, what date should I give it? If I cite it as from 2004, it will give the misleading impression that the paper has been available for ten years; but if I cite it as from 2014, it will imply that it’s been worked on at some point in the last ten years. Both approaches seem misleading to me.

At the moment, I am citing it as “Taylor (2014 for 2004)”, which seems to more or less capture what’s meant, but I don’t know whether it’s an established convention. Is there an established convention?

Related: where in my publications list should it appear? At present I am sorting it under 2014, since that’s when it came out; but should it be under 2004, when it was written? I guess publication date is the one to go for. After all, it’s not unusual even now for papers to spend a year or more in press, and it’s the later (publication) date that’s cited.

Help me out. How should this be done?

I think it’s fair to say that this “bifurcation heat-map”, from Wedel and Taylor (2013a: figure 9), has been one of the best-received illustrations that we’ve prepared:

Wedel and Taylor (2013a: figure 9): the bifurcation heat-map.

(See comments from Jaime and from Mark Robinson.)

Back when the paper came out, Matt rashly said “Stand by for a post by Mike explaining how it came to be”, a post which has not materialised. Until now!

This illustration was (apart from some minor tweaking) produced by a program that I wrote for that purpose, snappily named “vcd2svg”. The name reflects what it does: it converts a vertebral column description (VCD) into a scalable vector graphics (SVG) file, which you can look at with a web-browser or load into an image editor for further processing.

The vertebral column description is in a format designed for this purpose, and I think it’s fairly intuitive. Here, for example, is the fragment describing the first three lines of the figure above:

Taxon: Apatosaurus louisae
Specimen: CM 3018
Data: -----YVVVVVVVVV|VVVuuunnn-

Taxon: Apatosaurus parvus
Specimen: UWGM 155556/CM 563
Data: --nnn-VVV---V-V|VVVu------

Taxon: Apatosaurus ajax
Specimen: NMST-PV 20375
Data: --n--VVVVVVVVVV|VVVVYunnnn

Basically, you draw little ASCII pictures of the vertebral column. Other directives in the file explain how to draw the various glyphs represented by (in this case) “Y”, “V”, “u”, and “n”.
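To make the idea concrete, here is a minimal Python sketch of reading records like the ones above and turning each Data string into a row of coloured squares in an SVG file. It is not the Perl SVG::VCD module or its API. The parsing rule (blank-line-separated records of `Key: value` lines), the colour assigned to each glyph, and the treatment of `-` as an empty position and `|` as a drawn divider are all assumptions for illustration, since the real glyph-drawing directives are not shown here. The sample repeats the three records above, written with plain hyphens.

```python
import re
from xml.sax.saxutils import escape

# The three records quoted above, written with plain ASCII hyphens.
VCD_SAMPLE = """\
Taxon: Apatosaurus louisae
Specimen: CM 3018
Data: -----YVVVVVVVVV|VVVuuunnn-

Taxon: Apatosaurus parvus
Specimen: UWGM 155556/CM 563
Data: --nnn-VVV---V-V|VVVu------

Taxon: Apatosaurus ajax
Specimen: NMST-PV 20375
Data: --n--VVVVVVVVVV|VVVVYunnnn
"""

# Hypothetical glyph colours: the real format defines glyphs via its own directives.
GLYPH_COLOURS = {"Y": "#b2182b", "V": "#ef8a62", "u": "#67a9cf", "n": "#2166ac"}


def parse_vcd(text):
    """Split blank-line-separated records into dicts of 'Key: value' fields."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        records.append(fields)
    return records


def to_svg(records, cell=12, label_width=200):
    """Draw one row per specimen: a text label, then one square per known glyph."""
    width = label_width + cell * max(len(r["Data"]) for r in records)
    height = cell * len(records)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for row, rec in enumerate(records):
        top = row * cell
        label = escape(f'{rec["Taxon"]} {rec["Specimen"]}')
        parts.append(f'<text x="0" y="{top + cell - 2}" font-size="{cell - 2}">{label}</text>')
        for col, glyph in enumerate(rec["Data"]):
            left = label_width + col * cell
            if glyph in GLYPH_COLOURS:
                parts.append(f'<rect x="{left}" y="{top}" width="{cell}" height="{cell}" '
                             f'fill="{GLYPH_COLOURS[glyph]}"/>')
            elif glyph == "|":
                mid = left + cell // 2
                parts.append(f'<line x1="{mid}" y1="{top}" x2="{mid}" y2="{top + cell}" '
                             f'stroke="black"/>')
            # Any other character (e.g. '-') is simply left blank in this sketch.
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(to_svg(parse_vcd(VCD_SAMPLE)))
```

Redirecting the printed output to a `.svg` file gives something a browser can open; the real program is considerably more flexible about how each glyph is drawn.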

It’s pretty flexible. We used the same program to generate the right-hand side (though not the phylogenetic tree) of Wedel and Taylor (2013b: figure 2):

Wedel and Taylor (2013b: Figure 2).

I mention this because I released the software today under the GNU General Public Licence v3.0, which is kind of like CC BY-SA. It’s free for anyone to download, use, modify and redistribute either verbatim or in modified form, subject only to attribution and the requirement that the same licence be used for modified versions.

vcd2svg is written in Perl, and implemented in part by the SVG::VCD module, which is included in the package. It’s available as a CPAN module and on GitHub. There’s documentation of the command-line vcd2svg program, and of the VCD file format. Also included in the distribution are two documented examples: the bifurcation heat-map and the caudal pneumaticity diagram.

Folks, please use it! And feel free to contribute, too: as the change-log notes, there’s work still to be done, and I’ll be happy to take pull requests from those of you who are programmers. And whether you’re a programmer or not, if you find a bug, or want a new feature, feel free to file an issue.

A final thought: in academia, you don’t really get credit for writing software. So to convert the work that went into this release into some kind of coin, I’ll probably have to write a short paper describing it, and let that stand as a proxy for the actual program. Hopefully people will cite that paper when they generate a figure using the software, the way we all reflexively cite Swofford every time we use PAUP*.

Update (12 April 2014)

On Vertebrat’s suggestion, I have renamed the program VertFigure.

It’s now widely understood among researchers that the impact factor (IF) is a statistically illiterate measure of the quality of a paper. Unfortunately, it’s not yet universally understood among administrators, who in many places continue to judge authors on the impact factors of the journals they publish in. They presumably do this on the assumption that impact factor is a proxy for, or predictor of, citation count, which in turn is assumed to correlate with influence.

As shown by Lozano et al. (2012), the correlation between IF and citations is in fact very weak (r² is about 0.2), and has been progressively weakening since the dawn of the Internet era and the consequent decoupling of papers from the physical journal that they appear in. This is a counter-intuitive finding: given that the impact factor is calculated from citation counts, you’d expect it to correlate much more strongly. But the enormous skew of citation rates towards a few big winners renders the average used by the IF meaningless.
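A toy calculation shows why a skewed distribution undermines an average. The citation counts below are invented for illustration, not taken from Lozano et al. (2012) or from any real journal; the point is only that when one paper dominates, the mean (which is what an IF-style figure reports) says little about a typical paper.

```python
from statistics import mean, median

# Invented citation counts for ten papers in one hypothetical journal:
# nine modestly cited papers and one runaway hit.
citations = [0, 1, 1, 2, 2, 3, 3, 4, 5, 129]

journal_average = mean(citations)   # the IF-style figure
typical_paper = median(citations)   # what the middle paper actually gets
above_average = sum(c > journal_average for c in citations)
top_share = max(citations) / sum(citations)

print(f"mean {journal_average}, median {typical_paper}")
print(f"{above_average} of {len(citations)} papers beat the mean; "
      f"the top paper holds {top_share:.0%} of all citations")
```

With these made-up numbers the mean is 15 citations but the median paper gets 2.5, and only one of the ten papers reaches the "journal average" at all.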

To bring this home, I plotted my own personal impact-factor/citation-count graph. I took Google Scholar’s citation counts for my articles (it recognises 17 of my papers), looked up the impact factors of the venues they appeared in, plotted citation count against impact factor, and calculated a best-fit line through my data-points. Here’s the result (taken from a slide in my Berlin 11 satellite conference talk):

Citation count against impact factor for my papers, with best-fit line.

I was delighted to see that the regression slope is actually negative: in my case at least, the higher the impact factor of the venue I publish in, the fewer citations I get.

There are a few things worth unpacking on that graph.

First, note the proud cluster on the left margin: publications in venues with impact factor zero (i.e. no impact factor at all). These include papers in new journals like PeerJ, in perfectly respectable established journals like PaleoBios, edited-volume chapters, papers in conference proceedings, and an arXiv preprint.

My most-cited paper, by some distance, is Head and neck posture in sauropod dinosaurs inferred from extant animals (Taylor et al. 2009, a collaboration between all three SV-POW!sketeers). That appeared in Acta Palaeontologica Polonica, a very well-respected journal in the palaeontology community, but one with a modest impact factor of 1.58.

My next most-cited paper, the Brachiosaurus revision (Taylor 2009), is in the Journal of Vertebrate Paleontology, unquestionably the flagship journal of our discipline despite its also unspectacular impact factor of 2.21. (For what it’s worth, I seem to recall it was about half that when my paper came out.)

In fact, none of my publications have appeared in venues with an impact factor greater than 2.21, with one trifling exception: what Andy Farke, Matt and I ironically refer to as our Nature monograph (Farke et al. 2009). It’s a 250-word letter to the editor on the subject of the Open Dinosaur Project. (It’s a subject that we now find profoundly embarrassing given how dreadfully slowly the project has progressed.)

Google Scholar says that our Nature note has been cited just once. But the truth is even better: that one citation is in fact from an in-prep manuscript that Google has dug up prematurely — one that we ourselves put on Google Docs, as part of the slooow progress of the Open Dinosaur Project. Remove that, and our Nature note has been cited exactly zero times. I am very proud of that record, and will try to preserve it by persuading Andy and Matt to remove the citation from the in-prep paper before we submit. (And please, folks: don’t spoil my record by citing it in your own work!)

What does all this mean? Admittedly, not much. It’s anecdote rather than data, and I’m posting it more because it amuses me than because it’s particularly persuasive. In fact, if you remove the anomalous data point that is our Nature monograph, the slope becomes positive, although that is basically meaningless too, given that all my other publications cluster in the 0–2.21 range. But then that’s the point: pretty much any data based on impact factors is meaningless.
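To see how a single far-out point can set the sign of a best-fit line, here is a plain least-squares slope in Python applied to invented (impact factor, citations) pairs. These are not my real data points: the values, including the lone high-IF paper with one citation, are made up purely to mimic the shape of the situation described above.

```python
def ols_slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Invented (impact factor, citation count) pairs: a cluster of low-IF papers...
typical = [(0.0, 10), (0.0, 5), (0.0, 20), (1.0, 8), (1.58, 40), (2.21, 30)]
# ...plus one hypothetical high-IF letter that is almost never cited.
outlier = (36.0, 1)

print(f"with outlier:    slope = {ols_slope(typical + [outlier]):+.2f}")
print(f"without outlier: slope = {ols_slope(typical):+.2f}")
```

With these toy numbers the first slope comes out negative and the second positive: one high-leverage point decides the direction of the line, which is why a fit over data this sparse tells you so little.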

As things stand there are two principal types of written communication in science: papers and blog posts. We’ve discussed the relative merits of formally published papers and more informal publications such as blog-posts a couple of times, but perhaps never really dug into what the differences are between them.

Matt and I have been discussing this offline, and at one point Matt suggested that authorial intent is one of the key differences. When we write and submit a paper, we are sending a different message from when we post on a blog.

That’s true — at least in general, although there are edge-cases such as the formal research paper that Zen Faulkes recently posted as an entry on his blog. But even when it’s true, I’m not sure it’s relevant. As Matt pointed out, authorial intent ceases to be a factor once something is published. The audience will read it how they like and do with it what they want. So I think we need to consider the paper-vs.-blog-post question in terms of the artifact itself, and discount what the author intended.

When we do that, what differences do we see? Generalising, we find that:

  • Papers are PDF while blog-posts are HTML. (That’s not quite a trivial distinction: PDFs have less clutter.)
  • Blog-posts allow and invite comments, but papers do not.
  • Blog-posts are part of an ongoing discussion whereas papers are stand-alone.
  • Papers are archived on publisher sites, whereas blog-posts are on blogs, which may be more vulnerable or ephemeral.
  • Papers are immutable once published, whereas blog-posts can be edited after initial publication.
  • Papers are peer-reviewed, while blog-posts are not.
  • Blog-posts are fast, but papers are slow.

Which of these are important? Which count as wins for papers and which as wins for blog-posts? Which of them are tied together with each other? Which are fundamentally properties of the medium, and which are associated with it only by tradition?

Comments, please!

Posting palaeo papers on arXiv

September 28, 2012

Over on Facebook, where Darren posted a note about our new paper, most of the discussion has not been about its content but about where it was published. We’re not too surprised by that, even though we’d love to be talking about the science. We did choose arXiv with our eyes open, knowing that there’s no tradition of palaeontology being published there, and wanting to start a new tradition of palaeontology being routinely published there. Having now made the step for the first time, I see no reason ever to not post a paper on arXiv, as soon as it’s ready, before — or maybe even instead of — submitting it to a journal.

(Instead of? Maybe. We’ll discuss that below.)

The key issue is this: science isn’t really science until it’s out there where it can be used. We wrote the bulk of the neck-anatomy paper back in 2008, the year that we first submitted it to a journal. In the four years since then, all the observations and deductions that it contains have been unavailable to the world. And that is stupid. The work might just as well never have been done. Now that it’s on arXiv, that’s over. I was delighted to get an email less than 24 hours after the paper was published, from an author working on a related issue, thanking us for posting the paper, saying that he will now revise his own in-prep manuscript in light of its findings, and cite our paper. Which of course is the whole point: to get our science out there where it can do some damage.

Because the alternative is horrible, really. Horribly wasteful, horribly dispiriting, horribly retarding for science. For example, a couple of weeks ago in his SVPCA talk, David Norman was lamenting again that he never got around to publishing the iguanodont systematic work that was in his dissertation, I-don’t-know-how-many-years-ago. The result of that interminable delay is that others have done other, conflicting iguanodont systematic work, and Norman is now trying belatedly to undo that and bring his own perspective. A terrible and unnecessary slowing of ornithopod science, and a waste of duplicated effort. (Thankfully it’s only ornithopods.)

And of course David Norman is very far from being alone. Pretty much any palaeontologist you talk to will tell you of a handful of papers (many more in some cases) that were finished many years previously but have never seen the light of day. (I still have a couple myself, but there is no point in resurrecting them now because progress has overtaken them.) I wonder what proportion of all Ph.D. work ever sees the light of day? Half? Less? It’s crazy.

Figure 8. Sauropod cervical vertebrae showing anteriorly and posteriorly directed spurs projecting from neurapophyses. 1, cervical 5 of Sauroposeidon holotype OMNH 53062 in right lateral view, photograph by MJW. 2, cervical 9 of Mamenchisaurus hochuanensis holotype CCG V 20401 in left lateral view, reversed, from photograph by MPT. 3, cervical 7 or 8 of Omeisaurus junghsiensis Young, 1939 holotype in right lateral view, after Young (1939, figure 2). (No specimen number was assigned to this material, which has since been lost. D. W. E. Hone personal communication, 2008.)

Publish now, publish later

So, please folks: we all need to be posting our work on preprint servers as soon as we consider it finished. It doesn’t mean that the posted versions can’t subsequently be obsoleted by improved versions that have gone through peer-review and been published in conventional journals. But it does mean that the world can know about the work, and build on it, and get the benefit of it, as soon as it’s done.

You see, we have a very fundamental problem in academia: publishing fulfils two completely separate roles. Its primary role (or at least the role that should be primary) is to make work available to the community; the secondary role is to provide a means of keeping score — something that can be used when making decisions about who to appoint to jobs, when to promote, who gets grants, who gets tenure and so on. I am not going to argue that the latter shouldn’t happen at all — clearly a functioning community needs some way to infer the standing of its participants. But I do think it’s ridiculous when the bean-counting function of publication trumps the actual publication role of publication. Yet we’ve all been in a position where we have essentially complete work that could easily go on a blog, or in the PalAss newsletter, or in a minor journal, or somewhere — but we hang onto it because we want to get it into a Big Journal.

Let me say again that I do realise how unusual and privileged my own position is: that a lot of my colleagues do need to play the Publication Prestige game for career reasons (though it terrifies me how much time some colleagues waste squeezing their papers into two-and-a-half-page format in the futile hope of rolling three sixes on the Science ‘n’ Nature 3D6). Let’s admit right now that most palaeontologists do need to try to get their work into Proc B, or Paleobiology, or what have you. Fair enough. They should feel free. But the crucial point is this: that is no reason not to post pre-prints so we can all get on with actually benefitting from your work in the meantime.

Actually, I feel pretty stupid that it’s taken me this long to realise that all my work should go up on arXiv.

Figure 11. Archosaur cervical vertebrae in posterior view, showing muscle attachment points in phylogenetic context. Blue arrows indicate epaxial muscles attaching to neural spines, red arrows indicate epaxial muscles attaching to epipophyses, and green arrows indicate hypaxial muscles attaching to cervical ribs. While hypaxial musculature anchors consistently on the cervical ribs, the principal epaxial muscles migrate from the neural spine in crocodilians to the epipophyses in non-avial theropods and modern birds, with either or both sets of muscles being significant in sauropods. 1, fifth cervical vertebra of Alligator mississippiensis, MCZ 81457, traced from 3D scans by Leon Claessens, courtesy of MCZ. Epipophyses are absent. 2, eighth cervical vertebra of Giraffatitan brancai paralectotype HMN SII, traced from Janensch (1950, figures 43 and 46). 3, eleventh cervical vertebra of Camarasaurus supremus, reconstruction within AMNH 5761/X, “cervical series I”, modified from Osborn and Mook (1921, plate LXVII). 4, fifth cervical vertebra of the abelisaurid theropod Majungasaurus crenatissimus, UA 8678, traced from O’Connor (2007, figures 8 and 20). 5, seventh cervical vertebra of a turkey, Meleagris gallopavo, traced from photographs by MPT.

Exceptions?

So are there any special cases? Any kinds of papers that we should keep dry until they make it into actual journals? I can think of two classes that you could argue for — one of them convincingly, the other not.

First, the unconvincing one. When I discussed this with Matt (and half the fun of doing that is that usually neither of us really knows what we think about this stuff until we’re done arguing it through), he suggested to me that we couldn’t have put the Brontomerus paper on arXiv, because that would have leaked the name, creating a nomen nudum. My initial reaction was to agree with him that this is an exception. But when I thought about it a bit more, I realised there’s actually no compelling reason not to post such a paper on arXiv. So you create a nomen nudum? So what? Really: what is the negative consequence of that? I can’t think of one. OK, the name will appear on Wikipedia and mailing lists before the ICZN recognises it — but who does that hurt? No-one that I can think of. The only real argument against posting is that it could invite scooping. But is that a real threat? I doubt it. I can’t think of anyone who would be barefaced enough to scoop a taxon that had already been published on arXiv — and if they did, the whole world would know unambiguously exactly what had happened.

So what is the one real reason not to post a preprint? I think that might be a legitimate choice when publicity needs to be co-ordinated. So while nomenclatural issues should not have stopped us from arXiving the Brontomerus paper, publicity should. In preparation for that paper’s publication day, we did a lot of careful work with the UCL publicity team: writing non-specialist summaries, press-releases and FAQs, soliciting and preparing illustrations and videos, circulating materials under embargo, and so on. In general, mainstream media are only interested in a story if it’s news, and that means you need to make sure it’s new when they first hear about it. Posting the article in advance on a publicly accessible archive would mess that up, and probably damage the work’s coverage in the press, TV and radio.

Publication venues are a continuum

It’s become apparent to us only gradually that there’s really no clear cut-off where a paper becomes “properly published”. There’s a continuum that runs from least to most formal and exclusive:

SV-POW! — arXiv — PLOS ONE — JVP — Nature

1. On SV-POW!, we write what we want and publish it when we want. We can promise you that it won’t go away, but you only have our word for it. But some of what we write here is still science, and has been cited in papers published in more formal venues — though, as far as I know, only by Matt and me so far.

2. On arXiv, there is a bit more of a barrier to clear: you have to get an existing arXiv user to endorse your membership application, and each article you submit is given a cursory check by staff to ensure that it really is a piece of scientific research rather than a diary entry, movie review or spam. Once it’s posted, the paper is guaranteed to remain at the same URL, unchanged, so long as arXiv endures (and it’s supported by Cornell). Crucially, the maths, physics and computer science communities that use arXiv uncontroversially consider this degree of filtering and permanence sufficient to constitute a published, citeable source.

3. At PLOS ONE, your paper only gets published if it’s been through peer-review — but the reviewing criteria pertain only to scientific soundness and do not attempt to evaluate likely impact or importance.

4. At JVP and other conventional journals, your paper has to make it through a two-pronged peer-review process: it has to be judged both sound scientifically (as at PLOS ONE) and also sufficiently on-topic and important to merit appearing in the journal.

5. Finally, at Nature and Science, your paper has to be sound and be judged sexy — someone has to guess that it’s going to prove important and popular.

Where along this continuum does the formal scientific record begin? We could make a case that all of it counts, provided that measures are taken to make the SV-POW! posts permanent and immutable. (This can be done by submitting them to WebCite or to a service such as Nature Precedings used to provide.) But whether or not you accept that, it seems clear that arXiv and upwards is permanent, scientific and citeable.

This raises an interesting question: do we actually need to go ahead and publish our neck-anatomy paper in a more conventional venue? I’m honestly not sure at the moment, and I’d be interested to hear arguments in either direction. In terms of the progress of science, probably not: our actual work is out there, now, for the world to use as it sees fit. But from a career perspective, it’s probably still worth our while to get it into a journal, just so it can sit more neatly on our publication lists and help Matt’s tenure case more. And yet I don’t honestly expect any eventual journal-published version to be better in any meaningful way than the one on arXiv. After all, it’s already benefitted from two rounds of peer-review, three if you count the comments of my dissertation examiners. More likely, a journal will be less useful, as we have to cut length, eliminate illustrations, and so on.

So it seems to me that we have a hard choice ahead of us now. Call that paper done and move on to making more science? Or spend more time and effort on re-publishing it in exchange for prestige? I really don’t know.

For what it’s worth, it seems that standard practice in maths, physics and computer science is to republish arXiv articles in journals. But there are some scientists who routinely do not do this, instead allowing the arXiv version to stand as the only version of record. Perhaps that is a route best left to tenured greybeards rather than bright young things like Matt.

Figure 5. Simplified myology of the sauropod neck, in left lateral view, based primarily on homology with birds, modified from Wedel and Sanders (2002, figure 2). Dashed arrows indicate muscles passing medially behind bone. A, B. Muscles inserting on the epipophyses, shown in red. C, D, E. Muscles inserting on the cervical ribs, shown in green. F, G. Muscles inserting on the neural spine, shown in blue. H. Muscles inserting on the ansa costotransversaria (“cervical rib loop”), shown in brown. Specifically: A. M. longus colli dorsalis. B. M. cervicalis ascendens. C. M. flexor colli lateralis. D. M. flexor colli medialis. E. M. longus colli ventralis. In birds, this muscle originates from the processes carotici, which are absent in the vertebrae of sauropods. F. Mm. intercristales. G. Mm. interspinales. H. Mm. intertransversarii. Vertebrae modified from Gilmore (1936, plate 24).

Citing papers in arXiv

Finally, a practicality: since it’ll likely be a year or more before any journal-published version of our neck-anatomy paper comes out, people wanting to use it in their own work will need to know how to cite a paper in arXiv. Standard procedure seems to be just to use authors, year, title and arXiv ID. But in a conventional-journal citation, I like the way that the page-range gives you a sense of how long the paper is. So I think it’s worth appending page-count to the citations. And while you’re at it, you may as well throw in the figure and table counts, too, yielding the version that we’ve been using:

  • Taylor, Michael P., and Mathew J. Wedel. 2012. Why sauropods had long necks; and why giraffes have short necks. arXiv:1209.5439. 39 pages, 11 figures, 3 tables.
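The format above is easy to generate mechanically. This Python sketch is one way to do it, assuming author names are supplied already formatted (first author surname-first, the rest given-name-first); the function name and argument layout are hypothetical, my own invention rather than any citation tool’s API.

```python
def arxiv_citation(authors, year, title, arxiv_id, pages, figures, tables):
    """Format an arXiv reference as: authors. year. title. arXiv:id. counts."""
    def count(n, noun):
        # "1 page" but "39 pages"
        return f"{n} {noun}" if n == 1 else f"{n} {noun}s"

    if len(authors) == 1:
        names = authors[0]
    else:
        names = ", ".join(authors[:-1]) + ", and " + authors[-1]
    return (f"{names}. {year}. {title}. arXiv:{arxiv_id}. "
            f"{count(pages, 'page')}, {count(figures, 'figure')}, {count(tables, 'table')}.")


print(arxiv_citation(
    ["Taylor, Michael P.", "Mathew J. Wedel"], 2012,
    "Why sauropods had long necks; and why giraffes have short necks",
    "1209.5439", pages=39, figures=11, tables=3))
```

Called as shown, it reproduces the Taylor and Wedel citation above; the only design choice worth noting is that counts of exactly one get a singular noun.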

Item 1: With his new piece at the Guardian, “Persistent myths about open access scientific publishing”, Mike continues to be a thorn in the side of exploitative commercial publishers, who just can’t seem to keep their facts straight. This time Mike unravels some choice bits of nonsense that keep getting circulated about open access publishing: that OA publishing must necessarily cost as much as barrier-based publishing, that the peer review process is expensive for publishers, and that authors who can’t pay OA publication fees will be left out in the cold. It’s cleanly and compellingly argued; go read it for yourself.

Item 2: The Yates et al. prosauropod pneumaticity paper is officially published in the latest issue of Acta Palaeontologica Polonica, and I have updated the citation and links accordingly. This may not seem like big news, in that the accepted manuscript has been available online for 13 months, and the final published version does not differ materially from that version other than being pretty. But it’s an opportunity to talk about something that we haven’t really addressed here before, which is the potential for prompt publication to accelerate research.

A bit of background: standard practice at APP is to post accepted manuscripts as soon as they’re, well, accepted, unless the authors ask otherwise (for example, because the paper contains taxonomic acts and the first public version needs to be the version of record). Not everyone likes this policy; I know Darren objects, and I’m sure there are others. The chief complaint is that it muddies the waters around when the paper is published. Is a paper published when a manuscript is posted to a preprint server like arXiv, or when the accepted manuscript is made freely available by a journal, or when the official, formatted version is published online, or when it arrives in printed hardcopy?

Now, this is an interesting question to ponder, but I think it’s only interesting from the standpoint of rules (e.g., codes governing nomenclature) and how we’re going to decide what counts. From the standpoint of moving science forward, the paper is published as soon as it is available for other researchers to use openly, i.e., not just to use in private in their own research, but also to cite. And since that’s the axis I care most about, I prefer to see accepted manuscripts made widely available as soon as possible, and I support APP’s policy. In the case of Yates et al. (2012), having the accepted manuscript online for the past year meant that it was available for Butler et al. (2012) to use, and cite, in their broad reassessment of pneumaticity in Triassic archosaurs. If our manuscript had not been published, that might not have been the case; Adam gave a talk on our project at the 2009 SVP in Bristol, but Butler et al. might have been loath to cite an abstract, and some journals explicitly forbid it.

So I say bring it on. Let’s really accelerate research, by letting people see the content as early as possible. Making other researchers wait just so they can see a prettier version of the same information seems to me to be a triumph of style over science.

Here’s an excerpt from a Google chat conversation that Mike and I had last May. I’m posting it now as a break from the OA Wars, and because it’s annoying to have to keep track of stuff that we know about but haven’t talked about publicly.

Matt: Something occurred to me the other day, and I can’t remember whether I’ve discussed it with you or not. So sorry in advance if it’s a dupe.
Mike: np.
Matt: You had pointed out that a pers. comm. is a link that goes nowhere. Obviously one of the concerns with citing blog posts is their permanence.
Mike: True. The only REAL concern, in fact. And 4wiw, a concern just as valid for other web-based resources.
Matt: The failure mode of a blog citation is a pers. comm.
Mike: Oh, good point. It degrades gracefully, as we say in programming.
Matt: Yes, exactly. Citing a blog post is better than a pers. comm. while the post is up, and no worse if it goes away.

I’ll break in here and point out that the same is true for pers. obs., unpubl. data, in prep., and other citations that don’t point to resources available to the reader: IF there’s a relevant blog post (and there may not be), citing the blog post gives readers more info than one of those “link to nowhere” modes of citation, and no less info if the blog post ever goes away. Obviously there are times when you’d prefer to keep unpublished data and in prep work out of the public eye until you’re ready to deploy it. But for people doing true open notebook science, there is no need to ever cite “unpubl. data” because there’s no such thing. I wonder if that’s the shape of the future? Also, if you have a blog, there’s no need to ever do a pers. obs. citation. Just blog about it and then cite your blog. If an editor or reviewer gives you grief, point out that the alternative pers. obs. citation would have been objectively inferior to putting the information online and then citing it!

The conversation continues:

Mike: I’ve had another thought on this.
Matt: Do tell!
Mike: At the moment, the article “How big was Amphicoelias fragillimus? I mean, really?” lives at https://svpow.wordpress.com/2010/02/19/how-big-was-amphicoelias-fragillimus-i-mean-really/ BUT if that web-page ever goes away, it’ll be because we’ve moved SV-POW! elsewhere. The article will still be out there, just in a different location. So citing blog posts by URL is a bit like citing the specific copy of The Dinosauria that’s on the shelf behind me, and which will go away if my house burns down. That citation doesn’t bother anyone because they know they can just look at another copy. But actually, I’ve many times found copies of web-pages I wanted, after they’ve gone away, just by googling the titles. So I think we should just encourage a lot of copying and mirroring and PDFing of pages and passing around copies of the PDFs and suchlike.
Matt: Yeah, that would be good.

This is an attempt to deal with the problem raised in the first part, which is the possible impermanence of web sources. DOIs and WebCite and so on are other approaches to the same problem.

I think this is a big deal. Right now we (as in, humans, or at least the wired world) are going through a revolution wherein, to a first approximation, all of human knowledge is becoming available to anyone anywhere with a computer (or tablet, smart phone, etc.) and an internet connection. Things like SOPA and PIPA and RWA and paywalls and RIAA lawsuits against filesharing sites and Elsevier lawsuits against libraries are all attempts to either stop this revolution or put limits on it. I say ‘attempts’ because none of those specific instances look like they’re going to be successful. In fact, I don’t think there is a way to stop it, except to withdraw from the wired world. And even then, if you’re passing information around on hardcopies, there’s no guarantee that someone won’t scan them and post all the information to the ‘net without your permission (e.g., WikiLeaks).

Okay, none of that was news for anyone who is alive and awake. But there’s more.

Coming along hand-in-hand with the access revolution is the permanence problem. Anything particularly entertaining, valuable or salacious will be copied and shared until it cannot possibly be suppressed (the Streisand effect). But what about stuff that is valuable to only a few, or only accessed rarely and by specialists? Say, a monograph from the 1920s on some obscure insect order. The disappearance of that information would be potentially crippling to the specialists who work on that order or on related clades. One answer is to just scan everything and make sure that copies are widely distributed; as Mike has pointed out, PDFs are not going away. The amount of scientific literature that has been produced in the last four or five centuries is finite; given how inexpensive storage is these days, I could probably buy enough external hard drives to store ALL of it in PDF form and still make rent next month (if it was all openly available, which it ain’t).

That will get us caught up to now. But if we’re worried about the permanence of blog posts and so on, we have a bigger problem, because, unlike the published literature, blog posts are being archived by few people (that we know of), and without backups somewhere the information really can be lost. And that’s what Mike was getting at in that conversation when he suggested PDFing valuable pages.

(Along those lines, I note that Blogger now has a feature where posts can have a PDF button at the bottom, and clicking the button saves a formatted version of the post as a PDF. That seems incredibly useful, and a lot better than the copy-and-paste-into-Word-and-then-save-as-PDF thing I’ve had to do for the times when I’ve wanted a permanent portable version of a WordPress post. Maybe WordPress has the same function and I just don’t know it; I’ll look around and if it doesn’t exist yet I’ll agitate for it to be added.)

At least for now, for the practical problem at hand, I can’t think of a better solution than PDFing useful pages and posts and passing copies around (which doesn’t mean that there isn’t a better solution). The point of the post is just that even in the absence of a better solution, or any solution at all, blog citations are better than pers. comms. at best, and precisely equivalent to pers. comms. at worst. So, IMHO, any individual or journal that accepts pers. comm. citations but not citations of blog posts is just being silly; consistency should dictate either accepting both, accepting neither, or, if you’re only going to accept one, accepting citations of blog posts, which are better unless and until they get deleted.

Finally, we shouldn’t lose track of the fact that this is yet another instance of “how do we deal with useful information that is not published [in the traditional sense]?”–or, in short, “what counts?” And the answer is, we don’t know yet. Both questions are symptoms of the ongoing collision between traditional forms of scientific communication and the realities of the newly wired world, in which everything is open, amateurs can have public, automatically archived high-level technical conversations about published work (that the authors probably can’t afford to ignore), and nobody knows what the landscape will look like in another decade.

I’ll give Mike the last word, in another quote from that Gchat conversation:

Mike: I know all this is just more riffing on What Counts?, but that theme is proving to be a profound and complex one. […] I truly don’t know (A) what WILL happen, (B) what SHOULD happen, or even (C) what I WANT to happen.

I don’t know either. But I have a feeling that we’re in the process of finding out.

I have a much less realised view of the digital future than Matt does, so I won’t be making a lot of predictions here.  But I do have some questions to ask, and — predictably — some whining to do.

What counts, what doesn’t, and why?

Assuming you have made some science (e.g. a description of a fossil, a palaeobiological hypothesis supported by evidence, a taxonomic revision), there are plenty of different ways you can present it to the world.  I may have missed some, but here are the ones I’ve thought of, in roughly descending order of respectability/citability/prestige:

  • Peer-reviewed paper/book chapter
  • Unreviewed paper/book chapter
  • Peer-reviewed electronic-only paper
  • Published abstract (e.g. for SVP)
  • Conference talk
  • Conference poster
  • Dissertation
  • Online supplementary information
  • Blog post
  • Blog comment
  • Email to the DML (which is archived on the web)
  • Personal email
  • Chat over a beer

How many of these are Science?  Where is the line?  Is the line hard or fuzzy?  Why is it OK to cite SVP abstracts but not so much SVPCA abstracts?  And other such questions. I think a very good case can be made that dissertations, provided they are made available, are better sources than conference talks, posters and abstracts; and a pretty good case can be made that blog posts are too (especially when webcitation’ed; see below).  Both dissertations and (good) blog posts have the advantage over talks and posters that they have a permanent existence, and over abstracts the simple fact that they are substantial: a 200-word abstract cannot, by its very nature, say anything much.

Zoological nomenclature

Unfortunately, for nomenclatural purposes, the ICZN’s Article 8 currently says that only publications on paper count, period, which counts out dissertations.  I say unfortunately because were it not for this rule, then at least part of Aetogate would never have happened: the ramifications of Bill Parker’s case would not have been so awful if the perfectly good description of Heliocanthus in his (2003) dissertation had been allowed priority over Lucas et al.’s (2006) rush-job which attached the name Rioarribasuchus to the same specimen. Happily, the ICZN is, as we write this, considering an amendment to recognise nomenclatural acts in electronic-only publications.  There has already been some published discussion of the pros and cons of this amendment, and the Commission is actively soliciting further comments, so those of you with strong feelings should put them in writing and send them to the Executive Secretary.  (I will certainly be doing so.)

Self-scooping

We all know that blog entries are Not Sufficiently Published to be citable, at least in most journals; but are they Too Published to let you re-use the same material?  When you submit to most journals, they ask you to formally state “this material has not previously been published”: is that true if we’ve blogged it?  I am guessing different editors would answer that differently. For what it’s worth, we’ve been reasonably careful up till now not to blog anything that we’re planning to make into a paper, which is why we were so mysteriously silent on the obviously important topic of sauropod neck posture during the first 19 months of SV-POW!.  We’ve not been 100% pure on this: for example, I have a paper on Brachiosaurus in press that mentions in passing the spinoparapophyseal laminae, the absence of an infradiapophyseal lamina and the perforate anterior centroparapophyseal laminae of the 8th dorsal vertebra of the Brachiosaurus brancai specimen HMN SII, features that I have blogged here in detail, with illustrations that would certainly never have been given journal-space.  Since the relevant passage in my paper accounted for half a manuscript page (of a total of 75 pages), I’m assuming no-one’s bothered about that.  In a case like this, I guess the SV-POW! posts are best thought of as pre-emptive and unofficial online supplementary information.

Counts for what purpose?

We’ve already mentioned that dissertations, blog entries and suchlike don’t count for nomenclatural purposes.  Whether they count in the sense of being citable in published works is up for debate right now (and again, see below on webcitation).  It seems pretty clear that these forms of “grey publication” do count in establishing people’s reputations among their peers — dissertations are obviously important in this regard, and Darren’s ridiculously broad knowledge of tetrapods extant and extinct is near-universally recognised largely because of his blogging efforts (although you could argue — and Matt and I often have argued — that he might have been able to enhance his reputation even more if he’d taken some of that blogging time and invested it in formal publications). Conversely, it’s clear that blogs, however rigorous and scientific, count for squat when it comes to committees.  The world of dinosaur palaeontology is probably just as aware of Matt’s series of Aerosteon response articles here on SV-POW! as it would be if he’d put those together into a paper that was published in PLoS ONE; but when his tenure committee comes to count up the impact factors of the journals he’s published in, those articles will count for nothing.  One day that might change, but not while impact factors still exert their baleful influence.

Deciding what to blog and what to write up as a “proper paper”

Matt posted his response to the Aerosteon paper as a sequence of three blog entries even though he knew that what he had to say was substantial enough to make a paper.  Why throw away a potential publication that would look good on the CV?  Because he wanted to get it out there ASAP, and didn’t want to wait until all the media dust had settled.  So he fought people off when they pestered him to publish it as a paper.  He doesn’t really need to do it now, and he doesn’t really have time (especially since I keep badgering him about all the papers we’re supposed to be collaborating on).  If we were starving for publications, we could turn a lot of SV-POW! posts into LPUs — but we’re not starving.

Let me explain this by taking a digression through the economics of file-sharing and the way labels persistently (maybe deliberately) misunderstand them.  Let’s imagine for the sake of an example that a while back, I sent Matt the MP3s that make up Blue Öyster Cult’s awesome Fire Of Unknown Origin album.  Now anyone with their brain switched on can see that the net effect of this on his music-buying pattern would be positive: if he really liked Fire, there is a fair chance that he would then have gone and bought a BOC album or two, or three, just as I’ve been buying Dar Williams albums like crazy since someone slipped me MP3s of Mortal City.  The labels’ perception, however, is that instead I would have denied them a sale: that if I’d not sent the Fire of Unknown Origin MP3s, Matt would of course have bought his own legitimate copy, and so I’ve stiffed them out of $6.99 less whatever tiny slice they pass on to the artist.  The misunderstanding here is that they think (or would like to think; who knows if they really believe this themselves?) that people’s music consumption is limited by the time we have available to listen to music, and that one way or another we will obtain enough music to fulfil that need: for free if possible, but by paying for it if necessary.  But the truth is completely different: without those MP3s there would be zero chance of Matt’s ever buying any BOC album, since he’d never even heard of them (beyond Don’t Fear The Reaper, I guess), whereas in the hypothetical universe where I sent him the Fire MP3s, there is a non-zero chance.  And the labels’ failure to understand that is because of a wholly incorrect model of what factor limits music listening.

Digression ends.  Its relevance is this: in the same way, we are used to thinking that our ability to get papers published is limited by the number of publication-worthy ideas we have — so that every paper idea we “waste” on a blog entry is a net loss.  In truth, ideas are cheap, and our ability to get papers published is actually limited by our throughput — our ability to find time to actually write those ideas up with sufficient rigour, prepare high-resolution figures, format the manuscripts for journals, wait through the review period, deal with the reviews, revise, resubmit, handle editorial requests, and so on and on.  (That is especially true when the journal takes six months to come up with a rejection.) This is why Matt and I, like everyone else I know in palaeo who I’ve discussed this with, have huge stacks of POOP that we’ve not yet found time to convert into papers.  So when we spend a paper-worthy idea on a blog entry, we’re not wasting it: we’re putting it out there (in an admittedly inferior format) when otherwise it would never have made it out there at all. The remaining issue is whether the time we spend on blogging an idea would have been better spent on moving a paper further towards publication.  Maybe, sometimes.  But you have to stop and smell the roses every now and again.  So the real cost of SV-POW! for us is not the “waste” of paperable ideas, but the time we spend on writing it.  I am guessing that in the time I’ve put into SV-POW! so far, I could have got two more papers out — certainly one.  Has it been worth it?  I think so, but it’s not a no-brainer.  On the other hand, SV-POW! probably acts as a reader-funnel, so that when I do get a paper out, more people read it than otherwise would.  How big that effect is, I don’t know, and I can’t think of a way to measure it.

How to cite blog entries: WebCite

One of the great things about writing for SV-POW! is that you can learn some really useful stuff from the comments; and the most useful comment I’ve seen so far is the one in which Cameron Neylon pointed us at WebCite (http://webcitation.org/).  This is a superbly straightforward site that makes permanent archive copies of web-pages, and mirrors them around the world.  In doing so, it deals with the problems of web pages being vulnerable to disappearance and prone to change.  (In off-list emails with Matt, I had suggested that I might build something like this myself, as I am a software engineer in my day job; I am delighted that these guys have done it properly instead.) So if you ever want to cite Matt’s second Aerosteon post in a journal, use the archive URL http://webcitation.org/5hPYTmWpW; and if you want to cite any other SV-POW! article, just submit its URL to WebCite yourself, and get back an archive URL which you can use. And tell all your friends about WebCite!

Oh, and by the way …

Here’s that photo of a monitor lizard getting its arse kicked by an elephant that you ordered:

Monitor lizard postcranium, aerial. Photograph by Hira Punjabi, downloaded from National Geographic.

Monitor lizard postcranium, aerial, strongly inclined. Photograph by Hira Punjabi, downloaded from National Geographic.

References

  • Lucas, S. G., Hunt, A. P. and Spielmann, J. A. 2006. Rioarribasuchus, a new name for an aetosaur from the Upper Triassic of north-central New Mexico. New Mexico Museum of Natural History and Science, Bulletin 37: 581-582.
  • Parker, W. G. 2003a. Description of a new specimen of Desmatosuchus haplocerus from the Late Triassic of Northern Arizona. Unpublished MS thesis. Northern Arizona University, Flagstaff. 315 pp.