Blog citations are better than pers. comms.
February 6, 2012
Here’s an excerpt from a Google chat conversation that Mike and I had last May. I’m posting it now as a break from the OA Wars, and because it’s annoying to have to keep track of stuff that we know about but haven’t talked about publicly.
Matt: Something occurred to me the other day, and I can’t remember whether I’ve discussed it with you or not. So sorry in advance if it’s a dupe.
Matt: You had pointed out that a pers. comm. is a link that goes nowhere. Obviously one of the concerns with citing blog posts is their permanence.
Mike: True. The only REAL concern, in fact. And FWIW, a concern just as valid for other web-based resources.
Matt: The failure mode of a blog citation is a pers. comm.
Mike: Oh, good point. It degrades gracefully, as we say in programming.
Matt: Yes, exactly. Citing a blog post is better than a pers. comm. while the post is up, and no worse if it goes away.
I’ll break in here and point out that the same is true for pers. obs., unpubl. data, in prep., and other citations that don’t point to resources available to the reader: IF there’s a relevant blog post (and there may not be), citing the blog post gives readers more info than one of those “link to nowhere” modes of citation, and no less info if the blog post ever goes away. Obviously there are times when you’d prefer to keep unpublished data and in prep work out of the public eye until you’re ready to deploy it. But for people doing true open notebook science, there is no need to ever cite “unpubl. data” because there’s no such thing. I wonder if that’s the shape of the future? Also, if you have a blog, there’s no need to ever do a pers. obs. citation. Just blog about it and then cite your blog. If an editor or reviewer gives you grief, point out that the alternative pers. obs. citation would have been objectively inferior to putting the information online and then citing it!
The conversation continues:
Mike: I’ve had another thought on this.
Matt: Do tell!
Mike: At the moment, the article “How big was Amphicoelias fragillimus? I mean, really?” lives at https://svpow.wordpress.com/2010/02/19/how-big-was-amphicoelias-fragillimus-i-mean-really/ BUT if that web-page ever goes away, it’ll be because we’ve moved SV-POW! elsewhere. The article will still be out there, just in a different location. So citing blog posts by URL is a bit like citing the specific copy of The Dinosauria that’s on the shelf behind me, and which will go away if my house burns down. That citation doesn’t bother anyone because they know they can just look at another copy. But actually, I’ve many times found copies of web-pages I wanted, after they’ve gone away, just by googling the titles. So I think we should just encourage a lot of copying and mirroring and PDFing of pages and passing around copies of the PDFs and suchlike.
Matt: Yeah, that would be good.
This is an attempt to deal with the problem raised in the first part, which is the possible impermanence of web sources. DOIs and WebCitation and so on are other approaches to the same problem.
I think this is a big deal. Right now we–as in, humans, or at least the wired world–are going through a revolution wherein, to a first approximation, all of human knowledge is becoming available to anyone anywhere with a computer (or tablet, smart phone, etc.) and an internet connection. Things like SOPA and PIPA and RWA and paywalls and RIAA lawsuits against filesharing sites and Elsevier lawsuits against libraries are all attempts to either stop this revolution or put limits on it. I say ‘attempts’ because none of those specific instances look like they’re going to be successful. In fact, I don’t think there is a way to stop it, except to withdraw from the wired world. And even then, if you’re passing information around on hardcopies, there’s no guarantee that someone won’t scan them and post all the information to the ‘net without your permission (e.g., WikiLeaks).
Okay, none of that was news for anyone who is alive and awake. But there’s more.
Coming along hand-in-hand with the access revolution is the permanence problem. Anything particularly entertaining, valuable or salacious will be copied and shared until it cannot possibly be suppressed (the Streisand effect). But what about stuff that is valuable to only a few, or only accessed rarely and by specialists? Say, a monograph from the 1920s on some obscure insect order. The disappearance of that information would be potentially crippling to the specialists who work on that order or on related clades. One answer is to just scan everything and make sure that copies are widely distributed; as Mike has pointed out, PDFs are not going away. The amount of scientific literature that has been produced in the last four or five centuries is finite; given how inexpensive storage is these days, I could probably buy enough external hard drives to store ALL of it in PDF form and still make rent next month (if it was all openly available, which it ain’t).
That will get us caught up to now. But if we’re worried about the permanence of blog posts and so on, we have a bigger problem, because unlike published literature few people are archiving blog posts (that we know of), and without backups somewhere the information really can be lost. And that’s what Mike was getting at in that conversation when he suggested PDFing valuable pages.
(Along those lines, I note that Blogger now has a feature where posts can have a PDF button at the bottom, and clicking the button saves a formatted version of the post as a PDF. That seems incredibly useful, and a lot better than the copy-and-paste-into-Word-and-then-save-as-PDF thing I’ve had to do for the times when I’ve wanted a permanent portable version of a WordPress post. Maybe WordPress has the same function and I just don’t know it; I’ll look around and if it doesn’t exist yet I’ll agitate for it to be added.)
At least for now, for the practical problem at hand, I can’t think of a better solution than PDFing useful pages and posts and passing copies around (which doesn’t mean that there isn’t a better solution). The point of the post is just that even in the absence of a better solution, or any solution at all, blog citations are better than pers. comms. at best, and precisely equivalent to pers. comms. at worst. So, IMHO, any individual or journal that accepts pers. comm. citations but not citations of blog posts is just being silly; consistency should dictate either accepting both, accepting neither, or, if you’re only going to accept one, accepting citations of blog posts, which are better unless and until they get deleted.
Finally, we shouldn’t lose track of the fact that this is yet another instance of “how do we deal with useful information that is not published [in the traditional sense]?”–or, in short, “what counts?” And the answer is, we don’t know yet. Both questions are symptoms of the ongoing collision between traditional forms of scientific communication and the realities of the newly wired world, in which everything is open, amateurs can have public, automatically archived high-level technical conversations about published work (that the authors probably can’t afford to ignore), and nobody knows what the landscape will look like in another decade.
I’ll give Mike the last word, in another quote from that Gchat conversation:
Mike: I know all this is just more riffing on What Counts?, but that theme is proving to be a profound and complex one. […] I truly don’t know (A) what WILL happen, (B) what SHOULD happen, or even (C) what I WANT to happen.
I don’t know either. But I have a feeling that we’re in the process of finding out.