Blog citations are better than pers. comms.
February 6, 2012
Here’s an excerpt from a Google chat conversation that Mike and I had last May. I’m posting it now as a break from the OA Wars, and because it’s annoying to have to keep track of stuff that we know about but haven’t talked about publicly.
Matt: Something occurred to me the other day, and I can’t remember whether I’ve discussed it with you or not. So sorry in advance if it’s a dupe.
Mike: np.
Matt: You had pointed out that a pers. comm. is a link that goes nowhere. Obviously one of the concerns with citing blog posts is their permanence.
Mike: True. The only REAL concern, in fact. And 4wiw, a concern just as valid for other web-based resources.
Matt: The failure mode of a blog citation is a pers. comm.
Mike: Oh, good point. It degrades gracefully, as we say in programming.
Matt: Yes, exactly. Citing a blog post is better than a pers. comm. while the post is up, and no worse if it goes away.
I’ll break in here and point out that the same is true for pers. obs., unpubl. data, in prep., and other citations that don’t point to resources available to the reader: IF there’s a relevant blog post (and there may not be), citing the blog post gives readers more info than one of those “link to nowhere” modes of citation, and no less info if the blog post ever goes away. Obviously there are times when you’d prefer to keep unpublished data and in prep work out of the public eye until you’re ready to deploy it. But for people doing true open notebook science, there is no need to ever cite “unpubl. data” because there’s no such thing. I wonder if that’s the shape of the future? Also, if you have a blog, there’s no need to ever do a pers. obs. citation. Just blog about it and then cite your blog. If an editor or reviewer gives you grief, point out that the alternative pers. obs. citation would have been objectively inferior to putting the information online and then citing it!
The conversation continues:
Mike: I’ve had another thought on this.
Matt: Do tell!
Mike: At the moment, the article “How big was Amphicoelias fragillimus? I mean, really?” lives at https://svpow.wordpress.com/2010/02/19/how-big-was-amphicoelias-fragillimus-i-mean-really/ BUT if that web-page ever goes away, it’ll be because we’ve moved SV-POW! elsewhere. The article will still be out there, just in a different location. So citing blog posts by URL is a bit like citing the specific copy of The Dinosauria that’s on the shelf behind me, and which will go away if my house burns down. That citation doesn’t bother anyone because they know they can just look at another copy. But actually, I’ve many times found copies of web-pages I wanted, after they’ve gone away, just by googling the titles. So I think we should just encourage a lot of copying and mirroring and PDFing of pages and passing around copies of the PDFs and suchlike.
Matt: Yeah, that would be good.
This is an attempt to deal with the problem raised in the first part, which is the possible impermanence of web sources. DOIs and WebCitation and so on are other approaches to the same problem.
I think this is a big deal. Right now we–as in, humans, or at least the wired world–are going through a revolution wherein, to a first approximation, all of human knowledge is becoming available to anyone anywhere with a computer (or tablet, smart phone, etc.) and an internet connection. Things like SOPA and PIPA and RWA and paywalls and RIAA lawsuits against filesharing sites and Elsevier lawsuits against libraries are all attempts to either stop this revolution or put limits on it. I say ‘attempts’ because none of those specific instances look like they’re going to be successful. In fact, I don’t think there is a way to stop it, except to withdraw from the wired world. And even then, if you’re passing information around on hardcopies, there’s no guarantee that someone won’t scan them and post all the information to the ‘net without your permission (e.g., WikiLeaks).
Okay, none of that was news for anyone who is alive and awake. But there’s more.
Coming along hand-in-hand with the access revolution is the permanence problem. Anything particularly entertaining, valuable or salacious will be copied and shared until it cannot possibly be suppressed (the Streisand effect). But what about stuff that is valuable to only a few, or only accessed rarely and by specialists? Say, a monograph from the 1920s on some obscure insect order. The disappearance of that information would be potentially crippling to the specialists who work on that order or on related clades. One answer is to just scan everything and make sure that copies are widely distributed; as Mike has pointed out, PDFs are not going away. The amount of scientific literature that has been produced in the last four or five centuries is finite; given how inexpensive storage is these days, I could probably buy enough external hard drives to store ALL of it in PDF form and still make rent next month (if it was all openly available, which it ain’t).
That will get us caught up to now. But if we’re worried about the permanence of blog posts and so on, we have a bigger problem, because unlike published literature few people are archiving blog posts (that we know of), and without backups somewhere the information really can be lost. And that’s what Mike was getting at in that conversation when he suggested PDFing valuable pages.
(Along those lines, I note that Blogger now has a feature where posts can have a PDF button at the bottom, and clicking the button saves a formatted version of the post as a PDF. That seems incredibly useful, and a lot better than the copy-and-paste-into-Word-and-then-save-as-PDF thing I’ve had to do for the times when I’ve wanted a permanent portable version of a WordPress post. Maybe WordPress has the same function and I just don’t know it; I’ll look around and if it doesn’t exist yet I’ll agitate for it to be added.)
At least for now, for the practical problem at hand, I can’t think of a better solution than PDFing useful pages and posts and passing copies around (which doesn’t mean that there isn’t a better solution). The point of the post is just that even in the absence of a better solution, or any solution at all, blog citations are better than pers. comms. at best, and precisely equivalent to pers. comms. at worst. So, IMHO, any individual or journal that accepts pers. comm. citations but not citations of blog posts is just being silly; consistency should dictate either accepting both, accepting neither, or, if you’re only going to accept one, accepting citations of blog posts, which are better unless and until they get deleted.
Finally, we shouldn’t lose track of the fact that this is yet another instance of “how do we deal with useful information that is not published [in the traditional sense]?”–or, in short, “what counts?” And the answer is, we don’t know yet. Both questions are symptoms of the ongoing collision between traditional forms of scientific communication and the realities of the newly wired world, in which everything is open, amateurs can have public, automatically archived high-level technical conversations about published work (that the authors probably can’t afford to ignore), and nobody knows what the landscape will look like in another decade.
I’ll give Mike the last word, in another quote from that Gchat conversation:
Mike: I know all this is just more riffing on What Counts?, but that theme is proving to be a profound and complex one. […] I truly don’t know (A) what WILL happen, (B) what SHOULD happen, or even (C) what I WANT to happen.
I don’t know either. But I have a feeling that we’re in the process of finding out.
February 6, 2012 at 2:31 pm
Very cool and insightful post. I agree 100% that for now blog citations are superior to personal communications. In terms of long-term solutions, I think I share your uncertainty. Here it sounds like one primary solution is to let the “archive” develop organically, through distribution of PDFs through a network of interested individuals. Presumably, for long-term permanence, we need to have an institutional presence with longer-term permanence? Would that be Library of Congress? Some similar institution?
February 6, 2012 at 4:00 pm
Permanence isn’t the only issue – mutability is as well. Anyone can change a blog post later, and if the original isn’t archived it’s just gone. A less scrupulous person (or just someone embarrassed at their past mistake) could remove or alter it.
Of course this could also be addressed by your pdf archiving solution, or some other archived database, but I think we’ll need some sort of DOI-like register then, so people can verify that we are all looking at the same version of a quoted post.
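One way to picture the "are we all looking at the same version?" check is a content fingerprint. The sketch below is only an illustration under stated assumptions: the function name `post_fingerprint`, the whitespace normalization, and the choice of SHA-256 are mine, not part of any existing register or DOI-like service.

```python
import hashlib


def post_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of a post's text (hypothetical helper)."""
    # Collapse runs of whitespace so that line-wrapping differences between
    # two saved copies do not change the fingerprint.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


original = "The failure mode of a blog citation is a pers. comm."
reflowed = "The failure mode of a blog citation\n  is a pers. comm."
edited = "The failure mode of a blog citation is a dead link."

# Two copies that differ only in line wrapping share a fingerprint.
same_version = post_fingerprint(original) == post_fingerprint(reflowed)
# A copy whose wording was altered later does not.
changed = post_fingerprint(original) != post_fingerprint(edited)
```

If a citation recorded the fingerprint alongside the URL and access date, anyone holding a PDF or archived copy could recompute it and see whether their copy matches what the citing author read. It says nothing about which version is the "right" one, only whether two copies agree.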
BTW, a somewhat similar subject is covered (in part) here: http://techcrunch.com/2012/02/05/the-future-of-peer-review/
February 6, 2012 at 4:21 pm
For this very reason, I have seriously considered submitting some of the more substantial SV-POW! posts to Nature Precedings. I should stop seriously considering it and start doing it.
February 6, 2012 at 4:38 pm
You definitely should. Not that I’m one to throw stones…
February 6, 2012 at 9:32 pm
Regarding the comment on citing ONS documentation, we have used lab notebook pages as regular references in papers without any problem. It is also helpful to point to a snapshot archive of the entire notebook and associated raw data in a zip file for example taken on a particular day. We have uploaded these to our institutional repository, data disk publishers (like LuLu) and other servers for redundancy. Nature Precedings is also an excellent location to register archival data but only PDF files (or PPT) are accepted – in this case we have converted ONS generated databases into PDF books and uploaded them to Nature Precedings with links from there to full archives. See examples of these strategies here: http://onsbooks.wikispaces.com/
February 6, 2012 at 9:32 pm
Here it sounds like one primary solution is to let the “archive” develop organically, through distribution of PDFs through a network of interested individuals. Presumably, for long-term permanence, we need to have an institutional presence with longer-term permanence? Would that be Library of Congress? Some similar institution?
Dunno. A related problem: what gets archived? Even the most content-rich blogs occasionally have goofy posts that don’t really need to be preserved for posterity. OTOH, at the blogs in question those posts are in the minority and would add very little to the storage space required to archive the blog.
I’m more worried about situations where a mostly goofy blog has a few content-rich posts. My old blog, Ask Dr Vector, probably falls into that bin.
If archives develop organically, those of us who care about such things might find and archive all the relevant posts, no matter where they are. Might. Although I am regularly freaked out by the amount of stuff that goes on in the paleo-blogosphere that I don’t find out about for months; makes me wonder what I’m missing entirely.
If blog archiving is done institutionally, how are the folks at the Library of Congress going to know that Eclectic McBloggy has a few really crucial posts on tardigrades mixed in with the vegan BBQ recipes and elevator music album reviews?
Since the idea of blog posts as scientific literature is so new, there are probably a lot of people who would be content to miss out on those posts. Heck, some people might even be happy that the “official” discussions on topic X would not be contaminated by material that “doesn’t count”. And depending on the quality of the posts, they might even be right. I just don’t think we can make those judgments a priori anymore; at least some blog posts (and DML posts, etc.) are sufficiently valuable that we probably can’t afford to lose them.
As usual, I don’t have any real answers, I’m just coming to grips with the scope of the problem.
February 6, 2012 at 9:40 pm
Thanks, Jean-Claude, it’s great to get some perspective from someone who is actually doing open-notebook science.
February 6, 2012 at 10:56 pm
It is pretty straightforward to archive a blog snapshot with a tool like WebCite (http://www.webcitation.org/), so impermanence and mutability really only apply to using the original URLs as the links.
February 6, 2012 at 11:54 pm
In theory I think WebCite is great. Unfortunately, the one time I tried to use it, it screwed me. My 2009 BZN paper on the inevitability of electronically published nomenclatural acts cites seven blogs or comments, so I WebCited them all and added the archive links to the manuscript before submitting it. While it was in review, WebCite changed the links — all seven of them — rendering them obviously useless. So I took the “permanent” links out of the revised manuscript.
I emailed Dr. Gunther Eysenbach, the WebCite project initiator, about this. Twice, in fact. Never got a reply.
So I am afraid it’s thumbs down from me.
February 6, 2012 at 11:58 pm
I agree that WebCite looks like the front-runner in archiving blog posts. On the organic-versus-institutional spectrum that Andy and I were discussing above, it seems to be somewhere in the middle, in that it is a single institutional repository (if ‘repository’ is the right word) but accepts ‘organic’ input. So maybe that’s the ideal solution. [Assuming the permanence issues of WebCite itself are, ahem, permanently resolved–Mike’s comment and this one passed in the ether.]
I had a follow-up thought on the issue of goofy posts at serious blogs and vice versa, which is that whether a particular post is on-topic really depends on what topic the citing author is interested in. For example, in Mike’s BZN paper on electronic publication he cited this post of mine which has nothing to do with sauropod vertebrae except for a throwaway image at the tail end. Someone who only comes to SV-POW! for sauropod vertebrae might not consider that post worth archiving, whereas someone whose primary interest is in documenting the OA wars might not give a hoot about all of our sauropod vertebra posts.
The more I think about it, the more an organic backup system (or a hybrid system with organic input, like WebCite) seems like the way to go, because the decisions about usefulness can be made by the people who actually need the material. The only thing that gives me pause is the mutability problem Scott mentioned coupled with the long shelf-life of ideas in science. What if someone doesn’t need post Y until five years from now, and by that time it’s been changed or deleted? The safest course is just to document everything, but even in this era of ultra-cheap storage that’s probably not possible, except maybe for Google. And, realistically, I’m not going to go around PDFing or WebCiting a bunch of posts that I think I might need someday.
Then again, if we got in the habit of archiving (in PDF, WebCite, or whatever) every SV-POW! post as it went up, and plowed through our own archives doing the same (which could probably be done in a long evening or over a weekend), then this blog would be done with a relatively minor amount of hassle.* Most science bloggers could probably handle their own archiving, and for the handful that don’t have time or couldn’t be arsed (I’m thinking of Darren at TetZoo here), the work could probably be crowdsourced. So maybe it’s possible after all.
*Note to Mike and my future self: we really should do this sooner rather than later. If SV-POW! ever went down, either because of some catastrophic WordPress server failure or because Elsevier leaned on them to do it, we would lose a LOT of good stuff. I have maybe half a dozen posts backed up somewhere, but the rest are vulnerable.
February 7, 2012 at 12:23 am
It’s very far from a complete or perfect archiving facility, but WordPress allows you to download a big XML document containing all the posts on your blog (and optionally all the comments, too). I did this recently for SV-POW! and have it safely backed up in multiple locations. (It came to 14.5 Mb including comments.) This doesn’t include the images, though.
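For anyone wanting to check what such an export actually holds, here is a minimal sketch that lists the posts in an RSS-style XML file. It assumes the export keeps each post in an `<item>` element with `<title>`, `<link>` and `<pubDate>` children; the sample document, the example.org URLs and the function name `list_posts` are made up for illustration, so inspect a real export before relying on this layout.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for an export file (hypothetical content, not a real export).
SAMPLE_EXPORT = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>https://example.org/2012/02/06/first-post/</link>
      <pubDate>Mon, 06 Feb 2012 14:31:00 +0000</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.org/2012/02/07/second-post/</link>
      <pubDate>Tue, 07 Feb 2012 00:23:00 +0000</pubDate>
    </item>
  </channel>
</rss>
"""


def list_posts(export_xml: str) -> list:
    """Return one dict per <item>, with its title, link and publication date."""
    root = ET.fromstring(export_xml)
    posts = []
    for item in root.iter("item"):
        posts.append(
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "date": item.findtext("pubDate", default=""),
            }
        )
    return posts


posts = list_posts(SAMPLE_EXPORT)
for post in posts:
    print(post["date"], post["title"], post["link"])
```

A listing like this makes it easy to see which posts are in the backup and which URLs they were published at. It does nothing about the images, which would still need to be saved separately.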