You know what’s wrong with scholarly publishing?

Wait, scrub that question. We’ll be here all day. Let me cut straight to the chase and tell you the specific problem with scholarly publishing that I’m thinking of.

There’s nowhere to go to find all open-access papers, to download their metadata, to access it via an open API, to find out what’s new, to act as a platform for the development of new tools. Yes, there’s PubMed Central, but that’s only for work funded by the NIH. Yes, there’s Google Scholar, but that has no API, and at any moment could go the way of Google Wave and Google Reader when Google loses interest.

Instead, we have something like 4000 repositories out there, balkanised by institution, by geographical region, and by subject area. They have different UIs, different underlying data models, different APIs (if any). They’re built on different software platforms. It’s a jungle out there!

As researchers, we don’t need 4000 repos. You know what we need? One Repo.

Hey! That would be a good name for a project!

I’ve mentioned before how awesome and pro-open my employers, Index Data, are. (For those who are not regular readers, I’m a palaeontologist only in my spare time. By day, I’m a software engineer.) Now we’re working on an index of green/gold OA publishing. Metadata of every article across every repository and publisher. We want it to be complete, in the sense that we will be going aggressively for the long tail as opposed to focusing on some region or speciality, or things that are easily harvestable by OAI-PMH or other standards. We want it to be of a high, consistent quality in terms of metadata. We want it to be up to date. And most importantly, we want it to be fully open for all and any kind of re-use, by any other actor. This will include downloadable data files, OAI-PMH access, search-retrieve web services, embeddable widgets and more. We also envisage a Linked Data representation with a CRUD interface that allows third parties to contribute supplemental information, entity reconciliation, tagging, etc.
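For the repositories that do speak OAI-PMH, the easy end of the harvesting job looks roughly like this. This is only a minimal sketch, not Index Data's harvester: the base URL, the sample response and the function names are all made up for illustration, and the only things taken from the OAI-PMH standard itself are the `ListRecords` verb, the `oai_dc` metadata prefix, the `resumptionToken` mechanism and the XML namespaces.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    When resuming a partial list, the token is sent on its own,
    without metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, next_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)


# A made-up response from an imaginary repository, so the sketch runs offline.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:1234</identifier>
        <datestamp>2015-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example open-access paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, next_token = parse_list_records(SAMPLE_RESPONSE)
print(list_records_url("https://repo.example.org/oai"))
print(records, next_token)
```

A real harvester would fetch each URL, keep following the resumption token until none comes back, and cope with repositories that bend the standard. The long tail is everything that doesn't work this way at all.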

Instead of 4000 fragments, one big, meaty chunk of data.

Because we at Index Data have spent the last ten years helping aggregators, publishers and others get access to difficult-to-access information through all kinds of crazy mechanisms, we have a unique combination of the skills, the tools, and the desire to pursue this venture.

So The One Repo is born. At the moment, we have:

  • Harvesting set up for an initial set of 20 repositories.
  • A demonstrator of one possible UI.
  • A whitepaper describing the motivation and some of the technical aspects.
  • A blog about the project’s progress.
  • An advisory board of some of the brightest, most experienced and wisest people in the world of open access.

We’ve been flying under the radar for the last month and a bit. Now we’re ready for the world to know what we’re up to.

The One Repo is go!

I just read this on The Scholarly Kitchen and nearly fell out of my seat:

In an era with more access given to less qualified people (laypeople and an increasingly unqualified blogging corps presenting themselves as experts or journalists), not to mention to text-miners and others scouring the literature for connections, the obligation to better manage these materials seems to be growing. We can no longer depend on the scarcity of print or the difficulties of distance or barriers of professional expertise to narrow access down to experts with a true need.

I think this may be the most revealing thing ever written on The Scholarly Kitchen. It’s hard to see a way of reading it that isn’t contemptuous of everyone outside the Magic Circle. Ideally, the great unwashed should be excluded altogether; but if we can’t do that, then at least we must tell them what to read and how to use it. Heaven forfend that we let Ordinary People make such decisions for themselves. That is for the priestly caste to do.

A while back, we noted that seriously, Apatosaurus is just nuts, as proven by the illustrations in Ostrom and McIntosh (1966: plate 12).

Now I’m posting those illustrations again, in a modified form, to make the same point. Here ya go:

Brontosaurus excelsus holotype YPM 1980, cervical vertebra 8, in anterior, left lateral and ventral views. Adapted from Marsh’s plates in Ostrom & McIntosh (1966: plates 12-13).

Here’s what’s changed since last time:

  1. Apatosaurus excelsus is Brontosaurus again!
  2. I cleaned up the scans of the plates, removing all the labels.
  3. In the lateral view, I added a reconstruction of the missing neural spine, based on that of Apatosaurus louisae (from Gilmore 1936: plate XXIV). This reconstruction first appeared in Taylor and Wedel (2013a: figure 7).
  4. Most importantly, I added the ventral view of the vertebra from plate 13. Only now can you properly appreciate the truly bizarre shape of this bone. (The prezygs appear to project further forward than they should because the illustrated aspect is not true ventral, but slightly anteroventral.)

If only those three views were enough to construct a 3D model by photogrammetry! Sadly, it’s not possible to get photos of the whole vertebra from different angles now, as it’s tied up in the mounted Brontosaurus skeleton at the YPM:

Part of the neck of the mounted skeleton of Brontosaurus excelsus holotype YPM 1980, in right posterodorsolateral view (i.e. from behind, above, and to the right). The vertebra in the centre of the picture may well be the one illustrated above, but don’t hold me to it.

The bottom line: these are some crazy-ass morphologically distinctive vertebrae. Those ventrolaterally projecting processes that bear the cervical ribs are, for my money, the single most distinctive feature of apatosaurine sauropods. And they reach their zenith (or maybe their nadir, since they point downwards) in Brontosaurus. These processes are the reason that apatosaurs had Toblerone-shaped necks — triangular in cross-section, with the base flat or even concave. Any restoration that shows a tubular neck is way off base.

References

Last week I went to Halifax, Nova Scotia, for the twice-yearly meet-up with my Index Data colleagues. On the last day, four of us took a day-trip out to Peggy’s Cove to eat lunch at Ryer Lobsters.

We stopped off at the Peggy’s Cove lighthouse on the way, and spotted a vertebrate, which I am pleased to present:

mike-with-whale

It’s a whale skull, but I have no idea what kind. Can anyone help out?

So much for vertebrates — it was really all about the inverts. Here are six of them:

mike-with-lobster

I have a 2lb lobster here; my colleague Jakub went for two 1lb lobsters, as did Jason and Wolfram (not pictured). That’s Wolfram’s lobster closest to the camera, giving a better impression of just what awesome beasts these were.

Peggy’s Cove: recommended. For vertebrates and inverts.

(Thanks to Wolfram Schneider for these photos.)

 

Aquilops tattoo

My 40th birthday present from Vicki. I commissioned the art from Brian Engh. I bow to no one in my love for his original Aquilops head reconstruction:

Life restoration of Aquilops by Brian Engh. Farke et al. (2014: fig. 6C). CC-BY.

BUT it’s waaay too detailed for a tattoo unless I wanted a full back piece. I sent Brian this sketch to convey what I wanted – to emphasize the strong lines of the piece, punch up the spines and spikes, basically shift it toward a comic book style without devolving into caricature:

Aquilops tattoo - Matt sketch raw

Originally I was going to have Aquilops’ name and year of discovery in the tat. I decided to drop the lettering, for several reasons. One, it won’t hold up as well over the next few decades. Two, if someone is close enough to read it, we’ll probably be talking about the tattoo already. Three, the tattoo is a better conversation starter without a caption. First I get to tell people what Aquilops is, then I get to explain what ‘fourth author’ means. ;-)

As he did for the original Aquilops head recon, Brian sent a selection of possible color schemes, mostly based on those of extant lizards. I couldn’t decide which I liked best, so I talked it over with my tattoo artist, Tanin McCoe at Birch Avenue Tattoo in Flagstaff, Arizona. I wasn’t just interested in what looks good on paper, but what would work well with my skin tone and still look good 20 years from now. Tanin really liked the earth-tone color scheme with the dark stripe across the eye, so that’s how we went. The tattoo Aquilops is facing left instead of right because it’s on my left shoulder – my right deltoid was already occupied.

They do good work at Birch Avenue – Vicki’s gotten three pieces there, including this skeleton key that was also done by Tanin:

Vicki skeleton key tattoo - 1200

Yes, the key’s bit is a human sphenoid – that was my idea.

Anyway, I’m super-happy with the tattoo, and I’m glad it’s healed enough to show off. Thanks, Brian and Tanin!

The longest cell in Andy Farke is one of the primary afferent (sensory) neurons responsible for sensing vibration or fine touch, which runs from the tip of his big toe to his brainstem. (NB: I have not actually dissected Andy to confirm this, or performed any viral neuron tracing on him; this is assumed based on comparative anatomy.) Here’s a diagram:
Longest cell in Andy Farke

This is what happens when (a) I need to create a diagram to illustrate the longest cell in the human body for my students, and (b) my friends put stuff online with a CC-BY license.

Found this while I was checking out Aquilops art online:

Aquilops_scale

It’s a derivative work by Andy IJReid, from this Wikimedia page, based on two PhyloPic silhouettes Andy created (go here for the pathetically tiny lower vertebrate and here for Aquilops).

wedel-rln-fig2

From there it was pretty straightforward to mash up Andy’s silhouette with the nerve stuff from Wedel (2012: fig. 2).

So if you want the full deets on licensing – which I am obligated to provide whether you want them or not – the image up top is a derivative image by me, based on work by Andy published at PhyloPic under the Creative Commons Attribution 3.0 unported (CC-BY 3.0) license, and based on my own image published in Acta, also under a CC-BY license.

If you’d like to know more about the science behind very long nerves in vertebrates, please see these posts:

Also, keep making stuff and putting it online under a license people can actually use. It’s beneficial for science and education, and hugely entertaining for me.

Reference

Wedel, M.J. 2012. A monument of inefficiency: the presumed course of the recurrent laryngeal nerve in sauropod dinosaurs. Acta Palaeontologica Polonica 57(2):251-256.

I was contacted recently by David Goldenberg (dgoldenberg@gmail.com), a journalist who’s putting together a piece on the biggest dinosaurs. He asked me a few questions, and since I’d taken the time to write answers I thought I may as well post them here.

1) Do you think that we will ever know what the largest dinosaur (by mass) was?

In principle, we can never know that we’ve found the largest dinosaur. All we can know (and we probably can’t really know even this, as we’ll see below) is that we’ve found the largest so far. If we were dealing with animals where there’s a good sample size, there would be statistical techniques that we could use to figure out the likely size-range. But most giant dinosaur species are known only from a handful of specimens — sometimes only a single one. How big did Puertasaurus get? We can’t possibly say: the best we can do is estimate how big the one known specimen of Puertasaurus was.

That said, we can sort of get a feel for size classes. There are quite a few sauropods that seem to come in at around 30-40 tonnes — Brachiosaurus, Giraffatitan, Supersaurus, Dreadnoughtus — which suggests there might be some kind of a limit there. But there are bigger titanosaurs (Argentinosaurus, Puertasaurus, Futalognkosaurus) which show that if the barrier exists at all, it’s a “soft” one. And of course there are the tantalising hints of super-giant sauropods.

There are at least three of these: Amphicoelias fragillimus, a diplodocid known from a drawing of a vertebral arch which has since been lost or destroyed, which could well have massed 100 tonnes; Bruhathkayosaurus, a giant titanosaur known from a two-meter tibia, since destroyed, which could conceivably have massed twice that; and the Broome Sandstone track-maker, known only from footprints, which might have been somewhere in between.

Any one of those, we might write off and say it’s too good to be true — all three stories are pretty vague as to evidence and require a lot of guesswork in the inferences. But the fact that we have all three of these makes me feel pretty certain that there were indeed sauropods out there in the 100-200 tonne range (i.e. the size of big whales). I only hope we find solid, verifiable, curated evidence for them some time soon.

2) What bones do you need to have before you can make an accurate measurement?

You can’t ever make an accurate measurement. Consider even a really well represented, essentially complete specimen such as MB.R.2181 (previously known as HM S II), the giant mounted skeleton in the Museum für Naturkunde Berlin. Peer-reviewed published estimates of the mass of that one individual have varied between 13,618 and 78,258 kg — a factor of 5.75. Even if you discard these obvious outlier estimates, recent and credible estimates vary from 23,337 to 38,000 kg, which is still a factor of 1.63.

And this is not completely crazy. Two humans with essentially identical skeletons can weigh 70 and 114 kg, after all. Soft tissue is essentially impossible to predict.
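For anyone who wants to check the factors quoted above, here is the arithmetic as a few lines of Python. The numbers are the ones given in this answer; the function and variable names are just mine.

```python
def spread(low_kg, high_kg):
    """How many times bigger the highest estimate is than the lowest."""
    return high_kg / low_kg


# Mass estimates for MB.R.2181, and the two hypothetical humans, in kg.
all_published = spread(13_618, 78_258)
recent_credible = spread(23_337, 38_000)
two_humans = spread(70, 114)

for label, factor in [
    ("all published estimates", all_published),
    ("recent, credible estimates", recent_credible),
    ("two humans, similar skeletons", two_humans),
]:
    print(f"{label}: factor of {factor:.2f}")
```

Rounded to two places, the spread between the two humans comes out the same as the spread between the recent, credible estimates for the Berlin brachiosaur.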

3) What do you make of the fact that so many different species have been given the title? Is that the fault of the media or scientists or what?

A big part of it is that it depends on what you count. That Berlin brachiosaur is the biggest dinosaur known from an essentially complete skeleton, so Giraffatitan is a legitimate holder of the crown. (Confusing matters further, it used to be thought to be a species of Brachiosaurus.) But there were definitely bigger sauropods than that — just not known from such complete specimens. Argentinosaurus was certainly bigger, for example. But there’s no way to put a meaningful whole-body mass estimate on it.

But yes, there is also an understandable tendency towards sensationalism, both from scientists and the press. There have been plenty of new discoveries that can legitimately be described as “could be the biggest yet”.

We as a community often ask ourselves how much it should cost to publish an open-access paper. (We know how much it does cost, roughly: typically $3000 with a legacy publisher, or an average of $900 with a born-open publisher, or nothing at all for many journals.)

We know that peer-review is essentially free to publishers, being donated free by scholars. We know that most handling editors also work for free or for peanuts. We know that hosting things on the Web is cheap (“publishing [in this sense] is just a button”).

Publishers have costs associated with rejecting manuscripts — checking that they’re by real people at real institutions, scanning for obvious pseudo-scholarship, etc. But let’s ignore those costs for now, as being primarily for the benefit of the publishers rather than the author. (When I pay a publisher an APC, they’re not serving me directly by running plagiarism checks.)

The tendency of many discussions I’ve been involved with has been that the main technical contribution of publishers is the process that is still, for historical reasons, known as “typesetting” — that is, the transformation of the manuscript from an opaque form like an MS-Word file (or indeed a stack of hand-written sheets) into a semantically rich representation such as JATS XML. From there, actual typesetting into HTML or a pretty PDF can be largely automated.

So: what does it cost to typeset a manuscript?

First data point: I have heard that Kaveh Bazargan’s River Valley Technologies (the typesetter that PeerJ and many more mainstream publishers use) charges between £3.50 and £9 per page, including XML, graphics, PDF generation and proof correction.

Second data point: in a Scholarly Kitchen post that Kent Anderson intended as a criticism of PubMed Central but which in fact makes a great case for what good value it provides, he quotes an email from Kent A. Smith, a former Deputy Director of the NLM:

Under the % basis I am using here $47 per article. John [Mullican, a program analyst at NCBI] and I looked at this yesterday and based the number on a sampling of a few months billings. It consists on the average of about $34-35 per tagged article plus $10-11 for Q/A plus administrative fees of $2-3, where applicable.

Using the quoted figure of $47 per PMC article and the £6.25 midpoint of River Valley’s range of per-page prices (= $9.68 per page), that would be consistent with typical PMC articles being a bit under five pages long. The true figure is probably somewhat higher — maybe twice as long or more — but this seems to be at least in the same ballpark.

Third data point: Charles H. E. Ault, in a comment on that Scholarly Kitchen post, wrote:

As a production director at a small-to-middling university press that publishes no journals, I’m a bit reluctant to jump into this fray. But I must say that I am astonished at how much PMC is paying for XML tagging. Most vendors looking for the small amount of business my press can offer (say, maybe 10,000 pages a year at most) charge considerably less than $0.50 per page for XML tagging. Assuming a journal article is about 30 pages long, it should cost no more than $15 for XML tagging. Add another few bucks for quality assurance, and you might cross the $20 threshold. Does PMC have to pay a federally mandated minimum rate, like bridge construction projects? Where can I submit a bid?

I find the idea of 50-cent-per-page typesetting hard to swallow — it’s more than an order of magnitude cheaper than the River Valley/PMC level, and I’d like to know more about Ault’s operation. Is what they’re doing really comparable with what the others are doing?
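To put the three data points side by side, here is the back-of-envelope arithmetic as a small Python sketch. Every figure is one quoted above; the only things I've added are the variable names and the exchange rate implied by my own £6.25 = $9.68 conversion.

```python
# First data point: River Valley's quoted range, in pounds per page.
rv_low_gbp, rv_high_gbp = 3.50, 9.00
rv_mid_gbp = (rv_low_gbp + rv_high_gbp) / 2

# Dollar equivalent used above, and the exchange rate that implies.
rv_mid_usd = 9.68
implied_usd_per_gbp = rv_mid_usd / rv_mid_gbp

# Second data point: PMC's per-article cost and its quoted components
# (tagging + Q/A + administrative fees), as low and high ends in dollars.
pmc_per_article_usd = 47.0
pmc_components_low = 34 + 10 + 2
pmc_components_high = 35 + 11 + 3

# Article length at which River Valley's midpoint price matches PMC's cost.
implied_pages = pmc_per_article_usd / rv_mid_usd

# Third data point: Ault's upper bound for a 30-page article.
ault_per_page_usd = 0.50
ault_article_usd = 30 * ault_per_page_usd

# How many times cheaper per page Ault's vendors are than River Valley's midpoint.
price_ratio = rv_mid_usd / ault_per_page_usd

print(f"River Valley midpoint: £{rv_mid_gbp:.2f} per page")
print(f"Implied exchange rate: ${implied_usd_per_gbp:.2f} per £")
print(f"PMC components sum to ${pmc_components_low}-{pmc_components_high}")
print(f"Implied PMC article length: {implied_pages:.1f} pages")
print(f"Ault's 30-page article: ${ault_article_usd:.2f}")
print(f"River Valley midpoint vs Ault, per page: {price_ratio:.1f}x")
```

The comparison is only as good as its inputs: River Valley's price includes graphics, PDF generation and proof correction, while Ault's figure is for XML tagging alone.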

Are there other estimates out there?

 

Re-reading an email that Matt sent me back in January, I see this:

One quick point about [an interesting sauropod specimen]. I can envision writing that up as a short descriptive paper, basically to say, “Hey, look at this weird thing we found! Morrison sauropod diversity is still underestimated!” But I honestly doubt that we’ll ever get to it — we have literally years of other, more pressing work in front of us. So maybe we should just do an SV-POW! post about the weirdness of [that specimen], so that the World Will Know.

Although as soon as I write that, I think, “Screw that, I’m going to wait until I’m not busy* and then just take a single week* and rock out a wiper* on it.”

I realize that this way of thinking represents a profound and possibly psychotic break with reality. *Thrice! But it still creeps up on me.

(For anyone not familiar with the “wiper”, it refers to a short paper of only one or two pages. The etymology is left as an exercise to the reader.)

It’s just amazing how we keep on and on falling for this delusion that we can get a paper out quickly, even when we know perfectly well, going into the project, that it’s not going to work out that way. To pick a recent example, my paper on quantifying the effect of intervertebral cartilage on neutral posture was intended to be literally one page, an addendum to the earlier paper on cartilage: title, one paragraph of intro, diagram, equation, single reference, DONE! Instead, it landed up being 11 pages long with five illustrations and two tables.

I think it’s a reasonable approximation to say that any given project will require about an order of magnitude more work than we expect at the outset.

Even as I write this, the top of my palaeo-work priority list is a paper that I’m working on with Matt and two other colleagues, which he kicked off on 6 May, writing:

I really, really want to kill this off absolutely ASAP. Like, seriously, within a week or two. Is that cool? Is that doable?

To which I idiotically replied:

IT SHALL BE SO!

A month and a bit later, the answers to Matt’s questions are clear. Yes, it’s cool; and no, it’s not doable.

The thing is, I think that’s … kind of OK. The upshot is that we end up writing reasonably substantial papers, which is after all what we’re meant to be trying to do. If the reasonably substantial papers that end up getting written aren’t necessarily the ones we thought they were going to be, well, that’s not a problem. After all, as I’ve noted before, my entire Ph.D dissertation was composed of side-projects, and I never got around to doing the main project. That’s fine.

In 2011, Matt’s tutorial on how to find problems to work on discussed in detail how projects grow and mutate and anastomose. I’m giving up on thinking that this is a bad thing, abandoning the idea that I ought to be in control of my own research program. I’m just going to keep chasing whatever rabbits look good to me at the time, and see what happens.

Onwards!

In my blog-post announcing Haestasaurus as the new generic name for the misassigned species “Pelorosaurus” becklesii, I briefly surveyed the three phylogenetic analyses in the paper. Of the third — the one based on the Mannion et al. (2013) Lusotitan matrix using both discrete and continuous characters — I wrote that it …

… recovers Haestasaurus as a titanosaur — as sister to Diamantinasaurus and then Malawisaurus, making it a lithostrotian well down inside Titanosauria.

My mistake! I was working from the result of an earlier version of that analysis. In the final version included in the paper, things are rather different:

Upchurch et al. (2015: Fig 17). Strict consensus tree (LCDM).
A strict consensus tree based on the 17 most parsimonious trees generated by analysis of the Mannion et al. [18] LCDM with the revised scores for Haestasaurus and the addition of six new characters. GC values (multiplied by 100) are shown in square brackets for all nodes where these values are greater than 0. Abbreviations: Brc, Brachiosauridae; Dd, Diplodocoidea. N.B. the tree topology shown here means that the clades defined by Brachiosaurus+Saltasaurus (Titanosauriformes) and Andesaurus+Saltasaurus (Titanosauria) are identical. See main text for details.

As you can see, Haestasaurus is indeed a titanosaur in this analysis — but not a derived one at all. In fact, it’s part of the most basal clade of titanosaurs, along with Janenschia and Dongbeititan. In this tree, we have a really nice, big Brachiosauridae, containing 19 OTUs split fairly evenly between two subclades.

[Side-note: Upchurch et al. (2015) uses phylogenetic definitions that I’m not crazy about. I prefer the arrangement that I followed in my brachiosaur paper (Taylor 2009), in which Titanosauriformes = Brachiosauridae + Titanosauria is a node-stem triplet. Hopefully, some time soon, the wretched PhyloCode will finally be implemented, and we’ll be in a position to nail down a single set of definitions for the whole community to use.]

Anyway, the upshot of all this is that all three phylogenetic analyses in the paper return Haestasaurus as a pretty basal macronarian, and on the balance of evidence it’s likely not a titanosaur after all. (That’s why the name “Haestatitan”, which was in some earlier drafts of the paper, was changed to Haestasaurus. Kind of a shame, given how mundane -saurus names are, but probably the wisest course of action.)

What is the takeaway lesson from this? It’s not just “Haestasaurus is not a derived titanosaur”. It’s that all our phylogenetic hypotheses are just that — hypotheses. Papers that publish only a single cladogram are always at risk of being misinterpreted as conveying much more certainty than they really do, and Paul and Phil are to be commended for including the whole messy story in this paper. The position of Haestasaurus shifts around far too easily for us to have a strong sense of what it is, and it’s good that the paper makes that clear.

(It also makes me glad that way back in Taylor and Naish (2007), Darren and I didn’t give a more precise position of Xenoposeidon than that it’s probably some kind of neosauropod. And even that is not something I would put money on.)

References