Introducing The One Repo
June 30, 2015
You know what’s wrong with scholarly publishing?
Wait, scrub that question. We’ll be here all day. Let me jump straight to the chase and tell you the specific problem with scholarly publishing that I’m thinking of.
There’s nowhere to go to find all open-access papers, to download their metadata, to access it via an open API, to find out what’s new, to act as a platform for the development of new tools. Yes, there’s PubMed Central, but that’s only for work funded by the NIH. Yes, there’s Google Scholar, but that has no API, and at any moment could go the way of Google Wave and Google Reader when Google loses interest.
Instead, we have something like 4000 repositories out there, balkanised by institution, by geographical region, and by subject area. They have different UIs, different underlying data models, different APIs (if any). They’re built on different software platforms. It’s a jungle out there!
As researchers, we don’t need 4000 repos. You know what we need? One Repo.
Hey! That would be a good name for a project!
I’ve mentioned before how awesome and pro-open my employers, Index Data, are. (For those who are not regular readers, I’m a palaeontologist only in my spare time. By day, I’m a software engineer.) Now we’re working on an index of green/gold OA publishing. Metadata of every article across every repository and publisher. We want it to be complete, in the sense that we will be going aggressively for the long tail as opposed to focusing on some region or speciality, or things that are easily harvestable by OAI-PMH or other standards. We want it to be of a high, consistent quality in terms of metadata. We want it to be up to date. And most importantly, we want it to be fully open for all and any kind of re-use, by any other actor. This will include downloadable data files, OAI-PMH access, search-retrieve web services, embeddable widgets and more. We also envisage a Linked Data representation with a CRUD interface that allows third parties to contribute supplemental information, entity reconciliation, tagging, etc.
Instead of 4000 fragments, one big, meaty chunk of data.
Because we at Index Data have spent the last ten years helping aggregators and publishers and others getting access to difficult-to-access information through all kinds of crazy mechanisms, we have a unique combination of the skills, the tools, and the desire to pursue this venture.
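For readers who haven't met OAI-PMH, here is a minimal sketch of what harvesting by that standard looks like from the client side. It is not The One Repo's actual interface: the base URL `https://repo.example.org/oai` and the sample response are made up for illustration, and the two helper functions are hypothetical names. Only the `ListRecords` verb, the `metadataPrefix`, `from` and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml):
    """Return the resumptionToken from a ListRecords response, or None."""
    root = ET.fromstring(response_xml)
    token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Invented sample response, trimmed to the parts the sketch needs.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

BASE = "https://repo.example.org/oai"  # hypothetical endpoint
first_url = list_records_url(BASE, from_date="2015-06-01")
token = next_token(SAMPLE)
next_url = list_records_url(BASE, resumption_token=token)
print(first_url)
print(next_url)
```

A real harvester would fetch each URL in turn and loop until no token comes back. Every one of those 4000 repositories needs its own version of that loop, which is rather the point.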
So The One Repo is born. At the moment, we have:
- Harvesting set up for an initial set of 20 repositories.
- A demonstrator of one possible UI.
- A whitepaper describing the motivation and some of the technical aspects.
- A blog about the project’s progress.
- An advisory board of some of the brightest, most experienced and wisest people in the world of open access.
We’ve been flying under the radar for the last month and a bit. Now we’re ready for the world to know what we’re up to.
The One Repo is go!
Arrogance, elitism, paternalism
June 29, 2015
I just read this on The Scholarly Kitchen and nearly fell out of my seat:
In an era with more access given to less qualified people (laypeople and an increasingly unqualified blogging corps presenting themselves as experts or journalists), not to mention to text-miners and others scouring the literature for connections, the obligation to better manage these materials seems to be growing. We can no longer depend on the scarcity of print or the difficulties of distance or barriers of professional expertise to narrow access down to experts with a true need.
I think this may be the most revealing thing ever written on The Scholarly Kitchen. It’s hard to see a way of reading it that isn’t contemptuous of everyone outside the Magic Circle. Ideally, the great unwashed should be excluded altogether; but if we can’t do that, then at least we must tell them what to read and how to use it. Heaven forfend that we let Ordinary People make such decisions for themselves. That is for the priestly caste to do.
Brontosaurus cervical 8 … it just gets weirder
June 19, 2015
A while back, we noted that seriously, Apatosaurus is just nuts, as proven by the illustrations in Ostrom and McIntosh (1966: plate 12).
Now I’m posting those illustrations again, in a modified form, to make the same point. Here ya go:

Brontosaurus excelsus holotype YPM 1980, cervical vertebra 8, in anterior, left lateral and ventral views. Adapted from Marsh’s plates in Ostrom & McIntosh (1966: plates 12-13).
Here’s what’s changed since last time:
- “Apatosaurus” excelsus is Brontosaurus again!
- I cleaned up the scans of the plates, removing all the labels
- In the lateral view, I added a reconstruction of the missing neural spine, based on that of Apatosaurus louisae (from Gilmore 1936: plate XXIV). This reconstruction first appeared in Taylor and Wedel (2013a: figure 7).
- Most importantly, I added the ventral view of the vertebra from plate 13. Only now can you properly appreciate the truly bizarre shape of this bone. (The prezygs appear to project further forward than they should because the illustrated aspect is not true ventral, but slightly anteroventral.)
If only those three views were enough to construct a 3D model by photogrammetry! Sadly, it’s not possible to get photos of the whole vertebra from different angles now, as it’s tied up in the mounted Brontosaurus skeleton at the YPM:

Part of the neck of the mounted skeleton of Brontosaurus excelsus holotype YPM 1980, in right posterodorsolateral view (i.e. from behind, above, and to the right). The vertebra in the centre of the picture may well be the one illustrated above, but don’t hold me to it.
The bottom line: these are some crazy-ass morphologically distinctive vertebrae. Those ventrolaterally projecting processes that bear the cervical ribs are, for my money, the single most distinctive feature of apatosaurine sauropods. And they reach their zenith (or maybe their nadir, since they point downwards) in Brontosaurus. These processes are the reason that apatosaurs had Toblerone-shaped necks — triangular in cross-section, with the base flat or even concave. Any restoration that shows a tubular neck is way off base.
References
- Gilmore, Charles W. 1936. Osteology of Apatosaurus, with special reference to specimens in the Carnegie Museum. Memoirs of the Carnegie Museum 11:175–300 and plates XXI–XXXIV.
- Ostrom, John H., and John S. McIntosh. 1966. Marsh’s Dinosaurs. Yale University Press, New Haven and London. 388 pages including 65 absurdly beautiful plates.
- Taylor, Michael P., and Mathew J. Wedel. 2013. Why sauropods had long necks; and why giraffes have short necks. PeerJ 1:e36. 41 pages, 11 figures, 3 tables. doi:10.7717/peerj.36
Vertebrates and invertebrates of Nova Scotia
June 16, 2015
Last week I went to Halifax, Nova Scotia, for the twice-yearly meet-up with my Index Data colleagues. On the last day, four of us took a day-trip out to Peggy’s Cove to eat lunch at Ryer Lobsters.
We stopped off at the Peggy’s Cove lighthouse on the way, and spotted a vertebrate, which I am pleased to present:
It’s a whale skull, but I have no idea what kind. Can anyone help out?
So much for vertebrates — it was really all about the inverts. Here are six of them:
I have a 2lb lobster here; my colleague Jakub went for two 1lb lobsters, as did Jason and Wolfram (not pictured). That’s Wolfram’s lobster closest to the camera, giving a better impression of just what awesome beasts these were.
Peggy’s Cove: recommended. For vertebrates and inverts.
(Thanks to Wolfram Schneider for these photos.)
New information on the integumentary ornamentation of Aquilops americanus (that I have on my shoulder)
June 14, 2015
My 40th birthday present from Vicki. I commissioned the art from Brian Engh. I bow to no one in my love for his original Aquilops head reconstruction:
BUT it’s waaay too detailed for a tattoo unless I wanted a full back piece. I sent Brian this sketch to convey what I wanted – to emphasize the strong lines of the piece, punch up the spines and spikes, basically shift it toward a comic book style without devolving into caricature:
Originally I was going to have Aquilops’ name and year of discovery in the tat. I decided to drop the lettering, for several reasons. One, it won’t hold up as well over the next few decades. Two, if someone is close enough to read it, we’ll probably be talking about the tattoo already. Three, the tattoo is a better conversation starter without a caption. First I get to tell people what Aquilops is, then I get to explain what ‘fourth author’ means. ;-)
As he did for the original Aquilops head recon, Brian sent a selection of possible color schemes, mostly based on those of extant lizards. I couldn’t decide which I liked best, so I talked it over with my tattoo artist, Tanin McCoe at Birch Avenue Tattoo in Flagstaff, Arizona. I wasn’t just interested in what looks good on paper, but what would work well with my skin tone and still look good 20 years from now. Tanin really liked the earth-tone color scheme with the dark stripe across the eye, so that’s how we went. The tattoo Aquilops is facing left instead of right because it’s on my left shoulder – my right deltoid was already occupied.
They do good work at Birch Avenue – Vicki’s gotten three pieces there, including this skeleton key that was also done by Tanin:
Yes, the key’s bit is a human sphenoid – that was my idea.
Anyway, I’m super-happy with the tattoo, and I’m glad it’s healed enough to show off. Thanks, Brian and Tanin!
The longest cell in Andy Farke
June 12, 2015
The longest cell in Andy Farke is one of the primary afferent (sensory) neurons responsible for sensing vibration or fine touch, which runs from the tip of his big toe to his brainstem. (NB: I have not actually dissected Andy to confirm this, or performed any viral neuron tracing on him; this is assumed based on comparative anatomy.) Here’s a diagram:
This is what happens when (a) I need to create a diagram to illustrate the longest cell in the human body for my students, and (b) my friends put stuff online with a CC-BY license.
Found this while I was checking out Aquilops art online:
It’s a derivative work by Andy IJReid, from this Wikimedia page, based on two PhyloPic silhouettes Andy created (go here for the pathetically tiny lower vertebrate and here for Aquilops).
From there it was pretty straightforward to mash up Andy’s silhouette with the nerve stuff from Wedel (2012: fig. 2).
So if you want the full deets on licensing – which I am obligated to provide whether you want them or not – the image up top is a derivative image by me, based on work by Andy published at PhyloPic under the Creative Commons Attribution 3.0 unported (CC-BY 3.0) license, and based on my own image published in Acta, also under a CC-BY license.
If you’d like to know more about the science behind very long nerves in vertebrates, please see these posts:
- The world’s longest cells? Speculations on the nervous systems of sauropods
- Oblivious sauropods being eaten
Also, keep making stuff and putting it online under a license people can actually use. It’s beneficial for science and education, and hugely entertaining for me.
Reference
Will we ever find the biggest dinosaur?
June 12, 2015
I was contacted recently by David Goldenberg (dgoldenberg@gmail.com), a journalist who’s putting together a piece on the biggest dinosaurs. He asked me a few questions, and since I’d taken the time to write answers I thought I may as well post them here.
1) Do you think that we will ever know what the largest dinosaur (by mass) was?
In principle, we can never know that we’ve found the largest dinosaur. All we can know (and we probably can’t really know even this, as we’ll see below) is that we’ve found the largest so far. If we were dealing with animals where there’s a good sample size, there would be statistical techniques that we could use to figure out the likely size-range. But most giant dinosaur species are known only from a handful of specimens, sometimes only a single one. How big did Puertasaurus get? We can’t possibly say: the best we can do is estimate how big the one known specimen of Puertasaurus was.
That said, we can sort of get a feel for size classes. There are quite a few sauropods that seem to come in at around 30-40 tonnes — Brachiosaurus, Giraffatitan, Supersaurus, Dreadnoughtus — which suggests there might be some kind of a limit there. But there are bigger titanosaurs (Argentinosaurus, Puertasaurus, Futalognkosaurus) which show that if the barrier exists at all, it’s a “soft” one. And of course the tantalising hints of super-giant sauropods.
There are at least three of these: Amphicoelias fragillimus, a diplodocid known from a drawing of a vertebral arch which has since been lost or destroyed, which could well have massed 100 tonnes; Bruhathkayosaurus, a giant titanosaur known from a two-meter tibia, since destroyed, which could conceivably have massed twice that; and the Broome Sandstone track-maker, known only from footprints, which might have been somewhere in between.
Any one of those, we might write off and say it’s too good to be true — all three stories are pretty vague as to evidence and require a lot of guesswork in the inferences. But the fact that we have all three of these makes me feel pretty certain that there were indeed sauropods out there in the 100-200 tonne range (i.e. the size of big whales). I only hope we find solid, verifiable, curated evidence for them some time soon.
2) What bones do you need to have before you can make an accurate measurement?
You can’t ever make an accurate measurement. Consider even a really well represented, essentially complete specimen such as MB.R.2181 (previously known as HM S II), the giant mounted skeleton in the Museum für Naturkunde Berlin. Peer-reviewed published estimates of the mass of that one individual have varied between 13,618 and 78,258 kg — a factor of 5.75. Even if you discard these obvious outlier estimates, recent and credible estimates vary from 23,337 to 38,000 kg, which is still a factor of 1.63.
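If you want to check those factors for yourself, the arithmetic is just the high estimate divided by the low one. Here is a minimal sketch using only the four figures quoted above; the function name `spread_factor` is my own invention for illustration.

```python
# Published mass estimates (kg) for MB.R.2181, as quoted in the text.
all_estimates = (13_618, 78_258)      # lowest and highest published
recent_estimates = (23_337, 38_000)   # lowest and highest recent, credible


def spread_factor(low, high):
    """Return how many times larger the high estimate is than the low one."""
    return high / low


full_factor = spread_factor(*all_estimates)
recent_factor = spread_factor(*recent_estimates)

print(f"All published estimates differ by a factor of {full_factor:.2f}")
print(f"Recent credible estimates differ by a factor of {recent_factor:.2f}")
```

Rounded to two decimal places, those come out at the 5.75 and 1.63 given above.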
And this is not completely crazy. Two humans with essentially identical skeletons can weigh 70 and 114 kg, after all. Soft tissue is essentially impossible to predict.
3) What do you make of the fact that so many different species have been given the title? Is that the fault of the media or scientists or what?
A big part of it is that it depends on what you count. That Berlin brachiosaur is the biggest dinosaur known from an essentially complete skeleton, so Giraffatitan is a legitimate holder of the crown. (Confusing matters further, it used to be thought to be a species of Brachiosaurus.) But there were definitely bigger sauropods than that, just not known from such complete specimens. Argentinosaurus was certainly bigger, for example. But there’s no way to put a meaningful whole-body mass estimate on it.
But yes, there is also an understandable tendency towards sensationalism, both from scientists and the press. There have been plenty of new discoveries that can legitimately be described as “could be the biggest yet”.
How much does “typesetting” cost?
June 11, 2015
We as a community often ask ourselves how much it should cost to publish an open-access paper. (We know how much it does cost, roughly: typically $3000 with a legacy publisher, or an average of $900 with a born-open publisher, or nothing at all for many journals.)
We know that peer-review is essentially free to publishers, being donated free by scholars. We know that most handling editors also work for free or for peanuts. We know that hosting things on the Web is cheap (“publishing [in this sense] is just a button“).
Publishers have costs associated with rejecting manuscripts — checking that they’re by real people at real institutions, scanning for obvious pseudo-scholarship, etc. But let’s ignore those costs for now, as being primarily for the benefit of the publishers rather than the author. (When I pay a publisher an APC, they’re not serving me directly by running plagiarism checks.)
The tendency of many discussions I’ve been involved with has been that the main technical contribution of publishers is the process that is still, for historical reasons, known as “typesetting”: that is, the transformation of the manuscript from an opaque form like an MS-Word file (or indeed a stack of hand-written sheets) into a semantically rich representation such as JATS XML. From there, actual typesetting into HTML or a pretty PDF can be largely automated.
So: what does it cost to typeset a manuscript?
First data point: I have heard that Kaveh Bazargan’s River Valley Technologies (the typesetter that PeerJ and many more mainstream publishers use) charges between £3.50 and £9 per page, including XML, graphics, PDF generation and proof correction.
Second data point: in a Scholarly Kitchen post that Kent Anderson intended as a criticism of PubMed Central but which in fact makes a great case for what good value it provides, he quotes an email from Kent A. Smith, a former Deputy Director of the NLM:
Under the % basis I am using here $47 per article. John [Mullican, a program analyst at NCBI] and I looked at this yesterday and based the number on a sampling of a few months billings. It consists on the average of about $34-35 per tagged article plus $10-11 for Q/A plus administrative fees of $2-3, where applicable.
Using the quoted figure of $47 per PMC article and the £6.25 midpoint of River Valley’s range of per-page prices (= $9.68 per page), that would be consistent with typical PMC articles being a bit under five pages long. The true figure is probably somewhat higher — maybe twice as long or more — but this seems to be at least in the same ballpark.
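Here is that back-of-envelope calculation spelled out as a short sketch. All the inputs are the figures quoted above; the exchange rate is not looked up anywhere, but simply implied by the £6.25 = $9.68 conversion in the text.

```python
# Figures quoted in the text.
pmc_cost_per_article_usd = 47.00
rv_low_gbp, rv_high_gbp = 3.50, 9.00   # River Valley per-page price range
rv_mid_usd = 9.68                      # dollar equivalent of the midpoint, as quoted

# Midpoint of the per-page range, in pounds.
rv_mid_gbp = (rv_low_gbp + rv_high_gbp) / 2

# Exchange rate implied by the quoted conversion (an inference, not a looked-up rate).
implied_usd_per_gbp = rv_mid_usd / rv_mid_gbp

# Article length at which the two data points would agree exactly.
implied_pages = pmc_cost_per_article_usd / rv_mid_usd

print(f"Midpoint price: GBP {rv_mid_gbp:.2f} per page")
print(f"Implied exchange rate: {implied_usd_per_gbp:.2f} USD per GBP")
print(f"Implied article length: {implied_pages:.2f} pages")
```

The implied length lands a little under five pages, which is where the "bit under five pages" figure comes from.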
Third data point: Charles H. E. Ault, in a comment on that Scholarly Kitchen post, wrote:
As a production director at a small-to-middling university press that publishes no journals, I’m a bit reluctant to jump into this fray. But I must say that I am astonished at how much PMC is paying for XML tagging. Most vendors looking for the small amount of business my press can offer (say, maybe 10,000 pages a year at most) charge considerably less than $0.50 per page for XML tagging. Assuming a journal article is about 30 pages long, it should cost no more than $15 for XML tagging. Add another few bucks for quality assurance, and you might cross the $20 threshold. Does PMC have to pay a federally mandated minimum rate, like bridge construction projects? Where can I submit a bid?
I find the idea of 50-cent-per-page typesetting hard to swallow — it’s more than an order of magnitude cheaper than the River Valley/PMC level, and I’d like to know more about Ault’s operation. Is what they’re doing really comparable with what the others are doing?
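To put a number on "more than an order of magnitude", here is a rough sketch comparing Ault's ceiling of $0.50 per page with the $9.68 River Valley midpoint used above. The 30-page article length is Ault's own assumption, not mine, and since he says "considerably less than" $0.50, the real gap would be bigger than this.

```python
import math

river_valley_mid_usd_per_page = 9.68   # midpoint figure from the text
ault_max_usd_per_page = 0.50           # upper bound quoted by Ault
assumed_pages = 30                     # Ault's assumed article length

# How many times cheaper Ault's ceiling is than the River Valley midpoint.
ratio = river_valley_mid_usd_per_page / ault_max_usd_per_page
orders_of_magnitude = math.log10(ratio)

# Per-article cost at each rate for an article of the assumed length.
ault_article_cost = ault_max_usd_per_page * assumed_pages
river_valley_article_cost = river_valley_mid_usd_per_page * assumed_pages

print(f"Price ratio: {ratio:.1f}x ({orders_of_magnitude:.2f} orders of magnitude)")
print(f"30-page article at Ault's ceiling: ${ault_article_cost:.2f}")
print(f"30-page article at River Valley midpoint: ${river_valley_article_cost:.2f}")
```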
Are there other estimates out there?
In my blog-post announcing Haestasaurus as the new generic name for the misassigned species “Pelorosaurus” becklesii, I briefly surveyed the three phylogenetic analyses in the paper. Of the third — the one based on the Mannion et al. (2013) Lusotitan matrix using both discrete and continuous characters — I wrote that it …
… recovers Haestasaurus as a titanosaur — as sister to Diamantinasaurus and then Malawisaurus, making it a lithostrotian well down inside Titanosauria.
My mistake! I was working from the result of an earlier version of that analysis. In the final version included in the paper, things are rather different:
![Upchurch et al. (2015: Fig 17). Strict consensus tree (LCDM).](https://svpow.files.wordpress.com/2015/06/journal-pone-0125819-g017.jpeg?w=480&h=599)
Upchurch et al. (2015: Fig 17). Strict consensus tree (LCDM).
A strict consensus tree based on the 17 most parsimonious trees generated by analysis of the Mannion et al. [18] LCDM with the revised scores for Haestasaurus and the addition of six new characters. GC values (multiplied by 100) are shown in square brackets for all nodes where these values are greater than 0. Abbreviations: Brc, Brachiosauridae; Dd, Diplodocoidea. N.B. the tree topology shown here means that the clades defined by Brachiosaurus+Saltasaurus (Titanosauriformes) and Andesaurus+Saltasaurus (Titanosauria) are identical. See main text for details.
[Side-note: Upchurch et al. (2015) uses phylogenetic definitions that I’m not crazy about. I prefer the arrangement that I followed in my brachiosaur paper (Taylor 2009), in which Titanosauriformes = Brachiosauridae + Titanosauria is a node-stem triplet. Hopefully, some time soon, the wretched PhyloCode will finally be implemented, and we’ll be in a position to nail down a single set of definitions for the whole community to use.]
Anyway, the upshot of all this is that all three phylogenetic analyses in the paper return Haestasaurus as a pretty basal macronarian, and on the balance of evidence it’s likely not a titanosaur after all. (That’s why the name “Haestatitan“, which was in some earlier drafts of the paper, was changed to Haestasaurus. Kind of a shame, given how mundane -saurus names are, but probably the wisest course of action.)
What is the takeaway lesson from this? It’s not just “Haestasaurus is not a derived titanosaur”. It’s that all our phylogenetic hypotheses are just that — hypotheses. Papers that publish only a single cladogram are always at risk of being misinterpreted as conveying much more certainty than they really do, and Paul and Phil are to be commended for including the whole messy story in this paper. The position of Haestasaurus shifts around far too easily for us to have a strong sense of what it is, and it’s good that the paper makes that clear.
(It also makes me glad that way back in Taylor and Naish (2007), Darren and I didn’t give a more precise position of Xenoposeidon than that it’s probably some kind of neosauropod. And even that is not something I would put money on.)
References
- Mannion, Philip D., Paul Upchurch, Rosie N. Barnes and Octávio Mateus. 2013. Osteology of the Late Jurassic Portuguese sauropod dinosaur Lusotitan atalaiensis (Macronaria) and the evolutionary history of basal titanosauriforms. Zoological Journal of the Linnean Society 168(1):98–206. doi:10.1111/zoj.12029
- Taylor, Michael P. 2009. A re-evaluation of Brachiosaurus altithorax Riggs 1903 (Dinosauria, Sauropoda) and its generic separation from Giraffatitan brancai (Janensch 1914). Journal of Vertebrate Paleontology 29(3):787–806.
- Taylor, Michael P. and Darren Naish. 2007. An unusual new neosauropod dinosaur from the Lower Cretaceous Hastings Beds Group of East Sussex, England. Palaeontology 50(6):1547–1564. doi:10.1111/j.1475-4983.2007.00728.x
- Upchurch, Paul, Philip D. Mannion and Michael P. Taylor. 2015. The Anatomy and Phylogenetic Relationships of “Pelorosaurus” becklesii (Neosauropoda, Macronaria) from the Early Cretaceous of England. PLoS ONE 10(6):e0125819. doi:10.1371/journal.pone.0125819