Unstated precision and undemonstrated accuracy: two more reasons why we don’t trust DinoMorph

June 2, 2009

Because the appearance of accuracy has an irresistible allure, non-specialists frequently treat these estimates as factual.

–Graur and Martin (2004: p.80)

Prologue: Why We Hatin’?

Between the first DinoMorph post and this one, it may seem like we have it in for DinoMorph, like we’re trying to discredit the method or bury it. We’re not anti-DinoMorph at all. We really want it to work, because 3D modeling is probably going to be the only way to explore some problems we care about (like the breathing mechanics of an articulated sauropod torso), and so far DinoMorph seems to be farther along than any of the alternatives. It is also worth remembering that building 3D digital dinos for scientific purposes is still in its infancy, and that the VP community has barely gotten started exploring the possibilities. The field has great promise. But we also have to be realistic about limitations in the source data (see Mike’s post) and about the accuracy and precision of the results (this post). We hope that these posts will start constructive conversations and inspire more work to improve the science.

Intro: Accuracy and Precision

Accuracy is how close a measurement is to the real value, and precision is how close repeated measurements are to each other. Say it’s 100 degrees F outside, which it may be for some of you. If you have four thermometers and they read 90, 95, 105, and 110, then the mean is 100. The accuracy of the aggregate setup is high, but the precision is low (big error bars). If, on the other hand, your thermometers read 94.2, 93.8, 94.6, and 93.4, then they are precise (tight grouping) but inaccurate (not centered on the real value).
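The thermometer example is easy to make concrete with a few lines of code. This is just a sketch of the two failure modes, using the numbers from the paragraph above; the "spread" here is half the range of the readings, a crude stand-in for a proper error bar.

```python
# Accuracy vs. precision, using the thermometer readings from the text.
# True outside temperature is taken to be 100 F.
true_value = 100.0

accurate_but_imprecise = [90.0, 95.0, 105.0, 110.0]
precise_but_inaccurate = [94.2, 93.8, 94.6, 93.4]

def mean(xs):
    return sum(xs) / len(xs)

def spread(xs):
    # Half the range of the readings: a crude "error bar".
    return (max(xs) - min(xs)) / 2

for label, readings in [("accurate but imprecise", accurate_but_imprecise),
                        ("precise but inaccurate", precise_but_inaccurate)]:
    bias = mean(readings) - true_value   # accuracy: offset from the truth
    print(f"{label}: mean = {mean(readings):.1f}, "
          f"bias = {bias:+.1f}, spread = +/-{spread(readings):.1f}")
```

The first set has zero bias but a spread of ±10; the second has a tight spread of ±0.6 but sits 6 degrees off the true value. The same bookkeeping applies unchanged if the readings are joint angles instead of temperatures.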

Oh Error Bars, Where Art Thou?

Here’s what 2 degrees (angular, not temperature) looks like:

two degrees

It’s not a big measurement. If I was measuring the range of movement (ROM) of a single joint in one individual, like an elbow or shoulder, and I got a precision of plus or minus 2 degrees over repeated movements, I’d be pretty happy. If I got that level of precision on, say, the left knee, in ten different people, I’d start worrying that I was in the Matrix.

All eusauropods have at least 12 cervical vertebrae, and diplodocids have at least 15 (Barosaurus probably has 16, but there are no complete necks so it’s hard to be sure). What happens if we propagate an error of plus or minus 2 degrees down the neck of Diplodocus?

Diplodocus 4 degree range

None of these are supposed to correspond to any particular pose in life. I just lined up all the cervicals as straight as I could get them, and then rotated each joint between C3 and C15 by 2 degrees. I left the occipital condyle and C1-C3 in a straight line because I felt the point was made, but the head could be rotated up or down by another 6 degrees if one so chose. Again, this is not an ROM; this is just an error of plus or minus 2 degrees across each of 12 intervertebral joints.
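Here is a back-of-the-envelope version of that figure. It treats the neck as a 2D chain of 12 equal segments and tilts every joint by the same 2 degrees; the uniform 0.5 m segment length is a made-up number for illustration (real Diplodocus cervicals vary a lot in length), but the compounding is the point: 2 degrees at each of 12 joints is 24 degrees at the head, and the head end of a 6 m chain finishes well over a metre from where the straight-line pose puts it.

```python
import math

# Worst-case error propagation down a chain of 12 intervertebral joints
# (C3-C15, as in the text). The uniform 0.5 m segment length is a
# hypothetical number for illustration, not a measurement of any specimen.
N_JOINTS = 12
ERROR_DEG = 2.0   # per-joint error, degrees
SEG_LEN = 0.5     # metres, assumed

def head_position(per_joint_deg):
    """2D forward kinematics: every joint rotated by the same amount."""
    x = y = angle = 0.0
    for _ in range(N_JOINTS):
        angle += math.radians(per_joint_deg)
        x += SEG_LEN * math.cos(angle)
        y += SEG_LEN * math.sin(angle)
    return x, y

x0, y0 = head_position(0.0)        # chain laid out perfectly straight
x2, y2 = head_position(ERROR_DEG)  # every joint off by +2 degrees

print(f"cumulative rotation at the head: {N_JOINTS * ERROR_DEG:.0f} degrees")
print(f"head end displaced {math.hypot(x2 - x0, y2 - y0):.2f} m "
      f"on a {N_JOINTS * SEG_LEN:.1f} m chain")
```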

Now let’s look back at the neutral pose and estimated ROM of the neck in the CM 84/94 composite skeleton of Diplodocus (Stevens 2002: fig. 6a):


Notice that the model poses are shown with perfect precision, and no allowance for error. Now, look back up at the first picture to get an idea of what 2 degrees of error looks like, and then try to mentally apply it to each of those three poses. It’s not easy to picture, but in my mind’s eye the three neck poses dissolve into a fuzz of probabilities, like the electron cloud around the nucleus of an atom.

How precise is DinoMorph? Or rather, given that the guts of the program probably allow for Jupiter flyby levels of precision, how precise is any given result, based on the interaction of raw data, necessary but unverified controlling assumptions (see below), and the algorithm itself? Can we really rule out an error of plus or minus 2 degrees per joint? What about 1 degree per joint? What about 5? This is a problem of precision, and it would still exist even with an absolutely perfect neck that was 100% complete and entirely undistorted (which we ain’t got).
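That "fuzz of probabilities" can also be sketched numerically. The toy model below draws a random error between −2 and +2 degrees at each of 12 joints and asks where the head ends up, over many trials. As before, the 2D chain of equal 0.5 m segments is an assumption for illustration only; the takeaway is that an error you can barely see at one joint produces a head-position cloud tens of centimetres across.

```python
import math
import random
import statistics

# Monte Carlo sketch: per-joint uncertainty -> whole-neck pose uncertainty.
# Chain geometry (12 joints, 0.5 m segments) is hypothetical.
N_JOINTS = 12
SEG_LEN = 0.5            # metres, assumed
ERROR_DEG = 2.0          # per-joint error, uniform in [-2, +2] degrees
random.seed(42)          # reproducible runs

def head_height(errors_deg):
    """Vertical position of the chain's far end, given per-joint errors."""
    angle = 0.0
    y = 0.0
    for e in errors_deg:
        angle += math.radians(e)
        y += SEG_LEN * math.sin(angle)
    return y

heights = []
for _ in range(10_000):
    errors = [random.uniform(-ERROR_DEG, ERROR_DEG) for _ in range(N_JOINTS)]
    heights.append(head_height(errors))

print(f"head height: mean {statistics.mean(heights):+.3f} m, "
      f"s.d. {statistics.stdev(heights):.3f} m")
```

The mean comes out near zero, as it should for symmetric errors, but the standard deviation of head height is on the order of a quarter of a metre. That is what the error bars on a single published pose would have to convey.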

It’s possible that the current version of the program doesn’t allow these kinds of error calculations. That’s fine–I realize that DinoMorph, like all of science, is a work in progress. But I’d like to know up front that there is no provision for determining the precision, so I could set the question aside for now. And at some point, it will have to be answered.

Maybe it would be better to shift gears and ask: when DinoMorph is applied to extant animals, does it accurately predict the neutral pose and ROM?

Ground Truthiness

It might be better to ask that question, but there are no published answers. From the first DinoMorph paper, where the method is justified (Stevens and Parrish 1999: p. 798):

Our manipulation of muscle and ligament preparations of extant bird necks indicated that synovial capsules constrain movement such that paired pre- and postzygapophyses could only be displaced to the point where the margin of one facet reaches roughly the midpoint of the other facet, at which point the capsule is stretched taut (20). In other words, one facet could slip upon the other until their overlap was reduced to about 50%. In vivo, muscles, ligaments, and fascia may have further limited movement (20); thus, the digital manipulations reported here represent a “best case” scenario for neck mobility.

The reference supporting all this is number 20 (remember how much I like numbered references?), and here’s the full text (Stevens and Parrish 1999: p. 800):

20. J. M. Parrish and K. Stevens, unpublished data.

Those data are still unpublished. But at least one of the basic assumptions–the 50% zyg overlap bit–is contradicted by Stevens and Parrish (2005b: p. 191 [not to mention by Taylor et al. 2009]).

It’s been a decade. There have been three subsequent papers on this stuff (Stevens 2002, Stevens and Parrish 2005a, b). The DinoMorph results have been the foundation for sauropod depictions in the biggest dinosaur documentary ever made and for an exhibit at the biggest natural history museum in the world. And we have no idea if the method is accurate, because the supporting data have never been published.

Sadly, this is not that uncommon in paleontology, particularly when it comes to sauropods, and especially when it comes to necks. Someone comes up with a totally new method, and right out of the gate it gets applied to a thorny paleontological problem, before it’s been demonstrated to work on extant animals. It’s exciting, it’s seductive, and it’s hard to screw up, because when you apply an unproven method to an unsolved problem, it’s impossible to get the wrong answer. In fact, the results are “not even wrong”; it’s impossible to get an answer of any value whatsoever, because there is no way of judging its correctness.

In contrast, the work of Christian and Dzemski (2007) on neck posture in Brachiosaurus warrants serious consideration, not because of the particular answer they got for Brachiosaurus, but because they got the right answers when they applied their method to extant long-necked animals (ostriches and camels; Dzemski and Christian 2007). Don Henderson and Ryosuke Motani, among others, have also been religious about ground-truthing their methods on extant animals before applying them to fossil taxa. That shouldn’t be exceptional. It should be expected. It should be the minimum requirement for being included in the discussion.

Conclusion: Let’s move forward

I can’t accuse the makers of Walking With Dinosaurs or the designers of Dinosaurs: Ancient Fossils, New Discoveries of drinking the DinoMorph Kool-Aid. I don’t know that it is Kool-Aid. It might be fine wine. There’s red stuff in the cup, but no one has tasted it.

If you get nothing else from this post, please understand that I’m not saying the results of DinoMorph are either good or bad. I’m saying that there is currently no objective way of knowing. I want DinoMorph to work, but I want a DinoMorph made rigorous by the publication of supporting data from extant animals demonstrating its accuracy, and ranges of error demonstrating its precision.

If someone has a novel method they want to apply to dinosaurs or any other extinct animal, the burden of proof is on them to show that the method works. And if that evidence is not forthcoming, you–reviewers, editors, readers, science journalists, museum exhibit designers, documentary producers, netizens, laypeople–have the right to ask for it. And until you get that supporting evidence, you don’t have to take the results of the method seriously. Asking “how do you know that?” is the basis of science; it ought to be reflexive.

In the immortal words of Tom Holtz, “Sorry if that makes some people feel bad, but I’m not in the ‘make people feel good business’; I’m a scientist.”


7 Responses to “Unstated precision and undemonstrated accuracy: two more reasons why we don’t trust DinoMorph”

  1. Nathan Myers Says:

    This article brought tears to my eyes, it’s so good.

  2. David Marjanović Says:

    So Stevens & Parrish (1999) counts as a failure of peer review.

    Most reviewers don’t read data matrices or even character lists. Looks like most reviewers don’t read numbered references either.

  3. Matt Wedel Says:

    We-e-ell. I would be more inclined to be lenient with the first publication out of the gate if the uncertainties of the method had been made explicit, or if it had been clearly presented as exploratory data analysis instead of THE new answer.

I do think that if a new paradigm is being announced in the pages of Science, the bar for validating the method ought to be higher (i.e., there should be such a bar).

    The best thing that could happen now is for the authors to publish the supporting data, turn that possible Kool-Aid into definite wine so we can get on with things.

  4. Darren Naish Says:

    David says…

    Most reviewers don’t read data matrices or even character lists.

    I review loads of papers – way more than I should (like, three or four a week). It takes me ages, and in fact it takes up so much time that I hardly ever have time to properly check the character lists, data matrices, or references (I do always point this out to the handling editor). You might say that this is what reviewers are meant to do, but there is a limit on how much time you can spend on someone else’s work.

  5. Mike Taylor Says:

Reviewing three or four papers a week cannot possibly be normal. Can others chime in with their frequencies? It’s hard for me to imagine that anyone should ever be asked to do more than, say, two a month.

  6. Well, Darren has his editorial responsibilities with K Research and possibly others.

    I’ve never actually calculated my average: I wouldn’t be surprised if it came out to approximately one every two weeks averaged out over the year (maybe more, but not twice that). Something on the order of 20-40 MSs a year, barring situations like doing an NSF stint or the like.

Of course, they always tend to fall in clumps, with three or four at once, then a dry spell of a month or more.

  7. […] break in here and point out that the same is true for pers. obs., unpubl. data, in prep., and other citations that don’t point to resources available to the reader: IF […]
