Smith et al.'s (2007) Cryolophosaurus analysis is untrustworthy due to a lack of coding

I've complained before about the recent trend of not coding included taxa in 
theropod matrices (as in the Limusaurus and Haplocheirus analyses), but here's 
an explicit and quantified example.  I post it here since I think this is an 
important issue in our field that needs to be confronted and addressed.

The paper is Smith et al.'s (2007) influential basal theropod analysis in the 
Cryolophosaurus monograph.  Holtz said here on the DML that it was "truly good 
stuff, and I strongly suspect that they have better captured the actual 
phylogeny of basal theropods than most previous studies."  The paper includes 
Makovicky and Currie as coauthors- two people who know their stuff and have 
access to specimens.  There's no excuse to make obvious mistakes.

The taxon is Marasuchus, the outgroup of the analysis.  This basal 
dinosauriform includes the more complete specimens once referred to Lagosuchus 
and has been described by Bonaparte (1975) and Sereno and Arcucci (1994) in 
addition to the briefer original descriptions by Romer (e.g. 1972).  This is a 
classic OTU for dinosaur analyses, used in papers by Rauhut, Ezcurra, Novas and 
others.  So there's plenty of information on the taxon available in easily 
accessed journals like JVP.

What follows are Smith et al.'s codings compared to my codings, divided by 
anatomical section.  Note that somebody with access to the specimens themselves 
would be able to code even more than I could. 

Smith et al.- ?????????? ?????????? ?????????? ?????????? ?????????? ?????????? 
?????????? ?????????? ?????????? ???0???00? 00??-???00 ?0???????? ?????????? 
?????????? ?
Me- ?0???????? ?????????? ?0?????00- ?0??10???? ?????????? ?????????? 
?????????? ?????????? 0????????0 0??????00? 0000-?1?01 ?0???????? ?????????? 
?????????? ?

Cranially, Marasuchus is known from a maxilla and braincase.  Note the first 
section of coded characters in my row pertains to the maxilla, which went 
completely uncoded by Smith et al.  The second section pertains to the 
braincase, and here I was also able to code more.  Altogether Smith et al. 
coded 9 characters, while I coded 22.
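
For anyone who wants to check these tallies, here is a minimal Python sketch.  
It assumes that every symbol other than '?' counts as coded, including '-' 
(inapplicable) and letter codes such as 'p'.  That rule is inferred from the 
numbers quoted in this post, not stated in the paper, and the function name 
count_coded is made up for illustration.

```python
def count_coded(row: str) -> int:
    """Count matrix cells that are not '?' (whitespace is ignored)."""
    return sum(1 for ch in row if not ch.isspace() and ch != "?")


# Cranial rows copied verbatim from the comparison above.
smith_cranial = (
    "?????????? ?????????? ?????????? ?????????? ?????????? ?????????? "
    "?????????? ?????????? ?????????? ???0???00? 00??-???00 ?0???????? ?????????? "
    "?????????? ?"
)
my_cranial = (
    "?0???????? ?????????? ?0?????00- ?0??10???? ?????????? ?????????? "
    "?????????? ?????????? 0????????0 0??????00? 0000-?1?01 ?0???????? ?????????? "
    "?????????? ?"
)

print(count_coded(smith_cranial), count_coded(my_cranial))  # prints: 9 22
```

The same function can be pointed at any of the other rows in this post.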

Smith et al.- ?00?????- ??0?0??0?0 0?00????0? ?????????0 0???-??0?? ?0??????0? 
Me- 0110?0--- 0-0?10000- 01000??010 -?00010?00 100?1010?0 0p00000000 00???

Axially, Smith et al.'s laziness really shows.  Marasuchus preserves an almost 
complete vertebral column, yet Smith et al. only coded a few of the 
characters.  What's confusing too is that it's not the obvious characters which 
were coded.  Things like axis, cervical and dorsal pleurocoels absent, and 
amphicoelous cervical centra should be second nature to code for anyone even 
vaguely familiar with the taxon, but then there are characters like "cervical 
prespinal fossa narrow" which WERE coded.  Now having a wide prespinal fossa is 
an abelisaurid character, so nobody describes the state in something like 
Marasuchus, and Marasuchus' vertebrae have only been illustrated laterally as 
far as I know (not anteriorly, as you'd need to see the prespinal fossa).  It's 
not that I doubt Marasuchus would have a narrow fossa if examined, but since 
almost every non-coelurosaur with preserved cervicals is coded for this obscure 
character, and the matrix certainly doesn't show signs of rigorous coding in 
general, I'm suspicious.  Another issue is that some of the uncoded characters 
are important to code in Marasuchus, like the absence of hyposphenes or the 
presence of only two sacrals.  Again these are things anyone with even a 
passing interest in dinosaur origins would be aware of, as they are classic 
characters excluding it from Saurischia and Dinosauria respectively.  Without 
coding Marasuchus, the state "2 sacrals" is useless, as all other taxa have 
more (even the miscoded Saturnalia).  In the axial area, 17 characters were 
coded by Smith et al., but 54 could be coded by me.
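
To illustrate why an uncoded outgroup can leave a character useless, here is a 
toy Python sketch.  The ingroup taxon names, the state symbols and the helper 
observed_states are all hypothetical and not taken from Smith et al.'s matrix.  
It only lists which states actually appear in a character column once '?' and 
'-' are ignored.

```python
def observed_states(column: dict[str, str]) -> set[str]:
    """Return the states scored for a character, skipping '?' and '-'."""
    return {state for state in column.values() if state not in ("?", "-")}


# Hypothetical sacral-count character: '0' = two sacrals, '1' = more than two.
uncoded = {"Marasuchus": "?", "ingroup_A": "1", "ingroup_B": "1", "ingroup_C": "1"}
coded = dict(uncoded, Marasuchus="0")

# With the outgroup left as '?', only one state is observed (a constant column).
print(observed_states(uncoded))
# Once the outgroup is scored, both states are present in the column.
print(observed_states(coded))
```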

The two integumentary characters cannot be coded, of course.

Smith et al.- ??? ?????????
Me- ??? 10010?000

In the pectoral girdle, Marasuchus preserves a scapulocoracoid.  Smith et al. 
didn't bother coding it at all, not even obvious characters like the broad 
scapular blade, which Sereno and Arcucci explicitly noted as an 
autapomorphy of the taxon.  So 0 coded by them, and 8 by me.

Smith et al.- ? ????????0? ?????????? ?????????? ?
Me- 0 00p?000?0? ?????????? ?????????? ?

In the forelimb, Marasuchus preserves a humerus, radius and ulna.  Smith et al. 
bothered to code one character- radius over half humeral length.  At least it's 
an obvious character this time.  That's 1 coded by them, and 8 by me.

Smith et al.- ????????? 0??0??1p00 00-00??0?? ?00?00???? ?
Me- 0000010?1 00020p0100 ---00p?000 --00?0?00? 0

Marasuchus preserves an essentially complete pelvis.  This situation is rather 
like the axial skeleton.  Again, obvious characters are left uncoded- propubic 
pelvis, short preacetabular process, no post-obturator notch.  And again, some 
are important.  If Marasuchus isn't coded as lacking a brevis fossa, why even 
have the character?  Everything else in the matrix has one (except 
Confuciusornis, which is nonsensically coded as inapplicable), so without 
coding Marasuchus the character's useless.  Of pelvic characters, Smith et al. 
code 16 and I code 35.

Smith et al.- ??0000000 0000001?0? 0?0????000 00000000?0 0000???0?0 00r????
Me- 001000000 000000100- 0000010000 0101000000 0000001100 001?0??

Finally, a decently coded area.  There are certainly some absences, like the 
obvious anteromedially oriented femoral head and absent fibular crest of the 
tibia, and none of the fibular characters are coded.  But overall it looks like 
someone actually tried in this area.  Smith et al. code 38 and I code 53.

In all, Smith et al. coded 81 characters while I coded 180.  That leaves 99 
characters uncoded. So they only coded 45% of what was possible using the 
literature, and an even smaller percentage of what's possible with the 
specimens in hand.  You might not think it's important to code the outgroups, 
but you'd be wrong.  The major conclusion of this paper was that Cryolophosaurus 
and other dilophosaurs were closer to neotheropods than to coelophysoids, but this 
depends on having the polarity for characters in basal Avepoda correct.  I can 
tell you now that even though I haven't worked my way through most of the 
matrix yet, just adding the codings for Marasuchus, Silesaurus and some for 
Herrerasaurus has changed the results to give a huge polytomy in basal Avepoda 
between coelophysoids, Zupaysaurus, dilophosaurids and neotheropods.  Who knows 
how that will change though, as I note that "Dilophosaurus" sinensis wasn't 
coded at all postcranially, for instance. 
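
The overall tally can be re-added with a few lines of Python.  The per-region 
counts are the ones quoted in this post; the region labels (including 
"hindlimb" for the last section) and variable names are only for illustration.

```python
# (Smith et al., me) coded characters per region, as quoted in this post.
counts = {
    "cranial": (9, 22),
    "axial": (17, 54),
    "pectoral": (0, 8),
    "forelimb": (1, 8),
    "pelvic": (16, 35),
    "hindlimb": (38, 53),
}

smith_total = sum(smith for smith, _ in counts.values())
my_total = sum(mine for _, mine in counts.values())
missed = my_total - smith_total           # codable from the literature but left as '?'
share = 100 * smith_total / my_total      # percentage of codable characters they coded

print(smith_total, my_total, missed, f"{share:.0f}%")
```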

I'd honestly like to know how this happens.  This isn't some obscure foreign 
paper by ignorant beginners; it's a landmark paper in a high-tier journal by 
experts in the field.  Yet what I've described here is unacceptable.  If you're 
publishing a phylogenetic analysis, please code your taxa.  If you're 
reviewing/editing a paper, please check a taxon or two in the matrix.  And if 
you find uncoded taxa, send the paper back.  Because coding only half the 
available data makes the resulting cladogram worthless.

Reference- Smith, Makovicky, Hammer and Currie, 2007. Osteology of 
Cryolophosaurus ellioti (Dinosauria: Theropoda) from the Early Jurassic of 
Antarctica and implications for early theropod evolution. Zoological Journal of 
the Linnean Society. 151, 377-421.

Mickey Mortimer
The Theropod Database- http://home.comcast.net/~eoraptor/Home.html