Consistency index was Re: Clarification of scope of paleoart->uses

 Sure it is. If they're in the matrix, they're already scored for
 the N taxa, and the person adding taxon N+1 only has to score it
 once, not N+1 times. You've already done the work, so why throw it

I'm not saying throw it away. Mention in the text which previously unknown autapomorphies you've discovered. But why put this information into the matrix?

(This is assuming that you only use parsimony. Bayesian analyses do use parsimony-uninformative characters to help determine the model of evolution.)

> Keeping them in has disadvantages. It makes your matrix appear
> bigger than it is (...impressive as it is, of the 720 characters in
> the supermatrix by Sigurdsen & Green [2011] only 335 are
> informative; no surprise, because they only kept those 25 taxa, out
> of something like 110 or 120, that are represented in all three
> input matrices...) [...]

 So state in your abstract how many of the characters are

Great idea. Nobody does it.
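
Counting them is cheap, for what it's worth. Here is a minimal sketch in Python, assuming unordered characters, single-character state codes, and the usual rule that a character is parsimony-informative when at least two states each occur in at least two taxa. The toy matrix, the function names and the `MISSING` set are made up for illustration; polymorphisms are not handled.

```python
from collections import Counter

# Assumption: "?" and "-" mark missing or inapplicable scores.
MISSING = {"?", "-"}

def is_parsimony_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(state for state in column if state not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def count_informative(matrix):
    """matrix maps taxon name -> string of states, one symbol per character."""
    columns = zip(*matrix.values())
    return sum(is_parsimony_informative(col) for col in columns)

# Hypothetical toy matrix: 4 taxa, 4 characters.
matrix = {
    "A": "0000",
    "B": "0010",
    "C": "1010",
    "D": "1100",
}

n_chars = len(next(iter(matrix.values())))
print(f"{count_informative(matrix)} of {n_chars} characters are informative")
```

In this toy matrix the second character is an autapomorphy of D and the fourth is constant, so only two of the four columns can affect a parsimony analysis.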

> [...] and it increases the CI. Fine, PAUP* will give you the CI
> with and without parsimony-uninformative characters, but it seems
> to be normal to report the former instead of the latter and thus
> make the trees look more robust than they are. And of course, the
> bigger a matrix, the more opportunities there are for glitches.
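
To put a number on the inflation: the ensemble CI is the sum of the minimum possible steps over all characters divided by the sum of the steps actually required on the tree. An autapomorphy always needs exactly its minimum number of steps, so it adds the same amount to the numerator and the denominator and pushes the ratio towards 1. A minimal sketch in Python follows; the step counts are invented for illustration, not taken from any real matrix, and the function name is hypothetical.

```python
def consistency_index(chars):
    """Ensemble CI from a list of (min_steps, observed_steps) pairs."""
    total_min = sum(m for m, _ in chars)
    total_obs = sum(s for _, s in chars)
    return total_min / total_obs

# Hypothetical: three informative binary characters, each with one extra step.
informative = [(1, 2), (1, 2), (1, 2)]

# Hypothetical: four autapomorphies, each fitting the tree perfectly.
autapomorphies = [(1, 1)] * 4

ci_without = consistency_index(informative)
ci_with = consistency_index(informative + autapomorphies)

print(f"CI without uninformative characters: {ci_without:.2f}")  # 0.50
print(f"CI with uninformative characters:    {ci_with:.2f}")     # 0.70
```

Same tree, same homoplasy, and the CI goes from 0.50 to 0.70 just by leaving the autapomorphies in.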

 A side-question: does anyone pay attention to CI? (In practice, it
 seems to be basically a measure of how small the matrix is.) If any
 number can top the Impact Factor for uninformativeness, it's surely
 the CI.

I pay attention to the CI.

If it's insanely high, like the 0.8 to 0.9 of Sereno's early analyses, this is a good reason to suspect that the characters were cherry-picked (deliberately or just out of laziness!) to support the authors' pet hypothesis, or that other manipulations were going on.

If it's low for the size of the matrix, like the 0.49 of McGowan (2002, Zool. J. Linn. Soc., albanerpetontids and origin of lissamphibians) for a matrix of 19 ingroup taxa and 41 characters, that shows that the matrix is "balanced" and not (or not much) biased towards any particular hypothesis, even though it's so tiny that one should expect random imbalances from this alone.

Finally, if it's insanely high but manipulation would be a very unparsimonious assumption, I am suitably impressed. The case I've seen is Rexová et al. (2003, Cladistics). That's an analysis of a matrix with 85 Indo-European languages and 200 meanings. These meanings are taken from a standardized list of 200 meanings that are considered "core vocabulary" (words that are probably less easily borrowed than most others -- body parts, basic kinship terms, personal pronouns...).

The aim of that study was to show that vocabulary data alone, without data from grammar or from the sound system, are enough to reconstruct the phylogeny of languages to a useful degree. Some historical linguists had claimed that only morphology (grammar at the word level) is of any use, which would mean that the phylogeny of families of isolating languages (which lack grammatical endings or the like) would be impossible to reconstruct; the CI of 0.84 proves them wrong. Indeed, this incredibly high CI makes me think that core vocabulary could be used to look for relatives of Indo-European, something very few people have ever attempted and some, perhaps many, consider completely futile.