
Re: Consistency index was Re: Clarification of scope of paleoart->uses

On 17 March 2011 12:45, David Marjanovic <david.marjanovic@gmx.at> wrote:
>>  Sure it is. If they're in the matrix, they're already scored for
>>  the N taxa, and the person adding taxon N+1 only has to score it
>>  once, not N+1 times. You've already done the work, so why throw it
>>  away?
> I'm not saying throw it away. Mention in the text what autapomorphies you've
> discovered that were previously unknown. Just why put this information into
> the matrix?

So others using the matrix have it right there, of course.  (I know I
don't need to tell you this but) a published paper is not an eternal
work carved in marble, it's a foundation that future workers can build
on.  So let's make their jobs as simple and error-proof as we can.
(This also of course means making the matrix available as a Nexus file
or its equivalent, not just as text in the PDF, or in a supplementary
MS-Word document.)

> (This is assuming that you only use parsimony. Bayesian analyses do use
> parsimony-uninformative characters to help determine the model of
> evolution.)

Well, sure -- the fact that something is parsimony-uninformative only
means that it's uninformative for parsimony, and may well be useful
for other purposes.  Bayesian analysis is a GREAT example.  Suppose I
am suspicious of the phylogeny obtained by Halibutwrangler (2011)
using parsimony.  One of the things I might want to do is run his
matrix through a Bayesian analysis and see how the results differ.
Halibutwrangler has done me no favours if, because some of his data
doesn't affect ONE possible analysis, he's thrown it away.

>> > Keeping them in has disadvantages. It makes your matrix appear
>> > bigger than it is (...impressive as it is, of the 720 characters in
>> > the supermatrix by Sigurdsen & Green [2011] only 335 are
>> > informative; no surprise, because they only kept those 25 taxa, out
>> > of something like 110 or 120, that are represented in all three
>> > input matrices...) [...]
>>  So state in your abstract how many of the characters are
>>  parsimony-informative.
> Great idea. Nobody does it.

Well, nobody retains parsimony-uninformative characters in their
matrices, either!  Let's fix BOTH these mistakes.
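
For anyone who wants to report that number, here is a minimal sketch of
counting parsimony-informative characters.  It assumes unordered
characters scored as single symbols, with "?" and "-" marking missing or
inapplicable data, and it uses the usual rule that a character is
informative when at least two states each occur in at least two taxa.
The taxon names, the toy matrix and the function names are invented for
illustration; polymorphic scores are not handled.

```python
from collections import Counter

# Assumed symbols for missing / inapplicable scores.
MISSING = {"?", "-"}


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(state for state in column if state not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative(matrix):
    """matrix maps taxon name -> string of single-character state codes."""
    columns = zip(*matrix.values())
    return sum(is_parsimony_informative(col) for col in columns)


# Hypothetical four-taxon, four-character matrix.
toy_matrix = {
    "TaxonA": "0010",
    "TaxonB": "0110",
    "TaxonC": "1100",
    "TaxonD": "1?00",
}

total = len(next(iter(toy_matrix.values())))
informative = count_informative(toy_matrix)
print(f"{informative} of {total} characters are parsimony-informative")
```

Nothing is deleted from the matrix here: the uninformative columns stay
in the file and are only counted separately.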

More generally -- and here I speak wearing the hat of my day-job as a
software engineer -- it's NEVER a good thing to throw data away.  You
just don't know when you're going to need it, or when someone else
will, or what for.  You can always run an analysis that ignores the parts
of your data-set that you don't care about but have retained; but you
can't run one that uses the parts you (or someone else) have thrown away.

(And of course in the specific case of parsimony-uninformative
characters, it's in their very nature that you can just go ahead and
run your parsimony analysis with them included, and they won't hurt --
or indeed have ANY effect.)
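
A small sketch of why that is so, using Fitch's step-counting method for
unordered characters on the three possible resolutions of four taxa.
The taxon names, the two characters and the function name are all
invented for illustration.  The autapomorphy costs the same single step
on every tree, so it cannot change which tree is shortest; only the
informative character discriminates between trees.

```python
def fitch_steps(tree, states):
    """Minimum number of changes for one unordered character on a binary tree.

    tree is a nested tuple of taxon names; states maps taxon name -> state.
    """
    steps = 0

    def state_set(node):
        nonlocal steps
        if isinstance(node, str):  # a tip: its state set is its own score
            return {states[node]}
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:  # the children agree on some state: no change needed
            return common
        steps += 1  # the children disagree: count one change
        return left | right

    state_set(tree)
    return steps


# The three resolutions of four hypothetical taxa A, B, C, D.
trees = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}

informative = {"A": "0", "B": "0", "C": "1", "D": "1"}
autapomorphy = {"A": "0", "B": "0", "C": "0", "D": "1"}

informative_lengths = [fitch_steps(t, informative) for t in trees.values()]
autapomorphy_lengths = [fitch_steps(t, autapomorphy) for t in trees.values()]

for name, inf, aut in zip(trees, informative_lengths, autapomorphy_lengths):
    print(name, "informative:", inf, "autapomorphy:", aut)
```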

>> > [...] and it increases the CI. Fine, PAUP* will give you the CI
>> > with and without parsimony-uninformative characters, but it seems
>> > to be normal to report the former instead of the latter and thus
>> > make the trees look more robust than they are. And of course, the
>> > bigger a matrix, the more opportunities there are for glitches.
>>  A side-question: does anyone pay attention to CI? (In practice, it
>>  seems to be basically a measure of how small the matrix is.) If any
>>  number can top the Impact Factor for uninformativeness, it's surely
>>  the CI.
> I pay attention to the CI.

I should have seen that coming :-)

> If it's insanely high, like the 0.8 to 0.9 of Sereno's early analyses, this
> is a good reason to suspect that the characters were cherry-picked
> (deliberately or just by laziness!) to support the authors' pet hypothesis
> or that other manipulations were going on.
> If it's low for the size of the matrix, like the 0.49 of McGowan (2002,
> Zool. J. Linn. Soc., albanerpetontids and origin of lissamphibians) for a
> matrix of 19 ingroup taxa and 41 characters, that shows that the matrix is
> "balanced" and not (or not much) biased towards any particular hypothesis,
> even though it's so tiny that one should expect random imbalances from this
> alone.

Here's my real problem with CI: that you have to talk in fluffy terms
like "low for the size of the matrix".  I don't understand why we're
all using a metric that is so sensitive to matrix size.  We should
have some kind of normalised CI that is independent of matrix size.
(Maybe if we did, then I and others would pay more attention to that

-- Mike.