# Re: Cladistics (was Sci. Am. - present)

```
A collection of, say, animal species may differ in some of their
characteristics; we want to relate them together in a tree, and say where in
the tree the changes took place.

The principle of parsimony in cladistics tries to minimise the number of
changes (roughly speaking) required to explain the differing
characteristics.

Where the number of specimens is large compared to the number of changes,
and where the number of 'changes back' is small, the most parsimonious
solution is likely to be fairly correct.  Such a situation might correspond
to a dozen closely-related vertebrates taken over a five million year
stretch, or gently mutating viruses being relatively frequently examined.

However, as we depart from scenarios such as these, the truth becomes
increasingly complex.  Characteristics have time to revert, and we sometimes
have just a few fossils separated by many millions of years.  Sometimes
an event (such as loss of flight) has actually occurred on a separate
occasion for the majority of the flightless fossils in our group.  In cases
such as these the simplest solution is not what actually happened.

In those circumstances, the best we could do would be to guess an expected
number of reversions, changes etc. over the time (very difficult) and select
from that solution space.  This would be more accurate than simply *minimising*
the assumed number of changes, but of course the number of possible solutions
will grow exponentially with the number of changes, and each will have a
similar and vanishingly small probability.  Sometimes our data will be so
sparse that the number of changes that actually occurred exceeds the number
of fossils.  There will come a time where there just isn't enough information
available for the algorithm to have any chance of finding the correct
solution.  Under these conditions it will softly and silently drift away from
reality, leaving behind only an impressive set of maximums, likelihoods etc.
for our delectation - and confusion.

Where the simplest solution (consistent with the data as it appears) is the
right one, cladistics will find it.  However this is rather like the stopped
clock being right twice a day.

On page 36 of Sci. Am. Feb '98 we have a representation of a cladogram with
six individuals, spanning over seventy million years.  (We can nearly halve
that if we arm-wave the Velociraptor back into the Jurassic.)  I still have
just enough time for cladistics to be able to say the exercise didn't do
justice to the process.

> How would you "improve" a parsimony analysis? <
Chris Brochu, 21 Feb 98

By feeding the best guess for parsimony into the algorithm, which isn't
always the maximum.  Even then, as I've argued above, this won't usually
help much; it's begging a lot of the question anyway.  Occam's razor is fine
but part of the skill in wielding a razor is knowing when not to use it.
Outside maths and physics it is always wrong to stick rigidly to any
principle.  That's just another principle, but I trust it better than the
parsimony one.

> Given the principle of total evidence, the hypothesis <
> that best matches all the available data is the preferred one . . . <

The problem for cladistics is in that word 'best'.

> - are you advocating some sort of significance test for <
> character data? If so, on what objective basis would this be based? <

We're trying to find something that works.  Cladistics works a bit (although
I'm rather surprised someone hasn't knocked it on the head as an infallible
tool once and for all by now).  By using characters we have some reason
to believe will render valid the assumptions it has to make, we may get
some mileage out of it.  The best characters are those that aren't likely
to have changed much under any of the reasonably possible scenarios.

Palaeontology is a soft-nosed science whether we like it or not.  Like the
Wizard of Oz, on closer inspection, cladistics is not quite magic after all.

John V Jackson    jjackson@interalpha.co.uk

'I know what faith is and what it's . . . . . woooooerrrrrrrrth!'

```
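
As a rough illustration of the point in the post about independent losses of flight, here is a minimal sketch of how a minimum-change count can be made for a single character on a fixed tree (Fitch's counting method for a rooted binary tree). The taxa `A` to `F`, the flight scores and both tree shapes are invented for the example; none of them come from the Sci. Am. cladogram or from any real data set.

```python
# Hypothetical character scores: 1 = can fly, 0 = flightless.
STATES = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 1, "F": 0}

# A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
# Tree 1 lumps the flightless forms together.
TREE_GROUPED = ((("A", "B"), "E"), (("C", "D"), "F"))
# Tree 2 pairs each flightless form with a flying relative,
# i.e. flight would have been lost on separate occasions.
TREE_SCATTERED = (("A", "C"), (("B", "D"), ("E", "F")))


def fitch(tree, states):
    """Return (possible_states_at_this_node, minimum_changes_below_it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The two branches can agree on a state: no extra change needed.
        return shared, left_cost + right_cost
    # The branches disagree: at least one change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    for name, tree in [("grouped", TREE_GROUPED), ("scattered", TREE_SCATTERED)]:
        _, changes = fitch(tree, STATES)
        print(name, "minimum changes:", changes)
```

On these made-up scores the grouped tree needs one change and the scattered tree needs three, so a pure minimum-change criterion prefers the grouped tree whichever of the two actually happened.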