
Re: Learning cladistics (was Re: Dinosaur Web Pages' Re-Opening)

        (I fear we stray too far from dinosaurs. I also fear I may be out a
little too far on a limb here, and I invite various and sundry folks smarter
and more learned than myself to correct me.)
I wrote:
>... Given that, the explanation which requires the least number of
>unsubstantiated assumptions, and the fewest untestable hypotheses,
>is most likely to be the correct one.

        First off, I feel the need to emphasize that a cladogram is *not*
necessarily a trace of the path of evolution of various organisms (a point
made, I believe, by Brusca and Brusca (1990)), but rather a graphical
representation of the hypothetical relationships of a set of organisms. The
metric for these relationships, and specifically for "closeness of
relationship", as set out by phylogenetic systematics, is *relative* recency
of a common ancestor. Thus, Newick notation (nested parentheses) might be a far
better way of expressing the information content in a cladogram, excepting
the power inherent in the graphical nature of cladograms ("a picture is
worth a thousand words").
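        As a rough illustration of the point about Newick notation, the
sketch below writes a nested grouping out as a parenthesized string. The
function name and the four-taxon grouping are my own assumptions, chosen
only for illustration; nothing here is taken from a published analysis.

```python
def to_newick(tree):
    """Render a nested tuple of taxon names as a Newick-style string.

    A string is a terminal taxon; a tuple is a clade whose members share
    a more recent common ancestor with each other than with anything
    outside the tuple.
    """
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Hypothetical grouping, for illustration only.
tree = ("Crocodylia", ("Ornithischia", ("Sauropodomorpha", "Theropoda")))
newick = to_newick(tree) + ";"
print(newick)
```

        The nesting depth carries only the *relative* recency of common
ancestry. The string says nothing about who evolved from whom.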
        Mapping character transformations on a cladogram is a function of
using a cladogram to discern the relative order and spacing of evolutionary
novelties, but is not (contra Fastovsky and Weishampel's textbook) the true
meaning of a cladogram. The use of a cladogram as a presentation of the true
path of evolution (ie, who evolved FROM whom, etc.) is at best a second
order function of the study which produced the cladogram, and questioning
the validity of such hypotheses does not directly affect the validity of the
study itself.

Stan (the man) Friesen writes:
>True, but the question is, are character transitions the only hypotheses
>implied by a cladistic analysis?
        No. As stratocladists point out, assumptions of the fragmentary
nature of the fossil record are also implied. However, since these
hypotheses are easily demonstrated to be more likely than not, this is not,
IMHO, such a far-out concept. Certainly, the stratocladistic approach seems
to me (please bear in mind I have not begun researching the subject in
sufficient depth to make more than a VERY tentative assessment) to be
making the far less parsimonious conclusion that the fossil record *might*
be complete enough for stratigraphy to provide sufficient evidence. On the
other hand, marine habitats are not my forte, and their assumptions may work
in those circumstances. Still, whether they acknowledge it or not, this
presumes both a "vertical" and "horizontal" completeness which does not
sound at all appropriate for studying, say, dinosaurs.

>I think not, it also implies things like ghost lineages,
        Ghost lineages are a *result* of the application of parsimony, not
an initial assumption per se. The assumption they might be construed to
result from is the assumption of incompleteness in the fossil record. As I
note above, stratocladists seem to be attempting to test this assumption
(amongst other agendas). However, it does not take very complex algebra to
determine that we simply do not have enough of a fossil record preserved for
terrestrial vertebrates to falsify the concept of ghost lineages (and, if we
did, would we need the concept?). Continued discoveries may, however, support
the concept (and I believe they shall).
        Consider, for example, that non-avian dinosaurs, until their
currently hypothesized relationship to Aves was understood, might be thought
of as representing the ghost lineage of the birds after their stock diverged
from Pseudosuchia.

        Dichotomies are no longer a rigid assumption of cladistics. Period.
        It is likely, however, that as one views the history of life at a
wider and wider scope, the relations amongst *major* taxa will be
dichotomous. This is due to the incompleteness of the fossil record. For
example, we are not likely to even have evidence of one taxon which resulted
from any speciation event, no matter how many species split off, much less
are we likely to have two or three. Indeed, the times when we have evidence
of any at all are reasonable only when one of the descendant species gave
rise to a major radiation. The likelihood of any one member of a speciation
event being the ancestor of a major group of animals is slim as it is. The
odds of more than one member of the speciation event giving rise to a major
(monophyletic) taxon are substantially less.
        However, those species which give rise to many descendants are more
likely to be represented in the fossil record. Our picture then becomes one
of tracing the major groups back along the "stem" from which they sprang,
which is sooner or later going to lead to the "stem" of very likely only one
other group. The odds seem good (and no doubt get better with broadening
scope) that no more than two species resulting from a speciation event later
became large groups which were likely to be preserved.
        Thus, in the broader view of the tree of life, our overall picture
(which is strongly influenced by the odds of preservation) will be skewed
towards a dichotomous scheme, even though speciation events themselves may
not be dichotomous.
        On a smaller scale, multichotomies may actually better represent the
relationships amongst organisms (as Dr. Holtz has already with a glad heart
conceded). While Mayr may have a point in that we may be misled,
especially at the species level, concerning the tree topology in multiple
speciation events, this is an issue which I hardly think will be resolved
satisfactorily by any other method.

>For instance, *real* evolution almost certainly is NOT
>restricted to just dichotomous branching - species tend to bud
        In the first place, I have dealt with polychotomy above. Especially
as regards the fossil record, in which many of these buds may not be
        Also, I should point out that the idea of "species budding" is
incompatible with the Biological Species Concept. It seems clear that the
BSC dictates that the departure of a "coherent" body of genetic material
(one or more populations) from the interbreeding whole nullifies the species
as defined ("a group of populations linked by continuous interbreeding").
Thus, although the larger collection of populations may not be affected
morphologically, it is BY DEFINITION a new species just as the
"off-branching" population(s) is/are. Note that this is a fundamental result
of the fact that the BSC is based on *breeding dynamics* and NOT morphology.
        BTW: I note here that such a strictly interpreted biological species
is not paraphyletic in the full sense of the term. As it excludes all
descendant taxa, not merely some of them, I prefer to denote it as "linearly
paraphyletic", which is the minimum level of paraphyly required of a taxon
in biology, and seems a far more agreeable sort of paraphyly.
        Obviously, this is not a useful interpretation of biological
species. On the one hand, it requires that either the loss of one member
population of a species be recognized with the creation of a new species, or
that the restrictions of strict interpretation of the BSC apply only to
speciation (which is, of course, an artificial restriction). On the other
hand, such a species concept is nearly inapplicable to biology, much less to
        Gripping hand is, however, if you expect to speak of species as real
natural units, you must first consider if they are or not. Perhaps you would
re-examine your objections based on "species budding" in this light.

>and the parent species may easily be polymorphous for traits
>that become fixed in descendents
        I do not find this to be a problem with cladistics any more than I
find the recapitulation of phylogeny in ontogeny to be a problem. Both are
considerations, neither is fatal.

>However my real beef is with the lack of statistical testing.
        [The following is merely my opinion. Please, anyone who has
knowledge of this problem (like biological knowledge, not just
computer-modelling knowledge), please speak up!]
        Cladistics is not a statistical procedure. Parsimony is a method of
choosing between phylogenetic trees. It cannot tell you how likely it is
that any one is right, in part because we have no objective way of telling.
This is the same difficulty as in George's desire to test cladistics using
artificially generated trees. In statistics, some of the more basic
confidence interval tests presume that the data are normally distributed, or
at least some statistics are provided from which we may make assumptions
which often prove sufficient. Without a knowledge of the basic structure of
phylogeny, what can we test our trees against? What assumptions can we make?
        Perhaps biology has an answer, but somehow I doubt it. Not that
biologists aren't that good, I just don't think we can reproduce the problem
in all of its complexity and polarity in order to arrive at an agreeable model.
        So parsimony chooses the tree with the minimum number of steps. And
thoughtful cladists will give you a consensus tree of the next most
parsimonious cladogram, as a sort of a confidence test. Beyond that, I'm not
sure we can venture.
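
        For anyone who wants to see what "minimum number of steps" means in
practice, here is a minimal sketch. It counts steps with Fitch's method for
unordered characters, which is only one common way of doing the counting;
the four taxa, the three-character matrix, and the function names are all
made up for illustration.

```python
def fitch_steps(tree, states):
    """Return (possible_states, steps) for one character on a rooted
    binary tree given as nested 2-tuples of taxon names."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:
        # The two branches can agree on a state: no extra step needed.
        return shared, left_steps + right_steps
    # No state in common: at least one change happened here.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Sum the Fitch step counts over every character in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch_steps(tree, column)[1]
    return total


# Hypothetical data: taxon -> states of three binary characters.
matrix = {"A": "000", "B": "100", "C": "110", "D": "111"}

# The three candidate groupings compared in this toy example.
candidates = {
    "(A,(B,(C,D)))": ("A", ("B", ("C", "D"))),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

lengths = {name: tree_length(t, matrix) for name, t in candidates.items()}
best = min(lengths, key=lengths.get)
print(lengths, best)
```

        Note that the result is a ranking by step count and nothing more.
It attaches no probability to any of the trees.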
>When large
>numbers of characters and/or transitions are involved a difference of two
>or three steps simply is *not* going to be statistically significant, IMHO.
        Does this mean that a two or three step difference in a small
dataset is actually more significant? Or are we assuming a minimum dataset
size? Careful, Stan, I got blasted from all corners last time I suggested
any such thing (and quite rightly, I might add). Once again, you are applying
the model of statistics to what, IMHO, is an inherently unstatistical
procedure. Although the sum of character transformations does affect the
parsimony algorithm, this is not really a mathematical procedure. Each
"step" is an evolutionary event, of uncertain importance, complexity, and
phylogenetic significance.
        For example, in one dataset, the blastopore forming the anus may be
the one character transformation which unites Deuterostomia, but boy
is it a doozy!

> That is the several "nearly most parsimonious" trees may be effectively as
>likely as the most parsimonious one.
        The point of parsimony is, however, to choose one tree. The "nearly
most parsimonious" trees may be collectively as likely, but, heck, the
collection of all other trees are probably MORE likely than the set of most
parsimonious *and* the second most parsimonious trees. For example, if I
roll two dice, and I want the most likely sum of the two dice, I am looking
to roll a seven. The odds of me rolling this number are P(7) = 6/36.
However, the odds of me rolling either a six or an eight (the next most
likely rolls) are together a P(6 or 8) = 10/36. Yes, it is more likely that
my roll will be either of these two, but individually (and the individual
case is the important one) neither is more likely.
        Did I miss your point?
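
        The dice figures above can be checked by brute force. This is just
a sketch that enumerates all 36 outcomes of two fair six-sided dice; the
function and variable names are mine.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_sum_probabilities():
    """Map each possible sum of two fair dice to its exact probability."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in counts.items()}


probs = two_dice_sum_probabilities()
p7 = probs[7]                   # the single most likely sum
p6_or_8 = probs[6] + probs[8]   # the two runners-up taken together
print(p7, p6_or_8)
```

        Together the six and the eight beat the seven (10/36 against
6/36), but each alone is 5/36, which is less than the seven.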

>Statistically, one should only treat
>a tree as less likely than the most parsimonious one if the difference in
>number of steps is statistically significant, at least at the 95%
>confidence level.
        And how, pray, does one calculate this? I just had Baby Stats, I can
take the math...

>There are apparently statistical tests available for DNA based cladograms,
>but not yet for character based ones.  This is a sore need.
        DNA cladograms, amongst other things, are based on a very limited
set of options, G, A, T, and C. This would seem to give them a much firmer
base to work from. Also, some of the assumptions under which DNA cladists
work are not the same ("molecular clocks" come to mind).
        Beyond that, I cannot say more, as I do not understand DNA
cladistics one little bit.

I wrote also:
>> And given that assumption, the application of
>>the principle of parsimony should only fail to produce the best 
>>hypothesis in the absence of some data.

>Missing species in the analysis.
>Missing characters in known species
>Lack of data on polymorphisms.
>Hmm, there generally seems to be lots of missing data.
        I know of no other method which can account for this. Seriously
though, parsimony is still the best method for developing hypotheses, but lack
of data will inhibit its producing the best possible hypothesis. Missing
data will *always* be a problem.

>My main diffidence is in accepting weakly supported clades as real.  I tend
>to treat them as an over-resolved polychotomy.
        Y'know, this isn't all that bad a method. Despite what I said above
about the "one or two steps" thing, I believe you are perfectly within your
right to evaluate the characters supporting a node and consequently doubt
the validity of the node. Indeed, the best thing you can do is recode or
discard any characters for which you can justify such a treatment, then
re-run the analysis. This is much better than the Altangerel et al. (the
_Erlikosaurus_ skull paper) treatment of Holtz 1994a, where an entire tree
topology was discarded because of perceived discrepancies in the coding of a
few characters, yet no subsequent re-analysis was performed.

>(Over-resolved because
>cladistic analysis as now performed pretty much forces dichotomies, even
>where multiple budding from a polymorphous species is actually more likely
        I answer this objection above.

>- that alternative is not even considered, so its relative parsimony is not
>easily obtainable).
        As I said above, as a cladogram is not necessarily a model of the
precise evolutionary pathway, this is not necessarily relevant.

>Hmm, this suggests comparing the number of steps implied by a polychotomy,
>where any character change that appears at the base of more than one branch
>is treated as a *single* character change.
        Care to explain? I'm not quite sure I understand...

>May the peace of God be with you.
        Aleikhem Salaam.
        :)        Wagner

    Jonathan R. Wagner, Dept. of Geosciences, TTU, Lubbock, TX 79409-1053
               "Not the One..." -- Zathras (not Zathras)