
More about practical phylogenetics

First off, a brief description of the types of tree searches:
For our purposes, we want to enter data (about which we've talked before,
and no doubt will want to talk about again) into a program that searches for
the set of Most Parsimonious Trees.  (Okay, some people look for Maximum
Likelihood Trees, and Bootstrapping and Jackknifing can be useful, but let's
keep things relatively simple).

Trees are searched for by counting the minimum number of evolutionary
"steps" (character changes) needed to explain the data.  Trees can be
represented mathematically; the observed character situation is the data
matrix; and the program then finds the shortest path to go from (for
example) a hypothesized ancestral condition or an assigned outgroup to the
observed conditions using the tree at hand.  Get the tree length, go to the
next tree.  Keep on going until you find the set of shortest trees to
explain that data matrix.
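
For anyone who wants to see the "get the tree length" step spelled out,
here is a minimal sketch in Python.  It assumes unordered characters and
uses Fitch's counting method on a rooted, fully dichotomous tree; the
function names, the nested-tuple tree format, and the four-taxon toy matrix
are all made up for illustration, and this is not a claim about how PAUP* or
any other package does it internally.

```python
def fitch_steps(tree, states):
    """Minimum number of changes for ONE unordered character.

    tree   -- nested 2-tuples of taxon names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping taxon name -> observed state for this character
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):          # a tip: its state set is what we observed
            return {states[node]}
        left, right = (visit(child) for child in node)
        shared = left & right
        if shared:                         # the two descendants can agree: no change
            return shared
        steps += 1                         # they cannot agree: one change is needed
        return left | right

    visit(tree)
    return steps


def tree_length(tree, matrix):
    """Total length of a tree: the sum of steps over every character."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


# Toy data matrix (hypothetical): 4 taxa scored for 3 binary characters.
matrix = {"A": "000", "B": "001", "C": "111", "D": "110"}

# The three possible groupings of four taxa.
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

lengths = {tree: tree_length(tree, matrix) for tree in candidates}
shortest = min(lengths.values())
mpts = [tree for tree, length in lengths.items() if length == shortest]
```

With only four taxa you can afford to score every candidate; `mpts` then
holds whichever tree(s) tie for the shortest length.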

In principle, it would be best to examine every single possible tree.  This
is an option on many packages (Exhaustive Search algorithm in PAUP*, for
instance).  However, the number of dichotomous trees possible increases
factorially with the addition of more taxa (analogous to adding
destinations in the Travelling Salesman problem).  Because of this,
alternative algorithms are used for bigger matrices, such as the
"Branch-and-Bound" and "Heuristic" options in PAUP*.  Branch-and-Bound is
still guaranteed to find the shortest trees: it saves time by abandoning
any partial tree that is already longer than the best tree found so far,
but it too becomes impractical as matrices grow.  Heuristic searches are
the "faster and dirtier" option.  In brief, these find a tree, get its
length, change it a bit, get its length, and keep on going until they stop
getting shorter trees (it's a bit more complicated, but if you want more
info go to the PAUP* manual or The Compleat Cladist).  Of course, this runs
the very real problem of finding local rather than global minima: because
of this, it is standard operating procedure to do multiple runs with
different (often randomized) starting conditions, among other options.
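
To give a feel for how fast the number of trees blows up, here is a small
Python sketch.  It uses the standard counting result that the number of
unrooted, fully dichotomous trees for n taxa is the product of the odd
numbers from 3 up to (2n - 5), and that rooting adds one more factor of
(2n - 3).  The function names are my own, for illustration only.

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted, fully dichotomous trees for n_taxa labelled tips."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    count = 1
    # Multiply the odd numbers 3, 5, ..., (2n - 5); for n = 3 the range is empty.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def num_rooted_trees(n_taxa):
    """Number of rooted dichotomous trees: same as unrooted trees for n + 1 tips."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    return num_unrooted_trees(n_taxa + 1)


for n in (4, 5, 10, 20):
    print(n, "taxa:", num_unrooted_trees(n), "unrooted trees")
```

Four taxa give 3 unrooted trees, five give 15, and ten already give
2,027,025, which is why exhaustive searches are only practical for small
matrices.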

Regardless of the algorithm, you get out a set of trees in the end.  You can
then examine these trees using tree manipulation/character evolution mapping
programs.

In case it is of interest to anyone, here are some of the main software
packages used in phylogenetic analyses:

According to Felsenstein's Phylogeny Programs website
(http://evolution.genetics.washington.edu/phylip/software.html) there are
over 193 different phylogenetic programs.  However, not all of these search
for optimal trees, and many are designed with aspects of molecular rather
than morphological phylogenetics in mind (lest anyone on the list think
otherwise, a majority (perhaps the vast majority) of phylogenetics done
today is molecular and neontological: we in the vert paleo community are, as
per normal, on the "fringe").

PAUP* (Phylogenetic Analysis Using Parsimony) is the workhorse of
phylogenetics (molecular and morphological, paleontological and
neontological, etc.).  There are a LOT of options and variables in this, in
terms of outgroup choice, character weighting, tree search algorithms,
support algorithms, consensus tree generation, random tree generation, etc.
PAUP*'s site is http://www.lms.si.edu/PAUP/index.html.  Incidentally, users
of PAUP* 4.0b4a be warned: that beta version expired a couple of weeks ago,
and they have the new beta version up now.

PAUP*'s main rival is PHYLIP
(http://evolution.genetics.washington.edu/phylip.html), and many classic
studies were generated by the venerable Hennig86 (which requires a whopping
512K of RAM...).  A rising star is NONA
(http://www.cladistics.com/about_nona.htm), with a Parsimony Ratchet option
which seems to be lightning quick.

MacClade (http://www.sinauer.com/Titles/frmaddison.htm) is one of the more
useful tree manipulation programs: it is easy to use and can yield a LOT of
useful information for mapping character evolution (and is a great way to
enter data, too...).  Be warned, however: it is NOT a tree search program,
although it does have a peculiar mini-search option.  If you see a paper
which says its tree(s) were "found with MacClade", then the authors don't
know what they are doing (bit of a joke among the systematics community,
actually: like papers that come out where the "number of most parsimonious
trees = 100" in a PAUP analysis: often a sign that the authors did not
change the factory-set "Maxtrees = 100" option...).  In any case, by all
means enter data in MacClade, and look at your PAUP*/NONA/whatever results
in MacClade to figure out how the characters are changing, but don't use it
to find your MPTs.

                Thomas R. Holtz, Jr.
                Vertebrate Paleontologist
Department of Geology           Director, Earth, Life & Time Program
University of Maryland          College Park Scholars
                College Park, MD  20742
Phone:  301-405-4084    Email:  tholtz@geol.umd.edu
Fax (Geol):  301-314-9661       Fax (CPS-ELT): 301-405-0796