Re: Morpho v molecular (was Re: Tinamous: living dinosaurs)
Another problem with multigene analyses (either method) is that they combine
sequences evolving at different speeds. If you compare cytB and RAG-1
and ND2 sequences for Accipitridae, you'll find that each locus has
the best resolution (where support is consistently high except for
very short branches) at a different area in the phylogeny,
corresponding to different periods in time. Before that period (i.e.
further from the present) the noise is too large. After that period
(i.e. closer to the present) the signal becomes too weak. Either way
the SNR drops off.
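A rough way to picture that drop-off at both ends, as a sketch only: the Jukes-Cantor (JC69) model is swapped in here purely for illustration, and the function name, locus names, rates and ages are all hypothetical rather than taken from cytB, RAG-1 or ND2. Under JC69 the observed proportion of differing sites is close to zero for a slow locus on a young divergence (hardly any signal) and creeps toward the 0.75 ceiling for a fast locus on an old one (multiple hits pile up as noise).

```python
import math

def jc69_observed_diff(rate, time):
    """Expected proportion of differing sites between two sequences under JC69.

    rate * time is the expected number of substitutions per site along the
    path separating the two sequences.
    """
    d = rate * time
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

# Hypothetical rates (substitutions per site per million years) and ages (Myr).
loci = {"slow_locus": 0.0005, "fast_locus": 0.02}
ages = [1, 10, 100]

for name, rate in loci.items():
    for age in ages:
        p = jc69_observed_diff(rate, age)
        print(f"{name:10s} age={age:3d} Myr  observed difference={p:.4f}  (ceiling 0.75)")
```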
But when you add characters to a matrix, the signal adds up, while the
noise, as long as it's random, cancels itself out. If a slow-evolving
gene gives you high resolution for old nodes and low resolution for
young ones, and a fast-evolving one gives you the opposite, then
combining them should give you high resolution for nodes of both ages.
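The "signal adds up, noise cancels out" argument can be sketched numerically. This is a toy model, not a phylogenetic analysis: each character is assumed to contribute a small constant signal plus independent zero-mean Gaussian noise, and the function names and parameter values are made up for illustration. Under that assumption the summed signal grows with the number of characters n, while the summed noise grows only with the square root of n.

```python
import math
import random

def combined_snr(n_chars, signal=1.0, noise_sd=5.0):
    """Analytic signal-to-noise ratio of the sum of n_chars characters.

    Summed signal is n * signal; summed random noise has standard
    deviation sqrt(n) * noise_sd.
    """
    return n_chars * signal / (math.sqrt(n_chars) * noise_sd)

def simulate_support(n_chars, signal=1.0, noise_sd=5.0, trials=2000, seed=1):
    """Fraction of trials in which the summed evidence has the correct (positive) sign."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        total = sum(rng.gauss(signal, noise_sd) for _ in range(n_chars))
        if total > 0:
            correct += 1
    return correct / trials

for n in (4, 25, 100, 400):
    print(n, round(combined_snr(n), 2), simulate_support(n))
```

The cancellation depends entirely on the noise being random, as stated above; noise that is biased in one direction would add up just like the signal.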
Different speeds of evolution among different characters are, however,
still a well-known problem for model-based phylogenetic methods (maximum
likelihood, Bayesian inference), which have to assume a rate of evolution. To
avoid this, you need a model with several rate categories -- at the very
least one for each gene.
The more rate categories, the better. Unfortunately, calculation time
increases rapidly with the number of rate categories, so the default
number in phylogenetics software is _four_, which is _not good_.
(Alternatively, you can use simple parsimony, which amounts to a
separate rate category for every nucleotide. But of course, simple
parsimony comes with its own problems, like its famously greater
susceptibility to long-branch attraction, which results from different
speeds of evolution among different taxa.)
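To see what the number of rate categories does, here is a minimal sketch assuming that the categories in question are those of the usual discrete gamma approximation of among-site rate variation (equal-probability bins of a gamma distribution with mean 1, each represented by its mean rate). It uses Monte Carlo draws instead of the exact quantile calculation that real software performs; the function name, the shape value 0.5 and the sample size are illustrative assumptions.

```python
import random

def discrete_gamma_rates(alpha, k, n_samples=100_000, seed=0):
    """Approximate mean rates of k equal-probability gamma rate categories.

    Draws from a gamma distribution with shape alpha and mean 1, sorts the
    draws, cuts them into k equal-sized bins and averages each bin.
    """
    rng = random.Random(seed)
    draws = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_samples))
    size = n_samples // k
    rates = [sum(draws[i * size:(i + 1) * size]) / size for i in range(k)]
    mean = sum(rates) / k
    # Rescale so the category rates average exactly 1.
    return [r / mean for r in rates]

for k in (4, 8):
    print(k, [round(r, 3) for r in discrete_gamma_rates(0.5, k)])
```

With only four categories, the fastest sites are all lumped into one averaged rate; with eight, the top category reaches further into the fast tail, at the cost of roughly doubling the likelihood calculations per site.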