Re: Morpho v molecular (was Re: Tinamous: living dinosaurs)

 The problem with multigenes (either method) is also that it combines
 sequences evolving at different speeds. If you compare cytB and RAG-1
 and ND2 sequences for Accipitridae, you'll find that each locus has
 the best resolution (where support is consistently high except for
 very short branches) at a different area in the phylogeny,
 corresponding to different periods in time. Before that period (i.e.
 further from the present) the noise is too large. After that period
 (i.e. closer to the present) the signal becomes too weak. Either way
 the SNR drops off.

But when you add characters to a matrix, the signal adds up, while the noise, as long as it's random, cancels itself out. If a slow-evolving gene gives you high resolution for old nodes and low resolution for young ones, and a fast-evolving one gives you the opposite, then combining them should give you high resolution for nodes of both ages.
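A rough way to see the "signal adds up, noise cancels" argument is the sketch below. It rests on strong simplifying assumptions that are mine, not anything from a real data set: every character is independent, each one either supports the true grouping or conflicts with it, and the chance of supporting it is a made-up 0.55. The function name `prob_majority_correct` is hypothetical.

```python
import math

def prob_majority_correct(n_chars, p_true):
    """Exact probability that more than half of n_chars independent
    characters support the true grouping (upper tail of a binomial).

    Assumption: each character supports the true grouping with the same
    probability p_true and otherwise conflicts with it at random.
    """
    return sum(
        math.comb(n_chars, k) * p_true**k * (1.0 - p_true) ** (n_chars - k)
        for k in range(n_chars // 2 + 1, n_chars + 1)
    )

# Odd character counts avoid ties; 0.55 is an illustrative value only.
for n in (1, 11, 101, 501):
    print(n, round(prob_majority_correct(n, 0.55), 3))
```

Under these assumptions the weak per-character signal wins more and more reliably as characters are added, because the random conflicts do not all point the same way. The sketch says nothing about non-random noise, which is exactly the caveat in "as long as it's random".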

Different speeds of evolution among different characters are, however, still a well-known problem for model-based phylogenetic methods (maximum likelihood, Bayesian inference), which assume a single rate of evolution unless told otherwise. To avoid this, you need a model with several rate categories -- at the very least one for each gene.

The more rate categories, the better. Unfortunately, calculation time increases rapidly with the number of rate categories, so the default number in phylogenetics software is _four_, which is _not good_.
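To illustrate why a single assumed rate goes wrong, here is a minimal sketch using the Jukes-Cantor (JC69) model for a pair of sequences. The two rate categories (0.2 and 1.8, equal weights, mean 1) and the true distance of 1.0 substitutions per site are invented for illustration, and the helper names are hypothetical. It is not how any particular program implements rate categories.

```python
import math

def jc69_p_diff(d):
    """Expected proportion of differing sites under JC69 at distance d
    (expected substitutions per site)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_distance(p):
    """Single-rate JC69 distance estimate from a proportion of differences p."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def mixture_p_diff(d, rates, weights):
    """Expected proportion of differences when sites fall into rate
    categories; each category evolves at its own relative rate."""
    return sum(w * jc69_p_diff(r * d) for r, w in zip(rates, weights))

# Hypothetical data: half the sites slow, half fast, mean relative rate 1.
rates = [0.2, 1.8]
weights = [0.5, 0.5]
true_d = 1.0

p_obs = mixture_p_diff(true_d, rates, weights)
est_d = jc69_distance(p_obs)  # what a one-rate model would infer
print(true_d, round(p_obs, 3), round(est_d, 3))
```

With these made-up numbers the one-rate estimate comes out well below the true distance, because the fast sites saturate while the slow sites have barely changed. A model that knows about both categories (as `mixture_p_diff` does) can account for the same observations without that bias.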

(Alternatively, you can use simple parsimony, which amounts to a separate rate category for every nucleotide. But of course, simple parsimony comes with its own problems, like the famous greater susceptibility to long-branch attraction, which is caused by different speeds of evolution among different taxa.)