
Re: Morpho v molecular (was Re: Tinamous: living dinosaurs)

Augusto Haro <augustoharo@gmail.com> wrote:

> Agreed that it is logical to expect more change, and thus more
> homoplasy, in third positions than in the other positions... But this
> does not mean that the change in these third positions is biased and
> non-random.

Yes, I agree.  As I tried to make clear in my previous message, the
third codon position (the most rapidly changing position) is often
invaluable for discerning recent divergences.  The potential problems
arise in deep divergences, when this position can reach saturation.
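To make "saturation" concrete, here is a minimal sketch using the Jukes-Cantor (JC69) model. JC69 is chosen only because it is the simplest substitution model (all bases and all changes equally likely); it is an illustrative assumption, not something the argument depends on. The expected proportion of differing sites p rises with the true number of substitutions per site d, but levels off at 0.75. Once a position has turned over many times, the observed differences say little about how deep the divergence actually is.

```python
import math

# Assumption: JC69 model, used purely for illustration of saturation.
def jc69_p_distance(d: float) -> float:
    """Expected proportion of differing sites, p = 3/4 * (1 - exp(-4d/3)),
    where d is the expected number of substitutions per site separating
    the two sequences."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_distance(p: float) -> float:
    """Invert the formula above to estimate d from an observed p-distance.
    The estimate is infinite once p reaches the 0.75 saturation ceiling."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

for d in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"d = {d:5.1f}  expected p = {jc69_p_distance(d):.3f}")
```

For d = 5 and d = 10 the expected p-distances are nearly identical (both just under 0.75). This is why a fast-evolving position loses resolving power in deep divergences, while still discriminating well when d is small.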

One pitfall of the higher substitution rate at the third codon
position is that, unless the substitution probabilities are the same
across all taxa, the third position collectively contributes
non-randomness (i.e., structure).  (Strictly, this holds for all base
positions, but it is most acute for the third position because it has
the highest potential for neutral base substitutions.)  This structure
cannot be distinguished from the structure that we're really after:
the phylogenetic signal.  Across multigene datasets, structure is
additive, whereas random noise will be averaged out.  But not all
structure will be phylogenetic; non-random 'noise' that provides
structure will also be additive, so adding more genes will have the
undesirable result of amplifying it as well.
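The claim that random noise averages out while systematic structure accumulates can be illustrated with a deliberately cartoonish model. It is hypothetical and is not a phylogenetic method. Each gene contributes a fixed amount of true support to one grouping (T), a fixed amount of non-phylogenetic, bias-driven support to a competing grouping (S), and independent zero-mean noise to both. The function name and parameter values (signal = 1.0, bias = 1.2, noise_sd = 2.0) are arbitrary assumptions.

```python
import random

def concatenated_support(n_genes, signal, bias, noise_sd, rng):
    """Toy model (hypothetical): sum per-gene support for two competing
    groupings. T receives true phylogenetic signal; S receives a systematic,
    non-phylogenetic bias. Both receive independent zero-mean noise."""
    support_t = support_s = 0.0
    for _ in range(n_genes):
        support_t += signal + rng.gauss(0.0, noise_sd)
        support_s += bias + rng.gauss(0.0, noise_sd)
    return support_t, support_s

rng = random.Random(1)  # fixed seed so runs are reproducible
for n in (1, 10, 100, 1000, 10000):
    t, s = concatenated_support(n, signal=1.0, bias=1.2, noise_sd=2.0, rng=rng)
    # Per-gene averages: the noise term shrinks roughly as noise_sd / sqrt(n),
    # but the gap between bias and signal stays at 0.2 per gene.
    print(f"{n:6d} genes: mean T = {t / n:+.3f}, mean S = {s / n:+.3f}")
```

As genes are added, the per-gene averages settle near 1.0 and 1.2. The noise washes out but the bias does not, so under these assumptions a concatenated analysis converges ever more confidently on S. If the bias were smaller than the signal, T would win, yet nothing in the summed scores would reveal how much of the support came from each source.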

> Change at these positions may still be random, so the
> characters based on these positions will still be "noise" and not
> "bias".

The changes may well be neutral, but they may not be random.  If there
is even a slight tendency for the third base of a codon to change to
one base rather than another in a given organism, then we have bias.
In other words, unless these substitution probabilities are the same
for all organisms, we have fertile ground for homoplasy, which will be
interpreted as structure.  This source of homoplasy is more creeping
and insidious than compositional bias.  Further, like compositional
bias, differential substitution probabilities can also be a
consequence of shared evolutionary history, in which case they would
qualify as phylogenetically informative.
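The convergence described here can be sketched with a small simulation. Everything in it is a hypothetical assumption chosen for illustration: the tree ((A,B),(C,D)), an F81-style process in which each substitution draws the new base from a lineage-specific set of base frequencies, and a GC-rich bias that A and C acquire independently while their sisters B and D evolve without bias. Percent identity serves as a crude stand-in for what a tree-building method would see.

```python
import random

BASES = "ACGT"
UNBIASED = [0.25, 0.25, 0.25, 0.25]
GC_RICH = [0.05, 0.45, 0.45, 0.05]  # hypothetical GC-biased third positions

def evolve(seq, steps, rate, freqs, rng):
    """F81-style toy process (assumption): at each step every site is hit
    with probability `rate`; a hit replaces the base with one drawn from
    `freqs`, so the lineage drifts toward that base composition."""
    seq = list(seq)
    for _ in range(steps):
        for i in range(len(seq)):
            if rng.random() < rate:
                seq[i] = rng.choices(BASES, weights=freqs)[0]
    return "".join(seq)

def identity(a, b):
    """Fraction of aligned sites at which two sequences carry the same base."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(42)
root = "".join(rng.choice(BASES) for _ in range(1000))
# True tree: ((A,B),(C,D)) with short internal and long terminal branches.
ab = evolve(root, 5, 0.02, UNBIASED, rng)
cd = evolve(root, 5, 0.02, UNBIASED, rng)
A = evolve(ab, 200, 0.02, GC_RICH, rng)   # A and C share the same bias...
C = evolve(cd, 200, 0.02, GC_RICH, rng)
B = evolve(ab, 200, 0.02, UNBIASED, rng)  # ...but are not each other's sisters
D = evolve(cd, 200, 0.02, UNBIASED, rng)
print(f"A-B (true sisters): {identity(A, B):.2f}")
print(f"A-C (shared bias):  {identity(A, C):.2f}")
```

Under these assumptions, once the long terminal branches have saturated most sites, A and C are expected to be more similar to each other (identity of roughly 0.4) than A is to its true sister B (roughly 0.25), purely because they share the same substitution quirk. Had the bias instead been inherited from a common ancestor, the same similarity would be genuine phylogenetic signal.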

What this boils down to is that we need bases to change within a gene
sequence in order to get a phylogenetic signal in the first place.
But the mechanism(s) by which these changes occur often have their own
quirks.  These quirks can add structure, which in turn can be misread
as phylogenetic signal; or these quirks may be part of the
phylogenetic signal.

> And, as you stated, homoplasy (which implies more change than
> non-homoplasious characters) cannot be dismissed because it can add
> structure to a tree at least close to the terminals, and because it
> may prove not to be homoplasy with further data.

Homoplasy can add structure to a dataset.  But for third base
positions, the problem is not so much that the homoplasy lies "close
to the terminals"; it is a consequence of the higher turnover in the
third position relative to the other two (especially the second
position).  In multigene datasets, random noise will be averaged out
by the addition of more and more genes, but structure (whatever its
source) will be increased.  Because the algorithm is on the lookout for
structure (ANY structure), any non-random homoplastic changes are a