
RE: Morpho v molecular (was Re: Tinamous: living dinosaurs)

I don't see anything different here than what happens in morphological 
analyses.  Adding genes is better even if it causes more homoplasy, because all 
that means is that the topology isn't as well supported as we thought, and the 
"non-random noise" may be a signal of the true phylogeny.  The morphological 
equivalent is a Sereno analysis, where he purposefully excludes homoplastic 
characters so that his clades appear better supported.  I don't think most 
people agree with this tactic, so I don't think you'd agree to it in molecular 
analyses either.

And again, I agree it's important to get the algorithms right, and indeed more 
important than adding more data to crappy algorithms.  Just as I'd say it 
doesn't matter how much data you throw into a supertree or an Adams consensus 
tree, the tree's still crap.  But I don't see how adding more data to a bad 
algorithm will necessarily cause greater node support and thus make the results 
of a larger bad analysis more misleading than those of a smaller bad analysis.  
So I don't think adding more genes will make our trees worse, and I still stand 
by my statement that adding more genes is a good way to test molecular 
phylogenies and add useful data to them.
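
As a toy illustration of the disagreement above (a sketch under assumptions 
that are not from this thread): suppose each gene independently favors the 
true topology with probability p_correct, and the analysis simply takes the 
majority topology.  Real analyses do not work by majority vote, and the 
function name and the values 0.6 and 0.4 are hypothetical.  With p_correct 
above 0.5 the errors behave like random noise and more genes help; if a shared 
bias pushes p_correct below 0.5, more genes make the wrong answer look more 
certain.

```python
from math import comb


def prob_majority_correct(n_genes: int, p_correct: float) -> float:
    """Probability that more than half of n_genes favor the true tree.

    Assumes genes are independent and each favors the true tree with the
    same probability p_correct (a deliberate oversimplification).
    """
    # Sum the binomial probabilities for every outcome with a strict majority.
    return sum(
        comb(n_genes, k) * p_correct**k * (1 - p_correct) ** (n_genes - k)
        for k in range(n_genes // 2 + 1, n_genes + 1)
    )


# Odd gene counts avoid ties. 0.6 = mostly random error, 0.4 = shared bias.
for n in (1, 11, 51, 201):
    unbiased = prob_majority_correct(n, 0.6)
    biased = prob_majority_correct(n, 0.4)
    print(f"{n:>3} genes: P(right | p=0.6)={unbiased:.3f}  P(right | p=0.4)={biased:.3f}")
```

In this model the first column climbs toward 1 and the second falls toward 0 
as genes are added, so whether more data helps depends entirely on which side 
of 0.5 the per-gene probability sits.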

Finally, I never said that adding more genes and taxa to molecular analyses 
will ALWAYS get the phylogeny right, but it's the best tool we have.  
Similarly, adding more characters and taxa to morphological analyses won't 
always generate correct trees, but it's more likely to than using a small 
number of characters and taxa.

Mickey Mortimer

> Date: Wed, 6 Jul 2011 11:11:51 +1000
> From: tijawi@gmail.com
> To: dinosaur@usc.edu
> Subject: Re: Morpho v molecular (was Re: Tinamous: living dinosaurs)
> Mickey Mortimer <mickey_mortimer111@msn.com> wrote:
> > This doesn't mean it's easy to combine genes in analyses.  As has been 
> > mentioned, genes need different models.  This slows
> > down analyses, and you need to get them right.  I don't see how different
> > evolutionary rates in different genes is an issue if you have enough genes 
> > to work with though.  Any noise should cancel itself
> > out given enough data.  The only way it wouldn't is if it's not random, and
> > which molecular artifacts can cause that over multiple genes?  Base 
> > composition bias maybe, but that's easily identified.
> The assumption that the noise should cancel itself out given enough
> data sounds entirely reasonable. This relies on the premise that the
> phylogenetic signal is additive whereas the random noise is averaged
> across multigene datasets. However, how do you deal with non-random
> noise - which strictly speaking isn't actually noise, but bias?
> Biases can be caused by base composition, codon usage, or
> transition-transversion ratios. Such biases may be easy to identify
> (especially if compositional), but what if these biases are actually
> part of the phylogenetic signal itself?! Can you (or should you)
> discriminate between biases introduced by shared evolutionary history
> (good) and biases that result from homoplasy (bad)?
> The thing to recall about bias (compositional, codon usage,
> transition-transversion ratio) is that it's only homoplastic ('bad')
> if it arises independently in two or more lineages, and therefore
> qualifies as 'noise' (i.e., is not reflective of a shared evolutionary
> history = the phylogenetic signal). Bias can also occur by common
> ancestry. To use an analogy in morphology-based analyses, shared
> adaptive traits can either arise from a common ancestor, or
> convergently (homoplasy). For example, are the shared aquatic
> characters of hupehsuchians and ichthyosaurs a result of convergence
> within two independently aquatic lineages, or the product of
> inheritance from a common ancestor that possessed these aquatic
> adaptations? Would it be useful to exclude all aquatic-related
> characters? Similarly, are the aerial locomotor adaptations of colugos
> (Dermoptera) and bats (Chiroptera) a product of shared ancestry, or
> homoplastic? Morphology- and molecular-based phylogenies disagree on
> this point, with the latter finding that the shared aerial locomotor
> characters must be homoplastic.
> In molecular-based analyses, biases of any kind introduce a pattern (=
> structure) in the dataset that is not random noise. If you remove any
> and all biases, you run the risk of removing some phylogenetically
> informative characters. Again, to give an analogy in the
> morphological realm, one study of pterosaur affinities (Bennett, 1996)
> removed hindlimb characters on the basis that they were functionally
> correlated with bipedal, digitigrade locomotion in dinosaurs and
> pterosaurs. Unsurprisingly, the dinosaur-pterosaur link broke down
> when hindlimb characters were excluded.
> Your contention is that any homoplasy will eventually be swamped by the
> phylogenetic signal by expanding the dataset, because adding more data
> will override the random noise. But this only works if (1) the noise
> is random, and (2) there is sufficient phylogenetic signal to override
> any noise. This may not always be the case, and third codon positions
> may be especially problematic. As you know, in protein-coding genes,
> each codon is composed of three bases. The third bases of codons tend
> to vary more than first bases, which in turn tend to vary more than
> second bases. This is due to relative levels of degeneracy of the
> genetic code at these three positions. For this reason, the third
> codon position (the most rapidly changing position) is often
> invaluable for discerning recent divergences. However, for deep
> divergences, the third positions become saturated. At this point, the
> third codon positions cease to be useful for phylogenetic analysis.
> This means that a whole third of the dataset is not merely useless for
> retaining the phylogenetic signal (more, if the first position also
> becomes saturated), but acts as a potential source of homoplasy if the
> base substitutions are not evenly distributed. Thus, this non-random
> 'noise' at the third position can create structure that
> conflicts with the phylogenetic signal, especially in deep
> divergences, but can be mistaken for the phylogenetic signal.
> > I don't disagree that working out the best method to combine genes is 
> > difficult or that developing these methods is more important
> > than just throwing more genes into the analysis, but my point still stands 
> > that unanalysed genes are a huge resource for testing
> > molecular phylogenies and that I don't know of any times (post-2001, as 
> > David notes) that adding more genes to an analysis has
> > led to changing a well supported result to one newly congruent with 
> > morphological
> But if the "well supported result" is itself an artifact of the genes
> themselves (including the method by which the algorithm seeks to
> extract the phylogenetic signal), then adding more genes is not likely
> to overturn this result. My argument is that unless the algorithm gets it
> right, adding more genes will compound the problem - it will give
> the same wrong tree, but with better support.
> My concern is not that all molecular trees are bad - I don't believe
> that at all. My concern is the assumption that given enough taxa and
> enough genes, a molecular analysis will always get it right. Whereas
> most nodes are recovered courtesy of the "true" phylogenetic signal,
> others may be supported by homoplastic 'noise' which is
> generating structure that is not phylogenetic.
> Cheers
> Tim
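
The saturation point made above about third codon positions can be sketched 
with the Jukes-Cantor (JC69) model, the simplest substitution model.  This is 
an illustration only: JC69 assumes equal base frequencies and equal 
substitution rates, so it ignores the compositional biases discussed in the 
thread, and the relative rates per codon position below are hypothetical 
values, not measurements.  Under JC69 the expected fraction of differing sites 
between two sequences is 0.75 * (1 - exp(-4d/3)), where d is the number of 
substitutions per site separating them.

```python
from math import exp


def jc69_expected_difference(subs_per_site: float) -> float:
    """Expected proportion of differing sites under JC69.

    subs_per_site is the expected number of substitutions per site
    separating the two sequences. The result can never exceed 0.75.
    """
    return 0.75 * (1.0 - exp(-4.0 * subs_per_site / 3.0))


# Hypothetical relative rates per codon position (illustrative only),
# ordered as in the thread: third > first > second.
RELATIVE_RATE = {"first": 1.0, "second": 0.5, "third": 5.0}

for divergence in (0.02, 0.2, 2.0):  # arbitrary units of evolutionary distance
    row = ", ".join(
        f"{position}={jc69_expected_difference(rate * divergence):.3f}"
        for position, rate in RELATIVE_RATE.items()
    )
    print(f"divergence {divergence}: {row}")
```

At small divergences the fast third position shows the most differences and so 
carries the most information; at large divergences it flattens out near the 
0.75 ceiling, where further substitutions no longer change the observed 
difference.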