
RE: Morpho v molecular (was Re: Tinamous: living dinosaurs)

The assumption that genes contain the true phylogenetic signal, and that this 
signal can be detected via an algorithm in our analyses, must be true if we 
think the same of morphological characters.  After all, heritable morphological 
characters are basically all caused by genes, and it's genes that are passed 
down, not base morphology (ignoring some epigenetic exceptions).  I know that 
most of the genes we use now are not directly involved with morphology (as they 
involve mitochondria, immune cells and such) but the analogy still holds, and 
as we use more genes in phylogenetic analyses, we'll eventually get to ones 
which affect macroscopic morphology.

This doesn't mean it's easy to combine genes in analyses.  As has been 
mentioned, genes need different models.  This slows down analyses, and you need 
to get them right.  I don't see how different evolutionary rates in different 
genes are an issue if you have enough genes to work with, though.  Any noise 
should cancel itself out given enough data.  The only way it wouldn't is if 
it's not random, and which molecular artifacts can cause that over multiple 
genes?  Base composition bias maybe, but that's easily identified.
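
To make the "noise cancels unless it's non-random" point concrete, here is a 
toy sketch in Python.  It is not a phylogenetic analysis: each simulated gene 
just "votes" for one of three topologies, and the function name and the 
signal and bias parameters are my own illustrative assumptions, not values 
taken from any real dataset.

```python
import random
from collections import Counter


def winning_topology(n_genes, signal, bias, seed=0):
    """Return the topology supported by the most simulated genes.

    signal: chance that a gene supports the true topology.
    bias:   chance that an erring gene picks wrong_1 rather than wrong_2
            (0.5 = random noise, near 1.0 = an artifact shared across genes).
    """
    rng = random.Random(seed)  # fixed seed so the toy run is repeatable
    votes = Counter()
    for _ in range(n_genes):
        if rng.random() < signal:
            votes["true"] += 1
        elif rng.random() < bias:
            votes["wrong_1"] += 1
        else:
            votes["wrong_2"] += 1
    return votes.most_common(1)[0][0]


# Random noise: errors split evenly, so the true topology wins the vote.
print(winning_topology(5000, signal=0.4, bias=0.5))
# Shared bias: most errors point the same way and swamp the true signal.
print(winning_topology(5000, signal=0.4, bias=0.95))
```

With evenly split errors, the true topology comes out ahead even though fewer 
than half the genes support it.  Once nearly all the errors favor the same 
wrong topology, adding genes only entrenches the wrong answer, which is why 
the question of which artifacts could act the same way across many genes 
matters.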

I don't disagree that working out the best method to combine genes is difficult 
or that developing these methods is more important than just throwing more 
genes into the analysis, but my point still stands that unanalysed genes are a 
huge resource for testing molecular phylogenies and that I don't know of any 
case (post-2001, as David notes) where adding more genes to an analysis has 
changed a well supported result to one newly congruent with morphological 

Mickey Mortimer

> Date: Fri, 1 Jul 2011 12:40:47 +1000
> From: tijawi@gmail.com
> To: dinosaur@usc.edu
> Subject: Re: Morpho v molecular (was Re: Tinamous: living dinosaurs)
> Mickey Mortimer <mickey_mortimer111@msn.com> wrote:
> > More importantly, we have the huge untapped resource of genes.  How many 
> > genes are used in most molecular analyses?  One?  Three?  Fifteen if 
> > they're really recent.  The turtle study I noted above used two.  I could 
> > understand if analyzing a few genes gave us bad results due to 
> > misalignment, LBA, etc., but as we sequence five, ten, a hundred, a 
> > thousand genes, not to mention SINEs, LINEs and other non-coding regions, 
> > things really should start to resemble the morpho trees if the latter are 
> > right.  Yet instead as molecular analyses acquire more data, the trees 
> > gain a strongly supported consensus which sometimes differs from 
> > morphology.  I think this is a strong indication the molecular analyses 
> > are finding something real, since none of the artifacts should bias the 
> > result in the same way across so many genes.
> Sounds good in theory, but in practice it's a world of pain. Your
> assumption is that the genes retain the "true" phylogenetic signal,
> and this signal can be detected using an algorithm contained within
> the phylogenetic analysis. But individual genes evolve in different
> ways: some may have accumulated too few changes to record the
> phylogenetic signal; others may have accumulated too many changes,
> such that the original signal has been lost; others are 'just right'.
> Unfortunately, if the algorithm can't discern the phylogenetic signal,
> it may home in on any biases instead - after all, it's just looking
> for patterns in the dataset and can't magically identify the "true"
> phylogenetic signal. Adding more genes can actually make things worse
> if these added genes contain little phylogenetic signal - they may
> dilute the influence of those 'just right' genes that do preserve this
> signal.
> So do you string all the genes together, and analyze the concatenated
> multigene dataset as a single "supergene" using common parameters -
> and hope that all the quirks of individual genes "come out in the
> wash" to generate the "true" phylogeny?  Or do you try to deal with
> potential heterogeneities across different genes by partitioning the
> multigene dataset - and analyze the different genes separately, each
> with its own parameters, and combine them all at the end as a kind of
> gene 'supertree'?  Both approaches have their pros and cons, and each
> has its own proponents and detractors. To be honest, I don't know
> which approach is better at capturing the true phylogeny. But I don't
> think we can simply assume that "more data" always leads to a "better"
> tree. It's the quality of the analysis - specifically the
> algorithm(s) - that's important, and this crucial detail tends to get
> overlooked in the rush to compile ever-larger multigene datasets.
> Cheers
> Tim