Re: Morpho v molecular (was Re: Tinamous: living dinosaurs)

Mickey Mortimer <mickey_mortimer111@msn.com> wrote:

> More importantly, we have the huge untapped resource of genes.  How
> many genes are used in most molecular analyses?  One?  Three?  Fifteen
> if they're really recent.  The turtle study I noted above used two.  I
> could understand if analyzing a few genes gave us bad results due to
> misalignment, LBA, etc., but as we sequence five, ten, a hundred, a
> thousand genes, not to mention SINEs, LINEs and other non-coding
> regions, things really should start to resemble the morpho trees if
> the latter are right.  Yet instead as molecular analyses acquire more
> data, the trees gain a strongly supported consensus which sometimes
> differs from morphology.  I think this is a strong indication the
> molecular analyses are finding something real, since none of the
> artifacts should bias the result in the same way across so many genes.

Sounds good in theory, but in practice it's a world of pain.  Your
assumption is that the genes retain the "true" phylogenetic signal,
and this signal can be detected using an algorithm contained within
the phylogenetic analysis.  But individual genes evolve in different
ways: some may have accumulated too few changes to record the
phylogenetic signal; others may have accumulated too many changes,
such that the original signal has been lost; others are 'just right'.
Unfortunately, if the algorithm can't discern the phylogenetic signal,
it may home in on any biases instead - after all, it's just looking
for patterns in the dataset and can't magically identify the "true"
phylogenetic signal.  Adding more genes can actually make things worse
if these added genes contain little phylogenetic signal - they may
dilute the influence of those 'just right' genes that do preserve this
signal.
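
To make the dilution point concrete, here is a toy sketch in Python.
It is not a real phylogenetic analysis: the four taxa A-D, the tree
labels, and all the support numbers are made up for illustration, and
"summing support" is only a crude stand-in for what a concatenated
analysis does.

```python
# Hypothetical per-gene site support for the three possible unrooted
# trees of four taxa A, B, C, D. All numbers are invented.
TRUE_TREE = "AB|CD"
BIASED_TREE = "AC|BD"  # e.g. what a long-branch artifact might favor

# A 'just right' gene: clear support for the true tree.
GOOD_GENE = {"AB|CD": 30, "AC|BD": 10, "AD|BC": 10}
# A saturated gene: little signal, slight systematic pull to the wrong tree.
NOISY_GENE = {"AB|CD": 16, "AC|BD": 20, "AD|BC": 16}


def concatenated_winner(genes):
    """Pool support across all genes and return the best-supported tree."""
    totals = {}
    for gene in genes:
        for tree, support in gene.items():
            totals[tree] = totals.get(tree, 0) + support
    return max(totals, key=totals.get)


def winner_with_noisy_genes(n_good, n_noisy):
    """Winner when n_noisy saturated genes are added to n_good good ones."""
    return concatenated_winner([GOOD_GENE] * n_good + [NOISY_GENE] * n_noisy)


if __name__ == "__main__":
    for n_noisy in (0, 10, 20):
        print(n_noisy, winner_with_noisy_genes(3, n_noisy))
```

With three good genes the pooled data still favor AB|CD after ten
noisy genes are added, but by twenty the small shared bias has
outvoted them and AC|BD wins.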

So do you string all the genes together, and analyze the concatenated
multigene dataset as a single "supergene" using common parameters -
and hope that all the quirks of individual genes "come out in the
wash" to generate the "true" phylogeny?  Or do you try to deal with
potential heterogeneities across different genes by partitioning the
multigene dataset - analyzing the different genes separately, each
with its own parameters, and combining them all at the end as a kind
of gene 'supertree'?  Both approaches have their pros and cons, and each
has its own proponents and detractors.  To be honest, I don't know
which approach is better at capturing the true phylogeny.  But I don't
think we can simply assume that "more data" always leads to a "better"
tree.  It's the quality of the analysis - specifically the
algorithm(s) - that's important, and this crucial detail tends to get
overlooked in the rush to compile ever-larger multigene datasets.
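
As a rough illustration of how the two strategies can disagree on the
same data, here is a toy Python sketch.  The support numbers are
invented, pooling counts stands in for a "supergene" analysis, and a
simple one-gene-one-vote tally stands in for a supertree method;
neither is how real software works.

```python
from collections import Counter

# Hypothetical per-gene site support for the three unrooted trees of
# four taxa A, B, C, D. All numbers are invented.
GENES = [
    {"AB|CD": 12, "AC|BD": 8, "AD|BC": 8},       # short gene, favors AB|CD
    {"AB|CD": 11, "AC|BD": 9, "AD|BC": 7},       # short gene, favors AB|CD
    {"AB|CD": 13, "AC|BD": 9, "AD|BC": 9},       # short gene, favors AB|CD
    {"AB|CD": 100, "AC|BD": 160, "AD|BC": 90},   # long gene, favors AC|BD
]


def best_tree(support):
    """Return the tree with the highest support in one support table."""
    return max(support, key=support.get)


def concatenated(genes):
    """'Supergene' stand-in: pool support over all genes, then pick a tree."""
    totals = Counter()
    for gene in genes:
        totals.update(gene)  # Counter.update adds counts from a mapping
    return best_tree(totals)


def gene_tree_vote(genes):
    """'Supertree' stand-in: pick a tree per gene, then take the majority."""
    votes = Counter(best_tree(gene) for gene in genes)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print("concatenated:", concatenated(GENES))
    print("gene-tree vote:", gene_tree_vote(GENES))
```

Here the pooled analysis follows the one long gene (AC|BD) while the
vote follows the three short ones (AB|CD).  Which answer is "right"
depends on whether the long gene carries real signal or a bias, and
the toy has no way of telling.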