
Re: Genes show Neoaves branching before K/Pg extinction

> If this were the case, adding more data would help -- the phylogenetic
> signal is additive, whereas noise is (by definition) random and would
> cancel itself out.

In theory yes. In practice this is wrong for two reasons:

1. While noise is by definition randomly *generated*, the algorithm which maps 
the noise onto the signal can be selective, resulting in a noise-generated bias 
in the signal. But "signal-bias ratio" is, I think, not a regular term. 
The algorithm here is natural selection, which is nonrandom enough (once it has 
persisted one generation) to create a stochastic bias that is only partially 
predictable by likelihood models in each *actual case* where it occurs. An 
unexpected transition-transversion ratio at one position (due to an unusual 
neighborhood or whatever) will reliably create a fake "signal" in most base 
positions within 20-200 Ma. Methods designed to enhance signal will reliably 
recover this fake signal in preference. Still, actual signal persists in 
conserved regions. But this too decays eventually, and in the end you *have* 
to resort to rare genomic changes (TEs and other indels etc).

Our algorithms to recover signal are optimized for recovering signal, *not* for 
saying "sorry guv, no signal here" if it is likely there is none left.
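
To make the point about biased noise concrete, here is a toy sketch (my own 
illustration, not taken from any of the papers cited below). It assumes three 
candidate topologies, "A" being the true one, and lets every site "vote". The 
function name, the vote model and all parameter values are hypothetical 
simplifications; real tree-building methods do not work by majority vote.

```python
import random
from collections import Counter

def winning_topology(n_sites, p_signal, noise_weights, seed=0):
    """Return the topology supported by the largest number of sites.

    Topology "A" is the true one. Each site carries real signal (and votes
    "A") with probability p_signal. Otherwise it is noise and votes for
    "A", "B" or "C" in proportion to noise_weights.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    votes = Counter()
    for _ in range(n_sites):
        if rng.random() < p_signal:
            votes["A"] += 1
        else:
            votes[rng.choices("ABC", weights=noise_weights)[0]] += 1
    return votes.most_common(1)[0][0]

# Unbiased noise: the votes for "B" and "C" balance out as sites are added.
unbiased = winning_topology(20000, 0.1, (1, 1, 1))

# Biased noise: most noisy sites agree on "B", so adding sites reinforces it.
biased = winning_topology(20000, 0.1, (1, 3, 1))
```

With these made-up numbers the unbiased run should settle on "A" and the biased 
run on "B", and in the biased case adding more sites only makes the wrong answer 
look better supported.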

2. Even if the noise were mapped without bias (think white noise in audio) 
"noise cancels itself out" would never be 100% true for actual sequence data. A 
switch from signal to nonsignal is always more likely than vice versa, unless 
a) nonsignal exists that is more prominent than signal (see 1. for effects) or 
b) the signal has been fully obliterated. Signal decay and nonsignal 
accumulation are inevitable.
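
The asymmetry can be sketched with a bare-bones four-state model, assuming 
(purely for illustration) that every base is equally likely to change into any 
of the other three, as in a Jukes-Cantor-type model. The function name and the 
substitution probability are hypothetical, and real sequences are messier.

```python
def fraction_ancestral(n_steps, sub_prob):
    """Track the fraction of sites still showing the ancestral base.

    sub_prob is the per-step probability that a site is substituted.
    All three possible changes are assumed equally likely.
    """
    f = 1.0  # at the start every site shows the ancestral base
    history = [f]
    for _ in range(n_steps):
        lost = f * sub_prob                    # any substitution destroys the ancestral base
        regained = (1.0 - f) * sub_prob / 3.0  # only one of three changes restores it
        f = f - lost + regained
        history.append(f)
    return history

decay = fraction_ancestral(100, 0.05)
```

Under these assumptions the fraction falls at every step and levels off towards 
1/4, which is what four random bases would give by chance; it never climbs back.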

This is easily overlooked in large-scale sequence analyses, because the focus 
is on the assembled data, not the individual loci. But the assembled data is a 
synthetic composite that has not evolved in situ. If you compare numerous 
individual partitions (loci) for a dense taxonomic sample, you may start to 
wonder whether a considerable part of the deep-time "signal" is not just the 
noise that screams loudest across the dataset. See also

Wang et al. (2012). Testing hypotheses about the sister group of the 
Passeriformes using an independent 30-locus data set. Mol. Biol. Evol. 29(2): 

Matzke et al. (2012). Retroposon insertion patterns of neoavian birds: strong 
evidence for an extensive incomplete lineage sorting era. Mol. Biol. Evol., in 
press. doi:10.1093/molbev/msr319

which are a case study and a partial explanation.

Under certain conditions, the noise does indeed seem to cancel itself out. I 
think there is even a quantitative study, done for a few loci, defining when 
signal decay is significant and when it is not. It was in Syst. Biol. sometime 
in the 90s (volume 40-something), or *maybe* in Cladistics. The closest I can 
pin it down is the debate launched by

Bull et al. (1993). Partitioning and combining data in phylogenetic analysis. 
Syst. Biol. 42(3): 384–397.

Might wanna try 1997ish (vol 46). Maybe it's in

Bremer et al. (1999). More characters or more taxa for a robust phylogeny – case 
study from the coffee family (Rubiaceae). Syst. Biol. 48(3): 413–435.

It is also different for morph data, where the noise (characters uncodable due 
to destruction, pathology etc) is for practical purposes random. Here too, 
however, signal decay is final. It can be ameliorated to some extent (by 
extrapolating from the remains of almost-obliterated characters), but once the 
signal is gone, it is gone for good.



PS Thanks for the ostrich paper. That changes things quite a bit. Except if the 
paleognath and ratite morphotype are sequentially more advanced than the 
neognath. That would still not recover ratite monophyly, only move ostrich 
further up (if its neognath similarities are atavistic). Not likely.

But it still needs all the fossil data one can get. The crown clade is 
sufficiently sampled by Livezey & Zusi (2006, 2007), and there are some studies 
with stem taxa. Mayr (2009) considers neither of the two main ones robust 
enough, though.