Re: Genes show Neoaves branching before K/Pg extinction
> If this were the case, adding more data would help -- the signal is
> additive, whereas noise is (by definition) random and would cancel
> itself out.
In theory yes. In practice this is wrong for two reasons:
1. While noise is by definition randomly *generated*, the algorithm that maps
the noise onto the signal can be selective, resulting in a noise-generated bias
in the signal. ("Signal-to-bias ratio" is, I think, not a standard term, hence
the roundabout description.) The algorithm here is natural selection, which is
nonrandom enough (once a change has persisted for one generation) to create a
stochastic bias that is only partially predictable by likelihood models in each
*actual case* where it occurs. An unexpected transition-transversion ratio at
one position (due to an unusual neighborhood or whatever) will reliably create
a fake "signal" in most base positions within 20-200 Ma, and methods designed
to enhance signal will reliably recover this fake signal in preference to the
real one.
Still, actual signal persists in conserved regions. But this too decays
eventually, and in the end you *have* to resort to rare genomic changes (TEs
and other indels etc).
Our algorithms for recovering signal are optimized for recovering signal, *not*
for saying "sorry guv, no signal here" when there is likely none left.
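The decay half of point 1 is easy to put numbers on. A minimal sketch (my own
illustration, not anything from the papers discussed, assuming the simple
Jukes-Cantor model of sequence evolution): the chance that a site still shows
its ancestral base decays toward 1/4, at which point the site carries no
phylogenetic signal at all.

```python
# Under Jukes-Cantor, P(site unchanged in state) after mu_t expected
# substitutions per site is 1/4 + 3/4 * exp(-4*mu_t/3); it decays to the
# random-match baseline of 1/4 -- "signal decay is final".
import math

def p_same(mu_t: float) -> float:
    """Probability a site still matches its ancestral base (Jukes-Cantor)."""
    return 0.25 + 0.75 * math.exp(-4.0 * mu_t / 3.0)

for mu_t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"{mu_t:4.1f} subst/site -> P(same base) = {p_same(mu_t):.3f}")
```

At 5 substitutions per site the retained "signal" is already indistinguishable
from a random match, which is why only slowly evolving (conserved) regions keep
any deep-time information.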
2. Even if the noise were mapped without bias (think white noise in audio),
"noise cancels itself out" would never be 100% true for actual sequence data. A
switch from signal to nonsignal is always more likely than vice versa, unless
a) nonsignal exists that is more prominent than the signal (see 1. for the
effects), or b) the signal has already been fully obliterated. Signal decay and
nonsignal accumulation are inevitable.
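The asymmetry in point 2 can be made concrete with a toy two-state model (my
own sketch, arbitrary rate, not from the post): with four bases, a site that
matches the true history has three ways to mutate away, while a mismatching
site has only one way back, so the signal-to-nonsignal switch always dominates.

```python
# Track the fraction of sites still in the "signal" (ancestral-matching) state.
# Loss has 3 of 3 possible changes; gain back has only 1 of 3 -- so the
# equilibrium is 25% signal, i.e. indistinguishable from random.
def step(p_signal: float, rate: float = 0.01) -> float:
    loss = p_signal * rate * 3        # any of 3 changes destroys the match
    gain = (1 - p_signal) * rate * 1  # only 1 of 3 changes restores it
    return p_signal - loss + gain

p = 1.0
for generation in range(2000):
    p = step(p)
print(round(p, 3))  # settles near 0.25
```

The fixed point (3p = 1 - p, so p = 1/4) is the same baseline as in the
Jukes-Cantor formula: noise does not "cancel out", it accumulates until signal
and nonsignal sites are equally common up to that 25% floor.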
This is easily overlooked in large-scale sequence analyses, because the focus
is on the assembled dataset, not the individual loci. But the assembled dataset
is a synthetic composite that never evolved in situ. If you compare numerous
individual partitions (loci) for a dense taxonomic sample, you may start to
wonder whether a considerable part of the deep-time "signal" is not just the
noise that screams loudest across the dataset. See also
Wang et al. (2012). Testing hypotheses about the sister group of the
Passeriformes using an independent 30-locus data set. Mol Biol Evol 29(2).
Matzke et al. (2012). Retroposon insertion patterns of neoavian birds: strong
evidence for an extensive incomplete lineage sorting era. Mol Biol Evol, in
press. Together these provide a case study and a partial explanation.
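The "noise that screams loudest" effect can be sketched in a toy simulation
(entirely my own construction, not from the cited papers; the probabilities are
hypothetical): when the per-locus true signal is weak and a shared bias such as
base-composition attraction nudges every locus toward the same wrong grouping,
locus after locus "votes" for the bias, and the composite dataset recovers the
bias rather than the history.

```python
# Each locus tallies sites supporting the true grouping vs. the bias-favoured
# wrong grouping; because the shared bias is slightly stronger than the decayed
# true signal at every locus, nearly all loci agree -- loudly -- on the wrong tree.
import random

random.seed(1)
true_signal = 0.28   # chance a site supports the true grouping (hypothetical)
shared_bias = 0.36   # chance it supports the bias-favoured grouping (hypothetical)

def locus_vote(n_sites: int = 200) -> str:
    true_n = sum(random.random() < true_signal for _ in range(n_sites))
    bias_n = sum(random.random() < shared_bias for _ in range(n_sites))
    return "true" if true_n > bias_n else "wrong"

votes = [locus_vote() for _ in range(30)]
print(votes.count("wrong"), "of 30 loci back the bias-favoured tree")
```

The point is not the particular numbers but that adding more loci of this kind
makes the composite *more* confident in the artifact, not less.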
Under certain conditions, the noise does indeed seem to cancel itself out. I
think there is even a quantitative study defining when signal decay is
significant and when it is not, for a few loci -- Syst Biol sometime in the
'90s (volume 40-something), *maybe* Cladistics. The closest I can pin it down
is the debate around
Bull et al. (1993). Partitioning and combining data in phylogenetic analysis.
Syst. Biol. 42(3): 384–397.
Might wanna try 1997ish (vol. 46). Maybe it's in
Bremer et al. (1999). More characters or more taxa for a robust phylogeny –
case study from the coffee family (Rubiaceae). Syst. Biol. 48(3): 413–435.
It is also different for morphological data, where the noise (characters
uncodable due to destruction, pathology, etc.) is for practical purposes
random. Here too, however, signal decay is final. It can be ameliorated to some
extent (by extrapolating the remains of almost-obliterated characters), but
once the signal is gone, it is gone for good.
PS: Thanks for the ostrich paper. That changes things quite a bit -- except if
the paleognath and ratite morphotypes are sequentially more advanced than the
neognath one. Even then it would still not recover ratite monophyly, only move
the ostrich further up (if its neognath similarities are atavistic). Not
likely.
But it still needs all the fossil data one can get. The crown clade is
sufficiently sampled by Livezey & Zusi (2006, 2007), and there are some studies
with stem taxa. Mayr (2009) considers neither of the two main ones robust