Re: Morpho v molecular (was Re: Tinamous: living dinosaurs)

> the phylogenetic signal.  Across multigene datasets,
> structure is
> additive but random noise will be averaged out.

Not as easily as assumed, because the noise *is subtracted* from the signal. 
The deeper you go in time, the less signal you have and the more noise (random 
or not) you have. The signal/noise ratio is more interesting than the absolute 
amount of signal for easy calculations of branch support ("easy" meaning those 
that only consider the best-supported pairing and not "almost-as-good" 
alternatives). IF you compare the support for ALL POSSIBLE sisters of one 
branch, then indeed the signal should conspicuously add up. But if you don't, 
adding noise simply increases support for alternative 
(non-most-likely/non-most-parsimonious) sisters, though you won't know which, 
while decreasing support for the most likely/most parsimonious one.

So you'll probably end up with the same topology, but even with *random* noise 
it will be less well supported if the SNR decreases.
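
To put rough numbers on this, here is a toy sketch in Python. It is my own 
simplification, not a real phylogenetic analysis: I assume a fraction 
`signal_fraction` of characters favours the true sister, and every other 
character is pure noise that favours one of `n_candidates` possible sisters 
uniformly at random. The function names are made up for illustration.

```python
def expected_support(signal_fraction, n_candidates):
    """Expected share of characters favouring each candidate sister.

    Index 0 is the true sister. Noise characters are assumed to pick
    any candidate (including the true one) with equal probability.
    """
    if not 0.0 <= signal_fraction <= 1.0:
        raise ValueError("signal_fraction must lie in [0, 1]")
    if n_candidates < 2:
        raise ValueError("need at least two candidate sisters")
    noise_share = (1.0 - signal_fraction) / n_candidates
    return [signal_fraction + noise_share] + [noise_share] * (n_candidates - 1)


def support_margin(signal_fraction, n_candidates):
    """Gap between the true sister and the best alternative."""
    shares = expected_support(signal_fraction, n_candidates)
    return shares[0] - max(shares[1:])


# The true sister stays on top, but its share shrinks as signal is lost.
for s in (0.8, 0.4, 0.1):
    shares = expected_support(s, 5)
    print(f"signal={s:.1f}  true sister={shares[0]:.2f}  "
          f"each alternative={shares[1]:.2f}")
```

Under these assumptions the best pairing never changes as long as some signal 
remains, but its share of the characters drops and each alternative picks up 
a little, which is the "same topology, weaker support" outcome.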

(For complete averaging-out, you'd need an infinite amount of data in any 
case. But for practical purposes, it's enough if any noise-generated "second 
bests" according to one dataset A are prevented by the other datasets from 
becoming any better.)

Bremer support and similar methods (those that show how much better the 
best-supported sister pairing is versus others) may be the only easily-computed 
workaround: if you add datasets and see ML/MP support decreasing but Bremer 
support increasing, it is likely due to deep-time noise.
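
As a minimal sketch of the quantity meant here: if you already have, for each 
candidate sister, the length in steps of the shortest tree that contains that 
pairing, then the Bremer-style margin is just the extra steps the best 
alternative needs. The step counts and the names `sister_A` etc. below are 
invented for illustration; in practice the lengths would come from 
constrained searches in whatever program you use.

```python
def bremer_support(steps_by_sister):
    """Extra steps needed by the best alternative pairing.

    steps_by_sister maps each candidate sister to the length (in steps)
    of the shortest tree that contains that pairing.
    """
    if len(steps_by_sister) < 2:
        raise ValueError("need at least two candidate sisters")
    ordered = sorted(steps_by_sister.values())
    # Shortest tree without the best pairing minus the overall shortest tree.
    return ordered[1] - ordered[0]


# Hypothetical step counts for three candidate sisters of one lineage.
steps = {"sister_A": 412, "sister_B": 415, "sister_C": 430}
print(bremer_support(steps))
```

A value of zero means two pairings are tied, i.e. the node collapses in the 
strict consensus.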

And from my observations, it looks like adding taxa is, from some point 
onwards, more helpful than adding characters, at least in molecular studies, 
not least because it allows ancestral states to be inferred more robustly.

Particularly pesky nodes ought to be tested more thoroughly. It would help, 
for example, to know the ML/MP value *distribution* for a lineage and for all 
its possible sisters. Is one possible sister conspicuously 
better-supported than the others? Bremer support can show this. But if the 
Bremer support value is not so high, what then? Can we narrow it down to 2 
alternatives? If there is sufficient signal, there should be very few pairings 
with moderate-to-high ML/MP support, and a long tail of pairings with support 
values close to zero.  

Is there a measure for "number of alternative sisters with no more than 10% 
increase in required steps presuming the next-lowest node doesn't change"? 
("10%" is arbitrary, it could be 5% or 15% or whatnot, depending on what 
proves to be a good cut-off value)
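
I don't know of a standard name for such a measure, but given per-pairing 
step counts it would be trivial to compute. A rough Python sketch, assuming 
you already have the length of the shortest tree for each candidate sister 
(with the next-lowest node held fixed); the function name, the sister labels 
and the numbers are all invented:

```python
def near_optimal_alternatives(steps_by_sister, tolerance=0.10):
    """List alternative sisters whose trees are at most `tolerance`
    (as a fraction) longer than the shortest tree.

    steps_by_sister maps each candidate sister to the length (in steps)
    of the shortest tree containing that pairing.
    """
    if not steps_by_sister:
        raise ValueError("no candidate sisters given")
    best_sister = min(steps_by_sister, key=steps_by_sister.get)
    cutoff = steps_by_sister[best_sister] * (1.0 + tolerance)
    return sorted(
        name
        for name, steps in steps_by_sister.items()
        if name != best_sister and steps <= cutoff
    )


# Hypothetical step counts; sister_A is the most parsimonious pairing.
steps = {
    "sister_A": 100,
    "sister_B": 104,
    "sister_C": 109,
    "sister_D": 125,
    "sister_E": 160,
}
alternatives = near_optimal_alternatives(steps, tolerance=0.10)
print(len(alternatives), alternatives)
```

The length of the returned list is the number asked for; a count of 0 or 1 
would suggest the node is narrowed down well, a large count that it is not.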