
Taxon sampling in cladistic analyses - some results from DNA



Hi,

a few loci later, I still find it almost impossible to get a Eufalconimorphae 
clade based on "proper" gene sequences alone (i.e. without transposable 
elements). 

There are some interesting observations though. 
For the following, if I say "robust" I mean "including resilience to changes in 
taxon sample". Otherwise I say "strong" or similar. The two are most assuredly 
NOT the same.
(The retroposon signal for Eufalconimorphae is *strong*. Its robustness has not 
been tested at all, and this is why I think the paper should not have been 
published yet.)

Ornithine decarboxylase (OCD; usually some part of the region between exons 6 
and 8 is sequenced) is the only locus I have found so far with a strong signal 
for falconid+passerine monophyly. It is not very robust, though, and parrots 
converge with Aequornithes. I have not looked at why (i.e. what the offending 
sequence is). In general, OCD has more severe problems than comparable loci 
with finding the correct root point for subclades. It is also remarkable for 
reliably clading accipitrids with cathartids, and for clading half the "higher 
landbirds" with Aequornithes (inside them or sister to them) while recovering 
the rest as a robust clade.

Otherwise, falconids come out closer to passerines than to accipitrids slightly 
more often than accipitrids come out closer to passerines than to falconids. 
But neither case is common or robust; usually the three are part of an 
effectively unresolvable polytomy.

And here it gets interesting, because this polytomy *might* represent a 
definite dichotomy in the Neoaves - Aequornithes vs "higher landbirds".

Almost always the following lineages clade consistently and rather robustly:
* passerines
* falconids
* accipitrids
* "picocoracines" (s.l., including hoopoes etc)
* strigiforms
* psittaciforms

"Almost" because for every locus there is usually one of these lineages that 
converges with something entirely different, like psittaciforms in OCD. But the 
divergent lineage varies between loci, both in its identity and in its 
attachment point outside the "higher landbirds". I.e. it is not always, or even 
preferentially, the same lineage of "higher landbirds" that drops out, and 
those that drop out do not attach to constant parts elsewhere in the phylogeny.

There is one exception: passerines. We know this from mt data already. If 
passerines drop out, they usually go to the area between the base of the 
putative "higher landbird" clade and the root of Neoaves. I.e. they may appear 
basal to other "higher landbirds", form a polytomy with the latter and 
Aequornithes, or go basal in or even to Neoaves.
This is likely misroot attraction[*], i.e. convergence between the hypothesized 
base of passeriforms and the hypothesized base of Neoaves.

There is also one problem, or rather two or three: especially columbids and 
cypselomorphs, and to a lesser extent psittaciforms, are genetically "wild".... 
and they are all candidates for inclusion in this clade.
The cypselomorph base *might* be bolstered with enough additional taxa that 
they clade more readily, but perhaps the effects of the heterothermy that runs 
deep in this lineage permeate their genome. Pigeons and doves OTOH are just weird... it is
sometimes barely recognizable that you deal with the same locus as in your 
comparison taxa, it doesn't align properly at all!

I suspect pigeons are the culprits behind "Metaves". Basically, *everything* 
that is so beset with unusual and long transposons in bFibInt7 that it refuses 
to readily clade with anything else is pulled to columbiforms by LBA. That's my 
present working hypothesis at least.

As to "Eufalconimorphae", the troublemaker is almost certainly _Colius_. No 
matter what you find clading with passerines in the "higher landbirds" - remove 
or add a mousebird, and it usually looks *very* different. The mousebird 
doesn't even have to be near passerines (it usually is not *that* close). Its 
effect seems to be the disruption of the picocoracines, which ramifies through 
the "higher landbirds".

I have no idea yet why this is so. Upupiforms also have very unusual sequences 
(as far as they have been sampled, which is OK but not outstanding), but with 
_Colius_ it is less obvious; it aligns better than one might expect from the 
effect it has.

Basically, adding a mousebird seems to create pseudoconvergence within the 
analysis. "Pseudo-" because you have to scrape the bottom of the data to 
recover a phylogenetic signal, so I basically suspect the algorithms simply 
*invent* a "phylogenetic signal" (which then happens to be convergent) from 
colorful noise.
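A quick way to see that a tree-building method will hand back a fully dichotomous tree even when there is no signal at all is to feed it pure noise. The sketch below is only an illustration under stated assumptions: it uses plain UPGMA (chosen because it fits in a few lines; it is not the method behind the trees discussed here), made-up taxon labels, and random pairwise distances.

```python
import itertools
import random


def upgma(names, dist):
    """Minimal UPGMA. `dist` maps frozenset({a, b}) -> distance; returns nested tuples."""
    clusters = {n: (n, 1) for n in names}  # label -> (subtree, number of leaves)
    d = dict(dist)
    step = 0
    while len(clusters) > 1:
        # Join the closest pair of current clusters.
        a, b = min(itertools.combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        step += 1
        new = f"#{step}"  # internal label; assumes no taxon name starts with '#'
        for c in clusters:
            # Size-weighted average distance from the merged cluster to each other cluster.
            d[frozenset((new, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[new] = ((tree_a, tree_b), size_a + size_b)
    return next(iter(clusters.values()))[0]


def is_fully_resolved(tree):
    """True if every internal node has exactly two children (no polytomies)."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_fully_resolved(t) for t in tree)


def count_leaves(tree):
    return 1 if isinstance(tree, str) else sum(count_leaves(t) for t in tree)


random.seed(1)
taxa = [f"t{i}" for i in range(8)]
# Pure noise: every pairwise distance is an independent random number.
noise = {frozenset(p): random.random() for p in itertools.combinations(taxa, 2)}
tree = upgma(taxa, noise)
print(tree)
print(is_fully_resolved(tree))  # True: fully bifurcating, although the input has no signal
```

Nothing in the output marks this tree as noise; only comparison against resampled or altered input would show that.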

Mousebirds, then, are perhaps the #1 avian taxon that should only be admitted 
into a molecular analysis to test for destabilizing effects. Until the reason 
for their odd behavior is known, it is dangerous to include them "for 
completeness". Furthermore, I think that until this has happened, it is 
impossible to resolve their affinities based on molecular data. All you will 
end up with is a huge load of "phylogenetic signal" that may almost all be 
invented out of whole cloth.

And this is perhaps the take-home message here: our analyses are by now 
sophisticated to the point where, if they find no signal for the clear 
dichotomies they are optimized to resolve the data into, they can invent one. 
Essentially, we have advanced beyond the point where support values can be 
depended upon as meaningful indicators of clade robustness independent of taxon 
sample composition. I have a few dozen trees lying around which leave little 
room for doubt as to this, and more are in the works. Support for 
_Colius_+[whatever] is typically >0.75. Obviously, given that "[whatever]" 
varies, this is at least in part artefactual.
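Whether a high support value for _Colius_+[whatever] reflects anything stable can be checked by tallying which group comes out as sister to _Colius_ across trees built from different taxon samples; if the sister keeps changing, the support values say nothing about robustness. A minimal sketch, assuming rooted binary trees written as nested Python tuples; the three trees are invented placeholders, not results.

```python
from collections import Counter


def leaves(tree):
    """Leaf set of a tree written as nested tuples of taxon names."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(t) for t in tree))


def sister_of(tree, taxon):
    """Leaf set of the sister group of `taxon` in a rooted binary tree, or None."""
    if isinstance(tree, str) or taxon not in leaves(tree):
        return None
    left, right = tree
    if left == taxon:
        return leaves(right)
    if right == taxon:
        return leaves(left)
    # Descend into whichever child contains the taxon.
    return sister_of(left, taxon) or sister_of(right, taxon)


# Placeholder trees standing in for analyses of different taxon samples (not real results).
trees = [
    ((("Colius", "Trogon"), ("Picus", "Upupa")), ("Falco", "Passer")),
    ((("Colius", "Accipiter"), ("Picus", "Upupa")), ("Falco", "Passer")),
    ((("Colius", "Trogon"), "Picus"), ("Falco", "Passer")),
]

tally = Counter(sister_of(t, "Colius") for t in trees)
for sister, n in tally.most_common():
    print(sorted(sister), n)
```

With real output, one would feed in one tree per taxon sample; the count of distinct sisters is the number to watch, not the support value attached to any single one.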

But luckily, regarding mousebirds we already have this:
http://www.bioone.org/doi/abs/10.1525/auk.2009.07178
which builds upon the somewhat older and less complete
http://onlinelibrary.wiley.com/doi/10.1111/j.1475-4983.2008.00814.x/abstract

I have both papers, in case anyone needs them. The first step to resolve what 
mousebirds *are* is to plug _Eocolius_ into a numerical analysis, e.g. the L&Z 
matrix. It should be scorable, but perhaps not from the literature 
(http://www.springerlink.com/content/q56148836757u02m/, I have this too). It is 
apparently not a coliiform, but it might be the needed "missing link". If we 
could narrow down the sister group of mousebirds beyond "unspecified 'higher 
landbirds'" (and since mousebirds are essentially living fossils with an 
abundant hypodigm, such an indication *cannot* come from DNA), this would help 
a lot with the DNA work.

(If I *had* to guess, I'd put my money on upupiforms/bucerotiforms or 
trogons as sister to mousebirds. Very distant sister though. They are 
"similarly weird" in molecular analyses; their total effect is much like that 
of cypselomorphs: they don't clade (or only barely so) but alter everything 
around them and then some.
I need to check which loci have _Colius_ + _Urocolius_ sequences. 
Perhaps it's just misrooting; then it could be solved via DNA. But considering 
the work that has gone into quantitative analyses of the fossil record, it 
would be a waste to disregard that.)

-----

There is a definite need to control present-generation cladistic analyses for 
taxon add/remove effects. This is obviously much easier for morph analyses, 
because there you can use the qualitative assessment as a guideline; you can 
tell in advance which taxa are troublemaker candidates. For molecular analyses, 
there are obvious cases like parrots sister to storks (OCD dataset).
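A bare-bones version of such a taxon add/remove control is to rebuild the tree once per removed taxon and record whether a clade of interest survives. This is a sketch under assumptions: UPGMA serves only as a stand-in tree builder (any method could be plugged in), the distances over the placeholder taxa A, B, C and X are invented, and `leave_one_out` is a hypothetical helper name.

```python
import itertools


def upgma(names, dist):
    """Minimal UPGMA stand-in. `dist` maps frozenset({a, b}) -> distance."""
    clusters = {n: (n, 1) for n in names}  # label -> (subtree, number of leaves)
    d = dict(dist)
    step = 0
    while len(clusters) > 1:
        a, b = min(itertools.combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        step += 1
        new = f"#{step}"
        for c in clusters:
            d[frozenset((new, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[new] = ((tree_a, tree_b), size_a + size_b)
    return next(iter(clusters.values()))[0]


def leaves(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(t) for t in tree))


def clades(tree):
    """Leaf sets of all internal nodes of a rooted tree."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def leave_one_out(names, dist, target):
    """For each taxon outside `target`: drop it, rebuild, and check whether `target` is still a clade."""
    target = frozenset(target)
    result = {}
    for drop in names:
        if drop in target:
            continue
        kept = [n for n in names if n != drop]
        sub = {pair: v for pair, v in dist.items() if drop not in pair}
        result[drop] = target in clades(upgma(kept, sub))
    return result


# Invented distances: C and X are each other's closest relatives, and C's presence
# (via averaging) is what keeps A from joining X instead of B.
taxa = ["A", "B", "C", "X"]
dist = {
    frozenset(("A", "B")): 3.0,
    frozenset(("A", "C")): 3.4,
    frozenset(("A", "X")): 2.8,
    frozenset(("B", "C")): 5.0,
    frozenset(("B", "X")): 5.0,
    frozenset(("C", "X")): 1.0,
}

print(frozenset({"A", "B"}) in clades(upgma(taxa, dist)))  # True
print(leave_one_out(taxa, dist, {"A", "B"}))  # {'C': False, 'X': True}
```

The A+B clade is recovered with all taxa present, yet it disappears when C alone is removed, although C is not part of it: the kind of taxon-sample effect a single run would never reveal.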

I also think that the sequencing of the _Columba livia_ genome will allow us to 
answer a lot of questions, especially since with mallard, chicken and zebra 
finch we have a phylogenetic framework of "mainstream" taxa to compare with.

And I think that point-mutation indels can be analyzed conventionally, or at 
least they do little harm and may carry a useful phylogenetic signal. However, 
it is always good to check whether they occur in "weak" regions of a locus 
(where indels and point mutations are frequent), in which case they may be
more homoplasious.
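Checking whether indels sit in "weak" regions can be roughed out by measuring the share of variable sites in a window around each gap-containing alignment column. A minimal sketch, assuming aligned sequences of equal length with "-" for gaps; the window width of 5 and the 0.4 cut-off are arbitrary illustrative values, and the three short sequences are invented.

```python
def variable_columns(alignment):
    """For each column, True if it holds more than one distinct non-gap character."""
    length = len(alignment[0])
    return [len({seq[i] for seq in alignment} - {"-"}) > 1 for i in range(length)]


def gap_columns(alignment):
    """Indices of columns where at least one sequence has a gap."""
    return [i for i in range(len(alignment[0])) if any(seq[i] == "-" for seq in alignment)]


def indels_in_weak_regions(alignment, window=5, threshold=0.4):
    """Gap columns whose surrounding window has a share of variable sites >= threshold."""
    variable = variable_columns(alignment)
    half = window // 2
    flagged = []
    for i in gap_columns(alignment):
        lo, hi = max(0, i - half), min(len(variable), i + half + 1)
        share = sum(variable[lo:hi]) / (hi - lo)
        if share >= threshold:
            flagged.append(i)
    return flagged


# Invented toy alignment: a conserved first half and a variable second half,
# with one single-base gap in each.
alignment = [
    "ACGTACGTACGTACGT",
    "ACG-ACGTGTAC-CTA",
    "ACGTACGTCGTAAGAC",
]
print(indels_in_weak_regions(alignment))  # [12]
```

The gap at column 3 sits in invariant sequence and is not flagged; the one at column 12 is surrounded by variable sites and would be the one to treat with more caution as a character.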
As regards transposable elements, any such analysis has to deal with the 
caveats mentioned in the fallout of the "Pegasoferae" case before drawing any 
conclusions.
Their distribution *within* a gene pool/species especially warrants attention. 
I am not certain that interspecific variation is markedly higher here than 
intraspecific variation. They are not called "transposable" without reason: 
if you find one at a particular position in a particular species, you cannot 
per se assume it's present just the same in the sister *individual* to the one 
sequenced. However, there is probably insufficient comparison data to solve 
this question yet (_Gallus gallus_ and _Anas platyrhynchos_ are the only taxa 
for which enough individuals seem to have been sequenced).


Regards,

Eike

* I haven't found a better term. It is a phenomenon distinct from LBA, but it 
is just as significant. You need a fairly good taxon sample to notice it, 
though. I'm not the first to discuss it (except on-list, I think), but there 
are no papers on it either. It's occasional lab talk; it was mentioned in 
phylogenetic studies even back when they were still using phenetics, but the 
data were insufficient to actually research it until a few years ago.

It is easy to test: detach the subtree and display it as an unrooted star 
phylogeny. Misrooting does not significantly change the *relative* 
relationships among lineages, only the *absolute* ones.
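The same test can be sketched in code by comparing two versions of a detached subtree in two views: as rooted clades and as unrooted bipartitions ("splits"). If the splits agree but the clades do not, the versions differ only in where the root sits, which is the misrooting signature; if the splits differ too, something other than rooting is going on. A minimal sketch, assuming rooted binary trees written as nested Python tuples over the same taxa; the labels A to E are placeholders.

```python
def leaves(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(t) for t in tree))


def clades(tree):
    """Leaf sets of all internal nodes (rooted view)."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def splits(tree):
    """Non-trivial bipartitions of the leaf set (unrooted view; the root is ignored)."""
    taxa = leaves(tree)
    return {
        frozenset({c, taxa - c})
        for c in clades(tree)
        if 1 < len(c) < len(taxa) - 1
    }


def compare(t1, t2):
    """Classify two rooted trees on the same taxa."""
    if clades(t1) == clades(t2):
        return "identical"
    if splits(t1) == splits(t2):
        return "root placement only"
    return "different unrooted topology"


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = (("A", "B"), ("C", ("D", "E")))  # same unrooted tree as t1, rooted elsewhere
t3 = ((("A", "C"), "B"), ("D", "E"))  # genuinely different arrangement
print(compare(t1, t2))  # root placement only
print(compare(t1, t3))  # different unrooted topology
```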


PS: looking at the data, the mousebird problem may be best expressed as 
Coliiformes having a fatal attraction to accipitrids. They don't clade, but 
mousebirds alter tree topology to draw accipitrids away from passerines and/or 
falcons. Given that all three seem to be pretty close relatives, this is 
usually enough to push falcons 1-3 steps away from passerines.