
Eufalconimorphae and homoplasy in mol-phyl


I have looked at a few molphyl issues in the last week. My taxon sets include 
about 50-80 taxa (out of 100-160) every time I analyze them. I have not yet 
included the basalmost Neornithes (charadriiforms, Mirandornithes, the 
"seabird" part of the Aequornithes); perhaps I'll include them eventually. 

I ran the analyses on PhyML (i.e. ML analysis), using a GTR+I(empirical)+10 
rate categories model. Tree search was by NNI and SPR (best of both searches). 
Support via PhyML's "aLRT-like" algorithm (quicker than bootstrap).
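
A rough sketch of how a run with these settings could be assembled from Python. It only builds the command line and does not call PhyML. Several choices are assumptions rather than a record of the actual command: the file name `alignment.phy` is hypothetical, "I(empirical)" is read here as estimated invariable sites plus empirical base frequencies, the gamma shape is assumed to be estimated, and `-b -4` (SH-like support) stands in for the "aLRT-like" option since the post does not say which variant was used.

```python
import shlex

def phyml_command(alignment, rate_categories=10):
    """Build (but do not run) a PhyML command line as a list of arguments."""
    return [
        "phyml",
        "-i", alignment,             # PHYLIP-format alignment (hypothetical file name)
        "-d", "nt",                  # nucleotide data
        "-m", "GTR",                 # general time-reversible model
        "-f", "e",                   # empirical base frequencies (assumption)
        "-v", "e",                   # estimate proportion of invariable sites (assumption)
        "-c", str(rate_categories),  # number of rate categories
        "-a", "e",                   # estimate the gamma shape parameter (assumption)
        "-s", "BEST",                # keep the better of the NNI and SPR searches
        "-b", "-4",                  # SH-like aLRT support instead of bootstrap (assumption)
    ]

cmd = phyml_command("alignment.phy")
print(shlex.join(cmd))
```

Check the flags against the manual of the PhyML version you have installed before relying on them.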

I can preliminarily conclude 4 things:

"Higher neoavian" (i.e. non-Aequornithes) basal relationships are tricky to the 
extreme. Adding or removing certain taxa will mess up any tree, because they 
will (mostly) long-branch-attract to *within* otherwise trivially resolving 
clades. Often the intruder ends up as the second-most-basal taxon (if you have 3 
or more taxa in the clade in question), kicking out the true basal member in the 
process. It is easy, for example, to kick _Sagittarius_ or _Pandion_ away from 
the accipitrids, even if you include _Elanus_ or _Gampsonyx_, with which at 
least the Secretarybird otherwise reliably clades.

Particularly insidious are: 
* "aberrant gruiforms" (mesites, sungrebes, Sunbittern, Kagu... bustards for 
some reason seem tame by comparison)
* basal cypselomorphs (anything except Apodiformes)
* trogons, mousebirds, Upupiformes s.l. (hoopoes are worst, but hornbills are 
little better).
* possibly pelicans and ibises and _Psophia_.

These taxa will routinely clade at places where they CANNOT be. Not because 
*they* are there, but because their supposed sisters as far as anyone can tell 
belong elsewhere (_Psophia_ is NOT sister to _Galbula+Ciconia_...). 

They will routinely do so with support values in excess of 0.8.

It is highly desirable to represent every major lineage (ordinal- or even 
familial-level, for living birds) with 4 taxa or more (up to 6 or 7) initially. 
You want 3 taxa at least, and one might have an aberrant sequence (huge 
autapomorphic indels etc).
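
The sampling rule is easy to check mechanically before running anything. A minimal sketch, assuming you keep a simple taxon-to-lineage table; the function name and the lineage assignments below are illustrative only, not the actual taxon set:

```python
from collections import Counter

def undersampled_lineages(taxon_to_lineage, minimum=3):
    """Return the lineages represented by fewer than `minimum` taxa."""
    counts = Counter(taxon_to_lineage.values())
    return {lineage: n for lineage, n in counts.items() if n < minimum}

# Illustrative sample only; lineage labels are placeholders.
sample = {
    "Sagittarius": "Accipitriformes",
    "Pandion": "Accipitriformes",
    "Elanus": "Accipitriformes",
    "Gampsonyx": "Accipitriformes",
    "Psophia": "Gruiformes",
    "Monias": "Mesitornithiformes",
}

# Lineages listed here need more taxa before the first run.
print(undersampled_lineages(sample))
```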

At this time, indels and transposable/retroelements are NOT SUITABLE FOR DEEP 
PHYLOGENETIC ANALYSIS without detailed information on their origin and mode of 
evolution.


The amount of homoplasy and stereotyped insertion/deletion sites is staggering. 
They DO NOT insert/excise at random. Not at all. Short (1-3 bp) indels do 
indeed occur fairly randomly. Long ones definitely don't, far more often than 
not.

Just take the beta fibrinogen intron 7 sequences from GenBank for a broad 
taxonomic sample of Neoaves and align them. Or try to align them, then weep.
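
One crude way to see how stereotyped the long indels are: take whatever alignment you managed to get and tally which taxa share a long gap at exactly the same position and of the same length. This is only a sketch under stated assumptions. The alignment is a plain dict of equal-length gapped strings, the 4 bp cutoff simply follows the "short = 1-3 bp" distinction above, and the toy sequences are invented. A shared gap only tells you which taxa lack (or lost) a stretch; whether the sharing taxa are related still has to be judged against other evidence.

```python
import re
from collections import defaultdict

def long_gap_sites(alignment, min_length=4):
    """Map each (start, length) of a long gap run to the taxa that share it."""
    sites = defaultdict(list)
    for taxon, seq in alignment.items():
        for match in re.finditer(r"-+", seq):
            length = match.end() - match.start()
            if length >= min_length:
                sites[(match.start(), length)].append(taxon)
    return dict(sites)

# Invented toy alignment; taxon names are placeholders.
toy = {
    "taxon_A": "ACGT------TTGACA",
    "taxon_B": "ACGTAAGCTATTGACA",
    "taxon_C": "ACGT------TTGACA",
    "taxon_D": "ACG-AAGCTATTGACA",
}

for (start, length), taxa in long_gap_sites(toy).items():
    print(start, length, taxa)
```

The 1 bp gap in taxon_D is ignored on purpose; only the long runs are tallied.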

Better still: use the mt control region (sometimes listed under "D-loop"). You 
need a strong stomach for this though: its pseudogene has evolved independently 
at least 2 times in Neoaves (possibly 5 times or more, though one of the 
supposed "pseudogenes" looks surprisingly human...).

This is a snippet of the fibrinogen sequence: 

It's still not fully aligned, but you can already see some interesting things:

From left to right, the major inserts appear to be:
* longish homoplasy: cypselomorphs + Sunbittern/Kagu (misaligned in latter) + 
basal Neornithes. Mesites seem to have a different insert at the same site.
The former degrades quickly and the original sequence is not well discernible. 
May be a retroposon; it is ubiquitous and resembles a snippet from an 
enhancer-binding protein of HIV. The insert in _Monias_ is almost certainly 
noneukaryotic in origin.
* medium-sized ?synapomorphy of Columbidae, but if so convergently lost again 
in some. Otherwise convergent autapomorphy of some unrelated columbids. 
Possibly this is just badly aligned though (but a good alignment is tough here).
* medium-sized autapomorphy of _Zenaida_, or possibly badly aligned.
* medium-sized synapomorphy of Columbidae. Probably it's not well aligned and 
the "TG" corresponds to the start of the TGTTA sequence a few bases to the 
right. But it doesn't matter, columbids stand out a lot here, the entire region 
(this and the two preceding points) is incredibly hard to align parsimoniously.
* longish autapomorphy of _Tyto_ and _Chalcophaps_, again different sequences 
inserting at exactly the same site. 
The former or something derived from it seems to be also present (at the same 
site?) in the same locus in single species of at least 3 other neoavian 
lineages - it *may* be bacterial but it could be a transposon (similar 
sequences appear in many unrelated eukaryotes and *could* be a homeobox 
The latter is widespread (though not common) in birds and also occurs in _Danio 
rerio_. In birds, it was apparently lost for good at some point beyond 
Aequornithes, but even so the presence/absence agrees little with phylogeny.

So, in this stretch alone we have about as much insert code as code that is 
indeed inherited as the analysis assumes it to be. But of the insert code, only 
1 element in 5 is phylogenetically informative. The rest will just mess up 
things with their pronounced "phylogenetic signal" that has nothing to do 
whatsoever with common descent.

Falcons are one of the most jumpy clades in the analysis so far, but they tend 
to come up closer to psittaciforms and accipitrids than *any* of these three 
does to passeriforms or cathartids. 
Passeriforms usually end up somewhere with "picocoracines", and cathartids are 
surprisingly hard to get away from "Ciconiiformes" s.l. (the latter might be 
susceptible to addition of the remaining aequornithid lineages, which I have 
not tried).

But the falconid-psittacid relationship is easily changed. Just add a mesite, 
and it all falls apart.


My approach is to try and assemble 3 or more taxa from each major clade 
("orders" usually, but "families" for taxa sedis mutabilis). Then I run a 
complete analysis and check out what well-supported (based on the rest of total 
evidence) clades do clade. Next, I remove any taxa that seem to inhibit the 
others from clading, and see what the effect is. 
Usually, formerly-dispersed "known clades" pop into place. For example 
_Sagittarius_ returns to the base of the accipitrids instead of hanging around 
with nightjars, hoopoes or bustards (all examples taken from 12S sequence data, 
which is otherwise fairly conservative and very nice to handle).
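
The "did the known clades pop into place?" check can be sketched as a comparison of two rooted trees, one from the full run and one from the re-run without the suspect taxa. Everything here is an assumption for illustration: trees are written as nested tuples, the helper names are made up, and the two toy topologies are invented to mimic the _Sagittarius_ example, not taken from actual results.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def is_monophyletic(tree, clade):
    """True if some node of the rooted tree contains exactly the taxa in `clade`."""
    clade = set(clade)
    if leaves(tree) == clade:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, clade) for child in tree)

def recovered_clades(tree, known_clades):
    """For each known clade with 2+ sampled members, report whether it is recovered."""
    present = leaves(tree)
    report = {}
    for name, members in known_clades.items():
        sampled = set(members) & present
        if len(sampled) >= 2:
            report[name] = is_monophyletic(tree, sampled)
    return report

known = {"Accipitriformes": {"Sagittarius", "Pandion", "Elanus"}}

# Invented topologies: full run vs. re-run after dropping the hoopoe.
full_run = ((("Sagittarius", "Upupa"), "Caprimulgus"),
            (("Pandion", "Elanus"), "Falco"))
reduced_run = ("Caprimulgus",
               ((("Pandion", "Elanus"), "Sagittarius"), "Falco"))

print(recovered_clades(full_run, known))
print(recovered_clades(reduced_run, known))
```

Note that the second tree has to come from a fresh analysis of the reduced taxon set; pruning tips off the first tree would not move anything.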

I have so far looked into the bFibInt7, the mt control region, RAG-1, ND2, 
12S (mt) rRNA and cytochrome b. The approach seems to work.

In the end, it may turn out that the course molecular phylogeny has followed 
for the last 5 years is a dead end: in cases where the phylogeny is not readily 
resolved from a few loci, it may be a VERY bad idea to do whole-genome or 
indel/transposon analyses.

Rather, the way to go seems to be to focus more on *what* is analyzed (i.e. 
study the evolutionary pattern per-locus) than on simply increasing the amount 
of analyzed data blindly: "Never trust a transposon whose origin and mode of 
evolution is not known".

And while single taxa may be crucial to resolving some things, their 
inclusion may just as well kill any hope of resolving anything.

In short: 
Molecular phylogeny needs to focus more on the content of the data analyzed and 
the peculiarities of the taxa analyzed than on following the present "more = 
better" paradigm.
For avian molecular phylogeny, it seems as if nothing that cannot be resolved 
to satisfaction with 4 loci (1-3 mt, 1-3 nc, depending on how deep you go) can 
be resolved with *any* amount of data as long as the data itself are not better