Eufalconimorphae and homoplasy in mol-phyl
I have looked at a few molphyl issues over the last week. My taxon sets include
about 50-80 taxa (out of 100-160) every time I analyze them. I have not yet
included the basalmost Neornithes (charadriiforms, Mirandornithes, the
"seabird" part of the Aequornithes); perhaps I'll include them eventually.
I ran the analyses in PhyML (i.e. ML analysis), using a GTR+I(empirical)+10
rate categories model. Tree topologies were searched by both NNI and SPR,
keeping the better result. Support was estimated via PhyML's "aLRT-like"
algorithm (quicker than bootstrap).
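For anyone wanting to try a comparable run, here is a rough sketch of how the settings above might map onto a PhyML 3.x command line. The flag mapping is my reading of the PhyML options, not the exact command used here: in particular, reading "I(empirical)" as empirical base frequencies plus an estimated proportion of invariable sites, and picking the SH-like aLRT variant, are both assumptions. The file name is hypothetical. Check `phyml --help` for your version before relying on it.

```python
def phyml_command(alignment_path, n_rate_categories=10):
    """Build an argument list for a PhyML run resembling the settings above.

    All choices below are assumptions about how the described model maps
    onto PhyML 3.x options; they are not taken from the original run.
    """
    return [
        "phyml",
        "-i", alignment_path,           # PHYLIP-format alignment (hypothetical file)
        "-d", "nt",                     # nucleotide data
        "-m", "GTR",                    # GTR substitution model
        "-f", "e",                      # empirical base frequencies (assumed)
        "-v", "e",                      # estimate proportion of invariable sites (assumed)
        "-c", str(n_rate_categories),   # number of rate categories
        "-a", "e",                      # estimate the gamma shape parameter (assumed)
        "-s", "BEST",                   # keep the better of NNI and SPR searches
        "-b", "-4",                     # SH-like aLRT branch support (assumed variant)
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so this works without PhyML installed.
    print(" ".join(phyml_command("neoaves_12S.phy")))
```

The list form can be handed to `subprocess.run` directly once the flags have been checked against the installed version.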
I can preliminarily conclude 4 things:
"Higher neoavian" (i.e. non-Aequornithes) basal relationships are tricky in the
extreme. Adding or removing certain taxa will mess up any tree, because they
will (mostly) long-branch-attract to *within* otherwise trivially resolving
clades. Often the intruder ends up as the second-most-basal taxon (if you have
3 or more taxa in the clade in question), kicking out the basal member in the
process. It is easy, for example, to kick _Sagittarius_ or _Pandion_ away from
the accipitrids, even if you include _Elanus_ or _Gampsonyx_, to which at least
the Secretarybird otherwise reliably clades.
Particularly insidious are:
* "aberrant gruiforms" (mesites, sungrebes, Sunbittern, Kagu... bustards for
some reason seem tame by comparison)
* basal cypselomorphs (anything except Apodiformes)
* trogons, mousebirds, Upupiformes s.l. (hoopoes are worst, but hornbills are
* possibly pelicans and ibises and _Psophia_.
These taxa will routinely clade at places where they CANNOT be. Not because
*they* are there, but because their supposed sisters as far as anyone can tell
belong elsewhere (_Psophia_ is NOT sister to _Galbula+Ciconia_...).
They will routinely do so with support values in excess of 0.8.
It is highly desirable to represent every major lineage (ordinal- or even
familial-level, for living birds) with 4 taxa or more (up to 6 or 7) initially.
You want at least 3 usable taxa per lineage, and one of them might turn out to
have an aberrant sequence (huge autapomorphic indels etc).
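The sampling rule above is easy to check mechanically before starting a run. A minimal sketch, assuming you keep a simple taxon-to-lineage table; the function name and the lineage assignments in the example are illustrative only:

```python
from collections import Counter


def undersampled_lineages(taxon_to_lineage, minimum=3):
    """Return lineages represented by fewer than `minimum` taxa, with their counts."""
    counts = Counter(taxon_to_lineage.values())
    return {lineage: n for lineage, n in counts.items() if n < minimum}


# Illustrative taxon set; the lineage labels are placeholders for whatever
# "major lineage" scheme you are sampling against.
sample = {
    "Sagittarius": "Accipitriformes",
    "Pandion": "Accipitriformes",
    "Elanus": "Accipitriformes",
    "Gampsonyx": "Accipitriformes",
    "Psophia": "Gruiformes",
    "Monias": "Mesitornithiformes",
}

if __name__ == "__main__":
    print(undersampled_lineages(sample))
```

With the default minimum of 3, the two lineages represented by a single taxon are flagged; raising `minimum` to 4 gives the safety margin for one aberrant sequence.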
At this time, indels and transposable/retroelements are NOT SUITABLE FOR DEEP
PHYLOGENETIC ANALYSIS without detailed information on their origin and mode of
The amount of homoplasy and stereotyped insertion/deletion sites is staggering.
They DO NOT insert/excise at random. Not at all. Short (1-3 bp) indels do
indeed occur fairly randomly. Long ones definitely don't far more often than
Just take the beta fibrinogen intron 7 sequences from GenBank for a broad
taxonomic sample of Neoaves and align them. Or try to align them, then weep.
Better still: use the mt control region (sometimes listed under "D-loop"). You
need a strong stomach for this, though: its pseudogene has evolved
independently at least 2 times in Neoaves (possibly 5 times or more, though one
of the supposed "pseudogenes" looks surprisingly human...).
This is a snippet of the fibrinogen sequence:
It's still not fully aligned, but you can already see some interesting things:
From left to right, the major inserts appear to be:
* longish homoplasy: cypselomorphs + Sunbittern/Kagu (misaligned in latter) +
basal Neornithes. Mesites seem to have a different insert at the same site.
The former degrades quickly and the original sequence is not well discernible.
May be a retroposon; it is ubiquitous and resembles a snippet from an
enhancer-binding protein of HIV. The insert in _Monias_ is almost certainly
noneukaryotic in origin.
* medium-sized ?synapomorphy of Columbidae, but if so convergently lost again
in some. Otherwise convergent autapomorphy of some unrelated columbids.
Possibly this is just badly aligned though (but a good alignment is tough here).
* medium-sized autapomorphy of _Zenaida_, or possibly badly aligned.
* medium-sized synapomorphy of Columbidae. Probably it's not well aligned and
the "TG" corresponds to the start of the TGTTA sequence a few bases to the
right. But it doesn't matter, columbids stand out a lot here, the entire region
(this and the two preceding points) is incredibly hard to align parsimoniously.
* longish autapomorphy of _Tyto_ and _Chalcophaps_, again different sequences
inserting at exactly the same site.
The former or something derived from it seems to be also present (at the same
site?) in the same locus in single species of at least 3 other neoavian
lineages - it *may* be bacterial but it could be a transposon (similar
sequences appear in many unrelated eukaryotes and *could* be a homeobox
The latter is widespread (though not common) in birds and also occurs in _Danio
rerio_. In birds, it was apparently lost for good at some point beyond
Aequornithes, but even so the presence/absence agrees little with phylogeny.
So, in this stretch alone we have about as much insert code as code that is
indeed inherited as the analysis assumes it to be. But of the insert code, only
1 element in 5 is phylogenetically informative. The rest will just mess up
things with their pronounced "phylogenetic signal" that has nothing to do
whatsoever with common descent.
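To screen an alignment for this kind of thing (different sequences inserting at the same site), something like the following sketch can help. It assumes the alignment is already loaded as a dict of equal-length gapped strings, and it treats any run of columns where only a minority of taxa carry bases as a candidate insert. The function names, the 50% threshold and the toy sequences are all invented for illustration; this is not the procedure used for the observations above.

```python
def insert_regions(alignment, max_carrier_fraction=0.5):
    """Return (start, end) column runs where only a minority of sequences have bases."""
    names = list(alignment)
    length = len(alignment[names[0]])
    regions, start = [], None
    for col in range(length):
        carriers = sum(1 for n in names if alignment[n][col] != "-")
        is_insert = 0 < carriers <= max_carrier_fraction * len(names)
        if is_insert and start is None:
            start = col                      # a candidate insert begins here
        elif not is_insert and start is not None:
            regions.append((start, col))     # end is exclusive
            start = None
    if start is not None:
        regions.append((start, length))
    return regions


def inserts_by_content(alignment, region):
    """Group the taxa carrying an insert in `region` by the inserted sequence."""
    start, end = region
    groups = {}
    for name, seq in alignment.items():
        content = seq[start:end].replace("-", "")
        if content:
            groups.setdefault(content, []).append(name)
    return groups


# Toy alignment with made-up sequences: two taxa carry *different* inserts
# at the same site, which a presence/absence coding would score as shared.
toy = {
    "taxonA": "ACGT----TTGA",
    "taxonB": "ACGTGGCCTTGA",
    "taxonC": "ACGT----TTGA",
    "taxonD": "ACGTATAATTGA",
    "taxonE": "ACGT----TTGA",
}

if __name__ == "__main__":
    for region in insert_regions(toy):
        print(region, inserts_by_content(toy, region))
```

More than one group for a region means the "same" insert is really several different sequences at one site, so it should not be coded as a single character.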
Falcons are one of the most jumpy clades in the analysis so far, but they tend
to come up closer to psittaciforms and accipitrids than *any* of these three
does to passeriforms or cathartids.
Passeriforms usually end up somewhere with "picocoracines", and cathartids are
surprisingly hard to get away from "Ciconiiformes" s.l. (the latter might be
susceptible to addition of the remaining aequornithid lineages, which I have
The falconid-psittacid relationship is easily changed, though. Just add a
mesite, and it all falls apart.
My approach is to try and assemble 3 or more taxa from each major clade
("orders" usually, but "families" for taxa sedis mutabilis). Then I run a
complete analysis and check which of the clades that are well supported by the
rest of the total evidence actually come out as clades. Next, I remove any taxa
that seem to inhibit the others from clading, and see what the effect is.
Usually, formerly-dispersed "known clades" pop into place. For example
_Sagittarius_ returns to the base of the accipitrids instead of hanging around
with nightjars, hoopoes or bustards (all examples taken from 12S sequence data,
which is otherwise fairly conservative and very nice to handle).
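The "does this known clade come out as a clade?" check can be scripted when comparing runs with and without a suspect taxon. A minimal sketch, assuming each result tree is written as nested tuples and treated as rooted; the helper names and both toy topologies are invented for illustration and are not actual results:

```python
def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_clade(tree, group):
    """True if some subtree contains exactly the taxa in `group` (rooted tree)."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, group) for child in tree)


# Toy topologies (made up): one run including a suspect taxon, one without it.
with_suspect = (
    (("Sagittarius", "Upupa"), "Caprimulgus"),
    ("Pandion", ("Elanus", "Accipiter")),
)
without_suspect = (
    "Caprimulgus",
    ("Sagittarius", ("Pandion", ("Elanus", "Accipiter"))),
)
known_clade = {"Sagittarius", "Pandion", "Elanus", "Accipiter"}

if __name__ == "__main__":
    print("with suspect taxon:   ", is_clade(with_suspect, known_clade))
    print("without suspect taxon:", is_clade(without_suspect, known_clade))
```

Note that the second tree has to come from a fresh analysis without the suspect taxon; simply pruning the taxon from the first tree does not test the same thing.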
I have so far looked into the bFibInt7, the mt control region, RAG-1, ND2,
12S (mt) rRNA and cytochrome b. The approach seems to work.
In the end, it may turn out that the course molecular phylogeny has been
following for the last 5 years is a dead end: in cases where the phylogeny is
not readily
resolved from a few loci, it may be a VERY bad idea to do whole-genome or
Rather, the way to go seems to be focusing more on *what* is analyzed (i.e.
studying the evolutionary pattern per locus) than on simply increasing the
amount of analyzed data blindly: "Never trust a transposon whose origin and mode of
evolution is not known".
And while single taxa may be crucial to resolving some things, their
inclusion may just as well kill any hope of resolving anything.
Molecular phylogeny needs to focus more on the content of the data analyzed and
the peculiarities of the taxa analyzed, rather than following the present "more =
For avian molecular phylogeny, it seems as if nothing that cannot be resolved
to satisfaction with 4 loci (1-3 mt, 1-3 nc, depending on how deep you go) can
be resolved with *any* amount of data as long as the data itself are not better