Re: Eufalconimorphae and homoplasy in mol-phyl



Hi,

Again, I have a few comments - sorry about that. :-)

> "Higher neoavian" (i.e. non-Aequornithes) basal relationships are
> tricky to the extreme.
I agree. I wonder if this will ever be resolvable without having whole-genome
comparisons...

> At this time, indels and transposable/retroelements are NOT SUITABLE
> FOR DEEP PHYLOGENETIC ANALYSIS without detailed information on their
> origin and mode of evolution.
> Period.
;-) Please don't mix up indels, DNA transposons and retroposons - these are
three different things with different "origins and modes of evolution"!
1. indels: random insertion or deletion of one or more nucleotides during
replication. In other, rarer cases, insertions arise via non-homologous end
joining (e.g. nuclear copies of mitochondrial DNA fragments, aka numts). Used
in phylogenomics, but have to be treated with caution (e.g., Fain & Houde
2004).
2. DNA transposons: "cut and paste" mechanism. Their insertion sites are hard
to study just by looking at a sequenced genome, because their transposase
frequently cuts them out (leading to homoplasy) and puts them somewhere else.
NOT used in phylogenomics, guess why. ;-)
3. retroposons: "copy and paste" mechanism, creating a target site duplication
(TSD) flanking the retroposed sequence. Way more complex characters than
indels (indels do NOT have a straightforward character polarization, TSD,
truncation, orientation etc.) and usually much larger. Furthermore, they stay
in the place where they were inserted, in contrast to DNA transposons. Used in
phylogenomics and yielding congruent results (largely congruent with large
sequence analyses and other retroposon studies) - just have a look at the
wealth of phylogenomic studies in mammals, for instance. After dozens of
studies (most of them in mammals), phylogenetic incongruences among
retroposons are restricted to rapid radiations where incomplete lineage
sorting was likely to have occurred (e.g., East African cichlids, basal
placental mammalian relationships, basal neoavian relationships).

> Just take the beta fibrinogen intron 7 sequences from GenBank for a
> broad taxonomic sample of Neoaves and align them. Or try to align
> them, then weep.
The distribution of indels is indeed weird and indicates homoplasy in at least
some of them (as was already noted by Fain & Houde 2004, who studied them in
detail).
Looking at the retroposons (in an alignment created from the sequences of
Hackett et al. 2008), there are the following in beta-fibrinogen intron 7:
- a shared insertion of a CR1-H-related retroposon in the sampled
representatives of Columbiformes (this insertion site is empty in the other
sampled birds)
- an autapomorphic insertion of a CR1-Y4-related retroposon in _Pedionomus_
(this insertion site is empty in the other sampled birds)
- a shared insertion of a CR1-C4-related retroposon in the sampled
representatives of Psittaciformes (this insertion site is empty in the other
sampled birds)
So I see 2 phylogenetically informative ones, each congruent with other types
of molecular data. Nice.

> In the end, it may turn out that the course molecular phylogeny has
> been going for the last 5 years is a dead end: in cases where the
> phylogeny is not readily resolved from a few loci, it may be a VERY
> bad idea to do whole-genome or indel/transposon analyses.
How do you explain the fact that a first consensus within neoavian phylogeny
(i.e., "landbirds", Eufalconimorphae + seriemas, Eufalconimorphae,
Psittacopasserae) has been established only since the advent of large
phylogenomic studies (= more than four independently evolving loci!) in birds
(Ericson et al. 2006, Hackett et al. 2008, Suh et al. 2011 - sorry about the
self-citation here...)? Have a look at mammalian phylogenomics, "deep
metazoan" phylogenomics, etc. - similar thing there.
With a single gene or just a few genes, sequence analyses appear not to have
enough signal relative to all the "noise" that is inherent in the sequences.
In the neoavian case, this ratio can be expected to be extreme, as the
branches within the neoavian radiation are extremely short compared to the
looooong branches that lead from the radiation to the living bird species
(e.g., see Hackett et al.'s figure 2).
Regarding retroposons - having only one orthologous retroposon insertion
supporting one branch is not enough to exclude that this is an artefact due to
rare, but possible homoplasy. Only if you have found several CONGRUENT
retroposon insertions and NONE that refute them (e.g., in Suh et al. 2011,
sorry again: 3 for Psittacopasserae, 7 for Eufalconimorphae, but NONE of the
remaining 196 loci that we checked refute them!) can you be quite positive
that this is not an artefact, because the result is supported by the number
of independent retroposon insertion loci and by the absence of any refuting
signal.
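
As a rough illustration of "congruent markers and none that refute them",
the following Python sketch tallies support and conflict for one candidate
clade. It assumes polarized characters (empty site = ancestral), so two
markers conflict only if their sets of carriers overlap without one being
nested in the other. The function names and the toy taxa are my own
placeholders, not the data or the procedure of Suh et al. 2011:

```python
def conflicts(carriers_a, carriers_b):
    """True if two insertion patterns cannot both fit one tree without homoplasy.

    Assumes absence is ancestral: compatible carrier sets are nested or disjoint.
    """
    a, b = frozenset(carriers_a), frozenset(carriers_b)
    return bool(a & b) and not (a <= b or b <= a)

def tally(clade, markers):
    """Count markers that define exactly this clade and markers that refute it."""
    clade = frozenset(clade)
    support = sum(1 for m in markers if frozenset(m) == clade)
    refute = sum(1 for m in markers if conflicts(m, clade))
    return support, refute

# Toy data: each marker is the set of sampled taxa carrying the insertion.
clade = {"parrot", "passerine"}
markers = [
    {"parrot", "passerine"},            # supports the clade
    {"parrot", "passerine"},            # supports the clade
    {"parrot", "passerine"},            # supports the clade
    {"parrot", "passerine", "falcon"},  # more inclusive, still compatible
    {"falcon"},                         # autapomorphic, compatible
]
print(tally(clade, markers))

# A single marker shared by passerine and falcon only would refute the clade.
print(tally(clade, markers + [{"passerine", "falcon"}]))
```

The point of the tally is that the second number matters as much as the
first: several supporting loci plus zero refuting ones is what makes a
homoplasy artefact unlikely.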

> Rather, the way to go seems to be focusing more on *what* is analyzed
> (i.e. study the evolutionary pattern per-locus) than to simply
> increasing the amount of analyzed data blindly
Well, one has to study the evolutionary pattern per retroposon insertion locus
in order to build a tree from the retroposon presence/absence patterns. At the
same time, it's the amount of data (= independent loci) that provides the
robustness of the results, not a single locus itself.

> ... "Never trust a
> transposon whose origin and mode of evolution is not known".
Case study: All the retroposon markers for Psittacopasserae and
Eufalconimorphae are insertions of one of two closely related retroposons
called "TguLTR5a" and "TguLTR5d". These belong to the group of
hitchcock-related LTR retroposons and all feature a characteristic 5-basepair
target site duplication (each locus has a unique, duplicated 5-bp sequence, so
there is no insertion site preference for a specific target sequence).
Concerning their origin, LTR retroposons are derived from endogenous or
exogenous retroviruses. As these two retroposon subtypes became extinct
somewhere on the lineage leading to the ancestor of Passeriformes, we cannot
study these guys "in vivo", but as they left thousands of copies ("molecular
fossils") in neoavian bird genomes, there's enough data to reconstruct the
above-mentioned characteristics.
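
For illustration, a target site duplication can be checked with a few lines
of Python: the k bases directly upstream of the insert should be repeated
directly downstream of it. The sequence, the coordinates and the has_tsd()
helper below are made up for this sketch and are not taken from any real
locus:

```python
def has_tsd(seq, start, end, k=5):
    """Check for a k-bp direct repeat flanking the insert seq[start:end]."""
    if start < k or end + k > len(seq):
        return False  # not enough flanking sequence to compare
    return seq[start - k:start].upper() == seq[end:end + k].upper()

# Made-up locus: left flank ends in the target site, which is repeated
# immediately after the inserted element.
left_flank = "GGATCCATGAC"   # last 5 bp ("ATGAC") = target site
insert = "TGTTACCCAACA"      # stands in for the retroposon
right_flank = "ATGACTTGCA"   # starts with the duplicated target site

locus = left_flank + insert + right_flank
start = len(left_flank)
end = start + len(insert)

print(has_tsd(locus, start, end))      # boundaries placed correctly
print(has_tsd(locus, start, end - 1))  # boundary shifted by one base
```

Comparing the duplicated 5-mers across many loci (e.g. by collecting them in
a set) is then a simple way to see whether the same target sequence keeps
turning up or not.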

All the best,
Alex

References:
- Ericson, P. G. P. et al. Diversification of Neoaves: integration of
molecular sequence data and fossils. Biol. Lett. 2, 543–547 (2006).
- Fain, M. G. & Houde, P. Parallel radiations in the primary clades of birds.
Evolution 58, 2558–2573 (2004).
- Hackett, S. J. et al. A phylogenomic study of birds reveals their
evolutionary history. Science 320, 1763–1768 (2008).
- Suh, A. et al. Mesozoic retroposons reveal parrots as the closest living
relatives of passerine birds. Nat. Commun. 2, 443 (2011).





evelyn sobielski wrote on 2011-09-10:
> Hi,

> I have looked at a few molphyl issues in the last week. My taxon sets
> include about 50-80 taxa (out of 100-160) every time I analyze them.
> I have not yet included the basalmost Neornithes (charadriiforms,
> Mirandornithes, the "seabird" part of the Aequornithes), perhaps I'll
> include them eventually.

> I ran the analyses on PhyML (i.e. ML analysis), using a
> GTR+I(empirical)+10 rate categories model. Trees evaluated by NNI and
> SPR (best of both models). Support via PhyML's "aLRT-like" algorithm
> (quicker than bootstrap).

> I can preliminarily conclude 4 things:

> 1.
> "Higher neoavian" (i.e. non-Aequornithes) basal relationships are
> tricky to the extreme. Adding or removing certain taxa will mess up
> any tree, because they will (mostly) long-branch-attract to *within*
> otherwise trivially resolving clades. Often, this will occur as
> secondmost-basal taxon (if you have 3 or more taxa in the clade in
> question), kicking out the basal member in the process. It is easy
> for example to kick _Sagittarius_ or _Pandion_ away from the
> accipitrids. Even if you include _Elanus_ or _Gampsonyx_, to which at
> least the Secretarybird otherwise reliably clades.

> Particularly insidious are:
> * "aberrant gruiforms" (mesites, sungrebes, Sunbittern, Kagu...
>   bustards for some reason seem tame by comparison)
> * basal cypselomorphs (anything except Apodiformes)
> * trogons, mousebirds, Upupiformes s.l. (hoopoes are worst, but
>   hornbills are little better).
> * possibly pelicans and ibises and _Psophia_.

> These taxa will routinely clade at places where they CANNOT be. Not
> because *they* are there, but because their supposed sisters as far
> as anyone can tell belong elsewhere (_Psophia_ is NOT sister to
> _Galbula+Ciconia_...).

> They will routinely do so with support values in excess of 0.8.

> 2.
> It is highly desirable to represent every major lineage (ordinal- or
> even familial-level, for living birds) with 4 taxa or more (up to 6
> or 7) initially. You want 3 taxa at least, and one might have an
> aberrant sequence (huge autapomorphic indels etc).

> 3.
> At this time, indels and transposable/retroelements are NOT SUITABLE
> FOR DEEP PHYLOGENETIC ANALYSIS without detailed information on their
> origin and mode of evolution.

> Period.

> The amount of homoplasy and stereotyped insertion/deletion sites is
> staggering. They DO NOT insert/excise at random. Not at all. Short
> (1-3 bp) indels do indeed occur fairly randomly. Long ones, far more
> often than not, definitely don't.

> Just take the beta fibrinogen intron 7 sequences from GenBank for a
> broad taxonomic sample of Neoaves and align them. Or try to align
> them, then weep.

> Better still: use the mt control region (sometimes listed under
> "D-loop"). You need a strong stomach for this though, its pseudogene
> has evolved independently at least 2 times in Neoaves (possibly 5
> times or more, though one of the supposed "pseudogenes" looks
> surprisingly human...).

> This is a snippet of the fibrinogen sequence:
> http://img339.imageshack.us/img339/3487/bfi7.png

> It's still not fully aligned, but you can already see some
> interesting things:

> From left to right, the major inserts appear to be:
> * longish homoplasy: cypselomorphs + Sunbittern/Kagu (misaligned in
>   latter) + basal Neornithes. Mesites seem to have a different insert
>   at the same site.
> The former degrades quickly and the original sequence is not well
> discernible. May be a retroposon, it is ubiquitous and resembles a
> snippet from an enhancer-binding protein of HIV. The insert in
> _Monias_ is almost certainly noneukaryotic in origin.
> * medium-sized ?synapomorphy of Columbidae, but if so convergently
>   lost again in some. Otherwise convergent autapomorphy of some
>   unrelated columbids. Possibly this is just badly aligned though (but
>   a good alignment is tough here).
> * medium-sized autapomorphy of _Zenaida_, or possibly badly aligned.
> * medium-sized synapomorphy of Columbidae. Probably it's not well
>   aligned and the "TG" corresponds to the start of the TGTTA sequence a
>   few bases to the right. But it doesn't matter, columbids stand out a
>   lot here, the entire region (this and the two preceding points) is
>   incredibly hard to align parsimoniously.
> * longish autapomorphy of _Tyto_ and _Chalcophaps_, again different
>   sequences inserting at exactly the same site.
> The former or something derived from it seems to be also present (at
> the same site?) in the same locus in single species of at least 3
> other neoavian lineages - it *may* be bacterial but it could be a
> transposon (similar sequences appear in many unrelated eukaryotes and
> *could* be a homeobox pseudogene).
> The latter is widespread (though not common) in birds and also occurs
> in _Danio rerio_. In birds, it was apparently lost for good at some
> point beyond Aequornithes, but even so the presence/absence agrees
> little with phylogeny.

> So, in this stretch alone we have about as much insert code as code
> that is indeed inherited as the analysis assumes it to be. But of the
> insert code, only 1 element in 5 is phylogenetically informative. The
> rest will just mess up things with their pronounced "phylogenetic
> signal" that has nothing to do whatsoever with common descent.

> 4.
> Falcons are one of the most jumpy clades in the analysis so far, but
> they tend to come up closer to psittaciforms and accipitrids than
> *any* of these three does to passeriforms or cathartids.
> Passeriforms usually end up somewhere with "picocoracines", and
> cathartids are surprisingly hard to get away from "Ciconiiformes"
> s.l. (the latter might be susceptible to addition of the remaining
> aequornithid lineages, which I have not tried).

> The falconid-psittacid relationship is easily changed, though.
> Just add a mesite, and it all falls apart.

> ------

> My approach is to try and assemble 3 or more taxa from each major
> clade ("orders" usually, but "families" for taxa sedis mutabilis).
> Then I run a complete analysis and check out what well-supported
> (based on the rest of total evidence) clades do clade. Next, I remove
> any taxa that seem to inhibit the others from clading, and see what
> the effect is.
> Usually, formerly-dispersed "known clades" pop into place. For
> example _Sagittarius_ returns to the base of the accipitrids instead
> of hanging around with nightjars, hoopoes or bustards (all examples
> taken from 12S sequence data, which is otherwise fairly conservative
> and very nice to handle).

> I have thus far looked into the bFibInt7, the mt control region,
> RAG-1, ND2, 12S (mt) rRNA and cytochrome b. The approach seems to
> work.

> In the end, it may turn out that the course molecular phylogeny has
> been going for the last 5 years is a dead end: in cases where the
> phylogeny is not readily resolved from a few loci, it may be a VERY
> bad idea to do whole-genome or indel/transposon analyses.

> Rather, the way to go seems to be focusing more on *what* is analyzed
> (i.e. study the evolutionary pattern per-locus) than to simply
> increasing the amount of analyzed data blindly: "Never trust a
> transposon whose origin and mode of evolution is not known".

> And while singular taxa may be crucial to resolving some things,
> their inclusion may just as well kill any hope of resolving anything.


> In short:
> Molecular phylogeny needs to focus more on the content of the data
> analyzed and the peculiarities of the taxa analyzed, than to follow
> the present "more = better" paradigm.
> For avian molecular phylogeny, it seems as if nothing that cannot be
> resolved to satisfaction with 4 loci (1-3 mt, 1-3 nc, depending on
> how deep you go) can be resolved with *any* amount of data as long as
> the data itself are not better understood.


> Regards,


> Eike