
*To*: dinosaur@usc.edu
*Subject*: Re: Phylogenetics was Re: "Ratite" polyphyly and paleognathous dromornithids
*From*: David Černý <david.cerny1@gmail.com>
*Date*: Sun, 19 Aug 2012 23:42:10 +0200
*In-reply-to*: <502F5F26.6050604@gmx.at>
*References*: <CADHyUaRf+WRx6HCWrh8Uzhitg5di18mBzz9pGrKeSiAryk5t=Q@mail.gmail.com> <501D8089.8050203@gmx.at> <CADHyUaTV8G1SPsNzp7e5J=L4wgAgBXtEnAT3pdr+QGXcZyjzTQ@mail.gmail.com>
*Reply-to*: david.cerny1@gmail.com
*Sender*: owner-DINOSAUR@usc.edu

David Marjanovic <david.marjanovic@gmx.at> wrote:

> Sorry for the delay. I got busy in meatspace...

I love that term. :-)

> That's great! I have to read up on those.

I should have mentioned that the mixture model method has problems of its own -- Matsen & Steel (2007) showed that a mixture of trees with the same topology but different branch lengths can result in several different topologies fitting the data perfectly (= having exactly the same likelihood score). However, mixture models can be useful for detecting heterotachy in the first place (returning to the original topic, Smith et al. based their conclusion that their paleognath phylogeny isn't misled by heterotachy on the fact that MCMC sampled only one branch length for most branches on their tree), and there are other models designed to address heterotachy in the data. There are covarion models that do the same thing for rate variation across the tree that the gamma parameter does for rate variation across sites, allowing sites to switch between several rate categories as they evolve (Galtier 2001; Wang et al. 2007).

> As far as I know, ML uses the parsimony-uninformative characters to
> estimate the parameters of the model; that doesn't make it non-phylogenetic
> -- quite the opposite. It still uses synapomorphies to build the trees,
> right?

It does not. When a probabilistic analysis (ML or Bayesian) calculates the likelihood score of a tree, it sums the likelihood over all possible transformations of each character. It assigns every possible character state to every single internal node in the tree. Suppose you have the following four-taxon tree based (for simplicity) on a single binary character:

X--Y-- 0
|  `-- 0
`--Z-- 1
   `-- 1

(I really hope the spaces won't get merged.) The tree is rooted, so let's say there is an outgroup and its character state is zero.
Under parsimony, you can say that the grouping of two zeros (stemming from the node Y) is united by a symplesiomorphy -- it wouldn't survive on the final (consensus) tree because there are two other equally parsimonious trees that don't contain that grouping. Probabilistic methods consider this possibility, too, and calculate its likelihood using a model of evolution and branch lengths -- the relevant parameters are estimated from all characters in the data set, not just from the parsimony-uninformative ones, and are tweaked along with the topology during the analysis. However, ML and BI _also_ consider the seven remaining possibilities (there are 2 character states and 3 internal nodes, so there are 2^3 = 8 possible state assignments), and calculate their probabilities as well. In some of them, the zero-zero grouping is united by a homoplasy:

X--1-- 0
|  `-- 0
`--Z-- 1
   `-- 1

And finally, there is a character-state assignment where the zero-zero grouping is united by a synapomorphy and it's the one-one grouping that is held together by a symplesiomorphy:

1--0-- 0
|  `-- 0
`--1-- 1
   `-- 1

When you sum the probabilities of all possible state assignments, you get the likelihood of a character. Do it for every character in your data set, calculate the product of the resulting likelihoods, and you have the likelihood score of a tree.
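For the four-taxon example, the whole calculation is small enough to do by brute force. Here is a minimal sketch in Python under a two-state symmetric model (the binary Mk/CFN model) with a uniform root frequency; the branch lengths are arbitrary illustrative values of my choosing, not estimates from anything:

```python
import itertools
import math

def p(a, b, t):
    # Two-state symmetric transition probability; t is the branch length
    # in expected substitutions per site.
    same = 0.5 * (1.0 + math.exp(-2.0 * t))
    return same if a == b else 1.0 - same

# The four-taxon tree from above: root X over internal nodes Y and Z.
# Branch lengths are arbitrary illustrative values (an assumption).
bl = {"XY": 0.1, "XZ": 0.1, "Ya": 0.2, "Yb": 0.2, "Zc": 0.2, "Zd": 0.2}

def character_likelihood(leaves=(0, 0, 1, 1)):
    a, b, c, d = leaves
    total = 0.0
    # Sum over all 2^3 = 8 assignments of states to X, Y, and Z.
    for x, y, z in itertools.product((0, 1), repeat=3):
        term = 0.5                                    # uniform root frequency
        term *= p(x, y, bl["XY"]) * p(x, z, bl["XZ"])
        term *= p(y, a, bl["Ya"]) * p(y, b, bl["Yb"])  # the two "0" leaves
        term *= p(z, c, bl["Zc"]) * p(z, d, bl["Zd"])  # the two "1" leaves
        total += term
    return total

print(character_likelihood())   # likelihood of the single 0-0-1-1 character
```

In real software this sum is computed with Felsenstein's pruning algorithm rather than by enumerating all assignments, but the quantity is the same.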
Then you are left with two options. You can either use some kind of heuristic and try to find the tree with the highest likelihood by tweaking the topology, branch lengths, and values of substitution model parameters (maximum likelihood); or you can multiply the likelihood by the prior probabilities of the current parameter values -- including the topology -- to obtain the posterior, let MCMC sample the posterior distribution of tree topologies and other parameters, and, when the chain reaches convergence, summarize the sample as a majority-rule consensus tree with branch lengths averaged over all trees in the sample containing that branch (Bayesian inference).

Of course, this account is too simplistic*, but it should be evident that the distinction between synapomorphies and symplesiomorphies doesn't enter into the procedure at all. Parsimony is the only method that makes that distinction, because it simultaneously optimizes the topology and the character states at its internal nodes, whereas probabilistic methods integrate the ancestral states out. That's their advantage: while it's improbable that the most recent common ancestor of two "ones" was a "zero", it cannot be ruled out, and parsimony doesn't take that possibility into account.

However, you can still retrieve synapomorphies from a probabilistic analysis _a posteriori_ -- the simplest way is to constrain a parsimony program to find the ML/Bayesian tree and then check the resulting character-state optimization. That's how Lee & Worthy (2011) were able to say what characters support their likelihood tree -- the relationships between nodes on the tree and individual characters aren't as clear-cut with likelihood or Bayesian trees as they are with parsimony trees. In fact, one might argue this is the only reason we should care about synapomorphies at all: they provide the "explanatory connection" between a tree (a result of a statistical analysis) on the one hand and a phylogenetic history on the other (Morrison 2012).
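"Integrating the ancestral states out" also means you can recover them afterwards: fix the state at a node, sum over everything else, and normalize. A minimal sketch under the same two-state symmetric model and four-taxon tree (branch lengths again arbitrary illustrative values, not from any of the papers cited):

```python
import itertools
import math

def p(a, b, t):
    # Two-state symmetric (binary Mk/CFN) transition probability.
    same = 0.5 * (1.0 + math.exp(-2.0 * t))
    return same if a == b else 1.0 - same

# Root X over internal nodes Y (two "0" leaves) and Z (two "1" leaves);
# branch lengths are arbitrary illustrative values (an assumption).
bl = {"XY": 0.1, "XZ": 0.1, "Ya": 0.2, "Yb": 0.2, "Zc": 0.2, "Zd": 0.2}

def joint(x, y, z):
    # Probability of one full internal-state assignment plus the leaf data.
    return (0.5 * p(x, y, bl["XY"]) * p(x, z, bl["XZ"])
            * p(y, 0, bl["Ya"]) * p(y, 0, bl["Yb"])
            * p(z, 1, bl["Zc"]) * p(z, 1, bl["Zd"]))

def prob_Y_equals(state):
    # P(Y = state | data): condition on Y, sum X and Z out, normalize.
    num = sum(joint(x, state, z) for x in (0, 1) for z in (0, 1))
    den = sum(joint(x, y, z) for x, y, z in itertools.product((0, 1), repeat=3))
    return num / den

print(prob_Y_equals(0))   # high, but below 1: likely, not certain
```

The point of the paragraph above shows up directly in the output: the "improbable" ancestral state gets a small but nonzero posterior probability instead of being ruled out.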
However, this certainly doesn't mean you have to separate apomorphies from plesiomorphies in order to _infer_ a phylogeny, and methods that attempt to do such a thing (different variants of parsimony) are actually routinely outperformed by methods that do not. In fact, even ancestral state reconstruction might be better left to probabilistic methods, which can combine the uncertainty about the existence of any particular node with the uncertainty about the ancestral state at that node.

*There are good articles that give a much more complete picture of statistical phylogenetics. Lewis (1998) is excellent, and the likelihood chapter in the recent second edition of _Phylogenetics: Theory and Practice_ (Wiley & Lieberman 2011) also does a good job explaining the method, although the book has been criticized as being too parsimony-oriented (Morrison 2012). Paul O. Lewis's lecture slides cover pretty difficult concepts (such as MCMC) in a very intuitive way; they are freely available on the web.

> What makes NJ (and UPGMA and WPGMA...) non-phylogenetic is that they take
> all the differences between any two taxa, average them into a single number
> (the percentage of similarity), assemble these into a distance matrix, and
> then work on the distance matrix. There is no attempt in there to
> distinguish synapomorphies from symplesiomorphies. That's why these
> algorithms can be, and are, used for entirely non-phylogenetic problems like
> the similarities between faunas at different sites.

I strongly disagree. I can see how UPGMA or WPGMA can be used to cluster entities that don't have any phylogenetic history, but there is no way you can do it using, say, NJ with the HKY85 distance correction. You can't interpret the results as anything other than a phylogenetic hypothesis. It doesn't matter that NJ doesn't distinguish synapomorphies from plesiomorphies -- neither does ML or BI (see above), both of which demonstrably outperform the methods that do.
The important thing is that NJ attempts to infer evolutionary distances from observed distances, which would make no sense at all if the data hadn't evolved on a tree.

> That must be why people use ML and BI instead of phenetic methods these
> days.

I don't think that's the reason. The loss of information about individual characters is a big downside of distance-based methods (more so in morphology, though; there just aren't that many interesting things you can say about the evolution of individual nucleotides), and ML or BI certainly perform better than NJ, but they don't care about synapomorphies either. In fact, there are die-hard cladists who would call them "phenetic" as well, for this precise reason.

> Fair enough; that's why ML and BI were developed.

ML wasn't developed to relax the assumptions of parsimony; parsimony was developed as a fast approximation to ML.

> I'll have to read those; however, I don't see a reason to assume a priori
> that any two characters (that aren't correlated) would evolve at the same
> speed (= have the same set of branch-length parameters).

That's where the statistical approach to phylogenetics is useful. Let's suppose you are right. Then we might indeed want to give a separate set of branch lengths to each character in the data set, and it's mathematically guaranteed that by doing this, we arrive at parsimony: Tuffley & Steel (1997) proved that an ML analysis with a different (= separately parameterized) JC69 model for each character will give the same results as unweighted parsimony (unless, as discovered by a later study, you impose certain restrictions on the substitution process). This is sometimes referred to as "no common mechanism" (NCM). Is it a good idea? Some have argued that it is, because the NCM model is so general that it would apply to almost any data set (Farris 2008). And that's generally true of statistical models: the more parameters they have, the more realistic they are.
However, it has a huge drawback: in order to actually capture reality, the estimates of those parameters must be close to their real values, and "the power of a given amount of data to estimate several parameters accurately is generally low" (Steel 2005:309). If you have 2,000 sites from a single locus (... OK, it's a big locus), you might be able to estimate the 9 free parameters of one GTR+gamma model common to all of the sites (5 relative rate parameters, 3 frequency parameters, and the alpha parameter of the gamma distribution) with a reasonable level of precision. However, the NCM model described above needs 2,000 parameters estimated accurately from the same amount of data, which is madness. That's why the information criteria used in model choice (AIC, BIC) compare not only the realism of different models (likelihoods) but also their simplicity (number of parameters), and why they try to find a reasonable trade-off between the two. It can be proved that AIC will _never_ choose the NCM model (Holder et al. 2010).

Surprisingly, it's not that the gain in likelihood is compromised by the huge number of parameters: there is no gain in likelihood at all. Huelsenbeck et al. (2011) took six empirical data sets and used MCMC to calculate their marginal likelihoods given the fixed (MP) topology and a variety of models including JC69, HKY85, GTR+gamma, and their NCM versions. In every single case, the marginal likelihoods of the best common-mechanism models (usually HKY85+gamma or GTR+gamma) exceeded the likelihood of any submodel of the NCM model by many orders of magnitude. Complex submodels of the NCM often outperformed oversimplified CM models such as JC69, but _only_ when they allowed branch lengths to be shared across sites. So there is a very good reason to assume a common mechanism: it fits the data better than the NCM.
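The parameter-count side of that trade-off is easy to make concrete. A back-of-the-envelope sketch for a four-taxon tree, using AIC = 2k - 2 ln L; the log-likelihood values below are numbers I made up purely for illustration, and only the parameter counts follow the text above:

```python
def aic(log_likelihood, n_params):
    # Akaike information criterion: 2k - 2 ln L; lower is better.
    return 2 * n_params - 2 * log_likelihood

n_sites = 2000
n_branches = 2 * 4 - 3            # an unrooted binary tree on 4 taxa has 5 branches

# Common mechanism: 9 free GTR+gamma parameters plus one shared set of
# branch lengths.  No common mechanism: a branch-length set per site.
cm_k = 9 + n_branches
ncm_k = n_sites * n_branches

# Hypothetical log-likelihoods (assumptions): even granting the NCM model
# a small likelihood gain, ~10,000 extra parameters swamp it.
cm_aic = aic(-15000.0, cm_k)
ncm_aic = aic(-14990.0, ncm_k)
print(cm_aic < ncm_aic)           # prints True: the common-mechanism model wins
```

And per Huelsenbeck et al. (2011), the hypothetical likelihood "gain" granted to the NCM model here is actually generous: on real data sets there was no gain at all.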
>> On the other hand, you can use a model to correct the data for
>> unobserved changes, just as with neighbor-joining, and subject the
>> resulting data matrix to a parsimony analysis (= to a
>> maximum-likelihood analysis using one of the "parsimony models").
>> Steel et al. (1993) described how to do it, it's still called
>> parsimony, nobody does it. Apparently it's philosophically
>> objectionable.
>
> Huh. Maybe it was too computation-intensive for 1993, so people forgot about
> it?

Maybe, but the philosophical criticism seems to be more common (Siddall & Kluge 1997; Doyle & Davis 1998).

> Oh, so long-branch repulsion would be expected, right?

Actually, no, and that was the whole point of the paper: there is no such thing as long-branch repulsion. If there is little enough data, ML may fail to unite the long branches in the Farris zone -- it finds the correct tree in only a little over 30% of cases, because there are 3 possible unrooted trees for 4 taxa and not enough information on the short internal branch to choose between them. It does not fail because of "long-branch repulsion"; it fails because it isn't prone to long-branch attraction. On the other hand, parsimony can't distinguish the handful of synapomorphies on the short internal branch from homoplasies on the two long terminal branches, so it sticks the long branches together with ridiculously inflated support -- and thus finds the right tree more often than ML. However, unlike parsimony in the Felsenstein zone, ML isn't inconsistent in the Farris zone, and if you give it more data, it _will_ converge on the correct topology sooner or later.
"This behavior of parsimony in the extreme regions of the Felsenstein and inverse-Felsenstein zones is analogous to an oracle who responds to any question by responding “0.492.” If the question asked is, “What is the sum of 0.450 and 0.042?” or “What is 3 times 0.164?” the oracle will answer correctly, but presumably once interrogators realized that the answer was always the same regardless of the question, they would not be ready to give up their electronic calculators. There are times when “I don’t know” is a better answer than a confident guess that has a high probability of being incorrect." -- Swofford et al. 2001:535

> So its bias to long-branch attraction is strong enough to overcome
> long-branch repulsion. Good to know.

I think you misunderstand what long-branch repulsion was supposed to be (I don't say "what long-branch repulsion is", because the phenomenon simply doesn't exist). Siddall (1998) coined the term for an alleged bias of ML towards keeping long branches apart -- it was never supposed to be a problem for parsimony, or something that parsimony would have to overcome, but a property of maximum likelihood.

>> UPGMA grouped the short branches together (also correctly) because of
>> their symplesiomorphies.
>
> Were there enough of those left, or were they independent reversals?

They were genuine symplesiomorphies. UPGMA found the correct topology by grouping the short branches together, not the long ones, so there's no need to worry about homoplastic reversals.

>> When the model is misspecified, posterior probabilities
>> can be either inflated or too conservative.
>
> Then apparently the former happens a lot. If you look through publications,
> almost every node in a Bayesian tree has a PP of 0.99 or 1.00.

Well, not all high PPs are necessarily inflated!

> Estimating models clearly isn't a nontrivial problem. Unfortunately, I don't
> know how the successor to ModelTest does it, or what happened to
> MrModelTest.
I suppose that should read "is a nontrivial problem". That's true, although there is some hope that the problem could be avoided entirely with reversible-jump MCMC, which makes it possible to move among models with different numbers of parameters. This way the model of evolution itself becomes just another random variable, and the results of the analysis can be averaged across its multiple values.

> Thanks! I'll check them out from Monday onwards.

You're welcome, glad to be of any help. :-)

*Refs:*

Doyle JJ, Davis JI 1998 Homology in molecular phylogenetics: a parsimony perspective. 101–31 _in_ Soltis DE, Soltis PS, Doyle JJ, eds. _Molecular Systematics of Plants II: DNA Sequencing_. Boston: Kluwer Academic Publishers

Farris JS 2008 Parsimony and explanatory power. Cladistics 24: 825–47

Galtier N 2001 Maximum-likelihood phylogenetic analysis under a covarion-like model. Mol Biol Evol 18(5): 866–73

Holder M, Lewis PO, Swofford DL 2010 The Akaike information criterion will not choose the no common mechanism model. Syst Biol 59(4): 477–85

Huelsenbeck JP, Alfaro ME, Suchard MA 2011 Biologically-inspired phylogenetic models strongly outperform the no-common-mechanism model. Syst Biol 60(2): 225–32

Lee MSY, Worthy TH 2011 Likelihood reinstates _Archaeopteryx_ as a primitive bird. Biol Lett 8(2): 299–303

Lewis PO 1998 Maximum likelihood as an alternative to parsimony for inferring phylogeny using nucleotide sequence data. 132–63 _in_ Soltis DE, Soltis PS, Doyle JJ, eds. _Molecular Systematics of Plants II: DNA Sequencing_. Boston: Kluwer Academic Publishers http://www.botany.wisc.edu/courses/botany_563/563_readings/0226_lewis_ml.pdf

Matsen FA, Steel M 2007 Phylogenetic mixtures on a single tree can mimic a tree of another topology. Syst Biol 56(5): 767–75

Morrison DA 2012 [Review of] Phylogenetics: The Theory and Practice of Phylogenetic Systematics, 2nd edition. Syst Biol doi:10.1093/sysbio/sys065

Siddall ME 1998 Success of parsimony in the four-taxon case: long-branch repulsion by likelihood in the Farris zone. Cladistics 14: 209–20

Siddall ME, Kluge AG 1997 Probabilism and phylogenetic inference. Cladistics 13: 313–36

Steel M 2005 Should phylogenetic models be trying to ‘fit an elephant’? Trends Genet 21(6): 307–9

Swofford DL, Waddell PJ, Huelsenbeck JP, Foster PG, Lewis PO, Rogers JS 2001 Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Syst Biol 50(4): 525–39

Tuffley C, Steel M 1997 Links between maximum likelihood and maximum parsimony under a simple model of site substitution. Bull Math Biol 59: 581–607

Wang HC, Spencer M, Susko E, Roger AJ 2007 Testing for covarion-like evolution in protein sequences. Mol Biol Evol 24(1): 294–305

Wiley EO, Lieberman BS 2011 _Phylogenetics: The Theory and Practice of Phylogenetic Systematics, 2nd edition_. Hoboken: Wiley-Blackwell

-- 
David Černý
