
*To*: dinosaur@usc.edu
*Subject*: Re: "Ratite" polyphyly and paleognathous dromornithids
*From*: David Černý <david.cerny1@gmail.com>
*Date*: Wed, 8 Aug 2012 22:00:26 +0200
*In-reply-to*: <5022387C.2060400@gmx.at>
*References*: <CADHyUaRf+WRx6HCWrh8Uzhitg5di18mBzz9pGrKeSiAryk5t=Q@mail.gmail.com> <501D8089.8050203@gmx.at> <CADHyUaTV8G1SPsNzp7e5J=L4wgAgBXtEnAT3pdr+QGXcZyjzTQ@mail.gmail.com>
*Reply-to*: david.cerny1@gmail.com
*Sender*: owner-DINOSAUR@usc.edu

David Marjanović <david.marjanovic@gmx.at> wrote:

> When rates per character are unequal, however -- and they usually are --,
> model-based methods get into trouble when the model doesn't contain enough
> rate categories. Parsimony does not assume any correlation between the rates
> of any two characters and therefore performs better in those situations.

Is this Kolaczkowski & Thornton (2004) again? They showed that if the sequence evolution follows the Jukes-Cantor model and the proportion of heterotachous sites is between 32% and 68%, MP outperforms common-mechanism maximum likelihood. That's an interesting finding, but you seem to overestimate its importance. Using K2P+gamma instead of JC69 for the simulation is enough to make the difference almost completely disappear (Spencer et al. 2005).

And, of course, that was in 2004. Since then, mixture models have been developed in a Bayesian framework to accommodate heterotachy by summing the likelihood for each site over multiple sets of branch lengths (Zhou et al. 2007; Kolaczkowski & Thornton 2008; Pagel & Meade 2008). That's far more flexible than a discrete gamma model with rate categories (where the ratio of each branch length to the others is constant across the categories), better than a partitioned model (because you don't have to divide characters into partitions a priori -- you don't even have to specify the number of partitions prior to the analysis), and more... well, _parsimonious_ than parsimony, because it's still far from parameterizing every branch-length/character combination separately. It still might lead to overparameterization, but reversible jump Markov chain Monte Carlo can take care of that (Pagel & Meade 2008).

> NJ, even ME, is still phenetic. They work on similarity that has been
> corrected by a model, but it's still similarity, it's still shared character
> states (whether observed directly or recalculated from the observations by
> the use of a model).
> Parsimony works exclusively on shared _derived_
> character states, not shared character states in total. That's what makes it
> phylogenetic.

Nice, but if that doesn't guarantee it finds the right phylogeny more often than a non-phylogenetic method -- and by this point, every method except parsimony is "non-phylogenetic", as "work[ing] exclusively on shared derived character states" means dependence on a particular character-state assignment to the internal nodes of a fixed topology, which is a unique trait of parsimony -- then what's being phylogenetic good for? Is it a good idea to claim that only the possession of this property makes a method phylogenetic, if it actually isn't very helpful in inferring phylogenies? If methods working on all character states perform better than methods working exclusively on derived states, what justification is there left for using only the latter?

> UPGMA can only be right for the wrong reasons: it can only give you a tree
> that is congruent with a phylogenetic tree if there's little enough
> homoplasy in the data. When this condition is met, the phenetic tree happens
> to be identical to the phylogenetic tree. But you can't assume _a priori_
> that there's little enough homoplasy in your dataset!

I don't see how that's different from parsimony. Parsimony, too, can only give you a tree that is congruent with a phylogenetic tree if there's little enough variation in rates of evolution among the taxa in your data set. When this is true, it works just fine, but you can't assume it to be true a priori. I wouldn't call that "being right for the wrong reasons", I would call it "being right because the assumptions of the method aren't violated by the data". Your description applies quite well to some related situations, though, such as the behavior of both parsimony and UPGMA in the Farris zone (see below).

> Adding a model can compensate for this assumption to varying extents.
> When you do that with parsimony, it isn't called parsimony anymore...

You cannot add a model to parsimony, because parsimony itself is a model. Or rather it's a nonparametric shortcut to several different models (Farris 1973; Goldman 1990; Tuffley & Steel 1997), which all exhibit some rather strange properties. Their number of parameters grows as fast as new data are added to the analysis -- F73 and G90 achieve it by treating ancestral character states as nuisance parameters, TS97 by giving a different set of branch-length parameters to every single character (Huelsenbeck et al. 2008). This makes them statistically inconsistent and, by the way, extremely non-parsimonious.

On the other hand, you can use a model to correct the data for unobserved changes, just as with neighbor-joining, and subject the resulting data matrix to a parsimony analysis (= to a maximum-likelihood analysis using one of the "parsimony models"). Steel et al. (1993) described how to do it; it's still called parsimony, but nobody does it. Apparently it's philosophically objectionable.

> Sure: in the simplest cases, in those where there's little enough homoplasy
> in the data, all methods (phenetic or phylogenetic) will give the same tree

No, that's not what I've been talking about. The case explored by Swofford et al. involved an extreme amount of homoplasy -- but between two adjacent branches that evolved much faster than the remaining two (the "Farris zone"). Parsimony grouped the long branches together (correctly) because of their homoplasies, UPGMA grouped the short branches together (also correctly) because of their symplesiomorphies. Both methods were able to find the correct tree only because their bias worked in their favor.

> Eh, that depends. Naturally I forgot the reference *sigh*, but I remember
> reading that BI is biased toward finding too symmetric trees.
> If the true/simulated tree has a Hennig comb at its base, BI commonly fails to find
> it and puts the OTUs of that comb into one or two small clades.

Sounds interesting, although I couldn't find the reference either. However, even if true, it doesn't seem unsolvable; it should be possible to counter it by biasing the MCMC proposal mechanism in the right direction.

> Also, Bayesian posterior probabilities are inflated for unknown reasons.
> (Bootstrap values are too low for likewise unknown reasons.)

That's only true for some cases. If the model of evolution is correct, moderately overparameterized, or slightly oversimplified, the posterior probability of a clade corresponds extremely closely to the probability that the clade is correct given the data (Ronquist & Deans 2010). When the model is misspecified, posterior probabilities can be either inflated or too conservative.

*Refs:*

Farris JS 1973 A probability model for inferring evolutionary trees. Syst Zool 22: 250-6
Goldman N 1990 Maximum likelihood inference of phylogenetic trees with special reference to a Poisson process model of DNA substitution and to parsimony analyses. Syst Zool 39: 345-61
Huelsenbeck JP, Ané C, Larget B, Ronquist F 2008 A Bayesian perspective on a non-parsimonious parsimony model. Syst Biol 57(3): 406-19
Kolaczkowski B, Thornton JW 2004 Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431(7011): 980-4
Kolaczkowski B, Thornton JW 2008 A mixed branch length model of heterotachy improves phylogenetic accuracy. Mol Biol Evol 25(6): 1054-66
Pagel M, Meade A 2008 Modelling heterotachy in phylogenetic inference by reversible-jump Markov chain Monte Carlo. Phil Trans R Soc B 363(1512): 3955-64
Ronquist F, Deans AR 2010 Bayesian phylogenetics and its influence on insect systematics. Annu Rev Entomol 55: 189-206
Spencer M, Susko E, Roger AJ 2005 Likelihood, parsimony, and heterogeneous evolution.
Mol Biol Evol 22(5): 1161-4
Steel MA, Hendy MD, Penny D 1993 Parsimony can be consistent! Syst Biol 42(4): 581-7
Tuffley C, Steel MA 1997 Links between maximum likelihood and maximum parsimony under a simple model of site substitution. Bull Math Biol 59: 581-607
Zhou Y, Rodrigue N, Lartillot N, Philippe H 2007 Evaluation of models handling heterotachy in phylogenetic inference. BMC Evol Biol 7: 206

--
David Černý
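
The contrast drawn above between rate categories and a branch-length mixture can be made concrete with a small Python sketch. It is only an illustration under loud assumptions: a Jukes-Cantor model, a fixed unrooted four-taxon tree ((A,B),(C,D)), and made-up rates, weights and branch lengths. The function names are hypothetical and are not taken from any of the cited papers or from any phylogenetics package. In the rate-category version every category rescales one shared set of branch lengths, so the ratios between branches never change; in the mixture version each component has its own branch lengths and the site likelihood is a weighted sum over them.

```python
import itertools
import math

STATES = "ACGT"

def jc_prob(a, b, t):
    """Jukes-Cantor probability of ending in state b after a branch of
    length t (expected substitutions per site), starting from state a."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def site_likelihood(pattern, bl):
    """Likelihood of one site pattern (tip states of A, B, C, D) on the
    unrooted tree ((A,B),(C,D)); bl = (tA, tB, tC, tD, t_internal)."""
    a, b, c, d = pattern
    tA, tB, tC, tD, tI = bl
    total = 0.0
    for x in STATES:        # state at the node joining A and B
        for y in STATES:    # state at the node joining C and D
            total += (0.25 * jc_prob(x, a, tA) * jc_prob(x, b, tB)
                      * jc_prob(x, y, tI)
                      * jc_prob(y, c, tC) * jc_prob(y, d, tD))
    return total

def rate_category_likelihood(pattern, bl, rates):
    """Equal-weight rate categories: each rate multiplies ALL branch
    lengths, so branch-length ratios are identical in every category."""
    scaled = (site_likelihood(pattern, [r * t for t in bl]) for r in rates)
    return sum(scaled) / len(rates)

def mixture_likelihood(pattern, bl_sets, weights):
    """Branch-length mixture: each component has its own branch lengths,
    and the site likelihood is the weighted sum over components."""
    return sum(w * site_likelihood(pattern, bl)
               for w, bl in zip(weights, bl_sets))

# Illustrative numbers only (assumptions, not estimates from real data).
base = (0.05, 0.05, 0.05, 0.05, 0.02)
rates = (0.1, 0.5, 1.0, 2.4)                  # mean rate is 1.0
bl_sets = [base, (0.8, 0.05, 0.8, 0.05, 0.02)]  # second set: A and C long
weights = (0.6, 0.4)

pattern = "ACAC"  # A and C share a state that B and D lack
print("rate categories:", rate_category_likelihood(pattern, base, rates))
print("branch mixture :", mixture_likelihood(pattern, bl_sets, weights))

# Sanity check: likelihoods of all 4^4 site patterns sum to one.
all_patterns = ["".join(p) for p in itertools.product(STATES, repeat=4)]
print(sum(mixture_likelihood(p, bl_sets, weights) for p in all_patterns))
```

No rescaling of `base` by a single rate can make A and C long while keeping B and D short, which is exactly what the second mixture component does.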

