I'm not an expert on the algorithms used in phylogenetics, but I believe what Marjanović meant was that a morphological dataset like this one, which tries to maximize taxonomic coverage, will include a lot of highly incomplete taxa. Dromomeron gigas is just a partial femur, Ignotosaurus an incomplete ilium, Nyasasaurus a few vertebrae and a partial humerus, etc. It's common to have only 10% of the characters coded, or less. In these cases, it's very easy for the bootstrapping procedure to delete the few characters placing such taxa in their clades, or to multiply the few characters that would otherwise be local autapomorphies and that disagree with close relatives. And since the bootstrap percentage tells you whether the entire complement of included taxa formed a clade in the bootstrap replicates, it's not useful if you have a lot of fragmentary taxa jumping around a basically stable backbone phylogeny. Ornithoscelida might have been recovered in every tree, but Nyasasaurus was in it 44% of the time, for example. Because of this, Ornithoscelida gets a 66% bootstrap score, despite the fact that every bootstrapped tree had theropods plus ornithischians forming a clade outside sauropodomorphs. Am I completely off base?
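To put rough numbers on the "deleted characters" part of that argument, here is a minimal sketch in Python. It is a toy model, not what TNT or any real bootstrap analysis does: it assumes the backbone is recovered in every replicate, and that a fragmentary taxon stays in its clade only if at least one of its few supporting characters survives resampling. The function names, the 400-character matrix size, and the per-taxon character counts are all hypothetical and are not taken from Baron et al.'s dataset. The toy also ignores the second effect mentioned above (conflicting autapomorphies being multiplied).

```python
import random


def inclusion_probability(n_chars, n_supporting):
    """Analytic chance that a bootstrap replicate of n_chars draws
    (with replacement) contains at least one of the n_supporting characters."""
    return 1.0 - (1.0 - n_supporting / n_chars) ** n_chars


def simulate_clade_support(n_chars, supporting_per_taxon, n_reps=2000, seed=0):
    """Fraction of replicates in which EVERY fragmentary taxon keeps at
    least one supporting character, i.e. the full clade is 'recovered'.

    Toy assumption: the backbone itself is never in doubt, so the clade
    fails only when some fragmentary taxon loses all of its support.
    """
    rng = random.Random(seed)
    # Give each taxon its own disjoint block of character indices.
    blocks, start = [], 0
    for k in supporting_per_taxon:
        blocks.append(set(range(start, start + k)))
        start += k
    if start > n_chars:
        raise ValueError("more supporting characters than characters in the matrix")

    hits = 0
    for _ in range(n_reps):
        # One bootstrap replicate: n_chars draws with replacement.
        drawn = {rng.randrange(n_chars) for _ in range(n_chars)}
        if all(block & drawn for block in blocks):
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    n = 400  # hypothetical matrix size
    print("one taxon, 1 supporting character:", simulate_clade_support(n, [1]))
    print("one taxon, 5 supporting characters:", simulate_clade_support(n, [5]))
    print("two taxa, 1 supporting character each:", simulate_clade_support(n, [1, 1]))
    print("analytic value for 1 character:", round(inclusion_probability(n, 1), 3))
```

Under these assumptions, a taxon held in place by a single character is lost from its clade in roughly a third of replicates, because the chance of never drawing that character is (1 - 1/N)^N, which is close to 1/e. With two such taxa the losses compound, so the strict clade frequency drops well below the single-taxon value even though the backbone never changes in the toy model.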
From: firstname.lastname@example.org <email@example.com> on behalf of David Černý <firstname.lastname@example.org>
Sent: Thursday, March 23, 2017 5:57 PM
Subject: Re: [dinosaur] Dinosauria reclassification joins Ornithischia and Theropoda in Ornithoscelida
David Marjanović <email@example.com> wrote:
I cannot complain about a bootstrap support of 66 %. These aren't molecular data, and bootstrap values are well known to have a bias for too low values (while Bayesian posterior probabilities are biased in the other direction).
Yes, the nonparametric bootstrap might be overly conservative (though statements like that tend to run into all sorts of interpretation issues: see Alfaro & Holder 2006 and the references therein), but 66% is low even when we take that into account! The bootstrap value most often cited as corresponding to the standard confidence level of 0.95 is 70% (again, see Alfaro & Holder 2006).
On a side note, I'm not comfortable with the notion that we should lower our standards when dealing with morphological data (as opposed to molecules) so as to level the playing field. If we find that morphology is incapable of giving us the degree of confidence we want, let's just leave it at that.
However, if we accept 66 % as on the low side, the finding that there's no arrangement with a bootstrap support higher than that becomes an important result in itself!
Exactly: out of the 69 internal nodes in Baron et al.'s reduced tree (Extended Data Figure 1), only 5 have bootstrap values of 50% or more, and only one has a bootstrap support higher than 70%. To me, this indicates a nearly complete failure of the data to say anything about early dinosaur phylogeny with confidence. Perhaps this could be interpreted as a statement about the quality of the available fossil record: we have so many samples so close to the origin of Dinosauria that well-supported inferences about their relationships are impossible even with a reasonably large dataset.
It merely changes "everything we thought we knew is wrong" to "everything we thought we knew is, at best, much more weakly supported than we thought it was".
That's a pretty substantial change, though!
Ruben Safir <firstname.lastname@example.org> wrote:
The texts and articles that we have reviewed continue to lack a coherent methodology or theory, are often factually wrong (sometimes in critical areas, and sometimes later corrected), and play fast and loose with mathematical modeling. It has reached a point where I'm not willing to just accept the output of any software that produces a phylogeny without reviewing it with experts on graph theory and algorithms who can weigh in on the soundness of the software. This would be someone outside of paleo. At this point, I have a deadline in about 3 weeks to finish my paper, and I've yet to write the AI section. So I can't review the code base for TNT, et al.
I like how you casually dismiss the entire well-established field of computational phylogenetics while saying absolutely nothing of substance, then back off, claiming you're too busy to actually make your point. Talk about playing fast and loose.
Alfaro ME, Holder MT 2006 The posterior and the prior in Bayesian phylogenetics. Annu Rev Ecol Evol Syst 37: 19–42