One more response:
Thomas R. Holtz, Jr. <email@example.com>
Herein is a major problem. An ancestor is a very, very, very specific thing: a member of the population from whom the later clade of interest is directly derived. I.e., your grandparents are ancestors, but your granduncles and aunts are not.
A sister group is simply the taxon within a given analysis which is more closely related to the clade of interest than all other taxa *in that analysis*. So finding that A is the sister taxon to C in an analysis is *NOT* a statement that “A was the very closest relative to C of all organisms that ever lived.” It is simply an analysis-level topology. [see note below]
In contrast, saying A is the ancestor to C means that in the actual real world, all members of C are directly derived from the population A. That is an exceedingly difficult and problematic thing to demonstrate.
Yes, estimating direct ancestry is more difficult than estimating sister-group relationships because you have to account for incomplete sampling (i.e., the difference between "within a given analysis" and "in the actual real world"). Since the models we are discussing do exactly that, this won't work as an argument against using them.
It's also important to note that the approaches in question do not attempt to "demonstrate" direct ancestry beyond doubt. There might be no Mesozoic dinosaur species whose probability of being the ancestor to a more recent taxon exceeds, say, 95%, or some other arbitrary threshold. So what? That would only be a problem if the methods told you "A is an ancestor, B is a tip" whenever that was more likely than the alternative and were done with it, without attaching any measure of uncertainty to that statement. However, the actual analyses performed by Bapst et al. (2016) are not invalidated by this, since they do quite a bit more than give you a point estimate and pretend that's all there is to detecting ancestors. In the hypothetical situation described above, they would assign a posterior probability distribution to the ancestral status of both A and B, like the one that was described in the paper (and highlighted by D. Bapst in his comment) for Archaeopteryx.
In other words, the new algorithms do not necessarily cause there to be less uncertainty regarding direct ancestry; they quantify it. That is a major step forward, because paleobiologists can now accommodate that uncertainty in their inferences about other parameters by summing over all hypotheses of ancestry, each weighted by its own probability.
A quick example: assume that we are interested in the age of the clade comprising all known penguins. According to Gavryushkina et al. (2015), Waimanu manneringi, the oldest representative of the group, has about 65 to 80 percent probability of being ancestral to all other Sphenisciformes. Therefore, we assign the corresponding probability to dates from the range that bounds the age of its fossils (61.6 to 60.5 Ma). However, if the taxon represents a tip rather than a sampled ancestor, there must have been a non-zero waiting time between the speciation event that gave rise to W. manneringi on the one side and the rest of the group on the other side, and the occurrence of W. manneringi in the fossil record. Therefore, we place 20–35 percent probability on ages older than 61.6 Ma. The resulting distribution will be skewed toward much younger ages than if we didn't allow for sampled ancestors.
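The weighting described above can be sketched in a few lines of Python. This is only a toy illustration of summing over the two hypotheses, not the actual analysis of Gavryushkina et al. (2015): the ancestor probability of 0.7 is simply a value picked from within the 65 to 80 percent range, the uniform distribution over the fossil age range and the exponential waiting time are my own simplifying assumptions, and the mean waiting time of 2 Myr (like all function and variable names) is entirely hypothetical.

```python
import random

FOSSIL_MIN, FOSSIL_MAX = 60.5, 61.6   # age range of W. manneringi fossils (Ma)
MEAN_WAIT = 2.0                       # hypothetical mean waiting time (Myr), assumed

def sample_clade_age(p_ancestor, rng):
    """Draw one clade age (Ma) from a two-hypothesis mixture."""
    if rng.random() < p_ancestor:
        # Hypothesis 1: sampled ancestor, so the clade age falls
        # within the age range of the fossils (assumed uniform).
        return rng.uniform(FOSSIL_MIN, FOSSIL_MAX)
    # Hypothesis 2: tip, so the split must predate the oldest fossil
    # by a non-zero waiting time (assumed exponential).
    return FOSSIL_MAX + rng.expovariate(1.0 / MEAN_WAIT)

def mean_clade_age(p_ancestor, n=100_000, seed=1):
    """Average clade age over n draws, i.e. weighted over both hypotheses."""
    rng = random.Random(seed)
    return sum(sample_clade_age(p_ancestor, rng) for _ in range(n)) / n

if __name__ == "__main__":
    with_ancestors = mean_clade_age(0.7)   # sampled ancestors allowed
    tips_only = mean_clade_age(0.0)        # every fossil forced to be a tip
    print(f"mean age, sampled ancestors allowed: {with_ancestors:.2f} Ma")
    print(f"mean age, tips only:                 {tips_only:.2f} Ma")
```

Under these assumptions, the mixture puts most of its mass inside the fossil age range, so its mean comes out younger than in the tips-only case. How much younger depends entirely on the assumed waiting-time distribution.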
To constrain it to the issue of bird origins, we might be confident (hypothetically here: in reality, the base of Paraves remains in a real state of flux) that Archaeopteryx lithographica is closer to extant birds than to all other operational taxonomic units in our study. But we really can’t say that it was Archaeopteryx lithographica that was the ancestor. This is especially true for the pre-Pleistocene fossil record, where recovery of a given type of fossil is extremely spotty. Can we be secure that it was the Bavarian species that was the ancestor to birds, and not a species in Greenland, or western North America, or Australia, or Africa, or what have you? No, the sampling is by no means anywhere near dense enough to even know the regional species geography of early Tithonian small theropods.
Sure, sampling rates matter, and the ability to infer that a given taxon is an ancestor is conditional on the sampling being good enough for the chance of finding one to be reasonably high in the first place. If the sampling is too spotty, the probability of one fossil being the direct ancestor of another should be relatively low even if their characters and stratigraphic positions are both compatible with such a relationship.
Whether the sampling ever gets so sparse that we should be skeptical of (or ignore) the very possibility of sampled ancestors being present in our datasets is a separate question. I find it interesting that most assertions to that effect are essentially arguments from intuition. In the above reasoning, you employ biogeographic intuition; I have also seen arguments based on the ratio of fossil to extant tips. Convincing as these might sound, I'd like to see that probability quantified, and it turns out that doing this is rather nontrivial: you'd need a simulation with biologically realistic rates of speciation and extinction as well as geologically realistic rates of fossilization and sampling. As Bapst et al. (2016) helpfully point out, this was attempted by Foote (1996), who concluded that "[w]ith pessimistic assumptions, a lower bound on the proportion of known fossil species that are directly ancestral to other known fossil species is on the order of 1%. With more realistic assumptions, this proportion is on the order of perhaps 10% or more" (ibid., p. 148).
Furthermore, there really isn’t even a strong knowledge benefit of knowing a given Tithonian maniraptoran was the direct ancestor of Passer and Struthio.
We're clearly going round in circles here, as I already gave an explanation of just what such benefits might be, including a recently published empirical example. Many more could be given. How hard did the K/Pg extinction hit the dinosaurs? Was it a sudden event, or a gradual process? Questions like these are usually answered using non-phylogenetic approaches relying on species counts. Nick Matzke, one of the authors on Bapst et al. (2016), pointed out that they could instead be analyzed in an explicitly phylogenetic framework by estimating speciation and extinction rates in a BAMM-like manner (in fact, a new version of BAMM that can deal with fossil tips has already been announced), but in order to do that, we would need to account for certain species being sampled from multiple time horizons without introducing a new tip for each horizon. This is best done by estimating direct ancestry.
Since most of the remaining discussion was predicated on not seeing a difference between sister group and ancestor, I’ll let it go.
That's a pretty serious misinterpretation of what I wrote, since I never claimed that I didn't see a difference between ancestors and sister groups. I merely argued that there was no difference in our ability to make probabilistic statements about the two. (I also added a caveat that this was true in principle; i.e., I did not claim that both types of relationships were equally easy to infer.) So far, I haven't seen any evidence to the contrary.
I was referring to the newer software, which does in fact allow far more sophisticated means of assessing likelihoods and so forth of a given ancestral node state than traditional parsimony ACCTRANS/DELTRANS methods. Advances in computer speed and more sophisticated algorithms have greatly improved these, and (in my opinion) it is this end of phylogenetics and not necessarily tree reconstruction per se which has seen the greater advances in the last decade.
I absolutely agree that there's been huge progress in the field, related mainly to its embrace of probabilistic methods. It's just that the field itself is not new; it has been around for a long time. On a related note, I'm not sure that it's particularly useful to separate this progress from the developments in "tree reconstruction", since the most advanced approaches currently in use coestimate ancestral states along with tree topology: see, for example, Lee et al. (2014).
Foote M 1996 On the probability of ancestors in the fossil record. Paleobiol 22(2): 141–51
Gavryushkina A, Heath TA, Ksepka DT, Stadler T, Welch D, Drummond AJ 2015 Bayesian total evidence dating reveals the recent crown radiation of penguins. arXiv:1506.04797v1 [q-bio.PE], June 15
Lee MSY, Cau A, Naish D, Dyke GJ 2014 Sustained miniaturization and anatomical innovation in the dinosaurian ancestors of birds. Science 345(6196): 562–6