Last jab at cladistic shorthand (long and moderately esoteric)

Peter B. wrote:
>*Again in this post, I have used my own brand of cladistic short-hand
>which differs from Jon Wagners.
        But not by much...

        I differentiate between four different types of cladistic shorthand:
shorthand used to describe tree topology, shorthand used to describe clade
content, shorthand to describe a clade, and shorthand used to define taxa.
Note that the first three are all descriptive uses, and the last is definitive.

1)      Tree Topology shorthand: This is already well established as Newick
notation, used to describe the internal relationships of terminal taxa on a
cladogram. Simply put, parentheses and commas are used to describe the
relationships of terminal taxa in a cladogram. This provides a fast,
convenient means of conveying the same information as a cladogram without
the ponderous (usually ASCII) graphics. For example:
                (Crocodilia, (Ornithischia, (Sauropodomorpha, Theropoda)))
        The principle is that of hierarchically nested parentheses, with
"sister taxa" (not necessarily true sisters, i.e. sister-stems) separated by
commas. This is, to my knowledge, the *only* established cladistic
shorthand, and should (IMHO) always be used when discussing tree topology.
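
        For anyone who wants to see the nesting made explicit, here is a
minimal Python sketch that reads the simplified notation above into nested
tuples. The function name parse_newick is my own invention for illustration,
and this handles only bare taxon names, parentheses, and commas (plus an
optional trailing semicolon), not branch lengths or the other features of
full Newick.

```python
def parse_newick(text):
    """Parse a simplified Newick string into nested tuples of taxon names."""
    text = text.strip().rstrip(";")  # tolerate an optional trailing semicolon
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_node():
        nonlocal pos
        skip_spaces()
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            skip_spaces()
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
                skip_spaces()
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume ")"
            return tuple(children)
        # Otherwise this is a terminal taxon: read up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        name = text[start:pos].strip()
        if not name:
            raise ValueError("missing taxon name")
        return name

    tree = parse_node()
    skip_spaces()
    if pos != len(text):
        raise ValueError("unexpected trailing text")
    return tree


# Each pair of parentheses becomes one tuple; sister taxa are its elements.
tree = parse_newick("(Crocodilia, (Ornithischia, (Sauropodomorpha, Theropoda)))")
```

        The result is a tuple whose second element is itself a tuple, and so
on down, which is exactly the hierarchy the parentheses are meant to convey.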

2)      Clade "Content": It is unlikely that we will ever be able to name
all members of a given clade (the contents of the set, in mathspeak). It is
frequently useful, however, to list the members of a clade on a specific
cladogram. This is often done in parentheses, either with commas or with
addition signs. Note that this is very similar to describing tree topology,
as above, but without the hierarchy of nested parentheses. For example:
                (Carcharodontosaurus, Acrocanthosaurus, Giganotosaurus) or
                (Carcharodontosaurus + Acrocanthosaurus + Giganotosaurus)
        Clearly, however, the casual use of two different shorthand methods
is potentially confusing. It would be my preference to use the "modified
Newick" system, using commas. This, however, may carry the unintentional
implication of a polytomous relationship amongst the taxa in the study.
On the other hand, the (+) method causes potential confusion with the
shorthand in number 3 below, and therefore carries connotations which are
inappropriate to discussions of clade content.
        If we were to extend the mathematical set idea, we might insist that
these be denoted in {} curly brackets, with commas. Indeed, as these are
subsets of the greater set of all taxa in the study, this is
appropriate. Therefore, it is suggested that the group of animals above be
identified as:
        {Carcharodontosaurus, Acrocanthosaurus, Giganotosaurus}
when the subset of terminal taxa consisting of these animals is to be
considered. This is potentially confusing with the shorthand for definitions
provided in number 4 below, but since there is no modifier appended to the
taxon names, there should be no confusing them with anchor taxa.
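
        The set reading can be made concrete with a short Python sketch. It
assumes a cladogram is stored as nested tuples of names (my own convention
for illustration, not anything established), and the helper name
terminal_taxa is likewise hypothetical. The tree is the one from number 1.

```python
def terminal_taxa(tree):
    """Return the set of terminal taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    taxa = frozenset()
    for child in tree:
        taxa |= terminal_taxa(child)
    return taxa


# The cladogram from number 1, written as nested tuples.
tree = ("Crocodilia", ("Ornithischia", ("Sauropodomorpha", "Theropoda")))

all_taxa = terminal_taxa(tree)  # the "greater set" of all taxa in the study

# Clade content in curly-bracket style: order and nesting carry no meaning.
content = frozenset({"Theropoda", "Sauropodomorpha"})
same_content = frozenset({"Sauropodomorpha", "Theropoda"})

is_subset = content <= all_taxa        # a subset of the taxa in the study
order_irrelevant = content == same_content
```

        Because a set has no internal order or nesting, this notation says
nothing about topology, which is the point of keeping it separate from the
Newick-style shorthand.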

3)      Clade Description: It is often convenient to refer offhand to a
group of taxa, especially on a particular cladogram, as a clade. Note that
this is not the same as defining a taxon, in which a clade is given a name.
A clade is frequently described using a shorthand which is ambiguous as to
whether it is intended to define the clade or simply describe the clade.
This is often written using parentheses and addition signs similarly to the
second case shown above. For example, we might see:
        (Carcharodontosaurus + Acrocanthosaurus)
used to describe the clade consisting of C. and A., their common ancestor
and all of its descendants. As noted above, this shorthand may be used in a
way which fails to distinguish between describing a taxon and defining a
taxon. It also fails to provide a means of describing stem based clades.
Such a shorthand is indeed useful, and a clear methodology should be
        It seems that the best thing to do currently is to leave this
notation the way it is now, and adopt a *completely new* and *distinct*
shorthand for defining taxa. The above notation is not consistent with
either Pete's or my notation in any way, but appears to be popular (on
this list and in recent work, e.g. Novas' latest). Using anchor-taxon
notation, or curly brackets would only cause more confusion, IMHO. I would
suggest that a minus sign be used to denote stem relationships, as in:
        (Carcharodontosaurus - Acrocanthosaurus - Giganotosaurus)
        I regard this as an unattractive alternative, but I would rather
avoid a proliferation of symbols beyond (){}+-.
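
        To show what the (A + B) description picks out on a particular
cladogram, here is a rough Python sketch of the node-based reading only: the
common ancestor of the named taxa and all of its descendants. It assumes
trees stored as nested tuples, and the names terminal_taxa and
smallest_clade are hypothetical. It does not attempt the stem reading
suggested for the minus sign.

```python
def terminal_taxa(tree):
    """Return the set of terminal taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    taxa = frozenset()
    for child in tree:
        taxa |= terminal_taxa(child)
    return taxa


def smallest_clade(tree, taxa):
    """Return the smallest subtree containing every name in taxa, or None."""
    taxa = frozenset(taxa)
    if not taxa <= terminal_taxa(tree):
        return None  # this subtree is missing at least one named taxon
    if not isinstance(tree, str):
        for child in tree:
            found = smallest_clade(child, taxa)
            if found is not None:
                return found  # a smaller subtree already holds them all
    return tree


# The cladogram from number 1, written as nested tuples.
tree = ("Crocodilia", ("Ornithischia", ("Sauropodomorpha", "Theropoda")))

# (Ornithischia + Theropoda) on this tree:
clade = smallest_clade(tree, {"Ornithischia", "Theropoda"})
members = terminal_taxa(clade)
```

        Note that Sauropodomorpha ends up in the clade although it was never
named, because it descends from the same common ancestor. That is the
difference between describing a clade and listing clade content as in
number 2.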

NOTE: I resist the use of square brackets [] for any purpose because there
are already enough symbols being used out there, and because there has to be
at least one type of bracket left for Mickey to comment with. ;)

ALSO NOTE: The triple-barred "is defined as" equals sign should *never* be
used with this notation.

4)      Taxon Definition: I have already given a detailed explanation of how
my system works, but I will repeat it briefly. I specify the taxa used in
the verbal definition of the taxon as "anchor taxa". Each anchor taxon is
described as an "inclusive" or "exclusive" anchor, and given a + or - symbol
respectively in front of its name. Simply, inclusive anchors are part of the
taxon being defined, exclusive anchors are excluded by definition.
        I believe that putting the +/- in front of the anchor taxon name
makes the definition *visually distinct*, and highlights the unique role of
the anchor taxon in the definition. It is unlikely that a taxon definition,
so denoted, will be confused with any of the above shorthands. For example:
>Mine:  {A + B + C}
>Wagner's:  {+A, +B, +C}
        Which of the above looks more like:
        (A + B + C)?
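
        One practical consequence of the prefix is that a definition can be
told apart from the other shorthands mechanically. The Python sketch below
is only an illustration under my own assumptions: the function name
parse_definition is hypothetical, anchors are separated by commas, and every
anchor must carry its + or - prefix.

```python
def parse_definition(text):
    """Split an anchor-taxon definition into inclusive and exclusive anchors."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("definition must be enclosed in curly brackets")
    inclusive, exclusive = [], []
    for token in body[1:-1].split(","):
        token = token.strip()
        # Every anchor taxon must be marked + (inclusive) or - (exclusive).
        if len(token) < 2 or token[0] not in "+-":
            raise ValueError(f"anchor taxon needs a + or - prefix: {token!r}")
        name = token[1:].strip()
        if token[0] == "+":
            inclusive.append(name)
        else:
            exclusive.append(name)
    return inclusive, exclusive


node_based = parse_definition("{+A, +B, +C}")  # three inclusive anchors
mixed = parse_definition("{+A, -B}")           # hypothetical example with an exclusive anchor
```

        Under these rules both (A + B + C) and {A + B + C} are rejected
rather than silently read as definitions, which is the kind of visual (and
mechanical) distinctness I am after.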

        I do not think Pete's use of the | "or" symbol is beneficial. It is
not exactly clear what the "or" means. For example:
>{A > B | C}
        If I have an animal more closely related to A than it is to C, but
more closely related to B than to A, is it still part of the group? In other
words, should I read this as "all animals more closely related to A than B
or to A than C"?
        Also, it should be noted that in probability the | means "given". So
the statement above would read: "All animals more closely related to A than
B, given C."
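
        The ambiguity in {A > B | C} can be demonstrated on a made-up tree.
In the Python sketch below, the tree, the taxon X, and the helper names are
all hypothetical, and "closer to A than to B" is taken to mean that the
smallest clade on the tree containing X and A does not contain B.

```python
def terminal_taxa(tree):
    """Return the set of terminal taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    taxa = frozenset()
    for child in tree:
        taxa |= terminal_taxa(child)
    return taxa


def smallest_clade(tree, taxa):
    """Return the smallest subtree containing every name in taxa, or None."""
    taxa = frozenset(taxa)
    if not taxa <= terminal_taxa(tree):
        return None
    if not isinstance(tree, str):
        for child in tree:
            found = smallest_clade(child, taxa)
            if found is not None:
                return found
    return tree


def closer_to(tree, x, a, b):
    """True if x shares a more recent common ancestor with a than with b."""
    return b not in terminal_taxa(smallest_clade(tree, {x, a}))


# Hypothetical cladogram: X is closer to A than to C, but closer to B than to A.
tree = ("C", ("A", ("B", "X")))

closer_than_b = closer_to(tree, "X", "A", "B")
closer_than_c = closer_to(tree, "X", "A", "C")

in_group_if_either = closer_than_b or closer_than_c   # "closer to A than B, or than C"
in_group_if_both = closer_than_b and closer_than_c    # "closer to A than B and than C"
```

        On this tree X is in the group under the first reading and out of it
under the second, so the notation needs to say which one is meant.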
        While I think the use of <> is ingenious, it does not work without
some sort of "or" symbol, and it still does not *explicitly* denote anchor
taxa. The +/- prefix also emphasizes the differing function of an anchor
taxon, as not necessarily being included in the group, but fundamental to
the definition of the group.
        Additionally, the proliferation of symbols only complicates an
already very confused process.      

        My system uses curly brackets to surround the definition. These were
intended to emphasize the distinctness of definition versus description.
Using them in case number 2 above makes this less effective, but it is still
useful. Peter likes this idea too, so maybe it has some merit...
        The other points of the shorthand were using the three-barred
"defined as" symbol, and delimiting the anchor taxa with commas. In
retrospect, the commas may be inappropriate. Without them, however, the
definition may appear more like the shorthand in number 3 above. Compare:
        {+A +B +C}
        (A + B + C)

        Hopefully, we'll all reach some sort of happy, useful middle ground
at some point.

