
Re: Reconstructing DNA (was Re: Dino-fuzz found in amber?)



On Fri, Sep 16, 2011 at 2:54 PM, Erik Boehm <erikboehm07@yahoo.com> wrote:
> And when we find that extant organisms use more than 1 of the 6 possible
> codon sequences to encode Serine at that position?

Align the sequences, put on a tree.
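
To make that concrete, here is a minimal sketch of the bottom-up pass of Fitch parsimony for one aligned codon position. The tree shape, the species labels and the codons are made up for illustration, and only the state set at the root is computed (a full reconstruction would add a top-down pass for the internal nodes).

```python
def fitch(tree, leaf_states):
    """Bottom-up Fitch pass.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    Returns (candidate states at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical serine codons at one aligned position (illustrative data only).
tree = (("human", "chimp"), "gorilla")
site = {"human": "TCT", "chimp": "TCC", "gorilla": "TCC"}

states, changes = fitch(tree, site)
print(states, changes)
```

When the set that comes back holds more than one codon, parsimony alone cannot decide, and that is where outgroups or a likelihood model come in.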

> Let's take an example of humans and chimps as the extant organisms.
> In more than one case (such as hemoglobin), Humans and chimps have the exact 
> same amino acid sequence, but a slightly different DNA sequence to encode 
> that protein.
> Suppose we find a fossil ape that is just outside the Human-Chimp clade.
> Which DNA sequence do we use?

The reconstructed common ancestral one. We already do this with
humans - SNPs and other intrapopulation variation are taken into account.

> What if it is on the human branch, but very basal? Do we assume in every case
> where there is ambiguity between humans and chimps that we use the human
> version? Sure, you could compare to gorillas, but what of the cases where the
> protein sequence is different (as the amino acid identity with them is not
> 100%)? If we find consensus between the chimp and gorilla sequences, we
> conclude the chimp sequence is basal, but for this basal member of the human
> branch.... we still don't know when to go with the human encoding for a
> particular aa, or the chimp encoding.

We could use maximum likelihood, for example.
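
As a rough illustration only: the sketch below computes the marginal posterior of each base at the root of a small tree with Felsenstein's pruning algorithm under the Jukes-Cantor (JC69) model. The tree, the branch lengths and the observed bases are assumptions chosen for the example, not real data, and a real analysis would use a richer substitution model and estimate the branch lengths.

```python
import math

BASES = "ACGT"


def jc69(t):
    """JC69 transition probabilities for branch length t (substitutions/site)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay  # (same base, different base)


def conditional(node, leaf_base):
    """Likelihood of the data below `node`, for each possible base at `node`.

    node: a leaf name (str) or a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if b == leaf_base[node] else 0.0 for b in BASES]
    like = [1.0] * 4
    for child, t in node:
        child_like = conditional(child, leaf_base)
        same, diff = jc69(t)
        for i in range(4):
            like[i] *= sum(
                (same if i == j else diff) * child_like[j] for j in range(4)
            )
    return like


def root_posterior(tree, leaf_base):
    """Posterior over the root base; the uniform JC69 prior cancels out."""
    like = conditional(tree, leaf_base)
    total = sum(like)
    return {b: l / total for b, l in zip(BASES, like)}


# Hypothetical tree ((human, chimp), gorilla) with made-up branch lengths.
tree = [([("human", 0.01), ("chimp", 0.01)], 0.005), ("gorilla", 0.015)]
post = root_posterior(tree, {"human": "T", "chimp": "C", "gorilla": "C"})
print(post)
```

The point is that the output is a probability for each candidate base, so the ambiguity is quantified instead of being settled by an arbitrary pick.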

> You simply cannot reconstruct the DNA sequence that made the amino acid 
> sequence with any certainty. You will be reduced to
> arbitrary guessing.

"cannot with *any* certainty" is an exaggeration. Such reconstructions
are made all the time when ancestral states are inferred from extant
sequences.

> Every single codon assignment is going to involve some level of guessing 
> (unless it is methionine in a vertebrate).

That is true. Actually, even when we find a methionine it will involve
some level of guessing. It is just that the guessing will not be
random.
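
For a sense of how much ambiguity comes from synonymous codons alone, here is a small sketch that builds the standard nuclear genetic code and counts how many DNA sequences encode a given peptide. It assumes the standard code (other codes, such as the vertebrate mitochondrial one, differ), and the peptide "MSL" is just an invented example.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Group codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_encodings(peptide):
    """Number of distinct DNA sequences that translate to `peptide`."""
    return prod(len(SYNONYMS[aa]) for aa in peptide)


print(SYNONYMS["M"])           # the single methionine codon
print(len(SYNONYMS["S"]))      # number of serine codons
print(count_encodings("MSL"))  # hypothetical three-residue peptide
```

This count is the size of the search space before any phylogenetic information is used; the tree and the extant sequences are what shrink it.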

> Even when consensus sequences exist, you still find many variations from
> species to species (SNPs)....

Not only from species to species, but *within* species - between
individuals of the same species. Actually, these SNPs are *informative*.

> It's a pointless endeavor. Your DNA sequence generated won't be any more
> useful than the amino acid sequence (which you can still use for phylogenetic
> analysis).

There is no need for it to be useful. Many people think that guessing
the origin of avian flight is useless.

> It is pointless to generate a DNA sequence from an amino acid sequence of an
> extinct organism. I can say with well over 95% confidence that whatever
> sequence you generate (assuming it's of any reasonable length) will be wrong.

If you regard 'wrong' as "we can't be 100% sure that the sequence
obtained has a 100% match with the actual - and unknown - sequence",
then yes. In that sense, the same is true of any sequencing - and I'm
talking about direct DNA sequencing here - even with a fresh sample.

I think that it is better to say: "We can say, with 95% confidence,
that the obtained sequence differs by no more than X% from the actual
sequence".
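
A back-of-the-envelope sketch of the difference between the two statements, under loud assumptions: every site is reconstructed independently and with the same probability of being right. The 300 sites and the 0.99 per-site accuracy are invented numbers, not estimates.

```python
import math


def prob_exact(n_sites, p_site):
    """Probability that every site is right, assuming independent sites."""
    return p_site ** n_sites


def mismatch_bound(n_sites, p_site, confidence=0.95):
    """Smallest k such that P(number of wrong sites <= k) >= confidence."""
    q = 1.0 - p_site
    cumulative = 0.0
    for k in range(n_sites + 1):
        # Binomial probability of exactly k wrong sites.
        cumulative += math.comb(n_sites, k) * q**k * p_site ** (n_sites - k)
        if cumulative >= confidence:
            return k
    return n_sites


# Hypothetical numbers: 300 sites, each reconstructed correctly with p = 0.99.
n_sites, p_site = 300, 0.99
p_exact = prob_exact(n_sites, p_site)
k = mismatch_bound(n_sites, p_site)
print(f"P(exact match) = {p_exact:.3f}")
print(f"95% bound: at most {k} wrong sites ({100 * k / n_sites:.1f}%)")
```

With these made-up inputs the chance of a perfect match is only about 5%, so "it will be wrong" holds with roughly 95% confidence, and yet with the same confidence the sequence is off at no more than 6 of 300 sites, i.e. 2%.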

[]s,

Roberto Takata