Reconstructing DNA (was Re: Dino-fuzz found in amber?)

> Not a random guess. After all, skeletal restoration of fossils - even
> with a complete skeleton - is guessing too, an educated guess, and
> there is no problem with that.

Yes, but the confidence levels there are *much* higher.

> Consider the leucine: UUA, UUG, CUU, CUC, CUA, CUG
> Let's hypothetically consider that in the same place, in a sequence of
> extant organism 1, we have isoleucine, coded by AUC. We could not be
> 100% sure, but it is, under parsimony assumption, likely that the
> leucine code used is CUC.

In this case, probably, yes.

Now let's say we have an extinct sequence indicating Serine.
Now let's say in extant organisms we also find Serine.
What then?
Suppose we find it's sometimes also Threonine or Alanine.
Well, then we can probably eliminate 2 of the 6 codons (AGU and AGC), but we
still have a 3/4 chance of getting it wrong.
Now suppose we find another extant organism that has Arginine in that location
too, with an estimated divergence time similar to the species with 

Now we are back to a 1 out of 6 guess.
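
The Serine case can be checked mechanically. The sketch below is only an illustration, assuming the standard genetic code and a crude parsimony rule (a candidate codon must sit one substitution away from some codon of the amino acid seen in the extant organism). The helper names are made up for this example.

```python
from itertools import product

# Standard genetic code (RNA alphabet), bases ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """All codons for a one-letter amino acid, sorted alphabetically."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


def one_step_apart(c1, c2):
    """True if the two codons differ at exactly one position."""
    return sum(x != y for x, y in zip(c1, c2)) == 1


def candidates(target_aa, observed_aa):
    """Hypothetical helper: codons for target_aa lying one substitution
    away from at least one codon for observed_aa."""
    observed = codons_for(observed_aa)
    return [c for c in codons_for(target_aa)
            if any(one_step_apart(c, o) for o in observed)]


ser_given_ala = candidates("S", "A")  # ['UCA', 'UCC', 'UCG', 'UCU']
ser_given_arg = candidates("S", "R")  # ['AGC', 'AGU']
ser_given_thr = candidates("S", "T")  # all six Serine codons qualify
conflict = set(ser_given_ala) & set(ser_given_arg)  # empty: the hints disagree

print(ser_given_ala, ser_given_arg, len(ser_given_thr), conflict)
```

Under this simple rule it is Alanine that rules out AGU and AGC; Threonine is one step from all six Serine codons, so on its own it narrows nothing. Arginine points at exactly the two codons Alanine excluded, which is why the two hints together leave all six in play.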
Let's say we find Aspartate - no amount of comparing to any extant sequence is
ever going to do much to change it from a 50:50 guess (unless perhaps it is
from a bacterium where we notice major AT vs GC content differences, in which
case we may get to a 2/3 guess).

There simply isn't enough information contained within the amino acid sequence 
to determine the DNA sequence that produced it.
In vertebrates, you can determine criteria for possible DNA sequences, and 
determine that some of those possible sequences are highly unlikely, but you 
are going to fall orders of magnitude short of the results from sequencing DNA.
If your amino acid sequence comes from a microbe of unknown affinity, forget
about it: you can't even be sure of the coding rules that microbe was using.
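
To put a rough number on the missing information, one can count how many distinct coding sequences translate to the same peptide. This is a minimal sketch assuming the standard genetic code (which, as noted, may not hold for an unknown microbe); the five-residue peptide is invented for illustration and `count_encodings` is a hypothetical helper.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code (RNA alphabet), bases ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of synonymous codons per amino acid (e.g. 6 for S, 2 for D, 1 for M).
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(peptide):
    """Number of distinct codon sequences that translate to this peptide."""
    return prod(DEGENERACY[aa] for aa in peptide)


peptide = "MSLRD"  # Met-Ser-Leu-Arg-Asp, an invented example
print(count_encodings(peptide))  # 1 * 6 * 6 * 6 * 2 = 432
```

The count multiplies with every residue, so for a protein of realistic length the number of candidate DNA sequences becomes astronomically large even after comparison with extant sequences has removed some codons.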

I think it is a poor idea to imply that amino acid sequences can lead to the 
DNA sequences that created them - as the vast majority of the time, you are 
going to end up way off.