Re: Reverse-engineering the T. rex genome
Erik Boehm <email@example.com> wrote:
> When you sequence a genome, at the most basic level, all
> you find is the nucleotide sequence, but you don't even know
> which of those nucleotides constitute a "gene" - after all
> for any given length of DNA you have to examine 6 different
> possibilities for open reading frames.
> Genes can be on either strand, and since codons are triplets
> of nucleotides, there are three possibilities for
> reading frames per strand.
> And it's very hard to tell what codes for amino acids, what
> produces rna transcripts that aren't translated (but still
> serve some function, and could be considered a gene), what
> nucleotides are promoters, which ones are structural,
> binding sites, which ones are merely "spacers", etc.
Computers can do all this (and more) standing on their heads. Moreover, they
can annotate genomes in a high-throughput and fairly accurate fashion. Sure,
they get the start- and end-points of a gene wrong sometimes, even when it's a
protein-coding gene. For example, if there are two methionine codons (ATG) in
close proximity near the beginning of a gene, the software may choose the wrong
one as the start codon. And in some organisms, what looks like a stop codon is
not a stop at all, but encodes a 'weird' amino acid (i.e., one outside the
standard 20, like pyrrolysine), which causes the software to terminate the gene
prematurely. But by and large, computers can annotate genomes (and
metagenomes) at a fairly rapid rate.
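To make the six-reading-frame point concrete, here is a minimal sketch of a naive ORF scan in Python. It is only an illustration, not how any real annotation package works: it assumes the standard genetic code, ignores introns, and the names find_orfs and reverse_complement are my own. Each reported ORF lists every in-frame ATG before the stop, which shows the start-codon ambiguity mentioned above, and the stops argument lets you drop TAG from the stop set for organisms that read it as pyrrolysine.

```python
# Naive six-frame ORF scan (illustrative sketch; standard genetic code assumed).
STOPS = frozenset({"TAA", "TAG", "TGA"})
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2, stops=STOPS):
    """Scan all six reading frames for ATG...stop stretches.

    Returns a list of (strand, frame, candidate_starts, end) tuples.
    Coordinates are 0-based on the strand being read; `end` is the
    position just past the stop codon. `candidate_starts` holds every
    in-frame ATG since the previous stop, so more than one entry means
    the true start codon is ambiguous.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            starts = []  # in-frame ATG positions since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon == "ATG":
                    starts.append(i)
                elif codon in stops:
                    # Length is measured from the first ATG, stop excluded.
                    if starts and (i - starts[0]) // 3 >= min_codons:
                        orfs.append((strand, frame, starts, i + 3))
                    starts = []
    return orfs


if __name__ == "__main__":
    # Two in-frame ATGs before one stop: which one is the real start?
    print(find_orfs("ATGAAAATGCCCTAA"))
    # Treating TAG as a sense codon (as for pyrrolysine) extends the ORF.
    print(find_orfs("ATGAAATAGCCCTAA", stops={"TAA", "TGA"}))
```

Real gene finders go well beyond this (codon usage statistics, ribosome binding sites, homology to known proteins), which is how they usually manage to pick the right ATG.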
> Very few genomes have been fully sequenced and annotated -
> the mitochondria genome is one - I could see calling it
> "mapped", as you can precisely identify where each gene is
> within the genome.
Very few *eukaryotic* genomes have been fully sequenced and annotated. They're
just too big. The _H. sapiens_ nuclear haploid genome for example is about 3
billion nucleotides long. Prokaryote genomes (bacteria and archaea) are a lot
smaller (typically between 2 and 5 million nucleotides), and so sequencing and
annotation are "easier". Literally hundreds of bacterial species have had their entire
genomes sequenced and annotated.