Re: Reverse-engineering the T. rex genome
Current genome annotations are horrible regarding micro RNAs, intron splicing,
In fact, there are cases where it appears the spliced-out intron segments have
the real function, and the exons are just degraded after they have been spliced
together and capped as in normal mRNA transcripts.
They are annotated, but not fully annotated.
Computer-based annotations are only as good as the program that produces them,
and even with the relatively simple task of identifying open reading frames,
programs still make mistakes, often overlooking open reading frames for
extremely short peptides because of the minimum-length parameters they use.
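To make the point about parameters concrete, here is a minimal sketch of a six-frame ORF scan. The function names and the `min_codons` cutoff are made up for illustration, and it assumes plain ATG starts and the three standard stop codons. Real gene finders are far more elaborate.

```python
# Minimal six-frame ORF scan (illustrative sketch, hypothetical names).
# Assumes ATG as the only start codon and TAA/TAG/TGA as stops.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons):
    """Return (strand, frame, start, end) for every ATG...stop ORF that has
    at least min_codons codons (start codon counted, stop codon not).
    Coordinates on the "-" strand refer to the reverse-complemented string."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):          # three frames per strand, six total
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i           # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None        # look for the next ORF in this frame
    return orfs


tiny = "ATGAAATAA"                      # a two-codon "peptide" plus stop
print(find_orfs(tiny, min_codons=2))    # the short ORF is reported
print(find_orfs(tiny, min_codons=100))  # a typical large cutoff hides it
```

The same sequence yields an ORF or nothing depending only on the cutoff, which is how very short peptides get overlooked.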
And then there is alternative splicing... and I don't know of any computer
program that attempts to annotate different intron/exon patterns for the same
Of course, bacterial genomes don't have introns (well, some prokaryotes do
appear to have self-splicing introns), and I don't think there is any evidence
for micro RNA in bacteria.
The mitochondrial genome is basically a greatly reduced bacterial genome, being
just ~17 KILO base pairs, whereas bacterial genomes are typically several MEGA
base pairs, so the annotation of the mitochondrial genome is easier, and one
can have more confidence in it.
In fact, a human could probably proofread the human mt genome in about a day.
It only has 13 protein-encoding genes, 2 rRNA sequences, and 22 tRNA
sequences, so you could check gene-by-gene to make sure the computer program
got it right. Doing this with E. coli would be a lot harder.
So yes, bacterial genomes are generally well annotated after being sequenced,
but I was overlooking bacteria.
You can't simply chuck the nucleotide sequence through a computer program and
call it fully annotated.
--- On Thu, 10/8/09, evelyn sobielski <firstname.lastname@example.org> wrote:
> From: evelyn sobielski <email@example.com>
> Subject: Re: Reverse-engineering the T. rex genome
> To: firstname.lastname@example.org
> Date: Thursday, October 8, 2009, 5:06 AM
> > > When you sequence a genome, at the most basic
> > all
> > > which of those nucleotides constitute a "gene" -
> > all
> > > for any given length of DNA you have to examine
> > different
> > > possibilities for open reading frames.
> > > Genes can be on either strand, and since codons
> > pairs
> > > of three nucleotides, there are three
> > for
> > > reading frames per strand.
> > > And its very hard to tell what codes for aa's,
> > > produces rna transcripts that aren't translated
> > still
> > > serve some function, and could be considered a
> > what
> > > nucleotides are promoters, which ones are
> > > binding sites, which ones are merely "spacers",
> > Computers can do all this (and more) standing on their
> > heads. Moreover, they can annotate genomes in a
> > high-throughput and fairly accurate fashion. Sure,
> > they get the start- and end-points of a gene wrong
> > sometimes, even when it's a protein-coding gene. For
> > example, if there are two methionine codons (ATG) in
> > proximity near the beginning of a gene, the software may
> > choose the wrong one as the start codon. And some
> > organisms have stop codons that are not stop codons at all
> > but encode for 'weird' amino acids (i.e., outside the
> > standard 20, like pyrrolysine), which causes the software to
> > terminate the gene prematurely. But by and large,
> > computers can annotate genomes (and metagenomes) at a
> > rapid rate.
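On the start-codon point quoted above, a minimal sketch of why the sequence alone leaves the choice open: it lists every in-frame ATG between a known stop codon and the previous in-frame stop. The function name is hypothetical, and standard stop codons are assumed.

```python
# Illustrative sketch: list all in-frame ATGs that could start the ORF
# ending at a given stop codon. Hypothetical helper, standard stops assumed.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(seq, stop_index):
    """Walk upstream from the stop codon at stop_index, one codon at a time,
    collecting ATG positions until the previous in-frame stop (or the start
    of the sequence) is reached."""
    seq = seq.upper()
    starts = []
    i = stop_index - 3
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break                      # an earlier stop closes the frame
        if codon == "ATG":
            starts.append(i)
        i -= 3
    return sorted(starts)


gene = "ATGGCTATGAAATAA"               # ATG GCT ATG AAA TAA
print(candidate_starts(gene, stop_index=12))
```

Both ATGs give a valid open reading frame, so picking the true start needs extra evidence (ribosome binding sites, homologs, or a human looking at it).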
> True, but it still needs manual proofreading. If you manage
> to code an application that speeds up the process, something
> like a smarter version of Apollo (a nice compact Java app
> that you can put on a USB stick and use on the fly
> http://apollo.berkeleybop.org/current/index.html), you
> might not have a top seller, but you would have the eternal
> gratitude of geneticists worldwide.
> A huge problem is exons/introns. IIRC, the smallest exons
> in vertebrates are smaller than their flanking
> splice-signals (either 3 AA or even 1 AA). Coupled wi
> about 150 splice variants of one gene in _C. elegans_, but
> that was some years ago), this is really an obstacle that
> only human proofreading can tackle. Good proofreading
> software will at least fairly confidently figure out all the
> possible splice sites. But then you have protein splicing,
> and there it becomes tricky.
> If you only need the genomic sequence, things become A LOT
> easier. There, the biggest problem is faulty sequencing.
> This can be sped up by sequencing several close relatives at
> the same time, because erroneous sequencer reads will stand
> out in an alignment. Of course, multiple close relatives
> might not be available; then parallel runs of the same
> sequence of the same organism (multiple specimens if you
> have the time and money) will do it.
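The idea in the quoted paragraph above, that erroneous reads stand out in an alignment, can be sketched as a simple column-wise majority check. This is only an illustration: the function name and the sample data are made up, and the input is assumed to be already aligned, with equal-length sequences and at least three of them.

```python
# Illustrative sketch: flag bases where exactly one aligned sequence
# disagrees with all the others. Hypothetical names and toy data.
from collections import Counter


def flag_outlier_bases(aligned):
    """aligned maps a name to a pre-aligned sequence (all the same length).
    Returns (column, name, base, majority_base) for each column in which
    all sequences but one agree."""
    names = list(aligned)
    if len(names) < 3:
        raise ValueError("need at least three sequences for a majority")
    length = len(aligned[names[0]])
    if any(len(aligned[n]) != length for n in names):
        raise ValueError("sequences must be aligned to equal length")
    flagged = []
    for col in range(length):
        column = {n: aligned[n][col] for n in names}
        majority_base, count = Counter(column.values()).most_common(1)[0]
        if count == len(names) - 1:    # a single dissenting sequence
            for name, base in column.items():
                if base != majority_base:
                    flagged.append((col, name, base, majority_base))
    return flagged


runs = {"run1": "ACGTAC", "run2": "ACGTAC", "run3": "ACCTAC"}
print(flag_outlier_bases(runs))
```

A flagged base is only a candidate error. Between close relatives it may be a genuine difference, which is why parallel runs of the same organism are the cleaner case.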