Saturday, May 22, 2010

Junk DNA is still junk

Part of the series, "Blogging on Peer-Reviewed Research" at The Panda's Thumb:

The ENCODE project made a big splash a couple of years ago: it is a huge effort not just to determine the sequence of a stretch of human DNA, but to analyze and annotate it and figure out what it is doing. One of the very surprising results was that, in the regions analyzed, almost all of the DNA was transcribed into RNA, which sent the creationists and the popular press into unwarranted flutters of excitement that maybe all that junk DNA wasn't junk at all, if enzymes were busy copying it into RNA. That was an erroneous assumption; as John Timmer pointed out, the genome is a noisy place, and coupled with the observation that the transcripts were not evolutionarily conserved, it suggested that these were non-functional transcripts.

I felt the same way. ENCODE was spitting up an anomalous result, one that didn't fit with any of the other data about junk DNA. I suspected a technical artifact, or an inability of the methods used to properly categorize low-frequency accidental transcription in the genome.

Creationists thought it was wonderful. They detest the idea of junk DNA — that the gods would intentionally scatter wasteful garbage throughout our precious genome was unthinkable, so any hint that it might actually do something useful is enthusiastically seized upon as evidence of purposeful design.

Well, score one for the more cautious scientists, and give the creationists another big fat zero (I think the score is somewhere in the neighborhood of a number so large it requires scientific notation for the scientists, against a nice, clean, simple zero for the creationists). A new paper has come out that analyzes transcripts from the human genome using a new technique, and, uh-oh, it looks like most of the early reports of ubiquitous transcription were wrong.

Here's the authors' summary:
The human genome was sequenced a decade ago, but its exact gene composition remains a subject of debate. The number of protein-coding genes is much lower than initially expected, and the number of distinct transcripts is much larger than the number of protein-coding genes. Moreover, the proportion of the genome that is transcribed in any given cell type remains an open question: results from "tiling" microarray analyses suggest that transcription is pervasive and that most of the genome is transcribed, whereas new deep sequencing-based methods suggest that most transcripts originate from known genes. We have addressed this discrepancy by comparing samples from the same tissues using both technologies. Our analyses indicate that RNA sequencing appears more reliable for transcripts with low expression levels, that most transcripts correspond to known genes or are near known genes, and that many transcripts may represent new exons or aberrant products of the transcription process. We also identify several thousand small transcripts that map outside known genes; their sequences are often conserved and are often encoded in regions of open chromatin. We propose that most of these transcripts may be by-products of the activity of enhancers, which associate with promoters as part of their role as long-range gene regulatory sites. Overall, however, we find that most of the genome is not appreciably transcribed.
So, basically, they directly compared the technique used in the ENCODE analysis (the "tiling" microarray analysis) to more modern deep sequencing methods, and found that the old results were mostly artifacts of the protocol. They also directly examined the pool of transcripts produced in specific tissues, and asked what proportion of them came from known genes, and what part came from what has been called the "dark matter" of the genome, or what has usually been called junk DNA. The cell's machinery to transcribe genes turns out to be reasonably precise!
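To make that bookkeeping a little more concrete, here is a minimal sketch in Python of the kind of classification involved: given annotated gene intervals and a set of transcripts detected by sequencing, sort each transcript into "known gene," "near a known gene," or "intergenic" bins. This is not the paper's actual pipeline; the coordinates and the 5 kb "near" window are made-up numbers purely for illustration.

# Hypothetical example: classify transcripts relative to annotated genes.
GENES = [(10_000, 25_000), (60_000, 80_000)]   # (start, end) of known genes, in bp
TRANSCRIPTS = [(12_000, 14_000),               # falls inside a known gene
               (26_000, 27_500),               # just downstream of one
               (120_000, 121_000)]             # far from any gene
NEAR_WINDOW = 5_000                            # bp; arbitrary cutoff for "near"

def overlaps(a, b):
    # True if the two intervals share at least one base.
    return a[0] < b[1] and b[0] < a[1]

def gap(a, b):
    # Distance in bp between two non-overlapping intervals (0 if they touch or overlap).
    return max(b[0] - a[1], a[0] - b[1], 0)

def classify(tx):
    if any(overlaps(tx, g) for g in GENES):
        return "known gene"
    if any(gap(tx, g) <= NEAR_WINDOW for g in GENES):
        return "near known gene"
    return "intergenic"

for tx in TRANSCRIPTS:
    print(tx, "->", classify(tx))

Run on the toy data above, the three transcripts come out as "known gene," "near known gene," and "intergenic," which is roughly the three-way split the paper reports for real RNA-seq data: most transcription maps to genes or their immediate neighborhoods, and comparatively little to the genomic dark matter.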
