The ENCODE Project publishes new genomic insights in special issue of Genome Research

September 05, 2012

Genome Research publishes online and in print today a special issue dedicated to The ENCODE (ENCyclopedia Of DNA Elements) Project, whose goal is to characterize all functional elements in the human genome. Since the completion of the pilot phase of the project in 2007, covering 1% of the genome, The ENCODE Consortium has fanned out across the genome to study function and regulation on an unprecedented scale. This special issue presents novel findings, methodologies, and resources from ENCODE that bring extensive insight to gene regulation and set the stage for future discoveries. The issue also contains commentary and perspectives on how our views of the genome have changed as a result of The ENCODE Project. The entire issue will be freely available online on September 6 to coordinate with additional ENCODE Consortium publications in Nature, Genome Biology, and other journals.

1. GENCODE presents the most detailed annotation of the genome yet

Since the completion of the pilot phase of The ENCODE Project in 2007, it has been evident that there is much more to a gene than just a sequence that codes for protein, changing our concept of what defines a gene. We now know that the genome is not a set of discrete genes, but rather a complex system of genes and regulatory regions, much of which is transcribed into RNA, including many RNAs that do not code for protein but have critical cellular functions.

When The ENCODE Project was launched, a subgroup of the project called The GENCODE Consortium was established to accurately map and annotate these complex features across the human genome, by both manual curation and computational methods. In this special issue, Harrow and colleagues of The GENCODE Consortium present the latest release of GENCODE gene data, describing a wealth of new information that exceeds the depth of annotation of other community resources.

Also in this issue are detailed reports of experimental validations to complement the GENCODE gene data and novel strategies for further annotating the genome. Howald and colleagues developed the RT-PCR-seq method to show that a substantial portion of exons, the protein-coding regions of genes retained by splicing, are not well annotated by unbiased RNA-sequencing alone, requiring a more targeted strategy in combination.

GENCODE has mapped more than 9,500 long non-coding RNAs (lncRNAs), but until now, only about 100 have been assigned a cellular function. lncRNAs, which are transcribed in a range of human tissues and play roles in gene regulation, are particularly interesting because they do not seem to be as well conserved evolutionarily as genes that code for proteins. Derrien et al. have analyzed the GENCODE lncRNA annotations, integrating the lncRNA data with other ENCODE transcriptome and epigenome data, presenting the most comprehensive lncRNA annotation to date. The authors show that approximately one-third of lncRNAs have arisen in the primate lineage, suggesting that there may be important lncRNA functions yet to be discovered.

2. ENCODE studies clarify the murky world of RNAs

The ENCODE Project's efforts to annotate the genome include the sequencing of RNA, the message transcribed from DNA to code for proteins and perform other cellular functions. Splicing can produce different forms of an RNA molecule that have varied biological functions, but the mechanism and timing by which splicing occurs across the genome have remained poorly understood. Previous studies have shown that splicing can occur while the RNA is still being transcribed from its template.

Now, analyses by The ENCODE Consortium are shedding light on the scale of co-transcriptional splicing genome-wide. In this issue, Tilgner and colleagues analyzed sequencing data from RNA isolated in different regions of the cell, allowing them to define splicing events at different stages and measure which splicing events are occurring during transcription. They found that most RNAs are being spliced while they are transcribed, and interestingly, for lncRNAs, splicing occurs late, and in some cases, not at all.

In previous studies, researchers have found that another well-known class of small regulatory RNAs, called microRNAs (miRNAs), are in some cases generated by splicing (called mirtrons), in addition to the typical miRNA biogenesis pathway. Recently, hundreds of mirtrons were identified in model organisms, but the prevalence of mirtrons in mammals remained unknown. Utilizing the wealth of small RNA datasets produced by The ENCODE Consortium and specialized analysis tools, a study by Ladewig et al. in this issue identified more than 200 mammalian mirtrons, confirming some that had been previously identified and showing evidence for many more that have not been previously characterized, and revealing new insight into the evolution and biology of miRNAs.

3. New views of the genome's regulatory landscape

The ENCODE Project continues to illuminate the complex process of gene regulation and chromatin, the combination of DNA and protein that packages DNA in the nucleus. The scale of new data from The ENCODE Project is allowing more accurate characterization than ever of the factors that regulate gene expression. In this issue, Cheng and colleagues have applied a statistical model to the large-scale ENCODE gene expression and transcription factor binding datasets to assess the accuracy of gene expression prediction. Among a number of insights into the predictability of gene expression, their work suggests that gene expression differences in different cell lines are directly reflected in quantitative differences in transcription factor binding levels, challenging the classic "on" or "off" transcription factor binding model.

In addition to studies investigating the myriad transcription factors in the cell, researchers in The ENCODE Consortium are also investigating the function of specific factors genome-wide. Wang et al. present a genome-wide analysis in diverse cell types of the binding pattern of CTCF, a well-known insulator that, when bound, can suppress the effect of regulatory enhancers on their target genes, playing a role in a number of fundamental genomic processes. The team found that the binding pattern of CTCF is surprisingly plastic yet reproducible, and is significantly different between normal and immortal cells, a finding that could have important implications in cancer.

ENCODE studies are spurring the development of new methods to integrate large genome-wide datasets of different types and to overcome the limitations of current techniques. For example, to investigate the relationship between nucleosome remodeling, histone modifications, and transcription factor binding that governs gene regulation, Kundaje and colleagues have developed a new tool called the Clustered Aggregation Tool (CAGT). The method was applied to datasets of chromatin marks and transcription factor binding to generate an extensive catalog of histone modifications and nucleosome positioning around bound transcription factors. The analysis indicated that both histone modifications and the positions of nucleosomes around transcription factor binding sites are highly heterogeneous, a surprising finding that suggests the features of many regulatory elements are asymmetrical.

4. Regulatory variation and the genetic basis of disease

The data and analyses of The ENCODE Project will help the research community to not only understand genome function, but also disease, with the aim of designing new strategies of treatment and prevention. Much effort in the last decade to understand the genetic basis of disease has been through genome-wide association studies. Many genetic variants found to associate with disease lie in non-coding regions and are relatively common in the population. This challenge in interpreting the data has highlighted the need to understand the influence of genetic variation on the function of genes and regulatory regions.

Two studies in this special ENCODE issue take a step forward in this effort, analyzing the potential functional consequences of individual genetic variants. Vernot and colleagues present the most comprehensive assessment of human regulatory variation yet, analyzing regulatory regions marked by DNase I hypersensitivity, an experimental property that indicates gene activity, together with the whole-genome sequences of 53 people. The authors found that individuals are more likely to have functionally relevant variants in regulatory regions of DNA than in protein-coding regions, and they provide further insights into patterns of regulatory variation at the individual and population levels.

The second study, by Boyle et al., utilized RegulomeDB, a database of ENCODE regulatory data among other sources, to analyze 69 whole-genome sequences and "score" genetic variants to isolate those that may be functionally important. The team identified thousands of potentially functional regulatory variants and estimated that the human genome harbors as much variation in regulatory regions as in protein-coding DNA, if not more. The authors expect this resource to facilitate the annotation of human genome sequences.

Please direct requests for pre-print copies of the manuscripts to Peggy Calicchia, Administrative Assistant, Genome Research (+1-516-422-4012). In addition to the 10 articles highlighted above, the following will also appear in the issue:

About Genome Research:

Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine. Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies.

About Cold Spring Harbor Laboratory Press:

Cold Spring Harbor Laboratory is a private, nonprofit institution in New York that conducts research in cancer and other life sciences and has a variety of educational programs. Its Press, originating in 1933, is the largest of the Laboratory's five education divisions and is a publisher of books, journals, and electronic media for scientists, students, and the general public.

Genome Research issues press releases to highlight significant research studies that are published in the journal.

Cold Spring Harbor Laboratory
