UMMS scientists lead effort to annotate human genome

July 29, 2020

WORCESTER, MA - UMass Medical School researchers Zhiping Weng, PhD, and Jill Moore, PhD, and MD/PhD students Michael Purcaro and Henry Pratt are lead authors on the latest publication of data from the ambitious ENCODE project. Collaborating with other members of the ENCODE consortium, the UMMS team used computational biology to identify functional elements in the human genome.

These elements act as switches, controlling when and where genes are turned on and how they are tuned. Results from their data analysis, published in the latest issue of Nature, identified 926,535 human candidate cis-regulatory elements (cCREs), which are regions of noncoding DNA that control neighboring genes. The full data set is now available to scientists in visual form through a web tool also developed by the team.

"There are 3 billion base pairs in our genome and not every one of them has a known function," said Dr. Weng, the Li Weibo Chair in Biomedical Research, professor of biochemistry & molecular pharmacology and director of the Program in Bioinformatics & Integrative Biology. "Identifying and annotating the specific regions of DNA that help control our genes is key to understanding the complexity of the genome and how it works."

Only about 20,000 genes make up the protein coding portion of the human genome. Genes can be thought of as the primary workhorses of the genome, carrying instructions for making proteins, the large, complex molecules that do most of the work in cells and that are required for the body's tissues and organs to do their respective jobs. Genes have been methodically studied down to the specific genetic code with which they encode their instructions. However, this leaves large swaths of DNA outside of these protein coding areas, many of which are known to affect health and promote disease.

"If our genome is like a car, then the protein coding part of the car is the engine," said Weng. "It propels us forward. How we control and make use of that engine--accelerating, turning, braking--is controlled by other mechanisms. In the genome, one family of these mechanisms is the cis-regulatory elements that promote and enhance, turn on or off, and fine-tune our genes."

Established in 2003, the ENCODE project--short for Encyclopedia of DNA Elements--is a global effort to understand how the human genome works. The goal is to develop an annotated encyclopedia of the functional elements--regions of DNA that code for molecular products or biochemical activities with roles in gene regulation--contained in the human genome. While much is known about protein coding genes, they represent only 2 percent of the entire genome. Far less is known about the other 98 percent of the genome, some of which helps control these genes. Working as an integral part of the ENCODE consortium during Phase III of the project, the UMMS team established a registry of nearly a million candidate DNA "switches" from the human genome. Together these candidates cover 7.8 percent of the genome and could potentially play an important role in how genes work.

The human body is made up of thousands of different cell types--liver cells, skin cells, neurons. Although all of these cells carry identical sets of DNA, these diverse cells carry out very different functions by using the information encoded in the genome differently. The DNA regions that turn genes on or off and tune the exact levels of activity are responsible for this diversity. They drive the formation of different cell types and control how they function in the body.

To find the different switches that lead to such a diverse array of cell types, the more than 500 scientists who make up ENCODE studied sets of biochemical features that are associated with the genetic switches that control genes. In total, researchers performed nearly 6,000 biochemical experiments (4,834 involving human samples and 1,158 with mouse samples). They measured chromatin accessibility, histone modifications, DNA methylation and chromatin looping, and ran a host of other assays, to pinpoint regions of the genome where chemical reactions associated with regulatory activity were occurring. Performed in more than 500 different cell types, these experiments yielded millions of locations in the human genome where these regulatory switches could potentially reside, from which the UMMS team established the Registry of cCREs.

The hope is that scientists will use these candidate areas to help establish potential links between regulatory switches and disease. For example, the ENCODE data could be employed to provide new insights into genome-wide association studies, which connect areas outside of protein-coding genes with genetic diseases, explained Dr. Moore, a bioinformatician in the Weng Lab and project manager of the ENCODE Data Analysis Center.

Of the almost 1 million human cCREs identified, Weng and ENCODE collaborators tested 150 using functional assays to see if genetic changes in these areas might impact health. One area of interest, which resides near the neural gene AGAP1 and has been associated with schizophrenia, was shown to have regulatory activity in the brains of embryonic mice. Further functional testing can be performed on these elements to explore how and why they impact disease. Scientists can also compare the candidate areas against their own genetic studies of health and disease. The Weng lab leads such an effort in the PsychENCODE Consortium, a large-scale collaborative project like ENCODE that focuses on the role of regulatory elements in human brain development and psychiatric disorders.

To make use of all this data, Purcaro and Pratt developed online resources to share this information with members of the scientific community. SCREEN, short for Search Candidate cis-Regulatory Elements by ENCODE, allows scientists to visualize and interactively search the 926,535 human cCREs derived from the ENCODE data, along with ENCODE data and other rich annotations in more than one thousand biological samples.

"Over the last 10 years, genome-wide association studies into disease have identified many areas of potential interest outside of the protein coding genes," said Weng. "This tool gives scientists a new and powerful way to explore if some of those disease-causing areas of the genome are in regulatory regions."

The full ENCODE III findings are included in a collection of 14 papers in Nature, Nature Methods and Nature Communications, as well as a corresponding perspective piece in Nature by Weng and colleagues.

University of Massachusetts Medical School
