New annotated database sifts through mountains of sequencing data to find gene promoters

December 22, 2010

Researchers at The Wistar Institute announce the release of an online tool that will help scientists find "gene promoters"--regions along a DNA strand that tell a cell's transcription machinery where to start reading in order to create a particular protein. The Mammalian Promoter Database (MPromDb) integrates the genome sequencing data generated at Wistar with publicly available data on human and mouse genomics. MPromDb pinpoints known promoters and predicts where new ones are likely to be found, the researchers say.

"Several complete genome sequences are available, including highly accurate assembled sequences from more than 1,000 individuals from the '1000 Genomes Project,' with the goal of providing a comprehensive resource on human genetic variation and guiding us into the personal genomics era," said Ramana V. Davuluri, Ph.D., associate professor in Wistar's Molecular and Cellular Oncogenesis Program and associate director of The Wistar Institute Center for Systems and Computational Biology. "With this information, researchers can design personalized diagnostics and therapeutics or delve deeper into the study of gene regulation than previously thought possible."

Davuluri and his colleagues published details of how they built MPromDb in the journal Nucleic Acids Research, available online now.

Contrary to what was once the textbook view of genetics, one gene may not encode just one protein. In fact, scientists now know that a single gene may encode multiple versions of a given protein--called the protein's isoforms--which allows cells to make almost 100,000 distinct proteins even though our DNA contains only about 20,000 protein-coding genes. As the body grows in the womb, cells may use different isoforms at different stages of development. Likewise, different adult cells may use different isoforms of a protein depending on cell type, such as a neuron versus a skin cell.

"We have evolved this beautiful system where our DNA creates tremendous diversity from a limited set of genetic instructions," Davuluri said. "Recent evidence shows that at least half of all of our genes have alternative promoters that allow cells to make transcript variants and protein isoforms."

Earlier reports from the Davuluri laboratory showed that nearly 40 percent of genes use alternative promoters to create protein isoforms. According to Davuluri, integrating this information with data from other studies should reveal significantly more of these alternative promoters.

"Much of the genetic variation occurs outside protein coding regions, such as in gene regulatory regions," Davuluri said. "MPromDb provides context for data in the form of gene promoter annotations that can tell you where and when our bodies make a particular protein variant."

MPromDb mines its information from huge databases maintained by national and international consortiums of researchers, such as the Gene Expression Omnibus (GEO), maintained by the National Center for Biotechnology Information, and the ENCyclopedia of DNA Elements (ENCODE), run by the National Human Genome Research Institute. Essentially, MPromDb looks for key DNA sequences that could be potential binding sites for RNA polymerase II, the enzyme that creates the RNA transcript that the cell later translates into protein. The current database contains information on over 42,000 human promoters found in six different cell types and over 48,000 mouse gene promoters found in 10 different cell types.

"In fact, scientists are so good at generating this sort of information using next generation sequencing methods, that they collect information far in excess of what they might need for a given experiment or project," Davuluri said. "This information all ends up in places like GEO, waiting to be discovered by groups like ours."

According to Davuluri, the Wistar Center for Systems and Computational Biology plans to expand MPromDb to include epigenetic data (information on modifications to DNA that affect gene regulation), protein-DNA interaction data, and genetic variation data for both humans and mice.
-end-
Funding for this project comes from the National Human Genome Research Institute of the National Institutes of Health, the American Cancer Society, and the Philadelphia Healthcare Trust. Davuluri holds the Philadelphia Healthcare Trust Professorship at The Wistar Institute.

Co-authors of this study include Wistar Center for Systems and Computational Biology researchers Ravi Gupta, Ph.D.; Anirban Bhattacharyya, Ph.D.; Francisco J. Agosto-Perez; and Priyankara Wickramasinghe, Ph.D.

The Wistar Institute
