Mining biotech's data mother lode

December 20, 2005

The BioGrid project brought together six partners from the UK, Germany, Cyprus and The Netherlands to address one of the key problems facing the life sciences today.

"How to integrate the huge volume of disparate data - on gene expression, protein interactions and the vast output of literature both inside and outside laboratories - to find out what is important," says Dr Michael Schroeder, Professor of the Bioinformatics group at Dresden Technical University and coordinator of this IST-funded project. "I attended a workshop recently, held by the W3 consortium, and many of the companies there said that this was the biggest problem they face."

Currently, pharmaceutical and biotech companies produce vast quantities of raw data on the problems that interest them. Microarrays process thousands of samples to discover which genes are over-expressing. These over-expressing genes - sometimes numbering in the thousands, too - create proteins. The researchers then need to discover what interactions are taking place among all the different proteins created by the over-expressing genes. This is not trivial.

Once a researcher has identified protein interactions, they then need to search the company intranet to see what relevant work other company labs have produced. Finally, the researcher must search academic journals for relevant papers. Currently PubMed, the most important public literature database available, has 15,000,000 entries, and the number is growing every day. Finding relevant data there is, again, not a trivial task.

Dr Schroeder gives an example. "The medical faculty here were studying pancreatic tumours. They found 1,000 genes over-expressing. Using our software they were able to find, among others, three protein interactions that were particularly relevant. Using our literature search ontology they were able to discover that two of these interactions were novel. They are now going to study these novel interactions more closely," he says.

BioGrid explained

The project aims to help companies integrate all the data they need to make relevant discoveries by using a BioGrid. A BioGrid is essentially a data and computational Grid created through a suite of tools developed by the project.

Here's how it works. One element of the software suite analyses over-expressing genes discovered during microarray assays to establish which proteins they encode. This uses standard techniques. A second analysis tool in the suite predicts which protein-protein interactions may be taking place. This is novel. When a gene encodes a protein, the protein folds up into a unique shape, forming a 3D structure. This structure can only interact, or fit, with some proteins, but not others, like pieces of a jigsaw puzzle.

BioGrid's protein interaction software includes a database of the 20,000 known protein structures and uses that database to identify which ones could potentially interact, among the thousands of proteins created by the over-expressing genes. Once interesting potential protein interactions are known, BioGrid's ontology-based search technology can mine company or journal data for any relevant information.
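The flow described above can be illustrated with a small Python sketch. It is a toy model and not BioGrid's software: every gene, protein and family name is hypothetical, the "structure database" is reduced to an assumed table of structural families that fit together, and the literature index is an assumed set of protein pairs that have already been reported.

```python
from itertools import combinations

# Hypothetical toy data: none of these identifiers come from BioGrid.
GENE_TO_PROTEIN = {
    "geneA": "protA",
    "geneB": "protB",
    "geneC": "protC",
    "geneD": "protD",
}

# Assumed mapping from a protein to a structural family with a known 3D
# structure. protD is deliberately missing: no known structure, no prediction.
PROTEIN_TO_FAMILY = {"protA": "fam1", "protB": "fam2", "protC": "fam3"}

# Assumed pairs of families whose known structures fit together.
COMPATIBLE_FAMILIES = {
    frozenset(("fam1", "fam2")),
    frozenset(("fam2", "fam3")),
}

# Assumed literature index: protein pairs already reported together.
KNOWN_IN_LITERATURE = {frozenset(("protA", "protB"))}


def candidate_interactions(genes):
    """Return protein pairs whose structural families are marked compatible."""
    proteins = sorted(GENE_TO_PROTEIN[g] for g in genes if g in GENE_TO_PROTEIN)
    candidates = []
    for p, q in combinations(proteins, 2):
        fam_p = PROTEIN_TO_FAMILY.get(p)
        fam_q = PROTEIN_TO_FAMILY.get(q)
        if fam_p is None or fam_q is None:
            continue  # skip proteins without a known structure
        if frozenset((fam_p, fam_q)) in COMPATIBLE_FAMILIES:
            candidates.append((p, q))
    return candidates


def split_by_novelty(candidates):
    """Separate candidates already in the literature index from the rest."""
    known = [c for c in candidates if frozenset(c) in KNOWN_IN_LITERATURE]
    novel = [c for c in candidates if frozenset(c) not in KNOWN_IN_LITERATURE]
    return known, novel


if __name__ == "__main__":
    found = candidate_interactions(["geneA", "geneB", "geneC", "geneD"])
    known, novel = split_by_novelty(found)
    print("known:", known)
    print("novel:", novel)
```

In this toy run, the pair protA/protB is classed as already known and protB/protC as novel, which mirrors the kind of result Dr Schroeder describes for the pancreatic tumour study. A real system would work from actual 3D structures and a proper literature search; the lookup tables here only stand in for them.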

Linking all these software tools together is a rules-based Java scripting language called Prova, also developed by the BioGrid team. It is the glue that sticks the gene expression, protein interaction and ontology-based literature analysis together into an integrated, cohesive unit. "It's an open source language, available at www.prova.ws, and about 20 groups are using it around the world right now. We made it open source because you need to develop a community to keep a programming language alive," says Dr Schroeder.
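To give a feel for what rules-based "glue" means, here is a minimal Python sketch of forward chaining over a shared set of facts. It is not Prova and does not use Prova syntax. The rule names, the fact keys and the placeholder actions are all assumptions made for illustration.

```python
def run_rules(facts, rules):
    """Fire each rule once its required facts exist, until nothing changes."""
    facts = dict(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for name, needs, produces, action in rules:
            if produces in facts:
                continue  # this rule's result is already available
            if all(key in facts for key in needs):
                facts[produces] = action(facts)
                fired.append(name)
                changed = True
    return facts, fired


# Hypothetical rules, listed out of order on purpose: the engine decides the
# execution order from the facts each rule needs, not from list position.
# Each action is a placeholder standing in for a real analysis tool.
RULES = [
    ("search_literature", ("interactions",), "papers",
     lambda f: [f"paper about {a}-{b}" for a, b in f["interactions"]]),
    ("predict_interactions", ("proteins",), "interactions",
     lambda f: list(zip(f["proteins"], f["proteins"][1:]))),
    ("encode_proteins", ("genes",), "proteins",
     lambda f: [g.replace("gene", "prot") for g in f["genes"]]),
]


if __name__ == "__main__":
    result, order = run_rules({"genes": ["geneA", "geneB", "geneC"]}, RULES)
    print(order)
    print(result["papers"])
```

The point of the sketch is the design choice: each tool is declared with what it needs and what it produces, and the rule engine works out the sequence, so new tools can be added without rewriting the pipeline.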
-end-
Contact:
Michael Schroeder
Professor in Bioinformatics
Biotec/Dept. of Computing, TU Dresden
Germany
Tel: +49-351-46340060
Email: ms@biotec.tu-dresden.de

Source: Based on information from BioGrid

IST Results
