Database of cancer records now available for research

September 21, 2005

Data on more than 22,000 cancer cases are now available for research by bona fide clinical and medical researchers. This repository is the first major output of the Clinical e-Science Framework (CLEF), an e-Science project funded by the Medical Research Council (MRC). Sophisticated security systems, also developed by CLEF, ensure secure and ethical access to the databank. Dr Catalina Hallett will demonstrate the query of the new database at the e-Science All Hands meeting in Nottingham on 20 September.

Patient records contain a wealth of information that could be very useful to medical research. To make this information accessible to researchers, however, it must be extracted from what is often written text and presented in such a way that it can be compared with data from scientific and other databases. CLEF has developed techniques to capture relevant information from text automatically and enter it into a database. The project has also implemented stringent access control, authentication and secure transmission protocols using sophisticated encryption standards to protect against accidental disclosures.

Professor Alan Rector, CLEF's director, said: "The CLEF repository is optimised to treat electronic healthcare records as an interactive knowledge source for academic researchers and clinicians to help them access the latest medical information. Once fully deployed, it will lead to previously unthinkable, rapid advances in healthcare research by enabling researchers to analyse data stored in a wide range of geographically-spread databases, on-line."

Professor David Ingram's team at University College London built the repository using a new method for importing and structuring data so that users can run population queries over longitudinal data sets. The CLEF repository supports the large-scale analysis of patient records in a Grid environment. It can handle complex queries whilst retaining the critical semantic, structural and medico-legal integrity of the data.

The process, developed in part by Professor Alan Rector's team at the University of Manchester, structures the source data in multiple steps, enabling users to put complex clinical questions to the repository. First, data is structured in a longitudinal format, then by clinical context, and finally by the actual type of data. Previously, the retrieval of similarly complex data would have required time-consuming manual search and data analysis. Using the work of Professor Rob Gaizauskas' team from the University of Sheffield, the CLEF system is able to extract key medical information from clinical records that are in a narrative format, such as medical letters, discharge summaries and radiology reports.
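The three structuring steps can be illustrated with a small sketch. This is not CLEF's implementation: the record layout, the field names and the helper functions `structure` and `patients_with` are all hypothetical, chosen only to show how nesting data by patient timeline, then clinical context, then data type makes a population query straightforward.

```python
from collections import defaultdict
from datetime import date

# Hypothetical flat source records (not real CLEF data):
# (patient_id, event_date, clinical_context, data_type, value)
RECORDS = [
    ("p1", date(2004, 3, 1), "diagnosis", "histology", "adenocarcinoma"),
    ("p1", date(2004, 1, 15), "investigation", "radiology", "chest x-ray: mass"),
    ("p2", date(2003, 7, 9), "diagnosis", "histology", "squamous cell carcinoma"),
    ("p1", date(2004, 4, 20), "treatment", "chemotherapy", "cycle 1"),
]


def structure(records):
    """Nest records as patient -> dated timeline -> context -> data type -> values."""
    repo = defaultdict(list)
    # Step 1: longitudinal format, one chronologically ordered timeline per patient.
    for patient_id, event_date, context, data_type, value in sorted(
        records, key=lambda r: (r[0], r[1])
    ):
        timeline = repo[patient_id]
        if not timeline or timeline[-1]["date"] != event_date:
            timeline.append({"date": event_date, "contexts": {}})
        # Step 2: group by clinical context. Step 3: group by type of data.
        contexts = timeline[-1]["contexts"]
        contexts.setdefault(context, {}).setdefault(data_type, []).append(value)
    return dict(repo)


def patients_with(repo, context, data_type):
    """Population query: patients with at least one entry of this context and type."""
    return sorted(
        patient_id
        for patient_id, timeline in repo.items()
        if any(data_type in event["contexts"].get(context, {}) for event in timeline)
    )


repo = structure(RECORDS)
print(patients_with(repo, "diagnosis", "histology"))
```

Once the data is nested this way, a question such as "which patients have a histology result recorded as part of a diagnosis?" becomes a single pass over the timelines rather than a manual search.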

A new, generic WYSIWYM ("What you see is what you mean") interface that was developed by Professor Donia Scott's team at The Open University enables users to pose complex clinical queries in natural language and receive answers in plain English text or simple tables and graphs. Users no longer need to learn "computer-speak" to communicate with an electronic database.

CLEF's future work includes extending its database and refining its use of knowledge resources to help both patients and professionals to access the right information and interpret scientific data. The project's aim is to provide user-friendly and secure tools to improve clinical and research practices, teaching methods and care management processes.
To attend the e-Science All Hands meeting, or the media briefing at 3pm on Wednesday 21 September, visit the conference website.


Professor Alan Rector, Director of CLEF Project, Department of Computer Science, University of Manchester, tel. +44 (0) 161 275 6188/6149

Dr Aniko Zagon, CLEF Industry Liaison, JEZZ Remedies Ltd., tel. 07970 130 681

Judy Redfearn, e-Science/Research communications officer, JISC/e-Science Core Programme, tel. 07768 356309


CLEF website
MRC website
UK e-Science Programme

Notes for editors

e-Science is the very large scale science that can be carried out by pooling access to very large digital data collections, very large scale computing resources and high performance visualisation held at different sites. A computing grid refers to geographically dispersed computing resources that are linked together by software known as middleware so that the resources can be shared. The vision is to provide computing resources to the consumer in a similar way to the electric power grid. The consumer can access electric or computing power without knowing which power station or computer it is coming from.

The UK e-Science Programme is a coordinated £230M initiative involving all the Research Councils and the Department of Trade and Industry. It has also leveraged industrial investment of £30M. The Engineering and Physical Sciences Research Council manages the e-Science Core Programme, which is developing generic technologies, on behalf of all the Research Councils.

The UK e-Science Programme as a whole is fostering the development of IT and grid technologies to enable new ways of doing faster, better or different research, with the aim of establishing a sustainable, national e-infrastructure for research and innovation. Further information at

Engineering and Physical Sciences Research Council
