DNA palette code for time-series archival data storage

10.24.24 | Science China Press


This study was led by Dr. Yingjin Yuan (Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China) and Dr. Xiaoguang Tong (Department of Neurosurgery, Huanhu Hospital, Tianjin 300350, China). The research team focused on storing brain magnetic resonance imaging (MRI) data from clinical patients and proposed a coding scheme that enables efficient and reliable storage of digital information in DNA.

Brain MRI is highly valuable in clinical diagnosis, surgical planning, and treatment evaluation due to its non-invasive and high-precision nature. Throughout a patient's treatment journey, ongoing follow-ups and comparative analysis of historical images are crucial for detecting subtle yet significant changes in the patient's condition, thereby supporting personalized and precise medical interventions. For some patients, these datasets require stable storage for several decades or more. However, the large volume and extended retention periods of such medical data challenge current storage technologies. Similar time-series archival characteristics are observed in other scientific datasets, such as meteorological observations and planetary exploration data. These datasets, generated from continuous monitoring of specific subjects or regions, also require long-term, stable storage solutions.

DNA is considered a promising medium for addressing these data storage challenges. However, DNA-based data storage systems face several technical obstacles, including burst errors during DNA synthesis and sequencing, as well as the disordered spatial arrangement of the oligonucleotides (oligos) in the storage pool. As a result, reads come back in random order, with multiple error types and high error rates, all of which directly undermine the accuracy and reliability of DNA-based data storage.

To address these obstacles, the research team proposed a coding scheme called DNA Palette. Tailored to the characteristics of time-series archival data, the DNA Palette code uses unordered combinations of index-free oligos to construct a bijective mapping between binary information and oligos. Results from in vitro storage experiments using clinical brain MRI data, along with large-scale simulation tests on public MRI datasets (over 30,000 files, 10 GB), planetary science datasets, and meteorological datasets, demonstrate that the DNA Palette code offers high net information density, broad applicability, and reliable data recovery at low sequencing coverage.
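
To give a feel for how an unordered combination of oligos can carry binary data without per-oligo indexes, here is a minimal sketch in Python. It is not the DNA Palette code itself, whose construction is described in the paper. It uses a standard textbook technique, the combinatorial number system, which is a bijection between the integers 0 to C(n, k) - 1 and the k-element subsets of an n-element pool. The pool sequences, the function names, and the parameters (8 oligos, 3 chosen per block, 5 bits per block) are all assumptions made for illustration.

```python
from math import comb

# Hypothetical pool of 8 short oligo sequences (illustrative only).
POOL = ["ACGTAC", "TGCATG", "GATCGA", "CTAGCT",
        "AGTCAG", "TCAGTC", "GTACGT", "CATGCA"]


def rank_to_combination(rank: int, n: int, k: int) -> list[int]:
    """Map an integer in [0, C(n, k)) to a k-subset of range(n)."""
    if not 0 <= rank < comb(n, k):
        raise ValueError("rank out of range")
    chosen = []
    for i in range(k, 0, -1):
        # Greedily pick the largest c with C(c, i) <= remaining rank.
        c = i - 1
        while comb(c + 1, i) <= rank:
            c += 1
        chosen.append(c)
        rank -= comb(c, i)
    return sorted(chosen)


def combination_to_rank(chosen) -> int:
    """Inverse map: a k-subset of integers back to its rank."""
    return sum(comb(c, i) for i, c in enumerate(sorted(chosen), start=1))


def encode_bits(bits: str, pool: list[str], k: int) -> set[str]:
    """Encode a bit string as an unordered set of k oligos from the pool."""
    return {pool[i] for i in rank_to_combination(int(bits, 2), len(pool), k)}


def decode_oligos(oligos, pool: list[str], n_bits: int) -> str:
    """Recover the bit string from the oligo set, in whatever order it is read."""
    position = {seq: i for i, seq in enumerate(pool)}
    rank = combination_to_rank([position[s] for s in oligos])
    return format(rank, f"0{n_bits}b")


# C(8, 3) = 56 >= 2**5, so each 3-oligo set can carry 5 bits.
stored = encode_bits("10110", POOL, k=3)
recovered = decode_oligos(stored, POOL, n_bits=5)
```

Because the decoder looks only at which oligos are present, the order in which they are sequenced does not matter. This sketch ignores synthesis and sequencing errors, which a practical scheme must also handle.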

National Science Review

10.1093/nsr/nwae321

Contact Information

Bei Yan
Science China Press
yanbei@scichina.com

How to Cite This Article

APA:
Science China Press. (2024, October 24). DNA palette code for time-series archival data storage. Brightsurf News. https://www.brightsurf.com/news/1EOJ502L/dna-palette-code-for-time-series-archival-data-storage.html
MLA:
"DNA palette code for time-series archival data storage." Brightsurf News, Oct. 24 2024, https://www.brightsurf.com/news/1EOJ502L/dna-palette-code-for-time-series-archival-data-storage.html.