AI deciphers long-range DNA signals behind RNA splicing

06.30.26 | The Institute of Medical Science, The University of Tokyo

Accurate RNA splicing is essential for gene expression and human health, yet predicting how DNA sequence variations affect splicing remains a major challenge. Although recent artificial intelligence (AI) models have improved splice site prediction, many struggle to capture regulatory signals located thousands of DNA bases away from the sites they influence. This limitation restricts our ability to understand disease-causing mutations and the complex mechanisms governing RNA processing, particularly in disorders ranging from genetic diseases to cancer.

To address these challenges, Professor Kenta Nakai of the Human Genome Center at the Institute of Medical Science and Ph.D. student Yuna Miyachi of the Department of Computer Science, Graduate School of Information Science and Technology, both at The University of Tokyo, Japan, developed SpliceSelectNet (SSNet), a hierarchical Transformer-based deep learning framework for splice site prediction. Their study, published in Nucleic Acids Research on June 22, 2026, introduces a computational approach that can analyze DNA sequences of up to 100,000 base pairs while maintaining single-nucleotide resolution. By combining local and global attention mechanisms, SSNet efficiently captures both nearby and distant regulatory signals that contribute to RNA splicing.

Many existing computational tools struggle to model long-range genomic interactions because the computational cost increases rapidly with sequence length. To overcome this limitation, SSNet divides long DNA sequences into smaller blocks, analyzes local patterns within each block, and then integrates information across the entire sequence through a hierarchical attention process. This design allows the model to preserve dense attention while remaining computationally efficient. In addition, the researchers enabled visualization of attention scores, allowing them to identify which DNA regions the model considered important during prediction.
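The block-then-integrate idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SSNet's implementation: the function names, the mean pooling used to summarize each block, the additive broadcast of global context back to every position, and the 1,000-base block size in the cost example are all hypothetical choices. Attention is dense within each block and again across one summary vector per block, and the returned block-level weights are the kind of scores that could be visualized to see which regions the model attends to.

```python
import math
from typing import List, Tuple

Vector = List[float]

def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dense_attention(queries, keys, values):
    """Scaled dot-product attention where every query sees every key.
    Returns (outputs, weights); weights[i] is query i's distribution over keys."""
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values))
                        for j in range(len(values[0]))])
        weights.append(w)
    return outputs, weights

def hierarchical_attention(x: List[Vector], block_size: int):
    """Hypothetical two-level scheme (not SSNet's code): dense attention
    inside blocks, then dense attention across block summaries."""
    blocks = [x[i:i + block_size] for i in range(0, len(x), block_size)]
    # 1) Local level: each position attends to every position in its own block.
    local = [dense_attention(b, b, b)[0] for b in blocks]
    # 2) Summarize each block as the mean of its locally updated vectors (assumption).
    summaries = [[sum(col) / len(b) for col in zip(*b)] for b in local]
    # 3) Global level: every block summary attends to every other summary.
    global_ctx, block_weights = dense_attention(summaries, summaries, summaries)
    # 4) Add each block's global context back to all of its positions,
    #    so the output still has one vector per nucleotide.
    out = [[a + c for a, c in zip(vec, ctx)]
           for b, ctx in zip(local, global_ctx) for vec in b]
    return out, block_weights

def pair_counts(seq_len: int, block_size: int) -> Tuple[int, int]:
    """Query-key pairs scored by full attention vs. the two-level scheme."""
    full_blocks, remainder = divmod(seq_len, block_size)
    local_pairs = full_blocks * block_size ** 2 + remainder ** 2
    n_blocks = full_blocks + (1 if remainder else 0)
    return seq_len ** 2, local_pairs + n_blocks ** 2
```

For example, pair_counts(100_000, 1_000) returns (10000000000, 100010000): with hypothetical 1,000-base blocks, the two-level scheme scores roughly 100 times fewer query-key pairs than full attention over a 100,000-base input, which is the kind of saving that makes such long windows tractable.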

The model was trained and evaluated using several large genomic datasets and benchmarked against leading splice prediction systems. Across multiple validation datasets, SSNet achieved state-of-the-art performance for splice site prediction and aberrant splicing detection. The researchers also showed that the model could capture the effects of distant regulatory sequences beyond the effective range of conventional convolutional neural network approaches. In simulations using the DMD gene and evaluations of pathogenic variants from ClinVar, SSNet maintained sensitivity to regulatory signals located many thousands of base pairs from the affected splice site.
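Why convolutional models have a limited effective range can be illustrated by computing the receptive field of a stack of stride-1 dilated 1D convolutions: each layer with kernel size k and dilation d widens the window by (k - 1) * d bases. The layer configuration below is hypothetical and is not taken from the study or from any specific published model; it simply yields a 10,001-base window, so an element 20,000 bases from the splice site falls outside it, whereas attention over a 100,000-base input can in principle connect any two positions.

```python
from typing import List, Tuple

# (kernel_size, dilation) for each stride-1 1D convolution layer.
Layer = Tuple[int, int]

def receptive_field(layers: List[Layer]) -> int:
    """Number of input bases that can influence one output position."""
    return 1 + sum((k - 1) * d for k, d in layers)

def within_reach(layers: List[Layer], distance: int) -> bool:
    """True if a base `distance` positions from the target lies inside a
    receptive field centered on the target (symmetric padding assumed)."""
    half_width = (receptive_field(layers) - 1) // 2
    return distance <= half_width

# Hypothetical dilated stack, chosen only to give a ~10 kb window.
stack = [(11, 1)] * 8 + [(11, 4)] * 8 + [(21, 10)] * 8 + [(41, 25)] * 8

print(receptive_field(stack))       # 10001
print(within_reach(stack, 4_000))   # True
print(within_reach(stack, 20_000))  # False
```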

" The key achievement of this work is that we successfully modeled ultra-long-range genomic interactions while preserving high computational efficiency and single-nucleotide resolution," says Prof. Nakai. " We also demonstrated that the regions highlighted by the model closely correspond to biologically meaningful regulatory elements, helping to bridge predictive accuracy and biological interpretability."

The study suggests that hierarchical Transformer architectures could become valuable tools beyond splice site prediction. The same framework may support future research into promoter-enhancer interactions, three-dimensional genome organization, and broader DNA language models. The team also anticipates collaborations in clinical and genomic medicine, where the technology could help screen non-coding variants that are currently of uncertain significance. In pharmaceutical research, the approach could assist in designing oligonucleotide therapeutics that target abnormal splicing.
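Variant screening with a per-position splice predictor is often framed as a delta score: score the reference and the alternative sequence, then take the largest change in predicted splice-site probability. The sketch below assumes a generic predict(seq) callable that returns one probability per position; toy_donor_predictor is a hypothetical stand-in that simply flags GT dinucleotides, not a real splice model, and the maximum-absolute-difference summary is one common convention, not necessarily the one used for SSNet.

```python
from typing import Callable, List

# Any model mapping a sequence to one splice-site probability per position.
Predictor = Callable[[str], List[float]]

def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based `pos` replaced by `alt`."""
    if seq[pos] == alt:
        raise ValueError("alternative allele matches the reference base")
    return seq[:pos] + alt + seq[pos + 1:]

def delta_score(predict: Predictor, seq: str, pos: int, alt: str) -> float:
    """Largest absolute change in predicted probability at any position."""
    ref_scores = predict(seq)
    alt_scores = predict(apply_snv(seq, pos, alt))
    return max(abs(a - r) for r, a in zip(ref_scores, alt_scores))

def toy_donor_predictor(seq: str) -> List[float]:
    """Hypothetical stand-in, NOT a real model: 1.0 where a GT dinucleotide starts."""
    return [1.0 if seq[i:i + 2] == "GT" else 0.0 for i in range(len(seq))]

ref = "AAGGTAAGTA"
print(delta_score(toy_donor_predictor, ref, 4, "C"))  # 1.0 (breaks the GT at 3-4)
print(delta_score(toy_donor_predictor, ref, 0, "C"))  # 0.0 (no GT affected)
```

In practice the toy predictor would be replaced by a trained long-context model, so that a variant thousands of bases away can still shift the predicted probability at a splice site.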

"Many existing AI models for DNA analysis were adapted from natural language processing, but DNA has fundamentally different properties ," explains Ms. Miyachi. " By redesigning the architecture to account for long-range genomic interactions and strict sequence resolution, we aimed to create a system better suited to biological reality."

By enabling accurate and interpretable analysis of genomic regions spanning up to 100,000 base pairs, SSNet represents a significant advance in computational genomics. Its ability to capture long-range regulatory signals while maintaining single-nucleotide precision provides a powerful new framework for studying RNA splicing, interpreting disease-associated variants, and advancing the development of precision genomic medicine.

***

Reference
Authors: Yuna Miyachi¹ and Kenta Nakai¹,²
DOI: 10.1093/nar/gkag625
Affiliations: ¹ Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
² Laboratory of Functional Analysis in Silico, Human Genome Center, the Institute of Medical Science, The University of Tokyo, Tokyo, Japan

About The Institute of Medical Science, The University of Tokyo
The Institute of Medical Science, The University of Tokyo (IMSUT), established in 1892 as the Institute of Infectious Diseases and renamed IMSUT in 1967, is a leading research institution with a rich history spanning over 130 years. It focuses on exploring biological phenomena and disease principles to develop innovative strategies for disease prevention and treatment. IMSUT fosters a collaborative, interdisciplinary research environment and is known for its work in genomic medicine, regenerative medicine, and advanced medical approaches like gene therapy and AI in healthcare. It operates core research departments and numerous specialized centers, including the Human Genome Center and the Advanced Clinical Research Center, and is recognized as Japan’s only International Joint Usage/Research Center in life sciences.

About Professor Kenta Nakai from the Institute of Medical Science, The University of Tokyo
Kenta Nakai is a Professor in the Department of Computer Science, Graduate School of Information Science and Technology, and the Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, the University of Tokyo, Japan. He earned his Ph.D. in Science from Kyoto University in 1992 and has since made significant contributions to bioinformatics and computational biology. In 2007, he served as President of the Japan Society of Bioinformatics. His research focuses on sequence analysis, genome analysis, and the development of computational methods for interpreting genome sequences. His work has had substantial impact, with an h-index of 51.

Funding information
This work was supported by JST SPRING (grant number: JPMJSP2108).

Article Information
Journal: Nucleic Acids Research
DOI: 10.1093/nar/gkag625
Method of research: Computational simulation/modeling
Subject of research: People
Article title: SpliceSelectNet: A Hierarchical Transformer-Based Deep Learning Model for Splice Site Prediction
Article publication date: 22-Jun-2026
Competing interests: No competing interests are declared.

Contact Information

Project Coordination Office
The Institute of Medical Science, The University of Tokyo
koho@ims.u-tokyo.ac.jp

How to Cite This Article

APA:
The Institute of Medical Science, The University of Tokyo. (2026, June 30). AI deciphers long-range DNA signals behind RNA splicing. Brightsurf News. https://www.brightsurf.com/news/LN2GPP41/ai-deciphers-long-range-dna-signals-behind-rna-splicing.html
MLA:
"AI deciphers long-range DNA signals behind RNA splicing." Brightsurf News, Jun. 30 2026, https://www.brightsurf.com/news/LN2GPP41/ai-deciphers-long-range-dna-signals-behind-rna-splicing.html.