Imagine squeezing your entire digital archive (photos, videos, everything) into something smaller than a sugar cube. DNA data storage, still an emerging frontier, offers precisely this promise: storage density millions of times greater than traditional methods, exceptional longevity, and significantly lower environmental costs. But until now, a glaring problem has stood between DNA storage and widespread practicality: painstakingly slow and error-ridden data retrieval. Researchers from the Technion’s Henry and Marilyn Taub Faculty of Computer Science have now tackled this problem, unveiling an AI-driven method that is three orders of magnitude faster than previous methods without sacrificing accuracy. The research was recently published in Nature Machine Intelligence.
DNA’s appeal as a storage medium is nothing new. Back in 2013, scientists in Denmark managed to extract intact DNA from a horse bone over 700,000 years old, while more recently, an international team retrieved mammoth DNA more than a million years old. By comparison, magnetic disks—today’s predominant storage method—are lucky to last a few decades.
Beyond longevity, DNA is also extraordinarily energy-efficient and compact. Global data centers currently gobble up 3% of the world's electricity and pump out roughly 2% of its total carbon emissions. DNA storage, however, could raise storage density to the point where a container that holds a single megabyte today could theoretically store 100 terabytes, effectively a 100-million-fold improvement.
But despite these compelling advantages, real-world adoption has faced stubborn obstacles. Writing (synthesizing) and reading (sequencing) DNA data both remain error-prone and cumbersome processes. DNA sequences regularly suffer from insertion, deletion, and substitution errors. In addition, because of the limits of synthesis technology, each sequence is stored as multiple copies that sit together in an unordered pool, so retrieval has to cope with misread copies and with copies that end up assigned to the wrong sequence.
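To make the error model concrete, here is a minimal Python sketch of a noisy channel with the three error types and the unordered pool of copies described above. The function names (`corrupt`, `noisy_reads`), the error rates, and the number of copies are illustrative assumptions, not values or code from the Technion study.

```python
import random

BASES = "ACGT"

def corrupt(strand, p_sub=0.01, p_ins=0.01, p_del=0.01, rng=random):
    """Return a noisy copy of `strand` (hypothetical error rates)."""
    out = []
    for base in strand:
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))      # insertion before this base
        r = rng.random()
        if r < p_del:
            continue                           # deletion: base is dropped
        if r < p_del + p_sub:
            # substitution: replace with one of the other three bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)                   # base copied correctly
    return "".join(out)

def noisy_reads(strands, copies=5, seed=0, **rates):
    """Produce `copies` noisy reads per strand and shuffle them into one pool."""
    rng = random.Random(seed)
    reads = [corrupt(s, rng=rng, **rates) for s in strands for _ in range(copies)]
    rng.shuffle(reads)                         # the pool carries no ordering
    return reads

if __name__ == "__main__":
    pool = noisy_reads(["ACGTACGTAC", "TTGACCATGA"], copies=3, p_sub=0.05)
    for read in pool:
        print(read)
```

The shuffle at the end is the point of the sketch: a reader receives only the pool, so it must first work out which reads belong together before it can reconstruct any single sequence.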
The Technion team—including Ph.D. student Omer Sabary and researchers Dr. Daniella Bar-Lev, Dr. Itai Orr, Prof. Eitan Yaakobi, and Prof. Tuvi Etzion—set out to tackle these challenges head-on. Their solution? DNAformer: a revolutionary AI-driven approach that dramatically accelerates data reading speeds and vastly improves reliability.
DNAformer leverages transformer neural networks—a type of AI architecture adept at recognizing patterns in vast amounts of data—to reconstruct accurate DNA sequences from erroneous copies. To prepare for this ambitious task, the researchers trained DNAformer on simulated datasets, generated by a custom-built simulator specifically designed to mimic realistic DNA sequencing errors.
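For intuition about what "reconstructing a sequence from erroneous copies" means, the sketch below shows a deliberately naive baseline: a per-position majority vote. It is not DNAformer and is not taken from the study. It assumes the copies have already been grouped, and it simply discards any copy whose length differs from the expected one, so it can only repair substitution errors. The function name `majority_vote` is hypothetical.

```python
from collections import Counter

def majority_vote(reads, length):
    """Naive consensus over reads of the expected length (substitutions only)."""
    # Reads hit by insertions or deletions have the wrong length and are skipped.
    usable = [r for r in reads if len(r) == length]
    if not usable:
        raise ValueError("no read has the expected length")
    # zip(*usable) yields one column of bases per position.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*usable))

if __name__ == "__main__":
    cluster = ["ACGT", "ACGA", "TCGT", "ACG"]  # last read lost a base
    print(majority_vote(cluster, length=4))     # prints ACGT
```

Throwing away every read that has an insertion or deletion wastes most of the data when error rates are high, which is one reason learned reconstruction methods are attractive for this problem.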
Beyond its core AI capabilities, DNAformer integrates a bespoke error-correction algorithm designed specifically for the quirks and complexities of DNA-based storage. Moreover, to tackle particularly troublesome data—highly noisy DNA sequences marred by severe errors—the method includes an advanced algorithmic safeguard. This added measure ensures robust, reliable results even under difficult sequencing conditions.
To showcase DNAformer’s potential, the researchers ran comprehensive tests involving a diverse, 3.1-megabyte dataset. It included a color photograph, a 24-second audio clip capturing astronaut Neil Armstrong’s historic lunar words, written material extolling DNA’s virtues as storage, and randomized data simulating encrypted or compressed information. DNAformer delivered breathtaking results, retrieving data at speeds approximately 3,200 times faster than existing high-accuracy methods—trimming the process from days to just ten minutes.
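As a quick consistency check on the figures above, and assuming the 3,200-fold speedup and the ten-minute runtime refer to the same workload, the implied runtime of the slower high-accuracy methods works out to roughly three weeks:

```latex
\[
10\ \text{min} \times 3200 = 32{,}000\ \text{min}
= \frac{32{,}000}{60 \times 24}\ \text{days} \approx 22\ \text{days},
\qquad
\log_{10} 3200 \approx 3.5 .
\]
```

This agrees with both the "days to ten minutes" description and the "three orders of magnitude" claim.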
Remarkably, DNAformer also enhanced accuracy by up to 40% compared to other fast retrieval methods. Such combined improvements in speed and accuracy could finally make DNA storage a viable candidate for large-scale, practical use.
The Technion researchers emphasize DNAformer's scalability and flexibility, underscoring its potential for customization and future adaptation. Tailored versions could soon target specific market applications, adapting seamlessly as DNA synthesis and sequencing technologies evolve.
Supported by prominent grants from the European Research Council (ERC Grant, DNAStorage), the European Innovation Council (EIC Grant, Project DiDAX), and the Israel Science Foundation (ISF), this study marks a substantial leap forward. Thanks to Technion’s team and their AI-powered innovation, DNA-based data storage may soon leave the lab and make the jump into everyday tech reality.