All the information contained in a big data center could be stored in a few grams of DNA, but the technology is still too costly to implement. Researchers from Harvard and Technicolor research lab described a new, cheap, enzymatic method to encode information in DNA molecules. This study, published in Nature Communications, shows an alternative way of coding information and reading it using Nanopore sequencing.
DNA storage: origins and progress so far
The idea of storing digital data in DNA sounds a bit like science fiction. But in fact, it is a thought that has been around for a while now. The Russian physicist Mikhail Neiman first proposed the idea in 1964, when he described how to use nucleic acids for information storage. In 2012 Church, Gao, and Kosuri encoded a 5.27-megabit book in DNA microchips, and the following year scientists from the European Bioinformatics Institute and Agilent published the storage and data retrieval of Shakespeare’s sonnets, an audio clip from Martin Luther King’s “I Have a Dream” speech, Watson and Crick’s paper on the structure of DNA, a photo, and an explanatory file.
Defining the medium and the exact way of storing DNA information is one of the biggest challenges the technology has to face. Cells store their DNA information in chromosomes, and artificial chromosomes can be built, but are they the ideal storage medium for digital information? Leslie Mitchell, President and cofounder of Neochromosome, a NY-based artificial chromosome company, doesn’t think so. “I don’t think the information storage sector has any reason to build chromosomes unless they want to also propagate the DNA inside cells. But then the sequence will be subject to the mutation rate of the organism.”
In 2016, a team of scientists from the University of Washington and Microsoft described a DNA storage system that provides random access, allowing the retrieval of a specific file or folder (as opposed to reading the whole DNA sequence until the desired piece of information is found). In May this year, the same research group built an automated DNA encoding, retrieval, and data storage device (a photograph of which can be seen below). Professor Luis Ceze, senior author of this study, notes that “scaling throughput for DNA write and read and overall system integration at a massive scale” was the biggest technological challenge.
A photograph of a complete automated DNA storage system. Image credit: Takahashi et al., 2019, Scientific Reports (CC BY 4.0)
Enzymatic DNA synthesis for cost reduction
DNA synthesis cost appears to be the main limitation in adopting the technology. Stephane Lemaire and Pierre Crozet from IBPC, Paris, speculate that “to enable commercially viable DNA data storage solutions, the cost has probably to decrease by several orders of magnitude. And it is not yet clear which technology will enable this breakthrough”.
George Church’s group from Harvard may have a solution. In an article recently published in Nature Communications, the researchers described a method for enzymatic synthesis of DNA specifically for data storage. Their reasoning is counterintuitive: they opted for a synthesis mechanism with much less precision than standard chemical synthesis or typical DNA polymerization. Henry Lee, the lead author of this study, recalls: “We reasoned that strict single-base precision, which is the primary goal when synthesizing biological sequences, would not be required when storing non-biological information.”
Lee and his collaborators used an enzyme called terminal deoxynucleotidyl transferase. Under the conditions used, this enzyme adds a variable number of nucleotides of a specific type to the end of a sequence. The researchers reasoned that the information code could therefore rely not on the exact nucleotide sequence but on the transition from one base to the next. This means that a series of adenines followed by a series of thymines would be one digit, while a series of guanines following the same adenines would be another digit. Because each base can be followed by any of the three other bases, every transition can carry one of three values. Henry Lee explains that they had to develop a new ternary code (as opposed to the 0/1 binary code) because “it allowed us to maximize information density per nucleotide.”
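The transition idea can be illustrated with a small Python sketch. This is not the scheme from the Nature Communications paper: the trit-to-base mapping, the six-trits-per-byte packing, the starting base "A", and the `max_run` parameter are all assumptions chosen to keep the example short. It only shows why a code based on transitions survives an enzyme that adds an unpredictable number of copies of each base.

```python
import random

BASES = "ACGT"

def to_trits(data: bytes) -> list[int]:
    """Convert bytes to base-3 digits, 6 trits per byte (3**6 = 729 >= 256)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))  # most significant trit first
    return trits

def from_trits(trits: list[int]) -> bytes:
    """Inverse of to_trits: pack every 6 trits back into one byte."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for t in trits[i:i + 6]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)

def encode_transitions(trits: list[int], start: str = "A") -> str:
    """Each trit picks one of the three bases that differ from the previous base.

    The ordering of the three choices is a hypothetical mapping for illustration.
    """
    template = []
    prev = start
    for t in trits:
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        template.append(prev)
    return "".join(template)

def simulate_synthesis(template: str, rng: random.Random, max_run: int = 4) -> str:
    """Mimic imprecise extension: each base is added between 1 and max_run times."""
    return "".join(b * rng.randint(1, max_run) for b in template)

def collapse_runs(strand: str) -> str:
    """Reduce every run of identical bases to a single base."""
    collapsed = []
    for b in strand:
        if not collapsed or collapsed[-1] != b:
            collapsed.append(b)
    return "".join(collapsed)

def decode_transitions(strand: str, start: str = "A") -> list[int]:
    """Recover trits from the transitions, ignoring how long each run is."""
    trits = []
    prev = start
    for b in collapse_runs(strand):
        choices = [x for x in BASES if x != prev]
        trits.append(choices.index(b))
        prev = b
    return trits

if __name__ == "__main__":
    message = b"DNA"
    template = encode_transitions(to_trits(message))
    strand = simulate_synthesis(template, random.Random(0))
    recovered = from_trits(decode_transitions(strand))
    print(template, strand, recovered, sep="\n")
    assert recovered == message  # run lengths differ, the message does not
```

Because consecutive bases in the template always differ, the decoder can discard run lengths entirely. The sketch assumes that no transition is missed and no wrong base is added; correcting such errors is what the paper's proofreading statistics address, and it is not modeled here.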
The researchers present the encoding principles, the data analysis and proofreading statistics – to ensure that no information is lost – and the optimal way to read the information (Nanopore sequencing had the best results). Luis Ceze, who was not involved in this work, explains that “the observation that controlling transitions is easier and synergistic with nanopore read-out is also a significant aspect of the work, since it determines how information is encoded.”
Enzymatic DNA synthesis has a lot of promise. Emily Leproust, CEO of Twist Bioscience, notes that “there is a lot of excitement around enzymatic synthesis – not just for data storage but for DNA synthesis as well, as it could significantly decrease turnaround time”.
Church, the senior author of the Nature Communications study, notes that enzymatic synthesis has an unparalleled advantage if the application requires the stored DNA to interact with living cells, “where small size and low energy can be crucial. Toward this end, we recently published in Science a demonstration of storing 4 trillion bits of information spread out over every cell in the body of an animal (only one billionth of the total body mass)”.
Will DNA storage replace our silicon hard drives?
There is a lot of excitement about the potential of using DNA as a storage medium. But how can such a disruptive technology be incorporated into our current IT infrastructure? Lemaire and Crozet have a hypothesis: “DNA based storage is currently adapted for long term storage of large amount of data. Nevertheless, this represents the vast majority of digital data stored in the world, the archives: we want them stored securely (controlled accessibility and long term conservation) while being accessible when necessary if ever (Write Once Read Never)”.
Ceze agrees: “The first natural use is in archival storage, at the bottom of the storage hierarchy. These are for data with frequent writes but infrequent reads and tolerant for high-latency access”.
Leproust, however, says that “there are applications for both cold and hot data storage that would reduce the total cost of ownership compared to what will be available from magnetic storage”. And as for the timeline? “We believe that a 3-5 year timeline to bring DNA data storage to commercial viability,” she says.
I am looking forward to seeing a DNA hard drive connected to my laptop. There are still technological challenges and high application costs to be resolved. But DNA, nature’s universal information storage material, will definitely have a central role in future, information-driven societies.