Little did you know that the fate of the cute cat videos you post on Facebook is on magnetic tape stored at the bottom of a long, dark corridor in a data warehouse in California’s Central Valley. But the tapes last only about 10 years, and to save your kitties from being erased from history forever, they must be transferred to new tapes. Yes, this is the current state of the art for long-term data storage, and it hasn’t changed in 50 years. But what if we could reinvent long-term physical data storage from the ground up? That’s where DNA and synthetic biology come in.
The concept is simple: store 0s and 1s in the sequences of A’s, T’s, C’s, and G’s of DNA. But why DNA? DNA is the ultimate storage molecule. As the basic blueprint for all life, it has been storing data for eons. DNA is incredibly stable and easy to preserve: we’ve recovered readable DNA from a 28,000-year-old mammoth carcass. It is also incredibly space-efficient. A single gram of DNA can store almost a zettabyte of digital data, meaning that all of the digital data currently on Earth could be stored in less than twenty grams of DNA. We simply need to capitalize on what nature has already perfected.
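To make the idea concrete, here is a minimal sketch in Python of one way to map bits onto bases. The two-bits-per-base table and the function names `encode` and `decode` are assumptions chosen for illustration, not a scheme described by anyone quoted in this article; practical systems would also need error correction and would likely avoid long runs of the same base.

```python
# Hypothetical mapping, assumed for illustration: two bits per base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn raw bytes into a string of A/C/G/T, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): read two bits per base and regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"cat")
print(strand)          # CGATCGACCTCA
print(decode(strand))  # b'cat'
```

Under this toy mapping, every byte becomes four bases, so the three-letter word "cat" fits in a twelve-base strand.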
The field of DNA data storage has been exploding in recent years, but what’s yet to be seen is a fully working prototype that can both write and read DNA as fast as, and more cheaply than, digital methods of storage. Both industry and investors have picked up the gauntlet, working toward making DNA a commercially viable solution for long-term data storage.
“There has been noticeable acceleration in the pace of advances on encoding and decoding,” says David A. Markowitz, a Program Manager at the Intelligence Advanced Research Projects Activity (IARPA), which invests in high-risk, high-payoff research programs tackling the intelligence community’s most difficult challenges (such as data storage). But, he adds, “There hasn’t been a prominent publication by anybody who has built an actual device that is likely to have real utility for archival storage and retrieval.”
Markowitz, who runs applied R&D programs at the interface between biology and engineering, first started exploring the molecular information storage, or MIST, space in 2015, and was soon led to the Semiconductor Research Corporation (SRC), which at the time had just started putting together a roadmap that had DNA data storage as an initial focus area. Over the next two years, Markowitz worked with the SRC to assemble a community of stakeholders in industry, academia, government, and venture capital primed to tackle data storage with DNA. Markowitz says that when they started, there was only one startup company working on DNA data storage — now there are around 15.
The community continues to grow, and IARPA remains dedicated to supporting the movers and the shakers: it just released a solicitation for its MIST program, which aims to demonstrate molecular information storage and retrieval devices with performance suitable for practical applications (such as a “DNA hard drive”). But why would a government program be interested in supporting industry and academia? Developing devices that are practically useful for real-world storage applications currently faces a large resource barrier, and the companies tackling it are engaging in a high-risk, high-payoff undertaking. According to Markowitz, such projects are the sweet spot for IARPA, which works to de-risk these technologies and make them more attractive for private investment so that industry can “further develop, produce and ultimately sell next generation storage devices back to the government.”
The SRC, which along with IARPA played a critical role in building the molecular information storage community, is also actively supporting growth of the sector through the Semiconductor Synthetic Biology for Information Processing and Storage Technologies (SemiSynBio) Program. A partnership with the National Science Foundation and IARPA, the program aims to support interdisciplinary research programs that will optimize DNA data storage through the intersection of the synthetic biology and semiconductor industries.
Victor Zhirnov, SRC’s Chief Scientist and Director of the Program, says that the SRC is investigating DNA data storage because it has the potential to sustain the ongoing exponential growth in the production and use of data. Current technologies will run into physical limits and resource limitations in the next 10-15 years, which will make them hard to sustain and puts us at the cusp of a serious data storage problem. With programs like IARPA’s MIST and the SRC’s SemiSynBio making significant contributions to the sector’s growth, Zhirnov expects commercially viable DNA data storage solutions in the next ten years.
According to Zhirnov, two of the industry leaders in the DNA data storage sector are Twist Bioscience and Microsoft, which late last year announced a revolutionary partnership to produce 10 million base pair oligonucleotides (an unfathomable length using current technologies) for storing data on DNA. In a blog post announcing the partnership, Twist stressed that, because of our inability to keep pace with the amount of digital data we are producing, “If civilization ended today, our Information Age would leave no relics. Our successors would hardly recognize that the 21st century was a time of unprecedented production and consumption of digital data—let alone the data’s purpose or its significance to our society. That, indeed, would be a sad loss of history and accomplishment.”
Catalog is another revolutionary company tackling the major challenges facing the DNA data storage sector. They are addressing the currently slow-as-molasses process of translating data into DNA by creating the molecular world’s version of movable type. They produce a small number of distinct DNA molecules in large quantities, and these can be arranged into countless combinations, with each piece of data encoded as a unique combination.
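The arithmetic behind that approach can be sketched in a few lines of Python. This is an illustration of the general combinatorial idea only, not Catalog’s actual method: the `LIBRARY` of three "layers" with four prefabricated components each, the component names, and the functions `capacity`, `encode`, and `decode` are all hypothetical.

```python
from math import prod

# Hypothetical library, assumed for illustration: 3 layers of 4 premade
# components each. Only 12 distinct molecules ever need to be synthesized.
LIBRARY = [
    ["A0", "A1", "A2", "A3"],
    ["B0", "B1", "B2", "B3"],
    ["C0", "C1", "C2", "C3"],
]


def capacity(library=LIBRARY) -> int:
    """Number of distinct combinations when picking one component per layer."""
    return prod(len(layer) for layer in library)


def encode(value: int, library=LIBRARY) -> tuple:
    """Pick one component per layer so the combination identifies `value`."""
    if not 0 <= value < capacity(library):
        raise ValueError("value does not fit in this library")
    picks = []
    for layer in reversed(library):  # least significant layer first
        value, index = divmod(value, len(layer))
        picks.append(layer[index])
    return tuple(reversed(picks))


def decode(combo: tuple, library=LIBRARY) -> int:
    """Recover the value from a combination produced by encode()."""
    value = 0
    for layer, component in zip(library, combo):
        value = value * len(layer) + layer.index(component)
    return value


print(capacity())                  # 64
print(encode(27))                  # ('A1', 'B2', 'C3')
print(decode(("A1", "B2", "C3")))  # 27
```

In this toy setup, twelve stock components yield 64 distinct combinations, and the number of combinations multiplies with every layer added while the number of molecules to manufacture only grows by addition.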
Also critical to successful DNA data storage is the quality of the DNA used as a storage medium. Currently, producing high-fidelity DNA at scale, which data storage applications require, is an unmet goal. Several companies in the DNA synthesis space are tackling the problem from multiple angles. For example, Evonetix is using a novel silicon-chip-based technology to significantly increase throughput and fidelity, while Molecular Assemblies is developing a state-of-the-art enzymatic catalysis technology to achieve longer, higher-fidelity DNA at drastically reduced costs and without the toxic byproducts that accompany chemical catalysis methods.
Several investors are also pushing the DNA data storage (and synthesis) sector ever forward. The Data Collective portfolio boasts Catalog, Evonetix, and Molecular Assemblies, while Illumina Ventures has invested in Twist Bioscience and DNA Script. SOSV has invested in Helixworks Technologies and Kilobaser. Tech Coast Angels has invested in Iridia, and Synthomics has received funds from TEEC Angel Fund.
There are still many challenges facing us as we enter the age of the DNA hard drive. But with dedicated organizations like IARPA and the SRC, industry leaders like Twist, Microsoft, and Catalog, and investors like the Data Collective working together, our Facebook kitties may soon be immortalized in the same material each of us carries around in our bodies’ cells: DNA.
Victor Zhirnov of the SRC and David Markowitz of IARPA will be speaking at a DNA data storage session and workshop at SynBioBeta 2018. Along with the other panelists, they will share the latest breakthroughs in DNA data storage and DNA synthesis and lay out a vision for how DNA could become your failsafe, lossless backup.