This article is brought to you by DCVC (Data Collective), a leading deep tech and synthetic biology venture capital firm that backs entrepreneurs solving some the world’s hardest problems computationally — from industrial biotech to algorithmic drug discovery to smart agriculture. To explore the transformative companies in the DCVC portfolio, please visit https://www.dcvc.com/companies.html.
We are living in a digital world, and are producing data at exponential rates. We can hardly keep up; physical storage devices such as flash drives and hard drives have been largely replaced by cloud-based storage systems as a viable solution — yet even the cloud will fail to keep pace if data production continues to grow at its current rate. And, the cloud brings one disadvantage that will continue to grow: people do not physically own their data. As we enter the age where genome sequencing will become a normal part of preventive and personalized medicine, combined with swaths of personal data collected by health apps, data privacy and ownership are becoming significant concerns.
People want control over their data and who can access it, and the cloud, while it can store massive amounts of data like genome sequences and millions of data points collected by smartwatches, is met with increased speculation regarding data security. Flash drives and external hard drives are not the answer: the amount of data we are producing render them inefficient and expensive. We need a new solution, and that solution may be coming in the form of a biological molecule that has been with us since the beginning of time: DNA.
“What I’m really excited about .. is really the fact that DNA is the only solution out there right now with the potential to give people the ability to physically own their data when it comes to massive amounts,” says Hyunjun Park, co-founder and CEO of Boston-based Catalog.
DNA is seen by many as the ultimate solution to our data storage problem. It is one million times more information dense than flash drives, and data stored in DNA can be preserved for over 1,000 years (compared to a few years for most modern hard drives). Costs will be slashed by eliminating the need to use semi-trucks to physically transport data (which is actually cheaper than transmitting extremely large amounts of data over the internet).
It may also be easier to prevent or at least detect attempts to modify the data. The information density would permit thousands of copies to be made and stored in several different locations, making it virtually impossible to change every single copy — a characteristic that would be particularly important for sensitive information used in national security or forensics applications. And, as Park points out, it would likely be much easier to detect edits made to DNA, unlike magnetic tapes or flash drives, which can be seamlessly written over.
Too good to be true? Not exactly, although traditional approaches to storing data on DNA have been extremely slow and expensive. Historically, researchers have focused on de novo DNA synthesis, mapping the sequence of As, Ts, Cs, and Gs to the sequence of bits, synthesizing enough DNA to capture all numbers to be stored. Using this approach, new DNA molecules are made for every single piece of data that needs to be stored — an almost insurmountable problem in terms of time and money.
A new approach to storing data in DNA
Catalog, however, is breaking the mold. Instead of using de novo DNA synthesis, they are using a methodology that is akin building a printing press with typefaces. They generate large quantities of a small number of different DNA molecules, which can be arranged and rearranged into countless different combinations — a different combination for each piece of data being stored. This approach decimates the time and cost needed for storing data in DNA.
Park and his co-founder, Nathaniel Roquet, proved the viability of this approach about a month into IndieBio’s accelerator program in 2016. “We had entered the incubator with just an idea, two guys with really nothing else except for what we had in our heads, so it was up to us to really get the proof of concept done so that we could show everyone that this was actually something that was doable,” remembers Park.
Park and Roquet achieved their goal, encoding Robert Frost’s classic poem “The Road Not Taken,” — a total of 1kB of data. Because they didn’t have a liquid handling robot, they spent about four hours straight pipetting to make their dream a reality. Roquet even had to apply an ice pack to his wrist once the job was done. But the pain and effort was worth it.
Since those early days at IndieBio, Catalog has scaled up, successfully encoding The Hitchhiker’s Guide to the Galaxy in addition to the Robert Frost classic. Fortunately for Roquet’s wrist, the process is no longer manual either — instead, a liquid handling robot enables them to store much larger quantities of data on DNA. The two are currently in the process of scaling up their platform so that finally, a commercially viable solution to DNA data storage will be available. Their efforts are being made possible through $9 million in funding led by New Enterprise Associates (NEA) with participation from several other organizations, including Data Collective.
Their goals may seem like a pipe dream to many, but Catalog’s investors believe that Catalog is leading the way toward a sustainable, viable form of DNA data storage. In a recent press release, NEA General Partner Forest Baskett stated that “the Catalog team has unmatched scientific expertise paired with an entirely new approach, uniquely positioning them to make DNA data storage a viable solution in the near term.”
Perhaps, through Catalog’s work, we will soon be able to hold our own sensitive data, long relegated to enormous amounts of hard drive space or the mysteries of the cloud, in the palms of our hands.