Synthetic biology is rooted in the idea that engineering principles of standardization, modularity, and abstraction can be applied to biology. Despite the messy reality of biological systems, the field has made real progress in applying these principles. Open source, another principle more commonly associated with engineering, could also significantly change how synthetic biology is done.
“Why we have this open source conversation in synthetic biology is because synthetic biology borrows a lot from the language of software,” said Philipp Boeing, co-founder of UK-based company Bento Lab. At least in theory, “we can have the code, the program, and the license, and all of that.”
Open synthetic biology has gained momentum in recent years. The response to the COVID-19 pandemic highlighted the value of the frictionless collaboration that becomes possible when people share ideas openly, noted Jennifer Molloy, a senior research associate at the University of Cambridge.
In synthetic biology, the cost of hardware is often the first major barrier. Open-source hardware can help underfunded academic labs and synthetic biology startups keep operations lean. Examples include the OpenFlexure Microscope, a 3D-printed microscope, and open-source platforms for DNA sequencing and synthesis.
Then there is hardware that isn’t necessarily open source but is open access or supports open protocols. Bento Lab makes a PCR workstation that isn’t open source, but the company offers several open-source DNA extraction kits, including some intended for educational use. “We believe that openness is really important to science and engineering,” said Boeing.
The PCR workstation shows how open access can improve a product. Although it was originally conceived as a compact unit that a lab could fit into spare bench space or use to train students, researchers quickly reimagined it as a portable device. They are using it to diagnose infectious diseases in the field and to sequence fish species in the Amazon rainforest, and the protocols for these applications are often shared openly.
Similarly, the modularity of the OpenFlexure Microscope has allowed researchers to build microscopes based on its design for applications such as water quality testing and soil health assessment. “It's because they can take these modules and kind of hack them together,” said Molloy. Its open-source nature essentially “creates a whole ecosystem of tools, rather than just the one, that people are also making openly available.”
Another type of open hardware that has gained traction recently, mainly because of growing interest in precision fermentation and sustainable proteins, is the small bioreactor. For example, the cellular agriculture research institute New Harvest shared open-source designs for a perfusion bioreactor for 3D tissue culture.
Beyond equipment, synthetic biology labs require access to cells and strains, and researchers are opening these up as well. For instance, Molloy’s research group is working on an open source E. coli expression strain. While “these workhorses have been around since the 80s, they're still encumbered effectively due to the material transfer agreements even though the patents expired 20 years ago,” said Molloy.
Likewise, the Open Yeast Toolkit offers free and redistributable genetic parts to help researchers and startups begin engineering Saccharomyces cerevisiae and Pichia pastoris. Molloy added that AI will further advance the trend of putting strains and molecules relevant to synthetic biology out in the open.
Earlier this year, Berkeley-based startup Profluent released OpenCRISPR 1, an open-source alternative to CRISPR developed with generative AI. “It is a CRISPR kind of homolog that is different enough to the patented CRISPR that you can possibly use it without infringing the CRISPR patents,” said Molloy. She added that researchers, herself included, are still investigating its applicability.
Synthetic biologists are also looking to tap into the diversity of non-model microbes. However, “the primary bottleneck for working with non-model microbes is the lack of fundamental genetic tools and lab protocols, which are laborious and costly to develop,” said Nili Ostrov, CSO of Boston-based biotech Cultivarium. To address this gap, Cultivarium is developing open source tools that reduce the time and cost of engineering non-model organisms.
One of these tools is a software package that identifies active methylation patterns and restriction-modification (R-M) systems in bacterial genomes. “This is helpful when developing protocols for DNA delivery into non-model microbes as the existence of R-M systems is oftentimes a barrier to introduce recombinant DNA or phage infection,” said Ostrov. Other Cultivarium tools help researchers pick the right plasmid and selectable marker for a non-model organism, and predict the growth conditions needed to culture non-model microbes based on their amino acid composition.
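To make the R-M barrier concrete: a plasmid carrying recognition sites for a host’s active restriction enzymes risks being cut after delivery, so one way to reason about the problem is to scan the plasmid for those sites on both strands. The sketch below is a hypothetical illustration, not Cultivarium’s software. The plasmid fragment and the two host motifs are made-up examples, and the code does only naive motif matching (with IUPAC ambiguity codes); it does not model methylation status or which R-M systems are actually active.

```python
import re

# Hypothetical illustration only; not Cultivarium's tool.
# Maps IUPAC nucleotide codes to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Base-pairing complements, including ambiguity codes (e.g. R<->Y, B<->V).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif: str) -> str:
    """Reverse complement of a DNA motif, preserving IUPAC ambiguity codes."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def find_sites(sequence: str, motif: str) -> list[int]:
    """0-based start positions where `motif` occurs on either strand."""
    seq = sequence.upper()
    positions = set()
    # Palindromic motifs (e.g. GATC) give the same pattern twice; the set dedups.
    for strand_motif in {motif.upper(), reverse_complement(motif)}:
        regex = "".join(IUPAC[base] for base in strand_motif)
        # Zero-width lookahead so overlapping occurrences are all reported.
        for match in re.finditer(f"(?={regex})", seq):
            positions.add(match.start())
    return sorted(positions)


# Made-up plasmid fragment and host motifs, for illustration only.
plasmid = "TTGATCAAGGTCTCAGGCCGAGACCTT"
for motif in ["GATC", "GGTCTC"]:
    print(motif, find_sites(plasmid, motif))
```

Sites flagged this way could in principle be removed by mutation, or the DNA could be prepared with host-matching methylation, but whether either step is needed depends on which R-M systems the host actually expresses.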
While Cultivarium’s tools help researchers with wet-lab work, other open software focuses on in silico tasks.
As in computer science more broadly, bioinformatics and computational biology researchers have long shared their code openly. This trend continues with AI for biology, most prominently in proteomics. For instance, the protein structure prediction system AlphaFold benefited from open-source work in both biology and machine learning. However, unlike its previous iterations, AlphaFold 3 is closed source.
Researchers are also developing open-source alternatives, such as OpenFold, to predict protein structures. “One major advantage today when using OpenFold is that the system is faster and more efficient,” said Nazim Bouatta, a senior research fellow at Harvard Medical School and an OpenFold contributor. As a result, runtimes are shorter, and larger proteins can be folded on a single GPU. “This has been achieved precisely because we took advantage of open source software.”
“We have released the full code, which means that OpenFold is trainable. You can take the data, and you can retrain the system from scratch,” said Bouatta. This is another difference between OpenFold and AlphaFold, and, as with hardware, the freedom to experiment lets others build on a system’s capabilities. “The advantage of being able to retrain is that you can explore new ideas, and OpenFold has been used by both academics and industry to explore new modalities,” added Bouatta.
Another application at the intersection of machine learning and proteomics is using generative AI to create novel proteins, as illustrated by OpenCRISPR 1. “Having learned from all of the examples of proteins that we know to date, it's able to generate entirely novel proteins in much the same way as an image generative model can generate new faces of humans who never existed,” explained Gevorg Grigoryan, CTO of Cambridge-based biotech Generate Biomedicines.
The company released its generative AI platform for protein design as an open-source tool. “We wanted to communicate to the field how we thought protein design and generation tasks should be solved in this reality where machine learning capabilities are becoming broadly accessible,” said Grigoryan. This helps democratize protein design by allowing researchers who aren’t protein experts, such as materials scientists, to generate proteins for their own applications.
Researchers are developing a range of DNA and protein language models for different purposes. Luxembourg-based startup Helical brings them together under a unified framework. “It doesn't matter which model you use, you can always use the same four lines of code in our platform,” said Rick Schneider. It is an open-core platform, meaning only part of it is open source; the open-source component lets users benchmark different models against each other for their own use cases.
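Running “the same four lines of code” regardless of model is, in software terms, a common-interface (adapter) design: every model is wrapped so it exposes the same methods, and the benchmarking code never changes. The sketch below is not Helical’s API. The class names, the two toy “models” (GC content and k-mer frequencies standing in for real DNA language models), and the nearest-centroid metric are all hypothetical, chosen only to show the pattern.

```python
from abc import ABC, abstractmethod
from collections import Counter
from itertools import product

# Hypothetical sketch of a model-agnostic interface; NOT Helical's actual API.
# The two "models" below are toy stand-ins for real DNA language models.


class SequenceModel(ABC):
    """Every wrapped model exposes the same name + embed() interface."""

    name: str

    @abstractmethod
    def embed(self, sequence: str) -> list[float]:
        """Map a DNA sequence to a feature vector."""


class GCContentModel(SequenceModel):
    name = "gc-content"

    def embed(self, sequence: str) -> list[float]:
        seq = sequence.upper()
        return [(seq.count("G") + seq.count("C")) / len(seq)]


class KmerModel(SequenceModel):
    def __init__(self, k: int = 2):
        self.k = k
        self.name = f"{k}-mer-frequency"
        self.kmers = ["".join(p) for p in product("ACGT", repeat=k)]

    def embed(self, sequence: str) -> list[float]:
        seq = sequence.upper()
        counts = Counter(seq[i:i + self.k] for i in range(len(seq) - self.k + 1))
        total = max(sum(counts.values()), 1)
        return [counts[kmer] / total for kmer in self.kmers]


def nearest_centroid_accuracy(embeddings, labels):
    """Toy metric: fraction of sequences closest to their own class centroid."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [e for e, lab in zip(embeddings, labels) if lab == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    correct = sum(
        min(classes, key=lambda c: sq_dist(e, centroids[c])) == lab
        for e, lab in zip(embeddings, labels)
    )
    return correct / len(labels)


def benchmark(models, sequences, labels):
    # Identical code path for every model: swap models, not code.
    return {
        m.name: nearest_centroid_accuracy([m.embed(s) for s in sequences], labels)
        for m in models
    }


seqs = ["GGCCGGCC", "GCGCGGCC", "ATATTAAT", "AATTATAT"]
labels = ["GC-rich", "GC-rich", "AT-rich", "AT-rich"]
print(benchmark([GCContentModel(), KmerModel(k=2)], seqs, labels))
```

Adding a new model means writing one more subclass with an `embed` method; `benchmark` stays untouched, which is the property a unified framework is after.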
For both large language models and DNA/protein language models, extending the context window is an active area of research. “DNA is 3 billion nucleotides. If you only look at 25,000 nucleotides at a time, you might miss important interactions that happen between parts of the genome millions of nucleotides away from each other,” Schneider explained. He added that open-source development would be key to overcoming this challenge.
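The numbers in Schneider’s quote can be made concrete with a little arithmetic. This is a simplified illustration that treats the genome as a single 3-billion-nucleotide sequence; in reality the human genome is split across chromosomes, and real models differ in how they chunk their input.

```python
# Simplified arithmetic using the round figures from Schneider's quote.
GENOME_LENGTH = 3_000_000_000  # ~3 billion nucleotides
CONTEXT_WINDOW = 25_000        # nucleotides the model sees at once


def windows_needed(length: int, window: int) -> int:
    """Number of non-overlapping windows needed to tile `length` nucleotides."""
    return (length + window - 1) // window  # integer ceiling division


def can_share_window(pos_a: int, pos_b: int, window: int) -> bool:
    """True if some window of this size could contain both positions,
    i.e. they are fewer than `window` nucleotides apart. With
    non-overlapping tiling this is necessary but not sufficient."""
    return abs(pos_a - pos_b) < window


print(windows_needed(GENOME_LENGTH, CONTEXT_WINDOW))          # 120000
print(can_share_window(1_000_000, 1_010_000, CONTEXT_WINDOW))  # True: 10 kb apart
print(can_share_window(1_000_000, 3_000_000, CONTEXT_WINDOW))  # False: 2 Mb apart
```

Two elements 2 million nucleotides apart can never appear in the same 25,000-nucleotide window, however the windows are placed, so a model limited to that window cannot directly relate them.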
Beyond structure prediction and language models, researchers and companies are also developing open software for day-to-day lab operations. These include tools to plan experiments, analyze plates, and automate liquid handling, among other routine tasks.
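As one example of the routine planning such tools automate, here is a hypothetical sketch that computes a serial dilution along one row of a 96-well plate. The function name, its parameters, and the well-naming scheme are illustrative assumptions, not the API of any particular open-source package; a real liquid-handling tool would translate a plan like this into commands for a specific robot.

```python
from dataclasses import dataclass


@dataclass
class DilutionStep:
    well: str
    diluent_ul: float     # diluent dispensed into the well first
    transfer_ul: float    # volume then moved in from the previous well
    concentration: float  # concentration after mixing (same units as stock)


def serial_dilution_plan(stock_conc: float, dilution_factor: float,
                         n_wells: int, well_volume_ul: float,
                         row: str = "A") -> list[DilutionStep]:
    """Hypothetical planner for a serial dilution along one 96-well plate row.

    Well 1 holds the stock. Each later well gets diluent, then a transfer
    from the previous well, so that after mixing it contains
    `well_volume_ul` at 1/dilution_factor of the previous concentration.
    """
    if dilution_factor <= 1:
        raise ValueError("dilution_factor must be greater than 1")
    if not 1 <= n_wells <= 12:
        raise ValueError("a 96-well plate row has 12 wells")
    transfer = well_volume_ul / dilution_factor
    diluent = well_volume_ul - transfer
    steps = [DilutionStep(f"{row}1", 0.0, 0.0, stock_conc)]
    for i in range(2, n_wells + 1):
        conc = stock_conc / dilution_factor ** (i - 1)
        steps.append(DilutionStep(f"{row}{i}", diluent, transfer, conc))
    return steps


for step in serial_dilution_plan(stock_conc=100.0, dilution_factor=2,
                                 n_wells=4, well_volume_ul=200.0):
    print(step)
```

The plan follows from the basic dilution relation: adding a transfer volume Vt to a diluent volume Vd divides the concentration by (Vt + Vd) / Vt, so choosing Vt = V / D and Vd = V - Vt gives exact D-fold steps. Every well except the last ends up holding less than V once its own transfer is taken out.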
In information technology, “most commercially available software has a foundation in open source software with extra parts,” said Molloy. Generate Biomedicines and Helical are examples of this model in synthetic biology, but companies will need to find ways to replicate it at scale and to extend it to hardware.
Moreover, to advance innovation in synthetic biology, hardware and software should be interoperable: users should be able to combine them with other tools as they see fit. Molloy and Boeing stressed that the repairability of hardware is also critical, particularly in supply-chain-constrained regions. Open frameworks push the ecosystem of synthetic biology tools toward greater interoperability and repairability.
Lastly, collaboration is essential to building a sustainable and equitable bioeconomy, and open source is inherently collaborative. That is why synthetic biology needs greater engagement with open-source projects.