Everything on Earth comes with its own microbiome. Each soil sample has a unique microbiome adapted to its specific environmental conditions. Human skin and the gut owe some of their functions to the microbial communities living there. This biodiversity is a significant resource for biotechnology, yielding thermostable enzymes and microbial therapeutics. The genetic and bioinformatic study of environmental, clinical, or any other biological samples, broadly called metagenomics, therefore acts as a funnel for biotechnology discoveries.
The birth of metagenomics can be attributed to Norman Pace, who proposed isolating bulk DNA from environmental samples and analyzing its sequences. The initial studies, carried out in the early 1990s, could only identify the microbial composition of environmental samples. Sequencing the 16S rRNA gene can reveal which microbes are present in an ecosystem and in what relative abundance, but not much more.
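To make "relative abundance" concrete, here is a minimal sketch in Python. It assumes the 16S reads have already been classified to genus; the classification step is not shown, and the genus names and read counts below are hypothetical, for illustration only. Each taxon's relative abundance is simply its read count divided by the total number of reads in the sample.

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-taxon read counts into fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no classified reads")
    # Each taxon's share of all classified reads in the sample
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical read counts per genus from one 16S survey (not real data)
sample = {"Bacillus": 1200, "Pseudomonas": 600, "Streptomyces": 200}
abundances = relative_abundance(sample)

# Print taxa from most to least abundant
for taxon, frac in sorted(abundances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{taxon}: {frac:.1%}")
```

This sketch ignores known biases such as species carrying different numbers of 16S gene copies, which real analysis pipelines try to correct for. It also illustrates the limitation noted above: the output says who is there and in what proportion, but nothing about what those microbes' genes can do.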
A revolution in what we can learn from metagenomics came with the application of shotgun sequencing and next-generation sequencing (NGS). In the early 2000s, while working on sequencing the human genome, Craig Venter used shotgun sequencing to study the microbes living in the Sargasso Sea. NGS now makes it possible to sequence and assemble whole genomes, even those of microbes that are difficult or impossible to cultivate.
However, we are far from exhausting the genomic resources nature offers. “The vast majority of microbial life on Earth remains uncharacterized, representing a vast reservoir of untapped genetic and metabolic potential,” Yuji Suzuki, CEO of bitBiome, told me. “As we know, only about 0.001% of the microbial species on this planet have been identified, and we are on a mission to sequence and harness the power of all of them.” bitBiome, a spinoff of Waseda University in Tokyo, Japan, is tapping into nature’s microbial potential with its proprietary single-cell sequencing technology, aiming to identify blockbuster enzymes and solve problems using biology.
Much of what metagenomic data can reveal remains untapped because of the sheer amount of data requiring analysis. This is where AI and machine learning (ML) come in: they can analyze vast amounts of genomic information and generate outputs tailored to synthetic biology challenges. AI and ML can annotate and categorize data more effectively, help assemble genomes, and identify non-obvious correlations. “We traverse the protein sequence space through protein LLMs both for speed and ability to pick distant genes with a specific function,” Suzuki added.
This new wealth of information has taken biodesign to a whole new level. The potential of training AI on metagenomic data is certainly on the radar of synbio companies such as Ginkgo Bioworks, whose new services open up its wealth of bioengineering know-how to the community. For example, AI protein design built on existing sequence databases could benefit tremendously from learning how biology has solved design challenges in different environments, and could apply those lessons to industrial applications.
Better bioinformatic tools make it possible to engineer more complex biological designs. “In the past, research efforts had focused on engineering single strains or single genes,” said Tae Seok Moon, Professor at the J. Craig Venter Institute. “Recently, the focus has shifted towards engineering microbial communities, multiple genes at the same time, and even inter-kingdom interactions.” This new layer of bioengineering requires not only new tools but also an understanding of these interactions drawn from large data sets.
Using AI and ML, Moon and collaborators created CRISPR tools that target complex microbial consortia, as reported in a 2022 study. “Those tools enable us to minimize the number of experiments guided by prediction,” Moon added.
For now, however, the most common use of AI in strain engineering seems to start with AlphaFold. The structure-prediction software offers unprecedented accuracy, allowing researchers to predict a protein’s structure, and from it clues about its function, using only sequence data such as that recovered from metagenomes. “Where AI/ML are increasingly gaining traction is in solving protein structures and docking,” Suzuki explained. “These enable in silico workflows of drug discovery and biocatalysis.” Scientists can computationally screen large numbers of proteins with therapeutic or catalytic potential, shortlist the most promising enzymes, and design novel proteins to solve bioengineering challenges.
Human health is an especially impactful area where AI can amplify the information gained through metagenomics, opening the door to precision medicine applications. There is growing evidence that human health and well-being depend heavily on a healthy microbiome. “The food we eat feeds us, but also feeds our microbes,” said Jeremy Lim, CEO and founder of AMILI. The Singapore-based startup focuses on studying and modulating the human microbiome in Asian populations. “We aim to make healthcare truly personalized and understand the correlations between sequencing and phenotype in humans much more deeply,” Lim added.
A plethora of data can be collected around the human microbiome, including sequences, metabolites, dietary and disease influences, and the genetics of the human host. Finding meaningful correlations in this data is where deep learning can have a significant impact, translating it into actionable suggestions that could improve clinical practice or wellness. Lim draws a parallel with the recommendation algorithms of video- and music-streaming platforms: those algorithms have lowered the cost of personalization, making it accessible to every user. The same could happen for microbiome-based precision health: simple metagenomic studies could feed into AI applications that adjust medication doses or nutrition to an individual’s needs.
For centuries, innovation in biology has been slow and tedious. Biotech innovators need to choose their application area well and arm themselves with patience, passion, and resilience. “You must find a problem big enough that you’re passionate enough to see it through the years that it takes to develop the solution,” Lim said when I asked what advice he would give prospective startup founders. However, new AI/ML tools may make discovering synthetic biology solutions faster, more accessible, and more efficient. New ways to catalog, process, and communicate information have always catalyzed technological breakthroughs, and synthetic biology seems ready to ride the AI wave toward a more bio-inspired future.