At the start of 2023, the MIT Technology Review named generative AI one of the top 10 breakthrough technologies of the year. While the article referred specifically to text-to-image programs like DALL-E 2 by OpenAI and Stable Diffusion by Stability AI, biotechnology has experienced its own AI boom this year.
McKinsey has identified more than 250 companies working on AI-driven drug discovery. Beyond biopharma, there are many other sectors of the bioeconomy that can benefit from introducing AI into their work stacks. Nearly every company can be transformed by AI. Whether you are working to design better enzymes, develop novel protein-based materials, or simply looking to optimize your workflows, chances are AI can help.
The number of potential applications of AI in biotech is virtually limitless. These computational tools can help improve enzymes, model the physiology of whole cells, model conditions inside a bioreactor, improve the sustainability of dyes, and even predict which startups to invest in. These topics will be discussed in greater depth at SynBioBeta 2024: The Global Synthetic Biology Conference, set to take place May 6-9 in San Jose, California. In the meantime, let's take a look at what AI can do for synthetic biology.
First, let’s clarify something: the term ‘AI’ can be used for different things. For example, some companies use AI or machine learning (ML) to analyze data and provide insights about it. This approach can improve the efficiency of workflows, speed up discovery, and accomplish things that were previously impossible, like finding new connections between therapeutic targets in the human proteome.
A separate category is generative AI, which not only analyzes data but can create new designs and propose ways to improve upon existing ones. For example, it can suggest ways to make an enzyme work better at a higher temperature.
A large subset of generative AI models in biology is modeled on Large Language Models (LLMs) like ChatGPT. The difference is that instead of human language, they are trained on biological data, such as sequencing, structure, and assay data.
Gen-AI models promise to have a huge impact on how we design biology. The global market for generative AI in biology was estimated at $74 million in 2022 and is projected to reach around $363 million by 2032, which corresponds to a compound annual growth rate (CAGR) of about 17%.
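To see how the growth rate follows from the two market figures, here is a small sketch of the standard CAGR formula. The function name `cagr` is ours, chosen for illustration; the dollar amounts and years are the ones quoted above.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate, returned as a fraction (0.17 means 17%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text: $74 million in 2022, about $363 million by 2032.
rate = cagr(74.0, 363.0, 2032 - 2022)
print(f"Implied CAGR: {rate:.1%}")
```

Run as written, the implied rate comes out at a little over 17% per year, consistent with the projection.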
This Cambrian-like explosion was made possible not only by advances in generative AI algorithms but also by the massive amounts of high-throughput DNA synthesis, sequencing, and screening data generated over the last decade. While the analysis of this data was once a bottleneck for many labs, it can now serve as an invaluable resource for training AI models to understand biology on a fundamental level.
Generative AI models for biology fall into two broad categories: foundational and specialized models. Foundational models take on the ambitious task of understanding how biology works. These models are trained on enormous datasets, such as all protein sequences stored in protein data banks. They study that data using an unsupervised learning process to uncover the underlying rules and learn the language of biology. The approach is very similar to how ChatGPT was trained on a vast corpus of text, much of it from the internet, to converse in human language.
Specialized models, on the other hand, are trained on smaller, often proprietary datasets to accomplish a specific task, such as improving the stability of an enzyme. To use the chatbot analogy, this would be akin to teaching a language model to write nursery rhymes by training it exclusively on nursery rhyme books. These models can be improved and fine-tuned for a specific purpose through iterative cycles of data acquisition and learning.
Many companies are developing their own version of either foundational or specialized models. Ginkgo, for example, has partnered with Google Cloud to develop foundational Evolutionary Scale Models for engineering biology with predictive capabilities. Similarly, Recursion Pharmaceuticals partnered with NVIDIA to leverage its DGX Cloud to develop foundation generative AI models for drug discovery.
Others operate on a “generative-AI-as-a-service” model. Many of them, like Cradle Bio and Biomatter Designs, are based in Europe. These AI experts can help other companies tap into the power of generative protein design. Berkeley-based startup Profluent also aims to advance generative modeling for protein design, including enzymes, antibodies, gene editors, and peptides. And if you want to try generative AI on your own, Deep Chain has created a cloud platform for AI-Accelerated Protein Design, and it even has a free version.
Now, let’s go over some examples of current applications of AI in biology.
A good example of a foundational model is the protein structure prediction program AlphaFold, developed by Google DeepMind. AlphaFold has revolutionized the prediction accuracy for 3D protein structures, catalyzing a new wave of progress in biology. It has been used for many applications, like discerning the structure of the Nuclear Pore Complex (NPC).
For synthetic biology, proteins are functional biological machines, the engines behind almost any application. The shape of a protein determines its function, so knowing the structure can help scientists engineer new properties into it.
Since AlphaFold came on the scene in 2021, the number of high-accuracy human protein structures available to scientists has more than doubled. One of the key elements that contributed to the success of AlphaFold is that the software is free to use for anyone in the research community, breaking down the barrier to entry for researchers who want to use AI and catalyzing innovation.
The creators of AlphaFold keep working on improving the model. The latest iteration, developed by Isomorphic Labs and Google’s DeepMind AlphaFold team, provides improvements to molecular docking, or protein-ligand interaction modeling. This process is often used to model drug binding but can also be applied to enzyme design.
Traditionally, molecular docking experiments were based on determining the crystal structure of the protein with the ligand bound to it. The original AlphaFold, however, did not include protein-ligand complex data and was not very good at predicting binding configurations. The newly released version was trained specifically on binding data and performed significantly better, adding another powerful tool to the AI toolkit.
While AlphaFold is exceptionally good at translating protein sequences into structure, there is another way to go about engineering proteins: direct sequence-to-function prediction. This approach can be used for improving protein properties like binding affinity, catalytic efficiency, or thermostability without the need to know the underlying protein structure.
Traditionally, enzyme engineering is done through many cycles of directed evolution or rational design, where an expert protein engineer works to identify ways to improve the protein and then tests those design ideas in the lab. AI-powered workflows can make this process more efficient. By learning the relationship between the sequence and function of the protein, AI models can suggest sequences for testing and help scientists discover improved variants that would not be identified through traditional methods.
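As a rough illustration of the "learn from measured variants, then suggest new ones" loop, the sketch below uses a deliberately simple additive model: it averages the measured score for each residue at each position and ranks untested combinations by that average. This is a stand-in for a learned sequence-to-function model, not the method used by any company mentioned here, and the sequences and activity scores are hypothetical toy data.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data: short variant sequences with a measured activity score.
measured = {
    "MKTA": 1.0,
    "MKSA": 1.4,
    "MRTA": 0.7,
    "MRSA": 1.1,
    "MKTG": 1.2,
}


def fit_site_effects(data):
    """Average measured score for each (position, residue) pair."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for seq, score in data.items():
        for pos, aa in enumerate(seq):
            totals[(pos, aa)] += score
            counts[(pos, aa)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def predict(seq, effects):
    """Score a sequence as the mean of its per-site averages."""
    return sum(effects[(pos, aa)] for pos, aa in enumerate(seq)) / len(seq)


def propose(data, effects, top_n=3):
    """Rank untested combinations of residues already observed at each position."""
    length = len(next(iter(data)))
    options = [sorted({seq[pos] for seq in data}) for pos in range(length)]
    candidates = ("".join(combo) for combo in product(*options))
    untested = [c for c in candidates if c not in data]
    return sorted(untested, key=lambda s: predict(s, effects), reverse=True)[:top_n]


effects = fit_site_effects(measured)
suggestions = propose(measured, effects)
print(suggestions)
```

In a real workflow the suggested variants would be tested in the lab and the results fed back into the training data for the next round. Production models also capture interactions between positions, which this additive sketch ignores.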
The applications of generative AI in the enzyme space are nearly limitless, from improving detergents to optimizing crops. For example, Protein Evolution, Inc. partnered with the DOE’s Agile BioFoundry and the Joint BioEnergy Institute (JBEI) to create AI-optimized enzymes that break down plastic and textile waste. Another synbio company, Arzeda, is working with Unilever to develop new cleaning enzymes with increased stability, performance, and sustainability benefits.
Beyond enzymes, AI models can help design other types of proteins, such as materials with improved properties or animal product alternatives. For example, a Berlin-based synthetic biology materials startup, Cambrium, which just raised 8 million euros in a seed funding round, is using AI to make new sustainable skin-identical micromolecular collagen made in yeast cells.
Protein sequences are not the only type of data that can be analyzed by AI. A synthetic biology startup, Asimov, has come out with AI promoter models that predict the specific tissues in which promoters are active. Another biotech, Absci, which has adopted zero-shot AI methods for developing de novo antibodies, uses AI to analyze the natural patterns of codon usage to optimize protein expression in heterologous hosts. Gene expression optimization is an important tool for both existing and new synbio products.
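To make the codon usage idea concrete, here is a minimal sketch of one naive strategy: count how often each codon appears in a host's coding sequences, then encode a protein using the host's most frequent codon for each amino acid. The two host genes are hypothetical toy sequences, and this is not a description of Absci's method; practical tools also weigh factors such as mRNA structure and repeated sequence.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage(coding_sequences):
    """Count codons across in-frame coding sequences."""
    counts = Counter()
    for cds in coding_sequences:
        for i in range(0, len(cds) - len(cds) % 3, 3):
            counts[cds[i:i + 3]] += 1
    return counts


def preferred_codons(counts):
    """Pick the most frequently used codon for each amino acid."""
    best = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def back_translate(protein, best):
    """Encode a protein with the host's preferred codons."""
    return "".join(best[aa] for aa in protein)


def translate(dna):
    """Translate an in-frame DNA sequence back to protein."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical host genes, used only to derive a toy usage table.
host_genes = ["ATGAAAGCGCTGAAAGCGTAA", "ATGCTGGCGAAACTGGCCTAA"]
best = preferred_codons(codon_usage(host_genes))
optimized = back_translate("MKLA", best)
print(optimized, translate(optimized))
```

Translating the optimized DNA back to protein is a quick sanity check that the recoding changed only the codons and not the amino acid sequence.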
There are also more generalizable applications of AI for biotechnology. Machine Learning Operations (MLOps) focuses on optimizing process performance, such as predicting instrument maintenance needs, forecasting consumables use, streamlining quality control, and more. Elemental Machines specializes in MLOps for biotech and synthetic biology companies. The company collects operational data using smart sensors in biotech labs to identify factors that can affect performance and control that variability.
Similarly, Clustermarket provides tools for technologically advanced labs to coordinate equipment usage, plan maintenance activities, and forecast resource requirements. LabVantage Analytics uses AI tools like the Laboratory Performance Optimizer to track performance and flag potential issues before they occur.
Another area where AI can provide significant improvement is fermentation and process scale-up. AI can analyze historical data and identify optimal process parameters to improve the yield and productivity of fermentations. Integrating sensors lets researchers collect data about the processes happening in the bioreactor, which can be used to optimize nutrient feeding and control the pH, temperature, and other parameters. TeselaGen provides such fermentation optimization tools backed by machine learning.
AI can also be used to analyze strain fitness data and predict whether a strain will scale well. Currently, very little data are captured from commercial-scale fermentations, even though such data are essential for scaling up the bioeconomy. The AI4B.io lab, a collaboration between DSM and Delft University, is working on a model called Fermentation Digital Twin, a real-time simulator of microbial performance in industrial bioreactor conditions.
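A toy version of "identify optimal process parameters from historical data" might look like the sketch below. It fits a quadratic curve to yield as a function of a single parameter (temperature) by least squares and reads off the temperature at the peak. The run data are hypothetical, the single-parameter quadratic is an assumption made for brevity, and this is not how TeselaGen or any other vendor's tools work; real processes involve many interacting parameters.

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(us, ys):
    """Least-squares fit of y = a*u^2 + b*u + c, solved with Cramer's rule."""
    s = [sum(u ** p for u in us) for p in range(5)]              # sums of u^p
    t = [sum(y * u ** p for u, y in zip(us, ys)) for p in range(3)]
    matrix = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(matrix)
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


def best_setpoint(xs, ys):
    """Parameter value at the peak of the fitted curve."""
    mean_x = sum(xs) / len(xs)
    # Centering the inputs keeps the normal equations well conditioned.
    a, b, _ = fit_quadratic([x - mean_x for x in xs], ys)
    if a >= 0:
        raise ValueError("fitted curve has no interior maximum")
    return mean_x - b / (2 * a)


# Hypothetical historical runs: temperature (deg C) and product yield (g/L).
temperatures = [30.0, 32.0, 34.0, 36.0, 38.0]
yields = [18.5, 41.0, 50.2, 43.1, 17.8]
best_temp = best_setpoint(temperatures, yields)
print(f"Suggested temperature setpoint: {best_temp:.1f} deg C")
```

The check on the sign of `a` matters: if the fitted curve opens upward, the data do not bracket an optimum and the suggested setpoint would be meaningless.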
Needless to say, the rapid progress in generative AI has to include conversations around biosecurity. The British government’s AI safety summit that took place last month brought together global policymakers, companies, and researchers to assess the risks of AI. A new report pertaining to biosecurity specifically was launched at the summit, providing recommendations for reducing biological risks associated with the development of AI.
In the USA, President Biden’s new executive order aims to place guardrails on the development of AI technologies that threaten national security. While some think these measures may stifle innovation, others see them as necessary for such a rapidly developing technology.
Let’s not forget, however, that AI technology can serve as a valuable tool for identifying biological threats and developing mechanisms to prevent accidental or deliberate development of risky technologies. For example, Concentric by Ginkgo is a program established for early detection of pathogen threats to public health that relies heavily on developments in AI.
The concern about biosecurity emphasizes another important aspect of AI progress: improving the capabilities in other technology areas like screening, automation, annotation, and multiplex genome editing. These not only provide valuable data to improve AI models for biology but also represent points of intervention where safety mechanisms can be put in place.
AI has served as a beacon of hope for biotech, promising to revolutionize the speed of discovery and bring in entirely new capabilities. However, innovations in AI need to go hand-in-hand with other foundational synthetic biology technologies to harness its full power. We need all hands on deck, with AI researchers, hardware engineers, high-throughput screening experts, biosecurity specialists, policymakers, and other bright minds of synbio working together. How will your company empower the AI revolution?