[DALL-E]

Folding the Future: How AI is Reshaping Protein Engineering

With its power to reduce costs, increase efficiency, and address pressing global challenges, generative protein design is poised to transform the biotech landscape and the broader bioeconomy
by Katia Tarasava, PhD | March 31, 2025

Since the birth of biotechnology, researchers have been on a quest to develop better proteins: enzymes that retain their function at high temperatures, antibodies that bind targets with one-in-a-million specificity, or structural proteins that are stronger, lighter, and cheaper to make than traditional materials. Nature’s versatile biological machines, proteins, are composed of strings of amino acids that fold into complex, dynamic shapes—like origami. The function of the protein is determined by its structure, which, in turn, depends on the amino acid sequence.

The vision of engineering custom proteins with specified properties and functions by designing a specific amino acid sequence has captivated scientists for a long time. The problem is that protein sequence space is practically immeasurable, and trying to engineer a protein to perform a specific function is a Sisyphean task. For decades, protein engineers evolved functions—such as an enzyme’s thermostability or substrate affinity—in the lab using a method called directed evolution. Alternatively, they would attempt to rationally design the desired functions by looking at the protein structure (a 3D model obtained through X-ray crystallography or cryo-electron microscopy) and trying to figure out which protein regions they needed to change, followed by making hundreds of different mutated variants and testing them in the lab.

Protein structure visualizations help scientists understand how amino acid sequences dictate function—but until recently, designing these structures required laborious trial and error. [Emw/Wikimedia CC BY-SA 3.0]

As artificial intelligence continues to transform protein engineering, thought leaders and startup pioneers alike will converge at this year’s SynBioBeta: The Global Synthetic Biology Conference to discuss the latest breakthroughs in generative models for protein design. Presenters will highlight how these computational tools are fast becoming an essential driver of innovation across the synthetic biology landscape.

The AlphaFold Revolution

Given the arduous and inefficient nature of engineering proteins in the lab, the field was long overdue for a computational makeover. However, until about 2018, the computational power required to solve a protein structure was considered unattainable. Back in 1969, molecular biologist Cyrus Levinthal estimated that it would take longer than the age of the known universe to solve the protein folding problem since a typical-length protein can fold into 10^300 possible configurations. But that doesn’t mean no one has tried it.
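To get a feel for the scale of Levinthal's estimate, the arithmetic can be sketched in a few lines of Python. The 10^300 figure comes from the text; the sampling rate of 10^13 configurations per second is an assumption chosen only for illustration, and the age of the universe is rounded to 13.8 billion years. The calculation works in base-10 logarithms to keep the numbers manageable.

```python
import math

CONFIG_EXP = 300                      # 10^300 possible configurations (from the text)
ASSUMED_RATE_EXP = 13                 # ASSUMPTION: 10^13 configurations sampled per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10       # roughly 13.8 billion years


def search_time_log10_years(config_exp, rate_exp):
    """Return log10 of the years needed to try 10**config_exp configurations
    one by one at a rate of 10**rate_exp configurations per second."""
    seconds_log10 = config_exp - rate_exp
    return seconds_log10 - math.log10(SECONDS_PER_YEAR)


years_log10 = search_time_log10_years(CONFIG_EXP, ASSUMED_RATE_EXP)
excess_log10 = years_log10 - math.log10(AGE_OF_UNIVERSE_YEARS)

print(f"Exhaustive search: about 10^{years_log10:.0f} years")
print(f"Longer than the age of the universe by a factor of about 10^{excess_log10:.0f}")
```

Even if the assumed sampling rate were off by many orders of magnitude, the conclusion would not change: trying every configuration in turn is not a viable strategy.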

Starting in the late 1980s, scientists developed molecular physics-based models to try to figure out the rules of protein folding. To compare how well the different models worked, researchers established the Critical Assessment of Structure Prediction (CASP) challenge, a protein structure prediction competition that began in 1994. In the course of the competition, teams would attempt to solve the structure of a protein based on their models and then compare those predictions with experimentally determined structures.

The majority of the initial models, such as Rosetta, developed in the lab of 2024 Nobel laureate David Baker at the University of Washington, were physics-based. While these models could accurately describe the thermodynamic forces driving protein folding, they lacked the computational power to brute-force 10^300 possible configurations. The development of deep learning algorithms in the early 2010s was a breakthrough moment for protein structure prediction. In 2018, a team from DeepMind, the UK artificial intelligence lab bought by Google in 2014, entered the thirteenth CASP competition with a model called AlphaFold and placed first. By the next competition in 2020, AlphaFold 2, a revamped algorithm from DeepMind, left all other models in the dust.

AlphaFold, which earned its lead researchers a share of the 2024 Nobel Prize in Chemistry, was a true breakthrough in protein folding prediction. AlphaFold is a neural network-based model that predicts protein structures with atomic accuracy without requiring nearly as much computational power as physics-based models. To do that, the algorithm first searches genetic databases for similar protein sequences and creates a multiple-sequence alignment. Then, it generates a pairwise representation to encode spatial relationships between amino acids. The refined pairwise information is passed through a transformer network to produce a final prediction of the protein’s structure.
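The idea of turning a multiple-sequence alignment into pairwise information can be illustrated with a toy example. The sketch below is not AlphaFold's method: AlphaFold learns its pairwise representation with a neural network, whereas this example swaps in mutual information, a classical co-variation statistic, to show why aligned sequences carry hints about which positions are related. The alignment `TOY_MSA` and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy alignment: each row is a related sequence,
# each column is one aligned position.
TOY_MSA = [
    "ACDA",
    "ACDC",
    "GCEA",
    "GCEC",
]


def column(msa, i):
    """Return the residues found at aligned position i."""
    return [seq[i] for seq in msa]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_joint = count / n
        p_independent = (freq_a[a] / n) * (freq_b[b] / n)
        mi += p_joint * math.log2(p_joint / p_independent)
    return mi


def pairwise_matrix(msa):
    """Build an L x L matrix scoring how strongly each pair of positions co-varies."""
    cols = [column(msa, i) for i in range(len(msa[0]))]
    return [[mutual_information(a, b) for b in cols] for a in cols]


pair_scores = pairwise_matrix(TOY_MSA)
for row in pair_scores:
    print(" ".join(f"{score:.2f}" for score in row))
```

In this toy alignment, positions 0 and 2 always change together, so they receive a high score, while position 3 varies independently of position 0 and scores zero. Positions that co-vary across evolution are often close together in the folded protein, which is the kind of signal a pairwise representation is meant to capture.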

AlphaFold, developed by DeepMind, revolutionized structural biology by using deep learning to predict protein structures with atomic-level precision. [DALL-E]

This AI tool has revolutionized biology, enabling the design of new proteins for medicine, energy, and sustainability. Since 2020, AlphaFold has been used to predict the structures of millions of proteins, making waves across the scientific community. (As of the writing of this article, the Nature paper describing AlphaFold has been cited 32,998 times.) The latest AlphaFold 3 model is freely available for non-commercial research use, democratizing protein engineering in a way we could not have imagined a decade ago.

Beyond AlphaFold

Since the breakthrough of AlphaFold, other AI models have been developed by academic labs and biotech companies, including EvoBind, ESMFold, and RFdiffusion, which can predict protein structure and function, as well as elucidate the folding of other molecule types (DNA and RNA) and model complex molecular interactions between proteins and their ligands.

RFdiffusion is a model built on an open-source algorithm called RoseTTAFold, developed by David Baker’s lab. It generates new protein structures in a manner similar to how DALL-E or Midjourney generate art. This tool was able to solve new design challenges, including molecular binding and oligomer design. Both RoseTTAFold and RFdiffusion have since been upgraded to All-Atom versions, which enable modeling of not just proteins but biological complexes made from different molecules, including DNA, RNA, small molecules, metals, and other bonded atoms.

As large language models like ChatGPT took the world by storm, scientists realized that the same approach could be applied to proteins, and this has turned out to be a very successful strategy. These gigantic models, trained on extensive volumes of data, enable scientists to generate de novo designs based not on protein structure but on function. Dubbed protein language models (PLMs), these tools are far more accessible to researchers because tuning a natural language model is much easier than developing conventional machine learning algorithms.

Progress in generative AI for protein design is by no means finished. Researchers are still working to refine the algorithms to create better proteins for specific applications, as well as to expand what is possible for biology in the age of AI. Last month, NVIDIA released GenMol, a foundational model for molecular generation. GenMol offers a versatile molecular generative framework, based on discrete diffusion and non-autoregressive decoding, designed to streamline drug discovery from molecule design to lead optimization.

Generative Design in Synthetic Biology

So, what does this mean for synthetic biology and the bioeconomy?

AI-designed enzymes are already being used to improve industrial processes—from high-heat manufacturing to plastic degradation—boosting both performance and sustainability. [DALL-E]

The dream of being able to design custom proteins has finally come true. For example, we can make new enzymes that are stable at high temperatures. Using enzymes that do not degrade in high-temperature industrial processes (such as making paper from wood pulp) translates into millions of dollars in savings. AI is also revolutionizing drug discovery research by allowing scientists to model how proteins and small-molecule drugs interact with their targets. Additionally, we can engineer proteins that have no equivalents in nature—which could help us solve the challenges of the twenty-first century.

Companies such as Cradle have been using protein language models to design better enzymes, receptor-binding proteins, and antibodies, with applications across the entire biotechnology spectrum, from industrial and food biotech to biopharma. Absci is transforming the biologics discovery pipeline by enabling de novo antibody design. And Ginkgo’s models have been able to improve the activity of PETase, an enzyme capable of degrading plastic.

As the recent Advanced Biotech for Sustainability (AB4S) report highlights, advanced biotechnology could reduce global emissions by 5% and generate $1 trillion in economic value across the food, agriculture, chemicals, personal care, and transportation fuels sectors. With generative AI becoming a mainstay in biotechnology, transforming the world through synthetic biology is no longer a pipe dream—it is just a work in progress.
