Since the discovery of Cas9, scientists have been identifying other CRISPR enzymes in search of ones that are a better fit for purpose: those that cut more efficiently, those that are smaller and thus easier to package in gene therapy vectors, or those that have minimal off-target editing.
Scientists have been optimizing CRISPR systems for years, and now, with the rise of AI in biotech, many in the field have been using AI to design the best CRISPR system. Let’s take a look at some of the innovations in the field.
Mammoth Biosciences is one of the companies on the search for a better Cas protein. They use computational approaches to first comb through metagenomic databases to identify promising ultracompact Cas proteins and then screen them experimentally. “We recognized that nature had evolved millions of different CRISPR variants, presenting an unprecedented opportunity to discover systems with superior therapeutic potential,” says Lucas Harrington, President, Tx & Co-founder of Mammoth Biosciences.
Once they identify their candidate Cas proteins, they synthesize and screen them in a variety of assays with the help of AI tools that predict their 3-dimensional structures and guide their mutation strategy to get the desired features. “What makes this particularly powerful is its iterative nature,” says Harrington. “Each round of engineering generates new data that improves the model's accuracy, making subsequent rounds of mutation increasingly effective. This dramatically accelerates our protein engineering efforts in the lab.” Recently, they’ve used this process to discover NanoCas, an ultracompact Cas enzyme one-third the size of Cas9.
Mammoth is focusing its efforts on therapeutics, such as treating the genetic diseases familial chylomicronemia syndrome and severe hypertriglyceridemia by targeting the APOC3 gene. They are also working on using their systems in neuromuscular and central nervous system targets.
Scribe Therapeutics is taking a different approach to engineering CRISPR systems. They’re systematically evolving another small CRISPR system based on CasX, discovered by Jennifer Doudna’s lab in 2019 (Doudna is Co-founder of Mammoth and Scribe). Evolving CasX results in a library of thousands to millions of variant enzymes. “We create that in an unbiased manner,” says Benjamin Oakes, Co-founder, President, and CEO of Scribe Therapeutics. “For example, we change every amino acid in the protein to every other amino acid one by one.” Since they have experimental data on all of the variants, they can use this data as inputs for machine learning. “It’s assay-labeled data that is holistic,” says Oakes. “We understand how every single change behaves, and what’s interesting is that data then becomes really useful for future library generation.”
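The one-by-one substitution Oakes describes can be pictured with a short enumeration. This is only an illustrative sketch, not Scribe's pipeline: the function name `single_substitution_library`, the mutation labels, and the four-residue toy sequence are all assumptions made for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitution_library(protein):
    """Yield (label, variant) for every single amino acid substitution.

    Labels use the common original-position-replacement form, e.g. "M1A".
    """
    for pos, original in enumerate(protein):
        for replacement in AMINO_ACIDS:
            if replacement == original:
                continue  # skip the unchanged residue
            label = f"{original}{pos + 1}{replacement}"
            variant = protein[:pos] + replacement + protein[pos + 1:]
            yield label, variant


# Hypothetical toy sequence, not a real Cas fragment.
toy_protein = "MKRT"
library = dict(single_substitution_library(toy_protein))
print(len(library), "single-substitution variants")
```

A protein of length N yields 19 × N such variants, so a protein of about 1,000 residues would give roughly 19,000 single-substitution variants before any combinations are considered.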
Oakes sees an advantage to their methodology in that they’re starting with proteins that aren’t found in nature. “We essentially have to search out new areas of sequence space that we can only access by engineering because nature doesn't evolve for them,” says Oakes.
Scribe’s work has allowed them to build genetic medicines to treat cardiometabolic diseases with gene editing and epigenetic silencing. Their therapeutic targeting of the PCSK9 gene to lower low-density lipoprotein cholesterol (LDL-C) has been validated in non-human primates.
Another company in this space is Profluent. They recently released the first open-source AI-generated gene editor, called OpenCRISPR-1, which is currently used by over a thousand researchers. This enzyme is over 400 mutations away from the well-known SpCas9 and about 200 mutations away from any other known natural Cas protein, and it has base editing capabilities.
The company is also working on using AI to understand and predict protein-DNA interactions. Their new model, Protein2PAM, predicts the protospacer adjacent motif (PAM) specificity of Cas proteins from Type I, II, and V CRISPR-Cas systems. Given a desired PAM sequence, the model then evolves Cas proteins in silico to be specific for that PAM.
Cas enzymes are just one part of the equation. Guide RNAs (gRNAs), which direct Cas proteins to cut sites, are another significant component of CRISPR systems. gRNAs can be used to shut off genes completely or turn down their expression. To be effective, gRNAs need to act at designated sites (on-target sites) while minimizing effects at other sites (off-target sites). “Predicting off-target effects remains largely dependent on extensive laboratory testing,” Harrington says. “However, as we accumulate more data, AI is increasingly enabling us to shift this analysis to computational methods.”
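To make the on-target versus off-target distinction concrete, here is a deliberately naive sketch that scans a sequence for sites resembling a guide. It counts mismatches only, which is far simpler than the AI models discussed in this article. The guide, the sequence, the function names, and the mismatch threshold are all made up for illustration.

```python
def mismatches(a, b):
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(guide, sequence, max_mismatches=1):
    """Return (position, site, mismatch_count) for each window of the
    sequence that differs from the guide at no more than max_mismatches
    positions."""
    hits = []
    n = len(guide)
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        mm = mismatches(guide, window)
        if mm <= max_mismatches:
            hits.append((i, window, mm))
    return hits


# Toy inputs (hypothetical): real guides are longer and real targets
# are whole genomes or transcriptomes.
guide = "GATTACA"
sequence = "CCGATTACATTGATTTCAGG"

for position, site, mm in find_sites(guide, sequence):
    kind = "on-target" if mm == 0 else "possible off-target"
    print(position, site, mm, kind)
```

A perfect match stands in for the on-target site, while near matches are candidate off-target sites. Whether a near match is actually cut depends on factors a mismatch count cannot capture, which is where laboratory testing and learned models come in.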
Researchers from New York University and Columbia University recently used AI algorithms to predict on- and off-target activity of Cas13 gRNAs. “We had about a quarter of a million measurements of Cas13 guide RNAs and their activity on the human transcriptome in human cells,” says Neville Sanjana, an Associate Professor of Biology at New York University. “At that scale, where you have sufficient data, you can really train these complicated networks, and they can learn something about the properties of effective guide RNAs.”
Not only were they able to predict on-target and off-target activity for gRNAs, they could also use the algorithm to identify gRNAs that knocked down gene expression by a certain amount. For example, they could design gRNAs that knock down gene expression by 40%. They verified these predictions experimentally. “What was amazing is that it does that really well, too,” says Sanjana.
Sanjana says he’s been impressed by the ability of AI to help synthesize information and allow researchers to quickly shift gears toward other fields. “As scientists, we tend to have a lot of knowledge in a very specific direction, but it can be helpful to take that and apply it in a slightly different direction,” he adds. As an example, a researcher who uses CRISPR to make edits to study disease A in organism B could then quickly switch to editing the genome of organism C to study disease D.
Between selecting CRISPR enzymes, designing gRNAs, choosing a delivery method, and creating a protocol, CRISPR experiments are complex. A collaboration between the labs of Mengdi Wang, from Princeton University, and Le Cong, from Stanford University, created CRISPR-GPT, an LLM agent to help scientists design CRISPR experiments. They designed CRISPR-GPT to help train new scientists, plan reagents and design protocols, and troubleshoot experiments by integrating data from various databases, the literature, and a specialized LLM fine-tuned with gene-editing knowledge from 11 years of scientific discussion. They started by building standalone agents for individual tasks, like creating gRNAs or analyzing gene-editing data, and then assembled those agents into full workflows. Then, by layering on a “virtual scientist” using the fine-tuned LLM, CRISPR-GPT can understand and respond to user queries, executing complex workflows tailored to each user's research.
“While CRISPR has revolutionized biotechnology, designing and optimizing experiments remains a time-intensive, error-prone, and iterative process,” Wang and Cong told SynBioBeta over email. “We really worked on CRISPR-GPT for building a lab co-pilot that can adapt and evolve with the users.”
Sanjana thinks that AI has certainly helped us design better CRISPR reagents, but that the greatest impact will come from using AI as a lab mate. He adds, “That is going to really accelerate discoveries in biomedical science.”