What if scientists could edit proteins as easily as writing a sentence? Researchers from Zhejiang University and HKUST (Guangzhou) have developed ProtET, an AI model that enables controllable protein modifications using simple text-based instructions. Published in Health Data Science, this groundbreaking approach is redefining functional protein design, with implications for enzyme activity, stability, and antibody binding.
Proteins drive nearly every biological function, and their precise editing has vast potential for medicine, biotechnology, and synthetic biology. However, traditional methods rely on slow, labor-intensive experiments and single-task AI models that offer limited flexibility. “Current methods struggle with the vast combinatorial space of potential protein edits and cannot explicitly conduct protein editing using biotext instructions, limiting their interactivity with human feedback,” explained the study authors.
ProtET changes that equation by employing a transformer-based architecture and a hierarchical training method that uses contrastive learning to align protein sequences with their natural language descriptions. This allows scientists to modify proteins using intuitive text commands, making protein design more accessible and versatile.
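To make the idea of contrastive alignment concrete, here is a minimal sketch of a symmetric, CLIP-style contrastive loss over paired protein and text embeddings. It is an illustration only: this article does not specify ProtET's actual loss, encoders, or hyperparameters, so the function names, the temperature value of 0.07, and the use of plain NumPy arrays in place of real encoder outputs are all assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def contrastive_alignment_loss(protein_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss for a batch of paired embeddings.

    Row i of protein_emb and row i of text_emb are assumed to describe the
    same protein (a positive pair); every other row in the batch serves as
    a negative. The temperature is an illustrative value, not one taken
    from the ProtET study.
    """
    p = l2_normalize(protein_emb)
    t = l2_normalize(text_emb)
    logits = p @ t.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])        # positives sit on the diagonal
    loss_p2t = -log_softmax(logits)[idx, idx].mean()    # protein -> text
    loss_t2p = -log_softmax(logits.T)[idx, idx].mean()  # text -> protein
    return 0.5 * (loss_p2t + loss_t2p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    protein = rng.normal(size=(8, 16))   # stand-in for protein encoder outputs
    text = protein + 0.1 * rng.normal(size=(8, 16))  # noisy matching descriptions
    print("matched pairs:  ", contrastive_alignment_loss(protein, text))
    print("shuffled pairs: ", contrastive_alignment_loss(protein, text[::-1]))
```

Minimizing a loss of this kind pulls each protein embedding toward the embedding of its own description and pushes it away from the other descriptions in the batch, which is what lets a text instruction later steer edits in the shared embedding space.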
Led by Mingze Yin of Zhejiang University and Jintai Chen of HKUST (Guangzhou), the research team trained ProtET on a dataset of 67 million protein–biotext pairs derived from the Swiss-Prot and TrEMBL databases. The reported gains were substantial: “ProtET improves the state-of-the-art results by a large margin, leading to substantial stability improvements of 16.67% and 16.90%,” stated the authors. The model also optimized catalytic activity and antibody binding, outperforming existing AI-driven methods.
“ProtET introduces a flexible, controllable approach to protein editing, allowing researchers to fine-tune biological functions with unparalleled precision,” said Mingze Yin, the study’s lead author.
ProtET successfully optimized protein sequences across several scenarios, including enhancing enzyme activity, improving protein stability, and strengthening antibody-antigen binding. In a striking zero-shot task, it designed SARS-CoV antibodies capable of forming stable and functional 3D structures, demonstrating its potential for real-world biomedical applications.
Looking forward, the researchers see ProtET becoming a standard tool for protein engineering, accelerating advances in synthetic biology, genetic therapies, and biopharmaceutical manufacturing. By combining protein sequence modeling with natural language processing, the model marks a significant step forward in AI-driven protein design.