[DALL-E]

How AI and LLMs Could Revolutionize Functional Genomics Research

LLMs like GPT-4 show promise as tools for unraveling the complex functions of genes
December 2, 2024

The quest to unravel the intricate workings of genes and their interactions lies at the heart of functional genomics. In a recent Nature Methods article, investigators at the University of California San Diego School of Medicine examined how large language models (LLMs) such as GPT-4 could automate parts of this labor-intensive work, potentially changing how scientists approach functional genomics.

Gene set enrichment, one of the most widely used methods in this domain, typically compares an experimentally identified gene set against curated gene function databases to find which annotated functions are over-represented among its genes. However, these databases are incomplete and often miss novel or nuanced biological functions. By using artificial intelligence (AI) to interpret gene sets directly, researchers aim to cut the hours of manual labor currently required and pave the way for further automation in understanding gene interactions.
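For readers unfamiliar with the baseline method, the classic over-representation form of gene set enrichment can be sketched with a one-sided hypergeometric test: given a background of N genes, a database term annotating K of them, and an experimental set of n genes that shares k genes with the term, it asks how likely an overlap of at least k would be by chance. This is a minimal illustration, not the pipeline used in the study. The annotations are a hypothetical toy dictionary rather than real Gene Ontology data, the background size of 20,000 genes is an assumed round number, and `hypergeom_sf` and `enrich` are illustrative helper names. Production tools also correct for testing many terms at once, which is omitted here.

```python
from math import comb


def hypergeom_sf(overlap: int, set_size: int, term_size: int, universe: int) -> float:
    """One-sided hypergeometric p-value: P(X >= overlap).

    X counts how many of `set_size` genes drawn at random from a background of
    `universe` genes fall into a term that annotates `term_size` of them.
    """
    upper = min(set_size, term_size)
    tail = sum(
        comb(term_size, k) * comb(universe - term_size, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, set_size)


def enrich(gene_set: set[str], annotations: dict[str, set[str]], universe: int):
    """Rank database terms by over-representation in `gene_set` (smallest p first)."""
    results = []
    for term, term_genes in annotations.items():
        overlap = len(gene_set & term_genes)
        p = hypergeom_sf(overlap, len(gene_set), len(term_genes), universe)
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy annotations, not real Gene Ontology content.
annotations = {
    "toy term A (p53 signaling)": {"TP53", "MDM2", "CDKN1A", "BAX", "BBC3", "GADD45A"},
    "toy term B (glycolysis)": {"GAPDH", "PGK1", "ENO1", "PKM"},
}
experimental_set = {"TP53", "MDM2", "CDKN1A", "BAX", "GAPDH"}

for term, overlap, p in enrich(experimental_set, annotations, universe=20_000):
    print(f"{term}: overlap={overlap}, p={p:.3g}")
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free; with large backgrounds, a library routine such as SciPy's hypergeometric survival function would be the more practical choice.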

“Gene set enrichment is a mainstay of functional genomics, but it relies on gene function databases that are incomplete,” the researchers wrote. “Here, we evaluated five large language models (LLMs) for their ability to discover the common functions represented by a gene set, supported by molecular rationale and a self-confidence assessment.”

The investigators went on to state that “For curated gene sets from Gene Ontology, GPT-4 suggests functions similar to the curated name in 73% of cases, with higher self-confidence predicting higher similarity. Conversely, random gene sets correctly yield zero confidence in 87% of cases.”
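The LLM approach replaces the database lookup with a prompt: the model receives the gene list and is asked for a function name, a molecular rationale, and a self-confidence score. A minimal sketch of that request-and-parse step follows. The prompt wording, the `Name:`/`Confidence:`/`Rationale:` reply format, and the 0 to 1 confidence scale are assumptions for illustration, not the study's actual prompt; no model API is called, and `build_prompt` and `parse_response` are hypothetical helper names.

```python
import re
from dataclasses import dataclass

# Hypothetical prompt wording; not the prompt used in the Nature Methods study.
PROMPT_TEMPLATE = (
    "Propose a brief name for the most prominent biological function shared by "
    "the genes below, explain the molecular rationale, and rate your confidence "
    "from 0.00 (no common function) to 1.00.\n"
    "Reply exactly in this format:\n"
    "Name: <function name>\n"
    "Confidence: <score>\n"
    "Rationale: <explanation>\n\n"
    "Genes: {genes}"
)


@dataclass
class GeneSetAnnotation:
    name: str
    confidence: float
    rationale: str


def build_prompt(genes: set[str]) -> str:
    """Fill the template with a sorted, comma-separated gene list."""
    return PROMPT_TEMPLATE.format(genes=", ".join(sorted(genes)))


def parse_response(text: str) -> GeneSetAnnotation:
    """Extract name, confidence, and rationale from a reply in the assumed format."""
    name = re.search(r"^Name:\s*(.+)$", text, re.MULTILINE)
    conf = re.search(r"^Confidence:\s*([0-9]*\.?[0-9]+)\s*$", text, re.MULTILINE)
    # The rationale may span several lines, so capture everything after its label.
    rationale = re.search(r"^Rationale:\s*(.*)", text, re.MULTILINE | re.DOTALL)
    if not (name and conf and rationale):
        raise ValueError("reply does not follow the expected format")
    score = float(conf.group(1))
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {score}")
    return GeneSetAnnotation(name.group(1).strip(), score, rationale.group(1).strip())


# A made-up reply used only to exercise the parser.
reply = (
    "Name: DNA double-strand break repair\n"
    "Confidence: 0.85\n"
    "Rationale: BRCA1 and RAD51\nact in homologous recombination."
)
print(build_prompt({"RAD51", "BRCA1"}))
print(parse_response(reply))
```

Asking for a fixed reply format and rejecting malformed or out-of-range scores keeps the self-confidence value machine-readable, which is what makes it usable as a filter in the evaluation described next.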

GPT-4 Leads the Pack

The UC San Diego team tested five LLMs, and GPT-4 was the most accurate. For curated gene sets drawn from Gene Ontology, it proposed names similar to the curated name in 73% of cases. For random gene sets, it correctly reported zero confidence in 87% of cases, reducing the risk of hallucinated results, a common issue with AI models. GPT-4 also stood out for its detailed, coherent rationales, a feature that could help researchers judge AI-generated insights.
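The two headline numbers are simple proportions. Assuming each predicted name and curated name has been converted to an embedding vector by some text-embedding model, a recovery rate can be computed as the share of curated sets whose predicted name clears a similarity cutoff, and the random-set figure as the share of random sets assigned zero confidence. The sketch below uses cosine similarity, a hypothetical cutoff of 0.9, and toy two-dimensional vectors; the study's exact similarity measure and threshold may differ, and the helper names are illustrative.

```python
import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def recovery_rate(pairs: list[tuple[Vector, Vector]], cutoff: float) -> float:
    """Share of curated sets whose predicted-name embedding is at least `cutoff`
    similar to the curated-name embedding."""
    hits = sum(1 for predicted, curated in pairs if cosine(predicted, curated) >= cutoff)
    return hits / len(pairs)


def zero_confidence_rate(confidences: list[float]) -> float:
    """Share of random gene sets for which the model reported zero confidence."""
    return sum(1 for c in confidences if c == 0.0) / len(confidences)


# Toy 2-D "embeddings" and confidences, invented for illustration only.
curated_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, 0.0]),
    ([0.0, 1.0], [1.0, 0.0]),
    ([1.0, 0.1], [1.0, 0.0]),
]
random_set_confidences = [0.0, 0.0, 0.3, 0.0]

print(recovery_rate(curated_pairs, cutoff=0.9))
print(zero_confidence_rate(random_set_confidences))
```

With these toy inputs the two rates come out to 0.5 and 0.75; they are illustrative only and unrelated to the study's figures.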

In comparison, the other four models, GPT-3.5, Gemini Pro, Mixtral Instruct, and Llama2-70b, varied in how well they recovered curated functions but often produced falsely confident assessments for random sets.

A Step Toward Automating Functional Genomics

The implications of this study extend beyond accuracy rates. GPT-4’s capabilities could address a longstanding bottleneck in functional genomics: the reliance on incomplete databases. By synthesizing molecular rationales and providing confidence assessments for its analyses, GPT-4 could help bridge this gap. Applied to gene clusters derived from omics data, it identified a common function in 45% of clusters, with high specificity.

While these results are promising, the researchers acknowledge that further investigation is needed to fully realize the potential of LLMs in genomics. To this end, they have created a web portal that allows other scientists to explore and incorporate these models into their workflows. This initiative is part of a broader call to invest in the development of LLMs for genomics and precision medicine applications.

AI: Transforming the Scientific Process

The findings highlight AI's potential to transform scientific research. Beyond functional genomics, LLMs could reshape the broader scientific process by synthesizing complex datasets, generating new hypotheses, and making testable predictions far faster than manual analysis allows. With continued advancements, these tools could bring scientists closer to automating some of the most labor-intensive aspects of genomics research.

This study serves as a compelling example of how AI can complement human expertise, providing researchers with a new class of "omics assistants" that could accelerate discovery and innovation in the life sciences.
