OpenFold Advances Protein Modeling with AI and Supercomputing Power

Proteins, life’s building blocks, perform a wide range of functions based on their unique shapes. The molecules fold into specific three-dimensional forms that define their roles, from catalyzing biochemical reactions to providing structural support and enabling cellular communication.

Predicting a protein’s structure is challenging because of the complexity of these folds. Even slight variations in folding can significantly alter a protein’s function.

To address this complexity, researchers have developed a new open-source software tool called OpenFold that leverages the power of supercomputers and AI to predict protein structures. This can help scientists gain a deeper understanding of misfolded proteins associated with neurodegenerative diseases, such as Parkinson’s and Alzheimer’s disease, and develop new medicines.

OpenFold, which was announced in a study published in the journal Nature Methods, builds on the success of AlphaFold2, an AI program developed by DeepMind that predicts protein structures with unprecedented accuracy.

AlphaFold2 is being used by over two million researchers for protein predictions in various fields, including drug discovery and medical treatments. While AlphaFold2 offers exceptional accuracy, it is limited by its lack of accessible code and data for training new models. 

This restricts its application to new tasks, such as protein-ligand complex structure prediction, and makes it harder to understand how the model learns or to assess its capacity to generalize to unseen regions of fold space.

The research for OpenFold was initiated by Dr. Nazim Bouatta, a senior research fellow at Harvard Medical School, and his colleague Mohammed AlQuraishi, formerly at Harvard but now at Columbia University. The project was supported by several other researchers from Harvard and Columbia. 

The project eventually grew into the OpenFold Consortium, a non-profit AI research and development group that builds free and open-source software tools for biology and drug discovery.

A core component of AI-based research is large language models (LLMs), which can process vast amounts of data to generate new and meaningful insights. The ability to use natural language to interact with AI has greatly enhanced accessibility and usability, allowing users to communicate with these systems more intuitively and effectively. 

One of the earliest applications of OpenFold came from Meta AI, the AI research arm of Meta (formerly Facebook). Meta AI recently used OpenFold to integrate a ‘protein language model’ and launch an atlas featuring over 600 million proteins from bacteria, viruses, and other microorganisms that had not yet been characterized.

Bouatta explained that living organisms are also organized around a language, referring to the four bases of DNA: adenine, cytosine, guanine, and thymine. “This is the language that nature picked to build these sophisticated living organisms.”

He further elaborated that proteins have a second layer of language, represented by the 20 amino acids that make up all proteins in the human body and determine their functions. While genome sequencing has gathered extensive data on these biological “letters”, a crucial missing piece has been a “dictionary” that can translate this data into predicted shapes.

“Machine learning allows us to take a string of letters, the amino acids that describe any kind of protein that you can think of, run a sophisticated algorithm, and return an exquisite three-dimensional structure that is close to what we get using experiments. The OpenFold algorithm is very sophisticated and uses new developments that we’re familiar with from ChatGPT and others,” said Bouatta. 
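The “string of letters” Bouatta describes can be made concrete with a small sketch. The Python below is only an illustration of how a protein sequence written in the 20 standard one-letter amino acid codes might be turned into numbers a machine learning model can consume; it is not OpenFold’s actual preprocessing, and the function names and the example peptide are hypothetical.

```python
# Illustrative only: NOT OpenFold's real input pipeline.
# The 20 standard amino acids, written as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(sequence: str) -> list[int]:
    """Map a protein sequence to integer indices in the range 0-19."""
    sequence = sequence.strip().upper()
    unknown = sorted(set(sequence) - set(AMINO_ACIDS))
    if unknown:
        raise ValueError(f"non-standard residue codes: {unknown}")
    return [AA_TO_INDEX[aa] for aa in sequence]


def one_hot(indices: list[int]) -> list[list[int]]:
    """Turn each index into a 20-element vector with a single 1."""
    size = len(AMINO_ACIDS)
    return [[1 if j == i else 0 for j in range(size)] for i in indices]


if __name__ == "__main__":
    peptide = "MKTAYIAKQR"  # hypothetical example sequence
    tokens = encode_sequence(peptide)
    print(tokens)
    print(len(one_hot(tokens)), "residues encoded")
```

A structure prediction model starts from a numeric representation along these lines and outputs 3D coordinates for each residue; the real systems add far richer inputs, such as alignments of related sequences.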

The research was supported by the Flatiron Institute, OpenBioML, Stability AI, the Texas Advanced Computing Center (TACC), and NVIDIA, all of which provided resources for the experiments described in the paper.

TACC gave the OpenFold team access to its Lonestar6 and Frontera supercomputers, enabling the large-scale machine learning and AI deployments that significantly accelerated the research.

Supercomputers, combined with AI, have transformed biological research by enabling the accurate and efficient prediction of protein structures. While these tools shouldn’t replace lab experiments, they do significantly enhance the speed and precision of research. According to Bouatta, supercomputers are the “microscope of the modern era for biology and drug discovery” and they have immense potential to help us understand life and cure diseases.
