Breakthrough Study Unlocks Protein Language Using AI to Combat Clumping Issues

By Lisa Wong

In a new study, researchers have trained artificial intelligence (AI) models to decode the complex "language" of proteins. The experimental campaign required the design of more than 100,000 random protein fragments. The work is of great importance to pharmaceutical companies grappling with protein aggregation problems. Dr. Mike Thompson, Director of the Center for Genomic Regulation (CRG), co-led the study with Dr. Benedetta Bolognesi of the Institute for Bioengineering of Catalonia (IBEC). The research shows how proteins, which are built from twenty different amino acids, can provide insight into human diseases and advance synthetic biology.

The research team set out on an ambitious project: to design completely de novo (from scratch) polypeptide fragments, each twenty residues long. Their initial experiments found that roughly one fifth of the fragments clumped together. Such clumping poses significant obstacles for drug development. Using advanced DNA synthesis and sequencing techniques, the researchers ran hundreds of thousands of experiments in a single tube, streamlining data collection.

The study's findings could reshape the understanding of proteins, and with it the way human health is understood and disease is treated. The data from these experiments was used to train the AI models in the study, which opens up a range of opportunities for future research.

The Language of Proteins

Proteins are the molecular machinery of all living organisms, and they are built from twenty different amino acids. These amino acids can join together in a vast number of combinations, like letters forming trillions of words. Each of these "motifs" shapes the activity of thousands of distinct proteins. Evolution has barely begun to tap the expansive range of possible sequences. This implies a huge swath of unexplored opportunity beyond the sequences that nature already encodes.

The scale is striking. A protein fragment of twenty amino acids, with twenty options at each position, can be built in 20^20 ways, which is roughly 10^26 (about 105 septillion) combinations. This overwhelming number reflects the complexity of the protein sequence space and underscores the need for novel methods to study how proteins function in biological systems. By harnessing AI and machine learning, the study sought to explore this largely untapped, multi-dimensional universe of protein sequences.
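
The size of this sequence space can be checked with a few lines of arithmetic and compared against the figure quoted above. The Python sketch below assumes that each of the twenty positions can independently hold any of the twenty standard amino acids; the function and variable names are illustrative and are not taken from the study.

```python
# Assumption: every position is filled independently from the same
# 20-letter amino acid alphabet, so the count is alphabet_size ** length.
ALPHABET_SIZE = 20        # standard amino acids
FRAGMENT_LENGTH = 20      # residues per fragment
FRAGMENTS_TESTED = 100_000  # approximate number of fragments in the study


def sequence_space_size(alphabet_size: int, length: int) -> int:
    """Return the number of distinct sequences of the given length."""
    return alphabet_size ** length


total = sequence_space_size(ALPHABET_SIZE, FRAGMENT_LENGTH)
sampled_fraction = FRAGMENTS_TESTED / total

print(f"Possible fragments: {total:,} (about {total:.3e})")
print(f"Fraction covered by the experiment: {sampled_fraction:.1e}")
```

Under that assumption, the roughly 100,000 fragments tested cover only a vanishing fraction of the possible sequences, which is why a predictive model is needed to reach the rest.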

Tackling Protein Aggregation Issues

Protein aggregation is one of the biggest roadblocks for drug developers, because clumping can render a drug formulation ineffective. The ability to predict and avoid clumping will therefore be key to engineering safe and effective therapeutics. The researchers developed a methodology that offers a new look at the problem and helps scientists understand how different protein fragments interact with each other.

In their experiments, the team found that 21,936 of the 100,000 synthesized protein fragments clumped. The result shows how common aggregation is among proteins, and it reveals how individual sequences contribute to the process. Knowledge of these interactions can guide drug design toward treatments that are more effective and have fewer side effects.
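
As a quick consistency check, the reported counts can be turned into a proportion and compared with the "roughly one fifth" figure given earlier. The short Python sketch below uses only the two numbers stated in the article; the variable names are illustrative.

```python
# Counts as reported in the article.
aggregating_fragments = 21_936
total_fragments = 100_000

# Share of synthesized fragments that showed clumping.
aggregation_rate = aggregating_fragments / total_fragments

print(f"Aggregating fragments: {aggregation_rate:.1%}")  # prints 21.9%
```

The result, a little over one in five, is in line with the initial estimate.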

Collaboration and Future Implications

This work is a strong example of collaboration between leading institutions. The Center for Genomic Regulation (CRG), Cold Spring Harbor Laboratory (CSHL), and the Wellcome Sanger Institute all took part. By combining their skills and resources, the researchers have set a standard for protein studies that others can follow.

The research matters for reasons that extend beyond fundamental science. It informs the work of synthetic biologists who are trying to engineer proteins with new, specific, and desirable properties. The AI models, trained on roughly 100,000 protein fragments, can serve as a predictive modeling tool, helping scientists design proteins for specific applications in medicine and biotechnology.