In a typically black-box field, a recent study found that language models can match or outperform state-of-the-art results in chemistry, and they do so without any sophisticated knowledge of the topic at hand. Jürgen Bajorath, a cheminformatics scientist at the Lamarr Institute for Machine Learning and Artificial Intelligence at the University of Bonn, and his colleague Jannik P. Roth recently published their results in the journal Patterns. Their work unpacks how transformer models arrive at predictions in molecular design, focusing on sequence-based molecular design as a case study.
The paper gets into the nitty-gritty of how well these models work. In particular, it examines their promise for predicting inhibitors of important enzyme families. Bajorath and Roth use these associations to demonstrate the potential of existing language models in drug discovery, while also exposing the modeling challenges the models face in this particular context.
Understanding Transformer Models
Bajorath and Roth’s study used a sequence-based molecular design framework to gain insight into the predictive capabilities of transformer models. First, they fed the model information about specific families of enzymes and their respective inhibitors. This design provided an opportunity to analyze how well the model could propose potential inhibitors simply by comparing similarities in amino acid sequences.
What the researchers found is striking. To their surprise, the model treated enzymes, receptors, and other proteins as equivalent once their amino acid sequences matched by roughly 50 to 60 percent. That degree of similarity alone was enough for the model to propose novel inhibitors.
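To make the 50-to-60-percent figure concrete, here is a minimal sketch (not the authors’ pipeline, and the function name and toy sequences are illustrative assumptions) of how percent identity between two pre-aligned amino acid sequences is typically computed, with that threshold applied:

```python
# Illustrative sketch only -- not the method used in the study.
# Computes percent identity between two pre-aligned amino acid
# sequences and checks it against the ~50-60% range at which the
# study found the model treated proteins as equivalent.

def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions carrying the same residue.

    Assumes the sequences are already aligned to equal length;
    '-' denotes an alignment gap and never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(a == b and a != "-" for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Toy example: two short aligned fragments differing at two positions.
a = "MKTAYIAKQR"
b = "MKSAYLAKQR"
identity = percent_identity(a, b)      # 8 of 10 positions match -> 80.0
treated_as_equivalent = identity >= 50.0
```

In the study’s terms, any pair of proteins clearing a similarity level around this threshold could prompt the model to transfer inhibitors from one to the other, regardless of their actual function.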
“We used sequence-based molecular design as a test system to better understand how transformers arrive at their predictions,” – Jannik Roth
This methodology underscores the importance of amino acid sequences in forecasting molecular interactions. Remarkably, the model does all this without any understanding of the underlying chemical concepts.
Limitations of Language Models
Despite their utility, Bajorath is emphatic that language models remain opaque at their core. He argues that it is nearly impossible to tell how these models arrive at their answers.
“All language models are a black box,” – Prof. Dr. Jürgen Bajorath
Bajorath elaborates on this notion, stating, “It’s difficult to look inside their heads, metaphorically speaking.” This highlights a critical concern in cheminformatics: while these models can suggest plausible inhibitors, they may not grasp the essential functional distinctions within enzyme sequences.
Specifically, during training the models never learn to distinguish the functionally important parts of a sequence from the unimportant ones. Bajorath notes this limitation, stating, “This suggests that the model has not learned generally applicable chemical principles, i.e., how enzyme inhibition usually works chemically.”
Implications for Drug Research
The conclusions of Bajorath and Roth’s study are highly relevant to drug discovery and molecular design. The researchers note that generalization based on statistically detectable similarity can serve as a useful rule of thumb, but it cannot replace a genuine understanding of the chemistry behind molecular interactions.
“Such a rule of thumb based on statistically detectable similarity is not necessarily a bad thing,” – Prof. Dr. Jürgen Bajorath
This perspective is particularly relevant given the growing reliance on language models such as ChatGPT, Google Gemini, and Elon Musk’s “Grok.” These models are trained on vast quantities of text, enabling them to generate fluent sentences on their own, but they lack specific chemical knowledge.
Bajorath points out that language models can help accelerate discovery across multidisciplinary domains such as drug research, but researchers should remain wary of their shortcomings. In the study, for instance, when a new enzyme from an established family was used for testing, the algorithm did suggest a plausible inhibitor based on previously learned patterns, yet that success rested on sequence similarity rather than chemical understanding.
“When we then used a new enzyme from the same family for testing purposes, the algorithm actually suggested a plausible inhibitor,” – Prof. Dr. Jürgen Bajorath
Bajorath heads the “AI in Life Sciences and Health” program at the Lamarr Institute. His research is deeply engaged in exploring the best ways to integrate artificial intelligence into scientific fields, particularly drug research.