Large Language Models in Drug Discovery

Gangwal, Amit; Lavecchia, Antonio

doi:10.1007/978-3-031-98022-0_14

Large language models (LLMs) such as generative pre-trained Transformers (GPT) and bidirectional encoder representations from Transformers (BERT) have transformed natural language processing (NLP) and are increasingly applied in drug discovery. Trained on vast datasets, these models excel at text generation, comprehension, and pattern recognition, which makes them well suited to analyzing biomedical data, predicting drug interactions, and identifying new drug candidates. LLMs can synthesize information from multiple sources, speeding up hypothesis generation and streamlining drug development. They can also help mitigate data scarcity by generating synthetic data to improve model training and prediction accuracy. In drug discovery, LLMs assist with molecule screening, target identification, and clinical trial optimization, reducing the time and cost associated with traditional methods. They are also being explored for predicting pharmacokinetics, toxicity, and drug–drug interactions (DDIs). Despite this potential, challenges remain, including data quality, interpretability, and the need for domain-specific adaptation. Future advancements are expected to focus on integrating LLMs with other AI techniques, such as reinforcement learning (RL) and generative models, to further enhance their capabilities in drug discovery. Collaboration between academia, industry, and regulatory bodies will be crucial for overcoming these challenges and realizing the full potential of LLMs in delivering new, effective therapies.

Large Language Models in Drug Discovery / Gangwal, A., Lavecchia, A. - (2026), pp. 437-468. [10.1007/978-3-031-98022-0_14]