Is Autonomous Drug Design the Future? A Review of AI and Machine Learning in Accelerating Drug Development

Anushya Krishnan, Scientific Collaborator, ReaxionLab

www.linkedin.com/in/anushyakrishnan

Received June 17, 2025. Accepted July 06, 2025.

Reaxion Crucible 2025, 1 (1): e2025002

Abstract

The incorporation of artificial intelligence (AI) into drug discovery is reshaping how novel therapeutic agents are identified and developed. This article explores AI-driven approaches across the drug discovery pipeline, from target identification to molecule generation, screening, and synthesis. Machine learning (ML) models, including generative adversarial networks and variational autoencoders, often combined with reinforcement learning, have demonstrated the ability to design novel compounds with optimized pharmacological properties. Key examples include the discovery of halicin, a novel antibiotic, and the rapid identification of a cyclin-dependent kinase 20 (CDK20) inhibitor hit using AlphaFold and the Chemistry42 platform. AI also plays a growing role in multi-target drug design and in protein engineering, where tools such as RFdiffusion enable de novo protein design. ML further enhances high-throughput screening and retrosynthetic route planning. These advances can substantially reduce the time and cost of early-stage drug development, offering a powerful complement to traditional experimental methods.

Keywords: Artificial intelligence, drug discovery, machine learning, target identification, de novo molecule design, reinforcement learning, AlphaFold, drug-target interaction, generative models, multi-target drug design

Drug discovery is a comprehensive process aimed at identifying active compounds that can alter disease progression and address unmet therapeutic needs through the development of new molecular entities [1]. These agents may be derived from various sources, including synthetic chemicals, natural products, biologicals, or repurposed existing drugs. The process begins with identifying a biological target, such as a protein or receptor involved in disease progression; this step is difficult when disease mechanisms are poorly understood, as is often the case for nervous system disorders [2]. Despite advances that have improved speed, precision, and cost-efficiency, modern drug discovery still faces significant hurdles, and candidates frequently fail because of poor efficacy, adverse effects, or commercial limitations. In recent years, artificial intelligence (AI), particularly machine learning (ML), has emerged as a transformative force in drug discovery. AI excels at recognizing complex patterns within large datasets, which can reveal drug-disease relationships that might otherwise go unnoticed and help researchers identify potential drug targets and design novel therapeutic molecules faster and more accurately. However, AI-based approaches also pose challenges, particularly regarding potential biases and fairness in predictions [3]. Approaches such as data augmentation and explainable AI can improve model reliability while addressing concerns about bias and transparency.

The Drug Discovery Pipeline

Identifying Drug Targets and Generating Novel Molecules

The first step in drug discovery involves identifying biological targets, such as proteins, enzymes, or receptors, that play a role in disease progression [4]. This step is often hindered by the complexity of biological systems and the vast amount of data involved. AI and ML models can mine genomic, proteomic, and disease-related datasets to uncover patterns and predict viable therapeutic targets, while approaches such as unsupervised learning and network-based models help build a more complete picture of disease mechanisms. Once viable targets are identified, AI facilitates the design and optimization of therapeutic molecules. Traditional drug development tends to be slow and iterative, whereas generative approaches such as Generative Adversarial Networks (GANs) [5], Variational Autoencoders (VAEs) [6], and Reinforcement Learning (RL) [7] can propose and refine candidates much faster. These models can generate novel chemical structures and optimize them in silico for key properties such as binding affinity, solubility, and toxicity before any laboratory synthesis. AI is equally useful for searching existing chemical space: halicin, a structurally novel antibiotic, was identified by a deep learning model trained to predict growth inhibition of Escherichia coli and applied to the Drug Repurposing Hub, and the same model was then used to screen more than 107 million compounds from the ZINC15 database. Halicin's antibacterial activity was subsequently validated in vitro and in vivo [8]. Reinforcement learning has also been used to steer graph-based generative models toward molecules with a desired size, drug-likeness, and predicted dopamine receptor D2 (DRD2) activity, including compounds not present in the training set [7]. Beyond single-target design, multi-target de novo design uses AI models to create compounds that act on more than one target simultaneously. For example, a Perturbation Theory and Machine Learning (PTML) model was developed to design virtual dual inhibitors of CDK4 and HER2; it achieved over 75% sensitivity and specificity in validation, and of the six compounds designed with it, three were predicted to be effective dual inhibitors [9].
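To make the reinforcement learning idea more concrete, the sketch below applies the REINFORCE policy-gradient update to a toy "generator" that assembles a sequence from a handful of building blocks. It is a minimal illustration under stated assumptions, not the graph-based method of Atance et al. [7]: the vocabulary `VOCAB`, the `reward` function (which simply favors the hypothetical block `"N"` in place of a trained DRD2 activity predictor), the helper names, and all hyperparameters are hypothetical. Each episode samples a sequence, scores it, and nudges the policy logits toward choices that earned above-baseline reward.

```python
import math
import random

# Hypothetical toy "building blocks"; a real generator would act on SMILES tokens or graph edits.
VOCAB = ["C", "N", "O", "c1ccccc1"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(actions):
    """Hypothetical stand-in for a property predictor (e.g. a DRD2 activity model):
    the score is simply the fraction of chosen blocks that are "N"."""
    return sum(VOCAB[a] == "N" for a in actions) / len(actions)


def train(episodes=500, length=8, lr=0.1, seed=0):
    """REINFORCE on a position-independent categorical policy; returns final probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(VOCAB)
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(logits)
        # Sample one toy "molecule" as a sequence of block indices from the current policy.
        actions = rng.choices(range(len(VOCAB)), weights=probs, k=length)
        r = reward(actions)
        advantage = r - baseline
        # Moving-average baseline from past episodes: lowers variance without biasing the gradient.
        baseline = 0.9 * baseline + 0.1 * r
        # d log pi(a) / d logit_k = 1[k == a] - p_k, summed over every sampled step.
        for a in actions:
            for k in range(len(VOCAB)):
                logits[k] += lr * advantage * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(logits)


if __name__ == "__main__":
    final = train()
    for block, p in zip(VOCAB, final):
        print(f"{block:>10s}  {p:.3f}")
```

After training, most of the probability mass should sit on the rewarded block. Swapping `reward` for a real property predictor and the categorical policy for a sequence or graph model gives the general shape of RL-guided de novo design; the moving-average baseline is one common variance-reduction choice, not a requirement.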

Screening, Evaluating Candidates and Engineering Therapeutic Proteins

High-throughput screening of candidate molecules is traditionally resource-intensive. AI can make it more efficient by predicting molecular interactions and prioritizing the compounds most likely to succeed. For example, DeepMind’s AlphaFold predicts the 3D structures of proteins from their amino acid sequences with high accuracy [10]. Reliable structures for targets that lack experimental data make it easier to study how drugs interact with those targets and to carry out structure-based virtual screening. Beyond small molecules, AI is also advancing the design of therapeutic proteins, where diffusion models have recently enabled de novo protein design. The RFdiffusion framework, built by fine-tuning the RoseTTAFold structure prediction network on structure denoising, generated protein monomers, symmetric oligomers, and binders, many of which were confirmed experimentally; notably, a cryo-EM structure of one designed binder in complex with influenza haemagglutinin closely matched the design model [11].
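Denoising diffusion models such as RFdiffusion are trained to reverse a process that gradually corrupts structures with Gaussian noise. The sketch below shows only that forward (noising) process in the standard DDPM form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, applied to 3D coordinates. It is a simplified illustration under assumptions: the helper names, the linear noise schedule and its endpoints, and the toy C-alpha coordinates are hypothetical choices, and RFdiffusion also noises residue orientations and uses its own schedule, neither of which is modeled here. During training, the network learns to recover the clean structure from inputs like these.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T increasing linearly (a common DDPM choice)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s); list index 0 corresponds to t = 1."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_coordinates(coords, t_index, alpha_bars, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I) independently for each 3D point."""
    a = alpha_bars[t_index]
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [tuple(signal * x + noise * rng.gauss(0.0, 1.0) for x in point) for point in coords]


if __name__ == "__main__":
    # Hypothetical C-alpha trace: four residues spaced 3.8 angstroms apart along x.
    x0 = [(3.8 * i, 0.0, 0.0) for i in range(4)]
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    rng = random.Random(0)
    for t_index in (0, 499, 999):
        xt = noise_coordinates(x0, t_index, alpha_bars, rng)
        print(f"t={t_index + 1}: residue 2 at {[round(c, 2) for c in xt[1]]}")
```

Early steps leave the chain almost unchanged, while at the final step alpha_bar is close to zero and the coordinates are essentially pure noise; generation runs the learned reverse process starting from such noise. In practice, coordinates are usually centered (and often rescaled) before noising, a step omitted here for brevity.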

Automated Synthesis, Optimization, and Key Outcomes

AI models are also used in retrosynthetic analysis, that is, planning how to synthesize a target molecule from available precursors [12]. Transformer-based models such as the Molecular Transformer are trained on large datasets of chemical reactions encoded as SMILES strings and can recommend feasible reaction steps, including reagents, solvents, and catalysts. Coupling virtual molecule generation with practical synthesis planning in this way can substantially accelerate drug discovery and development [12]. Recent successes highlight the potential of machine learning across multiple stages of the pipeline. The RL-guided graph-based generative model described above produced diverse compounds with predicted DRD2 activity in 95% of samples, significantly outperforming earlier models and demonstrating the capability of RL to generate target-specific molecular structures during de novo design [7]. Another notable advance is the cell-based multi-target QSAR (CBMT-QSAR) model, developed to identify anticancer agents active against 17 liver cancer cell lines. The model achieved over 80% predictive accuracy and guided the virtual design of eight drug-like compounds, six of which were predicted to be effective across all tested cell lines, underscoring the utility of ML for designing multi-target drug candidates [13]. In a more integrated approach, AlphaFold was used to predict the 3D structure of CDK20, which was then refined and analyzed with the Chemistry42 AI platform to identify binding pockets and generate inhibitors. Of 8,918 designed molecules, seven were synthesized, and one compound, ISM042-2-001, bound CDK20 with a dissociation constant (Kd) of 9.2 µM. The entire workflow, from structure prediction to hit identification, was completed within 30 days, illustrating how AI-powered platforms can accelerate early-stage drug discovery [14].
Table 1 provides a summary of these and other key case studies, showcasing how AI tools have been successfully applied across different phases of the drug discovery process.
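The transformer-based retrosynthesis models mentioned earlier read reactions as text, so SMILES strings are first split into atom-level tokens. The sketch below uses a regular expression of the kind published with the Molecular Transformer work; treat it as an illustrative assumption rather than the exact preprocessing behind [12]. The helper name `tokenize_smiles` is hypothetical, and the esterification example (acetic acid and ethanol giving ethyl acetate) is a textbook reaction chosen for illustration, not an entry from any training set. In the retrosynthesis framing, the product tokens form the model input and the reactant tokens the target output.

```python
import re

# Atom-level SMILES tokenizer regex in the style published with the Molecular Transformer.
# Bracket atoms such as [NH4+] stay whole, "Br" and "Cl" are single tokens,
# and ring-closure digits, bonds, branches and ">" reaction arrows are separate tokens.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES (or reaction SMILES) string into tokens; reject unknown characters."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    # findall silently skips unmatched characters, so verify nothing was dropped.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Retrosynthesis framing: product in, reactants out (illustrative Fischer esterification).
    product = "CCOC(C)=O"        # ethyl acetate
    reactants = "CC(=O)O.OCC"    # acetic acid + ethanol
    print("source:", " ".join(tokenize_smiles(product)))
    print("target:", " ".join(tokenize_smiles(reactants)))
```

The space-joined token strings are the kind of source/target pairs a sequence-to-sequence model is trained on; keeping multi-character atoms such as `Cl` and `[NH4+]` as single tokens stops the model from having to learn that "C" followed by "l" is one chlorine atom.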

Table 1. Key Examples of AI Applications Across the Drug Discovery Pipeline.

This table highlights notable case studies demonstrating the use of artificial intelligence at various stages of drug discovery, including target identification, molecule generation, screening, and optimization.

| Stage | AI Techniques / Tools | Use Case |
|---|---|---|
| Target Identification | ML on omics data, network analysis, imaging data | Predicting new protein targets in Alzheimer’s disease from gene expression datasets |
| Molecule Generation | GANs, VAEs, RL | Identifying halicin, a novel antibiotic, with a deep learning model that was then used to screen over 107 million compounds |
| Screening & Evaluation | AlphaFold, virtual screening, docking | Predicting the 3D structure of CDK20 with AlphaFold; docking to assess how candidate molecules fit the target |
| Optimization | Retrosynthesis AI, Transformer models | Planning the chemical steps needed to synthesize a promising lead compound |
| Preclinical Prediction | QSAR (Quantitative Structure–Activity Relationship), toxicity prediction | Identifying molecular fragments favorable for activity across 17 liver cancer cell lines |

Conclusion

AI is rapidly reshaping drug discovery by addressing long-standing inefficiencies and unlocking new capabilities across the development pipeline. From identifying disease targets to designing, evaluating, and synthesizing potential therapeutics, AI-driven approaches have demonstrated clear advantages in speed, scalability, and precision. Case studies such as the discovery of halicin and the design of multi-target anticancer agents highlight AI’s potential to generate breakthrough therapies. As these technologies mature, integrating explainable AI and bias mitigation strategies will be crucial to ensuring trustworthy and equitable outcomes. With continued advances, AI is poised to become an indispensable tool in drug discovery and development.

References:

[1] I. Bano, U. D. Butt and S. A. H. Mohsan, in Novel Platforms for Drug Delivery Applications, Elsevier, 2023, pp. 619–643.

[2] Institute of Medicine, Board on Health Sciences Policy and Forum on Neuroscience and Nervous System Disorders, in Improving and Accelerating Therapeutic Development for Nervous System Disorders: Workshop Summary, National Academies Press (US), 2014.

[3] J. Kleinberg, in Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems, ACM, Irvine, CA, USA, 2018, p. 40.

[4] Q. Wu, J. Zheng, X. Sui, C. Fu, X. Cui, B. Liao, H. Ji, Y. Luo, A. He, X. Lu, X. Xue, C. S. H. Tan and R. Tian, Chem. Sci., 2024, 15, 2833–2847.

[5] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, arXiv, 2014, preprint, DOI: 10.48550/ARXIV.1406.2661.

[6] D. P. Kingma and M. Welling, arXiv, 2013, preprint, DOI: 10.48550/ARXIV.1312.6114.

[7] S. R. Atance, J. V. Diez, O. Engkvist, S. Olsson and R. Mercado, J. Chem. Inf. Model., 2022, 62, 4863–4872.

[8] J. M. Stokes, K. Yang, K. Swanson, W. Jin, A. Cubillos-Ruiz, N. M. Donghia, C. R. MacNair, S. French, L. A. Carfrae, Z. Bloom-Ackermann, V. M. Tran, A. Chiappino-Pepe, A. H. Badran, I. W. Andrews, E. J. Chory, G. M. Church, E. D. Brown, T. S. Jaakkola, R. Barzilay and J. J. Collins, Cell, 2020, 180, 688–702.e13.

[9] V. V. Kleandrova, M. T. Scotti, L. Scotti and A. Speck-Planche, Curr. Top. Med. Chem., 2021, 21, 661–675.

[10] J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, A. Bridgland, C. Meyer, S. A. A. Kohl, A. J. Ballard, A. Cowie, B. Romera-Paredes, S. Nikolov, R. Jain, J. Adler, T. Back, S. Petersen, D. Reiman, E. Clancy, M. Zielinski, M. Steinegger, M. Pacholska, T. Berghammer, S. Bodenstein, D. Silver, O. Vinyals, A. W. Senior, K. Kavukcuoglu, P. Kohli and D. Hassabis, Nature, 2021, 596, 583–589.

[11] J. L. Watson, D. Juergens, N. R. Bennett, B. L. Trippe, J. Yim, H. E. Eisenach, W. Ahern, A. J. Borst, R. J. Ragotte, L. F. Milles, B. I. M. Wicky, N. Hanikel, S. J. Pellock, A. Courbet, W. Sheffler, J. Wang, P. Venkatesh, I. Sappington, S. V. Torres, A. Lauko, V. De Bortoli, E. Mathieu, S. Ovchinnikov, R. Barzilay, T. S. Jaakkola, F. DiMaio, M. Baek and D. Baker, Nature, 2023, 620, 1089–1100.

[12] P. Schwaller, R. Petraglia, V. Zullo, V. H. Nair, R. A. Haeuselmann, R. Pisoni, C. Bekas, A. Iuliano and T. Laino, Chem. Sci., 2020, 11, 3316–3325.

[13] V. V. Kleandrova, M. T. Scotti, L. Scotti, A. Nayarisseri and A. Speck-Planche, SAR QSAR Environ. Res., 2020, 31, 815–836.

[14] F. Ren, X. Ding, M. Zheng, M. Korzinkin, X. Cai, W. Zhu, A. Mantsyzov, A. Aliper, V. Aladinskiy, Z. Cao, S. Kong, X. Long, B. H. Man Liu, Y. Liu, V. Naumov, A. Shneyderman, I. V. Ozerov, J. Wang, F. W. Pun, D. A. Polykovskiy, C. Sun, M. Levitt, A. Aspuru-Guzik and A. Zhavoronkov, Chem. Sci., 2023, 14, 1443–1452.

Disclaimer: The views, interpretations, and conclusions presented in this article are those of the author(s) alone and do not necessarily reflect those of the journal, editorial board, or publisher. The journal assumes no responsibility for any loss, damage, or consequences arising from the use of the information, data, or methods described. Readers are encouraged to critically evaluate the content before applying it in practice.

Open Access: This article is published under a Creative Commons Attribution (CC BY 4.0) license. You are free to share and adapt the material, provided proper credit is given to the original author(s) and source.