Publications

A Reinforcement Learning (RL)-Motivated Simulation Framework for Evaluating Vancomycin Dosing Strategies

Bingyu Mao, Ziqian Xie, Laila Rasmy, Masayuki Nigo, Degui Zhi

AMIA Annual Symposium, Feb 2026

Abstract

Achieving and maintaining the therapeutic range in vancomycin treatment is important for optimal outcomes. While guidelines and best practices based on empirical studies exist, the theoretically best dosing strategies under various conditions remain elusive. We developed an RL-based simulation framework using a deep learning two-compartment pharmacokinetic model (PK-RNN-2CM) and introduced the area under the time-concentration curve (AUC) reward score, which translates clinical guidelines into an RL reward. Ground truth time-concentration curves were generated from patient-specific data, and simulated curves were produced under different dosing strategies, with optional noise perturbations to mimic real-world settings. Evaluation metrics included 24-hour AUC assessments and RMSE. Results indicated that while the low-AUC-target dosing strategy (low-doser) and the high-AUC-target dosing strategy (high-doser) performed comparably in noise-free conditions, the low-doser achieved slightly higher AUC reward scores under noisy conditions, whereas the high-doser exhibited greater stability. This framework opens new approaches for optimizing vancomycin dosing.
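The abstract mentions a 24-hour AUC evaluation and an AUC reward score that turns guideline targets into an RL reward. The sketch below shows one way such a score could be computed; it is not the paper's reward function. It assumes the commonly cited 400–600 mg·h/L AUC24 window (MIC of 1 mg/L) from the 2020 vancomycin consensus guideline, a linear trapezoidal AUC, and a hypothetical piecewise-linear penalty outside the window. The names `auc_trapezoid` and `auc_reward` are illustrative.

```python
from typing import Sequence

# Assumed guideline AUC24 window for vancomycin (mg*h/L), taking MIC = 1 mg/L.
TARGET_LOW, TARGET_HIGH = 400.0, 600.0


def auc_trapezoid(times_h: Sequence[float], conc_mg_per_l: Sequence[float]) -> float:
    """Area under a sampled concentration-time curve (linear trapezoidal rule)."""
    if len(times_h) != len(conc_mg_per_l) or len(times_h) < 2:
        raise ValueError("need at least two matching time/concentration samples")
    area = 0.0
    for i in range(1, len(times_h)):
        # Each interval contributes the mean of its endpoint concentrations times its width.
        area += 0.5 * (conc_mg_per_l[i - 1] + conc_mg_per_l[i]) * (times_h[i] - times_h[i - 1])
    return area


def auc_reward(auc24: float, low: float = TARGET_LOW, high: float = TARGET_HIGH) -> float:
    """Hypothetical reward: 1.0 inside [low, high], falling linearly to 0.0
    at AUC24 = 0 (below the window) and at AUC24 = 2 * high (above it)."""
    if low <= auc24 <= high:
        return 1.0
    if auc24 < low:
        return max(0.0, auc24 / low)
    return max(0.0, 1.0 - (auc24 - high) / high)


# A flat 20 mg/L profile sampled over 24 h gives AUC24 = 480 mg*h/L, inside the window.
auc24 = auc_trapezoid([0.0, 12.0, 24.0], [20.0, 20.0, 20.0])
print(auc24, auc_reward(auc24))  # 480.0 1.0
```

A smooth penalty like this gives partial credit near the window edges. A strict in-range indicator would also be possible, but it provides a sparser signal to an RL agent.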

Opportunities for AI-based Model-Informed Drug Development: A Comparative Analysis of NONMEM and AI-Based Models for Population Pharmacokinetic Prediction

Bingyu Mao, Yue Gao, Christine Xu, Sreeraj Macha, Shuai Shao, Malidi Ahamadi

The American Association of Pharmaceutical Scientists Journal, Nov 2025

Abstract

Model-informed drug development (MIDD) plays an important role in pharmacometrics by analyzing clinical data using mathematical models to optimize drug dosing strategies. Traditional methods such as nonlinear mixed effects modeling (NONMEM) have long been the gold standard in population pharmacokinetic (PPK) modeling. However, the development of artificial intelligence (AI) presents a potential improvement in predictive accuracy and computational efficiency. This study evaluates the effectiveness of AI-based MIDD methods for PPK analysis, comparing them against traditional NONMEM models. We tested five machine learning (ML) models, three deep learning (DL) models, and a neural ordinary differential equations (ODE) model on both simulated and real clinical datasets under different scenarios, assessing accuracy with metrics such as root mean squared error (RMSE) and coefficient of determination (R²). Simulated datasets with known ground truth were created using a two-compartment model, while the real clinical dataset included data from 1,770 patients pooled from multiple clinical trials. Results indicate that AI/ML models often outperform NONMEM, with variations in performance depending on model type and data characteristics. Neural ODE models showed good performance, providing high accuracy and explainability with large datasets. This work provides valuable insights into the relative strengths and future applications of AI/ML in PPK modeling.
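RMSE and R² are the accuracy metrics named in this abstract. Below is a minimal plain-Python sketch of their textbook definitions, included as a reminder of what each number measures. It is not the study's evaluation code, and the toy values are made up.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted concentrations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equal in length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (negative for fits worse than the mean)."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need at least two paired observations")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


observed = [0.0, 0.0, 2.0, 2.0]
predicted = [0.0, 0.0, 2.0, 3.0]
print(rmse(observed, predicted), r_squared(observed, predicted))  # 0.5 0.75
```

RMSE keeps the units of the predicted quantity (e.g., mg/L), while R² is unitless and relative to the spread of the observations, which is why the two are often reported together.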

Evaluating the Impact of AI-Based Model-Informed Drug Development (MIDD): A Comparative Review

Bingyu Mao, Yue Gao, Christine Xu, Sreeraj Macha, Shuai Shao, Malidi Ahamadi

The American Association of Pharmaceutical Scientists Journal, Jun 2025

Abstract

Model-informed drug development (MIDD) methods play a critical role in ensuring the development of efficacious and safe individualized therapies. The application of artificial intelligence/machine learning (AI/ML) within drug development has expanded exponentially. Integrating AI/ML into traditional pharmacometrics approaches, or using AI/ML as a stand-alone tool, has the potential to optimize dosing strategies, inform clinical trial designs, and enhance the robustness of quantitative assessments of drug efficacy and safety.
This review systematically evaluates the impact of AI-based MIDD methods compared with traditional approaches, incorporating regulatory perspectives. We conducted a systematic search on PubMed using five Medical Subject Headings (MeSH) terms and included 67 relevant studies in the analysis. The results indicate that AI models have the potential to improve MIDD approaches across different stages of drug development and to inform decision-making in clinical trials. However, limitations were noted, such as the lack of standardized evaluation metrics and of standardized regulatory guidelines on the use of AI-based MIDD methods. Overall, this review highlights the potential applications of AI in drug development and provides a foundation for future research to optimize and integrate AI-based approaches in this field.

Deep learning model for personalized prediction of positive MRSA culture using time-series electronic health records

Masayuki Nigo, Laila Rasmy, Bingyu Mao, Bijun Sai Kannadath, Ziqian Xie, Degui Zhi

Nature Communications, Mar 2024

Abstract

Methicillin-resistant Staphylococcus aureus (MRSA) causes significant morbidity and mortality in hospitals. Rapid, accurate risk stratification of MRSA is crucial for optimizing antibiotic therapy. Our study introduced a deep learning model, PyTorch_EHR, which leverages electronic health record (EHR) time-series data, including a wide variety of patient-specific data, to predict MRSA culture positivity within two weeks. A total of 8,164 MRSA and 22,393 non-MRSA patient events from the Memorial Hermann Hospital System, Houston, Texas, were used for model development. PyTorch_EHR outperforms logistic regression (LR) and light gradient boosting machine (LGBM) models in accuracy (AUROC_PyTorch_EHR = 0.911, AUROC_LR = 0.857, AUROC_LGBM = 0.892). External validation with 393,713 patient events from the Medical Information Mart for Intensive Care (MIMIC)-IV dataset in Boston confirms its superior accuracy (AUROC_PyTorch_EHR = 0.859, AUROC_LR = 0.816, AUROC_LGBM = 0.838). Our model effectively stratifies patients into high-, medium-, and low-risk categories, potentially optimizing antimicrobial therapy and reducing unnecessary use of MRSA-specific antimicrobials. This highlights the advantage of deep learning models, which surpassed traditional machine learning models, in predicting MRSA-positive cultures and supporting clinicians’ judgments.

PK-RNN-V E: A Deep Learning Model Approach to Vancomycin Therapeutic Drug Monitoring Using Electronic Health Record Data

Masayuki Nigo, Hong Thoai Nga Tran, Ziqian Xie, Han Feng, Bingyu Mao, Laila Rasmy, Hongyu Miao, Degui Zhi

Journal of Biomedical Informatics, Aug 2022

Abstract

Vancomycin is a commonly used antimicrobial in hospitals, and therapeutic drug monitoring (TDM) is required to optimize its efficacy and avoid toxicities. Bayesian models are currently recommended to predict antibiotic levels. However, although these models use carefully designed lab observations, they were often developed in limited patient populations. The increasing availability of electronic health record (EHR) data offers an opportunity to develop TDM models for real-world patient populations.
Here, we present a deep learning-based pharmacokinetic prediction model for vancomycin (PK-RNN-V E) using a large EHR dataset of 5,483 patients with 55,336 vancomycin administrations. PK-RNN-V E takes a patient’s real-time, sparse, and irregular observations and offers dynamic predictions. Our results show that PK-RNN-V E achieves a root mean squared error (RMSE) of 5.39 and outperforms the traditional Bayesian model (VTDM model), which has an RMSE of 6.29. We believe that PK-RNN-V E can provide a pharmacokinetic model for vancomycin and other antimicrobials that require TDM.

Recurrent neural network models (CovRNN) for predicting outcomes of patients with COVID-19 on admission to hospital: model development and validation using electronic health record data

Laila Rasmy, Masayuki Nigo, Bijun Sai Kannadath, Ziqian Xie, Bingyu Mao, Khush Patel, Yujia Zhou, Wanheng Zhang, Angela Ross, Hua Xu, Degui Zhi

The Lancet Digital Health, Apr 2022

Abstract

Predicting outcomes of COVID-19 patients at an early stage is critical for optimized clinical care and resource management, especially during a pandemic. Although multiple machine learning models have been proposed to address this issue, their need for extensive data pre-processing and feature engineering means that these models have not been validated or implemented outside their original study sites.
In this study, we developed recurrent neural network-based models (CovRNN) to predict the outcomes of patients with COVID-19 using electronic health record data available on admission to hospital, without the need for specific feature selection or missing data imputation. CovRNN was designed to predict three outcomes: in-hospital mortality, need for mechanical ventilation, and prolonged hospital stay (>7 days). For in-hospital mortality and mechanical ventilation, CovRNN produced time-to-event risk scores (survival prediction, evaluated by the concordance index) and all-time risk scores (binary prediction, with the area under the receiver operating characteristic curve [AUROC] as the main metric); for prolonged hospital stay, we trained only a binary classification model. For binary classification tasks, we compared CovRNN against traditional machine learning algorithms: logistic regression and light gradient boosting machine. Model performance was evaluated on the multi-hospital test set.
CovRNN binary models achieved AUROCs of 93.0% (95% CI 92.6–93.4) for the prediction of in-hospital mortality, 92.9% (92.6–93.2) for the prediction of mechanical ventilation, and 86.5% (86.2–86.9) for the prediction of a prolonged hospital stay, outperforming the light gradient boosting machine and logistic regression algorithms. External validation confirmed AUROCs in similar ranges (91.3–97.0% for in-hospital mortality prediction, 91.5–96.0% for the prediction of mechanical ventilation, and 81.0–88.3% for the prediction of prolonged hospital stay). For survival prediction, CovRNN achieved a concordance index of 86.0% (95% CI 85.1–86.9) for in-hospital mortality and 92.6% (92.2–93.0) for mechanical ventilation.
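The survival results above are reported as a concordance index. The following minimal plain-Python sketch of Harrell's C-index shows what that number measures. It uses the common formulation; the abstract does not say which variant was used, and in this sketch pairs with tied event times are simply not counted as comparable. The inputs are toy values.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event (events[i] == 1)
    strictly before subject j's time. It is concordant when the model gives i the
    higher risk score; tied risk scores count as half-concordant.
    """
    if not (len(times) == len(events) == len(risk)):
        raise ValueError("times, events, and risk must have equal length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: earlier events receive higher risk scores, so every comparable pair is concordant.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.3, 0.1]))  # 1.0
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why the C-index is often read on the same scale as an AUROC.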

In progress

A deep-learning-based two-compartment predictive model (PKRNN-2CM) for vancomycin therapeutic drug monitoring

Bingyu Mao, Ziqian Xie, Masayuki Nigo, Laila Rasmy, Degui Zhi

medRxiv, Jan 2024

Abstract

Objective: Vancomycin is a widely used antibiotic that requires therapeutic drug monitoring (TDM) to optimize individual dosing. The deep learning-based model PKRNN-1CM has shown the advantage of leveraging time-series electronic health record (EHR) data for individualized estimation of vancomycin pharmacokinetic (PK) parameters. While one-compartment (1CM) PK models are commonly used because of their simplicity and previous trough-based clinical practices for dose adjustment, the pre-deep learning literature suggests the superiority of two-compartment models (2CM). Motivated by this, we introduce a novel deep learning-based approach, PKRNN-2CM, for vancomycin TDM.
Methods: PKRNN-2CM combines RNN-driven PK parameter estimation with a 2CM PK model to predict vancomycin concentration trajectories. Training on both simulated data and real-world EHR data allows for a comprehensive evaluation of its performance.
Results: Experiments based on simulated data highlight PKRNN-2CM's superiority over the simpler 1CM model PKRNN-1CM (PKRNN-2CM RMSE=1.30, PKRNN-1CM RMSE=2.50). Application to real data showcases significant improvement over PKRNN-1CM (PKRNN-2CM RMSE=5.62, PKRNN-1CM RMSE=5.84, two-sample unpaired t-test p-value=0.01), with potential further gains expected with non-trough level measurements.
Conclusion: PKRNN-2CM is an important improvement in vancomycin TDM, demonstrating enhanced accuracy and performance compared to the PKRNN-1CM model. This deep learning model holds potential for future individualized vancomycin TDM optimization and broader application in diverse clinical scenarios.
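PKRNN-2CM pairs RNN-estimated PK parameters with a two-compartment model. The sketch below illustrates only the compartmental half. It simulates the standard linear two-compartment model with intermittent IV infusions into the central compartment. The CL/V1/Q/V2 parameterization, the fixed-step RK4 integrator, and the parameter values are assumptions chosen for illustration. In the paper's setting an RNN would supply patient-specific parameters, which this sketch does not attempt.

```python
from typing import List, Sequence, Tuple

# Each infusion is (start_h, duration_h, dose_mg) into the central compartment.
Infusion = Tuple[float, float, float]


def infused_amount(t0: float, t1: float, infusions: Sequence[Infusion]) -> float:
    """Drug amount (mg) delivered during [t0, t1] by constant-rate infusions."""
    total = 0.0
    for start, duration, dose in infusions:
        overlap = min(t1, start + duration) - max(t0, start)
        if overlap > 0:
            total += dose / duration * overlap
    return total


def simulate_2cm(cl: float, v1: float, q: float, v2: float,
                 infusions: Sequence[Infusion], t_end_h: float,
                 dt_h: float = 0.05) -> Tuple[List[float], List[float]]:
    """Central-compartment concentrations (mg/L) for a linear two-compartment model.

    dA1/dt = R(t) - (k10 + k12) * A1 + k21 * A2
    dA2/dt = k12 * A1 - k21 * A2,  with k10 = CL/V1, k12 = Q/V1, k21 = Q/V2.
    Integrated with fixed-step RK4. R(t) is replaced by its average over each step,
    so steps that straddle an infusion start or stop receive exactly their infused amount.
    """
    k10, k12, k21 = cl / v1, q / v1, q / v2

    def f(x1: float, x2: float, r: float) -> Tuple[float, float]:
        return r - (k10 + k12) * x1 + k21 * x2, k12 * x1 - k21 * x2

    a1 = a2 = 0.0  # drug amounts (mg) in the central and peripheral compartments
    times, conc = [0.0], [0.0]
    for step in range(int(round(t_end_h / dt_h))):
        t = step * dt_h
        r = infused_amount(t, t + dt_h, infusions) / dt_h  # mean input rate (mg/h)
        s1 = f(a1, a2, r)
        s2 = f(a1 + 0.5 * dt_h * s1[0], a2 + 0.5 * dt_h * s1[1], r)
        s3 = f(a1 + 0.5 * dt_h * s2[0], a2 + 0.5 * dt_h * s2[1], r)
        s4 = f(a1 + dt_h * s3[0], a2 + dt_h * s3[1], r)
        a1 += dt_h / 6.0 * (s1[0] + 2 * s2[0] + 2 * s3[0] + s4[0])
        a2 += dt_h / 6.0 * (s1[1] + 2 * s2[1] + 2 * s3[1] + s4[1])
        times.append((step + 1) * dt_h)
        conc.append(a1 / v1)
    return times, conc


# Illustrative (not fitted) parameters: 1,000 mg over 1 h every 12 h for 48 h.
doses = [(float(t), 1.0, 1000.0) for t in range(0, 48, 12)]
times, conc = simulate_2cm(cl=4.0, v1=30.0, q=5.0, v2=40.0, infusions=doses, t_end_h=48.0)
print(f"peak {max(conc):.1f} mg/L, trough {conc[-1]:.1f} mg/L")
```

Setting `q=0` collapses the model to a one-compartment model, which makes it easy to compare 1CM and 2CM curves under the same dosing. A closed-form biexponential solution would be an alternative to numerical integration.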

Disease Risk Prediction Using Structured EHR Data: Can Generalist Large Language Models Match Specialized Clinical Foundation Models? A Comparative Evaluation with Fine-Tuning

Bingyu Mao, Made K. Prasadha, Ziqian Xie, Jianping He, Michael Ghebranious, Hua Xu, Degui Zhi, Laila Rasmy

medRxiv, May 2026

Abstract

Background: Electronic health records (EHRs) with clinical decision support tools are now ubiquitous in healthcare organizations. Clinical foundation models (CFMs) pretrained on large-scale, heterogeneous structured EHR data have emerged as a powerful approach to improve predictive performance and generalizability. Meanwhile, large language models (LLMs) pretrained on broad data sources are being applied to an expanding range of healthcare tasks. However, it remains unclear whether generalist LLMs can match specialized CFMs for disease risk prediction using structured clinical data.
Methods: We compared CFMs (Med-BERT, CLMBR) against fine-tuned generalist LLMs (Mistral, LLaMA-2/3/3.1), a clinical LLM (Me-LLaMA), and LLM-generated embeddings paired with simple classifiers (using DeepSeek, Qwen3, and GPT-OSS) on two disease risk prediction tasks: heart failure risk among diabetic patients (DHF) and pancreatic cancer diagnosis (PaCa). Evaluations spanned multi-site EHR data, claims data, and an open-source single-institution benchmark (EHRSHOT). Performance was assessed using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
Results: On larger EHR and claims cohorts (>30,000 patients), fine-tuned CFMs outperformed fine-tuned LLMs by a small but statistically significant margin (<1% AUROC). The clinical LLM performed comparably to generalist LLMs despite being smaller. On the open-source PaCa cohort (3,810 patients, 199 cases), LLMs achieved slightly higher AUROCs that were not statistically significant (LLaMA-3.1-70B 86.1% vs. Med-BERT 85.3%, p=0.27), but CFMs achieved significantly higher AUPRC (Med-BERT 55.9% vs. LLaMA-3.1-70B 41.1%, p=0.001). Notably, LLM-generated trajectory embeddings paired with logistic regression or a simple MLP, without any LLM fine-tuning, achieved the best overall performance, with AUROC exceeding 90% (Qwen3) and AUPRC reaching 66% (GPT-OSS 20B).
Conclusion: LLM-generated embeddings with lightweight classifiers outperformed both fine-tuned CFMs and fine-tuned LLMs on AUROC and AUPRC. While these results demonstrate the potential of generalist models to match or surpass specialized CFMs, their substantially greater computational cost and variable AUPRC performance in the fine-tuning setting warrant caution. We provide a reproducible evaluation framework and codebase to support continued benchmarking.
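In the PaCa results above, AUROC and AUPRC lead to different conclusions. The plain-Python sketch below uses a made-up six-patient example to show how two rankings can share the same AUROC and still differ in AUPRC. This happens because AUPRC rewards placing positives at the very top of the ranking, and its chance level equals the positive prevalence rather than 0.5. AUPRC is estimated here by average precision, which is one common estimator; the abstract does not state which estimator the study used. This is an illustration, not the study's evaluation code.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimated as average precision: the mean of precision@k over the
    ranks k at which positives appear. Assumes no tied scores."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / n_pos


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
top_heavy = [1, 0, 0, 1, 0, 0]   # one positive ranked first
mid_ranked = [0, 1, 1, 0, 0, 0]  # both positives just below the top
for name, y in [("top_heavy", top_heavy), ("mid_ranked", mid_ranked)]:
    print(name, auroc(y, scores), round(average_precision(y, scores), 3))
# top_heavy 0.75 0.75
# mid_ranked 0.75 0.583
```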