A DATA-DRIVEN MACHINE LEARNING APPROACH FOR EARLY PREDICTION OF GESTATIONAL DIABETES MELLITUS

Authors

  • Noor Sami Razzaq Najjar, The General Directorate of Education, the Ministry of Education of Iraq, Najaf

DOI:

https://doi.org/10.61841/kmnfeh71

Keywords:

Gestational diabetes mellitus, machine learning, early prediction, gradient boosted trees, random forests, neural networks

Abstract

Gestational diabetes mellitus (GDM) is a common pregnancy complication that can have serious consequences for both the mother and the fetus if not detected early. In this study, we propose a machine learning approach to predict GDM from the patient's clinical characteristics. The dataset was preprocessed to handle missing values, normalize numerical variables, and identify relevant clinical features to improve predictive performance. The data were divided into a training set and a test set, and the synthetic minority over-sampling technique (SMOTE) was applied only to the training set.

Methods: We conducted a retrospective model development and internal validation study using 10,000 pregnancy records. Candidate predictors included age, pre-pregnancy body mass index (BMI), ethnicity, family history of diabetes, previous GDM, blood pressure, previous pregnancies, and diet and lifestyle. Missing-data handling and categorical encoding were nested within stratified 10-fold cross-validation to prevent data leakage. Three algorithms were compared: gradient boosted trees (GBT), random forest (RF), and a neural network (NN). The primary metric was the area under the receiver operating characteristic curve (ROC-AUC); secondary metrics included precision, recall, specificity, F1-score, accuracy, and precision-recall AUC (PR-AUC).
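The leakage-avoidance idea described above (stratified folds, with oversampling confined to the training portion) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stratified_folds`, `smote_like`, and `balanced_training_sets` are hypothetical, label 1 is assumed to mark GDM-positive records and to be the minority class, and the oversampler is a simplified one-neighbour variant of SMOTE written with the Python standard library only. The same fit-on-training-only pattern would apply to imputation and categorical encoding.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each record index to one of k folds, preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def smote_like(minority_rows, n_new, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and its nearest minority neighbour (simplified, one-neighbour SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        others = [r for r in minority_rows if r is not base]
        neighbour = min(
            others, key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r))
        )
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balanced_training_sets(X, y, k=10, seed=0):
    """Yield (X_train, y_train, X_test, y_test) for each fold. Only the
    training part is oversampled; the held-out fold keeps its real ratio."""
    folds = stratified_folds(y, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        X_train = [X[j] for j in train_idx]
        y_train = [y[j] for j in train_idx]
        minority = [row for row, lab in zip(X_train, y_train) if lab == 1]
        deficit = (len(y_train) - len(minority)) - len(minority)
        if deficit > 0 and len(minority) >= 2:
            X_train = X_train + smote_like(minority, deficit, seed + i)
            y_train = y_train + [1] * deficit
        yield X_train, y_train, [X[j] for j in test_idx], [y[j] for j in test_idx]
```

Because synthetic rows are generated after the split, no held-out record influences the oversampled training data, which is the property the nesting is meant to guarantee.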

Results: GBT achieved the highest overall discrimination, and tree-based models outperformed the neural network by a small but consistent margin in this dataset. GBT offered the most balanced overall performance profile. This study supports the feasibility of early GDM prediction from routinely available antenatal variables, although temporal or external validation is still required before clinical implementation.

The experimental results indicate that the proposed framework showed promising predictive performance across several evaluation metrics, including accuracy, precision, recall, and F1-score. This approach offers a practical, data-driven means of predicting gestational diabetes and can support clinical decision-making systems.
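For readers unfamiliar with the evaluation metrics named in this abstract, the following minimal sketch shows how they are conventionally computed for a binary outcome. The function names and the toy labels and scores are illustrative assumptions, not data or code from the study; label 1 is assumed to denote a GDM-positive record, and ROC-AUC is computed by its pairwise-ranking definition rather than by any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(y_true, y_pred):
    """Precision, recall, specificity, F1-score, and accuracy from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(y_true)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative
    (ties count as one half). Requires at least one record of each class."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 3 positives, 5 negatives, threshold 0.5.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = threshold_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

In this toy example, accuracy is 0.75 and specificity is 0.80 while recall is only about 0.67, which illustrates why a single accuracy figure can hide missed positive cases when the classes are imbalanced.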

Published

2026-06-04

How to Cite

A DATA-DRIVEN MACHINE LEARNING APPROACH FOR EARLY PREDICTION OF GESTATIONAL DIABETES MELLITUS. (2026). Journal of Advance Research in Computer Science & Engineering (ISSN 2456-3552), 11(1), 1-7. https://doi.org/10.61841/kmnfeh71