Diabetes prediction based on Ensemble Methods

Jihan Askandar Mosa; Adnan Mohsin Abdulazeez

doi:10.19139/soic-2310-5070-2771

Jihan Askandar Mosa Information Technology Management Dept., Technical College of Administration, Duhok Polytechnic University, Duhok, Iraq; Information Technology Dept., Shekhan Technical Institute, Duhok Polytechnic University, Duhok, Iraq
Adnan Mohsin Abdulazeez Technical College of Engineering, Duhok Polytechnic University, Duhok, Iraq https://orcid.org/0000-0002-4357-7331

Keywords: Diabetes Prediction, Ensemble Learning, Gradient Boosting, AdaBoost, XGBoost

Abstract

Diabetes is a chronic disease whose incidence is increasing worldwide, especially in low- and middle-income countries. Early and accurate prediction is critical for reducing complications and improving patient outcomes. This study presents an ensemble-based machine learning framework for diabetes prediction and evaluates it on two benchmark datasets: the Diabetes Prediction dataset and the Pima Indians Diabetes dataset. Two ensemble strategies were compared: a sequential ensemble combining XGBoost, Gradient Boosting, and AdaBoost, and a parallel ensemble using a soft voting classifier over Logistic Regression, Decision Tree, and K-Nearest Neighbors. Forward feature selection strategies were used to find the most relevant predictors, improving model performance and generalizability. The data were split into 70% for training, 15% for validation, and 15% for testing. On the Pima Indians dataset, the sequential ensemble achieved a training accuracy of 98.95%, a validation accuracy of 97.59%, and an F1 score of 97.77%. The parallel ensemble scored lower on all three measures, with a training accuracy of 98.16%, a validation accuracy of 96.38%, and an F1 score of 96.62%. The sequential model also outperformed the parallel model on the Diabetes Prediction dataset, making it the stronger approach on both datasets. These results demonstrate how feature selection methods and boosting-based ensemble models can work together to create accurate and reliable medical prediction systems.
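
The workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of the two benchmark datasets, and picks hyperparameters (tree depth, number of neighbors, five selected features) purely for illustration. It shows the 70/15/15 split, forward feature selection, and the parallel soft voting ensemble; the sequential boosting ensemble is left out because the abstract does not say how its three boosters are combined.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for a tabular diabetes dataset
# (8 numeric features, binary label); not the data used in the paper.
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           random_state=0)

# 70% training; the remaining 30% is halved into validation and test (15% each).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=0)

# Forward feature selection: start from no features and greedily add the one
# that most improves cross-validated accuracy, fitted on training data only.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=5,  # illustrative choice, not from the paper
    direction="forward",
    cv=3,
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_val_sel = selector.transform(X_val)

# Parallel ensemble: soft voting averages the predicted class probabilities
# of the three base learners and picks the class with the highest mean.
voter = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("dt", DecisionTreeClassifier(max_depth=4, random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ],
    voting="soft",
)
voter.fit(X_train_sel, y_train)

val_pred = voter.predict(X_val_sel)
val_accuracy = accuracy_score(y_val, val_pred)
val_f1 = f1_score(y_val, val_pred)
print(f"validation accuracy={val_accuracy:.3f}  F1={val_f1:.3f}")
```

Fitting the selector on the training split alone keeps the validation and test splits unseen during feature selection, which is what makes the held-out scores meaningful. The scores printed here come from synthetic data and are not comparable to the figures reported in the abstract.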

Published

2025-10-04

How to Cite

Mosa, J. A., & Abdulazeez, A. M. (2025). Diabetes prediction based on Ensemble Methods. Statistics, Optimization & Information Computing, 14(6), 3359-3379. https://doi.org/10.19139/soic-2310-5070-2771

Issue

Vol 14 No 6 (2025)

Section

Research Articles

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).