Machine Learning Models for Predicting COVID-19 Mortality Using Epidemiological Features

  • Sokaina EL KHAMLICHI, Research Team in Science and Technology, Higher School of Technology of Laayoune, Ibn Zohr University, Quartier 25 Mars, P.O. Box 3007, Laayoune, Morocco; LyRICA: Laboratory of Research in Informatics, Data Sciences and Artificial Intelligence, School of Information Sciences, B.P. 6204, Rabat-Instituts, Rabat, Morocco
  • Loubna Taidi, Laboratory of Innovative Technology, Faculty of Sciences and Technologies, Abdelmalek Essaadi University, Tangier, Morocco
Keywords: COVID-19, epidemiology, machine learning, threat management, imbalanced dataset, RUS, SMOTE, ADASYN

Abstract

Identifying COVID-19 patients at high risk of death is critically important for healthcare professionals, as it supports informed decision-making and strengthens the capacity of medical systems to manage emerging crises. However, COVID-19 datasets are often highly imbalanced, with substantially fewer fatal cases than non-fatal ones, which makes it difficult to develop effective machine learning models. This study aims to develop a high-performing machine learning approach to predict COVID-19 mortality using a Mexican epidemiological dataset. To address the class imbalance, several resampling techniques are applied: SMOTE, SMOTE-ENN, ADASYN, SMOTE-Tomek, and Random Under-Sampling (RUS). Predictive models are built with five machine learning algorithms: Logistic Regression, Decision Tree, Gaussian Naïve Bayes, K-Nearest Neighbors, and Random Forest. In addition, a feature selection analysis using SHAP is performed to determine the most relevant attributes for predicting COVID-19 mortality. The results show that the Random Forest model trained on data balanced with SMOTE-ENN yielded the best performance, with 89.44% accuracy, 87.88% recall, and an 88.74% ROC AUC score. Furthermore, the feature selection analysis shows that Type of Patient, Age, Pneumonia, Intubation, and contact with COVID-19-infected patients are the most important attributes for predicting the risk of COVID-19 fatality among hospitalized individuals.
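To make the resampling idea concrete, the sketch below illustrates two of the techniques named in the abstract on an invented toy dataset: Random Under-Sampling, which discards majority-class rows, and a simplified SMOTE-style step, which synthesizes minority-class rows by interpolating between nearby minority samples. It is a minimal illustration only, not the authors' implementation. The function names `random_under_sample` and `smote_like_over_sample`, the two toy features, and the class sizes are all assumptions made for this example, and the SMOTE variant omits refinements found in library implementations.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Random Under-Sampling (RUS): randomly drop rows from the larger
    classes until every class has as many rows as the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_like_over_sample(X, y, minority, k=3, seed=0):
    """Simplified SMOTE-style oversampling: add synthetic minority rows,
    each placed on the segment between a minority row and one of its
    k nearest minority neighbours, until the minority class is as large
    as the largest class. Requires at least two minority rows."""
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    n_needed = max(Counter(y).values()) - len(minority_rows)
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        # Rank the other minority rows by squared Euclidean distance.
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data: [age, pneumonia flag]; label 1 = death, 0 = survival.
X = [[30 + i, 0.0] for i in range(20)] + [[70 + i, 1.0] for i in range(4)]
y = [0] * 20 + [1] * 4

X_rus, y_rus = random_under_sample(X, y)
X_sm, y_sm = smote_like_over_sample(X, y, minority=1)

print("original:", Counter(y))
print("after RUS:", Counter(y_rus))
print("after SMOTE-style oversampling:", Counter(y_sm))
```

Under-sampling balances the classes by throwing data away, while SMOTE-style oversampling keeps every original row and adds interpolated ones. Hybrid methods such as SMOTE-ENN and SMOTE-Tomek follow the oversampling step with a cleaning step that removes ambiguous samples. Resampling is normally applied to the training split only, so that evaluation metrics such as recall and ROC AUC reflect the original class distribution.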
Published
2025-05-28
How to Cite
EL KHAMLICHI, S., & Taidi, L. (2025). Machine Learning Models for Predicting COVID-19 Mortality Using Epidemiological Features. Statistics, Optimization & Information Computing, 14(2), 677-703. https://doi.org/10.19139/soic-2310-5070-2159
Section
Research Articles