Machine Learning Models for Predicting COVID-19 Mortality Using Epidemiological Features

Sokaina EL KHAMLICHI; Loubna Taidi

doi:10.19139/soic-2310-5070-2159

Sokaina EL KHAMLICHI, Research Team in Science and Technology, Higher School of Technology of Laayoune, Ibn Zohr University, Quartier 25 Mars, P.O. Box 3007, Laayoune, Morocco; LyRICA: Laboratory of Research in Informatics, Data Sciences and Artificial Intelligence, School of Information Sciences, B.P. 6204, Rabat-Instituts, Rabat, Morocco
Loubna Taidi, Laboratory of Innovative Technology, Faculty of Sciences and Technologies, Abdelmalek Essaadi University, Tangier, Morocco

Keywords: COVID-19, epidemiology, machine learning, threat management, imbalanced dataset, RUS, SMOTE, ADASYN

Abstract

Identifying COVID-19 patients at high risk of fatality is critically important for healthcare professionals, as it supports informed decision-making and enhances the capacity of medical systems to manage emerging crises. However, COVID-19 datasets are often highly imbalanced, with substantially fewer fatality cases than survivals, which makes it difficult to develop effective machine learning models. This study aims to develop a high-performing machine learning approach to predict COVID-19 mortality using a Mexican epidemiological dataset. To address the class imbalance, several resampling techniques are applied: SMOTE, SMOTE-ENN, ADASYN, SMOTE-Tomek, and Random Under-Sampling (RUS). Predictive models are built with several machine learning algorithms: Logistic Regression, Decision Tree, Gaussian Naïve Bayes, K-Nearest Neighbors, and Random Forest. In addition, a feature selection analysis based on the SHAP technique is performed to determine the most relevant attributes for predicting COVID-19 mortality. The results show that the Random Forest model, trained on data balanced with the SMOTE-ENN technique, yielded the best performance, with 89.44% accuracy, 87.88% recall, and an 88.74% ROC AUC score. Furthermore, the feature selection analysis shows that Type of Patient, Age, Pneumonia, Intubation, and contact with COVID-19 infected patients are the most important attributes for predicting the risk of COVID-19 fatality among hospitalized individuals.
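
To illustrate why class imbalance matters for this kind of task, the following is a minimal sketch of Random Under-Sampling (RUS), the simplest of the resampling techniques listed in the abstract, together with hand-written accuracy and recall. It is not the authors' implementation: the function names, the toy data, and the single age feature are all assumptions made for illustration, and the study's best result used SMOTE-ENN, not RUS.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Sample without replacement so no row is duplicated.
        for row in rng.sample(group, target):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [l for _, l in balanced]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class cases that were predicted positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: 95 survivors (label 0) and 5 deaths (label 1),
# with a single made-up "age" feature per patient.
rows = [[20 + i % 50] for i in range(95)] + [[70 + i] for i in range(5)]
labels = [0] * 95 + [1] * 5

# A trivial model that always predicts "survived" looks accurate on the
# imbalanced data but never detects a fatality.
always_survive = [0] * len(labels)
print(accuracy(labels, always_survive))  # 0.95
print(recall(labels, always_survive))    # 0.0

# After under-sampling, both classes have the same number of rows.
balanced_rows, balanced_labels = random_under_sample(rows, labels)
print(Counter(balanced_labels))
```

The trivial predictor reaches 95% accuracy with 0% recall on the fatality class, which is why the abstract reports recall and ROC AUC alongside accuracy. RUS discards majority-class rows, whereas SMOTE-style methods synthesize new minority-class rows instead.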

Published

2025-05-28

How to Cite

EL KHAMLICHI, S., & Taidi, L. (2025). Machine Learning Models for Predicting COVID-19 Mortality Using Epidemiological Features. Statistics, Optimization & Information Computing, 14(2), 677-703. https://doi.org/10.19139/soic-2310-5070-2159

Issue

Vol 14 No 2 (2025)

Section

Research Articles

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
