Boosting Mixed-Effects Models with SMOTE: Insights from Java’s Human Development Index

Keywords: copula, human development index, optimization of mixed models, oversampling, unbalanced

Abstract

This study evaluates the performance of several regression models on unbalanced and clustered data, using the 2018 Human Development Index (HDI) of regencies on Java Island, Indonesia, as a case study. The models assessed are Linear Mixed Models (LMM), Generalized Estimating Equations (GEE), Mixed Effects Regression Trees (MERT), and Gaussian Copula Marginal Regression (GCMR). These models share a common foundation in incorporating random effects, allowing for a fair and systematic comparison. Model performance was evaluated using two metrics, Median Absolute Error (MedAE) and Root Mean Square Error (RMSE), computed on both the original dataset and an oversampled version generated with the Synthetic Minority Oversampling Technique (SMOTE). The results indicate that applying SMOTE consistently improves model accuracy. MERT achieved the lowest MedAE on both datasets, demonstrating superior capability in minimizing median prediction errors. GCMR yielded the best RMSE on the original data, highlighting its robustness in handling complex data structures without requiring oversampling. Residual analysis using boxplots further supports these findings, showing that SMOTE reduces residual variability and enhances model stability. Among the evaluated models, MERT exhibited the most consistent performance overall. These findings underscore the utility of oversampling techniques such as SMOTE in improving regression model performance on unbalanced and hierarchically structured data. Both MERT and GCMR are identified as strong candidates for such analytical scenarios, contributing to the development of more robust and accurate predictive models in data science and applied statistics.
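The two evaluation metrics named in the abstract can rank models differently, because RMSE is sensitive to a few large residuals while MedAE is not. The following is a minimal Python sketch of both metrics using their standard definitions. The function names and the numbers are hypothetical illustrations and are not taken from the study's data or code.

```python
import math
import statistics


def medae(y_true, y_pred):
    """Median Absolute Error: the median of |y_i - yhat_i|."""
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared residual."""
    return math.sqrt(
        statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))
    )


# Hypothetical HDI-like values (illustrative only, not from the paper).
y_true = [70.0, 72.0, 68.0, 80.0]
y_pred = [71.0, 72.0, 66.0, 74.0]  # the last prediction misses by 6 points

print("MedAE:", medae(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
```

In this toy example the single large miss raises RMSE well above MedAE, which suggests one way a model can lead on one metric without leading on the other.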

Author Biographies

Anang Kurnia, IPB University
School of Data Science, Mathematics and Informatics
Khairil A. Notodiputro, IPB University
School of Data Science, Mathematics and Informatics
Indahwati, IPB University
School of Data Science, Mathematics and Informatics
Published
2025-10-30
How to Cite
Anggara, D., Kurnia, A., Notodiputro, K. A., & Indahwati, I. (2025). Boosting Mixed-Effects Models with SMOTE: Insights from Java’s Human Development Index. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-3011
Section
Research Articles