Boosting Mixed-Effects Models with SMOTE: Insights from Java’s Human Development Index

Keywords: copula, human development index, optimization of mixed models, oversampling, unbalanced

Abstract
This study evaluates the performance of several regression models on unbalanced, clustered data, using the 2018 Human Development Index (HDI) of the regencies of Java Island, Indonesia, as a case study. The models assessed are Linear Mixed Models (LMM), Generalized Estimating Equations (GEE), Mixed Effects Regression Trees (MERT), and Gaussian Copula Marginal Regression (GCMR). These models share a common foundation in incorporating random effects, which allows a fair and systematic comparison. Model performance was evaluated with two metrics, Median Absolute Error (MedAE) and Root Mean Square Error (RMSE), on both the original dataset and an oversampled version generated with the Synthetic Minority Oversampling Technique (SMOTE). The results indicate that applying SMOTE consistently improves model accuracy. MERT achieved the lowest MedAE on both datasets, showing a superior ability to minimize median prediction errors. GCMR, in turn, yielded the best RMSE on the original data, highlighting its robustness in handling complex data structures without oversampling. Residual boxplots further support these findings, showing that SMOTE reduces residual variability and improves model stability. Among the evaluated models, MERT exhibited the most consistent performance overall. These findings underscore the usefulness of oversampling techniques such as SMOTE for improving regression performance on unbalanced, hierarchically structured data, and they identify MERT and GCMR as strong candidates for such analyses, contributing insights toward more robust and accurate predictive models in data science and applied statistics.
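The two error metrics and the oversampling step named in the abstract can be sketched in a few lines of Python. This is a minimal, illustrative sketch, not the authors' code: `rmse` and `medae` follow the standard definitions of RMSE and MedAE, and `smote_samples` is a hypothetical helper implementing the classic SMOTE interpolation rule, in which a synthetic point is placed at a random position on the segment between a minority observation and one of its k nearest minority neighbours. The abstract does not say how the response variable or the regency-level clustering was handled during oversampling, so the helper interpolates plain numeric feature vectors only, and the toy inputs are invented for illustration rather than taken from the HDI data.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    """Root Mean Square Error: sqrt(mean((y - y_hat)^2))."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def medae(y_true, y_pred):
    """Median Absolute Error: median(|y - y_hat|); robust to a few large residuals."""
    return statistics.median([abs(t - p) for t, p in zip(y_true, y_pred)])


def smote_samples(minority, n_new, k=3, seed=0):
    """Hypothetical helper: classic SMOTE interpolation on minority feature vectors.

    For a randomly chosen minority point x_i, pick one of its k nearest
    minority neighbours x_nn (Euclidean distance) and return
    x_i + u * (x_nn - x_i) with u drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x_i = minority[i]
        # Candidate neighbours: all other minority points, sorted by distance to x_i.
        others = [x for j, x in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda x: math.dist(x, x_i))[:k]
        x_nn = rng.choice(neighbours)
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x_i, x_nn)])
    return synthetic


if __name__ == "__main__":
    y_obs = [1.0, 2.0, 3.0, 4.0]
    y_hat = [2.0, 1.5, 5.0, 3.0]
    print(rmse(y_obs, y_hat))   # 1.25
    print(medae(y_obs, y_hat))  # 1.0
    print(smote_samples([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_new=3, k=2))
```

In the toy example one residual (2.0) is twice as large as the others, which pulls RMSE (1.25) above MedAE (1.0); this is why the two metrics can rank models differently, as the abstract reports for MERT and GCMR.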

Published: 2025-10-30

How to Cite:
Anggara, D., Kurnia, A., Notodiputro, K. A., & Indahwati, I. (2025). Boosting Mixed-Effects Models with SMOTE: Insights from Java’s Human Development Index. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-3011

Section: Research Articles