Statistics, Optimization & Information Computing
http://47.88.85.238/index.php/soic
<p><em><strong>Statistics, Optimization and Information Computing</strong></em> (SOIC) is an international refereed journal dedicated to the latest advancements in statistics, optimization and their applications in information sciences. Topics of interest include (but are not limited to): </p> <p>Statistical theory and applications</p> <ul> <li class="show">Statistical computing, Simulation and Monte Carlo methods, Bootstrap, Resampling methods, Spatial Statistics, Survival Analysis, Nonparametric and semiparametric methods, Asymptotics, Bayesian inference and Bayesian optimization</li> <li class="show">Stochastic processes, Probability, Statistics and applications</li> <li class="show">Statistical methods and modeling in life sciences including biomedical sciences, environmental sciences and agriculture</li> <li class="show">Decision Theory, Time series analysis, High-dimensional multivariate integrals, statistical analysis in markets, business, finance, insurance, economics and social sciences, etc.</li> </ul> <p> Optimization methods and applications</p> <ul> <li class="show">Linear and nonlinear optimization</li> <li class="show">Stochastic optimization, Statistical optimization, Markov chains, etc.</li> <li class="show">Game theory, Network optimization and combinatorial optimization</li> <li class="show">Variational analysis, Convex optimization and nonsmooth optimization</li> <li class="show">Global optimization and semidefinite programming</li> <li class="show">Complementarity problems and variational inequalities</li> <li class="show"><span lang="EN-US">Optimal control: theory and applications</span></li> <li class="show">Operations research, Optimization and applications in management science and engineering</li> </ul> <p>Information computing and machine intelligence</p> <ul> <li class="show">Machine learning, Statistical learning, Deep learning</li> <li class="show">Artificial intelligence, Intelligence computation, Intelligent control and optimization</li> <li 
class="show">Data mining, Data analysis, Cluster computing, Classification</li> <li class="show">Pattern recognition, Computer vision</li> <li class="show">Compressive sensing and sparse reconstruction</li> <li class="show">Signal and image processing, Medical imaging and analysis, Inverse problem and imaging sciences</li> <li class="show">Genetic algorithm, Natural language processing, Expert systems, Robotics, Information retrieval and computing</li> <li class="show">Numerical analysis and algorithms with applications in computer science and engineering</li> </ul>
International Academic Press en-US Statistics, Optimization & Information Computing 2311-004X
<span>Authors who publish with this journal agree to the following terms:</span><br /><br /><ol type="a"><li>Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a <a href="http://creativecommons.org/licenses/by/3.0/" target="_new">Creative Commons Attribution License</a> that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.</li><li>Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.</li><li>Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See <a href="http://opcit.eprints.org/oacitation-biblio.html" target="_new">The Effect of Open Access</a>).</li></ol>
Convergence of the error in Hanafi-Wold's procedure on the PLS-PM task
http://47.88.85.238/index.php/soic/article/view/2223
<p>Partial least squares path modeling (PLS-PM) is a statistical method that facilitates examining intricate dependence relationships among various blocks of observed variables, each characterized by a latent variable. The computation of latent variable scores is a pivotal step in this method, accomplished through an iterative procedure. In this paper, we investigate and tackle convergence challenges related to Hanafi-Wold's procedure for computing components in the PLS-PM algorithm. Both Hanafi-Wold's procedure and an alternative procedure exhibit monotone convergence when mode B is considered for all blocks, combined with the centroid or factorial schemes. However, the absence of a proof that the error converges to zero in Hanafi-Wold's procedure is a limitation compared with the alternative procedure, which possesses this convergence property. This paper therefore aims to establish the convergence of the error towards zero in Hanafi-Wold's procedure.</p>Abderrahim Sahli, Zouhair El Hadri, Mohamed Hanafi
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-17 2025-05-17 14(2), 469–485 10.19139/soic-2310-5070-2223
The New Topp-Leone Exponentiated Half Logistic-Gompertz-G Family of Distributions with Applications
http://47.88.85.238/index.php/soic/article/view/2238
<p>This research introduces a new family of distributions (FoD) titled the Topp-Leone Exponentiated-Half-Logistic-Gompertz-G (TL-EHL-Gom-G) distribution. The study explores a variety of statistical properties of the developed family, such as the quantile function, series expansion, order statistics, entropy, stochastic orders and moments. Through Monte Carlo simulations, various estimation techniques were compared, including the least squares (LS), Anderson-Darling (AD), maximum likelihood (ML) and Cramér-von Mises (CVM) methods, via root mean square error (RMSE) and average bias (Abias). The results indicated that the ML estimation method performed better than the other methods, hence its selection for estimating the model parameters. To showcase the usefulness, robustness and applicability of the model, we applied it to three real-life datasets, including one with censored observations. The TL-EHL-Gom-W distribution, a special case of the TL-EHL-Gom-G FoD, showed superiority over nested and non-nested models.</p>Wellington Charumbira, Broderick Oluyede, Fastel Chipepa
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-06-10 2025-06-10 14(2), 486–514 10.19139/soic-2310-5070-2238
Application of Ujlayan-Dixit Fractional Exponential Probability Distribution
http://47.88.85.238/index.php/soic/article/view/2346
<p>This research aims to present the probability density functions of random variables for the exponential distribution by applying a new technique, the Ujlayan-Dixit (UD) fractional derivative, and to find some basic concepts related to probability distributions of random variables, namely the density, cumulative distribution, survival, and hazard functions. In addition, we provide the UD fractional analogs of the expected value, rth moments, rth central moments, mean, variance, skewness, and kurtosis. Finally, we give the UD fractional analogs of some entropy measures such as the Shannon, Rényi, and Tsallis entropies.</p>Iqbal H. Jebril, Raed Hatamleh, Iqbal Batiha, Nadia Allouch
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-06-03 2025-06-03 14(2), 515–525 10.19139/soic-2310-5070-2346
Unit New Half Logistic Distribution: Theory, Estimation, and Applications with Novel Regression Analysis
http://47.88.85.238/index.php/soic/article/view/2360
<p>In this paper, a new unit distribution is introduced. Some statistical properties of the proposed distribution are analyzed, including moments, Bonferroni and Lorenz curves, etc. Six estimation methods are investigated to estimate the three parameters of the proposed distribution. The performance of these estimators is compared in terms of bias, mean square error, average absolute bias, and mean relative error using Monte Carlo simulation. In addition, real data analyses are performed using datasets on the amount of water in the California Shasta reservoir, the average failure times of a fleet of air conditioning systems, and skewed-to-right data. A novel regression analysis is proposed based on the new distribution. A practical example illustrates its effectiveness and applicability compared to existing methods, including the beta, Kumaraswamy, and log-extended exponential geometric regression analyses.</p>Kadir Karakaya, Şule Sağlam
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-15 2025-05-15 14(2), 526–540 10.19139/soic-2310-5070-2360
Modelling Labor Market Access Among Vocational Education Graduates in Indonesia: A Heckman Selection Approach
http://47.88.85.238/index.php/soic/article/view/2510
<p style="font-weight: 400;">Understanding the labour market outcomes of vocational education graduates is essential for evaluating the effectiveness of mid-level tertiary education. This study uses the Heckman two-step selection model to estimate the returns on vocational education for graduates of Diploma 4 (D1-D4) programs using nationally representative data from Indonesia’s 2023 National Labor Force Survey (SAKERNAS). In the first stage, a probit model estimates the probability of labour market participation as a function of educational attainment, demographic characteristics, household factors, and urban residence. The second stage estimates the determinants of wages, conditional on employment, with the Inverse Mills Ratio (IMR) from the first stage included to correct for selection bias. Results indicate that higher vocational qualifications significantly increase both the likelihood of employment and monthly wages. However, female graduates and rural residents face lower employment probabilities and earnings. This study contributes methodologically by demonstrating how selection-corrected statistical modelling can improve the accuracy of returns to education estimates, particularly for vocational education. Substantively, the findings offer policy-relevant insights into the evolving role of vocational education in promoting economic inclusion in Indonesia's post-secondary education and labour market systems.</p>Fachrizal, Mia Rahma Romadona, Suryadi Suryadi, Andi Budiansyah, Syahrizal Maulana, Maulana Akbar, Ratna Sri Harjanti, Arif Hidayat
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-06-10 2025-06-10 14(2), 541–555 10.19139/soic-2310-5070-2510
A New Accelerated Failure Time Model with Censored and Uncensored Real-life Applications: Validation and Different Estimation Methods
http://47.88.85.238/index.php/soic/article/view/2594
<p><span class="fontstyle0">This study introduces a novel exponential accelerated failure time (AFT) model, detailing its fundamental properties and characterizations. To evaluate the performance of various estimation techniques, we conduct simulation studies that assess the finite-sample behavior of the estimators. Additionally, we propose a modified chi-square goodness-of-fit test tailored for the new model, applicable to both complete and right-censored datasets. The model’s validity is examined using the theoretical framework of the Nikulin-Rao-Robson (NRR) statistic, with maximum likelihood estimation employed for parameter estimation. Two separate simulation studies are carried out: one to evaluate the proposed AFT model and another to assess the efficacy of the NRR test statistic. Furthermore, the practical applicability of the test statistic is demonstrated through analyses of three real-life datasets.</span></p>Mohamed Ibrahim, Hafida Goual, Khaoula Kaouter Meribout, Abdullah H. Al-Nefaie, Ahmad M. AboAlkhair, Haitham M. Yousof
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-16 2025-05-16 14(2), 556–583 10.19139/soic-2310-5070-2594
Optimal design of life testing plans under Type II hybrid censoring scheme with random sample size
http://47.88.85.238/index.php/soic/article/view/1891
<p style="-qt-block-indent: 0; text-indent: 0px; -qt-user-state: 0; margin: 0px;">Hybrid censoring is a combination of the Type I and Type II censoring schemes and is divided into two types, the Type I and Type II hybrid censoring schemes. One practical problem in the discussion of censoring is choosing the best censoring scheme. Towards this end, different criteria can be considered, each of which may lead to a different result. One of the most important criteria is the cost of the experiment. In this article, considering a cost function as the optimization criterion in Type II hybrid censoring, the optimal censoring scheme is determined. Here, the sample size is considered both as a fixed value and as a random variable from the power series distribution, and the optimal censoring scheme is determined so that the cost function does not exceed a pre-determined value. Numerical computations as well as a simulation study are presented to illustrate the results.</p>Elham Basiri, Elham Hosseinzadeh
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-06-03 2025-06-03 14(2), 584–601 10.19139/soic-2310-5070-1891
Optimal Search for a lost moving target whose truncated Markov Chain with continuous time
http://47.88.85.238/index.php/soic/article/view/2177
<p>The main contribution of this paper centers around searching for a lost target that moves among a finite number of cells according to a truncated Markov chain with continuous time. The searcher distributes the search effort among the states, and the purpose is to minimize the search effort and the probability of non-detection at the same time.</p>M. M. El-Ghoul, Abd-Elmoneim A. M. Teamah
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-06-10 2025-06-10 14(2), 602–610 10.19139/soic-2310-5070-2177
Optimizing cell load regulation capability in dynamic cell manufacturing systems
http://47.88.85.238/index.php/soic/article/view/2367
<p>Variation in production cell load arises from machine loads exceeding their capacity and the constraints of cellular capacity. This issue has become increasingly critical in scheduling cellular manufacturing systems. In this paper, we propose a novel approach for scheduling in dynamic cellular manufacturing systems. The objective is to minimize cell load variations and associated costs while achieving a balance between internal manufacturing and subcontracting.</p> <p>To address this, we developed a mixed-integer linear programming (MILP) mathematical model, which was solved using LINGO 19.0 software. The model focuses on reducing cell load variation, minimizing associated costs, and optimizing the balance between internal production and subcontracting. Extensive computational experiments are conducted on medium-scale problem instances with randomly generated demand scenarios.</p> <p>The results demonstrate the effectiveness of the proposed model in generating optimal solutions, significantly reducing cell load variation and related costs. Furthermore, computational efficiency is notable, with solutions obtained in very low processing times. This underscores the model's practical applicability and robustness in addressing real-world scheduling challenges in cellular manufacturing systems.</p>YAO K. Adrien, KONE Oumar, EDI K. Hilaire, TAKOUDA P. L. Matthias
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-08 2025-05-08 14(2), 611–627 10.19139/soic-2310-5070-2367
A Decoupled PI Design for a Second-Order Model of a Magnetic Levitation System
http://47.88.85.238/index.php/soic/article/view/2390
<p>This paper presents a decoupled control strategy for stabilizing a second-order magnetic levitation system (MLS) based on a nonlinear feedback linearization approach combined with a proportional-integral (PI) controller. The proposed methodology transforms the nonlinear dynamics of the MLS into an equivalent linear system using an exact feedback linearization scheme, enabling the application of classical control techniques. Stability of the closed-loop system is formally demonstrated through a Lyapunov-based analysis, ensuring asymptotic convergence to the equilibrium point. The controller structure permits flexible tuning of gains to shape the system’s dynamic response, achieving both critically damped and underdamped behaviors. The performance of the control scheme is validated through extensive numerical simulations under varying gain configurations, demonstrating fast convergence and high precision in position tracking. While the study assumes ideal conditions, the findings provide a foundation for future developments that address robustness against model uncertainties, disturbances, and practical implementation constraints. Overall, this work contributes a theoretically grounded and computationally efficient control design for MLS applications.</p>Oscar Danilo Montoya Giraldo, Walter Julián Gil-González, Andrés Leonardo Jutinico-Alarcón
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-26 2025-05-26 14(2), 628–635 10.19139/soic-2310-5070-2390
Optimizing Energy Management in AC Microgrids: A Comparative Study of Metaheuristic Algorithms for Minimizing Energy Losses and ${CO}_2$ Emissions
http://47.88.85.238/index.php/soic/article/view/2455
<p>This study tackles the energy management problem for wind distributed generators in AC microgrids (MGs) operating in both connected and isolated modes. A mathematical formulation is proposed to minimize energy losses and $CO_2$ emissions, incorporating technical and regulatory constraints to reflect real-world MG operations. The solution methodology combines the Population-Based Genetic Algorithm (PGA) with an hourly power flow analysis based on the successive approximation (SA) method. To validate the proposed approach, a comprehensive comparison is conducted against three widely used metaheuristic algorithms: Particle Swarm Optimization (PSO), JAYA, and the Generalized Normal Distribution Optimizer (GNDO). Employing a rigorous statistical framework, including ANOVA and Tukey HSD tests, the algorithms' performance is evaluated through 100 independent runs per objective and configuration, using a 33-node AC microgrid with variable generation and demand as the test scenario. Results demonstrate that PGA consistently outperforms other algorithms, achieving lower mean values and variance in both energy loss and emission minimization. GNDO, by contrast, shows higher variability and less effective optimization. This work not only underscores the robustness and adaptability of PGA for sustainable microgrid management but also establishes a standardized framework for evaluating optimization algorithms in energy systems.</p>Héctor Pinto Vega, Luis Fernando Grisales-Noreña, Vanessa Botero-Gómez
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-28 2025-05-28 14(2), 636–662 10.19139/soic-2310-5070-2455
Dynamic Portfolio Optimisation in Morocco’s Stock Market through Machine-Learning Selection and Complex Mean-Variance Allocation
http://47.88.85.238/index.php/soic/article/view/2529
<p>Emerging markets, such as Morocco’s stock exchange, face challenges including low liquidity, sectoral concentration, and economic sensitivity, which render traditional portfolio optimization methods inadequate. This study introduces a hybrid framework that integrates machine learning (ML) for stock selection with a novel Mean Variance Complex-Based (MVCB) optimization to enhance performance in the Moroccan All Shares Index (MASI). Four ML models, namely Stepwise Regression, Random Forest, Generalized Boosted Regression, and XGBoost, are used to predict returns based on fundamental and technical indicators, with XGBoost achieving superior accuracy. The MVCB method leverages complex returns derived from the Hilbert Transform, capturing dynamic market correlations and phase-amplitude relationships to optimize weights under volatility. Backtesting reveals that the MVCB portfolio outperforms traditional mean-variance (MV) and market benchmarks, yielding a 10.48% annual return with 3.52% volatility and a Sharpe ratio of 2.48 (compared to 1.12 for MASI). Sector diversification and reduced left-tail risk (19.3%) mitigate crisis-driven correlation breakdowns. By synergizing predictive ML with adaptive optimization, this framework addresses instability in emerging markets, offering a robust, scalable solution for risk-adjusted returns. The results highlight the viability of data-driven strategies in volatile, resource-constrained environments.</p>Achraf BOUHMADY, Hamza Kadiri, KHALID belkhoutout, Nadia RAISSI
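The complex-return construction via the Hilbert Transform mentioned in this abstract can be illustrated as follows. The return series below is synthetic, and the MVCB weighting itself is not reproduced; the sketch shows only the analytic-signal step that yields the amplitude and phase information a real-valued covariance would discard.

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(1)

# Toy daily return series standing in for two MASI constituents
# (illustrative data only, not the paper's inputs).
r1 = 0.0005 + 0.01 * rng.standard_normal(500)
r2 = 0.0005 + 0.01 * rng.standard_normal(500)

# The analytic signal embeds each return series as the real part and its
# Hilbert transform as the imaginary part: the "complex returns".
a1 = hilbert(r1 - r1.mean())
a2 = hilbert(r2 - r2.mean())

amplitude = np.abs(a1)                 # instantaneous amplitude envelope
phase = np.unwrap(np.angle(a1))        # instantaneous phase

# A complex covariance between two analytic signals captures both
# co-movement and lead-lag (phase) structure.
complex_cov = np.mean(a1 * np.conj(a2))
```

The real part of the analytic signal is exactly the demeaned input series, so nothing is lost relative to the ordinary return data; the imaginary part adds the phase information used to form complex covariances.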
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-28 2025-05-28 14(2), 663–676 10.19139/soic-2310-5070-2529
Machine Learning Models for Predicting COVID-19 Mortality Using Epidemiological Features
http://47.88.85.238/index.php/soic/article/view/2159
<p>Identifying COVID-19 patients at high risk of fatality is critically important for healthcare professionals, as it supports informed decision-making and enhances the capacity to manage emerging crises within medical systems. Nevertheless, COVID-19 datasets are frequently highly imbalanced, with substantially fewer fatality cases, presenting a challenge to the development of effective machine learning algorithms. This study aims to develop a high-performing machine learning approach to predict COVID-19 mortality using a Mexican epidemiological dataset. To tackle the class imbalance issue, several sampling techniques are applied, including SMOTE, SMOTE-ENN, ADASYN, SMOTE-Tomek, and Random Under-Sampling (RUS). Predictive models are created using several machine learning algorithms: Logistic Regression, Decision Tree, Gaussian Naïve Bayes, K-Nearest Neighbors, and Random Forest. In addition, we performed a feature selection analysis using the SHAP technique to determine the most relevant attributes for predicting COVID-19 mortality. The results show that the Random Forest model trained on data balanced with the SMOTE-ENN technique yielded the best performance, with 89.44% accuracy, 87.88% recall, and an 88.74% ROC AUC score. Furthermore, the feature selection analysis shows that type of patient, age, pneumonia, intubation, and contact with COVID-19-infected patients are the key attributes for predicting COVID-19 risk of fatality among hospitalized individuals.</p>Sokaina EL KHAMLICHI, Loubna Taidi
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-28 2025-05-28 14(2), 677–703 10.19139/soic-2310-5070-2159
Elevating E-commerce Customer Experience: A Machine Learning-Driven Recommendation System
http://47.88.85.238/index.php/soic/article/view/2181
<p>In the era of e-commerce, providing an exceptional customer experience is pivotal for online businesses. This paper introduces a comprehensive machine learning-based recommendation system meticulously crafted to enhance the customer experience on e-commerce platforms. Our system employs a multifaceted approach, incorporating product popularity analysis, model-based collaborative filtering, and textual clustering, to address a spectrum of user profiles and business contexts. It excels in delivering personalized product recommendations, effectively tackling the challenges associated with attracting and retaining new customers, as well as guiding businesses in their nascent stages of online presence. By harnessing diverse methodologies, this system not only optimizes the customer journey but also offers a versatile framework for future research endeavors aimed at continuously refining and adapting to the dynamic e-commerce landscape.</p>Raouya El Youbi, Fayçal Messaoudi, Manal Loukili, Mohammed El Ghazi
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-26 2025-05-26 14(2), 704–717 10.19139/soic-2310-5070-2181
Application of Rainbow Vertex Antimagic Coloring in Multi-Step Time Series Forecasting for Efficient Railway Passenger Load Management
http://47.88.85.238/index.php/soic/article/view/2214
<p>Let $G$ be a simple connected graph. Given a bijection $f:E(G)\to\{1,2,\cdots,|E(G)|\}$, assign to every vertex $x$ the weight $w(x) = \Sigma_{xx' \in E(G)}f(xx')$; the labeling is a rainbow vertex antimagic coloring if, for any two vertices $x$ and $y$, all internal vertices of a path $x-y$ have different weights. The least number of colors used among all rainbow colorings produced by rainbow vertex antimagic labelings of a graph $G$ is the rainbow vertex antimagic connection number, $rvac(G)$. Our goal in this study is to prove some theorems related to $rvac(G)$. Furthermore, we apply RVAC as an administrative operator that controls passenger load anomalies at stations. This control uses spatio-temporal multivariate time series Graph Neural Network (GNN) forecasting. Based on the results, we found that the metric evaluation of our GNN outperformed other models such as HA, ARIMA, SVR, GCN and GRU.</p>Dafik, Elsa Yuli Kurniawati, Ika Hesti Agustin, Arika Indah Kristiana, Robiatul Adawiyah, M Venkatachalam
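As a small illustration of the vertex weight $w(x)=\Sigma_{xx'\in E(G)}f(xx')$ defined in this abstract, the following sketch enumerates all edge labelings of the path graph $P_4$ and checks the distinct-internal-weight condition on the path between its two endpoints. This is a toy check of the definition, not the paper's algorithm.

```python
from itertools import permutations

# Path graph P4: vertices 0-1-2-3; internal vertices of the 0..3 path are 1 and 2.
edges = [(0, 1), (1, 2), (2, 3)]

def vertex_weights(labeling):
    """labeling: dict mapping each edge to a distinct label in 1..|E(G)|.
    Returns w(x) = sum of labels on edges incident to x."""
    w = {v: 0 for e in edges for v in e}
    for e, lab in labeling.items():
        for v in e:
            w[v] += lab
    return w

# Enumerate every bijection f: E(G) -> {1, 2, 3} and keep the labelings
# whose internal vertices receive pairwise distinct weights.
valid = []
for perm in permutations([1, 2, 3]):
    f = dict(zip(edges, perm))
    w = vertex_weights(f)
    if w[1] != w[2]:
        valid.append((f, w))
```

For $P_4$ the condition always holds: $w(1)=f(01)+f(12)$ and $w(2)=f(12)+f(23)$ differ whenever $f(01)\neq f(23)$, which a bijection guarantees, so all six labelings survive the check.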
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-06 2025-05-06 14(2), 718–735 10.19139/soic-2310-5070-2214
Improving Heart Disease Prediction Accuracy through Machine Learning Algorithms
http://47.88.85.238/index.php/soic/article/view/2319
<p>This study explores the application of a range of machine learning and deep learning techniques for predicting cardiovascular diseases. Various models, including Random Forest, Logistic Regression, Gradient Boosting, AdaBoost, Support Vector Machine (SVM), XGBoost, and both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks are evaluated. A comprehensive evaluation is conducted by considering supplementary metrics, refining hyperparameter tuning, assessing feature importance using SHAP, comparing traditional machine learning with deep learning approaches, and examining the clinical relevance. It concludes that XGBoost achieves the highest accuracy (88%), and notes that CNN and LSTM may prove beneficial with larger datasets. Moreover, the study investigates the practical applications of these models, focusing on their potential integration into clinical decision support systems.</p>Hussam Elbehiery, Moshira A. Ebrahim, Mohamed Eassa, Ahmed Abdelhafeez, Aya Omar, Hadeer Mahmoud
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-14 2025-05-14 14(2), 736–755 10.19139/soic-2310-5070-2319
A Machine Learning Framework for Discriminating between ChatGPT and Web Search Results
http://47.88.85.238/index.php/soic/article/view/2338
<p>ChatGPT is a large language model built by OpenAI. It is based on an architecture called the Generative Pre-trained Transformer (GPT). It can generate text that appears to be written by a human and understands natural language questions. We want to investigate whether we can distinguish between query results from web search and ChatGPT by utilizing machine learning (ML). To this end, this research trains five different ML methods on a balanced dataset containing 2010 samples of query results from ChatGPT and web search. These ML models are Random Forest (RF), Naive Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM), and Logistic Regression (LR). Each of these methods is experimented with two feature optimization techniques, namely LDA and PCA. After analyzing the results of all experiments, it is determined that the combination of NB with LDA yields the highest accuracy of 99.75%. This technique also identifies ChatGPT-generated and human-written text in an existing dataset with an accuracy of 98.67%, outperforming state-of-the-art (SOTA) techniques. The proposed approach can thus help identify ChatGPT-generated text.</p>Md. Sadiq Iqbal, Mohammod Abul Kashem, Mohammad Asaduzzaman Chowdhury
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-13 2025-05-13 14(2), 756–769 10.19139/soic-2310-5070-2338
Enhancing Prostate Cancer Risk Prediction Using a Hybrid Near Sets and Soft Sets Model: A Novel Approach for Improved Patient Care
http://47.88.85.238/index.php/soic/article/view/2382
<p>Prostate cancer is a major health concern, and accurate risk prediction is essential for effective treatment. This paper presents a novel hybrid model combining near sets and soft sets to enhance prostate cancer risk assessment. By integrating artificial intelligence with medical data, our model captures uncertainties and provides more precise, personalized risk evaluations. Experiments focusing on key clinical factors, such as age and PSA levels, demonstrate significant improvements in early detection and treatment decisions. This research highlights the potential of hybrid AI models to improve patient care and outcomes in oncology.</p>Amr H. Abdelhaliem, Noha MM. Abdelnapi, Mohammed A. Atiea
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-06 2025-05-06 14(2), 770–788 10.19139/soic-2310-5070-2382
Improving spectral segmentation of 3D meshes using face patches
http://47.88.85.238/index.php/soic/article/view/2515
<p>A huge amount of research work has been devoted in recent years to segmentation of 3D meshes composed of planar triangular faces. In particular, spectral segmentation has had a fair share of this work because it is extremely faster than other segmentation techniques, especially those based on AI and machine learning. However, existing spectral segmentation techniques suffer from complex processing and heavy computation due to dealing directly with these faces. The present article is an attempt to address this issue by proposing an effective technique based on grouping the faces skillfully into higher-level structures called patches. Specifically, each patch is made of two neighboring faces, effectively cutting the number of low-level structures processed by the segmentation technique to almost half. However, since the constituent mesh structures have changed from faces to patches, the usual spectral segmentation methodology is altered to suit the new geometry. This alteration is reflected in the number of elements of both the eigenvectors and the weight matrix, both reduced by almost 50%. We have validated the proposed technique by segmenting numerous 3D meshes from public repositories. The resulting segments are colored in order to distinguish visually between different parts of the same 3D mesh. The experimental results indicate, both visually and quantitatively, that the proposed technique matches the performance of the best state-of-the-art methodologies, but at about half the time and space cost.</p>Fatma Khairy, Mohamed H. Mousa, Hamed Nassar
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-28 2025-05-28 14(2), 789–807 10.19139/soic-2310-5070-2515
An Efficient Machine Learning Framework for Disease Gene Prediction in Parkinson’s Disease and Bladder Cancer
http://47.88.85.238/index.php/soic/article/view/2517
<p>Machine learning (ML) has been increasingly used in disease prediction, leveraging both phenotype and genotype data. However, genotype data have received comparatively less attention due to limited availability, whereas phenotype data have been more extensively studied. While breast cancer research is abundant, studies on other cancers, such as bladder cancer, and neurological diseases like Parkinson’s disease, remain limited. High-dimensional datasets pose challenges, including lengthy processing times, overfitting, an excess of features, and difficulties in classification. This study introduces a framework that integrates phenotype and genotype data for cancer prediction, aiming for high accuracy with a minimal number of relevant features. The framework consists of three main procedures: feature selection (FS), cancer prediction (CP), and identification of cancer-associated genes/features (CAG/F). FS employs a hybrid LEDF approach, combining the empirical distribution function (EDF) with three embedded methods: lasso regression selection (LRS), ridge regression selection (RRS), and random forest selection (RFS). EDF acts as a resampling tool with external (EEDF) and internal (IEDF) components that merge as E/IEDF. Features are selected based on classification accuracy using both union and intersection methods. CP applies multiple ML models with cross-validation to enhance prediction accuracy. Lastly, CAG/F identifies cancer-associated genes/features following the FS and CP steps. The algorithms E/IEDF-RFS, E/IEDF-LRS, and E/IEDF-RRS demonstrated excellent performance for RNA gene and dermatology datasets, achieving 100% accuracy. E/IEDF-RFS reached 94.58% accuracy for Parkinson’s Disease2, while EEDF-LRS performed best for DNA data with 94.85% accuracy. E/IEDF-RRS showed 96.43% accuracy for Parkinson’s Disease1 using RF classifiers, and IED-RFS and E/IEDF-LRS achieved 98.42% accuracy for the BreastEw dataset.</p>Noura Abdulwahed, Gh.S. El-Tawel, M. A. Makhlouf
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-06-10 · 14(2), 808-850 · DOI: 10.19139/soic-2310-5070-2517
A mathematical model for analyzing the transmission and control of COVID-19 pandemic with efficient interventions
http://47.88.85.238/index.php/soic/article/view/2117
<p>The COVID-19 pandemic caused by SARS-CoV-2 continues to pose a significant global threat. Mathematical modeling offers a valuable tool for understanding transmission dynamics and designing control strategies. The recent emergence of new variants of the disease and incidents of infection among people who had previously recovered have made it necessary to study the control of disease transmission in the face of re-infection. Thus, this study presents a novel compartmental model for the COVID-19 epidemic that incorporates the possibility of re-infection. The model captures the transition of individuals through susceptible, quarantined, exposed, infected, treated, and recovered compartments. Various mathematical analyses of the model are presented to provide the reader with vital information on the disease dynamics. Afterwards, we employ optimal control theory to identify interventions that minimize the infected population while considering the associated costs. By applying Pontryagin's Maximum Principle, we determine the optimal control strategies for these interventions. This framework allows us to evaluate the effectiveness of various control measures, such as minimal contact, early therapeutic treatment of infected individuals, quarantine, and vaccination, in mitigating the epidemic while accounting for the possibility of re-infection. Our findings can inform public health decision-making as they provide insights into the most effective strategies for controlling the transmission of COVID-19 in the presence of re-infection.</p>Olawale Joshua Adeleke, Adebayo O. Adeniran, Michael O. Olusanya, Dimpho Mothibi
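A compartmental model with re-infection of the kind described can be illustrated with a minimal S-E-I-R system in which immunity wanes at a rate omega, returning recovered individuals to the susceptible class. This is a toy sketch with assumed rates, not the paper's full model, which also includes quarantined and treated compartments and optimal controls.

```python
def simulate(beta=0.3, sigma=0.2, gamma=0.1, omega=0.01, days=200, dt=0.1):
    """Forward-Euler simulation of a minimal S-E-I-R model with waning
    immunity (rate omega moves recovered back to susceptible, capturing
    re-infection). Fractions of a closed population; parameters are
    illustrative assumptions."""
    S, E, I, R = 0.99, 0.0, 0.01, 0.0
    for _ in range(int(days / dt)):
        new_inf = beta * S * I          # new exposures per unit time
        dS = -new_inf + omega * R
        dE = new_inf - sigma * E        # exposed become infectious at rate sigma
        dI = sigma * E - gamma * I      # infectious recover at rate gamma
        dR = gamma * I - omega * R
        S += dt * dS; E += dt * dE; I += dt * dI; R += dt * dR
    return S, E, I, R

S, E, I, R = simulate()
```

Because the four derivatives sum to zero, the total population fraction is conserved, which is a useful sanity check on any such compartmental implementation.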
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-06-12 · 14(2), 851-872 · DOI: 10.19139/soic-2310-5070-2117
Data to Decisions: leveraging penalized maximum likelihood estimation in agriculture
http://47.88.85.238/index.php/soic/article/view/2285
<p>Modelling agricultural participation is crucial for a comprehensive understanding of the agricultural sector, particularly in least developed economies where a large proportion of households rely on smallholder farming for their livelihood. Due to its importance, many households participate in agriculture in various ways along the stages of the value chain, and success often relies on several determinants. Climate change, rising input costs, alternative livelihoods, and changes in labour availability rank among the common predictors. This study utilises data from the household budget survey to explore these dynamics through Penalized Maximum Likelihood Estimation. We approach agricultural participation by examining various dimensions of agricultural output, which include output sold and household consumption, as well as production for processing and livestock feed consumption. Additionally, we consider factors such as land acquisition and farm asset ownership to provide a more nuanced understanding of participation. Our findings highlight the heterogeneity of agricultural participation, which is shaped by geographic location, household size, and income level. Importantly, the influence of these variables on participation evolves over time and differs across various forms of engagement, underscoring the need for tailored interventions to foster agricultural involvement. This holistic perspective reveals not only the multifaceted nature of agricultural participation but also the potential for diverse strategies to enhance engagement in the sector.</p>Katiso Ramalebo, Retius Chifurira, Temesgen Zewotir, Knowledge Chinhamu
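Penalized maximum likelihood for a binary participation outcome can be sketched generically as ridge-penalized logistic regression fitted by gradient ascent. This is a minimal illustration under assumed data, not the paper's specification or penalty choice.

```python
import numpy as np

def penalized_logit(X, y, lam=1.0, lr=0.5, iters=500):
    """Ridge-penalized maximum likelihood for logistic regression via
    gradient ascent: maximizes log-likelihood - (lam/2)*||w||^2.
    A generic sketch of penalized MLE; the paper's model and penalty
    may differ."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted participation probability
        grad = X.T @ (y - p) - lam * w     # penalized score function
        w += lr * grad / len(y)
    return w

# Hypothetical data: a binary "participation" outcome driven by one covariate.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
y = (X[:, 1] + 0.3 * rng.normal(size=300) > 0).astype(float)
w = penalized_logit(X, y)
```

The penalty shrinks coefficients toward zero, stabilizing estimates when predictors are correlated or events are rare, which is the usual motivation for penalized likelihood in survey settings.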
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-07-23 · 14(2), 873-903 · DOI: 10.19139/soic-2310-5070-2285
Advanced Emotion Recognition: A Heuristic Approach Applied to EEG Signals Using Machine Learning
http://47.88.85.238/index.php/soic/article/view/2211
<p>Emotion analysis through electroencephalographic (EEG) signals has become a prominent research focus due to its applications in fields such as marketing, education, and mental health. Despite numerous methods available for emotion recognition, there remains a lack of robust metrics to validate the accuracy of these analyses against the actual emotional states. This study presents a novel heuristic approach for emotion analysis using EEG signals, employing an advanced algorithm that enhances the normalization of Valence and Arousal values through the Emotiv Epoc+ device. The algorithm not only refines these critical variables but also incorporates context-specific adjustments within an improved database schema, allowing for a more adaptive and precise evaluation of emotional states. Comparisons were made with the Self-Assessment Manikin (SAM) test, a validated tool in psychology, to verify the physiological responses recorded by the EEG signals. Initial findings demonstrated an accuracy of 76.47%, which increased to 79.45% after implementing the proposed enhancements, validated using the DBSCAN clustering algorithm. This study effectively demonstrates the algorithm’s capacity to classify emotional states in a sample of 15 participants aged 16 to 25 years, highlighting the potential of this heuristic approach in enhancing the reliability and applicability of EEG-based emotion recognition. The proposed methodology not only improves the accuracy of emotion detection but also establishes a foundation for integrating specific contextual factors into EEG analysis, thereby expanding its application in brain-computer interfaces, mental health monitoring, and other advanced research areas. These findings underscore the value of combining physiological data with validated psychological assessments, offering a significant advancement in the field of emotion recognition.</p>Nancy Gelvez, Carlos Motenegro, Paulo Gaona
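The DBSCAN validation step mentioned above groups normalized Valence/Arousal points by density and flags unreachable points as noise. A compact, self-contained DBSCAN on assumed 2-D toy data looks roughly like this (illustrative only; the study applies it to real EEG-derived values):

```python
import numpy as np

def dbscan(X, eps=0.6, min_pts=4):
    """Compact DBSCAN: points with >= min_pts neighbours within eps are
    core points; clusters grow outward from cores; anything unreachable
    is labelled -1 (noise/outlier)."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)   # pairwise distances
    neigh = [np.where(d[i] <= eps)[0] for i in range(n)]
    core = [len(nb) >= min_pts for nb in neigh]
    labels = np.full(n, -1)
    cid = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        stack, labels[i] = [i], cid
        while stack:                       # expand the cluster from core points
            j = stack.pop()
            if not core[j]:
                continue                   # border points join but do not expand
            for k in neigh[j]:
                if labels[k] == -1:
                    labels[k] = cid
                    stack.append(k)
        cid += 1
    return labels

# Two tight "emotion" clusters plus one isolated point (hypothetical data).
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.05, (10, 2)),
                 rng.normal(5.0, 0.05, (10, 2)),
                 [[20.0, 20.0]]])
labels = dbscan(pts)
```

Points that land in no dense region keep the label -1, which is how DBSCAN-style validation separates consistent emotional-state clusters from anomalous recordings.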
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-07-17 · 14(2), 904-922 · DOI: 10.19139/soic-2310-5070-2211
Improving nonlinear regression model estimation based on Coati Optimization Algorithm
http://47.88.85.238/index.php/soic/article/view/2563
<p>Non-Linear Regression analysis is a primary technique in the mathematical and social sciences as well as in engineering. Parameter estimation is a crucial problem in the control and modeling of Non-Linear systems. This paper briefly examines this issue and develops an effective Coati Optimization Algorithm (COA) for improving the parameter estimation accuracy of six Non-Linear Regression models (the negative exponential model, the logistic model, the <em>Chwirut1</em> model, the Hougen-Watson model, the Dan Wood model, and the Sigmoid model). Simulation tests showed that the Maximum Likelihood Estimation (MLE) method combined with the COA achieved the best performance among the compared methods across different sample sizes under the mean squared error criterion.</p>Omar Salim Ibrahim, Zakariya Yahya Algamal
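Metaheuristic parameter estimation of this kind treats the fitting criterion as a black-box objective and searches the parameter space directly. The sketch below uses a plain random search (a lightweight stand-in for the population-based COA, which the paper actually uses) to fit a negative exponential model by least squares on assumed noiseless data:

```python
import numpy as np

def fit_random_search(x, y, bounds, n_iter=5000, seed=0):
    """Fit y ~ a*exp(-b*x) by minimizing the sum of squared errors with a
    plain random search over a parameter box -- an illustrative stand-in
    for population-based metaheuristics such as the Coati Optimization
    Algorithm."""
    rng = np.random.default_rng(seed)
    best, best_sse = None, np.inf
    lo, hi = np.array(bounds).T
    for _ in range(n_iter):
        a, b = lo + (hi - lo) * rng.random(2)        # random candidate
        sse = np.sum((y - a * np.exp(-b * x)) ** 2)  # fitting criterion
        if sse < best_sse:
            best, best_sse = (a, b), sse
    return best, best_sse

x = np.linspace(0, 5, 50)
y = 3.0 * np.exp(-0.7 * x)          # synthetic data with true a=3, b=0.7
(a_hat, b_hat), sse = fit_random_search(x, y, bounds=[(0, 10), (0, 2)])
```

Swapping the objective (e.g. a negative log-likelihood instead of the SSE) or the search rule (coati-style exploration/exploitation moves) leaves this skeleton unchanged, which is what makes metaheuristics attractive for nonlinear models without closed-form estimators.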
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-07-04 · 14(2), 923-936 · DOI: 10.19139/soic-2310-5070-2563
A Control Chart Approach to Monitor and Improve Production Processes: Maximum Exponentially Weighted Moving Average using Auxiliary Variable (AV) and Multiple Measurement (ME)
http://47.88.85.238/index.php/soic/article/view/2641
<p>Control charts are fundamental tools in Statistical Process Control (SPC), employed to monitor and improve production processes. This study aims to evaluate the performance of the Maximum Exponentially Weighted Moving Average (Max-EWMA) chart by incorporating Auxiliary Variables (AV) and Multiple Measurements (ME). The study also evaluates the covariate method, a multiple measurement framework, and scenarios involving linearly increasing variance. The evaluation focuses on the analysis of both Type I and Type II error rates, using simulation-based methodologies. The results indicate that the Multiple Measurement approach consistently outperforms the Covariate method, exhibiting lower Type II error rates and higher robustness in detecting process shifts. An increase in parameter A enhances the chart’s sensitivity to mean shifts, whereas parameter B shows negligible influence. Additionally, linearly increasing variance contributes to improved detection capability, particularly under conditions of high correlation. Overall, the Multiple Measurement method demonstrates strong effectiveness and reliability across a variety of conditions, underscoring its practical utility in process control applications.</p>Debrina Ferezagia, Deni Danial Kesa, Eirene Christina Sellyra, Dimas Anggara, Cheng-Wen Lee
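The core Max-EWMA idea is to run one EWMA on the standardized mean signal and one on a standardized spread signal, then chart the maximum of their absolute values so a single statistic reacts to shifts in either the mean or the variance. The following is a simplified per-observation sketch with an approximate variance transform; it omits the auxiliary-variable and measurement-error terms studied in the paper.

```python
import numpy as np

def max_ewma(x, lam=0.2, mu0=0.0, sigma0=1.0):
    """Simplified Max-EWMA statistic: EWMA of standardized observations
    (mean signal) and EWMA of an approximately standardized variance
    signal, charted as max(|u|, |v|). Assumed in-control N(mu0, sigma0)."""
    z = (np.asarray(x) - mu0) / sigma0
    u = v = 0.0
    stats = []
    for zi in z:
        u = (1 - lam) * u + lam * zi               # EWMA for the mean
        w = (zi * zi - 1.0) / np.sqrt(2.0)         # mean 0, variance 1 in control
        v = (1 - lam) * v + lam * w                # EWMA for the spread
        stats.append(max(abs(u), abs(v)))
    return np.array(stats)

rng = np.random.default_rng(0)
stats_ic = max_ewma(rng.normal(0.0, 1.0, 100))     # in-control run
stats_shift = max_ewma(rng.normal(2.0, 1.0, 100))  # mean shifted by 2 sigma
```

An out-of-control signal is declared when the statistic exceeds a control limit chosen (usually by simulation) for a target in-control average run length.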
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-06-24 · 14(2), 937-955 · DOI: 10.19139/soic-2310-5070-2641
Quasi Lindley Regression Model Residual Analysis for Biomedical Data
http://47.88.85.238/index.php/soic/article/view/2649
<p>The current study proposes a new regression model in which the response variable follows the Quasi Lindley distribution. The unknown parameters of the regression model are estimated using the maximum likelihood method. A simulation study is conducted to evaluate the performance of the maximum likelihood estimates (MLEs). In addition, a residual analysis is performed for the proposed regression model. The log-Quasi Lindley regression model is compared to several other models, including Lindley regression and gamma regression, using various statistical criteria. The results show that the suggested model fits the data better than these other models. The model is expected to have applications in fields such as economics, biological studies, mortality and recovery rates, health, hazards, measurement sciences, medicine, and engineering.</p>Ahmed Salih, Wafaa Jaafar Hussein
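For intuition on likelihood estimation in this family, the one-parameter Lindley distribution (a special case of the Quasi Lindley family, which adds a shape parameter and, in the paper, covariates via regression) admits a closed-form MLE because its likelihood equation reduces to a quadratic in theta:

```python
import numpy as np

def lindley_mle(x):
    """Closed-form ML estimator for the Lindley distribution,
    f(t) = theta^2/(1+theta) * (1+t) * exp(-theta*t), t > 0.
    Setting the score to zero gives xbar*theta^2 + (xbar-1)*theta - 2 = 0,
    whose positive root is the MLE."""
    m = np.mean(x)
    return (-(m - 1.0) + np.sqrt((m - 1.0) ** 2 + 8.0 * m)) / (2.0 * m)

# Lindley(theta) variates: a mixture of Exp(theta) (weight theta/(1+theta))
# and Gamma(2, theta).
rng = np.random.default_rng(0)
theta, n = 1.5, 20000
mix = rng.random(n) < theta / (1.0 + theta)
sample = np.where(mix,
                  rng.exponential(1.0 / theta, n),
                  rng.gamma(2.0, 1.0 / theta, n))
theta_hat = lindley_mle(sample)
```

The richer Quasi Lindley likelihood has no such closed form and is maximized numerically, which is why the paper relies on simulation to assess its MLEs.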
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-06-12 · 14(2), 956-969 · DOI: 10.19139/soic-2310-5070-2649
Hybrid Outlier Detection Framework Based on Optimized KMeans and HDBSCAN Using Bat Algorithm and LSTM Autoencoder
http://47.88.85.238/index.php/soic/article/view/2581
<p>Outlier detection is a critical task in data mining, especially in domains such as healthcare, cybersecurity, and fraud detection, where abnormal instances can signify crucial insights. Traditional approaches, including DBSCAN, Isolation Forest, and statistical techniques like Z-Score and IQR, often suffer from issues such as sensitivity to parameters, limited adaptability, and reduced effectiveness in high-dimensional or complex data. To overcome these limitations, this paper proposes a hybrid outlier detection framework that combines KMeans clustering with HDBSCAN, enhanced through Bat Algorithm-based optimization for dynamic selection of clustering parameters (eps and min_samples).</p> <p>The proposed method is evaluated alongside IS-DBSCAN, Autoencoders, and advanced graph-based approaches such as Cluster Catch Digraphs (CCDs) with Outbound and Inbound Outlyingness Scores (OOS and IOS). The study explores and compares two advanced outlier detection approaches applied to two real-world datasets: the Online Retail and the Diabetes 130-US hospitals datasets. The first approach utilizes a scalable Spark-based DBSCAN algorithm, while the second integrates KMeans clustering with HDBSCAN, optimized via the Bat Algorithm (KMeans + HDBSCAN (BAT)). These methods were evaluated using the Silhouette Score (SII) and classification Accuracy (Acc) as performance metrics, with the best configuration achieving strong performance (F1 = 0.972, AUPRC = 0.947).
Experimental results demonstrate that the proposed hybrid approach significantly outperforms the Spark-based DBSCAN in both clustering quality and classification performance, achieving a Silhouette score of 0.67 and an accuracy of 66.8% (F1 = 0.662, AUC = 72.26%) on the Diabetes dataset, and a Silhouette score of 0.59 and an accuracy of 97.35% on the Online Retail dataset. On the MNIST dataset, the model achieved high performance (F1 = 0.92, AUC = 0.96), outperforming Isolation Forest, with notable improvements in clustering quality as BAT iterations increased.<br>These results highlight the effectiveness of integrating KMeans for initialization, HDBSCAN for density-based clustering, and the Bat Optimization algorithm for fine-tuning key parameters.</p>Mai Abdelsamie, Hosam Refaat, Mohammed Abdallah, Osama Farouk
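The hybrid pattern (a cheap partitioning pass followed by density-based outlier scoring) can be sketched as follows. This is an illustrative stand-in only: Lloyd's k-means plays the initialization role, a k-nearest-neighbour distance plays the role of HDBSCAN's outlier scores, and the Bat Algorithm tuning of the density parameters is omitted.

```python
import numpy as np

def kmeans(X, k=2, iters=20, seed=0):
    """Plain Lloyd's k-means, used here as the cheap partitioning step."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = np.argmin(np.linalg.norm(X[:, None] - C[None], axis=2), axis=1)
        C = np.array([X[lab == j].mean(axis=0) if np.any(lab == j) else C[j]
                      for j in range(k)])
    return lab, C

def knn_outlier_scores(X, k=5):
    """Distance to the k-th nearest neighbour as a density-based outlier
    score -- a simple stand-in for HDBSCAN's outlier scores; k is the
    kind of parameter the Bat Algorithm would tune in the paper."""
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    d.sort(axis=1)
    return d[:, k]          # column 0 is each point's zero self-distance

# Two dense clusters plus one injected outlier (hypothetical data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(5, 0.3, (50, 2)),
               [[20.0, 20.0]]])
lab, C = kmeans(X)
scores = knn_outlier_scores(X)
```

Points in sparse regions get large scores; thresholding the score (or taking the top fraction) yields the outlier set.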
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-07-23 · 14(2), 970-1017 · DOI: 10.19139/soic-2310-5070-2581
Transforming IoT Security through Large Language Models: A Comprehensive Systematic Review and Future Directions
http://47.88.85.238/index.php/soic/article/view/2424
<p>The rapid integration of Large Language Models (LLMs) in Internet of Things (IoT) security presents both unprecedented opportunities and complex challenges. This systematic literature review examines 34 recent studies (2022-2024) to evaluate the effectiveness, challenges, and architectural innovations of LLM implementations in IoT security environments. Through a rigorous methodology following PRISMA guidelines, we analyze performance metrics, implementation strategies, and resource optimization approaches across diverse security applications. Our findings reveal significant advancements in detection capabilities, with frameworks like SecurityBERT achieving 98.2% accuracy while reducing model size by 89.85%, and privacy-preservation mechanisms demonstrating up to 98.247% protection effectiveness. However, persistent challenges emerge in resource optimization, real-time processing requirements, and cross-platform compatibility. The review identifies critical research gaps in standardization frameworks, ultra-constrained device optimization, and privacy-preserving architectures. Our analysis reveals promising architectural innovations, including hybrid deployment strategies reducing energy consumption by 45% and federated learning approaches achieving 97.12% accuracy while maintaining data privacy. This comprehensive review provides a foundation for future research directions in LLM-based IoT security, emphasizing the need for balanced approaches between security effectiveness and resource constraints. The findings suggest that successful implementation requires careful consideration of computational requirements, privacy preservation, and architectural optimization for resource-constrained environments.</p>Mohammed Tawfik, Amr H. Abdelhaliem, Islam S. Fathi
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-07-13 · 14(2), 1018-1044 · DOI: 10.19139/soic-2310-5070-2424
Assessing Financial Risk using Value-At-Risk (VaR) from the perspective of a third world economy, Zimbabwe’s Forex Market
http://47.88.85.238/index.php/soic/article/view/1307
<p>The Global Financial Depression of 2008 exposed the problems of financial risk estimation in the forex sector and impacted negatively on developing countries. In this paper, the performance of Generalised Autoregressive Conditional Heteroskedasticity (GARCH) family models is assessed and compared in the estimation of Value at Risk (VaR). The study is based on three major currencies used in Zimbabwe’s multiple-currency regime against the USD: the ZAR/USD, the EUR/USD, and the GBP/USD. Three univariate GARCH models, with the Student’s t and the Normal error distributions, are applied to the three currency indices to ascertain the best VaR estimation formula. Evaluation tests, namely the Violation ratio, Kupiec’s test, and Christoffersen’s test, are used to assess the quality of the VaR performance. The GARCH(1,1) with t-distributed errors produced relatively more accurate computations of the VaR for EUR/USD and GBP/USD at the 99% level of significance, while the backtest results were inconclusive for ZAR/USD. The GARCH(1,1) model with t-distributed errors also had the lowest Akaike Information Criterion (AIC) and Schwarz Bayesian Information Criterion (SBIC) values. The GARCH(1,1) model with t-distributed errors is therefore suggested for computing VaR and making other deductions on the capital required.</p>Delson Chikobvu, Thabani Ndlovu
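Of the backtests named above, Kupiec's proportion-of-failures (POF) test is the simplest to state: it compares the observed VaR violation rate x/T against the nominal coverage p with a likelihood-ratio statistic that is asymptotically chi-square with one degree of freedom. A minimal implementation:

```python
import math

def kupiec_pof(violations, T, p):
    """Kupiec proportion-of-failures LR statistic for VaR backtesting:
    LR = -2 * [loglik at nominal rate p - loglik at observed rate x/T].
    Large values reject correct unconditional coverage (chi2_1 critical
    value 3.84 at the 5% level)."""
    x = violations
    if x == 0:
        return -2.0 * T * math.log(1.0 - p)
    pi = x / T
    ll0 = (T - x) * math.log(1.0 - p) + x * math.log(p)
    ll1 = (T - x) * math.log(1.0 - pi) + x * math.log(pi)
    return -2.0 * (ll0 - ll1)

# 99% VaR over 250 trading days: about 2.5 violations expected.
lr_ok = kupiec_pof(3, 250, 0.01)    # near-nominal violation count
lr_bad = kupiec_pof(15, 250, 0.01)  # far too many violations
```

Christoffersen's test extends this by also checking that violations are independent over time rather than clustered, which is why both tests are typically reported together.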
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-06-23 · 14(2), 1045-1059 · DOI: 10.19139/soic-2310-5070-1307
The Alpha Power Modified Weibull-Geometric Distribution: A Comprehensive Mathematical Framework with Simulation, Goodness-of-fit Analysis and Informed Decision making using Real Life Data
http://47.88.85.238/index.php/soic/article/view/2607
<p>A new lifetime distribution, named the Alpha Power Modified Weibull-Geometric distribution and obtained by compounding the Alpha Power Modified Weibull distribution with the geometric distribution, is introduced and discussed. The compounding is motivated by the failure time of a system with a series structure, where only the minimum lifetime value is observed. Various statistical properties of the proposed distribution are investigated. The maximum likelihood estimation method is used to estimate the model parameters. To assess the performance of the proposed method, a Monte Carlo simulation study is conducted using various choices of effective sample size and parameter values. Finally, to illustrate the capability and flexibility of the proposed distribution, three real-life data sets are considered, showing that the proposed distribution provides a better fit than other competing lifetime distributions.</p>Magrisha Namsaw, Bhanita Das, Partha Jyoti Hazarika, Morad Alizadeh
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-07-25 · 14(2), 1060-1087 · DOI: 10.19139/soic-2310-5070-2607