Statistics, Optimization & Information Computing http://47.88.85.238/index.php/soic <p><em><strong>Statistics, Optimization and Information Computing</strong></em>&nbsp;(SOIC) is an international refereed journal dedicated to the latest advances in statistics, optimization and their applications in the information sciences. Topics of interest include (but are not limited to):</p> <p>Statistical theory and applications</p> <ul> <li class="show">Statistical computing, Simulation and Monte Carlo methods, Bootstrap, Resampling methods, Spatial statistics, Survival analysis, Nonparametric and semiparametric methods, Asymptotics, Bayesian inference and Bayesian optimization</li> <li class="show">Stochastic processes, Probability, Statistics and applications</li> <li class="show">Statistical methods and modeling in the life sciences, including the biomedical sciences, environmental sciences and agriculture</li> <li class="show">Decision theory, Time series analysis, High-dimensional multivariate integrals, Statistical analysis in marketing, business, finance, insurance, economics and the social sciences, etc.</li> </ul> <p>Optimization methods and applications</p> <ul> <li class="show">Linear and nonlinear optimization</li> <li class="show">Stochastic optimization, Statistical optimization, Markov chains, etc.</li> <li class="show">Game theory, Network optimization and combinatorial optimization</li> <li class="show">Variational analysis, Convex optimization and nonsmooth optimization</li> <li class="show">Global optimization and semidefinite programming</li> <li class="show">Complementarity problems and variational inequalities</li> <li class="show"><span lang="EN-US">Optimal control: theory and applications</span></li> <li class="show">Operations research, Optimization and applications in management science and engineering</li> </ul> <p>Information computing and machine intelligence</p> <ul> <li class="show">Machine learning, Statistical
learning, Deep learning</li> <li class="show">Artificial intelligence,&nbsp;Intelligence computation, Intelligent control and optimization</li> <li class="show">Data mining, Data&nbsp;analysis, Cluster computing, Classification</li> <li class="show">Pattern recognition, Computer vision</li> <li class="show">Compressive sensing and sparse reconstruction</li> <li class="show">Signal and image processing, Medical imaging and analysis, Inverse problem and imaging sciences</li> <li class="show">Genetic algorithm, Natural language processing, Expert systems, Robotics,&nbsp;Information retrieval and computing</li> <li class="show">Numerical analysis and algorithms with applications in computer science and engineering</li> </ul> International Academic Press en-US Statistics, Optimization & Information Computing 2311-004X <span>Authors who publish with this journal agree to the following terms:</span><br /><br /><ol type="a"><ol type="a"><li>Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a <a href="http://creativecommons.org/licenses/by/3.0/" target="_new">Creative Commons Attribution License</a> that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.</li><li>Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.</li><li>Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See <a href="http://opcit.eprints.org/oacitation-biblio.html" target="_new">The Effect of Open 
Access</a>).</li></ol></ol> The synthetic autoregressive model for the insurance claims payment data: modeling and future prediction http://47.88.85.238/index.php/soic/article/view/1584 <p><span class="fontstyle0">Time series models play a vital role in predicting many types of claims payments. Accurate predictions of expected claims are very important for insurance companies seeking to avoid the large losses that future claims may produce under uncertainty. In this work, we define a new size-of-loss synthetic autoregressive model for left-skewed insurance claims datasets. The synthetic autoregressive model is assessed through simulation experiments, and the optimal parameter is determined artificially. The insurance claims data are then modeled using the synthetic autoregressive model.</span></p> Heba Soltan Mohamed Gauss M. Cordeiro Haitham Yousof Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-05-06 2025-05-06 14 1 1 19 10.19139/soic-2310-5070-1584 Adaptive Type-II Progressive Hybrid Censoring and Its Impact on Rayleigh Data Overlap Estimation http://47.88.85.238/index.php/soic/article/view/2148 <p>This article uses the adaptive type-II progressive hybrid censoring scheme, first introduced by Ng et al. in 2009, to estimate the overlap of two Rayleigh distributions with distinct scale parameters. The estimators for these overlap measures are derived using this censoring method, and their asymptotic bias and variance are also provided. In cases where small sample sizes make it challenging to assess the precision or bias of the estimators due to the lack of closed-form expressions for variances and exact sampling distributions, Monte Carlo simulations are used. Additionally, confidence intervals for these measures are constructed using both the bootstrap method and Taylor approximation.
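The overlap coefficient and its bootstrap confidence interval described above can be sketched in a few lines. This is a minimal hypothetical illustration, not the authors' censored-data estimator: it assumes complete (uncensored) samples, plugs Rayleigh scale MLEs into the densities, integrates numerically on a grid, and uses a percentile bootstrap.

```python
import math
import random

def rayleigh_pdf(x, sigma):
    """Rayleigh density with scale parameter sigma."""
    return (x / sigma**2) * math.exp(-x**2 / (2 * sigma**2))

def rayleigh_mle(sample):
    """MLE of the Rayleigh scale: sqrt(sum(x^2) / (2n))."""
    return math.sqrt(sum(x * x for x in sample) / (2 * len(sample)))

def overlap(sigma1, sigma2, grid_n=2000):
    """OVL = integral of min(f1, f2), trapezoidal rule on [0, 6*max sigma]."""
    upper = 6 * max(sigma1, sigma2)   # tail mass beyond this is negligible
    h = upper / grid_n
    ys = [min(rayleigh_pdf(i * h, sigma1), rayleigh_pdf(i * h, sigma2))
          for i in range(grid_n + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def bootstrap_ci(s1, s2, B=500, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the estimated overlap."""
    rng = random.Random(seed)
    stats = []
    for _ in range(B):
        r1 = [rng.choice(s1) for _ in s1]
        r2 = [rng.choice(s2) for _ in s2]
        stats.append(overlap(rayleigh_mle(r1), rayleigh_mle(r2)))
    stats.sort()
    lo = stats[int(math.floor(alpha / 2 * B))]
    hi = stats[min(B - 1, int(math.ceil((1 - alpha / 2) * B)) - 1)]
    return lo, hi

# Two simulated Rayleigh samples with distinct scales (inverse-CDF sampling).
rng = random.Random(42)
s1 = [1.0 * math.sqrt(-2 * math.log(1 - rng.random())) for _ in range(80)]
s2 = [1.5 * math.sqrt(-2 * math.log(1 - rng.random())) for _ in range(80)]
est = overlap(rayleigh_mle(s1), rayleigh_mle(s2))
lo, hi = bootstrap_ci(s1, s2)
```

The sample sizes, seed, and grid resolution are illustrative choices; the paper's actual estimators account for the adaptive progressive censoring pattern.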
To highlight the practical importance of our proposed estimators, we present an analysis of real-life data focusing on the effect of mercaptopurine on sustaining remission in patients with acute leukemia.</p> Amal Helu Eman Aldabbas Omar Yasin Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-03-24 2025-03-24 14 1 20 41 10.19139/soic-2310-5070-2148 Estimating Kappa Distribution Parameters: A Comparative Study of Maximum Likelihood and LQ-Moment Approaches http://47.88.85.238/index.php/soic/article/view/2149 <p>The Kappa distribution, pioneered by researchers such as Hosking, stands as a widely applied continuous model in diverse scientific fields. This study delves into its practical utility, with a specific focus on amalgamating Gamma and Log-Normal distributions. The vital distributional parameters ($\alpha, \beta, \theta$) are estimated using both Maximum Likelihood (MLE) and LQ-moment methods. Across a spectrum of sample sizes (25, 50, 100, and 150), the LQ-moment method consistently exhibits superior performance compared to MLE. Additionally, the research introduces two essential reliability metrics: Mean Inactivity Time (MIT) and Stress-Strength Reliability (SSR). MIT, influenced by the distribution parameters, provides insights into the temporal behavior of the random variable. SSR evaluates system reliability by accounting for the probability of component failure under stress conditions.
The paper concludes with a comparative analysis of parameter estimation methods, emphasizing the enhanced accuracy of the LQ-moment approach, particularly noticeable in smaller sample sizes (50 and 100).</p> Manal Mohammed Koran Sameera Abdulsalam Othman Delbrin Ahmed Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-04-09 2025-04-09 14 1 42 61 10.19139/soic-2310-5070-2149 Semiparametric Biresponse Regression Modeling Mixed Spline Truncated, Fourier Series, and Kernel in Predicting Rainfall and Sunshine http://47.88.85.238/index.php/soic/article/view/2166 <p>The biresponse semiparametric regression analysis combines parametric and nonparametric components to understand the relationship between two correlated response variables and predictor variables. In this approach, the nonparametric component can be estimated using spline truncated, Fourier series, or kernel methods, each suitable for specific data patterns. This study aims to estimate the parameters of a mixed semiparametric regression model on climate data using the Weighted Least Square (WLS) method and to select optimal knot points, oscillation parameters, and bandwidth based on the smallest Generalized Cross Validation (GCV) value. The results show that the best model combines a spline truncated component with one knot and a Fourier series component with one oscillation, yielding a minimum GCV of 7.401, an R² of 84.66%, and an MSE of 92.33. 
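The GCV criterion used above to select knot points, oscillation parameters, and bandwidth can be illustrated for the kernel component alone. This is a minimal sketch on hypothetical data, assuming a Nadaraya-Watson smoother with a Gaussian kernel (the paper's actual estimator combines truncated-spline, Fourier-series, and kernel components fitted by WLS): for a linear smoother with hat matrix S(h), GCV(h) = n·RSS(h)/(n − tr S(h))², minimized over candidate bandwidths.

```python
import math

def nw_smoother_matrix(xs, h):
    """Row i holds the Nadaraya-Watson weights producing fitted value i."""
    n = len(xs)
    S = []
    for i in range(n):
        w = [math.exp(-0.5 * ((xs[i] - xs[j]) / h) ** 2) for j in range(n)]
        tot = sum(w)
        S.append([wj / tot for wj in w])
    return S

def gcv(xs, ys, h):
    """GCV(h) = n * RSS / (n - tr(S))^2 for the linear smoother S(h)."""
    n = len(xs)
    S = nw_smoother_matrix(xs, h)
    fits = [sum(S[i][j] * ys[j] for j in range(n)) for i in range(n)]
    rss = sum((y - f) ** 2 for y, f in zip(ys, fits))
    tr = sum(S[i][i] for i in range(n))
    return n * rss / (n - tr) ** 2

# Toy regression data: smooth signal plus a deterministic wiggle as "noise".
xs = [i / 40 for i in range(41)]
ys = [math.sin(2 * math.pi * x) + 0.1 * math.sin(37.0 * x * x) for x in xs]
grid = [0.01, 0.03, 0.05, 0.1, 0.2, 0.5]
best_h = min(grid, key=lambda h: gcv(xs, ys, h))   # smallest GCV wins
```

The bandwidth grid and test function are made up for illustration; in the paper the same smallest-GCV rule is applied jointly over knots, oscillation order, and bandwidth.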
The findings suggest that the biresponse semiparametric regression model combining spline truncated, Fourier series, and kernel estimators is highly effective for modeling climate data with complex predictor patterns.</p> Hartina Husain Putri Indi Rahayu Muhammad Rifki Nisardi Muhammad Aslam Al-Fadhilah Ahmad Husain Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-04-23 2025-04-23 14 1 62 76 10.19139/soic-2310-5070-2166 The New Topp-Leone-Marshall-Olkin-Gompertz-G Family of Distributions: Properties, Different Estimation Techniques and Applications on Censored and Complete Data http://47.88.85.238/index.php/soic/article/view/2239 <pre>A new family of distributions (FoD) called the Topp-Leone-Marshall-Olkin Gompertz-G is presented in this paper. Some of its statistical properties are derived. The model parameters were estimated using five methods: weighted least squares, maximum likelihood estimation, least squares, Cramér-von Mises, and Anderson-Darling. A simulation experiment assessed the precision of the model parameters under these five estimation methods. To evaluate the adaptability and utility of this new FoD, three real-life datasets, one of which contained censored data, were analyzed using a special case from the developed family of distributions. Remarkably, the new model showed exceptional performance when compared against six other non-nested models, highlighting its superiority and effectiveness in modeling real-life datasets. </pre> Peter Tinashe Chinofunga Broderick Oluyede Fastel Chipepa Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-04-05 2025-04-05 14 1 77 104 10.19139/soic-2310-5070-2239 Bitcoin Halving Cycles and Their Impact on the Gold Relationship http://47.88.85.238/index.php/soic/article/view/2299 <p>This paper examines the interdependencies between Bitcoin and gold within the context of Bitcoin halving cycles.
Using a comprehensive econometric approach, including cointegration tests, VAR and VECM models, DCC-GARCH modeling, and wavelet coherence analysis, we investigate short- and long-term dynamics linking these two assets. Our findings indicate that, over the long term, Bitcoin exhibits characteristics similar to gold as a safe haven despite its high volatility and sensitivity to short-term shocks. Moreover, the incorporation of macroeconomic variables, such as stock market indices and oil prices, highlights the significant influence of broader economic conditions on this relationship. These results suggest that while Bitcoin may serve as a complementary asset to gold in diversified portfolios, prudent management is essential to mitigate the risks associated with its speculative nature.</p> Mustaph KHALFOUNI Rafia FRIJ Mohammed Lamarti Sefian Noaman LAKCHOUCH Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-04-05 2025-04-05 14 1 105 129 10.19139/soic-2310-5070-2299 Hybrid Euler method and Pontryagin Principle in fractional dengue model with sex classification and optimal controls http://47.88.85.238/index.php/soic/article/view/2393 <p>This study develops a fractional epidemiological model to investigate the dynamics of dengue transmission, incorporating biological and behavioral differences between male and female human populations. The model utilizes fractional calculus to capture memory effects, which are essential for understanding the long-term behavior of infectious diseases. Control variables representing fumigation and preventive measures are introduced to evaluate intervention strategies, formulating a fractional optimal control problem. To solve the model, Euler’s method is employed for numerical approximation of the fractional differential equations, while Pontryagin’s Minimum Principle and a forward-backward numerical approach are applied to determine optimal strategies.
Numerical simulations reveal that the combined control strategy, employing both fumigation and preventive measures, is the most effective in minimizing infection levels and system costs. The results also demonstrate that higher fractional orders enhance the efficiency of system dynamics. This research provides a robust framework for modeling dengue transmission and designing cost-effective public health interventions, with potential extensions to account for additional real-world complexities.</p> Faishal F. Herdicho Fatmawati Cicik Alfiniyah Chidozie W. Chukwu Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-04-23 2025-04-23 14 1 130 143 10.19139/soic-2310-5070-2393 Ensemble Method for Intervention Analysis to Predict the Water Resources of the Tigris River http://47.88.85.238/index.php/soic/article/view/2413 <p>Iraq faces real challenges and great concerns regarding water resources, including the noticeable decrease in the flow of the Tigris River into Iraqi territory due to various irrigation projects implemented on the Turkish side, the latest of which was the construction of the Ilisu Dam, which exacerbated the water crisis and placed Iraq before a serious challenge. The river flow data utilized in this paper were the annual revenue of the Tigris River, representing the amount of water entering Iraq at the Turkish border for the period (Oct-2014–Sep-2021), i.e., 84 months. To enhance the accuracy of Tigris River flow forecasting, a Random Tree ensemble model combining ARIMA models, as a classical statistical approach, with the nonlinear eXtreme Gradient Boosting (XGBoost) model is proposed in this study. Two distinct ARIMA models are employed to capture the linear characteristics of the Tigris River flow: SARIMA and ARIMAX. The XGBoost model is utilized to capture the nonlinear characteristics of the Tigris River flow.
The results reveal that the Tigris River flow prediction using the Random Tree ensemble model performs better than the other models introduced in this paper with respect to the evaluation measures. The forecast suggests that the river flow stabilizes around its low average level, with some variation. These seasonal changes reflect increased river flow during the rainy season in Iraq and Turkey at peak times and reduced river flow in the summer months.</p> Muzahem Al-Hashimi Heyam Hayawi Alawjar Mohammed Alawjar Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-04-08 2025-04-08 14 1 144 161 10.19139/soic-2310-5070-2413 A New Mixed Gamma-Exponential Frailty Model under Heterogeneity Problem with Validation Testing for Emergency Care Data http://47.88.85.238/index.php/soic/article/view/2451 <p>Frailty models play a crucial role in survival analysis as they account for unobserved differences among individuals, which may arise from various factors like genetics, environment, or lifestyle. These models help in identifying such factors and assessing their influence on survival outcomes. In this research, we introduce a new frailty model called the Mixed Gamma-Exponential (MxGEF) model for survival analysis. To evaluate its appropriateness, we apply the Rao-Robson-Nikulin (RR-Ni) and the Bagdonavičius-Nikulin (B-Ni) goodness-of-fit tests, analyzing the distribution’s characteristics and comparing its effectiveness against commonly used distributions in frailty modeling. Through simulation studies and real-world data applications, including a dataset collected from an emergency hospital in Algeria, we demonstrate how the MxGEF model effectively captures heterogeneity and improves model fitting. Our findings suggest that the MxGEF model is a promising alternative to existing frailty models, potentially enhancing the accuracy of survival analyses across various fields, including emergency care.
Additionally, we explore the applicability of the MxGEF model in insurance through simulations and real data analysis, showcasing its versatility and potential impact in this domain.</p> Hafida Goual Loubna Hamami Mohamed S. Hamed Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-05-01 2025-05-01 14 1 162 182 10.19139/soic-2310-5070-2451 Estimating concealment behavior via innovative and effective randomized response model http://47.88.85.238/index.php/soic/article/view/2522 <p>Estimating concealment behavior via direct questioning often fails. One proposed and effective solution to this challenge is the Randomized Response Technique (RRT). This study aims to present a new, efficient and easily applicable randomized response model as a tool for measuring concealment behavior. The efficiency and privacy protection of the proposed model are analyzed. As a real-world implementation of the model, the case of COVID-19 non-disclosure among university students is investigated as an example of concealment behavior. The proposed model, with a rational choice of parameters, was tested on a sample of university students and proved to be practically reliable. The health-status disclosure ratio was estimated. This estimate serves as a foundation for predicting concealment behavior in different fields.</p> Ahmad M. Aboalkhair El-Emam El-Hosseiny Mohammad A. Zayed Tamer Elbayoumi Mohamed Ibrahim A. M. Elshehawey Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-04-24 2025-04-24 14 1 183 192 10.19139/soic-2310-5070-2522 COMPLEXITY ANALYSIS OF LARGE-UPDATE INTERIOR-POINT METHODS FOR 𝓟∗(𝜿)-HLCP BASED ON A NEW PARAMETRIC KERNEL FUNCTION http://47.88.85.238/index.php/soic/article/view/2345 <p>This work proposes a primal-dual interior-point method for the 𝒫∗(𝜅)-Horizontal Linear Complementarity Problem (𝒫∗(𝜅)-HLCP), based on a novel parameterized kernel function.
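The randomized response technique behind the concealment-behavior study above builds on designs like Warner's. The abstract does not specify the new model's design, so as a hypothetical illustration here is the classic Warner (1965) estimator: each respondent answers the sensitive question with probability P and its complement with probability 1−P, and the prevalence is recovered as π̂ = (λ̂ − (1 − P)) / (2P − 1), where λ̂ is the observed "yes" proportion.

```python
import random

def warner_estimate(yes_responses, n, P):
    """Warner (1965) estimator of the sensitive proportion pi (P != 0.5)."""
    lam_hat = yes_responses / n                  # observed "yes" proportion
    pi_hat = (lam_hat - (1 - P)) / (2 * P - 1)
    # Estimated sampling variance of pi_hat.
    var_hat = lam_hat * (1 - lam_hat) / (n * (2 * P - 1) ** 2)
    return pi_hat, var_hat

def simulate_warner(pi_true, n, P, seed=7):
    """Simulate n randomized responses for a true prevalence pi_true."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        carrier = rng.random() < pi_true         # respondent's true status
        sensitive_q = rng.random() < P           # spinner picks the question
        yes += carrier if sensitive_q else not carrier
    return yes

yes = simulate_warner(pi_true=0.30, n=20000, P=0.7)
pi_hat, var_hat = warner_estimate(yes, 20000, P=0.7)
```

With P = 0.7 and 20000 simulated respondents, π̂ lands close to the true 0.30 even though no individual answer reveals the respondent's status, which is the privacy-protection property the study analyzes.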
Our new eligible parametric kernel function yields the iteration bound $O\big((p+1)\, n^{\frac{p+2}{2(p+1)}} \log\frac{n}{\varepsilon}\big)$ for the large-update method. Finally, we present numerical results demonstrating the algorithm’s practical performance for various parameter choices.</p> Mokrani Ibtissam Chalekh Randa Djeffal El Amir Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-04-24 2025-04-24 14 1 193 206 10.19139/soic-2310-5070-2345 Intelligent Decision Making and Knowledge Management System for Industry 4.0 Maturity Assessment http://47.88.85.238/index.php/soic/article/view/2461 <p>Achieving a seamless transition to Industry 4.0 requires a holistic, knowledge-driven approach that integrates multiple dimensions of digital transformation. This paper proposes a smart, data-driven ontology-based system that integrates strategic, operational, technological, and cultural dimensions for Industry 4.0 maturity assessment. Built using OWL (Web Ontology Language) for structured knowledge representation and SWRL (Semantic Web Rule Language) rules for intelligent inference, the proposed ontology-based system classifies manufacturing enterprises into five maturity levels: Pre-Adoption, Experimental, Transitional, Integrated, and Transformational. It leverages technical KPIs from SCADA, ERP, IoT, and other industrial real-time data sources to enable automated reasoning and data-driven decision-making. An industrial case study in an automotive manufacturing plant is developed to validate the proposed system's capabilities and effectiveness in optimizing the Industry 4.0 maturity assessment process, maturity-level aggregation, and insight generation. The results highlight its adaptability across industries, offering a scalable and intelligent solution for Industry 4.0 assessment and adoption.
It also highlights the system's potential to support domain-specific digital-transformation benchmarking and interoperability with previous maturity models.</p> Asmae ABADI Chaimae ABADI Mohammed ABADI Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-03-29 2025-03-29 14 1 207 228 10.19139/soic-2310-5070-2461 Application of RBF neural network in predicting thalassemia disease in Mosul city http://47.88.85.238/index.php/soic/article/view/2303 <p>Thalassemia is a hereditary blood disorder that can be passed to children when both parents carry the gene mutation. The mutation causes a faster-than-normal rate of destruction of red blood cells and thus iron accumulation and decreased availability of hemoglobin; the quantity, quality and shape of red blood cells are also reduced. The current dataset comprises 131 cases of severe thalassemia and 149 cases of moderate thalassemia, described by 13 variables, from a sample taken at Al-Hadbaa Specialized Hospital for Hematology and Bone Marrow Transplantation in Mosul, Iraq. Before modeling in Python, the variables were cleaned to remove any gaps, and the data were divided into several folds using cross-validation to test the model. Radial Basis Function (RBF) networks were used in this study to classify thalassemia patients with respect to the specified model performance criteria. The experimental results revealed that RBF networks performed well, with a test accuracy of 96%, an F1 score of 96%, a high sensitivity of 97%, a high specificity of 95%, and a high positive predictive value of 95%. The resulting area under the curve was 99.5%, which is very close to ideal for this sample. Through experiments, we found that the best setting is a learning rate of 0.1 and sixteen neurons in the hidden layer. Furthermore, a random forest model was used to identify the most significant features influencing the differentiation between types of thalassemia.
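An RBF network of the kind used in the thalassemia study computes Gaussian activations around prototype centers and combines them linearly in the output layer. The following forward-pass sketch uses made-up 2-D centers and weights purely for illustration; the paper's trained network (16 hidden neurons, learning rate 0.1, 13 input variables) is not reproduced here.

```python
import math

def rbf_forward(x, centers, widths, weights, bias=0.0):
    """Forward pass of an RBF network: Gaussian hidden layer, linear output,
    sigmoid squashing for a two-class decision."""
    acts = []
    for c, s in zip(centers, widths):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        acts.append(math.exp(-d2 / (2 * s * s)))   # Gaussian basis activation
    z = bias + sum(w * a for w, a in zip(weights, acts))
    return 1.0 / (1.0 + math.exp(-z))

# Toy 2-D example: one hidden unit per class prototype (hypothetical values).
centers = [(0.0, 0.0), (3.0, 3.0)]
widths = [1.0, 1.0]
weights = [-4.0, 4.0]          # push the output toward class 1 near (3, 3)
p_near_0 = rbf_forward((0.1, -0.2), centers, widths, weights)
p_near_1 = rbf_forward((2.9, 3.1), centers, widths, weights)
```

In practice the centers are found by clustering the training data and the output weights by least squares or gradient descent, which is where the reported learning rate comes in.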
The results showed that the most important features are HBA1 (adult hemoglobin) and HBF (fetal hemoglobin), which represent the main indicators for determining the type of thalassemia due to their significant impact on classification. HB (total hemoglobin) follows as a third important feature, and then growth delay and HBA2 with varying degrees of importance. These analyses helped identify the fundamental factors associated with the genetic and clinical differences between major and intermediate thalassemia, enhancing understanding of the precise classification of the disease and improving diagnostic and treatment strategies.</p> Mohamed Ali Hutheyfa Hazem Taha Copyright (c) 2024 Statistics, Optimization & Information Computing 2025-04-18 2025-04-18 14 1 229 246 10.19139/soic-2310-5070-2303 Enhancing Text Encryption and Secret Document Watermarking through Hyperladder Graph-Based Keystream Construction on Asymmetric Cryptography Technology http://47.88.85.238/index.php/soic/article/view/2310 <p>Message security remains a vital concern in cryptography. This paper introduces a novel enhancement to the classical Caesar cipher by generating a keystream from a Hyper-Ladder Graph, which combines hypergraph and ladder graph properties to produce complex and unpredictable patterns. The proposed method is evaluated against AES, DES, ChaCha20, and XChaCha20, showing superior performance in encryption time and memory efficiency, especially in constrained environments. To demonstrate broader applicability, we implemented the keystream in grayscale image watermarking. The binary keystream was first encrypted using RSA public key encryption, then embedded using the least significant bit (LSB) method. The results showed high imperceptibility with a PSNR of 57.05 dB and an SSIM of 0.9989.
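The LSB embedding step and the PSNR imperceptibility measure quoted above can be sketched in a few lines. This toy version embeds a keystream's bits directly into a flat list of pixel values; the paper's pipeline additionally encrypts the keystream with RSA before embedding, and the stand-in image and keystream below are fabricated for illustration.

```python
import math

def embed_lsb(pixels, bits):
    """Replace the least significant bit of each leading pixel with a bit."""
    out = list(pixels)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b
    return out

def extract_lsb(pixels, n_bits):
    """Recover the first n_bits embedded bits."""
    return [p & 1 for p in pixels[:n_bits]]

def psnr(original, modified, max_val=255):
    """Peak signal-to-noise ratio in dB between two equal-size images."""
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_val ** 2 / mse)

cover = [(i * 37) % 256 for i in range(1024)]   # stand-in grayscale image
keystream = [(i * i) % 2 for i in range(256)]   # stand-in binary keystream
stego = embed_lsb(cover, keystream)
quality = psnr(cover, stego)
```

Because each pixel changes by at most 1 gray level, the MSE stays tiny and the PSNR stays high (mid-50s dB here), which is why LSB embedding is imperceptible; the embedded bits are recovered exactly by `extract_lsb`.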
This integration of a graph-based keystream and asymmetric cryptography offers robust security and flexibility, making it suitable for various domains such as secure text encryption, digital watermarking, and document authentication.</p> Dafik Swaminathan Venkatraman G. Sathyanarayanan Rifki Ilham Baihaki Indah Lutfiyatul Mursyidah Ika Hesti Agustin Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-04-29 2025-04-29 14 1 247 263 10.19139/soic-2310-5070-2310 Optimizing Data Replication in Cloud Computing Using Firefly-Based Algorithm for Selection and Placement http://47.88.85.238/index.php/soic/article/view/2317 <p>The rapid adoption of cloud computing has driven extensive research into data replication methods and their practical applications. Data replication is a vital process in cloud systems, ensuring data availability, improving performance, and maintaining system stability. This is especially crucial for data-intensive applications that require the distribution and sharing of large volumes of information across geographically dispersed centers. However, managing this process presents significant challenges: as the number of data replicas increases and they are distributed across multiple locations, the associated costs and the complexity of maintaining system usability, performance, and stability also rise. In this study, we initially randomized the distribution of data replication files across the cloud infrastructure to simulate a realistic scenario where data already exists within the system before the replication algorithms are applied. This approach allowed the algorithms to optimize the replication process based on the initial data distribution and adapt to the evolving demands of incoming workloads.
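The firefly metaheuristic named in the title above (due to Yang) moves dimmer fireflies toward brighter ones with an attractiveness that decays with distance, plus a small random walk. The following is a generic continuous-domain sketch minimizing a test function; it is not the paper's FFO-S/FFO-P formulation for replica selection and placement, and all parameter values are illustrative.

```python
import math
import random

def firefly_minimize(f, dim, n=15, iters=60, beta0=1.0, gamma=0.05,
                     alpha=0.2, bounds=(-2.0, 2.0), seed=3):
    """Generic firefly algorithm: dimmer fireflies move toward brighter
    (lower-cost) ones with attractiveness beta0 * exp(-gamma * r^2)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    cost = [f(x) for x in pop]
    k = min(range(n), key=lambda i: cost[i])
    best_x, best_c = list(pop[k]), cost[k]      # best solution seen so far
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:           # j is brighter: i moves to j
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    pop[i] = [min(hi, max(lo, a + beta * (b - a)
                                          + alpha * (rng.random() - 0.5)))
                              for a, b in zip(pop[i], pop[j])]
                    cost[i] = f(pop[i])
                    if cost[i] < best_c:
                        best_x, best_c = list(pop[i]), cost[i]
        alpha *= 0.97                           # shrink the random-walk step
    return best_x, best_c

# Sphere test function: global minimum 0 at the origin.
best_x, best_cost = firefly_minimize(lambda x: sum(v * v for v in x), dim=2)
```

Adapting this to replication requires a discrete encoding (which replicas, on which nodes) and a cost function over availability, latency, and storage, which is what the FFO-S/FFO-P variants contribute.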
To address the challenges of dynamic data replication in cloud environments, this paper introduces two algorithms: the Firefly Optimization Algorithm for Data Replica Selection (FFO-S) and the Firefly Optimization Algorithm for Replica Placement (FFO-P). A detailed simulation study was performed using the CloudSim platform to assess the effectiveness of the proposed FFO-S and FFO-P algorithms. The simulation environment was designed to closely emulate real-world cloud infrastructures, ensuring the practical applicability of the results.</p> B. Hafiz Heba Abdelrahman BenBella Tawfik Hosam E.Refaat Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-04-07 2025-04-07 14 1 264 281 10.19139/soic-2310-5070-2317 Deploying an IoT-enabled Integrated Comprehensive Home Automation System using WSN for Enhanced Continuous Optimization and Fault Identification System http://47.88.85.238/index.php/soic/article/view/2340 <p>Making the home smart reduces human effort: without automation, occupants must operate, monitor, and look after electrical appliances and the home environment themselves, since conventional home systems cannot change their state instantly based on different situations. This article proposes a smart home automation system combining multiple Internet of Things (IoT) techniques for secure and power-efficient control of the home environment, designed with the user in mind. Entry is through an RFID reader, which asks users to swipe their card on the sensor for access. If verification passes, the door unlocks and the yellow LED lights up; otherwise, the red LED lights up. Within the house, temperature, humidity and gas sensors are scattered throughout, providing feedback data displayed on an LCD. When the gas level crosses a limit, an email message goes to the house owner and a buzzer sounds inside the home.
Lights can be controlled remotely through the Blynk app, with additional voice control via Google Assistant. For the water tank, an automated system uses ultrasonic sensors to detect levels and control the pump when needed (so that the tank neither runs dry nor overfills), with status updates provided in the Blynk app. A single-axis solar tracking system with LDR sensors keeps the solar panel optimally aligned to harvest maximum energy from sunlight. A rain sensor together with a servo motor closes the windows whenever it rains, adding to home safety and convenience. Overall, this system improves safety, comfort and energy management in the home by using IoT technologies in place of manual control of lighting and climate systems.</p> Abu Salyh Muhammad Mussa Md. Taslim Arif Abdullah Al Mamun Abdul Hasib Md Anwarul Islam Rakib Hossen Anichur Rahman Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-05-05 2025-05-05 14 1 282 310 10.19139/soic-2310-5070-2340 Performance of Machine Learning Algorithms for Credit Risk Prediction with Feature Selection http://47.88.85.238/index.php/soic/article/view/2392 <p>Financial institutions increasingly rely on machine learning (ML) models to assess credit risk and make lending decisions. Accurate prediction hinges on effective feature selection, which can significantly enhance model performance. This paper investigates the efficacy of seven supervised ML algorithms in predicting credit risk: Naive Bayes, Support Vector Machine, Decision Tree, K-Nearest Neighbor, Artificial Neural Network, Random Forest, and Logistic Regression. Using a German credit dataset comprising 1000 observations with 20 explanatory variables, we evaluated model performance using accuracy, the kappa statistic, and the F1 score. Two data-splitting scenarios (70-30% and 80-20%) were employed to assess robustness.
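The three evaluation criteria named above (accuracy, kappa statistic, F1 score) can all be computed from a binary confusion matrix. A minimal sketch on fabricated labels, independent of any particular classifier:

```python
def confusion(y_true, y_pred):
    """Counts (tp, fp, fn, tn) for binary labels, 1 = positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion(y_true, y_pred)
    n = tp + fp + fn + tn
    acc = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_e = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (acc - p_e) / (1 - p_e) if p_e < 1 else 1.0
    # F1: harmonic mean of precision and recall for the positive class.
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, kappa, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]   # hypothetical labels
y_pred = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
acc, kappa, f1 = metrics(y_true, y_pred)
```

Kappa is the most informative of the three on imbalanced credit data, since accuracy alone can look high for a classifier that simply predicts the majority class.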
To optimize model performance, we addressed outliers through imputation and applied the Boruta algorithm for feature selection, which identified and eliminated six non-contributing features. Our findings consistently demonstrate the superiority of the Random Forest algorithm across both scenarios. In terms of accuracy, Random Forest achieved 77.3% in the 70-30% split and 80% in the 80-20% split, outperforming all other methods. These results underscore the potential of Random Forest as a valuable tool for credit risk assessment in financial institutions.</p> Muhammad M. Seliem Muhammad Amin Mona Mahmoud Abo El Nasr Emad Abdelaziz Elnaggar Hany Abdelmonem Mohamed Khalifa Mona Ahmed Abdelwahab Arab Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-04-05 2025-04-05 14 1 311 328 10.19139/soic-2310-5070-2392 Text-Line Segmentation Techniques for Arabic-Handwritten Documents: A Review http://47.88.85.238/index.php/soic/article/view/2466 <p>This paper reviews text-line segmentation, focusing specifically on Arabic-handwritten documents. Arabic handwriting poses many difficulties for segmentation: handwriting styles vary from one person to another, skewed or inclined text-lines may appear in the document, and diacritic marks can further complicate the segmentation process. This research therefore first surveys the related work in this field and the miscellaneous text-line segmentation algorithms used in recent years. Moreover, comprehensive techniques for Arabic-handwritten text-line segmentation are presented, elaborating the results of each method along with the dataset types.
Finally, this review underscores the continuous need for innovative solutions that can handle the complexities of text-line segmentation in Arabic-handwritten documents, emphasizing the importance of pattern recognition and computer vision techniques. This will pave the way for researchers to build upon this work and invent robust new techniques to solve this problem in the future.</p> Waleed Abuain Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-05-01 2025-05-01 14 1 329 339 10.19139/soic-2310-5070-2466 A Hybrid Approach of Long Short Term Memory and Transformer Models for Speech Emotion Recognition http://47.88.85.238/index.php/soic/article/view/2521 <p>Speech emotion recognition (SER) has become a critical component of the next generation of technologies that interact between humans and machines. In this paper, we explore the advantage of a hybrid LSTM + Transformer model over standalone LSTM and Transformer models. The proposed method contains the following steps: first, data loading using benchmark datasets, namely the Toronto Emotional Speech Set (TESS), the Berlin Emotional Speech Database (EMO-DB), and SAVEE; second, Mel-Frequency Cepstral Coefficients (MFCCs) are extracted to create a meaningful representation of the raw audio data; third, the model’s architecture is designed and explained. Finally, we evaluate the precision, recall, F1 score, classification reports, and confusion matrices of the models. The outcome of this experiment, based on classification reports and confusion matrices, shows that the hybrid LSTM + Transformer model has remarkable performance on TESS, surpassing the other models with a 99.64% accuracy rate, while the LSTM model gained 97.50% and the Transformer model achieved 98.21%. For EMO-DB, the LSTM model achieved the highest accuracy of 73.83%, followed by the hybrid model at 71.96%, and the Transformer model at 70.09%.
Lastly, the LSTM model obtained the highest performance on SAVEE, with 65.62% accuracy, followed by the Transformer model at 58.33% and the hybrid model at 56.25%.</p> Tarik AbuAin Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-05-01 2025-05-01 14 1 340 351 10.19139/soic-2310-5070-2521 Evaluating Fault Detection Techniques in Real Electrical Transformers: A Comparative Case of Study http://47.88.85.238/index.php/soic/article/view/2446 <p>Transformers are integral to the reliability of electrical networks, necessitating robust diagnostic methods for fault detection. This study conducts a comparative evaluation of three fault detection techniques using real-world data from distribution transformers. The methods analyzed include differential current analysis, correlation-based techniques, and flux linkage increments. Results demonstrate that differential current analysis exhibits the highest sensitivity (93.33%), detecting faults at 4.41% of the short-circuit current. Correlation-based methods follow with 86.67% sensitivity, while flux linkage increments offer lower sensitivity but robust performance at high current levels.
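The differential-current criterion described in the transformer abstract above compares the primary current with the turns-ratio-scaled secondary current and flags a fault when the mismatch exceeds some fraction of the short-circuit current. A toy sketch of that criterion follows; the function signature, the turns-ratio scaling, and the default threshold are illustrative assumptions, not the paper's exact settings (the paper reports detection at about 4.41% of the short-circuit current).

```python
def differential_fault(i_primary, i_secondary, turns_ratio, i_short_circuit,
                       threshold=0.05):
    """Flag each sample where the differential current exceeds a given
    fraction of the short-circuit current (illustrative criterion only)."""
    flags = []
    for ip, isec in zip(i_primary, i_secondary):
        # Current that 'disappears' inside the winding under a healthy model:
        i_diff = abs(ip - turns_ratio * isec)
        flags.append(i_diff > threshold * i_short_circuit)
    return flags
```

For example, with a 1:1 ratio and a 10 A short-circuit current, a 1 A mismatch trips a 5% threshold while a perfectly balanced sample does not.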
This comparative analysis provides actionable insights for enhancing transformer reliability through effective monitoring strategies.</p> Santiago Guzman-Arteaga Santiago Gómez-Arango Daniel Sanin-Villa María del Pilar Buitrago-Villada Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-06-12 2025-06-12 14 1 352 372 10.19139/soic-2310-5070-2446 Application of Ujlayan-Dixit Fractional Chi-Square Probability Distribution http://47.88.85.238/index.php/soic/article/view/2334 <p>In this study, we employ the Ujlayan-Dixit (UD) fractional derivative to introduce the fractional probability density function of the Chi-Square distribution (CSD) and to establish new applications of this distribution through fractional concepts in probability theory, such as the cumulative distribution, survival, and hazard functions. Furthermore, other ideas and applications for continuous random variables are developed using the UD fractional analogues of statistical measures, namely the expectation, rth moments, rth central moments, variance, and standard deviation. Lastly, we provide the UD fractional entropy measures, including the Shannon, Tsallis, and Rényi entropies.</p> Iqbal H. Jebril Raed Hatamleh Iqbal Batiha Nadia Allouch Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-06-03 2025-06-03 14 1 373 386 10.19139/soic-2310-5070-2334 Evaluation of process capability index based on exponential progressively Type-Ⅱ data with effect from multiple production lines http://47.88.85.238/index.php/soic/article/view/2469 <p>Process capability indices have been widely used to assess process performance and drive continuous improvement in quality and productivity, with larger values of a lifetime performance index indicating better performance. For multiple production lines, an overall process capability index is proposed in this paper.
When the lifetime of units follows an exponential distribution and differences among testing facilities are taken into account, maximum likelihood, uniformly minimum variance unbiased, and generalized estimators of the lifetime performance index are investigated. To compare the merits of each method, extensive Monte Carlo simulations are carried out. Finally, the practical applications of the proposed methods are demonstrated through the analysis of two real-life datasets.</p> Zirui Chu Liang Wang Yogesh Mani Tripathi Sanku Dey Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-06-09 2025-06-09 14 1 387 414 10.19139/soic-2310-5070-2469 Compliance-and-defiance dilemma game best strategy for three and four agents http://47.88.85.238/index.php/soic/article/view/2370 <p>Dilemma noncooperative games involving three or four agents are studied, where strategic interaction is based on selecting between the compliance strategy and the defiance strategy. Applying the compliance strategy increases the agent’s loss, whereas applying the defiance strategy does not. It is assumed that if only one agent defies, the system is not significantly affected and there are no fines. When two or three agents defy, every agent is fined by the same amount. The objective is to determine and analyze the agent’s best strategy in the dilemma games with three and four agents, treating the full-defiance situation cost as a variable from which the partial-defiance situation cost is derived. The agent’s best strategy is determined as a function of the full-defiance situation cost. The best strategy for every agent is to defy (by applying the defiance pure strategy) with a probability that ensures the global minimum of the agent’s loss on the probability interval [0; 1] for a given value of the full-defiance situation cost.
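For the exponential lifetime setting in the process-capability abstract above, the classical estimates can be sketched in a few lines. This is the standard textbook form of the maximum likelihood estimator under progressive Type-II censoring and the usual lifetime performance index for the exponential distribution; it is not claimed to be the paper's exact estimator (the paper additionally accounts for differences among testing facilities), and the variable names are illustrative.

```python
def exp_mle_progressive(x, R):
    """MLE of the exponential mean under progressive Type-II censoring.

    x: observed failure times x_1..x_m; R: number of surviving units removed
    at each failure. Standard result: theta_hat = sum((R_i + 1) * x_i) / m.
    """
    m = len(x)
    return sum((r + 1) * xi for xi, r in zip(x, R)) / m

def lifetime_performance_index(theta, L):
    """C_L = (mu - L) / sigma, which reduces to 1 - L/theta for the
    exponential distribution (mu = sigma = theta), with L the lower
    specification limit on lifetime; larger C_L is better."""
    return 1.0 - L / theta
```

Each removed-but-surviving unit contributes its censoring time through the (R_i + 1) weight, which is why the estimator uses the total time on test divided by the number of observed failures.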
In the three-agent dilemma game, the best strategy is to fully defy if this cost does not exceed 2/3, in which case the agent’s loss equals the cost. As the cost increases beyond 2/3, the best-strategy probability decreases in an exponential-like manner, while the agent’s minimized loss increases in the same manner. Nevertheless, the best strategy ensures that the agent’s minimized loss does not exceed 1 (a conditional unit), which is the cost of full compliance. In the four-agent dilemma game, the best strategy is to fully defy if the full-defiance situation cost does not exceed 8/9, in which case the agent’s loss equals the cost. As the cost increases beyond 8/9, the best-strategy probability decreases in an exponential-like manner, dropping from 1/4, which is also the best strategy (being far more favorable for the system) when the full-defiance situation cost is 8/9. Similarly to the three-agent game, the agent’s minimized loss in the four-agent game increases in the same manner, never exceeding 1.</p> Vadim Romanuke Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-05-09 2025-05-09 14 1 415 433 10.19139/soic-2310-5070-2370 Stacked Ensemble Method: An Advanced Machine Learning Approach for Anomaly-based Intrusion Detection System http://47.88.85.238/index.php/soic/article/view/2352 <p>The subject of this article is intrusion detection systems (IDSs), which are a key component of comprehensive cyber-attack prevention. Today, an IDS for network infrastructure is a crucial topic. The advancement of software-defined networking (SDN) has led to a rising need for software-based IDSs. Diverse methodologies, including machine learning algorithms and other statistical models, have been used to develop distinct kinds of IDSs to enhance performance, yet further improvement is still needed.
Several studies have addressed these problems using methods such as conventional machine learning models. However, existing systems still suffer from low detection rates and high false alarm rates. The aim of this work is to improve performance, specifically the detection rate. It introduces a new intrusion detection system named SIDS (Stacked Intrusion Detection System), which utilizes a stack-based approach to improve detection accuracy and resilience. The objective is to utilize various predictive algorithms as efficiently as possible. An ensemble classifier method is used to enhance the precision of the final prediction by combining the outputs of multiple models. This research implemented numerous machine learning (ML) methodologies, including Stochastic Gradient Descent, Logistic Regression, Random Forest, and Deep Neural Networks, to construct a multilayered model that optimizes network intrusion detection accuracy. The experiments employ the NSL-KDD dataset. In previous studies, the stacked model (DNN1 + DNN2) reached a maximum accuracy of 97.90% for intrusion detection, whereas the proposed trained model outperforms existing models with 98.40%. Additionally, the proposed stacked model attains an F1-score of 99.2%, a false positive rate (FPR) of 95.6%, and a false negative rate (FNR) of 1.42%. In conclusion, the findings indicate that a stacked ensemble method can enhance evaluation metrics and provide more consistent performance.</p> Anichur Rahman Md. Saikat Islam Khan MD.
Zunead Abedin Eidmum Pabon Shaha Bakhtiar Muiz Nahid Hasan Tanoy Debnath Dipanjali Kundu Jarin Tasnim Tamanna Mohammad Sayduzzaman Muaz Rahman Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-05-14 2025-05-14 14 1 434 453 10.19139/soic-2310-5070-2352 Enhanced Outlier Detection in Linear-Circular Regression Using Circular Distance and Mean Resultant Length http://47.88.85.238/index.php/soic/article/view/2459 <p>In the study of outlier identification in linear-circular regression, two new methods are proposed. By calculating the circular distance of each suspect value and by using the mean resultant length for outlier identification, these methods aim to enhance the precision and reliability of outlier detection. Their effectiveness is assessed through comprehensive simulations on datasets with and without outlier contamination, comparing them with an existing method. Additionally, the methods are tested on real-world data, specifically wind speed and wind direction data, to further validate their practical applicability. Three metrics are used to evaluate their performance: the probability of correctly identifying all outliers, the masking effect, and the swamping effect. While occasional misclassification of inliers as outliers is possible, the results indicate that both proposed methods demonstrate strong overall performance.</p> Thunchanok Chaitongdee Wuttichai Srisodaphol Oktsa Rahmashari Benjawan Rattanawong Khanuengnij Prakhammin Copyright (c) 2025 Statistics, Optimization & Information Computing 2025-06-09 2025-06-09 14 1 454 468 10.19139/soic-2310-5070-2459
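The two building blocks named in the last abstract, circular distance and mean resultant length, are standard circular-statistics quantities. The sketch below computes both and illustrates one plausible deletion-based outlier rule; the `flag_outliers` rule and its `gain` cutoff are illustrative assumptions for this sketch, not the paper's exact procedure.

```python
import math

def circular_distance(a, b):
    """Shortest angular separation between two angles (radians), in [0, pi]."""
    return math.pi - abs(math.pi - abs(a - b) % (2 * math.pi))

def mean_resultant_length(angles):
    """R-bar: concentration of angles about a common direction
    (1 = all identical, near 0 = widely dispersed)."""
    n = len(angles)
    c = sum(math.cos(t) for t in angles) / n
    s = sum(math.sin(t) for t in angles) / n
    return math.hypot(c, s)

def flag_outliers(angles, gain=0.05):
    """Flag observation i if deleting it raises the mean resultant length
    by more than `gain` (an illustrative cutoff, not the paper's)."""
    base = mean_resultant_length(angles)
    return [mean_resultant_length(angles[:i] + angles[i + 1:]) - base > gain
            for i in range(len(angles))]
```

For example, three angles clustered near 0 plus one at pi give a low R-bar, and deleting the angle at pi pushes R-bar close to 1, so only that observation is flagged.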