A Metaheuristic for Fuzzy Density Based SVM and Confidence SMOTE for Early Prediction of Diabetes

  • Asma Driouich Engineering, Mathematics and Informatics Laboratory, Faculty of Sciences, Ibn Zohr University (UIZ), Agadir, Morocco
  • Abdellatif El Ouissari LaR2A Laboratory, Faculty of Sciences, Abdelmalek Essaadi University, Tetouan, Morocco
  • Karim El Moutaouakil Engineering Science Laboratory (LSI), Polydisciplinary Faculty of Taza, USMBA, Morocco
  • Ismail Akharraz Engineering, Mathematics and Informatics Laboratory, Faculty of Sciences, Ibn Zohr University (UIZ), Agadir, Morocco
Keywords: DB-Support vector machine, Class imbalance, Classification, Diabetes, Machine learning

Abstract

Early detection of diabetes from observable features plays a crucial role in preventing serious complications in diabetic patients. In this study, we propose a classification model based on the fuzzy support vector machine (FSVM), called SMOTE Density Based Fuzzy Support Vector Machine (SMOTE-DB-FSVM), to improve diabetes detection. Our approach consists of five main steps: data cleaning, density-based filtering, feature selection to identify the most important attributes, calculation of a confidence score for each point in the minority class, and use of SMOTE to balance the data. In addition, we compare several kernel functions in the SVM model to optimize classification results, using metaheuristics to estimate the kernel parameters. The proposed SMOTE-DB-FSVM algorithm was evaluated on diabetes datasets, including the PIMA diabetes database, and the results show a clear improvement in the early detection of diabetes.
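
The last two steps listed in the abstract (a confidence score for each minority point, then SMOTE) can be illustrated with a minimal sketch. The abstract does not define the confidence score, so the definition used here (the fraction of a point's k nearest neighbours that belong to the minority class) is an assumption made only for illustration, as are the function names `confidence` and `confidence_smote`, the `threshold` parameter, and the toy data. The sketch assumes distinct points and covers neither the density-based filtering nor the FSVM training.

```python
import math
import random


def knn(point, pool, k):
    """Return the k points of `pool` closest to `point`, skipping exact copies of it."""
    others = [q for q in pool if q != point]
    return sorted(others, key=lambda q: math.dist(point, q))[:k]


def confidence(point, minority, majority, k=3):
    """Hypothetical confidence score (not taken from the paper):
    fraction of the k nearest neighbours that are minority points."""
    labelled = [(q, 1) for q in minority if q != point]
    labelled += [(q, 0) for q in majority]
    labelled.sort(key=lambda item: math.dist(point, item[0]))
    nearest = labelled[:k]
    return sum(label for _, label in nearest) / len(nearest)


def confidence_smote(minority, majority, n_new, k=3, threshold=0.5, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation,
    using only minority points whose confidence reaches `threshold`."""
    rng = random.Random(seed)
    seeds = [p for p in minority
             if confidence(p, minority, majority, k) >= threshold]
    if len(seeds) < 2:
        return []  # not enough trusted points to interpolate between
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(seeds)
        q = rng.choice(knn(p, seeds, k))
        gap = rng.random()  # position along the segment from p to q
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


# Toy data: three clustered minority points and one that sits among the majority.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0)]
majority = [(5.1, 5.0), (4.9, 5.2), (5.0, 4.8),
            (5.2, 5.1), (4.8, 4.9), (5.3, 4.7)]

# Generate enough synthetic points to equalise the two class sizes.
synthetic = confidence_smote(minority, majority,
                             n_new=len(majority) - len(minority))
print(synthetic)
```

In this toy example the minority point at (5.0, 5.0) is surrounded by majority points, receives a confidence of 0, and is therefore never used as an interpolation seed, so the synthetic points stay inside the trusted minority cluster. Plain SMOTE would instead treat every minority point as a seed.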

Published
2024-12-27
How to Cite
Driouich, A., EL OUISSARI, A., EL MOUTAOUAKIL, K., & Akharraz, I. (2024). A Metaheuristic for Fuzzy Density Based SVM and Confidence SMOTE for Early Prediction of Diabetes. Statistics, Optimization & Information Computing, 13(4), 1595-1609. https://doi.org/10.19139/soic-2310-5070-1348
Section
Research Articles