A Metaheuristic for Fuzzy Density Based SVM and Confidence SMOTE for Early Prediction of Diabetes
SDB-SVM
Keywords:
DB-Support vector machine, Class imbalance, Classification, Diabetes, Machine learning
Abstract
Diabetes is a chronic disease that affects millions of people worldwide. In this work, we propose a confidence-based variant of the density-based support vector machine (DBSVM) for the early detection of diabetes. The proposed method, called SMOTE Density Based Support Vector Machine (SDB-SVM), is designed for imbalanced datasets. First, we clean the diabetes datasets using DBSVM, which is well suited to detecting harmful samples. Then, we apply SMOTE to balance the datasets based on the confidence of each synthetic point. DBSVM thus allows SMOTE to produce synthetic data that is plausible for the minority class. We test the proposed system on several imbalanced diabetes datasets, including the PIMA and Germany datasets, and compare our method with well-known classifiers. The experimental results show the superiority and efficiency of the proposed algorithm.
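The oversampling step summarized in the abstract (generate SMOTE points, keep only the confident ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the confidence measure, so the `confidence` function below is a hypothetical stand-in (the share of minority samples among the k nearest neighbours of a synthetic point), and the names `confident_smote`, `threshold`, and `max_tries` are assumptions made for illustration. The DBSVM cleaning step is not shown, and the sketch assumes at least two distinct minority samples.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return the k candidates closest to point (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote_point(sample, neighbours, rng):
    """Classic SMOTE step: interpolate between a sample and a random neighbour."""
    other = rng.choice(neighbours)
    gap = rng.random()  # position along the segment, in [0, 1)
    return tuple(s + gap * (o - s) for s, o in zip(sample, other))


def confidence(point, minority, majority, k):
    """Hypothetical confidence score (assumption, not the DBSVM-based one):
    share of minority samples among the k nearest neighbours of point."""
    labelled = [(m, 1) for m in minority] + [(m, 0) for m in majority]
    nearest = sorted(labelled, key=lambda item: math.dist(point, item[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def confident_smote(minority, majority, k=3, threshold=0.5, seed=0,
                    max_tries=10_000):
    """Oversample the minority class up to the majority class size,
    keeping only synthetic points whose confidence reaches threshold."""
    rng = random.Random(seed)
    synthetic = []
    needed = len(majority) - len(minority)
    tries = 0
    while len(synthetic) < needed and tries < max_tries:
        tries += 1
        sample = rng.choice(minority)
        others = [m for m in minority if m != sample]
        neighbours = k_nearest(sample, others, k)
        candidate = smote_point(sample, neighbours, rng)
        if confidence(candidate, minority, majority, k) >= threshold:
            synthetic.append(candidate)
    return synthetic


# Toy two-feature data (made up for illustration, not from PIMA).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (3.1, 3.3),
            (2.9, 2.7), (3.3, 3.1), (2.7, 2.9), (3.0, 3.4)]

new_points = confident_smote(minority, majority)
print(len(minority) + len(new_points), len(majority))
```

Rejecting low-confidence candidates instead of keeping every interpolated point is what separates this from plain SMOTE: a synthetic point that lands among majority samples is discarded rather than added as a likely mislabelled minority example.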
Published
2024-12-27
How to Cite
Driouich, A., EL OUISSARI, A., EL MOUTAOUAKIL, K., & Akharraz, I. (2024). A Metaheuristic for Fuzzy Density Based SVM and Confidence SMOTE for Early Prediction of Diabetes. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-1348
Issue
Section
Research Articles
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).