Ensemble Learning and K-means Models for Lung and Colon Cancer Classification
Keywords:
Histopathological Image, Lung and Colon Cancer, Deep Learning, Ensemble Learning, K-means Clustering, Gamma Correction
Abstract
According to World Health Organization (WHO) statistics, cancer remains one of the leading causes of death worldwide. Lung cancer causes the most cancer-related deaths, with approximately 1.8 million fatalities (18.7%), followed by colorectal cancer, which is responsible for around 900,000 deaths (9.3%). These death rates are increasing in developing countries, where human, material, and technological resources for early detection are sometimes limited. In Morocco, for example, WHO statistics put the mortality rate for lung cancer at 21.6%. Examining histopathological images is one of the most effective ways of confirming or ruling out these cancers. Traditionally, pathologists perform this analysis manually, which makes the process time-consuming and leaves the outcome largely dependent on the pathologist's expertise. Automating the process can enable early detection and considerably increase the chances of cure. Building on the remarkable results achieved by machine learning techniques, many research projects have applied these advances to automate cancer detection and improve its accuracy. Despite progress in deep learning-based classification, achieving consistently high accuracy remains a challenge. In this paper, we propose a new approach that uses the K-means model and a gamma correction function to preprocess histopathological images from the LC25000 dataset, and then applies transfer learning and ensemble learning to enhance classification performance. We combine two models based on the pre-trained VGG16 and DenseNet networks. This approach achieves an accuracy of 99.96%, which illustrates the value of combining unsupervised models, transfer learning, and ensemble learning to improve the accuracy of histopathological image classification.
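The pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the gamma value, the number of clusters, how the K-means output is used, or how the two networks are combined. The sketch therefore assumes power-law gamma correction, K-means applied as colour quantization of the pixels, and soft voting (averaging class probabilities) as the ensemble rule. The function names `gamma_correct`, `kmeans_quantize`, and `soft_vote` are hypothetical, and the probabilities in the usage example are stand-ins for the outputs of the two fine-tuned networks rather than real predictions.

```python
import numpy as np


def gamma_correct(image, gamma):
    """Apply power-law (gamma) correction to a uint8 image.

    gamma < 1 brightens the image, gamma > 1 darkens it (assumed convention).
    """
    normalized = image.astype(np.float64) / 255.0
    corrected = np.power(normalized, gamma)
    return np.round(corrected * 255.0).astype(np.uint8)


def kmeans_quantize(image, k=3, n_iter=10, seed=0):
    """Cluster pixel colours with Lloyd's K-means algorithm.

    Returns a per-pixel label map and the image rebuilt from cluster centres.
    Using K-means as colour quantization is an assumption of this sketch.
    """
    pixels = image.reshape(-1, image.shape[-1]).astype(np.float64)
    rng = np.random.default_rng(seed)
    # Initialise the centres from k randomly chosen pixels.
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every pixel to every centre.
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # leave an empty cluster's centre unchanged
                centers[j] = members.mean(axis=0)
    quantized = centers[labels].round().astype(np.uint8).reshape(image.shape)
    return labels.reshape(image.shape[:2]), quantized


def soft_vote(prob_a, prob_b):
    """Average two models' class probabilities and return the argmax class."""
    mean_prob = (np.asarray(prob_a) + np.asarray(prob_b)) / 2.0
    return mean_prob.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Random stand-in for a histopathological image tile.
    tile = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
    enhanced = gamma_correct(tile, gamma=0.8)        # illustrative gamma
    label_map, quantized = kmeans_quantize(enhanced, k=3)

    # Stand-in probabilities for two classifiers over 5 hypothetical classes.
    probs_vgg = rng.dirichlet(np.ones(5), size=4)
    probs_dense = rng.dirichlet(np.ones(5), size=4)
    print(soft_vote(probs_vgg, probs_dense))
```

In practice the two probability arrays would come from the fine-tuned VGG16-based and DenseNet-based models evaluated on the preprocessed images. Averaging is only one possible combination rule; weighted voting or stacking are alternatives that this sketch does not cover.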
Published
2025-08-24
How to Cite
Oubaalla, A., El Moubtahij, H., & EL AKKAD, N. (2025). Ensemble Learning and K-means Models for Lung and Colon Cancer Classification. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2833
Section
Research Articles