Lung Cancer Segmentation and Classification with Multi-Dataset Integration

Hozan Abdulqader; Adnan Abdulazeez

doi:10.19139/soic-2310-5070-2814

Hozan Abdulqader, Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq; Akre University for Applied Science, Technical College of Informatics, Akre, Department of Information Technology
Adnan Abdulazeez, Duhok Polytechnic University, Kurdistan Region, Iraq, https://orcid.org/0000-0002-4357-7331

Keywords: Lung Cancer, Segmentation, Classification, Deep Learning, Multi Dataset Integration

Abstract

Accurate computer-aided lung cancer diagnosis relies on two sequential tasks: precise nodule segmentation and reliable malignancy classification. To this end, we curated the largest open-source CT benchmark to date by unifying five public repositories, resulting in 7,061 annotated slices from 571 patients for segmentation and 17,351 slices from 1,208 patients for classification. A standardized pre-processing pipeline was developed to harmonize voxel spacing, intensity windows, and label conventions. For segmentation, six encoder–decoder architectures were evaluated, with the hybrid UNet++ achieving the highest validation performance (Dice coefficient = 98.5%), demonstrating that attention-augmented dense skip pathways enable more accurate boundary detection of lung nodules. These masks were then used to drive a two-phase classification strategy: models were initially trained using ground-truth masks, then fine-tuned on predicted masks to emulate real-world deployment scenarios. Our proposed NoduleHyperFusionNet, a dual-stream EfficientNetV2-S architecture, achieved the best overall discrimination (accuracy = 92%, F1-score = 89%, AUC = 91%). The EfficientNet-B3 model also performed strongly, reaching an AUC of 94%. Overall, this study demonstrates that combining attention-enhanced segmentation with lightweight multichannel fusion architectures can significantly improve automated lung cancer workflows, reducing diagnostic error rates without incurring prohibitive computational costs.
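
For readers unfamiliar with the segmentation metric and the intensity-window step mentioned above, the following is a minimal illustrative sketch, not the authors' implementation. The function names and the window settings (center -600 HU, width 1500 HU, a commonly used lung window) are assumptions chosen for illustration; the abstract does not state the values used in the study.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |A and B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a
        # convention assumed here, not something the abstract specifies.
        return 1.0
    return 2.0 * intersection / total


def window_hu(value: float, center: float = -600.0, width: float = 1500.0) -> float:
    """Clip a Hounsfield-unit value to a window and rescale it to [0, 1].

    The default center and width are assumed lung-window values.
    """
    lo = center - width / 2.0
    hi = center + width / 2.0
    clipped = min(max(value, lo), hi)
    return (clipped - lo) / (hi - lo)


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(dice_coefficient(predicted, reference))
    print(window_hu(-600.0))
```

A Dice value of 1.0 means the predicted and reference masks overlap exactly, so the reported 98.5% corresponds to near-complete overlap on the validation set.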

Published

2025-09-28

How to Cite

Abdulqader, H., & Abdulazeez, A. (2025). Lung Cancer Segmentation and Classification with Multi-Dataset Integration. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2814

Issue

Online First

Section

Research Articles

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).