Abstract
Non-small cell lung cancer (NSCLC) constitutes about 85% of all lung cancers and is a leading cause of cancer-related deaths globally. Within the spectrum of lung cancer, Solitary Pulmonary Nodules (SPNs) have become a focal point of research due to their significant implications for mortality. The malignancy of SPNs is usually estimated by medical experts who consider multiple imaging methods, namely Computed Tomography (CT) and Positron Emission Tomography (PET). Machine Learning may simplify this time-consuming procedure and help flag potential human errors. This study presents an efficient methodology for classifying SPN malignancy. It addresses critical limitations of existing SPN classification approaches by emphasizing the synergistic use of PET and CT image features, clinical data, and validation against biopsy-confirmed data. Patient data recorded with a hybrid PET/CT scanner at the University Hospital of Patras, Greece, were examined. Human readers annotated 360 SPNs, which were used to train and internally validate the proposed model. Ninety-six SPNs with confirmed histopathological results were used as an external test set. The classification methodology relied on an XGBoost model that uses manually extracted SPN features from both imaging modalities. Feature selection was performed to reduce the dimensionality of the data and identify the most important predictors. The proposed method exhibited 97% agreement with the human readers on the training and validation set. On the external set, its accuracy was 86% (81% sensitivity, 100% specificity). The SUVmax predictor exhibited lower scores (92% agreement on the training and validation set, 85% accuracy on the external set). The model was superior to the advised SUVmax threshold of 2.5 (85% accuracy, 80% sensitivity, 100% specificity).
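
To make the reported metrics concrete, the following minimal Python sketch shows how a fixed SUVmax threshold can act as a baseline classifier and how accuracy, sensitivity, and specificity follow from its predictions. It is an illustration only, not the study's implementation. The function names `suvmax_baseline` and `binary_metrics` are hypothetical, the use of a strict "greater than" comparison at the 2.5 cut-off is an assumption, and the SUVmax values and labels are made up for demonstration rather than taken from the study data.

```python
def suvmax_baseline(suvmax_values, threshold=2.5):
    """Label a nodule malignant (1) if its SUVmax exceeds the threshold, else benign (0).

    Assumption: a strict '>' comparison; the source does not state how ties are handled.
    """
    return [1 if value > threshold else 0 for value in suvmax_values]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels (1 = malignant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malignant, predicted malignant
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, predicted benign
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, predicted malignant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malignant, predicted benign
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Made-up example values, NOT data from the study.
suv = [1.2, 3.4, 2.6, 0.9, 5.1, 2.1]
truth = [0, 1, 1, 0, 1, 1]  # hypothetical biopsy results (1 = malignant)

pred = suvmax_baseline(suv)
metrics = binary_metrics(truth, pred)
print(metrics)
```

In this toy example the threshold misses one malignant nodule with a low SUVmax, which lowers sensitivity while specificity stays at its maximum. The same three metrics can be computed for any classifier's predictions, which is what allows a single-feature threshold and a multi-feature model to be compared on equal terms.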