ZHONGHUA YANGSHENG BAOJIAN ›› 2024, Vol. 42 ›› Issue (19): 173-177.


Effects of Different Classification Models on the Clinical Effectiveness of Artificial Intelligence Lung Nodule Screening Systems

LIANG Yu, LI Jun-Lin*   

  1. Department of Imaging Medicine, Inner Mongolia Autonomous Region People's Hospital, Hohhot, Inner Mongolia 010017, China
  • Online: 2024-10-01 Published: 2024-09-25

Abstract: Objective To evaluate how different classification models affect the performance of an artificial intelligence (AI) lung nodule screening system, and to select the model best suited to clinical use.

Methods A retrospective study was conducted on 117 non-contrast chest CT scans. Two experts, each with more than 15 years of experience in reading chest CT images, established the gold standard for the study and annotated a total of 563 lung nodules. The AI lung nodule screening system was then run with each of three classification models, and the number of nodules detected under each configuration was recorded. Model 1 was an independently developed algorithm based on a deep neural network, Model 2 was an optimization of Model 1 intended to reduce false positive nodules, and Model 3 was an optimization of Model 2 intended to improve sensitivity for ground-glass nodules. By comparison with the gold standard, the numbers of true positive (TP), false positive (FP), and false negative (FN) nodules were determined for each model, and the corresponding sensitivity, FP/TP value, false positive rate (false positive nodules per CT scan), precision, recall, and F1 score were calculated and compared. The chi-square test was used for statistical analysis of ratio indicators, and P<0.05 was considered statistically significant.

Results AI Model 1 detected 1 490 nodules, including 505 TP and 985 FP, with 58 FN. The detection sensitivity was 89.70%, the false positive rate was 8.42 FP/CT, the FP/TP value was 1.95, the precision was 33.89%, and the recall and F1 score were 89.70% and 49.20%, respectively. With AI Model 2, the total number of detected nodules fell to 1 285, including 500 TP and 785 FP, with 63 FN. The detection sensitivity was 88.81%, the false positive rate was 6.71 FP/CT, the FP/TP value was 1.57, the precision was 40.00%, and the recall and F1 score were 88.81% and 55.16%, respectively. AI Model 3 detected 1 240 nodules, including 493 TP and 747 FP, with 70 FN. The detection sensitivity was 87.57%, the false positive rate was 6.38 FP/CT, the FP/TP value was 1.52, the precision was 39.75%, and the recall and F1 score were 87.57% and 55.68%, respectively. All three AI models showed high sensitivity for lung nodule detection, but Models 2 and 3 had lower false positive rates and higher precision than Model 1. Model 2 produced fewer false negatives (missed nodules) than Model 3 while still maintaining a low false positive rate.

Conclusion Across the detection indicators compared, Model 2 showed the best overall performance and was the most suitable choice for clinical lung nodule screening.
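The indicators in the abstract all follow from the TP, FP, and FN counts and the number of scans, so they can be recomputed as a cross-check. The sketch below is illustrative only: the names `Counts`, `detection_metrics`, and `chi_square_2x2` are hypothetical, the counts are copied from the abstract, and recomputed values may not match every printed percentage exactly. The abstract does not say which contingency tables were tested, so the chi-square comparison of Model 1 and Model 3 sensitivity (a Pearson test without continuity correction) is an assumption.

```python
from dataclasses import dataclass
from math import erfc, sqrt

N_SCANS = 117  # number of CT scans reported in the abstract


@dataclass(frozen=True)
class Counts:
    tp: int  # true positive nodules
    fp: int  # false positive nodules
    fn: int  # false negative (missed) nodules


def detection_metrics(c: Counts, n_scans: int = N_SCANS) -> dict:
    """Derive the abstract's indicators from raw counts (values as fractions)."""
    recall = c.tp / (c.tp + c.fn)        # identical to sensitivity
    precision = c.tp / (c.tp + c.fp)
    return {
        "sensitivity": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "fp_per_scan": c.fp / n_scans,   # "false positive rate" in FP/CT
        "fp_per_tp": c.fp / c.tp,        # FP/TP value
    }


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square for the table [[a, b], [c, d]]; returns (statistic, p)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


# Counts as reported in the abstract.
MODELS = {
    "Model 1": Counts(tp=505, fp=985, fn=58),
    "Model 2": Counts(tp=500, fp=785, fn=63),
    "Model 3": Counts(tp=493, fp=747, fn=70),
}

for name, counts in MODELS.items():
    m = detection_metrics(counts)
    print(f"{name}: sensitivity={m['sensitivity']:.2%} precision={m['precision']:.2%} "
          f"F1={m['f1']:.2%} FP/CT={m['fp_per_scan']:.2f} FP/TP={m['fp_per_tp']:.2f}")

# Assumed example: detected vs missed nodules, Model 1 against Model 3.
m1, m3 = MODELS["Model 1"], MODELS["Model 3"]
stat, p = chi_square_2x2(m1.tp, m1.fn, m3.tp, m3.fn)
print(f"Model 1 vs Model 3 sensitivity: chi2={stat:.3f}, p={p:.3f}")
```

Because every model is scored against the same 563 annotated nodules, TP + FN equals 563 in each row, which is a quick sanity check on the counts before any ratio is computed.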

Key words: tomography, artificial intelligence, pulmonary nodule, deep learning
