
Zhonghua Yangsheng Baojian (中华养生保健), 2024, Vol. 42, Issue 19: 173-177.

• Experience Exchange •

Effects of Different Classification Models on the Clinical Effectiveness of Artificial Intelligence Lung Nodule Screening Systems

LIANG Yu, LI Jun-Lin*

  1. Department of Imaging Medicine, Inner Mongolia Autonomous Region People's Hospital, Hohhot, Inner Mongolia 010017, China
  • Publication date: 2024-10-01  Release date: 2024-09-25
  • Corresponding author: *LI Jun-Lin, E-mail: grefor@163.com
  • About the author: LIANG Yu (born 1988), male, Han ethnicity, native of Hohhot, Inner Mongolia Autonomous Region; master's degree; attending physician; research interests: CT and MR diagnosis and research.

Abstract: Objective To explore how different classification models in an artificial intelligence (AI) lung nodule screening system affect its performance, and to select the classification model best suited for clinical use.

Methods This retrospective study included 117 patients who underwent plain chest CT at Inner Mongolia Autonomous Region People's Hospital between December 2018 and April 2019. Two experts, each with more than 15 years of experience in reading chest CT, established the gold standard for the study and annotated a total of 563 lung nodules. The AI lung nodule screening system was then run with each of three classification models, and the nodules detected by each configuration were recorded. Model 1 was an independently developed algorithm based on a deep neural network. Model 2 was Model 1 optimized to reduce false-positive nodules. Model 3 was Model 2 further optimized to increase sensitivity to ground-glass nodules. For each model, the numbers of true-positive (TP), false-positive (FP), and false-negative (FN) nodules were counted against the gold standard. Sensitivity, the FP/TP ratio, the false-positive rate (false-positive nodules per CT scan, FP/CT), precision, recall, and the F1 score (harmonic mean of precision and recall) were then calculated and compared. Sensitivity and recall are the same quantity here, TP/(TP+FN); precision is TP/(TP+FP). The chi-square test was used to compare proportions, with P<0.05 considered statistically significant.

Results Model 1 detected 1 490 nodules, comprising 505 TP and 985 FP, with 58 FN. Its sensitivity (recall) was 89.70%, the false-positive rate 8.42 FP/CT, the FP/TP ratio 1.95, precision 33.89%, and the F1 score 49.20%. Model 2 detected fewer nodules, 1 285 in total, comprising 500 TP and 785 FP, with 63 FN. Its sensitivity (recall) was 88.81%, the false-positive rate 6.71 FP/CT, the FP/TP ratio 1.57, precision 38.91%, and the F1 score 54.11%. Model 3 detected 1 240 nodules, comprising 493 TP and 747 FP, with 70 FN. Its sensitivity (recall) was 87.57%, the false-positive rate 6.38 FP/CT, the FP/TP ratio 1.52, precision 39.76%, and the F1 score 54.69%. All three models showed high sensitivity for lung nodule detection, but Models 2 and 3 had lower false-positive rates and higher precision than Model 1. Model 2 missed slightly fewer nodules than Model 3 (63 vs 70 FN) while still keeping a low false-positive rate.

Conclusion Across the detection metrics compared, Model 2 showed the best overall performance and is the best choice for clinical lung nodule screening.

Key words: tomography, artificial intelligence, pulmonary nodule, deep learning
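Every metric in the abstract follows from three nodule counts per model plus the number of scans. The sketch below recomputes those metrics from the reported TP, FP and FN counts. It assumes the standard definitions:

- sensitivity = recall = TP/(TP+FN)
- precision = TP/(TP+FP)
- F1 = 2TP/(2TP+FP+FN), which is algebraically the harmonic mean of precision and recall
- FP/TP is a plain ratio
- FP/CT is false positives divided by the 117 scans

The names `DetectionCounts` and `screening_metrics` are illustrative only. They are not part of the AI system described.

```python
from dataclasses import dataclass

N_SCANS = 117  # number of chest CT scans in the study


@dataclass(frozen=True)
class DetectionCounts:
    """Nodule-level counts for one model, scored against the gold standard (illustrative type)."""
    tp: int  # true-positive nodules
    fp: int  # false-positive nodules
    fn: int  # false-negative (missed) nodules


def screening_metrics(c: DetectionCounts, n_scans: int = N_SCANS) -> dict[str, float]:
    """Return the abstract's metrics; percentages are returned as fractions."""
    recall = c.tp / (c.tp + c.fn)              # sensitivity and recall coincide
    precision = c.tp / (c.tp + c.fp)           # share of AI detections that are real nodules
    f1 = 2 * c.tp / (2 * c.tp + c.fp + c.fn)   # harmonic mean of precision and recall
    return {
        "detected": c.tp + c.fp,               # all nodules flagged by the model
        "sensitivity": recall,
        "precision": precision,
        "f1": f1,
        "fp_per_tp": c.fp / c.tp,
        "fp_per_scan": c.fp / n_scans,         # the "false-positive rate" used in the study
    }


# Counts as reported in the abstract.
MODELS = {
    "Model 1": DetectionCounts(tp=505, fp=985, fn=58),
    "Model 2": DetectionCounts(tp=500, fp=785, fn=63),
    "Model 3": DetectionCounts(tp=493, fp=747, fn=70),
}

if __name__ == "__main__":
    for name, counts in MODELS.items():
        m = screening_metrics(counts)
        print(f"{name}: detected={m['detected']}, sensitivity={m['sensitivity']:.2%}, "
              f"precision={m['precision']:.2%}, F1={m['f1']:.2%}, "
              f"FP/TP={m['fp_per_tp']:.2f}, FP/CT={m['fp_per_scan']:.2f}")
```

F1 is computed directly from the counts rather than from already-rounded precision and recall. This avoids carrying rounding errors into the final percentage.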

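The abstract says proportions were compared with the chi-square test but does not give the tables used. The sketch below shows one way such a comparison could be set up. It assumes precision is compared between two models by placing their TP and FP detections in a 2×2 table, with each detection treated as an independent observation. It computes Pearson's chi-square statistic without continuity correction, with the one-degree-of-freedom P value taken from the standard library's `math.erfc`. The helper `chi_square_2x2` is hypothetical, not the authors' documented analysis. Because all three models read the same 117 scans, the independence assumption is a simplification.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no Yates correction) for the 2x2 table [[a, b], [c, d]].

    Returns (chi-square statistic, upper-tail P value with 1 degree of freedom).
    Hypothetical helper for illustration.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, chi2 = Z**2, so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed comparison of precision: rows are models, columns are (TP, FP) detections.
chi2, p = chi_square_2x2(505, 985,   # Model 1
                         500, 785)   # Model 2
print(f"chi2 = {chi2:.2f}, P = {p:.4f}, significant at 0.05: {p < 0.05}")
```

The same helper can compare sensitivity by using each model's TP and FN counts as the two columns.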