AUTHORS
Eun Kyung Park, SooYoung Kwak, Weonsuk Lee, Joon Suk Choi, Thijs Kooi, Eun-Kyung Kim
From Lunit Inc, 374 Gangnam-daero, Gangnam-gu, Seoul 06241, Republic of Korea (E.K.P., S.Y.K., W.L., J.S.C., T.K.); and Department of Radiology, Yongin Severance Hospital, College of Medicine, Yonsei University, Yongin, Republic of Korea (E.K.K.).
Abstract
“Just Accepted” papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.
Purpose
To develop an artificial intelligence (AI) model for the diagnosis of breast cancer in digital breast tomosynthesis (DBT) and to investigate whether it could improve the diagnostic accuracy and reduce the reading time of radiologists.
Materials and Methods
A deep learning AI algorithm was developed and validated for DBT with retrospectively collected examinations (January 2010 to December 2021) from 14 institutions in the United States and South Korea. A multicenter reader study was performed to compare the performance of 15 radiologists (7 breast specialists, 8 general radiologists) in interpreting DBT examinations from 258 women (mean age, 56 years ± 13.41 [SD]), including 65 cancer cases, with and without the use of AI. Area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and reading time were evaluated.
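As a rough illustration of the accuracy metrics named above, the following Python sketch computes sensitivity, specificity, and an empirical AUC for a single reader. It is not the study's analysis code: the function names, the 0.5 recall threshold, and the toy data are assumptions made only for this example, and it does not reproduce the statistical analysis across multiple readers that a reader study requires.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], calls: Sequence[int]) -> Tuple[float, float]:
    """labels: 1 = cancer, 0 = no cancer; calls: 1 = recalled, 0 = not recalled."""
    pairs = list(zip(labels, calls))
    tp = sum(1 for y, c in pairs if y == 1 and c == 1)  # cancers recalled
    fn = sum(1 for y, c in pairs if y == 1 and c == 0)  # cancers missed
    tn = sum(1 for y, c in pairs if y == 0 and c == 0)  # non-cancers not recalled
    fp = sum(1 for y, c in pairs if y == 0 and c == 1)  # non-cancers recalled
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a cancer case scores above a non-cancer case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.8]
calls = [1 if s >= 0.5 else 0 for s in scores]  # assumed recall threshold of 0.5

sens, spec = sensitivity_specificity(labels, calls)
auc_value = empirical_auc(labels, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc_value:.3f}")
```

Sensitivity and specificity depend on the chosen recall threshold, whereas the AUC summarizes ranking performance across all thresholds, which is why a study can report a change in one without a change in the other.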
Results
The AUC for standalone AI performance was 0.93 (95% CI: 0.92, 0.94). With AI, radiologists’ AUC improved from 0.90 (0.86, 0.93) to 0.92 (0.88, 0.96; P = .003) in the reader study. AI showed higher specificity (89.64% [85.34, 93.94]) than radiologists (77.34% [75.82, 78.87]; P < .001). When reading with AI, radiologists’ sensitivity increased from 85.44% (83.22, 87.65) to 87.69% (85.63, 89.75; P = .04), with no evidence of a difference in specificity. Reading time decreased from 54.41 seconds (52.56, 56.27) without AI to 48.52 seconds (46.79, 50.25) with AI (P < .001). Interreader agreement measured by Fleiss kappa increased from 0.59 without AI to 0.62 with AI.
Conclusion
The AI model showed better diagnostic accuracy than radiologists in breast cancer detection, and its use reduced radiologists’ reading times. The concurrent use of AI in DBT interpretation could improve both accuracy and efficiency.