Women with dense breasts face a double risk for breast cancer; they are at a higher risk for development of breast cancer than those with less dense breasts, and there is a greater chance that mammography will miss detection of a cancer in dense breasts due to the masking effect of surrounding fibroglandular tissue. These women may be candidates for supplemental screening. In this study, a masking risk model that was previously developed is tested on a cohort of cancer-free women to assess potential efficiency of stratification.
Three masking risk models based on (1) BI-RADS density, (2) volumetric breast density (VBD), and (3) a combination of VBD and detectability were applied to stratify the mammograms of 1897 cancer-free women. The fraction of cancer-free women whose mammograms were deemed by the algorithm to be masked and who would be considered for supplemental imaging was computed as was the corresponding fraction in a screened population of interval (masked) cancers that would be potentially detected by supplemental imaging.
Of the models tested, the combined VBD/detectability model offered the highest efficiency for stratification to supplemental imaging. It predicted that 725 supplemental screens would be performed per interval cancer potentially detected, at an operating point that allowed detection of 64% of the interval cancers. In comparison, stratification based on the upper two BI-RADS density categories required 1117 supplemental screenings per interval cancer detected to capture 64% of interval cancers.
The combined VBD/detectability models perform better than BI-RADS and offer a continuum of operating points, suggesting that this model may be effective in guiding a stratified screening environment.
High mammographic density is associated with increased risk of developing breast cancer [1,2,3,4,5] and also reduces the diagnostic accuracy of mammography due to masking [6,7,8]. Compared to women with fatty breasts, women with dense breasts are at least 3.5 times more likely to have an interval cancer, i.e., one diagnosed less than 1 year after a negative screening mammogram. Thirty-six states in the USA have now enacted Density Notification Laws, most requiring that women be informed when they have dense breasts. Many include language stating that high breast density may affect the diagnostic value of their mammogram and that supplemental screening exams could be considered. A method of identifying those women for whom mammography will not provide adequate detection of breast cancer could guide them toward an alternative or supplementary method that would yield better performance.
A stratification method must be both effective (identifying women for whom the risk of missed detection of breast cancer is high due to masking) and efficient (referring as few women as possible who do not have breast cancer for supplementary screening). Breast density is typically assessed using the Breast Imaging Reporting and Data System (BI-RADS) scale [10, 11], in which mammograms are subjectively categorized into four groups by a radiologist. Women in the two upper categories, i.e., with heterogeneously dense or extremely dense mammograms, are considered to have dense breasts, which corresponds to approximately 40% and 8% of mammograms, respectively. Thus, approximately half of screened women are potentially eligible for supplemental screening. Kerlikowske et al. have argued that, due to this high prevalence, efforts should be focused on women at highest risk for an interval cancer rather than stratifying on BI-RADS density category alone.
There are two types of risk associated with elevated density, the underlying breast cancer risk and the risk of masking. Here, we attempt to isolate the masking risk and create a model that can be used to identify women for whom mammography screening will likely be compromised due to masking. The two types of risk can then be evaluated and used separately in the optimization of strategies for screening.
We have reported previously on the development of a “masking risk” model, derived from biometric and image-based parameters, that can discriminate between the mammograms associated with screen-detected (SD) cancers and those associated with non-screen-detected (NSD) cancers, i.e., those which were found by other means less than 13 months after a negative screening examination. Most women screened do not have breast cancer, and the feasibility of stratified screening would depend on identifying as few of these women as possible for supplemental screening. In this investigation, the model is applied to mammograms from cancer-free women to evaluate the effect of decision thresholds on the efficiency of stratifying women at greatest risk for interval cancer to supplemental screening.
The mammograms used in this study came from an earlier study where breast density was incorporated in a risk stratification model for breast cancer. This study had institutional review board approval. All patients gave informed consent for participation in the original study. The need for additional consent for this study was waived, as only existing de-identified data were used in this analysis. Both studies were compliant with the Health Insurance Portability and Accountability Act.
In the original study, all women diagnosed with cancer at a single US institution between 2003 and 2013 and with a digital contralateral mammogram at the time of diagnosis were eligible as cases. These were matched to cancer-free controls, defined here as having had two consecutive negative screening mammograms (i.e., both the index mammogram and the next screening mammogram were negative). Women in the study were participants in an annual screening program. Women who agreed to participate were asked to complete a questionnaire which included age and body mass index (BMI) information. BI-RADS density category (fourth edition) was obtained from the mammography report. Volumetric breast measures including total breast volume and percent breast density by volume (VBD) were measured using automated commercial software (Volpara 1.5.0, Volpara Solutions, Wellington, NZ).
The calculations were performed on the index unprocessed, i.e., “DICOM For Processing” mammograms, which included at least one standard view (craniocaudal or mediolateral oblique). Tiled views of large breasts were excluded. Arbitrarily, the measurements were performed on the left breast by default, except when the left views were not available. When multiple views were obtained for one projection, the values were averaged. Initially, there were images from 2047 cancer-free women: 4 were excluded because of tiled views; 31 and 11 were excluded due to failure of the Volpara calculation or our detectability algorithm (described below), respectively; 104 were excluded due to missing BI-RADS data, resulting in the images of 1897 women available for analysis.
The development of the masking risk algorithm was previously described in detail, but is summarized here. (1) In a case-case analysis of 70 SD cases and 44 NSD cases, the NSD or SD status was used as a surrogate indicator that masking has occurred or not; (2) BI-RADS density category and Volpara VBD and breast volume were estimated on the mammograms; (3) “Maps” were calculated for each mammogram, showing the spatial distribution of VBD (in this case, calculated using an in-house algorithm) and local detectability as described below; (4) Statistical and texture metrics were derived from the VBD and detectability maps; (5) Stepwise multivariate logistic regressions were performed to determine which of the metrics yielded the best classification performance between NSD and SD cases. Cases in the categories were not matched for age; however, age was a covariate in the multivariate modeling. Each regression produced a predictor of masking risk: a variable-threshold classifier that rates the likelihood of a mammography exam as being masked (i.e., being an NSD case) or non-masked (an SD case).
In the modeling, the assumption was made that the NSD cancer cases correspond to interval cancers in their broad definition, which includes masked (i.e., missed or false negative) cancers and “new” cancers that are found between regular annual screens. To allow for the variability of the actual interval at which women present for their screening examinations, the window for inclusion of interval cancers was extended to 13 months. Because these cancers were found by other means less than 13 months after a negative mammogram and the mean sojourn time of breast cancer is 2.0 years (5th percentile = 150 days) [15, 16], it is estimated that nearly all of these are in the false negative category.
To predict how many women without breast cancer would be recommended for supplementary screening with each of the masking prediction models, it is necessary to run the algorithms on sets of normal cases. The three models under consideration for stratification, “BI-RADS Density,” “Adjusted Volpara,” and “DETECT+,” are summarized in Table 1. The covariates ultimately selected for each model are listed in the first column. DETECT+ is an in-house algorithm that was found previously to be the best performing model describing masking probability. Note that all of the models incorporate some measure of breast density. DETECT+ employs a specially designed volumetric density algorithm described by Mainprize et al.
Our approach to stratification is to try to maximize the number of women for whom there is the potential for cancer masking who will be identified for supplemental screening while minimizing the number whose cancers (if present) are expected to be detectable on mammography. These two competing factors are computed to determine the efficiency of a simulated stratified screening environment.
The masking risk is computed on each of the images using the models described above. Each model is tested over a range of candidate thresholds of masking risk. The threshold (or operating point) distinguishes between images that are deemed to be “masked” or “non-masked.” Women identified as “masked” are the ones who would be considered as potential candidates for supplemental imaging. For those women, two fractions are determined: (1) The recruitment fraction (RF), which is the fraction of cancer-free women whose mammograms would be rated above the masking threshold, and (2) the capture fraction (CF), the fraction of women from NSD cases whose mammograms are above the same masking threshold. We examine the relationship between RF and CF at different settings of the threshold. In this analysis, the cancer-free women are used as a proxy for a screening population.
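The two fractions defined above can be illustrated with a minimal sketch. The function names and the score values are hypothetical and are not taken from the study; the only assumption carried over from the text is that an exam counts as “masked” when its masking risk is rated above the threshold.

```python
from typing import Sequence, Tuple


def fraction_above(scores: Sequence[float], threshold: float) -> float:
    """Fraction of exams whose masking risk is rated above the threshold."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(s > threshold for s in scores) / len(scores)


def recruitment_and_capture(
    cancer_free_scores: Sequence[float],
    nsd_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (RF, CF) at one operating point.

    RF: fraction of cancer-free exams above the threshold.
    CF: fraction of NSD (interval cancer) exams above the same threshold.
    """
    rf = fraction_above(cancer_free_scores, threshold)
    cf = fraction_above(nsd_scores, threshold)
    return rf, cf


# Hypothetical masking risk scores, for illustration only.
cancer_free = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.15, 0.25, 0.70, 0.08]
nsd = [0.40, 0.55, 0.65, 0.80, 0.20]

# Sweeping the threshold traces out the RF versus CF relationship.
for t in (0.25, 0.50):
    rf, cf = recruitment_and_capture(cancer_free, nsd, t)
    print(f"threshold={t:.2f}  RF={rf:.2f}  CF={cf:.2f}")
```

Raising the threshold lowers both fractions; an efficient model is one where RF falls faster than CF.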
Given an underlying NSD cancer rate (assumed to be equivalent to the interval cancer rate or ICR) in a population of N individuals, the maximum number of NSD cancers for potential detection by supplemental imaging is given by N × ICR × CF. The corresponding number of women considered for supplemental imaging is N × RF. The number of women considered for supplemental imaging per interval cancer potentially detected is thus RF/(ICR × CF). This ratio represents the “cost” of the supplemental screening program, in that as more women receive the additional screen (with the benefit of detecting missed cancers), more inconvenience to women and financial costs are imposed on the health care system. The most efficient (or lowest cost) supplemental screening program will have the lowest RF:CF ratio. See the Appendix for a detailed description of the calculation of masking risk thresholds, CF and RF for the models discussed here. To evaluate performance, the C-statistic or discrimination accuracy of the masking risk models was computed. This is the probability that a model will score a randomly selected NSD exam at a higher masking risk than a randomly selected cancer-free exam.
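As a minimal sketch of the two quantities described above, the cost ratio RF/(ICR × CF) and the C-statistic can be computed as follows. The function names and all input values are hypothetical, and counting tied scores as one half is an assumption (a common convention), not something stated in the text.

```python
from typing import Sequence


def screens_per_interval_cancer(rf: float, cf: float, icr: float) -> float:
    """Cost: women considered for supplemental imaging per interval cancer
    potentially detected, RF / (ICR * CF). The population size N cancels."""
    if cf <= 0 or icr <= 0:
        raise ValueError("cf and icr must be positive")
    return rf / (icr * cf)


def c_statistic(nsd_scores: Sequence[float],
                cancer_free_scores: Sequence[float]) -> float:
    """Probability that a randomly selected NSD exam is scored higher than a
    randomly selected cancer-free exam (ties counted as 0.5: an assumption)."""
    if not nsd_scores or not cancer_free_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in nsd_scores:
        for b in cancer_free_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(nsd_scores) * len(cancer_free_scores))


# Hypothetical operating point, for illustration only.
n = 100_000   # population size
icr = 0.0006  # interval cancer rate per screen
rf, cf = 0.30, 0.60

women_recalled = n * rf           # N x RF
cancers_captured = n * icr * cf   # N x ICR x CF
cost = screens_per_interval_cancer(rf, cf, icr)
print(f"{women_recalled:.0f} supplemental screens, "
      f"{cancers_captured:.0f} interval cancers, cost = {cost:.0f}")
```

The exhaustive pairwise loop is quadratic in the number of exams, which is acceptable for a cohort of a few thousand; a rank-based formulation would be preferable for much larger sets.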
For simulating a screening program, an ICR of 0.60 per 1000 screens was estimated using data from the Breast Cancer Surveillance Consortium (BCSC) as reported by Kerlikowske et al. Here, N was set to 100,000. Error estimates on the values of the cost function were obtained by bootstrapping, with 1000 bootstrap replicates, using the “bootci” function in Matlab 2016b (MathWorks Inc., Natick, MA).
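The bootstrap error estimate can be sketched in Python as follows. This is not the authors' Matlab code: it uses a simple percentile interval, resamples the cancer-free and NSD groups independently, and skips replicates in which CF is zero; all three choices are assumptions, since the text does not state the interval type or resampling scheme. The scores are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def fraction_above(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def bootstrap_cost_ci(
    cancer_free_scores: Sequence[float],
    nsd_scores: Sequence[float],
    threshold: float,
    icr: float,
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the cost RF / (ICR * CF)."""
    rng = random.Random(seed)
    costs: List[float] = []
    for _ in range(n_boot):
        # Resample each group with replacement (assumed scheme).
        free_sample = rng.choices(cancer_free_scores, k=len(cancer_free_scores))
        nsd_sample = rng.choices(nsd_scores, k=len(nsd_scores))
        cf = fraction_above(nsd_sample, threshold)
        if cf == 0:
            continue  # cost is undefined when no interval cancer is captured
        rf = fraction_above(free_sample, threshold)
        costs.append(rf / (icr * cf))
    if not costs:
        raise ValueError("no usable bootstrap replicates")
    costs.sort()
    lo_idx = int((alpha / 2) * (len(costs) - 1))
    hi_idx = int((1 - alpha / 2) * (len(costs) - 1))
    return costs[lo_idx], costs[hi_idx]


# Hypothetical masking risk scores, for illustration only.
cancer_free = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.15, 0.25, 0.70, 0.08]
nsd = [0.40, 0.55, 0.65, 0.80, 0.20]

ci = bootstrap_cost_ci(cancer_free, nsd, threshold=0.25, icr=0.0006)
print(f"95% bootstrap interval for cost: {ci[0]:.0f} to {ci[1]:.0f}")
```

With so few NSD exams the interval is wide, which mirrors the dependence of the cost on the small interval cancer rate in the denominator.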
Table 2 shows descriptive statistics (age, BMI, BI-RADS density category, and mammography vendor) for the NSD cases and cancer-free women. Women with interval cancers were generally younger, with lower BMI, higher BI-RADS density, and imaged on GE systems compared to the cancer-free women. The p value of the difference between the two groups was computed using a two-sample t test or chi-square test for the continuous or categorical data, respectively.
Table 3 shows CF vs. RF for the three stratification models.
Table 4 shows the corresponding number of interval cancers potentially detected vs. the cost function, expressed as the number of women recommended for supplemental screening per interval cancer potentially detected, assuming a prevalence of 60 interval cancers in 100,000 women screened. This shows that the DETECT+ model generally requires the fewest supplemental exams per interval cancer detected. For example, with a threshold set to identify 38 interval cancers (CF = 64%), supplemental screening would be performed for RF = 43%, 40%, and 28% of screening participants when supplemental screening is triggered on the basis of BI-RADS 3 or 4 density category, Adjusted Volpara, or DETECT+, respectively. This corresponds to 1117, 1051, and 725 supplemental screens per interval cancer detected, respectively. Figure 1 shows the cost function versus the number of interval cancers potentially found for the three stratification models. Stratification data for BI-RADS density extrapolated from Kerlikowske et al. are also shown for comparison. The C-statistic and 95% confidence interval for the BI-RADS, Adjusted Volpara, and DETECT+ models were 0.63 [0.56–0.69], 0.66 [0.56–0.73], and 0.72 [0.65–0.79], respectively.
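The reported costs at CF = 64% can be approximately recovered from the rounded recruitment fractions quoted above. The sketch below is only a consistency check: it uses the rounded percentages from the text, so small differences from the reported values are expected (presumably the reported costs were computed from unrounded fractions). The variable names are illustrative.

```python
ICR = 0.60 / 1000  # interval cancers per screen, as stated in the text
CF = 0.64          # capture fraction at the chosen operating point

# Rounded recruitment fractions and reported costs quoted in the text.
RF_BY_MODEL = {"BI-RADS 3 or 4": 0.43, "Adjusted Volpara": 0.40, "DETECT+": 0.28}
REPORTED_COST = {"BI-RADS 3 or 4": 1117, "Adjusted Volpara": 1051, "DETECT+": 725}

# Cost = RF / (ICR * CF): supplemental screens per interval cancer detected.
costs = {model: rf / (ICR * CF) for model, rf in RF_BY_MODEL.items()}

for model, cost in costs.items():
    reported = REPORTED_COST[model]
    rel_diff = abs(cost - reported) / reported
    print(f"{model:18s} recomputed={cost:7.0f}  reported={reported:5d}  "
          f"difference={rel_diff:.1%}")
```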
Figure 2 shows the contralateral mammograms of women with interval cancers with mammograms rated as BI-RADS 3 (top) and BI-RADS 4 (bottom). In each row, the calculated masking risk increases from left to right. If set as thresholds, they would correspond to CF values of approximately 50%, 30%, and 20% respectively for the DETECT+ model.
Table 5 shows data from this work and from two studies by Kerlikowske et al. [12, 18]. It is seen that the three populations have similar distributions in the high BI-RADS density categories for all women or cancer-free women, while Kerlikowske et al.  shows a higher proportion of interval cancer cases in the BI-RADS 4 category than the other two studies.
The DETECT+ model offers the lowest cost over a wide range of capture fractions (CF = 14 to 98%) of interval cancers. The cost for the Adjusted Volpara model is lower only at the relatively low capture fraction of 10% (i.e., 6 of 60 interval cancers potentially detected). For example, the DETECT+ model requires 12,588 supplemental screens (RF = 12.6%) to potentially detect 24 interval cancers (CF = 40%), representing a cost of 523 supplemental screens per interval cancer. When inverted, this cost represents 1.9 interval cancers potentially detected per 1000 supplemental screens. The Adjusted Volpara model requires, at the same operating point, 39% more supplemental screens to potentially detect the same number of interval cancers in 100,000 women. The discriminatory accuracy (C-statistic) between interval cancer cases and cancer-free women of the DETECT+ model was the largest at 0.72 [0.65–0.79]. Based on confidence intervals, the DETECT+ model was statistically better than the BI-RADS model, and just short of a statistical difference to the Adjusted Volpara model.
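The arithmetic behind the DETECT+ example can be written out using the quantities defined earlier (N = 100,000, ICR = 0.60 per 1000 screens). This is a worked check from the rounded figures quoted in the text; the slight difference from the reported 523 presumably reflects rounding of CF.

```latex
\begin{align*}
\text{interval cancers potentially detected}
  &= N \times \mathrm{ICR} \times \mathrm{CF}
   = 100{,}000 \times 0.0006 \times 0.40 = 24, \\
\text{cost}
  &= \frac{N \times \mathrm{RF}}{N \times \mathrm{ICR} \times \mathrm{CF}}
   = \frac{12{,}588}{24} \approx 525
   \ \text{supplemental screens per interval cancer}, \\
\text{yield}
  &= \frac{1000}{\text{cost}} \approx 1.9
   \ \text{interval cancers per 1000 supplemental screens}.
\end{align*}
```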
The BI-RADS-based model was considered, primarily because this system is widely used by breast radiologists; however, its major limitations are that it has higher costs and has only three operating points. For example, if the threshold for suggesting supplemental screening was for women with extremely dense breasts (BI-RADS 4) only, 9331 supplemental screens (RF = 9.3%) would be required, with 8 of the interval cancers potentially detected (CF = 13.6%). This corresponds to a cost of 1140 supplemental screens per interval cancer detected. Interestingly, at the same CF, the DETECT+ model would label only approximately 18% of the BI-RADS 4 examinations as masked, yet has a cost of only 374 supplemental screens per interval cancer (RF = 3.1%).
This implies that high masking does not necessarily occur in all BI-RADS 4 women, but at the same time high masking can occur in images with lower BI-RADS scores. Although there is a correlation between BI-RADS density and masking, there are additional subtleties that can disrupt this correlation and are revealed through the DETECT+ metric, where not only the area of the breast occupied by dense tissue, but also the intensity and texture of tissue attenuation are considered. In addition, there can be considerable intra- and inter-observer variability in assigning BI-RADS density categories, and the results may shift systematically if the different definitions of the BI-RADS fifth edition were used.
While the sample population used in this analysis is small, which leads to large uncertainties in the calculated rates and cost functions, it is proportioned similarly in BI-RADS density compared to larger studies [12, 18]. It is noted that small differences in those proportions can have a large impact in the cost function, due to the small-valued interval cancer rate that appears in the denominator.
Holland et al.  have performed a similar analysis using a masking model (DTMM) that is based on Volpara VBD as well as lesion size and location. The performance of the DETECT+ metric in a supplemental screening environment is similar but marginally better than their results. They also present results using Volpara VBD alone, which outperforms the Adjusted Volpara model shown in this analysis. This finding may be due to the fact that the models of Holland et al.  were created from direct comparisons between interval cancer cases and cancer-free controls, and thus may show a compound effect of masking risk and the underlying breast cancer risk, whereas our models were optimized to discriminate between interval and screen-detected cancer cases.
Kerlikowske et al.  have also performed a similar analysis, computing the discriminatory accuracy (C-statistic) between interval cancer cases and normal cancer-free controls of BI-RADS and automated BI-RADS (using Volpara VBD), reporting respective accuracies of 0.72 and 0.70. The models were adjusted for multiple breast cancer risk factors. As in the work of Holland et al. , the models likely show a compound effect of both masking and breast cancer risks.
By combining BI-RADS density and BCSC 5-year risk, Kerlikowske et al.  report a cost of 694 supplemental screens per interval cancer potentially found. After rescaling for a total population of 100,000 women, this corresponds to CF = 27.9% and RF = 11.5%. At that CF value, the DETECT+ model is about 23% more efficient, with a cost of 532 supplemental screens per interval cancer (RF = 8.9%).
We believe that it is a strength of the design of the DETECT+ model that it predicts masking risk only, i.e., when mammography is likely to be diminished in accuracy. It may then be used in conjunction with separate established breast cancer risk models to guide breast cancer screening stratification. For example, women with low masking risk would benefit from mammography screening, at possibly different intervals depending on their underlying breast cancer risk. Conversely, women with high masking risk would benefit from more sensitive screening modalities, also at possibly different intervals depending on their breast cancer risk.
An imbalance between NSD and cancer-free groups according to mammography vendor was observed with proportionally more NSD cases imaged with GE systems. A subset analysis performed using the GE data only showed the C-statistic of the DETECT+ model to be unchanged at 0.72 [0.64–0.79]; however, there were a relatively small number of Hologic NSD cases in the set. In ongoing work with a broader data set, we will determine if a system-dependent covariate will improve the model. It is noted that the GE system was introduced to the clinic earlier when experience with digital mammography was quite limited and the majority of the NSD cases come from this time period.
This study has several limitations. In the dataset, there were a small number of cancers with extremely dense BI-RADS category, very high VBD or very low detectability, resulting in the fairly wide confidence intervals seen for low numbers of interval cancers detected. These women, who have the highest masking risk, would likely be offered supplementary screening according to any of the models used.
These models are currently being tested in a larger population for further validation and to predict the costs of supplemental screening. The simplifying assumption was also made that all interval cancers would be detected by supplemental screening tests. While supplemental screening will increase cancer detection, some cancers will also be missed by those tests, and some rapidly growing cancers will always evade detection by screening.
The masking risk estimator presented in this investigation shows good potential for guiding stratification of breast cancer screening. This estimator is more efficient than using BI-RADS density and provides a continuous scale, allowing for optimizing the balance between the number of women receiving supplemental screening and the number of interval cancers potentially detected.
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Boyd NF, Guo H, Martin LJ, Sun L, Stone J, Fishell E, et al. Mammographic density and the risk and detection of breast cancer. N Engl J Med. 2007;356(3):227–36 Available from: http://www.ncbi.nlm.nih.gov/pubmed/17229950.
Harvey JA, Bovbjerg VE. Quantitative assessment of mammographic breast density: relationship with breast cancer risk. Radiology. 2004;230(1):29 [cited 2012 Jan 9]. Available from: http://radiology.rsna.org/content/230/1/29.short.
McCormack VA, dos Santos Silva I. Breast density and parenchymal patterns as markers of breast cancer risk: a meta-analysis. Cancer Epidemiol Biomarkers Prev. 2006;15(6):1159–69 Available from: http://www.ncbi.nlm.nih.gov/pubmed/16775176.
Jeffers AM, Rothstein JH, Mcguire V, Whittemore AS, Rubin DL. Breast cancer risk and mammographic density assessed with semiautomated and fully automated methods. Radiology. 2017;282(2):348–355.
Winkel RR, Von Euler-chelpin M, Nielsen M, Petersen K, Lillholm M. Mammographic density and structural features can individually and jointly contribute to breast cancer risk assessment in mammography screening: a case-control study. BMC Cancer. 2016:1–12 Available from: https://doi.org/10.1186/s12885-016-2450-7.
Mandelson MT, Oestreicher N, Porter PL, White D, Finder CA, Taplin SH, et al. Breast density as a predictor of mammographic detection: comparison of interval- and screen-detected cancers. J Natl Cancer Inst. 2000;92(13):1081–7.
Rosenberg RD, Hunt WC, Williamson MR, Gilliland FD, Wiest PW, Kelsey CA, et al. Effects of age, breast density, ethnicity, and estrogen replacement therapy on screening mammographic sensitivity and cancer stage at diagnosis: review of 183,134 screening mammograms in Albuquerque, New Mexico. Radiology. 1998;209(2):511–8 [cited 2011 Aug 24]. Available from: http://www.ncbi.nlm.nih.gov/pubmed/9807581.
Carney PA, Miglioretti DL, Yankaskas BC, Kerlikowske K, Rosenberg R, Rutter CM, et al. Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening mammography. Ann Intern Med. 2003;138(3):168–75 Available from: http://www.ncbi.nlm.nih.gov/pubmed/12558355.
Gierach GL, Ichikawa L, Kerlikowske K, Brinton LA, Farhat GN, Vacek PM, et al. Relationship between mammographic density and breast cancer death in the breast cancer surveillance consortium. J Natl Cancer Inst. 2012;104(16):1218–27.
D’Orsi CJ, Bassett LW, Berg WA, et al. BI-RADS: mammography. In: D’Orsi C, Mendelson E, Ikeda D, editors. Breast Imaging Reporting and Data System: ACR BI-RADS—breast imaging atlas. 4th ed. Reston: American College of Radiology; 2003.
Sickles, EA, D’Orsi CJ, Bassett LW, et al. ACR BI-RADS® Mammography. In: ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System. Reston, VA, American College of Radiology; 2013.
Kerlikowske K, Zhu W, Tosteson ANA, Sprague BL, Tice JA, Lehman CD, et al. Identifying women with dense breasts at high risk for interval cancer. Ann Intern Med. 2015;162(10):673 Available from: http://annals.org/article.aspx?doi=10.7326/M14-1465.
Mainprize JG, Alonzo-Proulx O, Alshafeiy TI, Patrie JT, Harvey JA, Yaffe MJ. Prediction of cancer masking in screening mammography using density and textural features. Acad Radiol. 2018:1–12 Available from: https://doi.org/10.1016/j.acra.2018.06.011.
Brentnall AR, Cohn W, Knaus W, Yaffe MJ, Cuzick J, Harvey JA. A case-control study to add volumetric or clinical mammographic density into the Tyrer-Cuzick breast cancer risk model. J Breast Imaging. 2019; In Press.
Weedon-Fekjær H, Vatten LJ, Aalen OO, Lindqvist B, Tretli S. Estimating mean sojourn time and screening test sensitivity in breast cancer mammography screening: new results. J Med Screen. 2005;12(4):172–8.
Weedon-Fekjær H, Lindqvist BH, Vatten LJ, Aalen OO, Tretli S. Breast cancer tumor growth estimated through mammography screening data. Breast Cancer Res. 2008;10(3):1–13.
Mainprize JG, Alonzo-Proulx O, Jong RA, Yaffe MJ. Quantifying masking in clinical mammograms via local detectability of simulated lesions. Med Phys. 2016;43(3):1249–58 Available from: https://doi.org/10.1118/1.4941307.
Kerlikowske K, Scott CG, Mahmoudzadeh AP, Ma L, Winham S, Jensen MR, et al. Automated and clinical breast imaging reporting and data system density measures predict risk of screen-detected and interval cancers. Ann Intern Med. 2018; Available from: http://annals.org/article.aspx?doi=10.7326/M17-3008.
Holland K, van Gils CH, Mann RM, Karssemeijer N. Quantification of masking risk in screening mammography with volumetric breast density maps. Breast Cancer Res Treat. 2017;162(3):541–8.
Berg WA, Gutierrez L, NessAiver MS, Carter WB, Bhargavan M, Lewis RS, et al. Diagnostic accuracy of mammography, clinical examination, US, and MR imaging in preoperative assessment of breast cancer. Radiology. 2004;233(3):830–49 Available from: http://www.ncbi.nlm.nih.gov/pubmed/15486214.