Zero-Inflated Statistical Model for Breast Cancer Progression (零膨脹統計模型研究乳癌疾病進展)

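Advisor: 陳秀熙
Institution: National Taiwan University: Institute of Epidemiology and Preventive Medicine
Author: 曹慧嫺 (Tsau, Huei-Shian)
Record dates: 2014-11-27, 2018-06-29
Year: 2014
URI: http://ntur.lib.ntu.edu.tw//handle/246246/262381

Abstract (translated from the Chinese)

Background: Contextual factors for predicting breast cancer risk have evolved alongside the development of mammography-based breast cancer screening programmes. Screen-detected breast cancers tend to be smaller, node-negative, and well differentiated. As a result, models for predicting breast cancer death run into the statistically thorny problem of under-dispersion caused by excess zeros, while in the practice of mass screening the same tendency raises the problem of over-detection.

Objectives: Using Swedish breast cancer screening cohort data, this study aimed to: (1) use Bayesian analysis to assess whether mammographic appearance and triple-negative breast cancer have independent or interactive effects on breast cancer death in the presence of conventional tumour attributes (tumour size, lymph node involvement, and histological grade) and histological tumour distribution (focality); (2) develop a series of count models and zero-inflated models that evaluate the relative risk of breast cancer death associated with conventional tumour attributes and triple-negative status, while using histological grade and histological tumour distribution to model the probability of a zero; (3) develop zero-inflated and zero-hurdle multi-state Markov models, allowing for possible dedifferentiation and change in histological tumour distribution and taking breast cancer survival as the observed outcome, to address excess zeros when estimating the risk of breast cancer death, and to apply these models to screening in order to examine over-detection.

Materials and Methods: The study subjects were breast cancer patients at Falun Central Hospital in Dalarna County, Sweden. The cohort originated from women screened in W-county as part of the Swedish W-E two-county randomized trial of breast cancer screening, which ran from 1977 to 1986. After the trial ended, a service screening programme has continued to the present, but data on IHC markers and histological tumour distribution were collected only from 1995 onward. The study first used a retrospective cohort of 498 women diagnosed with breast cancer at Falun Central Hospital between 1996 and 1998. After accounting for conventional prognostic factors (tumour size, lymph node involvement, and histological grade) and histological tumour distribution, we assessed whether the effects of mammographic appearance and triple-negative status on breast cancer survival were independent or interactive. A Bayesian approach was used, with prior information on tumour attributes and mammographic appearance taken from breast cancer cases in the same area during 1968-1995. Survival in the study cohort was followed until 2011. In the zero-inflated models, conventional tumour attributes and death were used as counts, and the effect of triple-negative status or other immunohistochemical (IHC) markers on these counts was evaluated. A series of zero-inflated and zero-hurdle multi-state Markov models was used to construct the natural history of breast cancer, and was then extended to incorporate histological grade and histological tumour distribution.

Results: After adjusting for conventional tumour attributes, the hazard ratio (HR) for breast cancer death in triple-negative versus non-triple-negative breast cancer was 2.54 (95% CI: 1.21-5.19) under a non-informative prior and 1.95 (95% CI: 1.06-3.52) under an informative prior. In the zero-inflated model, with conventional tumour attributes adjusted in the count part and histological tumour distribution used to predict the probability of a true zero in the zero part, the relative risk of breast cancer death for triple-negative versus non-triple-negative breast cancer was statistically significant at 3.05 (95% CI: 1.69-5.52). Histological tumour distribution was also significant in the zero part (β=1.28, P=0.013), indicating a zero-inflation probability of 0.63 for unifocal tumours. Allowing for sensitivity, the zero-inflated four-state Markov model estimated the incidence of progressive and non-progressive preclinical breast cancer at 2.01 (95% CI: 1.78-2.25) and 0.032 (95% CI: 0.013-0.051) per thousand, respectively. The zero-hurdle proportion estimated from the zero-hurdle four-state model was 14.19% (95% CI: 5.83%-22.56%). In a simulation of annual mammographic screening based on the transition probabilities estimated from the four-state Markov model, one over-detected case was found per 1,070 women screened at the first screen; this number rose at subsequent screens, reaching one per 39,526 women screened at the seventh round. This number of screens needed to detect one over-detected case (NSO) becomes larger as the screening interval lengthens. Taking histological grade into account, the zero-inflated six-state Markov model estimated that about 25% of breast cancers are poorly differentiated from the onset of carcinogenesis, and that the sensitivity of mammography for grade 1 or 2 tumours was 62% (95% CI: 51%-73%). The zero-hurdle six-state Markov model gave a zero-hurdle proportion of 6.11% (95% CI: 1.64%-22.82%), and its NSO estimates were similar to those of the four-state model. Under the six-state zero-inflated Markov model, the reduction in grade 3 breast cancer in the screened group relative to the control group was 27% for annual screening (RR=0.73, 95% CI: 0.65-0.82), 22% for biennial screening (RR=0.78, 95% CI: 0.70-0.87), and 18% for triennial screening (RR=0.82, 95% CI: 0.73-0.91). For multifocal or diffuse breast cancer, the corresponding reductions were 19% for annual screening (RR=0.81, 95% CI: 0.73-0.89), 14% for biennial screening (RR=0.86, 95% CI: 0.78-0.95), and 11% for triennial screening (RR=0.89, 95% CI: 0.70-0.98).

Conclusions: When evaluating the effect of triple-negative breast cancer on disease progression, and when using mammographic screening data to understand the progression of breast cancer, excess zeros and over-detection need to be assessed. In the zero-inflated model, with histological tumour distribution used to identify true zeros, the relative risk associated with triple-negative breast cancer in the count part was meaningful. Allowing for screening sensitivity, the zero-inflated and zero-hurdle models built on histological grade and histological tumour distribution can address the thorny problem of over-detection, because they also take into account the potential harm of mass screening programmes. The results indicate that over-detection in the Swedish screening data is not serious.

Abstract (English)

Background: Contextual factors for predicting the risk of breast cancer have evolved in parallel with the growth of mammographic breast cancer screening. The tendency towards small, node-negative, well-differentiated tumours has created a thorny statistical issue: excess zeros (under-dispersion) in the context of breast cancer survival among patients, and over-detection in the context of mass screening for the underlying population of women.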
Objectives: Using this Swedish screened cohort, the objectives of this thesis were to (1) assess, with a Bayesian approach, whether mammographic appearance and triple-negative breast tumour have independent or interactive effects on the risk of breast cancer death, allowing for three conventional tumour attributes and histological tumour distribution; (2) develop a series of count models and zero-inflated models to evaluate the effects of lymph node spread, tumour size, and the triple-negative marker in the count part, and of histological grade and focality in the zero part, on the risk of breast cancer death; and (3) develop zero-inflated and zero-hurdle multi-state Markov models with respect to dedifferentiation and change in the focality of breast tumours, in order to address over-detection resulting from mass screening.

Materials and Methods: The study subjects were a consecutive series of patients diagnosed with breast cancer at Falun Central Hospital in Dalarna County, Sweden. Dalarna County was the site of the W-county trial, one of the two counties in the Swedish Two-County randomized controlled trial conducted from 1977 to 1986. A breast cancer service screening programme has been offered since the trial ended. Information on IHC markers and histological tumour distribution has been collected from 1995 onward. A retrospective cohort of 498 patients diagnosed with breast cancer at Falun Central Hospital between 1996 and 1998 was enrolled to assess the independent or interactive effects of mammographic appearance and triple-negative breast tumour on the risk of breast cancer death, allowing for three conventional tumour attributes and histological tumour distribution. Data from this cohort were combined, using a Bayesian method, with prior information on conventional tumour attributes and mammographic appearance from 1968 to 1995, and the cohort was followed until the end of 2011 to ascertain breast cancer death.
Zero-inflated Poisson (ZIP) models for the count of advanced-stage categories among the three conventional tumour attributes, and for breast cancer death, were used to evaluate the effect of triple-negative status or other IHC markers on these counts. A series of zero-inflated and zero-hurdle multi-state Markov models was developed to elucidate the temporal natural history of breast cancer, and was then extended to account for histological grade and histological tumour distribution (unifocal, multifocal, and diffuse).

Results: After adjusting for tumour size, node status, grade, and mammographic appearance, triple-negative status remained statistically significant for the risk of breast cancer death under a non-informative prior (aHR=2.54, 95% CI: 1.21-5.19) and under an informative prior (aHR=1.95, 95% CI: 1.06-3.52). In the zero-inflated model, the effect of triple-negative status on breast cancer death was statistically significant (relative risk 3.05, 95% CI: 1.69-5.52) after adjusting for conventional tumour attributes in the count part, and focality was significant in the zero-inflated part (β=1.28, P=0.013), indicating a zero-inflation probability of 0.6340 for unifocal tumours. In the zero-inflated four-state Markov model, the estimated annual incidence rates of progressive and non-progressive disease in the pre-clinical detectable phase (PCDP), adjusted for sensitivity, were 2.01 (95% CI: 1.78-2.25) and 0.032 (95% CI: 0.013-0.051) per thousand, respectively. The zero-hurdle proportion estimated from the zero-hurdle four-state model was 14.19% (95% CI: 5.83%-22.56%). Computer simulation based on the four-state Markov model showed that, under annual screening, the number of screens needed to detect one over-detected case (NSO) was lowest at the first screen (1,070) and rose to 39,526 by the seventh round. The NSO decreased with longer inter-screening intervals.
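To make the distinction between the zero-inflated and zero-hurdle count models concrete, the following minimal Python sketch writes out the two probability mass functions for a Poisson count. It is an illustration under stated assumptions, not the fitted model of the thesis: the function names and the parameter values (`lam`, `pi`, `p0`) are hypothetical, and the zero probabilities are held constant here rather than modelled through covariates such as focality or histological grade.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Ordinary Poisson probability P(Y = k) with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson.

    With probability pi the observation is a structural ("true") zero;
    otherwise it is drawn from Poisson(lam), which can also yield a zero.
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k: int, lam: float, p0: float) -> float:
    """Zero-hurdle Poisson.

    All zeros come from a single binary process with P(Y = 0) = p0;
    positive counts follow a zero-truncated Poisson(lam).
    """
    if k == 0:
        return p0
    return (1.0 - p0) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


if __name__ == "__main__":
    lam, pi, p0 = 1.5, 0.3, 0.3  # hypothetical values, for illustration only
    for k in range(4):
        print(k, poisson_pmf(k, lam), zip_pmf(k, lam, pi), hurdle_pmf(k, lam, p0))
```

In the zero-inflated form, zeros arise from two sources (structural zeros and sampling zeros from the count process), whereas the hurdle form treats every zero as coming from one binary process. In a regression setting, `pi` or `p0` would typically be linked to covariates through a logit model and `lam` through a log link.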
With respect to histological grade, the zero-inflated six-state Markov model suggests that around 25% of breast tumours are poorly differentiated from the inception of carcinogenesis. The sensitivity for detecting histological grade 1/2 tumours was 62% (95% CI: 51%-73%). The estimated zero-hurdle proportion was 6.11% (95% CI: 1.64%-22.82%) based on the zero-hurdle six-state model. The trends in NSO by screening round and screening interval were similar to those from the four-state model. Computer simulation based on the six-state zero-inflated Markov model further showed that, compared with the control group, the reduction in histological grade 3 breast cancer was 27% (RR=0.73, 95% CI: 0.65-0.82) for the annual regime, 22% (RR=0.78, 95% CI: 0.70-0.87) for the biennial regime, and 18% (RR=0.82, 95% CI: 0.73-0.91) for the triennial regime. For multifocal and diffuse types, the reduction was 19% (RR=0.81, 95% CI: 0.73-0.89) for the annual regime, 14% (RR=0.86, 95% CI: 0.78-0.95) for the biennial regime, and 11% (RR=0.89, 95% CI: 0.80-0.98) for the triennial regime, compared with the control group.

Conclusions: When evaluating the effect of triple-negative breast tumour on prognosis among breast cancer patients, and when elucidating the disease progression of breast cancer from mammographic screening data, excess zeros and over-detection should be evaluated. Under the proposed zero-inflated count model, triple-negative breast tumour contributed to the risk in the count part, while unifocal versus multifocal/diffuse type accounted for the true zeros, that is, a very-low-risk group. Zero-inflated and zero-hurdle multi-state Markov models incorporating histological grade and focality, and allowing for sensitivity, were further developed to address the thorny issue of over-detection, which is regarded as a harm of mass screening for breast cancer.
We found that over-detection was not serious in these Swedish data.

Oral Examination Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract (English)
CHAPTER 1 INTRODUCTION
CHAPTER 2 LITERATURE REVIEW
2.1 Prognostic factors for breast cancer
2.1.1 Conventional prognostic factors
2.1.2 Immunohistochemical (IHC) markers
2.1.3 Histological tumour distribution
2.1.4 Mammographic appearance
2.2 Over-dispersion and under-dispersion for count data
2.2.1 Zero-inflated Poisson (ZIP) regression model
2.2.2 Zero-hurdle Poisson (ZHP) regression model
2.3 Natural history model for breast cancer
2.3.1 Stochastic model for disease natural history and measurement error
2.3.2 Mover-stayer model for breast cancer with dedifferentiation
CHAPTER 3 DATA SOURCES
3.1 Study Subjects
3.2 Measurement of variables
3.2.1 Conventional tumour attributes
3.2.2 Large-section histopathology
3.2.3 Mammographic appearance
3.2.4 Immunohistochemistry markers and molecular phenotype
3.2.5 Histological tumour distribution (Focality)
CHAPTER 4 STATISTICAL METHODS WITH EMPHASIS ON STOCHASTIC MODELS
4.1 Bayesian approach for survival of TNBC
4.2 Zero-Inflated Poisson (ZIP) regression model for prognosis
4.2.1 Poisson Model
4.2.2 Zero-Inflated Poisson Model
4.2.3 Zero-Hurdle Poisson Model
4.2.4 Negative Binomial Model
4.3 Multi-state Markov model for natural history of breast cancer considering non-progressive disease
4.3.1 Four-state Markov model for breast cancer natural history
4.3.1.1 Zero-inflated four-state Markov model
4.3.1.2 Zero-hurdle four-state Markov model
4.3.2 Six-state Markov model for histological grade
4.3.2.1 Zero-inflated six-state Markov model
4.3.2.2 Zero-hurdle six-state Markov model
4.3.3 Zero-inflated eight-state Markov model for histological tumour distribution (focality)
4.3.4 Estimation of transition rates
4.3.5 Computer simulation based on estimated results
CHAPTER 5 RESULTS
5.1 Triple-negative breast cancer (TNBC)
5.1.1 The associated factors for TNBC
5.1.2 Long-term breast cancer survival by triple-negative and mammographic appearance
5.2 Zero-inflated model for tumour attributes
5.2.1 Association between breast cancer death and conventional tumour attributes, IHC markers, mammographic appearance and tumour phenotype
5.2.2 Poisson regression model
5.2.3 Zero-Inflated Poisson (ZIP) Model
5.2.4 Zero-Hurdle Poisson (ZHP) Model
5.3 Natural history model for breast cancer with histological tumour distribution
5.3.1 Estimates of the zero-inflated and the zero-hurdle four-state Markov model
5.3.2 The zero-inflated and zero-hurdle six-state Markov model with histological grade
5.3.3 The zero-inflated eight-state Markov model with focality
5.3.4 Simulated results for different inter-screening intervals
5.3.4.1 The zero-inflated four-state Markov model
5.3.4.2 The zero-inflated six-state Markov model with histological grade
5.3.4.3 The zero-inflated eight-state Markov model with focality
CHAPTER 6 DISCUSSION
6.1 Logic of developing the zero-inflated statistical model
6.2 Effect of Triple-negative breast cancer (TNBC) on survival of breast cancer without considering zero excess
6.2.1 Comparison with previous studies on triple-negative breast cancer
6.2.2 Combining TNBC with Mammographic Appearance
6.3 Zero-inflated model for the effect of triple negative or other IHC markers on the counts of three tumour attributes and breast cancer death
6.4 The zero-inflated and zero-hurdle Markov model
6.4.1 Methodological breakthrough
6.4.2 Impact of screening policy
6.4.3 Cost-effectiveness of early detection of breast cancer taking over-detected cases into account
6.5 Limitations and future works
6.5.1 Non-homogeneous stochastic process
6.5.2 The mixture mover-stayer model
CHAPTER 7 CONCLUSION
REFERENCES

Tables and Figures
Table 4-1. Likelihood function and data used for the zero-inflated four-state Markov model for breast cancer natural history.
Table 4-2. Likelihood function and data used for the zero-inflated four-state Markov model for breast cancer natural history considering sensitivity.
Table 4-3. Likelihood and data used for the zero-hurdle four-state Markov model for breast cancer natural history.
Table 4-4. Data used for the zero-inflated six-state Markov model for disease natural history considering histological grade.
Table 4-5. Likelihood and data used for the zero-inflated six-state Markov model for breast cancer natural history considering histological grade and sensitivity.
Table 4-6. Likelihood and data used for the zero-hurdle six-state Markov model for breast cancer natural history considering histological grade.
Table 4-7. Likelihood and data used for the zero-inflated eight-state Markov model for breast cancer natural history considering histological tumour distribution (focality).
Table 5-1-1. The distribution of tumour attributes, focality, and immunohistochemical markers by mammographic appearance among invasive breast cancers diagnosed between 1996 and 1998 in Dalarna.
Table 5-1-2. The association between mammographic appearance and triple-negative, ER, PR or HER2 among invasive breast cancers diagnosed between 1996 and 1998 in Dalarna.
Table 5-1-3. The association between focality, tumour attributes, and triple-negative, ER, PR or HER2 among invasive breast cancers diagnosed between 1996 and 1998 in Dalarna.
Table 5-1-4. Estimated hazard ratios for breast cancer death of tumour attributes, mammographic appearance, IHC markers and focality in 1996-1998 in Dalarna by accelerated failure time model.
Table 5-1-5. Estimated hazard ratios for breast cancer death of tumour attributes, mammographic feature, focality by triple negative status in 1996-1998 in Dalarna by multivariate accelerated failure time model.
Table 5-1-6. Estimated hazard ratios for breast cancer death of triple-negative, tumour attributes, and focality by mammographic appearance in 1996-1998 in Dalarna by multivariate accelerated failure time model.
Table 5-2-1. The distribution of age at diagnosis, conventional tumour attributes, focality, IHC markers (ER, PR, HER2, triple negative), molecular phenotype, and mammographic appearance by breast cancer death.
Table 5-2-2. The univariate and multivariable analysis of Poisson regression for the association between count of conventional tumour attributes and predictors.
Table 5-2-3. The univariate and multivariable analysis of Poisson regression for predicting breast cancer death by conventional tumour attributes and other predictors.
Table 5-2-4. Prediction models for breast cancer death by using Zero-inflated Poisson regression model.
Table 5-2-5. Prediction models for breast cancer death by using Zero-hurdle Poisson regression model.
Table 5-3-1. Estimated results of zero-inflated four-state Markov model for natural history of breast cancer.
Table 5-3-2. Estimated results of zero-inflated four-state Markov model for natural history of breast cancer considering sensitivity.
Table 5-3-3. Estimated results of natural history for breast cancer by zero-hurdle multi-state Markov model by EM algorithm.
Table 5-3-4. Estimated results of natural history for breast cancer by zero-inflated six-state Markov model considering histological grade by EM algorithm.
Table 5-3-5. Estimated results of natural history for breast cancer by zero-inflated six-state Markov model considering histological grade and sensitivity.
Table 5-3-6. Estimated results of natural history for breast cancer by zero-hurdle six-state Markov model considering histological grade by EM algorithm.
Table 5-3-7. Estimated results of natural history for breast cancer by zero-inflated eight-state Markov model considering focality.
Table 5-3-8. Simulated results of number of over-detected cases of 100,000 invited women from age 40 by different inter-screening interval based on zero-inflated four-state Markov model.
Table 5-3-9. Simulated results of number needed to screen to detect one over-detected case (NSO) for 100,000 invited women from age 40 based on zero-inflated four-state Markov model.
Table 5-3-10. Simulated results of breast cancer number for 100,000 invited women from age 40 by different inter-screening interval based on zero-inflated four-state Markov model.
Table 5-3-11. Simulated results of number of over-detected cases (ODC) of 100,000 invited women from age 40 by different inter-screening interval based on zero-inflated six-state Markov model.
Table 5-3-12. Simulated results of number needed to screen to detect one over-detected case (NSO) for 100,000 invited women from age 40 based on zero-inflated six-state Markov model.
Table 5-3-13. Simulated results of breast cancer number for 100,000 invited women from age 40 by different inter-screening interval based on zero-inflated six-state Markov model.
Table 5-3-14. Simulated results of number of over-detected cases (ODC) of 100,000 invited women from age 40 by different inter-screening interval based on zero-inflated eight-state Markov model.
Table 5-3-15. Simulated results of number needed to screen to find a non-progressive breast cancer for 100,000 invited women from age 40 based on zero-inflated eight-state Markov model.
Table 5-3-16. Simulated results of breast cancer number for 100,000 invited women from age 40 by different inter-screening interval based on zero-inflated six-state Markov model.
Table 6-4-1. Considering over-detected cases in the deterministic cost-effectiveness analysis.
Table 6-4-2. Not considering over-detected cases in the deterministic cost-effectiveness analysis.
Table 6-4-3. Extra cost for over-detected cases among 100,000 participants.
Figure 2-2. The mover-stayer mixture of Markov chain model proposed by Chen et al (1977).
Figure 3-1. The available information of breast cancer cases, and screening experiences in Dalarna, Sweden from 1968 to 2010.
Figure 4-1. Zero-inflated four-state Markov model for breast cancer natural history. (a) Model for screening arm (active screening group, ASP). (b) Model for control arm (passive screening group, PSP).
Figure 4-2. Zero-hurdle four-state Markov model for breast cancer natural history. (a) Model for screening arm (active screening group, ASP). (b) Model for control arm (passive screening group, PSP).
Figure 4-3. Zero-inflated six-state Markov model considering histological grade. (a) Model for screening arm (active screening group, ASP). (b) Model for control arm (passive screening group, PSP).
Figure 4-4. Zero-hurdle six-state Markov model considering histological grade. (a) Model for screening arm (active screening group, ASP). (b) Model for control arm (passive screening group, PSP).
Figure 4-5. Zero-inflated eight-state Markov model considering histological tumour distribution (focality). (a) Model for screening arm (active screening group, ASP). (b) Model for control arm (passive screening group, PSP).
Figure 5-1-1. Cumulative survival by mammographic appearance and triple-negative status of invasive breast cancers diagnosed in 1996-1998 in Dalarna.

File size: 9113497 bytes
Format: application/pdf
Thesis usage rights: paid licensing authorized (royalties returned to the author)
Keywords: breast cancer; Bayesian analysis; zero-inflated model; over-detection; disease natural history; zero-inflated multi-state Markov model; zero-hurdle multi-state Markov model
[SDGs]SDG3
Title: Zero-Inflated Statistical Model for Breast Cancer Progression
Type: thesis
File: http://ntur.lib.ntu.edu.tw/bitstream/246246/262381/1/ntu-103-D98842009-1.pdf