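Advisor: 陳秀熙
Affiliation: Institute of Epidemiology and Preventive Medicine, National Taiwan University
Author: 許辰陽 (Hsu, Chen-Yang)
Dates: 2014-11-27; 2018-06-29; 2014
URI: http://ntur.lib.ntu.edu.tw//handle/246246/262365

Abstract (Chinese version, translated)

Background
Observational data on infectious diseases capture the process by which a susceptible individual is infected, develops symptoms, and becomes an infective host. The correlation and latent heterogeneity in such data make them difficult to analyse. In household influenza data, for example, each household member is infected by the influenza virus after a number of contacts, so the serial observations on that member are correlated and hierarchically structured. For chronic infectious diseases, moreover, progression runs from the susceptible host through a latent period to clinical symptoms and infectiousness, so the observed data must distinguish several disease states. The factors that drive progression may differ from state to state, which adds to the complexity of the analysis. A unified modelling framework for infectious disease observational data that takes this progression into account would therefore greatly help in clarifying the mechanism of disease progression and the contributing factors at each stage. A generalized linear model built on a stochastic process reduces the dimension of the joint probability distribution formed by serial observations, accommodates latent heterogeneity such as hierarchical correlation structure and measurement error, and evaluates the effect of each factor on the different disease stages at the same time.

Methods and applications

Methodological development
This study first constructed a discrete-time two-state Markov model, combining the chain binomial model with conditional probabilities to describe the binary sequences formed by infectious disease observational data. Under the two-state Markov model, the number of contacts a susceptible host undergoes before becoming infective (the unit of time in the chain binomial model) can be expressed as a random variable following a geometric or a negative binomial distribution. When the disease process must include a latent infection state, the two-state model has to be extended to a multi-state model. For the continuous-time generalized linear stochastic model, the study combined the transition rate matrix (intensity matrix) with the Kolmogorov differential equations to derive a general form for the probabilities of the multi-state model. Appropriate distributions can be assigned to each stage of progression: the exponential distribution for a homogeneous rate of progression, and the Weibull and lognormal distributions for non-homogeneous rates. The model also incorporates the effects of explanatory variables and measurement error.

Application to infectious disease data
The generalized linear stochastic model based on the discrete-time two-state Markov model was applied to household influenza surveillance data from Taiwan. Covariates included sex, age, and vaccination status, and the correlation among observations was handled with random-effect models that included a random intercept and a random slope. For the multi-state process from tuberculosis infection (LTBI) to tuberculosis (TB), the generalized linear stochastic model based on the discrete-time three-state Markov model was applied to tuberculosis contact investigation data from ChangHua, and the progression from susceptible host through latent infection to TB was described by transition rates and transition probabilities. For the continuous-time three-state Markov model, homogeneous and non-homogeneous transition rate models were built with different distributions. The models also incorporated factors related to each stage of progression, such as age and sex, together with measurement error.

Results
When the two-state generalized linear stochastic model was applied to the Taiwanese influenza surveillance data, vaccination showed a significant protective effect once between-household heterogeneity in vaccine protection was taken into account with a Bayesian hierarchical model (odds ratio: 0.50, 95% credible interval: 0.32-0.75, Reed-Frost model with random slope). The protective effect of vaccination also differed significantly between households.
When the three-state generalized linear stochastic model was applied to the ChangHua tuberculosis contact investigation data, estimates based on the discrete-time and continuous-time Markov models were consistent. Comparison of the continuous-time multi-state models built with different distributions showed that the non-homogeneous transition rate models using the Weibull and lognormal distributions outperformed the others. Under these models, the annual risk of tuberculosis infection (ARTI) decreased over time and was highest in the 45-65 age group. The rate of progression to TB after infection rose steeply in the elderly and peaked after 5 years, which may be related to re-activation of tuberculosis, whereas the younger group (30-65 years) showed a steady increase. The false-positive and false-negative rates were 5%-10% and 55%, respectively.

Conclusion
The generalized linear stochastic model built on the Markov model developed in this study can incorporate different distributional forms. It accommodates correlated, structured observational data, such as household influenza surveillance data, as well as the multi-state progression of chronic infectious disease, such as the tuberculosis progression model. The model helps to clarify disease progression and to analyse related factors. It also allows the rate of progression at each disease stage to be analysed, providing information needed to develop infectious disease control strategies.

Abstract (English version)

Background
Modelling the transmission of acute infectious disease, and the subsequent progression to clinical disease in chronic infectious disease, raises a series of difficult issues. Correlation and unobserved heterogeneity are inherent in the transmission of infectious disease. In chronic infectious disease, progression from susceptible to latent infection and clinical disease involves multi-state transitions and hence multiple categorical outcomes. The joint effect of covariates further complicates these issues. It is of great interest to provide a unified, if preliminary, framework to address these problems. The generalized stochastic process provides a feasible statistical tool to reduce the dimension in modelling such correlated sequences and to accommodate the statistical issues above.

Methods and applications

Methods
A two-state Markov model in discrete time was first developed to model the observed binary sequences. Under this model, the time a subject remains in the susceptible state before first entering the infective state is expressed in geometric and negative binomial forms, as in the conventional chain binomial model. The two-state model was then extended to a multi-state discrete-time Markov model. The generalized stochastic model in continuous time was developed by specifying the intensity matrix in general form together with the Kolmogorov differential equations. A variety of distributions were applied, including the exponential distribution for a homogeneous rate of transition and the Weibull and lognormal distributions for non-homogeneous rates of disease progression.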
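As a rough illustration of the continuous-time construction just described, the sketch below sets up a three-state progressive model (susceptible, latent infection, disease) with constant rates and checks a numerical solution of the Kolmogorov forward equation dp/dt = pQ against the closed-form state probabilities. It is a minimal sketch, not the thesis's implementation: the rates `LAM1` and `LAM2`, the function names, and the choice of a fourth-order Runge-Kutta integrator are all assumptions made for illustration, and covariates and measurement error are left out.

```python
import math

# Hypothetical constant transition rates per year (illustration only):
# LAM1 = susceptible -> latent infection, LAM2 = latent infection -> disease.
LAM1, LAM2 = 0.10, 0.02

# Intensity matrix of the progressive three-state model; each row sums to zero
# and the disease state is absorbing.
Q = [
    [-LAM1, LAM1, 0.0],
    [0.0, -LAM2, LAM2],
    [0.0, 0.0, 0.0],
]


def closed_form(lam1, lam2, t):
    """State probabilities at time t for a subject susceptible at time 0."""
    p_s = math.exp(-lam1 * t)
    p_l = lam1 / (lam2 - lam1) * (math.exp(-lam1 * t) - math.exp(-lam2 * t))
    return [p_s, p_l, 1.0 - p_s - p_l]


def forward_rhs(p, q):
    """Right-hand side of the Kolmogorov forward equation dp/dt = p Q."""
    n = len(p)
    return [sum(p[i] * q[i][j] for i in range(n)) for j in range(n)]


def integrate(q, t, steps=2000):
    """Integrate dp/dt = p Q from p(0) = (1, 0, 0) with classical RK4."""
    p = [1.0, 0.0, 0.0]
    h = t / steps
    for _ in range(steps):
        k1 = forward_rhs(p, q)
        k2 = forward_rhs([x + 0.5 * h * k for x, k in zip(p, k1)], q)
        k3 = forward_rhs([x + 0.5 * h * k for x, k in zip(p, k2)], q)
        k4 = forward_rhs([x + h * k for x, k in zip(p, k3)], q)
        p = [
            x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)
        ]
    return p


numeric = integrate(Q, 5.0)
exact = closed_form(LAM1, LAM2, 5.0)
for name, n_val, e_val in zip(("susceptible", "latent", "disease"), numeric, exact):
    print(f"{name:12s} numeric={n_val:.6f} closed-form={e_val:.6f}")
```

With constant rates the sojourn time in each state is exponential, which corresponds to the homogeneous case; replacing the constants with time-dependent hazards would give the non-homogeneous case, for which a closed form is generally not available and numerical integration of this kind becomes the practical route.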
The methodology also incorporated the effect of covariates through an appropriate link function, and that of measurement error.

Applications
The proposed generalized linear stochastic model with a two-state Markov underpinning was first applied to influenza surveillance data, incorporating the effects of age, sex, and vaccination status and allowing for the correlated structure through random intercept and random slope parameters. To evaluate the transitions between susceptible, latent tuberculosis infection (LTBI), and tuberculosis (TB), the generalized stochastic process was extended to three states. Continuous-time Markov models with homogeneous and non-homogeneous rates of transition were specified using a variety of distributions, including the exponential, Weibull, and lognormal. The joint effects of covariates and measurement error were also evaluated.

Results
When the generalized linear stochastic model with a two-state Markov underpinning was applied to influenza surveillance data in Taiwan, vaccination showed a significant protective effect in hierarchical models with a random slope (OR: 0.50, 95% CI: 0.32-0.75, Reed-Frost model). The variation in the effect of vaccination across households was quantified by a random-effect parameter estimated at 1.11 to 1.23. Applying the three-state Markov model to data from the tuberculosis contact tracing project in ChangHua gave consistent results in the continuous-time and discrete-time frameworks. Comparison of the continuous-time models showed that non-homogeneous rates of disease progression, specified with the Weibull-lognormal combination, were superior for both the infection rate and the conversion rate. Using the three-state Markov model with the Weibull distribution for the annual risk of TB infection (ARTI) and the lognormal distribution for conversion from LTBI to TB, we found that the hazard for ARTI is not constant but decreases with follow-up time, and that the highest hazard was in the 45-64 age group.
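To make the shape of these non-constant hazards concrete, the sketch below evaluates a Weibull hazard and a lognormal hazard at a few follow-up times. A Weibull hazard with shape parameter below 1 decreases monotonically with time, which is the qualitative pattern reported for ARTI, while a lognormal hazard rises early and later declines. All parameter values (`shape=0.7`, `scale=20.0`, `mu=2.0`, `sigma=1.0`) and the function names are hypothetical choices for illustration; they are not estimates from the thesis.

```python
import math


def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def lognormal_hazard(t, mu, sigma):
    """Lognormal hazard h(t) = f(t) / S(t), with log(T) ~ Normal(mu, sigma^2)."""
    z = (math.log(t) - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - standard normal CDF at z
    return pdf / survival


years = [1, 2, 3, 5, 8]  # follow-up times in years (illustrative grid)

# Hypothetical parameters: shape < 1 gives a decreasing infection hazard.
infection = [weibull_hazard(t, shape=0.7, scale=20.0) for t in years]
# Hypothetical parameters: the conversion hazard rises, then declines slowly.
conversion = [lognormal_hazard(t, mu=2.0, sigma=1.0) for t in years]

for t, h_inf, h_conv in zip(years, infection, conversion):
    print(f"year {t}: infection hazard={h_inf:.4f}, conversion hazard={h_conv:.4f}")
```

With these illustrative values the lognormal hazard is still rising at year 5 and is lower again by year 8; where the turning point falls depends entirely on `mu` and `sigma`.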
The hazard for annual conversion from LTBI to TB was higher in subjects older than 65 years. In this oldest group it rose steeply with time, probably because of re-activation, and reached a plateau after around 5 years, whereas it increased linearly in the younger group aged 30-64 years. The false-positive and false-negative rates in terms of disease evolution were around 5%-10% and 55%, respectively.

Conclusion
The proposed generalized linear stochastic process with a Markov underpinning was able to accommodate correlated influenza surveillance data as well as the multi-state disease process of a chronic infectious disease such as TB. This unified framework is useful both for elucidating the mode of transmission of acute infectious disease and for quantifying the rate of progression from latent infection to clinical disease in chronic infectious disease. Both are of utmost importance for guiding the containment of various types of infectious disease.

Contents
Chinese abstract (摘要)
Abstract
Chapter 1 Introduction
Chapter 2 Literature review
  2.1 Literature review on methodological development of multistate model
    2.1.1 Markov model with discrete state and discrete time
    2.1.2 Negative binomial regression
    2.1.3 Analysis of infectious disease data using discrete-time Markov model
    2.1.4 Literature review on the elucidation of the preclinical detectable phase using Markov model
  2.2 Literature review on biological background and statistical applications
    2.2.1 The surveillance of the epidemic of influenza
    2.2.2 The surveillance of tuberculosis
Chapter 3 Data sources
  3.1 Data source of influenza epidemic
  3.2 Data source of contact tracing project of tuberculosis
Chapter 4 Model specification
  4.1 Progression of infectious disease
  4.2 Generalized linear two-state Markov model
    4.2.1 The chain binomial model and the geometric model
    4.2.2 Generalized linear two-state Markov regression model
    4.2.3 Estimation procedure of the generalized linear two-state Markov model
  4.3 Generalized linear multi-state Markov regression model
    4.3.1 Generalized linear k-state Markov regression model
    4.3.2 Generalized three-state Markov regression model
    4.3.3 Three-state geometric Markov model
    4.3.4 Three-state negative binomial Markov model
    4.3.5 Incorporating measurement error into the generalized stochastic model
    4.3.6 Generalized linear three-state Markov model in discrete time
    4.3.7 Generalized multi-state Markov model in continuous time
    4.3.8 The relationship between Markov model in discrete time and that in continuous time
Chapter 5 Results
  5.1 Results of applying the generalized linear two-state Markov model to the surveillance data of influenza
    5.1.1 Surveillance data on the epidemic influenza with household structure
    5.1.2 Estimated results using generalized two-state Markov model
  5.2 Results of the application of three-state negative binomial model on the elucidation of the natural history of tuberculosis
    5.2.1 Descriptive results of tuberculosis data
    5.2.2 Results of generalized linear three-state Markov models to the data on contact tracing project of tuberculosis
Chapter 6 Discussion
  6.1 Methodology and summary of findings on applications to influenza and TB
    6.1.1 Methodological development
    6.1.2 Major findings on applications
  6.2 The application of generalized linear two-state Markov model on surveillance data on influenza
  6.3 The application of generalized linear multi-state Markov model on surveillance data on contacts of tuberculosis
  6.4 Implications for endogenous and exogenous TB
  6.5 Risk stratification of TB contacts
  6.6 Strengths
    6.6.1 Applying Bayesian hierarchical model for the correlated data structure
    6.6.2 Generalized linear stochastic model with measurement error
    6.6.3 Connection between continuous-time Markov model with discrete-time Markov model
  6.7 Limitation
  6.8 Conclusion
    6.8.1 Methodological development of generalized linear stochastic model
    6.8.2 Applications
References

Figures
Figure 1. Timeline of the progression of infectious disease
Figure 2. Heterogeneity across households in terms of the proportion of influenza cases, vaccinated subjects children, and secondary attack rate
Figure 3. Graphical illustration of fixed effect model, random intercept model, and random slope model
Figure 4. Plots of predicted and observed numbers of influenza cases and 95% CIs against household size
Figure 5. Time trend of tuberculosis reported to the surveillance system in ChangHua
Figure 6. Time trend of mean age of tuberculosis cases in ChangHua
Figure 7. Proportion of the elders among tuberculosis cases
Figure 8. Proportion of tuberculosis cases: pulmonary, extra-pulmonary and mixed cases
Figure 9. Proportion of pulmonary and mixed TB cases confirmed by sputum culture
Figure 10. Proportion of pulmonary and mixed TB with positive sputum smear
Figure 11. Proportion of pulmonary and mixed TB with cavitation on chest plain film
Figure 12. Proportion of Susceptible, LTBI, and TB by the advance of age
Figure 13. Distribution of TST size of contacts
Figure 14. Distribution of time to conversion based on the generalized linear three-state Markov model in continuous time adjusted for measurement error (Weibull-lognormal, TST 10 mm)
Figure 15. Estimated rate of infection and conversion by age groups based on the generalized linear three-state Markov model adjusted for measurement error (Weibull-lognormal, TST 10)
Figure 16. Predictive probability of susceptible, LTBI, and tuberculosis based on discrete-time Markov model and Weibull-lognormal Markov model
Figure 17. Predicted probability of susceptible, LTBI, and tuberculosis stratified by age group and sex: discrete-time Markov model (Weibull-lognormal model)
Figure 18. Predicted probability of susceptible, LTBI, and tuberculosis stratified by age group and sex: Weibull-lognormal Markov model
Figure 19. Predicted probability of susceptible, LTBI, and tuberculosis based on discrete-time Markov model and continuous-time Markov model (Weibull-lognormal) with measurement error
Figure 20. Predicted probability of susceptible, LTBI, and tuberculosis stratified by age groups based on continuous-time Markov model (Weibull-lognormal) with measurement error
Figure 21. Observed and predicted cumulative incidence of TB based on continuous-time Markov model (Weibull-lognormal) with measurement error
Figure 22. Age and sex specific cumulative incidence of TB based on continuous-time Markov model (Weibull-lognormal) with measurement error

Tables
Table 1. Theoretical probability distribution of number time unit by states (LTBI and tuberculosis)
Table 2. The probability of observed number of time intervals to develop LTBI (O1) and tuberculosis (O2) given underlying time intervals of disease progression to LTBI (Y1)
Table 3. The probability of observed number of time intervals to develop LTBI (O1) and tuberculosis (O2) given underlying time intervals of disease progression to tuberculosis (Y2)
Table 4. Characteristics of the study subjects enrolled in the surveillance of influenza
Table 5. Estimated results of Becker's linear logistic models
Table 6. Estimated results of Bayesian hierarchical models based on the Greenwood model
Table 7. Estimated results of Bayesian hierarchical models based on the Reed-Frost model
Table 8. Comparison of DIC for models
Table 9. Estimated results of Bayesian hierarchical models with random slopes using sceptical priors and enthusiastic priors on the effect of vaccination
Table 10. Characteristics of tuberculosis contacts
Table 11. Frequencies of transitions between states
Table 12. Estimated results of the rates of disease progression
Table 13. The -2 log likelihood values of continuous-time Markov models
Table 14. Estimated result of the probability of disease progression using sex and age group as covariates: discrete-time Markov model, TST 10 mm as cutoff
Table 15. Estimated result of the probability of disease progression using sex and age group as covariates: discrete-time Markov model, TST 15 mm as cutoff
Table 16. Estimated result of the rates of disease progression using sex and age group as covariates: continuous-time Markov model with non-constant rates of transition between states, TST 10 mm as cutoff (Weibull-lognormal model)
Table 17. Estimated result of the rates of disease progression using sex and age group as covariates: continuous-time Markov model with non-constant rates of transition between states, TST 15 mm as cutoff (Weibull-lognormal model)
Table 18. Estimated result of discrete-time Markov model with measurement error
Table 19. Estimated result of the probability of disease progression using sex and age group as covariates: discrete-time Markov model incorporating measurement error
Table 20. Estimated result of rates of disease progression using sex and age group as covariates: continuous-time Markov model incorporating measurement error (Exponential-Exponential model)
Table 21. Estimated result of the rates of disease progression using sex and age group as covariates: continuous-time Markov model incorporating measurement error (Weibull-lognormal model)
Table 22. Effect of age group and sex on measurement error
Table 23. The ratio of the conversion rates versus infection rate by age groups and years of follow up (Weibull-lognormal model adjusted for measurement error)

Appendix
Appendix A: Surveillance of Influenza from Household to Community, published article
Appendix B: Analysis of Household Data on Influenza Epidemic with Bayesian Hierarchical Model, article under revision

File size: 5677100 bytes
Format: application/pdf
Thesis access rights: paid licensing agreed (royalties returned to the university)
Keywords: infectious disease; generalized linear model; stochastic process; multi-state Markov model; influenza; tuberculosis; [SDGs]SDG3
Title: 廣義線性隨機過程於傳染病之應用 (Generalized Linear Stochastic Process with Applications to Infectious Diseases)
Type: thesis
File: http://ntur.lib.ntu.edu.tw/bitstream/246246/262365/1/ntu-103-D98842014-1.pdf