
Tuesday, December 18, 2018

Bayesian Inference

Biostatistics (2010), 11, 3, pp. 397-412
doi:10.1093/biostatistics/kxp053
Advance Access publication on December 4, 2009
Downloaded from http://biostatistics.oxfordjournals.org/ at Cornell University Library on April 20, 2013

Bayesian inference for generalized linear mixed models

YOUYI FONG
Department of Biostatistics, University of Washington, Seattle, WA 98112, USA

HAVARD RUE
Department of Mathematical Sciences, The Norwegian University for Science and Technology, N-7491 Trondheim, Norway

JON WAKEFIELD*
Departments of Statistics and Biostatistics, University of Washington, Seattle, WA 98112, USA
[email protected] ashington. edu

SUMMARY

Generalized linear mixed models (GLMMs) continue to grow in popularity due to their ability to directly acknowledge multiple levels of dependency and model different data types. For small sample sizes especially, likelihood-based inference can be unreliable, with variance components being particularly difficult to estimate. A Bayesian approach is appealing but has been hampered by the lack of a fast implementation and by the difficulty of specifying prior distributions, with variance components again being particularly problematic. Here, we briefly review previous approaches to computation in Bayesian implementations of GLMMs and illustrate in detail the use of integrated nested Laplace approximations in this context. We consider a number of examples, carefully specifying prior distributions on meaningful quantities in each case.
The examples cover a wide range of data types including those requiring smoothing over time and a relatively complicated spline model for which we examine our prior specification in terms of the implied degrees of freedom. We conclude that Bayesian inference is now practically feasible for GLMMs and provides an attractive alternative to likelihood-based approaches such as penalized quasi-likelihood. As with likelihood-based approaches, great care is required in the analysis of clustered binary data since approximation strategies may be less accurate for such data.

Keywords: Integrated nested Laplace approximations; Longitudinal data; Penalized quasi-likelihood; Prior specification; Spline models.

* To whom correspondence should be addressed.
(c) The Author 2009. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals. [email protected] rg.

1. INTRODUCTION

Generalized linear mixed models (GLMMs) combine a generalized linear model with normal random effects on the linear predictor scale, to give a rich family of models that have been used in a wide variety of applications (see, e.g. Diggle and others, 2002; Verbeke and Molenberghs, 2000, 2005; McCulloch and others, 2008). This flexibility comes at a price, however, in terms of analytical tractability, which has a number of implications including computational complexity and an unknown degree to which inference is dependent on modeling assumptions. Likelihood-based inference may be carried out relatively easily within many software platforms (except perhaps for binary responses), but inference is dependent on asymptotic sampling distributions of estimators, with few guidelines available as to when such theory will produce accurate inference.
A Bayesian approach is attractive, but requires the specification of prior distributions, which is not straightforward, in particular for variance components. Computation is also an issue since the usual implementation is via Markov chain Monte Carlo (MCMC), which carries a large computational overhead. The seminal article of Breslow and Clayton (1993) helped to popularize GLMMs and placed an emphasis on likelihood-based inference via penalized quasi-likelihood (PQL). It is the aim of this article to describe, through a series of examples (including all of those considered in Breslow and Clayton, 1993), how Bayesian inference may be performed with computation via a fast implementation and with guidance on prior specification.

The structure of this article is as follows. In Section 2, we define notation for the GLMM, and in Section 3, we describe the integrated nested Laplace approximation (INLA) that has recently been proposed as a computationally convenient alternative to MCMC. Section 4 gives a number of prescriptions for prior specification. Three examples are considered in Section 5 (with additional examples being reported in the supplementary material available at Biostatistics online, along with a simulation study that reports the performance of INLA in the binary response situation). We conclude the paper with a discussion in Section 6.

2. THE GENERALIZED LINEAR MIXED MODEL

GLMMs extend the generalized linear model, as proposed by Nelder and Wedderburn (1972) and comprehensively described in McCullagh and Nelder (1989), by adding normally distributed random effects on the linear predictor scale. Suppose $Y_{ij}$ is of exponential family form: $Y_{ij} \mid \theta_{ij}, \phi_1 \sim p(\cdot)$, where $p(\cdot)$ is a member of the exponential family, that is,

$$p(y_{ij} \mid \theta_{ij}, \phi_1) = \exp\left[ \frac{y_{ij}\theta_{ij} - b(\theta_{ij})}{a(\phi_1)} + c(y_{ij}, \phi_1) \right],$$

for $i = 1, \dots, m$ units (clusters) and $j = 1, \dots, n_i$ measurements per unit, and where $\theta_{ij}$ is the (scalar) canonical parameter. Let $\mu_{ij} = E[Y_{ij} \mid \beta, b_i, \phi_1] = b'(\theta_{ij})$ with

$$g(\mu_{ij}) = \eta_{ij} = x_{ij}\beta + z_{ij} b_i,$$

where $g(\cdot)$ is a monotonic "link" function, $x_{ij}$ is $1 \times p$, and $z_{ij}$ is $1 \times q$, with $\beta$ a $p \times 1$ vector of fixed effects and $b_i$ a $q \times 1$ vector of random effects, hence $\theta_{ij} = \theta_{ij}(\beta, b_i)$. Assume $b_i \mid Q \sim N(0, Q^{-1})$, where the precision matrix $Q = Q(\phi_2)$ depends on parameters $\phi_2$. For some choices of model, the matrix $Q$ is singular; examples include random walk models (as considered in Section 5.2) and intrinsic conditional autoregressive models. We further assume that $\beta$ is assigned a normal prior distribution. Let $\gamma = (\beta, b)$ denote the $G \times 1$ vector of parameters assigned Gaussian priors. We also require priors for $\phi_1$ (if not a constant) and for $\phi_2$. Let $\phi = (\phi_1, \phi_2)$ be the variance components for which non-Gaussian priors are assigned, with $V = \dim(\phi)$.

3. INTEGRATED NESTED LAPLACE APPROXIMATION

Before the MCMC revolution, there were few examples of the applications of Bayesian GLMMs since, outside of the linear mixed model, the models are analytically intractable. Kass and Steffey (1989) describe the use of Laplace approximations in Bayesian hierarchical models, while Skene and Wakefield (1990) used numerical integration in the context of a binary GLMM. The use of MCMC for GLMMs is particularly appealing since the conditional independencies of the model may be exploited when the required conditional distributions are calculated.
Zeger and Karim (1991) described approximate Gibbs sampling for GLMMs, with nonstandard conditional distributions being approximated by normal distributions. More general Metropolis-Hastings algorithms are straightforward to construct (see, e.g. Clayton, 1996; Gamerman, 1997). The winBUGS (Spiegelhalter, Thomas, and Best, 1998) software example manuals contain many GLMM examples. There are now a variety of additional software platforms for fitting GLMMs via MCMC including JAGS (Plummer, 2009) and BayesX (Fahrmeir and others, 2004). A large practical impediment to data analysis using MCMC is the large computational burden. For this reason, we now briefly review the INLA computational approach upon which we concentrate. The method combines Laplace approximations and numerical integration in a very efficient manner (see Rue and others, 2009, for a more extensive treatment). For the GLMM described in Section 2, the posterior is given by

$$\pi(\gamma, \phi \mid y) \propto \pi(\gamma \mid \phi)\,\pi(\phi) \prod_{i=1}^{m} p(y_i \mid \gamma, \phi) \propto \pi(\phi)\,\pi(\beta)\,|Q(\phi_2)|^{1/2} \exp\left\{ -\frac{1}{2} b^T Q(\phi_2) b + \sum_{i=1}^{m} \log p(y_i \mid \gamma, \phi_1) \right\},$$

where $y_i = (y_{i1}, \dots, y_{in_i})$ is the vector of observations on unit/cluster $i$. We wish to obtain the posterior marginals $\pi(\gamma_g \mid y)$, $g = 1, \dots, G$, and $\pi(\phi_v \mid y)$, $v = 1, \dots, V$. The number of variance components, $V$, should not be too large for accurate inference (since these components are integrated out via Cartesian product numerical integration, which does not scale well with dimension). We write

$$\pi(\gamma_g \mid y) = \int \pi(\gamma_g \mid \phi, y)\,\pi(\phi \mid y)\,d\phi,$$

which may be evaluated via the approximation

$$\tilde{\pi}(\gamma_g \mid y) = \int \tilde{\pi}(\gamma_g \mid \phi, y)\,\tilde{\pi}(\phi \mid y)\,d\phi \approx \sum_{k=1}^{K} \tilde{\pi}(\gamma_g \mid \phi^k, y)\,\tilde{\pi}(\phi^k \mid y)\,\Delta_k, \qquad (3.1)$$

where Laplace (or other related analytical approximations) are applied to carry out the integrations required
for evaluation of $\tilde{\pi}(\gamma_g \mid \phi, y)$. To produce the grid of points $\{\phi^k, k = 1, \dots, K\}$ over which numerical integration is performed, the mode of $\tilde{\pi}(\phi \mid y)$ is located, and the Hessian is approximated, from which the grid is created and exploited in (3.1). The output of INLA consists of posterior marginal distributions, which can be summarized via means, variances, and quantiles. Importantly for model comparison, the normalizing constant $p(y)$ is calculated. The evaluation of this quantity is not straightforward using MCMC (DiCiccio and others, 1997; Meng and Wong, 1996). The deviance information criterion (Spiegelhalter, Best, and others, 1998) is popular as a model selection tool, but in random-effects models, the implicit approximation in its use is valid only when the effective number of parameters is much smaller than the number of independent observations (see Plummer, 2008).

4. PRIOR DISTRIBUTIONS

4.1 Fixed effects

Recall that we assume $\beta$ is normally distributed. Often there will be sufficient information in the data for $\beta$ to be well estimated with a normal prior with a large variance (of course there will be circumstances under which we would like to specify more informative priors, e.g. when there are many correlated covariates). The use of an improper prior for $\beta$ will often lead to a proper posterior though care should be taken. For example, Wakefield (2007) shows that a Poisson likelihood with a linear link can lead to an improper posterior if an improper prior is used. Hobert and Casella (1996) discuss the use of improper priors in linear mixed effects models.

If we wish to use informative priors, we may specify independent normal priors with the parameters for each component being obtained via specification of 2 quantiles with associated probabilities.
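The two-quantile specification amounts to solving two linear equations, $\mu + z_k \sigma = \log\theta_k$ for $k = 1, 2$, where $\theta_k$ is a stated quantile of $\exp(\beta)$ and $z_k$ is the matching standard normal quantile. The following is a minimal Python sketch of that calculation using only the standard library; the function name `normal_prior_from_quantiles` is illustrative and does not come from the article or from the INLA software.

```python
from math import log
from statistics import NormalDist


def normal_prior_from_quantiles(theta1, p1, theta2, p2):
    """Return (mu, sigma) of a normal prior on beta such that exp(beta)
    has p1-quantile theta1 and p2-quantile theta2 (hypothetical helper)."""
    z1 = NormalDist().inv_cdf(p1)  # standard normal quantile for p1
    z2 = NormalDist().inv_cdf(p2)  # standard normal quantile for p2
    sigma = (log(theta2) - log(theta1)) / (z2 - z1)
    mu = log(theta1) - z1 * sigma
    return mu, sigma


# Relative risk with median 1 and 95% point 3.
mu, sigma = normal_prior_from_quantiles(1.0, 0.5, 3.0, 0.95)
print(round(mu, 3), round(sigma, 3))  # prints 0.0 0.668
```

A symmetric specification, such as a rate ratio lying in [0.1, 10] with probability 0.95, is handled by passing the 2.5% and 97.5% points, which gives a prior centered at zero.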
For logistic and log-linear models, these quantiles may be given on the exponentiated scale since these are more interpretable (as the odds ratio and rate ratio, respectively). If $\theta_1$ and $\theta_2$ are the quantiles on the exponentiated scale and $p_1$ and $p_2$ are the associated probabilities, then the parameters of the normal prior are given by

$$\mu = \frac{z_2 \log(\theta_1) - z_1 \log(\theta_2)}{z_2 - z_1}, \qquad \sigma = \frac{\log(\theta_2) - \log(\theta_1)}{z_2 - z_1},$$

where $z_1$ and $z_2$ are the $p_1$ and $p_2$ quantiles of a standard normal random variable. For example, in an epidemiological context, we may wish to specify a prior on a relative risk parameter, $\exp(\beta_1)$, which has a median of 1 and a 95% point of 3 (if we think it is unlikely that the relative risk associated with a unit increase in exposure exceeds 3). These specifications lead to $\beta_1 \sim N(0, 0.668^2)$.

4.2 Variance components

We begin by describing an approach for choosing a prior for a single random effect, based on Wakefield (2009). The basic idea is to specify a range for the more interpretable marginal distribution of $b_i$ and use this to drive specification of prior parameters. We state a trivial lemma upon which prior specification is based, but first define some notation. We write $\tau \sim \mathrm{Ga}(a_1, a_2)$ for the gamma distribution with unnormalized density $\tau^{a_1 - 1} \exp(-a_2 \tau)$. For $q$-dimensional $x$, we write $x \sim T_q(\mu, \Omega, d)$ for the Student's t distribution with unnormalized density $[1 + (x - \mu)^T \Omega^{-1} (x - \mu)/d]^{-(d+q)/2}$. This distribution has location $\mu$, scale matrix $\Omega$, and degrees of freedom $d$.

LEMMA 1 Let $b \mid \tau \sim N(0, \tau^{-1})$ and $\tau \sim \mathrm{Ga}(a_1, a_2)$. Integration over $\tau$ gives the marginal distribution of $b$ as $T_1(0, a_2/a_1, 2a_1)$.

To decide upon a prior, we give a range for a generic random effect $b$ and specify the degrees of freedom, $d$, and then solve for $a_1$ and $a_2$. For the range $(-R, R)$, we use the relationship $\pm t^d_{1-(1-q)/2} \sqrt{a_2/a_1} = \pm R$, where $t^d_q$ is the $100 \times q$th quantile of a Student t random variable with $d$ degrees of freedom, to give $a_1 = d/2$ and $a_2 = R^2 d / \{2 (t^d_{1-(1-q)/2})^2\}$. In the linear mixed effects model, $b$ is directly interpretable, while for binomial or Poisson models, it is more appropriate to think in terms of the marginal distribution of $\exp(b)$, the residual odds and rate ratio, respectively, and this distribution is log Student's t. For example, if we choose $d = 1$ (to give a Cauchy marginal) and a 95% range of [0.1, 10], we take $R = \log 10$ and obtain $a_1 = 0.5$ and $a_2 = 0.0164$.

Another convenient choice is $d = 2$ to give the exponential distribution with mean $a_2^{-1}$ for $\sigma^{-2}$. This leads to closed-form expressions for the more interpretable quantiles of $\sigma$ so that, for example, if we specify the median for $\sigma$ as $\sigma_m$, we obtain $a_2 = \sigma_m^2 \log 2$.

Unfortunately, the use of $\mathrm{Ga}(\epsilon, \epsilon)$ priors has become popular as a prior for $\sigma^{-2}$ in a GLMM context, arising from their use in the winBUGS examples manual. As has been pointed out many times (e.g. Kelsall and Wakefield, 1999; Gelman, 2006; Crainiceanu and others, 2008), this choice places the majority of the prior mass away from zero and leads to a marginal prior for the random effects which is Student's t with $2\epsilon$ degrees of freedom (so that the tails are much heavier than even a Cauchy) and difficult to justify in any practical setting.

We now specify another trivial lemma, but first establish notation for the Wishart distribution. For the $q \times q$ nonsingular matrix $z$, we write $z \sim \mathrm{Wishart}_q(r, S)$ for the Wishart distribution with unnormalized density $|z|^{(r-q-1)/2} \exp\{-\frac{1}{2} \mathrm{tr}(z S^{-1})\}$. This distribution has $E[z] = rS$ and $E[z^{-1}] = S^{-1}/(r - q - 1)$, and we require $r > q - 1$ for a proper distribution.

Lemma: Let $b = (b_1, \dots, b_q)$, with $b \mid Q \sim_{iid} N_q(0, Q^{-1})$, $Q \sim \mathrm{Wishart}_q(r, S)$. Integration over $Q$ gives the marginal distribution of $b$ as $T_q(0, [(r - q + 1)S]^{-1}, r - q + 1)$.

The margins of a multivariate Student's t are t also, which allows $r$ and $S$ to be chosen as in the univariate case. Specifically, the $k$th element of a generic random effect, $b_k$, follows a univariate Student t distribution with location 0, scale $S^{kk}/(r - q + 1)$, and degrees of freedom $d = r - q + 1$, where $S^{kk}$ is element $(k, k)$ of the inverse of $S$. We obtain $r = d + q - 1$ and $S^{kk} = d R^2 / (t^d_{1-(1-q)/2})^2$. If a priori we have no reason to believe that elements of $b$ are correlated we may specify $S_{jk} = 0$ for $j \neq k$ and $S_{kk} = 1/S^{kk} = (t^d_{1-(1-q)/2})^2/(d R^2)$, to recover the univariate specification, recognizing that with $q = 1$, the univariate Wishart has parameters $a_1 = r/2$ and $a_2 = 1/(2S)$. If we believe that elements of $b$ are dependent then we may specify the correlations and solve for the off-diagonal elements of $S$. To ensure propriety of the posterior, proper priors are required for the random-effects covariance matrix $\Sigma$; Zeger and Karim (1991) use an improper prior for $\Sigma$, so that the posterior is improper also.

4.3 Effective degrees of freedom variance components prior

In Section 5.3, we describe the GLMM representation of a spline model. A generic linear spline model is given by

$$y_i = x_i \beta + \sum_{k=1}^{K} z_{ik} b_k + \epsilon_i,$$

where $x_i$ is a $p \times 1$ vector of covariates with $p \times 1$ associated fixed effects $\beta$, $z_{ik}$ denote the spline basis, $b_k \sim_{iid} N(0, \sigma_b^2)$, and $\epsilon_i \sim_{iid} N(0, \sigma_\epsilon^2)$, with $b_k$ and $\epsilon_i$ independent. Specification of a prior for $\sigma_b^2$ is not straightforward, but may be of great importance since it contributes to determining the amount of smoothing that is applied. Ruppert and others (2003, p.
77) raise concerns, "about the instability of automatic smoothing parameter selection even for single predictor models", and continue, "Although we are attracted by the automatic nature of the mixed model-REML approach to fitting additive models, we discourage blind acceptance of whatever answer it provides and recommend looking at other amounts of smoothing". While we would echo this general advice, we believe that a Bayesian mixed model approach, with carefully chosen priors, can increase the stability of the mixed model representation. There has been some discussion of choice of prior for $\sigma_b^2$ in a spline context (Crainiceanu and others, 2005, 2008). More general discussion can be found in Natarajan and Kass (2000) and Gelman (2006).

In practice (e.g. Hastie and Tibshirani, 1990), smoothers are often applied with a fixed degrees of freedom. We extend this rationale by examining the prior degrees of freedom that is implied by the choice $\sigma_b^{-2} \sim \mathrm{Ga}(a_1, a_2)$. For the general linear mixed model $y = x\beta + zb + \epsilon$, we have

$$\hat{y} = x\hat{\beta} + z\hat{b} = C (C^T C + \Lambda)^{-1} C^T y,$$

where $C = [x \mid z]$ is $n \times (p + K)$ and

$$\Lambda = \begin{bmatrix} 0_{p \times p} & 0_{p \times K} \\ 0_{K \times p} & \sigma_\epsilon^2 \, \mathrm{cov}(b)^{-1} \end{bmatrix}$$

(see, e.g. Ruppert and others, 2003, Section 8.3). The total degrees of freedom associated with the model is

$$\mathrm{df} = \mathrm{tr}\{(C^T C + \Lambda)^{-1} C^T C\},$$

which may be decomposed into the degrees of freedom associated with $\beta$ and $b$, and extends easily to situations in which we have additional random effects, beyond those associated with the spline basis (such an example is considered in Section 5.3). In each of these situations, the degrees of freedom associated with the respective parameter is obtained by summing the appropriate diagonal elements of $(C^T C + \Lambda)^{-1} C^T C$. Specifically, if we have $j = 1, \dots, d$ sets of random-effect parameters (there are $d = 2$ in the model considered in Section 5.3) then let $E_j$ be the $(p + K) \times (p + K)$ diagonal matrix with ones in the diagonal positions corresponding to set $j$. Then the degrees of freedom associated with this set is

$$\mathrm{df}_j = \mathrm{tr}\{E_j (C^T C + \Lambda)^{-1} C^T C\}.$$

Note that the effective degrees of freedom changes as a function of $K$, as expected. To evaluate $\Lambda$, $\sigma_\epsilon^2$ is required. If we specify a proper prior for $\sigma_\epsilon^2$, then we may specify the joint prior as $\pi(\sigma_b^2, \sigma_\epsilon^2) = \pi(\sigma_\epsilon^2)\,\pi(\sigma_b^2 \mid \sigma_\epsilon^2)$. Often, however, we assume the improper prior $\pi(\sigma_\epsilon^2) \propto 1/\sigma_\epsilon^2$ since the data provide sufficient information with respect to $\sigma_\epsilon^2$. Hence, we have found the substitution of an estimate for $\sigma_\epsilon^2$ (for example, from the fitting of a spline model in a likelihood implementation) to be a practically reasonable strategy.

As a simple nonspline demonstration of the derived effective degrees of freedom, consider a 1-way analysis of variance model

$$Y_{ij} = \beta_0 + b_i + \epsilon_{ij}$$

with $b_i \sim_{iid} N(0, \sigma_b^2)$, $\epsilon_{ij} \sim_{iid} N(0, \sigma_\epsilon^2)$ for $i = 1, \dots, m = 10$ groups and $j = 1, \dots, n = 5$ observations per group. For illustration, we assume $\sigma_b^{-2} \sim \mathrm{Ga}(0.5, 0.005)$. Figure 1 displays the prior distribution for $\sigma_b$, the implied prior distribution on the effective degrees of freedom, and the bivariate plot of these quantities. For clarity of plotting, we exclude a small number of points beyond $\sigma_b > 2.5$ (4% of points). In panel (c), we have placed dashed horizontal lines at effective degrees of freedom equal to 1 (complete smoothing) and 10 (no smoothing). From panel (b), we conclude that here the prior choice favors quite strong smoothing. This may be contrasted with the gamma prior with parameters (0.001, 0.001), which, in this example, gives greater than 99% of the prior mass on an effective degrees of freedom greater than 9.9, again showing the inappropriateness of this prior.
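The 1-way analysis of variance demonstration can be sketched numerically. The Python code below, which assumes NumPy is available, builds $C = [1 \mid z]$ for $m = 10$ groups of $n = 5$ observations, evaluates $\mathrm{tr}\{(C^T C + \Lambda)^{-1} C^T C\}$ directly, and compares it with the closed form $1 + (m - 1)\,n/(n + \sigma_\epsilon^2/\sigma_b^2)$ that holds for this balanced design. The function names are illustrative, and $\sigma_\epsilon^2 = 1$ is an assumption made here for the sketch, since the text does not state the value used for Figure 1.

```python
import numpy as np


def effective_df(C, Lam):
    """Total effective degrees of freedom, tr{(C'C + Lam)^{-1} C'C}."""
    CtC = C.T @ C
    return float(np.trace(np.linalg.solve(CtC + Lam, CtC)))


def anova_df(tau_b, m=10, n=5, sigma2_eps=1.0):
    """Effective df of the 1-way ANOVA model for precision tau_b = 1/sigma_b^2.
    sigma2_eps = 1 is an assumed value, not taken from the article."""
    ones = np.ones((m * n, 1))                # intercept column
    Z = np.kron(np.eye(m), np.ones((n, 1)))   # group indicator columns
    C = np.hstack([ones, Z])
    Lam = np.zeros((m + 1, m + 1))            # no penalty on the intercept
    Lam[1:, 1:] = sigma2_eps * tau_b * np.eye(m)
    return effective_df(C, Lam)


def anova_df_closed_form(tau_b, m=10, n=5, sigma2_eps=1.0):
    """Closed form for the balanced design: 1 + (m - 1) * n / (n + lambda)."""
    lam = sigma2_eps * np.asarray(tau_b)
    return 1.0 + (m - 1) * n / (n + lam)


# Implied prior on the effective df under sigma_b^{-2} ~ Ga(0.5, 0.005).
rng = np.random.default_rng(0)
tau = rng.gamma(shape=0.5, scale=1.0 / 0.005, size=5000)  # rate 0.005
df = anova_df_closed_form(tau)
print(np.quantile(df, [0.05, 0.5, 0.95]))
```

The limits behave as described in the text: a very small precision gives a value near 10 (no smoothing) and a very large precision gives a value near 1 (complete smoothing). Swapping in shape and rate 0.001 for the gamma draw gives the corresponding summary for the $\mathrm{Ga}(0.001, 0.001)$ prior.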
It is appealing to extend the above argument to nonlinear models but unfortunately this is not straightforward. For a nonlinear model, the degrees of freedom may be approximated by

$$\mathrm{df} = \mathrm{tr}\{(C^T W C + \Lambda)^{-1} C^T W C\},$$

where $W = \mathrm{diag}\left\{ V_i^{-1} \left( \frac{d\mu_i}{dh} \right)^2 \right\}$ and $h = g^{-1}$ denotes the inverse link function. Unfortunately, this quantity depends on $\beta$ and $b$, which means that in practice, we would have to use prior estimates for all of the parameters, which may not be practically possible. Fitting the model using likelihood and then substituting in estimates for $\beta$ and $b$ seems philosophically dubious.

Fig. 1. Gamma prior for $\sigma^{-2}$ with parameters 0.5 and 0.005, (a) implied prior for $\sigma$, (b) implied prior for the effective degrees of freedom, and (c) effective degrees of freedom versus $\sigma$.

4.4 Random walk models

Conditionally represented smoothing models are popular for random effects in both temporal and spatial applications (see, e.g. Besag and others, 1995; Rue and Held, 2005). For illustration, consider models of the form

$$p(u \mid \sigma_u^2) = (2\pi)^{-(m-r)/2} \, |Q|^{1/2} \, \sigma_u^{-(m-r)} \exp\left( -\frac{1}{2\sigma_u^2} u^T Q u \right), \qquad (4.1)$$

where $u = (u_1, \dots, u_m)$ is the collection of random effects, $Q$ is a (scaled) "precision" matrix of rank $m - r$, whose form is determined by the application at hand, and $|Q|$ is a generalized determinant which is the product over the $m - r$ nonzero eigenvalues of $Q$. Picking a prior for $\sigma_u$ is not straightforward because $\sigma_u$ has an interpretation as the conditional standard deviation, where the elements that are conditioned upon depend on the application. We may simulate realizations from (4.1) to examine candidate prior distributions. Due to the rank deficiency, (4.1) does not define a probability density, and so we cannot directly simulate from this prior.
However, Rue and Held (2005) give an algorithm for generating samples from (4.1):

1. Simulate $z_j \sim N(0, \lambda_j^{-1})$, for $j = r + 1, \dots, m$, where $\lambda_j$ are the nonzero eigenvalues of $Q$ (there are $m - r$ nonzero eigenvalues as $Q$ has rank $m - r$).
2. Return $u = z_{r+1} e_{r+1} + z_{r+2} e_{r+2} + \cdots + z_m e_m = E z$, where $e_j$ are the corresponding eigenvectors of $Q$, $E$ is the $m \times (m - r)$ matrix with these eigenvectors as columns, and $z$ is the $(m - r) \times 1$ vector containing $z_j$, $j = r + 1, \dots, m$.

The simulation algorithm is conditioned so that samples are zero in the null-space of $Q$; if $u$ is a sample and the null-space is spanned by $v_1$ and $v_2$, then $u^T v_1 = u^T v_2 = 0$. For example, suppose $Q 1 = 0$ so that the null-space is spanned by $1$, and the rank deficiency is 1. Then $Q$ is improper since the eigenvalue corresponding to $1$ is zero, and samples $u$ produced by the algorithm are such that $u^T 1 = 0$. In Section 5.2, we use this algorithm to evaluate different priors via simulation. It is also useful to note that if we wish to compute the marginal variances only, simulation is not required, as they are available as the diagonal elements of the matrix $\sum_j \lambda_j^{-1} e_j e_j^T$.

5. EXAMPLES

Here, we report 3 examples, with 4 others described in the supplementary material available at Biostatistics online. Together these cover all the examples in Breslow and Clayton (1993), along with an additional spline example. In the first example, results using the INLA numerical/analytical approximation described in Section 3 were compared with MCMC as implemented in the JAGS software (Plummer, 2009) and found to be accurate. For the models considered in the second and third examples, the approximation was compared with the MCMC implementation contained in the INLA software.

5.1 Longitudinal data

We consider the much analyzed epilepsy data set of Thall and Vail (1990). These data concern the number of seizures, $Y_{ij}$, for patient $i$ on visit $j$, with $Y_{ij} \mid \beta, b_i \sim_{ind} \mathrm{Poisson}(\mu_{ij})$, $i = 1, \dots, 59$, $j = 1, \dots, 4$. We concentrate on the 3 random-effects models fitted by Breslow and Clayton (1993):

$$\log \mu_{ij} = x_{ij}\beta + b_{1i}, \qquad (5.1)$$
$$\log \mu_{ij} = x_{ij}\beta + b_{1i} + b_{0ij}, \qquad (5.2)$$
$$\log \mu_{ij} = x_{ij}\beta + b_{1i} + b_{2i} V_j/10, \qquad (5.3)$$

where $x_{ij}$ is a $1 \times 6$ vector containing a 1 (representing the intercept), an indicator for baseline measurement, a treatment indicator, the baseline by treatment interaction, which is the parameter of interest, age, and either an indicator of the fourth visit (models (5.1) and (5.2) and denoted $V_4$) or visit number coded $-3, -1, +1, +3$ (model (5.3) and denoted $V_j/10$) and $\beta$ is the associated fixed effect. All 3 models include patient-specific random effects $b_{1i} \sim N(0, \sigma_1^2)$, while in model (5.2), we introduce independent "measurement errors," $b_{0ij} \sim N(0, \sigma_0^2)$. Model (5.3) includes random effects on the slope associated with visit, $b_{2i}$, with

$$\begin{bmatrix} b_{1i} \\ b_{2i} \end{bmatrix} \sim N(0, Q^{-1}). \qquad (5.4)$$

Table 1. PQL and INLA summaries for the epilepsy data

Variable      Model (5.1)                  Model (5.2)                  Model (5.3)
              PQL           INLA           PQL           INLA           PQL           INLA
Base          0.87 ± 0.14   0.88 ± 0.15    0.86 ± 0.13   0.8 ± 0.15     0.87 ± 0.14   0.88 ± 0.14
Trt           -0.91 ± 0.41  -0.94 ± 0.44   -0.93 ± 0.40  -0.96 ± 0.44   -0.91 ± 0.41  -0.94 ± 0.44
Base x Trt    0.33 ± 0.21   0.34 ± 0.22    0.34 ± 0.21   0.35 ± 0.23    0.33 ± 0.21   0.34 ± 0.22
Age           0.47 ± 0.36   0.47 ± 0.38    0.47 ± 0.35   0.48 ± 0.39    0.46 ± 0.36   0.47 ± 0.38
V4 or V/10    -0.16 ± 0.05  -0.16 ± 0.05   -0.10 ± 0.09  -0.10 ± 0.09   -0.26 ± 0.16  -0.27 ± 0.16
sigma_0       -             -              0.36 ± 0.04   0.41 ± 0.04    -             -
sigma_1       0.53 ± 0.06   0.56 ± 0.08    0.48 ± 0.06   0.53 ± 0.07    0.52 ± 0.06   0.56 ± 0.06
sigma_2       -             -              -             -              0.74 ± 0.16   0.70 ± 0.14

We assume $Q \sim \mathrm{Wishart}(r, S)$ with $S = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}$. For prior specification, we begin with the bivariate model and assume that $S$ is diagonal. We assume the upper 95% point of the priors for $\exp(b_{1i})$ and $\exp(b_{2i})$ are 5 and 4, respectively, and that the marginal distributions are t with 4 degrees of freedom. Following the procedure outlined in Section 4.2, we obtain $r = 5$ and $S = \mathrm{diag}(0.439, 0.591)$. We take the prior for $\sigma_1^{-2}$ in model (5.1) to be $\mathrm{Ga}(a_1, a_2)$ with $a_1 = (r - 1)/2 = 2$ and $a_2 = 1/(2 S_{11}) = 1.140$ (so that this prior coincides with the marginal prior obtained from the bivariate specification). In model (5.2), we assume $b_{1i}$ and $b_{0ij}$ are independent, and that $\sigma_0^{-2}$ follows the same prior as $\sigma_1^{-2}$, that is, $\mathrm{Ga}(2, 1.140)$. We assume a flat prior on the intercept, and assume that the rate ratios, $\exp(\beta_j)$, $j = 1, \dots, 5$, lie between 0.1 and 10 with probability 0.95 which gives, using the approach described in Section 4.1, a normal prior with mean 0 and variance $1.17^2$.

Table 1 gives PQL and INLA summaries for models (5.1)-(5.3). There are some differences between the PQL and Bayesian analyses, with slightly larger standard deviations under the latter, which probably reflects that with $m = 59$ clusters, a little accuracy is lost when using asymptotic inference. There are some differences in the point estimates, which is at least partly due to the nonflat priors used: the priors have relatively large variances, but here the data are not so abundant so there is sensitivity to the prior. Reassuringly, under all 3 models inference for the baseline-treatment interaction of interest is virtually identical and suggests no significant treatment effect. We may compare models using $\log p(y)$: for the 3 models, we obtain values of $-674.8$, $-638.9$, and $-665.5$, so that the second model is strongly preferred.

5.2 Smoothing of birth cohort effects in an age-cohort model

We analyze data from Breslow and Day (1975) on breast cancer rates in Iceland.
Let Y jk be the number of breast cancer of cases in age group j (20â€24,. . . , 80â€84) and birth cohort k (1840â€1849,. . . ,1940â€1949) with j = 1, . . . , J = 13 and k = 1, . . . , K = 11. Following Breslow and Clayton (1993), we assume Y jk |? jk ? ind Poisson(? jk ) with log ? jk = log n jk + ? j + ? k + vk + u k (5. 5) and where n jk is the person-years denominator, exp(? j ), j = 1, . . . , J , represent fixed effects for age relative risks, exp(? is the relative risk associated with a one group increase in cohort group, vk ? iid 406 Y. F ONG AND OTHERS 2 N (0, ? v ) represent unstructured random effects associated with cohort k, with smoo th cohort terms u k following a second-order random-effects model with E[u k |{u i : i < k}] = 2u k? 1 ? u k? 2 and Var(u k |{u i : 2 i < k}) = ? u . This latter model is to allow the judge to vary smoothly with cohort. An equivalent representation of this model is, for 2 < k < K ? 1, 1 E[u k |{u l : l = k}] = (4u k? 1 + 4u k+1 ? u k? 2 ? u k+2 ), 6 Var(u k |{u l : l = k}) = 2 ? . 6 Downloaded from http://biostatistics. oxfordjournals. org/ at Cornell University Library on April 20, 2013 The rank of Q in the (4. 1) representation of this model is K ? 2 reflecting that both the general level and the overall trend are aliased (hence the appearance of ? in (5. 5)). The term exp(vk ) reflects the unstructured residual relative risk and, following the argument in Section 4. 2, we specify that this quantity should lie in [0. 5, 2. 0] with probability 0. 95, with a marginal log Cauchy ? 2 distribution, to obtain the gamma prior ? v ? Ga(0. 5, 0. 00149).The term exp(u k ) refl ects the smooth component of the residual relative risk, and the specification of a 2 prior for the associated variance component ? u is more difficult, given its conditional interpretation. Using the algorithm described in Section 4. 2, we examined simulations of u for different choices of gamma ? 2 hyperparameters and decided on the choice ? u ? Ga(0. 
5, 0. 001); Figure 2 shows 10 realizations from the prior. The rationale here is to examine realizations to see if they conform to our prior expectations and in particular exhibit the required amount of smoothing.All but one of the realizations vary smoothly across the 11 cohorts, as is desirable. Due to the tail of the gamma distribution, we will always have some extreme realizations. The INLA results, summarized in graphical form, are presented in Figure 2(b), on board likelihood fits in which the birth cohort effect is incorporated as a linear term and as a factor. We see that the smoothing model provides a smooth fit in birth coh ort, as we would hope. 5. 3 B-Spline nonparametric regression We demonstrate the use of INLA for nonparametric smoothing using O’Sullivan splines, which are based on a B-spline basis.We illustrate using data from Bachrach and others (1999) that concerns longitudinal measurements of spinal bone mineral density (SBMD) on 230 female subjects aged between 8 and 27, and of 1 of 4 ethnic groups: Asiatic, Black, Latino, and White. Let yi j denote the SBMD measure for subject i at occasion j, for i = 1, . . . , 230 and j = 1, . . . , n i with n i being between 1 and 4. Figure 3 shows these data, with the gray lines indicating measurements on the same woman. We assume the model K Yi j = x i ? 1 + agei j ? 2 + k=1 z i jk b1k + b2i + ij, where x i is a 1 ? vector containing an indicator for the ethnicity of individual i, with ? 1 the associated 4 ? 1 vector of fixed effects, z i jk is the kth basis associated with age, with associated parameter b1k ? 2 2 N (0, ? 1 ), and b2i ? N (0, ? 2 ) are woman-specific random effects, finally, i j ? iid N (0, ? 2 ). All random terms are faux independent. Note that the spline model is assumed common to all ethnic groups and all women, though it would be straightforward to allow a different spline for each ethnicity. Writing this model in the form y = x ? + z 1b1 + z 2b 2 + = C ? + . 
we use the method described in Section 4.3 to examine the effective number of parameters implied by the priors $\sigma_1^{-2} \sim \mathrm{Ga}(a_1, a_2)$ and $\sigma_2^{-2} \sim \mathrm{Ga}(a_3, a_4)$. To fit the model, we first use the R code provided in Wand and Ormerod (2008) to construct the basis functions, which are then input to the INLA program. Running the REML version of the model, we obtain $\hat{\sigma}_\epsilon = 0.033$, which we use to evaluate the effective degrees of freedom associated with priors for $\sigma_1^{-2}$ and $\sigma_2^{-2}$. We assume the usual improper prior, $\pi(\sigma_\epsilon^2) \propto 1/\sigma_\epsilon^2$, for $\sigma_\epsilon^2$. After some experimentation, we settled on the prior $\sigma_1^{-2} \sim \mathrm{Ga}(0.5, 5 \times 10^{-6})$. For $\sigma_2^{-2}$, we wished to have a 90% interval for $b_{2i}$ of ±0.3 which, with 1 degree of freedom for the marginal distribution, leads to $\sigma_2^{-2} \sim \mathrm{Ga}(0.5, 0.00113)$. Figure 4 shows the priors for $\sigma_1$ and $\sigma_2$, along with the implied effective degrees of freedom under the assumed priors. For the spline component, the 90% prior interval for the effective degrees of freedom is [2.4, 10].

Fig. 2. (a) Ten realizations (on the relative risk scale) from the random effects second-order random walk model in which the prior on the random-effects precision is Ga(0.5, 0.001), (b) summaries of fitted models: the solid line corresponds to a log-linear model in birth cohort, the circles to birth cohort as a factor, and “+” to the Bayesian smoothing model.

Fig. 3. SBMD versus age by ethnicity. Measurements on the same woman are joined with gray lines. The solid curve corresponds to the fitted spline and the dashed lines to the individual fits.
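The two gamma priors quoted for unstructured random effects, Ga(0.5, 0.00149) for the cohort effects and Ga(0.5, 0.00113) for the woman-specific effects, can be checked numerically from the stated ranges. The sketch below is illustrative only and is not the authors' code: it assumes the standard result that a normal random effect whose precision has a Ga(a, b) prior (rate parameterization) is marginally Student t with 2a degrees of freedom and squared scale b/a, so that a = 0.5 gives a Cauchy marginal. The function name is hypothetical.

```python
import math


def gamma_rate_for_cauchy_range(half_width, prob):
    """Rate b of a Ga(0.5, b) precision prior (hypothetical helper).

    Assumption: with precision ~ Ga(0.5, b), the random effect is marginally
    Cauchy with scale sqrt(2 * b). We pick b so that the effect lies in
    [-half_width, half_width] with probability `prob`.
    """
    # Upper-tail quantile level for a central interval of mass `prob`.
    upper = 1.0 - (1.0 - prob) / 2.0
    # Quantile of the standard Cauchy (Student t with 1 degree of freedom).
    t_quantile = math.tan(math.pi * (upper - 0.5))
    # Solve half_width = sqrt(2 * b) * t_quantile for b.
    return half_width ** 2 / (2.0 * t_quantile ** 2)


# exp(v_k) in [0.5, 2.0] with probability 0.95 means |v_k| <= log(2).
rate_cohort = gamma_rate_for_cauchy_range(math.log(2.0), 0.95)
# b_2i within +/- 0.3 with probability 0.90.
rate_woman = gamma_rate_for_cauchy_range(0.3, 0.90)

print(f"{rate_cohort:.5f}")  # 0.00149
print(f"{rate_woman:.5f}")   # 0.00113
```

Under these assumptions both rates agree with the values reported in the text to the precision quoted. The closed-form Cauchy quantile is used only because the marginal has 1 degree of freedom; other choices of degrees of freedom would need a general Student t quantile function.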
Table 2 compares estimates from REML and INLA implementations of the model, and we see close correspondence between the two. Figure 4 also shows the posterior medians for $\sigma_1$ and $\sigma_2$ and for the two effective degrees of freedom. For the spline and random effects these correspond to 8 and 214, respectively. The latter figure shows that there is considerable variation between the 230 women here. This is confirmed in Figure 3, where we observe large vertical differences between the profiles. This figure also shows the fitted spline, which appears to mimic the trend in the data well.

Fig. 4. Prior summaries: (a) $\sigma_1$, the standard deviation of the spline coefficients, (b) effective degrees of freedom associated with the prior for the spline coefficients, (c) effective degrees of freedom versus $\sigma_1$, (d) $\sigma_2$, the standard deviation of the between-individual random effects, (e) effective degrees of freedom associated with the individual random effects, and (f) effective degrees of freedom versus $\sigma_2$. The vertical dashed lines on panels (a), (b), (d), and (e) correspond to the posterior medians.

Table 2. REML and INLA summaries for spinal bone data. Intercept corresponds to Asian group

Variable            REML             INLA
Intercept           0.560 ± 0.029    0.563 ± 0.031
Black               0.106 ± 0.021    0.106 ± 0.021
Hispanic            0.013 ± 0.022    0.013 ± 0.022
White               0.026 ± 0.022    0.026 ± 0.022
Age                 0.021 ± 0.002    0.021 ± 0.002
$\sigma_1$          0.018*           0.024 ± 0.006
$\sigma_2$          0.109*           0.109 ± 0.006
$\sigma_\epsilon$   0.033*           0.033 ± 0.002

Note: For the entries marked with *, standard errors were unavailable.

5.4 Timings

For the 3 models in the longitudinal data example, INLA takes 1 to 2 s to run, using a single CPU. To get estimates with similar precision with MCMC, we ran JAGS for 100 000 iterations, which took 4 to 6 min. For the model in the temporal smoothing example, INLA takes 45 s to run, using 1 CPU. Part of the INLA procedure can be executed in parallel. If there are 2 CPUs available, as is the case with today’s dominant Intel Core 2 Duo processors, INLA takes only 27 s to run. It is not currently possible to implement this model in JAGS. We ran the MCMC utility built into the INLA software for 3.6 million iterations, to obtain estimates of comparable accuracy, which took 15 h. For the model in the B-spline nonparametric regression example, INLA took 5 s to run, using a single CPU. We ran the MCMC utility built into the INLA software for 2.5 million iterations to obtain estimates of comparable accuracy, the analysis taking 40 h.

6. DISCUSSION

In this paper, we have demonstrated the use of the INLA computational method for GLMMs. We have found that the approximation strategy employed by INLA is accurate in general, but less accurate for binomial data with small denominators. The supplementary material available at Biostatistics online contains an extensive simulation study, replicating that presented in Breslow and Clayton (1993). There are some suggestions in the discussion of Rue and others (2009) on how to construct an improved Gaussian approximation that does not use the mode and the curvature at the mode. It is likely that these suggestions will improve the results for binomial data with small denominators. There is an urgent need for diagnostic tools to flag when INLA is inaccurate. Conceptually, computation for nonlinear mixed effects models (Davidian and Giltinan, 1995; Pinheiro and Bates, 2000) can also be handled by INLA, but this capability is not currently available. The website www.r-inla.org contains all the data and R scripts to perform the analyses and simulations reported in the paper. The latest release of software to implement INLA can also be found at this site.
Recently, Breslow (2005) revisited PQL and concluded that, “PQL still performs remarkably well in comparison with more elaborate procedures in many practical situations.” We believe that INLA provides an attractive alternative to PQL for GLMMs, and we hope that this paper stimulates the greater use of Bayesian methods for this class.

SUPPLEMENTARY MATERIAL

Supplementary material is available at http://biostatistics.oxfordjournals.org.

ACKNOWLEDGMENT

Conflict of Interest: None declared.

FUNDING

National Institutes of Health (R01 CA095994) to J.W. Statistics for Innovation (sfi.nr.no) to H.R.

REFERENCES

BACHRACH, L. K., HASTIE, T., WANG, M. C., NARASIMHAN, B. AND MARCUS, R. (1999). Bone mineral acquisition in healthy Asian, Hispanic, Black and Caucasian youth. A longitudinal study. The Journal of Clinical Endocrinology and Metabolism 84, 4702–4712.

BESAG, J., GREEN, P. J., HIGDON, D. AND MENGERSEN, K. (1995). Bayesian computation and stochastic systems (with discussion). Statistical Science 10, 3–66.

BRESLOW, N. E. (2005). Whither PQL? In: Lin, D. and Heagerty, P. J. (editors), Proceedings of the Second Seattle Symposium. New York: Springer, pp. 1–22.

BRESLOW, N. E. AND CLAYTON, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88, 9–25.

BRESLOW, N. E. AND DAY, N. E. (1975). Indirect standardization and multiplicative models for rates, with reference to the age adjustment of cancer incidence and relative frequency data. Journal of Chronic Diseases 28, 289–301.

CLAYTON, D. G. (1996). Generalized linear mixed models. In: Gilks, W. R., Richardson, S. and Spiegelhalter, D. J. (editors), Markov Chain Monte Carlo in Practice. London: Chapman and Hall, pp. 275–301.

CRAINICEANU, C. M., DIGGLE, P. J. AND ROWLINGSON, B. (2008). Bayesian analysis for penalized spline regression using winBUGS. Journal of the American Statistical Association 102, 21–37.

CRAINICEANU, C. M., RUPPERT, D. AND WAND, M. P. (2005). Bayesian analysis for penalized spline regression using winBUGS. Journal of Statistical Software 14.

DAVIDIAN, M. AND GILTINAN, D. M. (1995). Nonlinear Models for Repeated Measurement Data. London: Chapman and Hall.

DICICCIO, T. J., KASS, R. E., RAFTERY, A. AND WASSERMAN, L. (1997). Computing Bayes factors by combining simulation and asymptotic approximations. Journal of the American Statistical Association 92, 903–915.

DIGGLE, P., HEAGERTY, P., LIANG, K.-Y. AND ZEGER, S. (2002). Analysis of Longitudinal Data, 2nd edition. Oxford: Oxford University Press.

FAHRMEIR, L., KNEIB, T. AND LANG, S. (2004). Penalized structured additive regression for space-time data: a Bayesian perspective. Statistica Sinica 14, 715–745.

GAMERMAN, D. (1997). Sampling from the posterior distribution in generalized linear mixed models. Statistics and Computing 7, 57–68.

GELMAN, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis 1, 515–534.

HASTIE, T. J. AND TIBSHIRANI, R. J. (1990). Generalized Additive Models. London: Chapman and Hall.

HOBERT, J. P. AND CASELLA, G. (1996). The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. Journal of the American Statistical Association 91, 1461–1473.

KASS, R. E. AND STEFFEY, D. (1989). Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). Journal of the American Statistical Association 84, 717–726.

KELSALL, J. E. AND WAKEFIELD, J. C. (1999). Discussion of “Bayesian models for spatially correlated disease and exposure data” by N. Best, I. Waller, A. Thomas, E. Conlon and R. Arnold. In: Bernardo, J. M., Berger, J. O., Dawid, A. P. and Smith, A. F. M. (editors), Sixth Valencia International Meeting on Bayesian Statistics. London: Oxford University Press.

MCCULLAGH, P. AND NELDER, J. A. (1989). Generalized Linear Models, 2nd edition. London: Chapman and Hall.

MCCULLOCH, C. E., SEARLE, S. R. AND NEUHAUS, J. M. (2008). Generalized, Linear, and Mixed Models, 2nd edition. New York: John Wiley and Sons.

MENG, X. AND WONG, W. (1996). Simulating ratios of normalizing constants via a simple identity. Statistica Sinica 6, 831–860.

NATARAJAN, R. AND KASS, R. E. (2000). Reference Bayesian methods for generalized linear mixed models. Journal of the American Statistical Association 95, 227–237.

NELDER, J. AND WEDDERBURN, R. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A 135, 370–384.

PINHEIRO, J. C. AND BATES, D. M. (2000). Mixed-Effects Models in S and S-plus. New York: Springer.

PLUMMER, M. (2008). Penalized loss functions for Bayesian model comparison. Biostatistics 9, 523–539.

PLUMMER, M. (2009). Jags version 1.0.3 manual. Technical Report.

RUE, H. AND HELD, L. (2005). Gaussian Markov Random Fields: Theory and Application. Boca Raton: Chapman and Hall/CRC.

RUE, H., MARTINO, S. AND CHOPIN, N. (2009). Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations (with discussion). Journal of the Royal Statistical Society, Series B 71, 319–392.

RUPPERT, D. R., WAND, M. P. AND CARROLL, R. J. (2003). Semiparametric Regression. New York: Cambridge University Press.

SKENE, A. M. AND WAKEFIELD, J. C. (1990). Hierarchical models for multi-centre binary response studies. Statistics in Medicine 9, 919–929.

SPIEGELHALTER, D., BEST, N., CARLIN, B. AND VAN DER LINDE, A. (1998). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B 64, 583–639.

SPIEGELHALTER, D. J., THOMAS, A. AND BEST, N. G. (1998). WinBUGS User Manual. Version 1.1.1. Cambridge.

THALL, P. F. AND VAIL, S. C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics 46, 657–671.

VERBEKE, G. AND MOLENBERGHS, G. (2000). Linear Mixed Models for Longitudinal Data. New York: Springer.

VERBEKE, G. AND MOLENBERGHS, G. (2005). Models for Discrete Longitudinal Data. New York: Springer.

WAKEFIELD, J. C. (2007). Disease mapping and spatial regression with count data. Biostatistics 8, 158–183.

WAKEFIELD, J. C. (2009). Multi-level modelling, the ecologic fallacy, and hybrid study designs. International Journal of Epidemiology 38, 330–336.

WAND, M. P. AND ORMEROD, J. T. (2008). On semiparametric regression with O’Sullivan penalised splines. Australian and New Zealand Journal of Statistics 50, 179–198.

ZEGER, S. L. AND KARIM, M. R. (1991). Generalized linear models with random effects: a Gibbs sampling approach. Journal of the American Statistical Association 86, 79–86.

[Received September 4, 2009; revised November 4, 2009; accepted for publication November 6, 2009]'
