Multivariate Bayesian Variable Selection and Prediction

Caleb Jin / 2019-02-10


Consider a multivariate linear regression model

$$\mathbf{Y}=\mathbf{X}\mathbf{B}+\mathbf{E},$$

where $\mathbf{Y}=(\mathbf{y}_1,\mathbf{y}_2,\dots,\mathbf{y}_n)'$ is the $n\times q$ response matrix, $\mathbf{X}=(\mathbf{x}_1,\dots,\mathbf{x}_p)$ is the $n\times p$ design matrix, $\mathbf{B}=(\boldsymbol{\beta}_1,\dots,\boldsymbol{\beta}_p)'$ is a $p\times q$ unknown coefficient matrix, and $\mathbf{E}=(\mathbf{e}_1,\mathbf{e}_2,\dots,\mathbf{e}_n)'$ is an $n\times q$ error matrix with $\mathbf{e}_i\overset{\text{i.i.d.}}{\sim}\mathcal{N}_q(\mathbf{0},\boldsymbol{\Sigma})$.

To identify which of the $p$ regressors are selected, (Brown and Vannucci 1998) introduced a latent binary vector $\boldsymbol{\xi}=(\xi_1,\dots,\xi_p)'$ with

$$\xi_j=\begin{cases}1 & \text{if }\boldsymbol{\beta}_j\text{ is active},\\ 0 & \text{if }\boldsymbol{\beta}_j\text{ is not active}.\end{cases}$$

The conjugate priors are considered as follows:

$$\begin{aligned}
\pi(\mathbf{B}_\xi\mid\boldsymbol{\Sigma},\boldsymbol{\xi})&\sim\mathcal{MN}(\mathbf{0},\mathbf{H}_\xi,\boldsymbol{\Sigma}),\\
\pi(\boldsymbol{\Sigma})&\sim\mathcal{IW}(\mathbf{Q},\delta),\\
\pi(\boldsymbol{\xi})&\overset{\text{ind}}{\sim}\prod_{j=1}^{p}\text{Ber}(\omega_j),
\end{aligned}$$

where $\mathbf{Q}$, $\mathbf{H}_\xi$ and $\delta$ are hyperparameters. Then, by Bayes' theorem, the marginal posterior distribution can be obtained by

$$\begin{aligned}
m(\boldsymbol{\xi}\mid\mathbf{Y})&\propto m(\mathbf{Y}\mid\boldsymbol{\xi})\pi(\boldsymbol{\xi})
=\iint f(\mathbf{Y}\mid\mathbf{B}_\xi,\boldsymbol{\Sigma})\,\pi(\mathbf{B}_\xi\mid\boldsymbol{\Sigma},\boldsymbol{\xi})\,\pi(\boldsymbol{\Sigma})\,d\mathbf{B}_\xi\,d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi})\\
&=\iint(2\pi)^{-\frac{nq}{2}}|\boldsymbol{\Sigma}|^{-\frac{n}{2}}\exp\left[-\tfrac{1}{2}\mathrm{tr}\left\{(\mathbf{Y}-\mathbf{X}_\xi\mathbf{B}_\xi)'(\mathbf{Y}-\mathbf{X}_\xi\mathbf{B}_\xi)\boldsymbol{\Sigma}^{-1}\right\}\right]\\
&\quad\times(2\pi)^{-\frac{|\xi|q}{2}}|\boldsymbol{\Sigma}|^{-\frac{|\xi|}{2}}|\mathbf{H}_\xi|^{-\frac{q}{2}}\exp\left[-\tfrac{1}{2}\mathrm{tr}\left\{\mathbf{B}_\xi'\mathbf{H}_\xi^{-1}\mathbf{B}_\xi\boldsymbol{\Sigma}^{-1}\right\}\right]\\
&\quad\times\frac{|\mathbf{Q}|^{\frac{\delta}{2}}}{2^{\frac{\delta q}{2}}\Gamma_q\!\left(\frac{\delta}{2}\right)}|\boldsymbol{\Sigma}|^{-\frac{\delta+q+1}{2}}\exp\left\{-\tfrac{1}{2}\mathrm{tr}(\mathbf{Q}\boldsymbol{\Sigma}^{-1})\right\}d\mathbf{B}_\xi\,d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi})\\
&\propto\iint(2\pi)^{-\frac{|\xi|q}{2}}|\mathbf{H}_\xi|^{-\frac{q}{2}}|\boldsymbol{\Sigma}|^{-\frac{1}{2}(n+|\xi|+\delta+q+1)}\exp\left\{-\tfrac{1}{2}\mathrm{tr}(\mathbf{Q}\boldsymbol{\Sigma}^{-1})\right\}\\
&\quad\times\exp\left[-\tfrac{1}{2}\mathrm{tr}\left\{\left(\mathbf{B}_\xi'(\mathbf{X}_\xi'\mathbf{X}_\xi+\mathbf{H}_\xi^{-1})\mathbf{B}_\xi-2\mathbf{B}_\xi'\mathbf{X}_\xi'\mathbf{Y}\right)\boldsymbol{\Sigma}^{-1}\right\}\right]\exp\left[-\tfrac{1}{2}\mathrm{tr}\left\{\mathbf{Y}'\mathbf{Y}\boldsymbol{\Sigma}^{-1}\right\}\right]d\mathbf{B}_\xi\,d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi})\\
&=\iint(2\pi)^{-\frac{|\xi|q}{2}}|\mathbf{H}_\xi|^{-\frac{q}{2}}|\boldsymbol{\Sigma}|^{-\frac{1}{2}(n+|\xi|+\delta+q+1)}\exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[(\mathbf{Y}'\mathbf{Y}+\mathbf{Q})\boldsymbol{\Sigma}^{-1}\right]\right\}\\
&\quad\times\exp\left[-\tfrac{1}{2}\mathrm{tr}\left\{\left(\mathbf{B}_\xi'\mathbf{K}_\xi\mathbf{B}_\xi-2\mathbf{B}_\xi'\mathbf{K}_\xi\mathbf{K}_\xi^{-1}\mathbf{X}_\xi'\mathbf{Y}\right)\boldsymbol{\Sigma}^{-1}\right\}\right]d\mathbf{B}_\xi\,d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi})\\
&=\iint(2\pi)^{-\frac{|\xi|q}{2}}|\mathbf{H}_\xi|^{-\frac{q}{2}}|\boldsymbol{\Sigma}|^{-\frac{1}{2}(n+|\xi|+\delta+q+1)}\exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[(\mathbf{Y}'\mathbf{Y}-\widetilde{\mathbf{B}}_\xi'\mathbf{K}_\xi\widetilde{\mathbf{B}}_\xi+\mathbf{Q})\boldsymbol{\Sigma}^{-1}\right]\right\}\\
&\quad\times\exp\left[-\tfrac{1}{2}\mathrm{tr}\left\{(\mathbf{B}_\xi-\widetilde{\mathbf{B}}_\xi)'\mathbf{K}_\xi(\mathbf{B}_\xi-\widetilde{\mathbf{B}}_\xi)\boldsymbol{\Sigma}^{-1}\right\}\right]d\mathbf{B}_\xi\,d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi})\\
&=\int|\mathbf{H}_\xi|^{-\frac{q}{2}}|\mathbf{K}_\xi|^{-\frac{q}{2}}|\boldsymbol{\Sigma}|^{-\frac{1}{2}(n+\delta+q+1)}\exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[(\mathbf{Y}'\mathbf{Y}-\widetilde{\mathbf{B}}_\xi'\mathbf{K}_\xi\widetilde{\mathbf{B}}_\xi+\mathbf{Q})\boldsymbol{\Sigma}^{-1}\right]\right\}d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi})\\
&=|\mathbf{H}_\xi\mathbf{K}_\xi|^{-\frac{q}{2}}\int|\boldsymbol{\Sigma}|^{-\frac{1}{2}(n+\delta+q+1)}\exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[\left(\mathbf{Y}'(\mathbf{I}_n-\mathbf{X}_\xi\mathbf{K}_\xi^{-1}\mathbf{X}_\xi')\mathbf{Y}+\mathbf{Q}\right)\boldsymbol{\Sigma}^{-1}\right]\right\}d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi})\\
&\propto|\mathbf{H}_\xi\mathbf{K}_\xi|^{-\frac{q}{2}}\left|\mathbf{Y}'(\mathbf{I}_n-\mathbf{X}_\xi\mathbf{K}_\xi^{-1}\mathbf{X}_\xi')\mathbf{Y}+\mathbf{Q}\right|^{-\frac{n+\delta}{2}}\pi(\boldsymbol{\xi})\tag{1}\\
&\equiv g(\boldsymbol{\xi}\mid\mathbf{Y}),
\end{aligned}$$

where $\mathbf{K}_\xi=\mathbf{X}_\xi'\mathbf{X}_\xi+\mathbf{H}_\xi^{-1}$, $\widetilde{\mathbf{B}}_\xi=\mathbf{K}_\xi^{-1}\mathbf{X}_\xi'\mathbf{Y}$, and $g(\boldsymbol{\xi}\mid\mathbf{Y})$ is the proportional form of $\pi(\boldsymbol{\xi}\mid\mathbf{Y})$.

(Brown and Vannucci 1998) set $\mathbf{Q}=k\mathbf{I}_q$, $\mathbf{H}_\xi=c(\mathbf{X}_\xi'\mathbf{X}_\xi)^{-1}$ and $\omega_j=\omega$, so that

$$\pi(\boldsymbol{\xi})=\prod_{j=1}^{p}\omega_j^{\xi_j}(1-\omega_j)^{1-\xi_j}=\omega^{|\xi|}(1-\omega)^{p-|\xi|}.$$

Plugging $\mathbf{Q}=k\mathbf{I}_q$ and $\mathbf{H}_\xi=c(\mathbf{X}_\xi'\mathbf{X}_\xi)^{-1}$ into (1), we have

$$\log g(\boldsymbol{\xi}\mid\mathbf{Y})=-\frac{n+\delta}{2}\log\left|k\mathbf{I}_q+\mathbf{Y}'\mathbf{Y}-\frac{c}{c+1}\mathbf{Y}'\mathbf{X}_\xi(\mathbf{X}_\xi'\mathbf{X}_\xi)^{-1}\mathbf{X}_\xi'\mathbf{Y}\right|-\frac{q|\xi|}{2}\log(c+1)+|\xi|\log\omega+(p-|\xi|)\log(1-\omega).$$

Exhaustive computation of $g(\boldsymbol{\xi}\mid\mathbf{Y})$ for all $2^p$ values of $\boldsymbol{\xi}$ becomes prohibitive, even on a supercomputer, when $p$ is larger than about 40. In such circumstances, MCMC sampling can be used to explore the marginal posterior.
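As a concrete illustration, the closed-form $\log g(\boldsymbol{\xi}\mid\mathbf{Y})$ can be evaluated directly for any candidate model. The sketch below is my own hypothetical NumPy helper (not code from the paper), assuming the settings $\mathbf{Q}=k\mathbf{I}_q$ and $\mathbf{H}_\xi=c(\mathbf{X}_\xi'\mathbf{X}_\xi)^{-1}$:

```python
import numpy as np

def log_g(xi, Y, X, k=2.0, c=10.0, delta=None, omega=None):
    """Unnormalized log marginal posterior log g(xi | Y) under
    Q = k I_q and H_xi = c (X_xi' X_xi)^{-1}.  Hypothetical helper."""
    n, q = Y.shape
    p = X.shape[1]
    delta = 2 + q if delta is None else delta      # default from this post
    omega = 20.0 / p if omega is None else omega   # prior inclusion probability
    xi = np.asarray(xi, dtype=bool)
    m = int(xi.sum())                              # model size |xi|
    A = k * np.eye(q) + Y.T @ Y
    if m > 0:
        Xm = X[:, xi]
        # subtract  c/(c+1) * Y' X_xi (X_xi' X_xi)^{-1} X_xi' Y
        A -= (c / (c + 1.0)) * Y.T @ Xm @ np.linalg.solve(Xm.T @ Xm, Xm.T @ Y)
    _, logdet = np.linalg.slogdet(A)               # A is positive definite
    return (-(n + delta) / 2.0 * logdet
            - q * m / 2.0 * np.log(c + 1.0)
            + m * np.log(omega) + (p - m) * np.log1p(-omega))
```

On simulated data with a strong signal, the true submodel should score higher than both the null and the full model, which is a quick sanity check on the formula.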
(Brown and Vannucci 1998) use a Gibbs sampler to update $\boldsymbol{\xi}$ component-wise, drawing each $\xi_j$ from its full conditional distribution $\pi(\xi_j\mid\boldsymbol{\xi}_{-j},\mathbf{Y})$, where $\boldsymbol{\xi}_{-j}$ denotes $\boldsymbol{\xi}$ with its $j$th component removed. It is easy to show that

$$\pi(\xi_j=1\mid\boldsymbol{\xi}_{-j},\mathbf{Y})=\frac{\theta_j}{\theta_j+1},\quad\text{where}\quad\theta_j=\frac{g(\xi_j=1,\boldsymbol{\xi}_{-j}\mid\mathbf{Y})}{g(\xi_j=0,\boldsymbol{\xi}_{-j}\mid\mathbf{Y})}.$$

The complete algorithm of (Brown and Vannucci 1998) is described in Algorithm 3.
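The component-wise scan above can be sketched as follows. This is a minimal NumPy illustration under my own function names, not the authors' code; `log_g` re-implements the closed-form log posterior derived earlier, and the odds $\theta_j$ are handled on the log scale for numerical stability:

```python
import numpy as np

def log_g(xi, Y, X, k=2.0, c=10.0, delta=None, omega=None):
    """Unnormalized log g(xi | Y) with Q = k I_q, H_xi = c (X_xi'X_xi)^{-1}."""
    n, q = Y.shape
    p = X.shape[1]
    delta = 2 + q if delta is None else delta
    omega = 20.0 / p if omega is None else omega
    m = int(np.sum(xi))
    A = k * np.eye(q) + Y.T @ Y
    if m > 0:
        Xm = X[:, np.asarray(xi, dtype=bool)]
        A -= (c / (c + 1.0)) * Y.T @ Xm @ np.linalg.solve(Xm.T @ Xm, Xm.T @ Y)
    _, logdet = np.linalg.slogdet(A)
    return (-(n + delta) / 2.0 * logdet - q * m / 2.0 * np.log(c + 1.0)
            + m * np.log(omega) + (p - m) * np.log1p(-omega))

def gibbs_scan(Y, X, n_sweeps=100, omega=None, seed=0):
    """Component-wise Gibbs sampler: each sweep updates xi_1, ..., xi_p
    from pi(xi_j = 1 | xi_{-j}, Y) = theta_j / (theta_j + 1)."""
    p = X.shape[1]
    rng = np.random.default_rng(seed)
    xi = np.zeros(p, dtype=int)
    draws = np.empty((n_sweeps, p), dtype=int)
    for t in range(n_sweeps):
        for j in range(p):
            xi[j] = 1
            lg1 = log_g(xi, Y, X, omega=omega)
            xi[j] = 0
            lg0 = log_g(xi, Y, X, omega=omega)
            # theta_j / (theta_j + 1), computed stably on the log scale
            prob1 = 1.0 / (1.0 + np.exp(np.clip(lg0 - lg1, -700.0, 700.0)))
            xi[j] = int(rng.random() < prob1)
        draws[t] = xi
    return draws
```

Averaging the retained draws column-wise gives posterior inclusion frequencies; with a clear signal, the truly active predictors should be included in nearly every sweep.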

For the hyperparameters, we follow the same settings as (Brown and Vannucci 1998): $\delta=2+q$, $\omega=20/p$, $k=2$ and $c=10$. Note that since our definition of the inverse-Wishart distribution differs from that of (Brown and Vannucci 1998), the values of $\delta$ and $k$ set here have been adjusted to be consistent with theirs.

Reference

Brown, P. J., and M. Vannucci. 1998. “Multivariate Bayesian Variable Selection and Prediction.” Journal of the Royal Statistical Society: Series B (Methodological), 627–41.