Multivariate Bayesian Variable Selection and Prediction
Caleb Jin
2019-02-10
Consider a multivariate regression model,
$$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E},$$
where $\mathbf{Y}=(\mathbf{y}_1,\mathbf{y}_2,\dots,\mathbf{y}_n)^\top$ is the $n\times q$ response matrix, $\mathbf{X}=(\mathbf{x}_1,\dots,\mathbf{x}_p)$ is the $n\times p$ design matrix, $\mathbf{B}=(\boldsymbol{\beta}_1,\dots,\boldsymbol{\beta}_p)^\top$ is a $p\times q$ unknown coefficient matrix, and $\mathbf{E}=(\mathbf{e}_1,\mathbf{e}_2,\dots,\mathbf{e}_n)^\top$ is an $n\times q$ error matrix with $\mathbf{e}_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}_q(\mathbf{0},\boldsymbol{\Sigma})$. To identify which of the $p$ regressors are selected, Brown and Vannucci (1998) introduced a latent binary vector $\boldsymbol{\xi}=(\xi_1,\dots,\xi_p)^\top$ with
$$\xi_j = \begin{cases} 1 & \text{if } \boldsymbol{\beta}_j \text{ is active}, \\ 0 & \text{if } \boldsymbol{\beta}_j \text{ is not active}. \end{cases}$$
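To make the setup concrete, the following minimal Python sketch simulates data from this model; the dimensions, the active set, and the compound-symmetric $\boldsymbol{\Sigma}$ are arbitrary illustrative choices, not values from Brown and Vannucci (1998).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 60, 3                 # assumed sample size, predictors, responses

X = rng.standard_normal((n, p))      # n x p design matrix
B = np.zeros((p, q))                 # p x q coefficient matrix, sparse in rows
active = [0, 10, 20]                 # assumed active predictors (xi_j = 1)
B[active] = rng.standard_normal((len(active), q))

Sigma = 0.5 * np.eye(q) + 0.5        # assumed error covariance: unit variances, 0.5 correlation
E = rng.multivariate_normal(np.zeros(q), Sigma, size=n)  # rows e_i ~ N_q(0, Sigma)
Y = X @ B + E                        # n x q response matrix
```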
The following conjugate priors are considered:
$$\begin{aligned}
\pi(\mathbf{B}_{\boldsymbol{\xi}} \mid \boldsymbol{\Sigma}, \boldsymbol{\xi}) &\sim \mathcal{MN}(\mathbf{0}, \mathbf{H}_{\boldsymbol{\xi}}, \boldsymbol{\Sigma}), \\
\pi(\boldsymbol{\Sigma}) &\sim \mathcal{IW}(\mathbf{Q}, \delta), \\
\pi(\boldsymbol{\xi}) &\overset{\text{ind}}{\sim} \prod_{j=1}^{p} \mathrm{Ber}(\omega_j),
\end{aligned}$$
where $\mathbf{Q}$, $\mathbf{H}_{\boldsymbol{\xi}}$, and $\delta$ are hyperparameters.
Then, by Bayes' theorem, the marginal posterior distribution of $\boldsymbol{\xi}$ can be obtained by
$$\begin{aligned}
m(\boldsymbol{\xi} \mid \mathbf{Y}) &\propto m(\mathbf{Y} \mid \boldsymbol{\xi})\,\pi(\boldsymbol{\xi}) = \iint f(\mathbf{Y} \mid \mathbf{B}_{\boldsymbol{\xi}}, \boldsymbol{\Sigma})\,\pi(\mathbf{B}_{\boldsymbol{\xi}} \mid \boldsymbol{\Sigma}, \boldsymbol{\xi})\,\pi(\boldsymbol{\Sigma})\, d\mathbf{B}_{\boldsymbol{\xi}}\, d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi}) \\
&= \iint (2\pi)^{-\frac{nq}{2}} |\boldsymbol{\Sigma}|^{-\frac{n}{2}} \exp\left[-\tfrac{1}{2}\mathrm{tr}\left\{(\mathbf{Y}-\mathbf{X}_{\boldsymbol{\xi}}\mathbf{B}_{\boldsymbol{\xi}})^\top(\mathbf{Y}-\mathbf{X}_{\boldsymbol{\xi}}\mathbf{B}_{\boldsymbol{\xi}})\boldsymbol{\Sigma}^{-1}\right\}\right] \\
&\quad\times (2\pi)^{-\frac{|\boldsymbol{\xi}|q}{2}} |\boldsymbol{\Sigma}|^{-\frac{|\boldsymbol{\xi}|}{2}} |\mathbf{H}_{\boldsymbol{\xi}}|^{-\frac{q}{2}} \exp\left[-\tfrac{1}{2}\mathrm{tr}\left\{\mathbf{B}_{\boldsymbol{\xi}}^\top\mathbf{H}_{\boldsymbol{\xi}}^{-1}\mathbf{B}_{\boldsymbol{\xi}}\boldsymbol{\Sigma}^{-1}\right\}\right] \\
&\quad\times |\mathbf{Q}|^{\frac{\delta}{2}}\, 2^{-\frac{\delta q}{2}}\, \Gamma_q^{-1}\!\left(\tfrac{\delta}{2}\right) |\boldsymbol{\Sigma}|^{-\frac{\delta+q+1}{2}} \exp\left\{-\tfrac{1}{2}\mathrm{tr}(\mathbf{Q}\boldsymbol{\Sigma}^{-1})\right\} d\mathbf{B}_{\boldsymbol{\xi}}\, d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi}) \\
&\propto (2\pi)^{-\frac{|\boldsymbol{\xi}|q}{2}} |\mathbf{H}_{\boldsymbol{\xi}}|^{-\frac{q}{2}} \iint |\boldsymbol{\Sigma}|^{-\frac{1}{2}(n+|\boldsymbol{\xi}|+\delta+q+1)} \exp\left\{-\tfrac{1}{2}\mathrm{tr}(\mathbf{Q}\boldsymbol{\Sigma}^{-1})\right\} \\
&\quad\times \exp\left[-\tfrac{1}{2}\mathrm{tr}\left\{\left(\mathbf{B}_{\boldsymbol{\xi}}^\top(\mathbf{X}_{\boldsymbol{\xi}}^\top\mathbf{X}_{\boldsymbol{\xi}}+\mathbf{H}_{\boldsymbol{\xi}}^{-1})\mathbf{B}_{\boldsymbol{\xi}} - 2\mathbf{B}_{\boldsymbol{\xi}}^\top\mathbf{X}_{\boldsymbol{\xi}}^\top\mathbf{Y}\right)\boldsymbol{\Sigma}^{-1}\right\}\right] \exp\left[-\tfrac{1}{2}\mathrm{tr}\left\{\mathbf{Y}^\top\mathbf{Y}\boldsymbol{\Sigma}^{-1}\right\}\right] d\mathbf{B}_{\boldsymbol{\xi}}\, d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi}) \\
&= (2\pi)^{-\frac{|\boldsymbol{\xi}|q}{2}} |\mathbf{H}_{\boldsymbol{\xi}}|^{-\frac{q}{2}} \iint |\boldsymbol{\Sigma}|^{-\frac{1}{2}(n+|\boldsymbol{\xi}|+\delta+q+1)} \exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[(\mathbf{Y}^\top\mathbf{Y}+\mathbf{Q})\boldsymbol{\Sigma}^{-1}\right]\right\} \\
&\quad\times \exp\left[-\tfrac{1}{2}\mathrm{tr}\left\{\left(\mathbf{B}_{\boldsymbol{\xi}}^\top\mathbf{K}_{\boldsymbol{\xi}}\mathbf{B}_{\boldsymbol{\xi}} - 2\mathbf{B}_{\boldsymbol{\xi}}^\top\mathbf{K}_{\boldsymbol{\xi}}\mathbf{K}_{\boldsymbol{\xi}}^{-1}\mathbf{X}_{\boldsymbol{\xi}}^\top\mathbf{Y}\right)\boldsymbol{\Sigma}^{-1}\right\}\right] d\mathbf{B}_{\boldsymbol{\xi}}\, d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi}) \\
&= (2\pi)^{-\frac{|\boldsymbol{\xi}|q}{2}} |\mathbf{H}_{\boldsymbol{\xi}}|^{-\frac{q}{2}} \iint |\boldsymbol{\Sigma}|^{-\frac{1}{2}(n+|\boldsymbol{\xi}|+\delta+q+1)} \exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[(\mathbf{Y}^\top\mathbf{Y} - \widetilde{\mathbf{B}}_{\boldsymbol{\xi}}^\top\mathbf{K}_{\boldsymbol{\xi}}\widetilde{\mathbf{B}}_{\boldsymbol{\xi}} + \mathbf{Q})\boldsymbol{\Sigma}^{-1}\right]\right\} \\
&\quad\times \exp\left[-\tfrac{1}{2}\mathrm{tr}\left\{(\mathbf{B}_{\boldsymbol{\xi}}-\widetilde{\mathbf{B}}_{\boldsymbol{\xi}})^\top\mathbf{K}_{\boldsymbol{\xi}}(\mathbf{B}_{\boldsymbol{\xi}}-\widetilde{\mathbf{B}}_{\boldsymbol{\xi}})\boldsymbol{\Sigma}^{-1}\right\}\right] d\mathbf{B}_{\boldsymbol{\xi}}\, d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi}) \\
&= (2\pi)^{-\frac{|\boldsymbol{\xi}|q}{2}} |\mathbf{H}_{\boldsymbol{\xi}}|^{-\frac{q}{2}} (2\pi)^{\frac{|\boldsymbol{\xi}|q}{2}} |\mathbf{K}_{\boldsymbol{\xi}}|^{-\frac{q}{2}} \int |\boldsymbol{\Sigma}|^{\frac{|\boldsymbol{\xi}|}{2}} |\boldsymbol{\Sigma}|^{-\frac{1}{2}(n+|\boldsymbol{\xi}|+\delta+q+1)} \exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[(\mathbf{Y}^\top\mathbf{Y} - \widetilde{\mathbf{B}}_{\boldsymbol{\xi}}^\top\mathbf{K}_{\boldsymbol{\xi}}\widetilde{\mathbf{B}}_{\boldsymbol{\xi}} + \mathbf{Q})\boldsymbol{\Sigma}^{-1}\right]\right\} d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi}) \\
&= |\mathbf{H}_{\boldsymbol{\xi}}|^{-\frac{q}{2}} |\mathbf{K}_{\boldsymbol{\xi}}|^{-\frac{q}{2}} \int |\boldsymbol{\Sigma}|^{-\frac{1}{2}(n+\delta+q+1)} \exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[(\mathbf{Y}^\top\mathbf{Y} - \mathbf{Y}^\top\mathbf{X}_{\boldsymbol{\xi}}\mathbf{K}_{\boldsymbol{\xi}}^{-1}\mathbf{X}_{\boldsymbol{\xi}}^\top\mathbf{Y} + \mathbf{Q})\boldsymbol{\Sigma}^{-1}\right]\right\} d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi}) \\
&= |\mathbf{H}_{\boldsymbol{\xi}}\mathbf{K}_{\boldsymbol{\xi}}|^{-\frac{q}{2}} \int |\boldsymbol{\Sigma}|^{-\frac{1}{2}(n+\delta+q+1)} \exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[\left(\mathbf{Y}^\top(\mathbf{I}_n - \mathbf{X}_{\boldsymbol{\xi}}\mathbf{K}_{\boldsymbol{\xi}}^{-1}\mathbf{X}_{\boldsymbol{\xi}}^\top)\mathbf{Y} + \mathbf{Q}\right)\boldsymbol{\Sigma}^{-1}\right]\right\} d\boldsymbol{\Sigma}\;\pi(\boldsymbol{\xi}) \\
&\propto |\mathbf{H}_{\boldsymbol{\xi}}\mathbf{K}_{\boldsymbol{\xi}}|^{-\frac{q}{2}} \left|\mathbf{Y}^\top(\mathbf{I}_n - \mathbf{X}_{\boldsymbol{\xi}}\mathbf{K}_{\boldsymbol{\xi}}^{-1}\mathbf{X}_{\boldsymbol{\xi}}^\top)\mathbf{Y} + \mathbf{Q}\right|^{-\frac{n+\delta}{2}} \pi(\boldsymbol{\xi}) \equiv g(\boldsymbol{\xi} \mid \mathbf{Y}),
\end{aligned}\tag{1}$$
where $\mathbf{K}_{\boldsymbol{\xi}} = \mathbf{X}_{\boldsymbol{\xi}}^\top\mathbf{X}_{\boldsymbol{\xi}} + \mathbf{H}_{\boldsymbol{\xi}}^{-1}$, $\widetilde{\mathbf{B}}_{\boldsymbol{\xi}} = \mathbf{K}_{\boldsymbol{\xi}}^{-1}\mathbf{X}_{\boldsymbol{\xi}}^\top\mathbf{Y}$, and $g(\boldsymbol{\xi} \mid \mathbf{Y})$ is the proportional form of $\pi(\boldsymbol{\xi} \mid \mathbf{Y})$. Brown and Vannucci (1998) set $\mathbf{Q} = k\mathbf{I}_q$, $\mathbf{H}_{\boldsymbol{\xi}} = c(\mathbf{X}_{\boldsymbol{\xi}}^\top\mathbf{X}_{\boldsymbol{\xi}})^{-1}$, and $\omega_j = \omega$, so that $\pi(\boldsymbol{\xi}) = \prod_{j=1}^{p}\omega^{\xi_j}(1-\omega)^{1-\xi_j} = \omega^{|\boldsymbol{\xi}|}(1-\omega)^{p-|\boldsymbol{\xi}|}$. Substituting $\mathbf{Q} = k\mathbf{I}_q$ and $\mathbf{H}_{\boldsymbol{\xi}} = c(\mathbf{X}_{\boldsymbol{\xi}}^\top\mathbf{X}_{\boldsymbol{\xi}})^{-1}$ into (1), we have
$$\log g(\boldsymbol{\xi} \mid \mathbf{Y}) = -\frac{n+\delta}{2}\log\left|k\mathbf{I}_q + \mathbf{Y}^\top\mathbf{Y} - \frac{c}{c+1}\mathbf{Y}^\top\mathbf{X}_{\boldsymbol{\xi}}(\mathbf{X}_{\boldsymbol{\xi}}^\top\mathbf{X}_{\boldsymbol{\xi}})^{-1}\mathbf{X}_{\boldsymbol{\xi}}^\top\mathbf{Y}\right| - \frac{q|\boldsymbol{\xi}|}{2}\log(c+1) + |\boldsymbol{\xi}|\log\omega + (p-|\boldsymbol{\xi}|)\log(1-\omega).$$
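This expression is cheap to evaluate for a single $\boldsymbol{\xi}$. Below is a minimal Python sketch of it, with default hyperparameter values taken from the settings given at the end of this post ($\delta = 2+q$, $\omega = 20/p$, $k = 2$, $c = 10$); the function name `log_g` is mine.

```python
import numpy as np

def log_g(xi, Y, X, k=2.0, c=10.0, delta=None, omega=None):
    """log g(xi | Y) under Q = k*I_q, H_xi = c*(X_xi' X_xi)^(-1), omega_j = omega."""
    n, q = Y.shape
    p = X.shape[1]
    delta = 2 + q if delta is None else delta
    omega = 20.0 / p if omega is None else omega
    xi = np.asarray(xi, dtype=bool)
    m = int(xi.sum())                          # |xi|, number of active predictors
    S = k * np.eye(q) + Y.T @ Y
    if m > 0:
        Xs = X[:, xi]
        XtY = Xs.T @ Y
        # subtract c/(c+1) * Y' X_xi (X_xi' X_xi)^{-1} X_xi' Y
        S -= (c / (c + 1.0)) * XtY.T @ np.linalg.solve(Xs.T @ Xs, XtY)
    _, logdet = np.linalg.slogdet(S)           # S is positive definite
    return (-(n + delta) / 2.0 * logdet
            - q * m / 2.0 * np.log(c + 1.0)
            + m * np.log(omega) + (p - m) * np.log(1.0 - omega))
```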
Exhaustive computation of $g(\boldsymbol{\xi} \mid \mathbf{Y})$ over all $2^p$ values of $\boldsymbol{\xi}$ becomes prohibitive, even on a supercomputer, once $p$ is larger than 40. In these circumstances, MCMC sampling can be used to explore the marginal posterior. Brown and Vannucci (1998) use a Gibbs sampler that generates $\boldsymbol{\xi}$ component-wise from the full conditional distributions $\pi(\xi_j \mid \boldsymbol{\xi}_{-j}, \mathbf{Y})$, where $\boldsymbol{\xi}_{-j} = \boldsymbol{\xi} \setminus \xi_j$. It is easy to show that
$$\pi(\xi_j = 1 \mid \boldsymbol{\xi}_{-j}, \mathbf{Y}) = \frac{\theta_j}{\theta_j + 1},$$
where
$$\theta_j = \frac{g(\xi_j = 1, \boldsymbol{\xi}_{-j} \mid \mathbf{Y})}{g(\xi_j = 0, \boldsymbol{\xi}_{-j} \mid \mathbf{Y})}.$$
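In practice $\theta_j$ should be computed on the log scale, since the determinants in $g(\boldsymbol{\xi} \mid \mathbf{Y})$ easily under- or overflow; the full conditional probability is then just the logistic function of the log odds $\log\theta_j$. A small helper (the name `p_xi1` is mine):

```python
from scipy.special import expit   # numerically stable logistic, 1 / (1 + exp(-x))

def p_xi1(log_g1, log_g0):
    """pi(xi_j = 1 | xi_{-j}, Y) = theta_j / (theta_j + 1), from log g values."""
    return expit(log_g1 - log_g0)
```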
The complete algorithm of Brown and Vannucci (1998) is described in Algorithm 3.
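A minimal Python sketch of this component-wise Gibbs sampler follows, reusing the `log_g` and `p_xi1` helpers above; initializing $\boldsymbol{\xi}$ from its prior and sweeping $j = 1,\dots,p$ in order are my assumptions, not necessarily the paper's exact scheme.

```python
import numpy as np

def gibbs_bvs(Y, X, n_iter=2000, seed=1, **hyper):
    """Gibbs sampler for xi: each sweep draws every xi_j from
    pi(xi_j | xi_{-j}, Y) = theta_j / (theta_j + 1)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    omega = hyper.get("omega", 20.0 / p)
    xi = rng.random(p) < omega                # initialize xi from its prior
    draws = np.empty((n_iter, p), dtype=bool)
    for t in range(n_iter):
        for j in range(p):
            xi[j] = True
            lg1 = log_g(xi, Y, X, **hyper)    # log g(xi_j = 1, xi_{-j} | Y)
            xi[j] = False
            lg0 = log_g(xi, Y, X, **hyper)    # log g(xi_j = 0, xi_{-j} | Y)
            xi[j] = rng.random() < p_xi1(lg1, lg0)
        draws[t] = xi
    return draws
```

Each sweep here re-evaluates $\log g$ from scratch for every component, which is wasteful; a more efficient implementation would update the determinant factors incrementally as single entries of $\boldsymbol{\xi}$ flip.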
For the hyperparameters, we follow the same settings as Brown and Vannucci (1998): $\delta = 2+q$, $\omega = 20/p$, $k = 2$, and $c = 10$. Note that because our definition of the inverse-Wishart distribution differs from that of Brown and Vannucci (1998), the values of $\delta$ and $k$ set here have been adjusted to keep them consistent with Brown and Vannucci (1998).
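Continuing the sketches above, these settings plug in directly; the burn-in length and the 0.5 cutoff on the posterior inclusion probabilities are conventional choices of mine, not from the paper.

```python
import numpy as np

n, q = Y.shape
p = X.shape[1]
draws = gibbs_bvs(Y, X, n_iter=2000, seed=1,
                  delta=2 + q, omega=20.0 / p, k=2.0, c=10.0)
incl = draws[500:].mean(axis=0)     # posterior inclusion probability of each predictor
print(np.flatnonzero(incl > 0.5))   # selected predictors; ideally the assumed active set
```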
Reference
Brown, P. J., and M. Vannucci. 1998. “Multivariate Bayesian Variable Selection and Prediction.” Journal of the Royal Statistical Society: Series B (Methodological), 627–41.