1. INTRODUCTION

(Erp, Oberski, and Mulder 2018) provided an interesting comparison of different shrinkage priors in a regularized penalization task. The work visualized and described shrinkage priors well and compared them in a variety of conditions. However, the paper contained also some deficiencies regarding simulated data, convergence diagnostics and predictive performance presentation. The simulation conditions were not sparse enough to sufficiently investigate shrinkage abilities of the priors. Moreover, convergence problems were investigated only by investigating the trace plots and in some cases even by excluding simulation draws. The paper also used a large step size (1) and a low target acceptance rate (0.85). In addition, number of computed prediction mean square error (PMSE) values was inadequate in each simulation condition and their presentation difficult for a prior comparison.

The notebook presented here is a special assignment made under supervision of Prof. Aki Vehtari in Aalto university, aimed to provide tools and ideas to fix these shortcomings. The organization is as follows. Firstly, the data used is described in the section 2. The data is as presented in (Erp, Oberski, and Mulder 2018) but with more sparse representation and with a higher number of test samples. Secondly, the notebook describes how parallel coordinate and scatter plots can be used to diagnose convergence in section 3. The section concentrates on using Horseshoe and Regularized horseshoe priors as example models based on their convergence difficulties in (Erp, Oberski, and Mulder 2018). Moreover, statistics generated from multiple replications of the data is presented. In the section 4, model comparison is presented using PMSE and log pointwise predictive density (LPPD) values for test data. Originally, also the variable selection procedure described in (Erp, Oberski, and Mulder 2018) was planned to replace with projpred-package (Piironen and Vehtari 2017) but the high k-values prevented using it in this assignment. Finally, the conclusions of the assignment are discussed.

Another goal of the assignment is to provide a set of functions which can be utilized not only with the specific case described in this notebook, but also with a variety of different models and data sets by only renaming the directory paths. Moreover, the assignment details how Aalto-university students can utilize Triton in Stan model fittings in tasks which are practically infeasible with personal computers. The codes for Triton are not illustrated in the notebook but they can be attained from Triton folder along with usage instructions.

Next, let us start by including necessary libraries, functions and by choosing a working directory:

# Acquire packages
library(rstan)
library(gridExtra)
library(grid)
library(bayesplot)
library(knitr)
library(ggplot2)
library(parallel)
library("reshape2")
options(mc.cores = parallel::detectCores())
knitr::opts_knit$set(root.dir = normalizePath("~/Documents/Rbayes/Bayes_project_files/notebook"))
# Import from utilities folder
source("model_diagnostics.R")
source("data_generation.R")
source("model_generator.R")
source("choose_fits.R")

2. DATA

In (Erp, Oberski, and Mulder 2018) six different data sets, called as simulation conditions were illustrated. This notebook uses the first condition but with two modifications. Firstly, the number of coefficients is increased to \(38\) by padding \(30\) zeros in the end of the original coefficients. The reason for this is to improve the comparison of shrinkage abilities of the priors. In addition, the test set is enlarged to \(2000\) samples to attain more accurate distributions for PMSEs and LPPDs. From here on, the data set will be referred as modified simulation condition 1 and can be generated with the following code:

# Creates "project_data.RData"
data_generation()

3. STAN MODELS AND FITS

As discussed into the introduction, the special assignment has two main goals: Provide a convergence diagnosis for Horseshoe prior and to carry out a model comparison. The former is obtained by experimenting with three different parameterizations for Horseshoe prior and with Regularized Horseshoe. The parametrizations focus on reforming the Half-Cauchy distribution which is the main reason for convergence problems in Horseshoe prior. The latter is performed with the same 9 priors (including Horseshoe prior) as in (Erp, Oberski, and Mulder 2018) for simulation condition 1 with full Bayes approach. In total, 11 Stan models is created. The specifics of Horseshoe prior parametrizations are as follows:

Horseshoe 1: The same as given in (Erp, Oberski, and Mulder 2018) but with added prior for intercept \[\beta_0 \sim \mathcal{N}(0, 10) \] and with added generated quantity as test set log likelihood for LPPD computation. The original Stan code by Sara van Erp is slightly modified by using vectorized representations for speeding up computations and swapping the variable names of \(\tau\) and \(\lambda\).

Horseshoe 2. The same as Horseshoe 1 but with Tan-function parametrization for Half-Cauchy distribution as introduced by (Betancourt 2018).

Horseshoe 3. The same as Horseshoe 1 but with inverse-Gamma parametrization for Half-Cauchy distribution again described by (Betancourt 2018)

Other 8 models are otherwise the same as in (Erp, Oberski, and Mulder 2018) but as Horseshoe parametrizations, include computation of test set log-likelihoods and vectorized computing of \(y_{test}\) values. All the models can be created with following code block:

# Import from stan_files folder
stan_files <- list.files()[grep(".stan", list.files())]
model_generator(project_data, stan_files, 0)

Stan fits for all models are generated using \(sampling\) function with number of iterations as 8000, number of chains as 4, adapt_delta as 0.999 and max_treedepth ad 15. The high value of adaptation delta is necessary for not only Horseshoe prior which is prone to convergence problems but to also for the other models to ensure proper convergence. Number of iterations is higher than in (Erp, Oberski, and Mulder 2018) in order to prevent occurences of low effective sample size (\(N_{eff}\)) values. Fits can be generated exploiting again \(model.generator\) function. In this notebook, only fits for horseshoe parametrizations are necessary for divergent transition visualization and they are obtained as follows:

load("project_data.RData")
model_files <- c("horseshoe_1.SM", "horseshoe_2.SM", "horseshoe_3.SM")
model_generator(project_data, model_files, 1)

4. CONVERGENCE DIAGNOSTICS FOR HORSESHOE AND REGULARIZED HORSESHOE PRIORS

This section consists of two parts. The first demonstrates how parallel coordinate and scatter plots can be utilized in divergent transition visualization. In addition, Rhat and \(N_{eff}\) values extracted from the Stan fit are illustrated. The second part performs wider statistical diagnosis with multiple replications of the data. As previously discussed, all experiments are performed with Horseshoe prior and the main objective of the section is to used methods discussed to obtain optimal parametrization for the prior.

4.1 Experiments with 1 replication

Rhat and \(N_{eff}\) values are standard measures for convergence of Stan models (Gelman et al. 1995). However, in some cases the models may also suffer from divergent transitions which need to be examined with other tools. In (Erp, Oberski, and Mulder 2018), authors were investigating trace plots and in some cases dropping out simulations with high number of divergent transitions. This assignment proposes the use of parallel coordinate and scatter plots as suggested in (Gabry et al. 2017), in order to find out the nature of the transitions, that is, if they are false positives or not.

In parallel coordinate plots, the MCMC draws of Stan model are illustrated in two dimensions, which are parameters and their draw samples. The plots can be generated using \(mcmc.parcoord\) function and starting points of divergent transitions can be made visible using \(nuts \enspace parameters\). Generally, the parallel coordinates enable investigating divergent trajectories which are concentrated in the position of certain parameter and do not follow the same patterns as other draw trajectories. However, the parallel coordinates have trouble to visualize in the situations when the model has high number of parameters and the parameter values have high variance. Such is the case in this assignment, and in order to counter these problems, all draws in each chain are normalized between 0 and 1 and only 6 variables with lowest \(N_{eff}\) values are chosen for investigation. To ensure more clear visualization of trajectories, draw values are finally computed in logarithmic form with an added bias as 0.5 to ensure positivity. Non-divergent and divergent trajectories are plotted in black and green, respectively.

Other option for examining divergent transitions is to generate scatter plots for draw samples of two chosen parameters from distribution. As previously, \(nuts \enspace parameters\) can then be exploited to detect samples which are linked to divergent transitions. If these samples are concentrated in some location in scatter plot, model convergence could be questioned. Here, same draw values, parameters and coloration for divergent transitions as in parallel coordinate plots are presented in three different scatter plots for each Horseshoe parametrization.

fitfls <- list.files()[grep(".FIT", list.files())]
dt_visual(fitfls[1])
[1] "Model name: horseshoe_1 "
[1] "Number of RHAT values above 1.1: 0"
[1] "Number of divergent transitions: 325"
[1] "Minimum N_eff value and corresponding variable: 68, lambda[1]"