apply self-calib to precondition prior #52

Status: Open · wants to merge 171 commits into base: advi_vignette
Conversation

@hyunjimoon (Owner) commented Sep 11, 2021

We need the prior and posterior to cover similar ranges for valid testing (e.g. adequate power).
For the two-parameter normal model, the tail area of the scale parameter is honed after more than twenty iterations.
[screenshot of the calibration iterations omitted]

Further improvements:

  1. All parameters are currently updated in a 1d way, but a 2d (joint loc, scale) update is possible with 2d binning.
  2. The distance-based Metropolis-Hastings explained here will be stashed for now, as it can be
  • weak to simulation noise (e.g. it could stop early by chance)
  • in need of theoretical justification (reversibility, detailed balance, etc.)

@hyunjimoon (Owner, Author) commented Sep 12, 2021

Without real data to suppress its scale, the parameter blew up as below over three iterated parameter updates:

iter
1. $loc -0.58 ± 0.68, $scale 271 ± 500 
2. $loc -0.12 ± 0.51, $scale 6.4e+153 ± Inf 
3. $loc 319 ± 875, $scale Inf ± NaN 

(approximate_computation.rmd from the vignette with the most recent commit 978aca4)

How can I get location and scale estimates from 2d kernel density smoothing? There are some kde2d references, but none of them return the actual parameters for each dimension (and a correlation parameter?):

Edit: sampling from the kernel density could be one solution.
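To sketch that idea (Python here for illustration; the repo itself is R, and all names below are made up, not from the repo): sampling from a Gaussian KDE amounts to picking a kernel center uniformly and adding bandwidth-scaled noise, after which per-dimension summaries fall out directly from the samples.

```python
import numpy as np

def sample_from_kde(points, bandwidth, n, rng):
    """Draw n samples from a Gaussian KDE built on `points` (shape (m, d)):
    pick a kernel center uniformly, then add N(0, bandwidth^2) noise."""
    idx = rng.integers(0, len(points), size=n)
    return points[idx] + rng.normal(0.0, bandwidth, size=(n, points.shape[1]))

rng = np.random.default_rng(1)
# Mock 2d (loc, scale) draws standing in for the smoothed prior samples.
draws = rng.normal([0.0, 1.0], [1.0, 0.5], size=(2000, 2))
samples = sample_from_kde(draws, bandwidth=0.1, n=5000, rng=rng)
loc_est, scale_est = samples.mean(axis=0)  # per-dimension summaries
```

This sidesteps extracting parameters from the density surface entirely: the samples carry the joint structure, so correlation between dimensions is preserved for free.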

@hyunjimoon (Owner, Author) commented Sep 16, 2021

rgamma(1, 0.01, 0.01) as a shape prior in gamma regression can create havoc, especially when mu is divided by a small shape draw to get the scale; much better behavior is observed with shape 1. The former led to an all-zero generated dataset, while with shape 1 the following came out for y.

[[10]]$Y
  [1] 4.309244e+00 2.732240e-04 1.563334e-01 1.361600e+01 4.288475e-02 2.452472e-02 3.514445e-01 6.610115e-05 1.622356e-03 9.903834e-06 5.211964e-02 2.795745e-01
 [13] 4.386543e-03 1.803132e-04 2.297929e-03 8.427759e-03 5.719066e-04 5.663498e-02 1.383374e+00 7.748814e-03 1.186217e-01 5.349593e-01 2.074221e+00 2.825991e-02
 [25] 1.127852e+01 9.391420e-03 1.942050e+00 4.454098e-03 3.152533e-02 7.617865e-02 4.714444e-01 1.854363e+01 1.519407e+02 5.129536e-05 1.514988e+01 1.211421e-02
 [37] 6.269108e-01 7.594586e-01 9.301513e+01 3.876664e-02 1.122144e-01 6.122663e-01 1.256519e-02 1.385288e-01 3.524889e+02 1.798018e+03 2.206024e+00 9.401898e-02
 [49] 1.121241e-02 3.577562e-01 8.309441e-01 1.431084e-01 3.221924e-01 5.994547e+00 4.165774e-02 1.644020e+00 4.284894e-05 9.306254e-01 1.916031e-02 1.438697e-03
 [61] 5.691771e+00 4.268056e-01 1.670348e+01 5.839476e-02 3.180200e-01 2.177965e+00 1.644341e+00 1.484512e-01 7.766534e-03 9.638333e-01 1.686290e+02 3.098333e-01
 [73] 4.996857e-02 1.217501e+01 5.589221e-04 9.279987e-01 1.040912e-01 2.145912e-02 3.458514e+00 1.231609e+02 9.703097e-02 2.603797e-02 6.242285e-02 3.671740e-01
 [85] 1.637701e+00 1.490350e-01 6.889935e-02 1.435317e+00 3.371142e-05 1.277393e-02 3.641803e-02 4.498413e-01 5.096799e-01 8.059216e-01 2.463571e-04 8.712013e-06
 [97] 2.762447e+01 9.831016e-03 2.623996e+00 1.431839e+00
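The pathology can be illustrated with a quick sketch (Python for illustration; the repo's code is R): a Gamma(shape = 0.01, rate = 0.01) draw is overwhelmingly likely to be effectively zero, so anything divided by it explodes.

```python
import numpy as np

rng = np.random.default_rng(0)
# R's rgamma(1, 0.01, 0.01) is shape = 0.01, rate = 0.01;
# numpy parameterizes by (shape, scale), so scale = 1 / 0.01 = 100.
shape_draws = rng.gamma(shape=0.01, scale=100.0, size=100_000)
# The vast majority of draws are effectively zero, so downstream
# quantities like mu / shape blow up and the generated y degenerates.
tiny = float(np.mean(shape_draws < 1e-6))
```

With shape 1 the distribution is exponential with mean 1, keeping mu / shape on a sane scale, which matches the better behavior observed above.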

With this result, using a (2, 2) prior for the regression intercept is better than (2, 5), as below:
[screenshot omitted]

@hyunjimoon (Owner, Author) commented:

@Dashadower's log for recent commits:

Revamped self_calib (0923 ~ 0925, Dashadower)

  • only supports mixture-based priors (purged the rest)
  • receives a parameter-index map as an argument; this allows multi-parameter calibration and less hassle with directly manipulating the dataset
  • R array indexing drives me insane

target_variables specifications

Since we now support multiple variables, multiple mixture distributions are present. We pass the means of each mixture as a vector, which itself sits inside a vector, hence a matrix.

matrix[n_calib_parameters, n_samples] mm_mean;
The number of rows equals the number of parameters; the columns hold the sample count.

target_variables should be a named list mapping each parameter name to its row number in the mixture-mean matrix (e.g. list(a = 1, b = 2)).

generator specifications:

generator <- function(S, mixture_mean_rvar, mixture_bw_rvar, ...){
    # generator here
}

self_calib will now pass as arguments:

  • S: the number of SBC iterations
  • mixture_mean_rvar: an rvar with dimensions <S>(n_target_variables, n_samples). This means you can sample straight away with rvar_rng and directly do whatever calculations you like.
  • mixture_bw_rvar: an rvar with dimension <S>(n_target_variables). The same applies as for mixture_mean_rvar.

stancode specifications

data {
  int<lower=1> n_calib_parameters;
  int SM;
  matrix[n_calib_parameters, SM] mm_mean; // mean values for gmm prior
  vector[n_calib_parameters] mm_bandwidth; // bandwidth (sd) for gmm prior
}

Since we would like to run calibration for multiple parameters, we receive the mixture means for each parameter in a vector of vectors, i.e. a matrix. The number of rows equals the number of parameters; the columns hold the samples. Modify the model code to use these samples.

sbc_iterations specifications

Now you can pass a function which returns S as output. The exact function signature should be:

sbc_iterations <- function(mixture_mean_rvar : rvar<S>(n_target_parameters, n_samples), mixture_bw_rvar : rvar<S>(n_target_parameters)) -> int

It receives two rvars, the mixture-mean samples and the bandwidth values, and should return a single integer: the number of SBC iterations to use for the current calibration iteration.
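As a sketch of that contract (Python for illustration; the signature above is R, and the rule below is entirely made up, not the repo's), a custom rule might scale S with the spread of the current mixture means:

```python
import numpy as np

def sbc_iterations(mixture_mean, mixture_bw, base=100, cap=1000):
    """Hypothetical rule: give noisier calibration states more SBC
    iterations by scaling with the spread of the mixture means."""
    spread = float(np.mean(np.std(mixture_mean, axis=-1)))
    return int(min(cap, round(base * (1.0 + spread))))

# mixture_mean: (n_target_parameters, n_samples); mixture_bw: (n_target_parameters,)
rng = np.random.default_rng(0)
n_iter = sbc_iterations(rng.normal(size=(2, 50)), np.ones(2))
```

Any callable with this shape would slot in; the point of accepting a function is that the iteration budget can react to the calibration state rather than being fixed up front.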


The result of four hours of work trying to get consistent sampling with rvars to behave:

I really do not like rvars

N = 5  # number of variables
S = 2  # number of sbc iterations
K = 3  # number of samples per variable

X <- rvar(array(c(1:(S * N * K)), dim = c(S, N, K)))
#b <- rvar(array(rep(1, S * K), dim =c(S, K)))
b <- rvar_rng(rnorm, ndraws = 1, n = K, 0, 1)

This means the internal array has shape (S, N, K), so we should be able to sample from it directly with rvar_apply:

samples <- rvar_apply(X, 1, function(x){sample(x, 4, replace = TRUE)}) # sample 4 values from along N

This does not work when S > 1, since reshaping the array scrambles the ordering.
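The ordering issue is the classic column-major vs. row-major mismatch; a numpy sketch of the same pitfall (Python for illustration; R's array() fills column-major):

```python
import numpy as np

a = np.arange(24)
# R's array() fills column-major (Fortran order); numpy's default
# reshape is row-major (C order). The same flat data lands in
# different positions, which is what scrambles draws when S > 1.
col_major = a.reshape((2, 3, 4), order="F")
row_major = a.reshape((2, 3, 4), order="C")
differs = not np.array_equal(col_major, row_major)
```

Whenever a reshape crosses the draws dimension, the fill order has to match the order the flat vector was built in, which is exactly what the explicit loop below guarantees by hand.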

After two hours of looking for an elegant solution, I gave up and went with the following:

ret_vec <- c()
for (k in 1:K) {  # I truly, truly hate R array indexing
  for (n in 1:N) {
    for (s in 1:S) {  # fresh index; don't shadow S itself
      ret_vec <- c(ret_vec, sample(draws_of(X)[s, n, ], 1, replace = TRUE))
    }
  }
}
rvar(array(ret_vec, dim = c(S, N, K)))

TL;DR: avoid rvars when possible.

> ar1 <- array(1:24, dim = c(4, 6))
> ar2 <- array(1:24, dim = c(1, 4, 6))

> all(array(ar1, dim = c(1, 4, 6)) == ar2)
[1] TRUE

TODO

I was overcomplicating this. Switch from the multidimensional rvar to draws_rvars: a far cleaner solution.

@hyunjimoon (Owner, Author) commented:

  1. Why do you wish to make sbc_iterations a function? Let's use nsims instead, as sbc_iterations is easily confused with the actual iterations of one SBC cycle. In generator, what is the benefit of receiving nsims as a function rather than a value? Wouldn't this function design be better placed in self-calib.R rather than in self-calibration.rmd?

@Dashadower (Collaborator) commented:

I wanted to accept a function to allow the user to supply custom iteration rules, but if that's unnecessary it's fine to just accept an integer or implement our rule in the R script.

@Dashadower (Collaborator) commented:

I would like to clarify: within SBC, it would be right to call them 'simulations' (S). For calibration, however, I would like to call them 'iterations' or 'calibrations' (C) to represent that we are repeatedly running calibration.

@hyunjimoon (Owner, Author) commented Sep 26, 2021

  1. Not every fitting algorithm uses this; maybe pass it as an argument?
    ##' @param bandwidth the smoothing bandwidth parameter to pass to stats::density
  2. Please change to the draws_rvars type. Once set, subset_draws(mixture_means_init, variables = tp) could be usable!
  3. Would it be better to implement the loop over target_parameters within cjs_dist so that its signature becomes (rvar, rvar)? This would make the following more brief:
 cjs_record[[tp]] <- c(cjs_record[[tp]], cjs_dist(mixture_means_rvar[target_params[[tp]], ], mixture_means_hat_rvar[target_params[[tp]], ]))

@Dashadower (Collaborator) commented:

> Would it be better to implement the loop over target_parameters within cjs_dist so that its signature becomes (rvar, rvar)? This would make the following more brief:
> cjs_record[[tp]] <- c(cjs_record[[tp]], cjs_dist(mixture_means_rvar[target_params[[tp]], ], mixture_means_hat_rvar[target_params[[tp]], ]))

This is a very good idea! As much as I hate using rvars, it still makes sense to try and work with them whenever possible.

@hyunjimoon (Owner, Author) commented Jun 27, 2022

Goal: differentiate mis-calibrated data2draws from the acceptable ones, i.e. draw the boundary of the safe region.

Workflow

  1. Generator: select a reasonable computational algorithm M (an approximation of the precise data2draws M*); prior + likelihood + M form the joint distribution generator.
  2. Sample parameter draws $\theta_{s}^{\prime}$, $s=1, \ldots, S$, using MCMC with M as the approximation method.
  3. Discriminator: compute the metrics $\mathrm{MAE}^{M, M^{*}}$ (Eq. 13) and importance weights $r_{s}^{M, M^{*}}$ using the approximation method M*. Fit $\hat{k}$.
  4. Increase the accuracy of M* and repeat Step 3 until $\mathrm{MAE}^{M, M^{*}}$ and $\hat{k}$ converge. If $\hat{k}$ converges to a value larger than 0.7, increase the accuracy of M and go back to Step 2.
  5. Compute any posterior estimates with $r_{s}$ being the values to which $r_{s}^{M, M^{*}}$ finally converged.

The explanation and equations are from this paper by Juho.

  • Fitting $\hat{k}$ is explained in Section 2.4
  • In step 3, the continue/stop decision for closing the M-M* gap is based on the metrics "pareto_k", "n_eff", "r_eff", "mad_loglik", "mad_odesol"

4. Discriminator

The following discriminator comes from this function:

psis_relative_eff <- function(x, y) {
  checkmate::assert_class(x, "OdeModelFit")
  checkmate::assert_class(y, "OdeModelFit")
  log_ratios <- log_ratios(x, y)
  reciproc_of_importance_ratios <- exp(-log_ratios)
  r_eff <- loo::relative_eff(x = reciproc_of_importance_ratios)
  as.numeric(r_eff)
}

The following, from the odemodeling vignette, shows how the discriminator (khat < 0.7) can act as a critic for the generator, whose ODE solver precision parameter (4 to 18) is its hyperparameter:

solvers <- midpoint_list(c(4,6,8,10,12,14,16,18))
rel <- sho_fit_post$reliability(solvers = solvers)

print(rel$metrics)
#>    pareto_k    n_eff     r_eff mad_loglik mad_odesol
#> 1 0.1176621 2568.697 0.6551239   1.151924 0.05033225
#> 2 0.1163435 2516.645 0.6481001   1.383171 0.05954904
#> 3 0.1191357 2496.489 0.6455019   1.465447 0.06276090
#> 4 0.1168255 2486.756 0.6442629   1.503758 0.06424357
#> 5 0.1184567 2481.337 0.6435767   1.524628 0.06504748
#> 6 0.1155186 2478.024 0.6431573   1.537231 0.06553156
#> 7 0.1143430 2475.851 0.6428822   1.545418 0.06584543
#> 8 0.1130321 2474.350 0.6426921   1.551034 0.06606043
unlink("results")

Workflow of the IS ODE solver

It can give a hint for the above workflow.

1. Select a reasonable approximation method $M$.
2. Sample parameter draws $\boldsymbol{\theta}_{s}^{\prime}, s=1, \ldots, S$, using MCMC with $M$ as the approximation method.
3. Compute $\mathrm{MAE}^{M, M^{*}}$ (Eq. 13) and importance weights $r_{s}^{M, M^{*}}$ (Eq. 10) using approximation method $M^{*}$. Fit $\hat{k}$ as explained in Section 2.4.
4. Increase the accuracy of $M^{*}$ and repeat Step 3 until $\mathrm{MAE}^{M, M^{*}}$ and $\hat{k}$ converge. If $\hat{k}$ converges to a value larger than $0.7$, increase the accuracy of $M$ and go back to Step 2.
5. Compute any posterior estimates using Eq. 8, with $r_{s}$ being the values to which $r_{s}^{M, M^{*}}$ finally converged.
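The reweighting in the final step can be sketched as self-normalized importance sampling (a Python toy with stand-in Gaussian densities for $M$ and $M^{*}$; none of this is the paper's code): draws from the cheap approximation $M$ are reweighted by $r_{s} \propto p_{M^{*}}(\theta_{s}) / p_{M}(\theta_{s})$ so that posterior expectations target $M^{*}$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in densities: the accurate method M* is N(0, 1); the cheap
# approximation M is N(0, 1.2). Draws come from M and are reweighted.
theta = rng.normal(0.0, 1.2, size=200_000)
log_r = -0.5 * theta**2 - (-0.5 * (theta / 1.2) ** 2 - np.log(1.2))
w = np.exp(log_r - log_r.max())   # stabilize before exponentiating
w /= w.sum()                      # self-normalized importance weights r_s
second_moment = float(np.sum(w * theta**2))  # E[theta^2] under M*, close to 1
```

If the two densities are far apart the weights degenerate, which is exactly what the Pareto-$\hat{k}$ diagnostic above detects before the estimate is trusted.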

Reasoning for IS as pseudo gradient descent

is explained in Thm. 1 of Ryan's paper, which shows that the gradient of an expected quantity of interest ($g(\theta)$, evaluated at $\alpha_0$) can be approximated with the covariance of MCMC samples.
[screenshot of the theorem omitted]
Calculating the LHS with IS: an IS estimate of the dependence of the LHS on $\alpha$ can be constructed with weights that depend on $\alpha$; differentiating the weights with respect to $\alpha$ then provides a sample-based estimate of the LHS.

[screenshot omitted]
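For reference, the covariance identity the theorem screenshot appears to state (a reconstruction from the surrounding description; this is the standard score-function result, not a verbatim copy of the paper's equation):

```latex
\left. \frac{d}{d\alpha} \, \mathbb{E}_{p(\theta \mid \alpha)}\big[ g(\theta) \big] \right|_{\alpha = \alpha_0}
= \operatorname{Cov}_{p(\theta \mid \alpha_0)}\!\left( g(\theta), \;
  \left. \frac{\partial \log p(\theta \mid \alpha)}{\partial \alpha} \right|_{\alpha = \alpha_0} \right)
```

The right-hand side is computable from MCMC samples alone, which is what makes the "covariance of MCMC samples" approximation possible.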

To summarize (connecting with Data4DM/BayesSD#21 (comment)):

  1. We need the gradient.
  2. Importance sampling, the core of sequential Monte Carlo, amounts to reusing MCMC samples (in practice computed by HMC, which is gradient-based).

Could the convexity of a well-calibrated posterior justify a consistent safe region?

If the space is the parameter space (instead of hyper-parameter space), it is known to be convex ("that a linear interpolation between the correct posterior and the prior will pass SBC has been previously reported multiple times (e.g. Lee, Nicholls, and Ryder 2019, equation 1.3)").

@Dashadower (Collaborator) commented Jun 27, 2022

Algorithm vs. Model (Parameter) Tuning

Q. Do you view the ODE solver algorithm ($M$) as part of the model?
If not, how about we separate the targets of the calibration?

  1. Tune the solver ($M, M^*$) first, given the current prior/model.
  2. Given $M$, calibrate the prior.
  3. Repeat until 😊

Reducing the number of fits required

SBC requires $L$ fits, and then ODE tuning also requires multiple fits. It feels like too much time just to check calibration in practice T.T. Can we somehow use importance sampling to lessen the burden?
