FR: Support for InferenceData files #166

mike-lawrence · 2021-06-24T15:00:53Z

I'm actually about to start work on this myself, being nearly done an R-based cmdstan-csv-to-ID converter, but wanted to make it a public issue for posterity. The InferenceData spec subclasses NetCDF4 which in turn subclasses HDF5. What's nice about this format is that it permits fast indexed access to on-disk arrays, making for lighter RAM footprint relative to the current status-quo of having to hold everything in RAM. Admittedly this only becomes pertinent for very large models, but I thought it would be worth implementing. I'm hoping (and advocating) that the fabled cmdstan IO rewrite will use ID, in which case this feature would become much more universally applicable.

mike-lawrence · 2021-06-25T12:09:49Z

@mjskay: The access scheme for ID files seems to me to be very much aligned with the rvars format, whereby each variable is stored as a multidimensional array (chain & draw as first and second dimensions, respectively) and reading can consist of slices of arbitrary shape. I can easily read everything into memory outright and store as an rvar, but I wonder if a "lazy-rvar" type might be possible whereby the values are actually pointers so that the data is read from the file only when a computation actually needs them. Any thoughts?

mjskay · 2021-06-25T15:35:18Z

I love the idea of a lazy rvar! Not sure the best way to do it... Maybe if there were a lazy array implementation for R backed by netcdf? Then we could allow use of that as the backing array. Not sure if that would be easier or harder than just implementing all the netcdf access in a subclass of rvar. Anyway, I'm open to it!

mike-lawrence · 2021-06-28T11:54:52Z

@mjskay not working on the lazy-rvars yet, simply trying to get regular ol' rvars working and hit a snag that I'm hoping you can help me with. I'm sure I'm just not understanding the format properly. So for a Stan variable that is a 1D array, we get a $ndraws \times nchains \times N$ array in the posterior, which converts to rvar fine:

a = array(rnorm(1e3*4*2),dim = c(1e3,4,2)) 
a_rvar = posterior::as_draws_rvars(a)

But when a Stan variable has more than one dimension, I'm getting an error:

b = array(rnorm(1e3*4*2*2),dim = c(1e3,4,2,2))
b_rvar = posterior::as_draws_rvars(b)
# Error: Don't know how to transform an object of class 'array' to any supported draws format.

I'm also fuzzy on how to set the dimnames. I have this clunky code:

dim_names = c('draw','chain','b.1','b.2')
dimnames(b) = list()
for(i in 1:length(dim_names)){
  dimnames(b)[[i]] = paste(dim_names[i],1:dim_counts[i],sep='.')
}

But I feel like that's the wrong way to name the last two dimensions, and surely there's a more straightforward way I'm missing.

mjskay · 2021-06-28T14:45:16Z

Hi Mike, happy to help - I think the issue you are running into comes down to the distinction between the rvar and the draws_rvars formats. For a single variable you will want to convert it to an rvar, not a draws_rvars (the latter is a collection of rvars). Should then be able to create a draws_rvars object with one or more rvars in it.

My guess is the docs aren't totally clear about this and we should clarify it - maybe in the rvar vignette or somewhere else

mike-lawrence · 2021-06-28T16:00:29Z

Oh shoot. I just now see that my FR is a dupe. I'll close this and continue the thread on the original.

mike-lawrence · 2021-06-28T17:24:53Z

@mjskay Maybe just to finish off the posterior-array-to-rvars thread here, after looking at the vignette, it seems all the rvars conversion options assume that either the chain & draw dimensions have been collapsed to a single longer dimension, or that the chain & draw dimensions are left alone and the variable's remaining dimensions are collapsed to a single longer dimension. That is, one can do:

nchains = 4
ndraws_per_chain = 1e3
rows <- 4
cols <- 3
x <- rvar(array(
  rnorm(ndraws_per_chain * nchains * rows * cols, mean = 1, sd = 1)
  , dim = c(
    ndraws_per_chain * nchains #note two dims being collapsed here
    , rows
    , cols
)))

And one can do:

a = posterior::example_draws('multi_normal')[,,3:12] #extracts the "Sigma" parameter entries
dim(a) #shows 10,4,12
dimnames(a) # 3rd dim has names that combine the dims of Sigma
ar = posterior::as_draws_rvars(a)

But it's not possible to go directly from a ndraws_per_chain X nchains X rows X cols array to an rvar/draws_rvar.

mjskay · 2021-06-28T17:30:18Z

But it's not possible to go directly from a ndraws_per_chain X nchains X rows X cols array to an rvar/draws_rvar.

Ah yes, there will be soon though - once #153 is merged you will be able to use the with_chains argument to do that. That reminds me the vignette will need an example of it too though.

mike-lawrence · 2021-06-28T18:00:22Z

Thanks! In case it helps anyone for any reason, I learned that this works too:

b = array(rnorm(1e3*4*2*2),dim = c(1e3,4,2,2)) #3+D array from elsewhere
#collapse first two dims
dim(b) = c(prod(dim(b)[1:2]),dim(b)[3:length(dim(b))]) #Yay, no copies to reshape! 
b_rvar = posterior::rvar(b)
#we've lost the variable name, but can get it back by using a named list input to as_draws_rvars:
b_draws_rvar = posterior::as_draws_rvars(list(b=b_rvar)) 
summary(b_draws_rvar) #shows "b" as the variable name with proper dimension indexing

mjskay · 2021-06-28T19:15:10Z

Two additional things that might (or might not) be helpful: (1) you can currently use the nchains argument to add chain info to rvars if the dims have already been collapsed (your example loses chain info) (2) if you use the constructor (draws_rvars()) instead of the conversion function (as_draws_rvars()) you can skip creating the list and just supply rvars as named arguments:

b = array(rnorm(1e3*4*2*2),dim = c(1e3,4,2,2))
dim(b) = c(prod(dim(b)[1:2]),dim(b)[3:length(dim(b))])
# so long as chains was the second dimension this will recover chain info correctly
b_rvar = posterior::rvar(b, nchains = 4)
# could also use as_draws_rvars(list(b = b_rvar)), though this is more direct:
b_draws_rvar = posterior::draws_rvars(b = b_rvar)
b_draws_rvar

# A draws_rvars: 1000 iterations, 4 chains, and 1 variables
$b: rvar<1000,4>[2,2] mean ± sd:
     [,1]         [,2]        
[1,]  0.0011 ± 1  -0.0034 ± 1 
[2,] -0.0228 ± 1   0.0195 ± 1

After #153 is merged, this will become even simpler:

b = array(rnorm(1e3*4*2*2),dim = c(1e3,4,2,2))
# with_chains = TRUE will work after the rvar-conv branch is merged:
b_rvar = posterior::rvar(b, with_chains = TRUE)
b_draws_rvar = posterior::draws_rvars(b = b_rvar)
b_draws_rvar

# A draws_rvars: 1000 iterations, 4 chains, and 1 variables
$b: rvar<1000,4>[2,2] mean ± sd:
     [,1]            [,2]           
[1,]  0.0013 ± 0.99  -0.0133 ± 1.00 
[2,] -0.0054 ± 1.01   0.0099 ± 1.00

mike-lawrence closed this as completed Jun 28, 2021

mjskay mentioned this issue Jun 28, 2021

Convergence functions for rvars #153

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FR: Support for InferenceData files #166

FR: Support for InferenceData files #166

mike-lawrence commented Jun 24, 2021 •

edited

Loading

mike-lawrence commented Jun 25, 2021 •

edited

Loading

mjskay commented Jun 25, 2021

mike-lawrence commented Jun 28, 2021 •

edited

Loading

mjskay commented Jun 28, 2021

mike-lawrence commented Jun 28, 2021 •

edited

Loading

mike-lawrence commented Jun 28, 2021 •

edited

Loading

mjskay commented Jun 28, 2021

mike-lawrence commented Jun 28, 2021 •

edited

Loading

mjskay commented Jun 28, 2021

FR: Support for InferenceData files #166

FR: Support for InferenceData files #166

Comments

mike-lawrence commented Jun 24, 2021 • edited Loading

mike-lawrence commented Jun 25, 2021 • edited Loading

mjskay commented Jun 25, 2021

mike-lawrence commented Jun 28, 2021 • edited Loading

mjskay commented Jun 28, 2021

mike-lawrence commented Jun 28, 2021 • edited Loading

mike-lawrence commented Jun 28, 2021 • edited Loading

mjskay commented Jun 28, 2021

mike-lawrence commented Jun 28, 2021 • edited Loading

mjskay commented Jun 28, 2021

mike-lawrence commented Jun 24, 2021 •

edited

Loading

mike-lawrence commented Jun 25, 2021 •

edited

Loading

mike-lawrence commented Jun 28, 2021 •

edited

Loading

mike-lawrence commented Jun 28, 2021 •

edited

Loading

mike-lawrence commented Jun 28, 2021 •

edited

Loading

mike-lawrence commented Jun 28, 2021 •

edited

Loading