Revamping HPD #1117
OriolAbril
left a comment
The multimodal case is very tricky; I don't think it is possible to know the shape beforehand. I think the best way to tackle it would be to create the ufunc by hand and then use xr.apply_ufunc directly. This would avoid using make_ufunc, which requires creating the output array beforehand.
However, I would first work on unimodal case and once unimodal is up and running, move onto multimodal.
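As an illustration of calling xr.apply_ufunc directly, here is a minimal sketch. The reducer `central_interval` is a hypothetical stand-in (an equal-tailed interval, not a real HPD), and the dimension name `bound` and the toy data are assumptions made only for this example. It shows that no output array has to be allocated beforehand, because the output dimension is declared through `output_core_dims`.

```python
import numpy as np
import xarray as xr


def central_interval(ary, prob=0.94):
    """Hypothetical stand-in reducer: equal-tailed interval, NOT a real HPD."""
    # apply_ufunc moves the core dims (chain, draw) to the end, so flatten them.
    flat = ary.reshape(*ary.shape[:-2], -1)
    tail = (1 - prob) / 2
    bounds = np.quantile(flat, [tail, 1 - tail], axis=-1)  # shape (2, ...)
    # Put the new (lower, upper) axis last, where output_core_dims expects it.
    return np.moveaxis(bounds, 0, -1)


rng = np.random.default_rng(0)
data = xr.DataArray(rng.normal(size=(4, 500, 3)), dims=("chain", "draw", "x"))

intervals = xr.apply_ufunc(
    central_interval,
    data,
    input_core_dims=[["chain", "draw"]],  # dims consumed by the reducer
    output_core_dims=[["bound"]],         # new dim created by the reducer
    kwargs={"prob": 0.94},
)
print(intervals.dims, intervals.shape)
```

The dimensions to reduce over are chosen by name in `input_core_dims`, so the order of the axes in the input does not matter.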
I have made the changes. It is working fine for ndarray and datasets. Currently, I am first converting ndarray to dataset then using
OriolAbril
left a comment
I tried using wrap_xarray_ufunc directly on ndarray, but the hpd is always calculated on the last dimension. Converting it to a dataset gives me control over which dimension to calculate hpd for.
Any workaround for this?
I think this is the best workaround, it gives complete control and it is quite explicit. The other option would be to manually reorder ndarray dimensions, but I would not recommend it as it is much less readable.
I think that the unimodal case is nearly finished; the only things missing are tests and some nits. Regarding the multimodal case, I was thinking of a possible workaround; let's see what everyone thinks about this. The idea would be to have _hpd_multimodal return a result with shape (2, 10) (the 10 is completely arbitrary and could be modified, or even be an argument). Of these 10 pairs of values, the first ones would be the hpd intervals, and the remaining ones, which will probably not be needed, would be set to nan (in _hpd_multimodal). Eventually, hpd in the multimodal case would drop the nan values and return the hpd dataset with the proper shape (also, different variables may have a different number of modes, and this approach should handle this too).
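A rough NumPy-only sketch of the padding idea, to make the proposal concrete. The helper names `pad_intervals` and `drop_unused_modes` and the parameter `max_modes` are hypothetical and not part of ArviZ; the (2, max_modes) layout follows the shape mentioned in the comment above.

```python
import numpy as np


def pad_intervals(intervals, max_modes=10):
    """Pad (lower, upper) pairs with NaN up to a fixed shape (2, max_modes)."""
    intervals = np.asarray(intervals, dtype=float).reshape(-1, 2)
    n_modes = intervals.shape[0]
    if n_modes > max_modes:
        raise ValueError(f"found {n_modes} modes, but max_modes={max_modes}")
    padded = np.full((2, max_modes), np.nan)
    padded[:, :n_modes] = intervals.T  # row 0: lower bounds, row 1: upper bounds
    return padded


def drop_unused_modes(stacked):
    """Drop mode slots that are NaN for every variable.

    stacked has shape (n_vars, 2, max_modes).
    """
    used = ~np.isnan(stacked).all(axis=(0, 1))
    return stacked[..., used]


# Two variables with a different number of modes (toy values).
var_a = pad_intervals([[-2.0, -1.0], [1.0, 2.0]])
var_b = pad_intervals([[0.0, 1.0]])
result = drop_unused_modes(np.stack([var_a, var_b]))
print(result.shape)
```

Because every variable is padded to the same fixed shape, the results can be stacked even when the number of modes differs; a variable with fewer modes simply keeps NaN in its unused slots after the all-NaN slots are dropped.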
Do we start the work of multimodal here or wait for everyone's opinion? Maybe create a new issue for multimodal discussion.
Codecov Report
@@ Coverage Diff @@
## master #1117 +/- ##
==========================================
- Coverage 92.68% 92.67% -0.01%
==========================================
Files 93 93
Lines 9073 9097 +24
==========================================
+ Hits 8409 8431 +22
- Misses 664 666 +2
I would work on multimodal in this same PR; otherwise, the behaviour of the ArviZ development version between merging this PR and the other one will be quite confusing.
Should I start implementing this or wait?
Let's go ahead with multimodal 💪
I have tried to implement the multimodal case for a single input. Currently, I am filling only the first dimension with NaNs.
OriolAbril
left a comment
I tried to put each comment on the related piece of code, but most of the comments are related to one another and some even apply to several places. Read everything first and ask if anything is unclear.
OriolAbril
left a comment
Looks great!
I have one question about ndarray input with ndims > 2. This is currently not supported, right?
I am not sure if we should extend the 2d behaviour (as the code does now) or assume ArviZ dimensions of (chain, draw, *shape). I am inclined towards the second but we should probably weigh the pros and cons and reach some kind of consensus.
I think ndarray input with ndims > 2 is not supported.
You can add a comment to ignore the pylint warning; pylint probably misses the pop.
OriolAbril
left a comment
I think these comments will be the last nits
func = _hpd_multimodal if multimodal else _hpd
density *= dx
if isinstance(ary, np.ndarray):
This should be only if the array is 1d or 2d:
isarray = isinstance(ary, np.ndarray)
if isarray and ary.ndim <= 2:
If the array has 3 or more dimensions, it should assume ArviZ dim order: (chain, draw, *shape). hpd should still return a numpy array though:
...
hpd_data = _wrap_xarray_ufunc(func, ary, func_kwargs=func_kwargs, **kwargs)
hpd_data = hpd_data.dropna("mode", how="all") if multimodal else hpd_data
return hpd_data.x.values if isarray else hpd_data
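To make the intended shapes concrete, here is a NumPy-only sketch of the dispatch described above. It is not the PR's implementation (which goes through _wrap_xarray_ufunc); `hpd_1d` and `hpd_array` are hypothetical names, and the 1d/2d branch assumes that the existing behaviour reduces over the first axis only.

```python
import numpy as np


def hpd_1d(samples, credible_interval=0.94):
    """Narrowest interval containing credible_interval of the sorted samples."""
    x = np.sort(samples)
    n = len(x)
    span = int(np.floor(credible_interval * n))  # index distance between bounds
    widths = x[span:] - x[: n - span]            # width of each candidate interval
    lo = int(np.argmin(widths))
    return np.array([x[lo], x[lo + span]])


def hpd_array(ary, credible_interval=0.94):
    """Hypothetical dispatch on ndim for plain ndarray input."""
    ary = np.asarray(ary)
    if ary.ndim > 2:
        # Assume ArviZ dim order (chain, draw, *shape): pool chains and draws.
        ary = ary.reshape(-1, *ary.shape[2:])
    # 1d/2d (assumed legacy behaviour): reduce over the first axis only.
    out = np.apply_along_axis(hpd_1d, 0, ary, credible_interval)
    return np.moveaxis(out, 0, -1)  # put the (lower, upper) axis last


rng = np.random.default_rng(0)
print(hpd_array(rng.normal(size=1000)).shape)
print(hpd_array(rng.normal(size=(1000, 10))).shape)
print(hpd_array(rng.normal(size=(40, 25, 3))).shape)
```

Under these assumptions a 3d input of shape (chain, draw, 3) yields a result of shape (3, 2), while a 2d input of shape (n, 10) still yields (10, 2).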
def test_hpd_multidimension():
    normal_sample = np.random.randn(12000, 10, 3)
    result = hpd(normal_sample)
    assert result.shape == (10, 3, 2,)
This line will have to be updated to check that the result shape is the desired (3, 2)
Earlier, we were calculating hpd over one dimension only, for ndarrays. So, for backward compatibility I have set default to be calculated only over 'chain' for ndarrays. So, the result still would be (10, 3, 2,).
The issue is that calculating hpd only over chain is a very bad default. We'll keep the behaviour (for now) in the 2d array case to keep backwards compatibility, but 3d arrays are not supported, so we do not have the backwards compatibility constraint there.
Okay, I have done the changes.
I ran a cell with pm.stats.hpd in an example notebook, and I guess it has been removed or something, because I get the error: no attribute 'hpd'. I tried az.hpd too, same thing. I'm probably missing something; has the function been renamed?
Did you try az.hdi? Please open a new issue if that doesn't work.
@ahartikainen thank you, that worked!
Description
fixes #855 Make hpd work with multidimensional arrays.
Checklist