-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Stack array stats if possible #5572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #5572 +/- ##
=======================================
Coverage 87.59% 87.59%
=======================================
Files 76 76
Lines 13694 13694
=======================================
Hits 11995 11995
Misses 1699 1699
|
OriolAbril
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good but I don't really know how it will integrate with the rest of the library, hope these two questions help a bit.
Do we have tests with compound steps similar to #4602, does this modify this?
Would it be helpful to try and provide stepper defined dims or coords?
|
This is not directly related to compound steps, but to samplers/steps that have stats that are arrays and not floats/integer/boolean etc. So far it seems that the only samplers with that kind of stats are from PGBART and MLDA. |
michaelosthege
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be helpful to try and provide stepper defined dims or coords?
Generally yes. Alongside shape information.
@aloctavodia why do you want to stack these variables at all?
They should be yielded in the correct type right away. We don't have a stats-postprocessing API and I'd prefer to undertstand what you're trying to achieve first.
michaelosthege
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't check why the CI failed, but I'm quite certain about the isinstance checks I commented above.
BART (that now lives in pymc-experimental) stores 2 stats which at each point are arrays. Variable importance and bart_trees. For example variable importance is a vector with the position encoding a given covariate and an integer encoding the number of times that covariate was used in the trees.
We used to have this to stack those stats https://github.com/pymc-devs/pymc/pull/5566/files#diff-7b3470589153c7c90a2c05fc86402f9acf65555200d90f01b51a47ccef592628L637-L654 but that code is BART-specific, so here I am proposing as alternative a more general solution.