Member

aloctavodia commented Mar 10, 2022

BART (which now lives in pymc-experimental) stores two stats that are arrays at each point: variable importance and bart_trees. For example, variable importance is a vector in which each position corresponds to a covariate and the integer at that position counts how many times that covariate was used in the trees.
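As a rough illustration of the stat described above, the sketch below stacks per-point count vectors with NumPy. The numbers and variable names are made up for the example and are not taken from the BART code.

```python
import numpy as np

# Hypothetical per-point stats: one count vector per point,
# one entry per covariate (3 covariates here).
per_point_counts = [
    np.array([3, 0, 1]),
    np.array([2, 1, 1]),
    np.array([4, 0, 0]),
]

# Stacking turns the list of vectors into one (n_points, n_covariates) array.
variable_importance = np.stack(per_point_counts)

# Summing over points gives the total usage count for each covariate.
total_usage = variable_importance.sum(axis=0)
```

With the stacked array, per-covariate summaries reduce to a single reduction along the first axis.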

We used to have this code to stack those stats: https://github.com/pymc-devs/pymc/pull/5566/files#diff-7b3470589153c7c90a2c05fc86402f9acf65555200d90f01b51a47ccef592628L637-L654. That code is BART-specific, so here I am proposing a more general solution as an alternative.
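A minimal sketch of what a sampler-agnostic "stack if possible" step could look like, assuming the stats arrive as a list with one value per point. The helper name `stack_if_possible` is hypothetical and this is not necessarily the code in this PR.

```python
import numpy as np

def stack_if_possible(values):
    """Stack per-point stats into one array when their shapes agree."""
    try:
        return np.stack([np.asarray(v) for v in values])
    except ValueError:
        # Shapes differ between points (or the list is empty):
        # fall back to a 1-D object array holding the original values.
        out = np.empty(len(values), dtype=object)
        for i, v in enumerate(values):
            out[i] = v
        return out

# Equal shapes: result is a regular 2-D integer array.
stacked = stack_if_possible([np.array([1, 2]), np.array([3, 4])])

# Unequal shapes: result stays an object array, one entry per point.
ragged = stack_if_possible([np.array([1, 2]), np.array([1, 2, 3])])
```

The fallback keeps ragged stats usable instead of raising, which is the sense of "if possible" here.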

codecov bot commented Mar 10, 2022

Codecov Report

Merging #5572 (84d9f8f) into main (44c5495) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 84d9f8f differs from pull request most recent head 3f922a3. Consider uploading reports for the commit 3f922a3 to get more accurate results

@@           Coverage Diff           @@
##             main    #5572   +/-   ##
=======================================
  Coverage   87.59%   87.59%           
=======================================
  Files          76       76           
  Lines       13694    13694           
=======================================
  Hits        11995    11995           
  Misses       1699     1699           
| Impacted Files | Coverage Δ | |
|---|---|---|
| pymc/backends/base.py | 87.77% <100.00%> (+0.09%) | ⬆️ |
| pymc/step_methods/mlda.py | 96.37% <100.00%> (-0.03%) | ⬇️ |

Member

OriolAbril left a comment

Code looks good, but I don't really know how it will integrate with the rest of the library. I hope these two questions help a bit.

Do we have tests with compound steps similar to #4602, and does this change affect them?

Would it be helpful to try to provide stepper-defined dims or coords?

Member Author

aloctavodia commented Mar 10, 2022

This is not directly related to compound steps, but to samplers/steps that have stats that are arrays rather than floats, integers, booleans, etc. So far it seems that the only samplers with that kind of stats are PGBART and MLDA.

Member

michaelosthege left a comment

> Would it be helpful to try to provide stepper-defined dims or coords?

Generally yes. Alongside shape information.

@aloctavodia why do you want to stack these variables at all?
They should be yielded in the correct type right away. We don't have a stats-postprocessing API and I'd prefer to understand what you're trying to achieve first.
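One possible reading of "yielded in the correct type right away", sketched under the assumption that a stepper declares each stat's dtype and per-point shape up front. The `stat_spec` mapping and the stat names are hypothetical and are not part of the PyMC API.

```python
import numpy as np

# Hypothetical declaration: stat name -> (dtype, per-point shape).
stat_spec = {
    "variable_inclusion": (np.int64, (3,)),
    "accepted": (bool, ()),
}
n_points = 4

# Preallocate one array per stat with a leading dimension over points.
storage = {
    name: np.zeros((n_points, *shape), dtype=dtype)
    for name, (dtype, shape) in stat_spec.items()
}

# Each point writes straight into its slot, so no stacking is needed later.
for i in range(n_points):
    storage["variable_inclusion"][i] = [i, 0, 1]
    storage["accepted"][i] = (i % 2 == 0)
```

With shapes known in advance, the same declaration could also carry dims or coords, which is where the shape information mentioned above would come in.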

Member

michaelosthege left a comment

I didn't check why the CI failed, but I'm quite certain about the isinstance checks I commented above.

michaelosthege changed the title from "stack array stats" to "Stack array stats if possible" Mar 11, 2022
aloctavodia merged commit 4626712 into pymc-devs:main Mar 11, 2022
aloctavodia deleted the stat_array branch March 11, 2022 09:36
3 participants