I've started work on a density estimation package, including a classifier-adjusted density estimation (CADE) routine that currently uses LightGBM as its default classifier. However, this is almost certainly suboptimal: it would be far more efficient to build boosted density estimation trees directly, and I wonder whether this could be done on top of LightGBM's existing codebase without much additional effort (for someone with more C++ skill than me).
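For context, here is a minimal sketch of the CADE idea as I'm using it, assuming a uniform reference distribution over the data's bounding box; the `cade_density` helper and its signature are illustrative, not my package's actual API:

```python
# Minimal sketch of classifier-adjusted density estimation (CADE).
# Assumption: equal-sized real and reference samples, uniform reference.
import numpy as np
import lightgbm as lgb

def cade_density(X, random_state=0):
    """Estimate density at the training points via CADE:
    train a classifier to separate the data from a known reference
    density, then convert predicted class probabilities into a
    density estimate."""
    rng = np.random.default_rng(random_state)
    n, d = X.shape

    # Reference sample: uniform over the data's bounding box.
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_ref = rng.uniform(lo, hi, size=(n, d))
    ref_density = 1.0 / np.prod(hi - lo)  # constant uniform density

    # Classifier distinguishing real (1) from reference (0) points.
    clf = lgb.LGBMClassifier(n_estimators=200)
    clf.fit(np.vstack([X, X_ref]), np.r_[np.ones(n), np.zeros(n)])

    # CADE adjustment: p(x) ≈ p_ref(x) * P(real|x) / P(ref|x),
    # valid here because the two classes have equal priors.
    p_real = clf.predict_proba(X)[:, 1]
    return ref_density * p_real / np.clip(1.0 - p_real, 1e-12, None)
```

The inefficiency should be apparent: we generate and score a whole synthetic sample just to coax a density estimate out of a classifier, when trees that estimate density directly would skip the detour.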
Density estimation trees are not a new concept. For example, https://arxiv.org/abs/1607.06635 points out their computational advantages. I suspect that a boosted tree framework could overcome some of the accuracy limitations of single-tree implementations.
Unlike all(?) learners LightGBM currently supports, density estimation is unsupervised. Given a node, deciding whether and where to split requires tracking the min/max value of each feature within that node and efficiently tallying the number of observations to the left and right of each candidate split. A split provides the greatest gain when it produces child nodes with very unequal densities but not-too-unequal total mass. Several specific loss functions have been proposed; I would likely start with those encoded in astropy.
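To make the gain criterion concrete, here is a sketch of the ISE-based node error and split gain used in the density-estimation-tree literature (e.g., Ram & Gray's KDD 2011 paper). The function names and signatures are illustrative, and this is just one candidate loss, not an endorsement:

```python
# Sketch of an ISE-style split gain for a density estimation tree.
# The leaf density estimate is piecewise constant: n_node / (n_total * volume).
import numpy as np

def node_error(n_node, n_total, volume):
    """ISE contribution of a node under the piecewise-constant
    estimate; more negative is better."""
    return -n_node**2 / (n_total**2 * volume)

def split_gain(x, n_total, lows, highs, feature, threshold):
    """Gain from splitting a node on `feature` at `threshold`.

    x           : (n, d) array of observations in the node
    lows, highs : per-feature min/max tracked for the node (its bounding box)
    """
    volume = np.prod(highs - lows)
    n = len(x)

    # Tally observations on each side of the candidate split.
    n_left = int(np.sum(x[:, feature] <= threshold))
    n_right = n - n_left

    # Child volumes: only the split feature's extent changes.
    frac_left = (threshold - lows[feature]) / (highs[feature] - lows[feature])
    vol_left, vol_right = volume * frac_left, volume * (1.0 - frac_left)

    # Gain = reduction in total error from replacing the node with
    # its two children (larger is better).
    return (node_error(n, n_total, volume)
            - node_error(n_left, n_total, vol_left)
            - node_error(n_right, n_total, vol_right))
```

Note how the `-n**2 / volume` form rewards packing mass into a small volume, which is exactly the "very unequal densities, not-too-unequal total mass" behavior described above.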
Update: I hope to eventually be able to endorse a particular loss function for training. Until then, anyone who tackles this may find my general notes on performance evaluation to be relevant.
Closed in favor of #2302; we decided to keep all feature requests in one place.
Contributions for this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.