I've started work on a density estimation package, including a classifier-adjusted density estimation (CADE) routine that currently uses LightGBM as its default classifier. However, this is almost certainly suboptimal: it would be far more efficient to build boosted density estimation trees directly, and I wonder whether this could be done on top of LightGBM's existing codebase without much additional effort (for someone with more C++ skill than me).
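For context, here is a minimal sketch of the CADE idea as I'm using it, assuming a uniform reference distribution over the data's bounding box; the `cade_density` helper and its signature are illustrative, not my package's actual API:

```python
# Minimal sketch of classifier-adjusted density estimation (CADE).
# Assumption: equal-sized real and reference samples, uniform reference.
import numpy as np
import lightgbm as lgb

def cade_density(X, random_state=0):
    """Estimate density at the training points via CADE:
    train a classifier to separate the data from a known reference
    density, then convert predicted class probabilities into a
    density estimate."""
    rng = np.random.default_rng(random_state)
    n, d = X.shape

    # Reference sample: uniform over the data's bounding box.
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_ref = rng.uniform(lo, hi, size=(n, d))
    ref_density = 1.0 / np.prod(hi - lo)  # constant uniform density

    # Classifier distinguishing real (1) from reference (0) points.
    clf = lgb.LGBMClassifier(n_estimators=200)
    clf.fit(np.vstack([X, X_ref]), np.r_[np.ones(n), np.zeros(n)])

    # CADE adjustment: p(x) ≈ p_ref(x) * P(real|x) / P(ref|x),
    # valid here because the two classes have equal priors.
    p_real = clf.predict_proba(X)[:, 1]
    return ref_density * p_real / np.clip(1.0 - p_real, 1e-12, None)
```

The inefficiency should be apparent: we generate and score a whole synthetic sample just to coax a density estimate out of a classifier, when trees that estimate density directly would skip the detour.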
Density estimation trees are not a new concept. For example, https://arxiv.org/abs/1607.06635 points out their computational advantages. I suspect that a boosted tree framework could overcome some of the accuracy limitations of single-tree implementations.
Unlike all(?) learners LightGBM currently supports, density estimation is unsupervised. Given a node, deciding whether and where to split requires tracking the min/max value of each feature within that node and efficiently tallying the number of observations to the left and right of each candidate split. A split provides the greatest gain when it produces child nodes with very unequal densities but not-too-unequal total mass. Several specific loss functions have been proposed; I would likely start with those encoded in astropy.
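To make the gain criterion concrete, here is a sketch of the ISE-based node error and split gain used in the density-estimation-tree literature (e.g., Ram & Gray's KDD 2011 paper). The function names and signatures are illustrative, and this is just one candidate loss, not an endorsement:

```python
# Sketch of an ISE-style split gain for a density estimation tree.
# The leaf density estimate is piecewise constant: n_node / (n_total * volume).
import numpy as np

def node_error(n_node, n_total, volume):
    """ISE contribution of a node under the piecewise-constant
    estimate; more negative is better."""
    return -n_node**2 / (n_total**2 * volume)

def split_gain(x, n_total, lows, highs, feature, threshold):
    """Gain from splitting a node on `feature` at `threshold`.

    x           : (n, d) array of observations in the node
    lows, highs : per-feature min/max tracked for the node (its bounding box)
    """
    volume = np.prod(highs - lows)
    n = len(x)

    # Tally observations on each side of the candidate split.
    n_left = int(np.sum(x[:, feature] <= threshold))
    n_right = n - n_left

    # Child volumes: only the split feature's extent changes.
    frac_left = (threshold - lows[feature]) / (highs[feature] - lows[feature])
    vol_left, vol_right = volume * frac_left, volume * (1.0 - frac_left)

    # Gain = reduction in total error from replacing the node with
    # its two children (larger is better).
    return (node_error(n, n_total, volume)
            - node_error(n_left, n_total, vol_left)
            - node_error(n_right, n_total, vol_right))
```

Note how the `-n**2 / volume` form rewards packing mass into a small volume, which is exactly the "very unequal densities, not-too-unequal total mass" behavior described above.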
Update: I hope to eventually be able to endorse a particular loss function for training. Until then, anyone who tackles this may find my general notes on performance evaluation to be relevant.
Closed in favor of #2302; we decided to keep all feature requests in one place.
Contributions for this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.