First classes with null / small number of events ? #354

lcrmorin · 2025-05-05T17:42:40Z

I am curious how the first splits are created when the number of events is low.

Is the binning process robust when there is only a small number of events? Or does it rely on statistical test approximations that may not hold with few events?
Notably, can the solution build a class with 0 events (infinite WoE)?
In other words: would it be necessary to check the robustness of the first splits with alternative methods?
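To make the "infinite WoE" case concrete, here is a minimal sketch in plain Python. It assumes the WoE convention ln(non-event share / event share); the bin counts and the `woe` helper are hypothetical and are not taken from the library.

```python
import math

def woe(n_nonevent, n_event, total_nonevent, total_event):
    """WoE of one bin, using the ln(non-event share / event share) convention."""
    nonevent_share = n_nonevent / total_nonevent
    event_share = n_event / total_event
    if event_share == 0 and nonevent_share == 0:
        return math.nan  # empty bin: WoE is undefined
    if event_share == 0:
        return math.inf  # pure non-event bin
    if nonevent_share == 0:
        return -math.inf  # pure event bin
    return math.log(nonevent_share / event_share)

# Hypothetical counts per bin: (non-events, events)
bins = [(50, 0), (400, 10), (450, 40)]
total_nonevent = sum(ne for ne, _ in bins)
total_event = sum(e for _, e in bins)
woe_values = [woe(ne, e, total_nonevent, total_event) for ne, e in bins]
```

The first bin has no events, so its WoE is infinite, while the other two bins get finite values of opposite sign.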

lcrmorin · 2025-05-12T05:52:41Z

Author

Addendum: it doesn't seem to handle 0-event bins well. Typically it throws an error if user-provided splits produce a bin with 0 events. Is there a proper way to handle this? (If the pure bin is at the top, one could remove it, bin the rest, and do some testing when merging. If the pure bin is in the middle... that is a problem.)

Is there an approximate way to handle this? I was thinking of using a model-induced WoE (the log-odds of an estimated event probability) to replace infinite values.
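One possible reading of the "model-induced WoE" idea, as a sketch only: under the ln(non-event share / event share) convention, the WoE of a bin equals logit(overall event rate) minus logit(bin event rate), so an estimated probability can stand in for the observed rate of a pure bin. The helper names, the counts, and the 0.5% model estimate below are all hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def induced_woe(p_bin, p_overall):
    """WoE implied by an event probability for the bin.

    Assumes the ln(non-event share / event share) convention, under which
    WoE = logit(overall event rate) - logit(bin event rate).
    """
    return logit(p_overall) - logit(p_bin)

# Hypothetical sample: 900 non-events and 50 events overall.
p_overall = 50 / 950

# Pure bin (0 observed events): use an assumed model estimate instead of 0.
p_model = 0.005
woe_pure_bin = induced_woe(p_model, p_overall)

# Sanity check on a non-pure bin (400 non-events, 10 events):
# the rate-based value should match the count-based WoE.
woe_from_counts = math.log((400 / 900) / (10 / 50))
woe_from_rate = induced_woe(10 / 410, p_overall)
```

The replacement value is finite, and its quality depends entirely on how trustworthy the model estimate for the pure bin is.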

Maybe @guillermo-navas-palencia has an input on this ?

guillermo-navas-palencia · 2025-05-12T16:55:14Z

Maintainer

Hi @lcrmorin.

I am aware of some sort of induced WoE implemented in SAS grouping: https://documentation.sas.com/doc/en/vdmmlcdc/8.1/casstat/viyastat_binning_details02.htm. However, I don't know how to choose the correct adjustment factor. Therefore, I created the pure bins check.

If you have a good prior for the % of events in a given bucket, perhaps you can perform binning assuming a continuous target with probabilities in [0, 1].

lcrmorin · 2025-05-12T17:46:09Z

Author

Ah yes, thanks @guillermo-navas-palencia. There are some adjustments that add a constant x to both the numerator and the denominator, but I have a hard time finding a good justification for them. It seems like clipping the probability of events? At that point, why not just clip the log?

As you mention, maybe the best way to go is to perform continuous binning. I don't know how the boundaries behave (whether it gets 'harder' as one gets closer to 0), so I am currently toying with binning the continuous probabilities transformed into log-odds (after all, log-odds and WoE differ by a constant).
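A small sketch comparing the two options, under stated assumptions: the additive form below (x added to both bin counts) is one common way such an adjustment is written and may not match the SAS formula exactly, and the default `x=0.5` and `cap=5.0` are arbitrary illustrative values.

```python
import math

def adjusted_woe(n_nonevent, n_event, total_nonevent, total_event, x=0.5):
    """WoE with a constant x added to both bin counts (assumed form)."""
    nonevent_share = (n_nonevent + x) / total_nonevent
    event_share = (n_event + x) / total_event
    return math.log(nonevent_share / event_share)

def clipped_woe(n_nonevent, n_event, total_nonevent, total_event, cap=5.0):
    """Plain WoE truncated to [-cap, cap]; assumes a non-empty bin."""
    if n_event == 0:
        return cap
    if n_nonevent == 0:
        return -cap
    raw = math.log((n_nonevent / total_nonevent) / (n_event / total_event))
    return max(-cap, min(cap, raw))

# Hypothetical totals and two pure bins of very different size.
total_nonevent, total_event = 900, 50
small_pure_adjusted = adjusted_woe(5, 0, total_nonevent, total_event)
large_pure_adjusted = adjusted_woe(500, 0, total_nonevent, total_event)
small_pure_clipped = clipped_woe(5, 0, total_nonevent, total_event)
large_pure_clipped = clipped_woe(500, 0, total_nonevent, total_event)
```

In this sketch the additive adjustment gives the 5-observation pure bin a much smaller WoE than the 500-observation one, whereas clipping assigns both the same capped value. So the two are not equivalent: the additive form takes bin size into account, at the cost of having to choose x.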
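A minimal sketch of the log-odds transform mentioned above, assuming the probabilities are clipped away from 0 and 1 first so that the transformed target stays finite. The `to_log_odds` helper and the `eps` value are hypothetical choices; the transformed values would then be used as the continuous target.

```python
import math

def to_log_odds(p, eps=1e-6):
    """Clip p to [eps, 1 - eps], then return log(p / (1 - p))."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

# Hypothetical estimated event probabilities, including both boundary values.
probs = [0.0, 0.001, 0.02, 0.5, 1.0]
log_odds = [to_log_odds(p) for p in probs]
```

The choice of `eps` sets how extreme the boundary values become (about ±13.8 for `eps=1e-6`), so it plays a role similar to the clipping bound discussed above.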

1 reply

guillermo-navas-palencia May 13, 2025
Maintainer

Nice. Feel free to share your findings using log-odds!
