Replies: 3 comments 1 reply
-
Addendum: it doesn't seem to handle 0-event bins well. Typically it throws an error if the user-provided splits contain a bin with 0 events. Is there a proper way to handle this? (If the pure bin is at the top, one can remove it, bin the rest, and do some testing when merging. If the pure bin is in the middle, there is no obvious fix.) Is there an approximate way to handle this? I was thinking of using a model-induced WoE (the log-odds of the estimated event probability) to replace infinite values. Maybe @guillermo-navas-palencia has some input on this?
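To make the failure case concrete, here is a minimal sketch of how one might check user-provided splits for pure bins before fitting. It uses only NumPy; the helper names `bin_event_counts` and `pure_bins` are hypothetical and not part of any library API, and the bin convention (left-closed intervals) is an assumption.

```python
import numpy as np

def bin_event_counts(x, y, splits):
    """Count events and non-events per bin for user-provided split points.

    Hypothetical helper. Assumes y is binary (1 = event) and that bins are
    left-closed: (-inf, s0), [s0, s1), ..., [s_last, +inf).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    idx = np.digitize(x, splits, right=False)  # bin index in 0..len(splits)
    n_bins = len(splits) + 1
    events = np.bincount(idx, weights=y, minlength=n_bins).astype(int)
    totals = np.bincount(idx, minlength=n_bins)
    return events, totals - events

def pure_bins(events, nonevents):
    """Return indices of bins with no events or no non-events (infinite WoE)."""
    return [i for i, (e, ne) in enumerate(zip(events, nonevents))
            if e == 0 or ne == 0]
```

A pure bin at index 0 or at the last index corresponds to the "at the top" case above, where dropping or merging with the single neighbour is possible; an interior index is the harder case.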
-
Hi @lcrmorin. I am aware of a kind of induced WoE implemented in SAS grouping: https://documentation.sas.com/doc/en/vdmmlcdc/8.1/casstat/viyastat_binning_details02.htm. However, I don't know how to choose the correct adjustment factor, which is why I created the pure-bins check. If you have a good prior for the % of events in a given bucket, perhaps you can perform binning assuming a continuous target with probabilities in [0, 1].
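For illustration, a generic additive adjustment could look like the sketch below. This is not claimed to be the exact formula in the SAS documentation linked above; the function name `smoothed_woe`, the default `x=0.5`, and the WoE sign convention (non-event share over event share) are all assumptions.

```python
import numpy as np

def smoothed_woe(events, nonevents, x=0.5):
    """WoE per bin with an additive adjustment factor x (hypothetical sketch).

    Assumed convention: WoE_i = log( share of non-events in bin i
                                     / share of events in bin i ),
    with x added to each bin count so that a 0-count bin stays finite.
    The value of x is arbitrary here; the thread leaves its choice open.
    """
    events = np.asarray(events, dtype=float)
    nonevents = np.asarray(nonevents, dtype=float)
    nonevent_share = (nonevents + x) / nonevents.sum()
    event_share = (events + x) / events.sum()
    return np.log(nonevent_share / event_share)
```

With `x=0` this reduces to the unadjusted WoE and a bin with 0 events yields an infinite value; any `x > 0` keeps it finite, but the result depends directly on the chosen `x`.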
-
Ah yes, thanks @guillermo-navas-palencia. There are some adjustments that add an x to both the numerator and the denominator, but I have a hard time finding a good justification for them. It seems like clipping the probability of events? At that point, why not just clip the log? As you mention, maybe the best way to go is to perform continuous binning. I don't know how the boundaries behave (whether it gets 'harder' as one gets closer to 0), so I am currently toying with binning the continuous probabilities transformed into log-odds (after all, log-odds and WoE differ only by a constant).
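The relation between log-odds and WoE mentioned above can be sketched as follows. Assuming WoE is defined as the log of the non-event share over the event share, the bin WoE equals the overall log-odds of the event minus the bin log-odds, so the two differ by a constant and a sign (under the opposite convention, only by a constant). The function name `woe_from_probability` and the clipping threshold `eps` are hypothetical.

```python
import numpy as np

def woe_from_probability(p_bin, p_overall, eps=1e-6):
    """Model-induced WoE from estimated event probabilities (hypothetical sketch).

    Assumed convention: WoE_i = logit(p_overall) - logit(p_i), which matches
    the count-based WoE when p_i is the observed event rate in bin i.
    Probabilities are clipped to [eps, 1 - eps] so that pure bins stay finite.
    """
    p = np.clip(np.asarray(p_bin, dtype=float), eps, 1.0 - eps)
    logit_bin = np.log(p / (1.0 - p))
    logit_overall = np.log(p_overall / (1.0 - p_overall))
    return logit_overall - logit_bin
```

Clipping the probability at `eps` is the same as capping the WoE at a fixed magnitude, which is one way to read the "why not just clip the log?" question.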
-
I am curious how the first splits are created when the number of events is low.
Is the binning process robust with a small number of events? Or does it rely on statistical test approximations that may not hold when events are scarce?
Notably, is the solution generally able to build a bin with 0 events (infinite WoE)?
In other words: would it be necessary to check the robustness of the first splits with alternative methods?