Allow for different functionalities of NannyML to set their own minimum chunk size #43

nikml · 2022-03-18T22:35:26Z

Univariate Drift has its own minimum chunk size.
Multivariate Drift has its own minimum chunk size.
The previous minimum chunk size is now only used in Performance Estimation.
Refactored the chunker so that it no longer has a minimum chunk size property. The minimum chunk size is now an argument of the split method and is specified by the monitoring classes.
Updated the docs to describe the approach for minimum chunk size and the relevant warning.
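For readers skimming the description, the refactor can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not the code in this PR: the class names `SizeBasedChunkerSketch`, `UnivariateDriftSketch` and `MultivariateDriftSketch`, the placeholder minimum values, and the warning text are all hypothetical. It illustrates only the idea that the chunker no longer owns a minimum chunk size and that each monitoring class passes its own value to `split`.

```python
import warnings
from typing import List, Optional, Sequence


class SizeBasedChunkerSketch:
    """Hypothetical chunker: it has no minimum-chunk-size property of its own."""

    def __init__(self, chunk_size: int):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be a positive integer")
        self.chunk_size = chunk_size

    def split(self, rows: Sequence, minimum_chunk_size: Optional[int] = None) -> List[Sequence]:
        # Cut the data into consecutive chunks of at most `chunk_size` rows.
        chunks = [rows[i:i + self.chunk_size] for i in range(0, len(rows), self.chunk_size)]
        # The minimum is optional and supplied by the caller, not stored on the chunker.
        if minimum_chunk_size is not None:
            too_small = [c for c in chunks if len(c) < minimum_chunk_size]
            if too_small:
                # Assumed behaviour: warn rather than fail when chunks are small.
                warnings.warn(
                    f"{len(too_small)} chunk(s) have fewer than {minimum_chunk_size} rows; "
                    "results for these chunks may be unreliable.",
                    UserWarning,
                )
        return chunks


class UnivariateDriftSketch:
    MINIMUM_CHUNK_SIZE = 3  # placeholder value, not the library's

    def __init__(self, chunker: SizeBasedChunkerSketch):
        self.chunker = chunker

    def chunk_lengths(self, rows: Sequence) -> List[int]:
        # The monitoring class, not the chunker, decides which minimum applies.
        chunks = self.chunker.split(rows, minimum_chunk_size=self.MINIMUM_CHUNK_SIZE)
        return [len(c) for c in chunks]


class MultivariateDriftSketch(UnivariateDriftSketch):
    MINIMUM_CHUNK_SIZE = 5  # placeholder value, not the library's
```

With a chunk size of 4 and ten rows, both sketch classes get the same three chunks back, but they apply different thresholds when deciding whether to warn.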

…um chunk size (#43)

* wip1: default min chunk size function for BaseDriftCalculator
* wip2: min default chunk size for BasePerformanceEstimator
* wip3 - perf est?
* Big chunker refactor. Minimum chunk size moved to split function as optional argument. Multiple other refactors as a consequence + test adjustments.
* wip: update BasePerfEstimator to not have functions regarding minimum chunk size
* make CBPE set its own min chunk size
* wip: min chunk size for multivariate
* add docs for minimum chunk size
* Move chunker.split to inheriting drift calculator classes
* Fix missing target values during (old) _minimum_chunk_size calculation
* - Move chunk splitting to PerformanceEstimator subclasses
  - Move roc-auc based min_chunk_size predictor to CBPE (also ROC-AUC based).

Co-authored-by: Niels Nuyttens <[email protected]>

* Created step plot functionality, created artificial endpoint generation, separated legend label arguments from hover label arguments, small improvements to legend generation, added incomplete target functionality, created reference implementation for: (1) target distribution monitoring (2) realised performance monitoring (3) correct naming of plotting elements.
* multivariate drift bugfix and doc update (#37)
* [skip ci] Updated the changelog
* doc and testing updates (#38)
* 39 continuous distribution plots scaling got wrong (#40)
* fix scaling
* update docs plots
* Check if calibration is needed before performing CBPE estimation (#42)
* - CBPE will check if calibration is beneficial during fitting. If not, calibration will not be performed.
  - Calibration is not required when roc_auc_score == 1 (perfect predictor)
* Deal with indexing issues when using StratifiedShuffleSplit indexes on subsets
* needs_calibration threshold with some margin
* Debug results messing up fitting
* Include realized performance in CBPE results
* Plot realized performance for reference period
* Don't exclude analysis data from realized performance calculation (future work)
* Allow for different functionalities of NannyML to set their own minimum chunk size (#43)
* wip1: default min chunk size function for BaseDriftCalculator
* wip2: min default chunk size for BasePerformanceEstimator
* wip3 - perf est?
* Big chunker refactor. Minimum chunk size moved to split function as optional argument. Multiple other refactors as a consequence + test adjustments.
* wip: update BasePerfEstimator to not have functions regarding minimum chunk size
* make CBPE set its own min chunk size
* wip: min chunk size for multivariate
* add docs for minimum chunk size
* Move chunker.split to inheriting drift calculator classes
* Fix missing target values during (old) _minimum_chunk_size calculation
* - Move chunk splitting to PerformanceEstimator subclasses
  - Move roc-auc based min_chunk_size predictor to CBPE (also ROC-AUC based).

Co-authored-by: Niels Nuyttens <[email protected]>

* Updated CHANGELOG.md
* Bump version: 0.2.0 → 0.2.1
* Update CHANGELOG.md
* Update CHANGELOG.md
* Feature: performance calculation (#44)
* Add predicted probabilities to metadata
* Predicted labels should be predicted scores for CBPE

Co-authored-by: Nikolaos Perrakis <[email protected]>
Co-authored-by: jakubnml <[email protected]>

* typo fix (#45)
* Stricter constraints for scipy
* Fixes:
  - using predicted labels during univariate continuous drift calculation
  - exclude detected predicted probabilities columns from feature list during metadata extraction
  - use predicted probabilities during drift results plotting
  - use predicted probabilities during drifting features ranking
* Fixes:
  - Still using predicted labels instead of predicted probabilities in CBPE
  - Added test to run CBPE with synthetic example data
* Fixes:
  - Added check for metadata.predicted_probability_column_name in univariate drift calculator construction + test
  - Fix some broken tests
* Replace line plots by step plots

Co-authored-by: Wiljan Cools <[email protected]>
Co-authored-by: Nikolaos Perrakis <[email protected]>
Co-authored-by: jakubnml <[email protected]>

nikml and others added 8 commits March 18, 2022 15:40

wip1: default min chunk size function for BaseDriftCalculator

94dc0b1

wip2: min default chunk size for BasePerformanceEstimator

e3498c2

wip3 - perf est?

48713bf

Big chunker refactor. Minimum chunk size moved to split function as optional argument. Multiple other refactors as a consequence + test adjustments.

2785f22

wip: update BasePerfEstimator to not have functions regarding minimum chunk size

8fdea8f

make CBPE set its own min chunk size

2120365

wip: min chunk size for multivariate

5b59c59

add docs for minimum chunk size

9dd0217

nikml requested a review from nnansters March 18, 2022 22:35

nikml self-assigned this Mar 18, 2022

nnansters added 4 commits March 21, 2022 13:11

Move chunker.split to inheriting drift calculator classes

f250405

Fix missing target values during (old) _minimum_chunk_size calculation

63cab73

Merge remote-tracking branch 'origin/main' into separate-min-chunk1

73dac11

# Conflicts:
#	nannyml/performance_estimation/confidence_based/cbpe.py
#	tests/performance_estimation/test_cbpe.py

- Move chunk splitting to PerformanceEstimator subclasses

ea15e41

- Move roc-auc based min_chunk_size predictor to CBPE (also ROC-AUC based).

nnansters merged commit f90bf24 into main Mar 21, 2022

nnansters deleted the separate-min-chunk1 branch March 21, 2022 13:05
