-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for different functionalities of NannyML to set thier own minimum chunk size #43
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Contributor
nikml
commented
Mar 18, 2022
- Univariate Drift has it's own minimum chunk size
- Mutltivariate Drift has it's own minimum chunk size
- Previous minimum chunk size now only used in Performance Estimation
- Refactoring of chunker to not have a minimum chunk size property. It is now an argument for the split method and specified by Monitoring Classes.
- Update docs to describe approach for minimum chunk size and relevant warning.
…ptional argument. Multiple other refactors as a consequence + test adjustments.
# Conflicts: # nannyml/performance_estimation/confidence_based/cbpe.py # tests/performance_estimation/test_cbpe.py
- Move roc-auc based min_chunk_size predictor to CBPE (also ROC-AUC based).
nnansters
added a commit
that referenced
this pull request
Mar 22, 2022
…um chunk size (#43) * wip1: default min chunk size function for BaseDriftCalculator * wip2: min default chunk size for BasePerformanceEstimator * wip3 - perf est? * Big chunker refactor. Minimum chunk size moved to split function as optional argument. Multiple other refactors as a consequence + test adjustments. * wip: update BasePerfEstimator to not have functions regarding minimum chunk size * make CBPE set its own min chunk size * wip: min chunk size for multivariate * add docs for minimum chunk size * Move chunker.split to inheriting drift calculator classes * Fix missing target values during (old) _minimum_chunk_size calculation * - Move chunk splitting to PerformanceEstimator subclasses - Move roc-auc based min_chunk_size predictor to CBPE (also ROC-AUC based). Co-authored-by: Niels Nuyttens <[email protected]>
nnansters
added a commit
that referenced
this pull request
Mar 23, 2022
…um chunk size (#43) * wip1: default min chunk size function for BaseDriftCalculator * wip2: min default chunk size for BasePerformanceEstimator * wip3 - perf est? * Big chunker refactor. Minimum chunk size moved to split function as optional argument. Multiple other refactors as a consequence + test adjustments. * wip: update BasePerfEstimator to not have functions regarding minimum chunk size * make CBPE set its own min chunk size * wip: min chunk size for multivariate * add docs for minimum chunk size * Move chunker.split to inheriting drift calculator classes * Fix missing target values during (old) _minimum_chunk_size calculation * - Move chunk splitting to PerformanceEstimator subclasses - Move roc-auc based min_chunk_size predictor to CBPE (also ROC-AUC based). Co-authored-by: Niels Nuyttens <[email protected]>
nnansters
added a commit
that referenced
this pull request
Mar 23, 2022
* Created step plot functionality, created artificial endpoint generation, separated legend label arguments form hover label arguments, small improved to legend generation, added incomplete target functionality, created reference implementation for: (1) target distribution monitoring (2) realised performance monitoring (3) correct naming of plotting elements. * multivariate drift bugfix and doc update (#37) * [skip ci] Updated the changelog * doc and testing updates (#38) * 39 continuous distribution plots scaling got wrong (#40) * fix scaling * update docs plots * Check if calibration is needed before performing CBPE estimation (#42) * - CBPE will check if calibration is beneficial during fitting. If not, calibration will not be performed. - Calibration is not required when roc_auc_score == 1 (perfect predictor) * Deal with indexing issues when using StratifiedShuffleSplit indexes on subsets * needs_calibration threshold with some margin * Debug results messing up fitting * Include realized performance in CBPE results * Plot realized performance for reference period * Don't exclude analysis data from realized performance calculation (future work) * Allow for different functionalities of NannyML to set thier own minimum chunk size (#43) * wip1: default min chunk size function for BaseDriftCalculator * wip2: min default chunk size for BasePerformanceEstimator * wip3 - perf est? * Big chunker refactor. Minimum chunk size moved to split function as optional argument. Multiple other refactors as a consequence + test adjustments. * wip: update BasePerfEstimator to not have functions regarding minimum chunk size * make CBPE set its own min chunk size * wip: min chunk size for multivariate * add docs for minimum chunk size * Move chunker.split to inheriting drift calculator classes * Fix missing target values during (old) _minimum_chunk_size calculation * - Move chunk splitting to PerformanceEstimator subclasses - Move roc-auc based min_chunk_size predictor to CBPE (also ROC-AUC based). Co-authored-by: Niels Nuyttens <[email protected]> * Updated CHANGELOG.md * Bump version: 0.2.0 → 0.2.1 * Update CHANGELOG.md * Update CHANGELOG.md * Feature: performance calculation (#44) * Add predicted probabilities to metadata * Predicted labels should be predicted scores for CBPE Co-authored-by: Nikolaos Perrakis <[email protected]> Co-authored-by: jakubnml <[email protected]> * typo fix (#45) * Stricter constraints for scipy * Fixes: - using predicted labels during univariate continuous drift calculation - exclude detected predicted probabilities columns from feature list during metadata extraction - use predicted probabilities during drift results plotting - use predicted probabilities during drifting features ranking * Fixes: - Still using predicted labels instead of predicted probabilities in CBPE - Added test to run CBPE with synthetic example data * Fixes: - Added check for metadata.predicted_probability_column_name in univariate drift calculator construction + test - Fix some broken tests * Replace line plots by step plots Co-authored-by: Wiljan Cools <[email protected]> Co-authored-by: Nikolaos Perrakis <[email protected]> Co-authored-by: jakubnml <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.