Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for different functionalities of NannyML to set thier own minimum chunk size #43

Merged
merged 12 commits into from
Mar 21, 2022

Conversation

nikml
Copy link
Contributor

@nikml nikml commented Mar 18, 2022

  • Univariate Drift has it's own minimum chunk size
  • Mutltivariate Drift has it's own minimum chunk size
  • Previous minimum chunk size now only used in Performance Estimation
  • Refactoring of chunker to not have a minimum chunk size property. It is now an argument for the split method and specified by Monitoring Classes.
  • Update docs to describe approach for minimum chunk size and relevant warning.

@nikml nikml requested a review from nnansters March 18, 2022 22:35
@nikml nikml self-assigned this Mar 18, 2022
# Conflicts:
#	nannyml/performance_estimation/confidence_based/cbpe.py
#	tests/performance_estimation/test_cbpe.py
- Move roc-auc based min_chunk_size predictor to CBPE (also ROC-AUC based).
@nnansters nnansters merged commit f90bf24 into main Mar 21, 2022
@nnansters nnansters deleted the separate-min-chunk1 branch March 21, 2022 13:05
nnansters added a commit that referenced this pull request Mar 22, 2022
…um chunk size (#43)

* wip1: default min chunk size function for BaseDriftCalculator

* wip2: min default chunk size for BasePerformanceEstimator

* wip3 - perf est?

* Big chunker refactor. Minimum chunk size moved to split function as optional argument.

Multiple other refactors as a consequence + test adjustments.

* wip: update BasePerfEstimator to not have functions regarding minimum chunk size

* make CBPE set its own min chunk size

* wip: min chunk size for multivariate

* add docs for minimum chunk size

* Move chunker.split to inheriting drift calculator classes

* Fix missing target values during (old) _minimum_chunk_size calculation

* - Move chunk splitting to PerformanceEstimator subclasses
- Move roc-auc based min_chunk_size predictor to CBPE (also ROC-AUC based).

Co-authored-by: Niels Nuyttens <[email protected]>
nnansters added a commit that referenced this pull request Mar 23, 2022
…um chunk size (#43)

* wip1: default min chunk size function for BaseDriftCalculator

* wip2: min default chunk size for BasePerformanceEstimator

* wip3 - perf est?

* Big chunker refactor. Minimum chunk size moved to split function as optional argument.

Multiple other refactors as a consequence + test adjustments.

* wip: update BasePerfEstimator to not have functions regarding minimum chunk size

* make CBPE set its own min chunk size

* wip: min chunk size for multivariate

* add docs for minimum chunk size

* Move chunker.split to inheriting drift calculator classes

* Fix missing target values during (old) _minimum_chunk_size calculation

* - Move chunk splitting to PerformanceEstimator subclasses
- Move roc-auc based min_chunk_size predictor to CBPE (also ROC-AUC based).

Co-authored-by: Niels Nuyttens <[email protected]>
nnansters added a commit that referenced this pull request Mar 23, 2022
* Created step plot functionality, created artificial endpoint generation, separated legend label arguments form hover label arguments, small improved to legend generation, added incomplete target functionality, created reference implementation for: (1) target distribution monitoring (2) realised performance monitoring (3) correct naming of plotting elements.

* multivariate drift bugfix and doc update (#37)

* [skip ci] Updated the changelog

* doc and testing updates (#38)

* 39 continuous distribution plots scaling got wrong (#40)

* fix scaling

* update docs plots

* Check if calibration is needed before performing CBPE estimation (#42)

* - CBPE will check if calibration is beneficial during fitting. If not, calibration will not be performed.
- Calibration is not required when roc_auc_score == 1 (perfect predictor)

* Deal with indexing issues when using StratifiedShuffleSplit indexes on subsets

* needs_calibration threshold with some margin

* Debug results messing up fitting

* Include realized performance in CBPE results

* Plot realized performance for reference period

* Don't exclude analysis data from realized performance calculation (future work)

* Allow for different functionalities of NannyML to set thier own minimum chunk size (#43)

* wip1: default min chunk size function for BaseDriftCalculator

* wip2: min default chunk size for BasePerformanceEstimator

* wip3 - perf est?

* Big chunker refactor. Minimum chunk size moved to split function as optional argument.

Multiple other refactors as a consequence + test adjustments.

* wip: update BasePerfEstimator to not have functions regarding minimum chunk size

* make CBPE set its own min chunk size

* wip: min chunk size for multivariate

* add docs for minimum chunk size

* Move chunker.split to inheriting drift calculator classes

* Fix missing target values during (old) _minimum_chunk_size calculation

* - Move chunk splitting to PerformanceEstimator subclasses
- Move roc-auc based min_chunk_size predictor to CBPE (also ROC-AUC based).

Co-authored-by: Niels Nuyttens <[email protected]>

* Updated CHANGELOG.md

* Bump version: 0.2.0 → 0.2.1

* Update CHANGELOG.md

* Update CHANGELOG.md

* Feature: performance calculation (#44)

* Add predicted probabilities to metadata

* Predicted labels should be predicted scores for CBPE

Co-authored-by: Nikolaos Perrakis <[email protected]>
Co-authored-by: jakubnml <[email protected]>

* typo fix (#45)

* Stricter constraints for scipy

* Fixes:
- using predicted labels during univariate continuous drift calculation
- exclude detected predicted probabilities columns from feature list during metadata extraction
- use predicted probabilities during drift results plotting
- use predicted probabilities during drifting features ranking

* Fixes:
- Still using predicted labels instead of predicted probabilities in CBPE
- Added test to run CBPE with synthetic example data

* Fixes:
- Added check for metadata.predicted_probability_column_name in univariate drift calculator construction + test
- Fix some broken tests

* Replace line plots by step plots

Co-authored-by: Wiljan Cools <[email protected]>
Co-authored-by: Nikolaos Perrakis <[email protected]>
Co-authored-by: jakubnml <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants