multivariate drift bugfix and doc update #37

Merged
merged 1 commit into from
Mar 15, 2022

Conversation

nikml
Contributor

@nikml nikml commented Mar 14, 2022

@nikml nikml requested a review from nnansters March 14, 2022 18:57
@nikml nikml self-assigned this Mar 14, 2022
@codecov

codecov bot commented Mar 14, 2022

Codecov Report

Merging #37 (6dcd725) into main (caa9849) will increase coverage by 0.09%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #37      +/-   ##
==========================================
+ Coverage   68.64%   68.74%   +0.09%     
==========================================
  Files          28       28              
  Lines        1263     1267       +4     
  Branches      239      243       +4     
==========================================
+ Hits          867      871       +4     
  Misses        392      392              
  Partials        4        4              
Impacted Files                                     Coverage Δ
nannyml/drift/data_reconstruction/calculator.py    98.96% <100.00%> (+0.04%) ⬆️
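
The percentages in the coverage diff can be re-derived from the raw counts. The sketch below recomputes them with exact fractions; the helper name `truncated_pct` is an illustration, not part of Codecov. One observation, offered as an inference from these numbers only: the reported figures match truncation to two decimals, whereas rounding would give 68.65%, 68.75% and 0.10%.

```python
import math
from fractions import Fraction

# Counts copied from the coverage diff (main vs. #37).
base = {"lines": 1263, "hits": 867, "misses": 392, "partials": 4}
head = {"lines": 1267, "hits": 871, "misses": 392, "partials": 4}


def truncated_pct(ratio: Fraction) -> str:
    """Format a non-negative ratio as a percentage truncated to two decimals."""
    hundredths = math.floor(ratio * 10000)  # percent expressed in 1/100 units
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


# Sanity check: every line is a hit, a miss, or a partial.
for report in (base, head):
    assert report["hits"] + report["misses"] + report["partials"] == report["lines"]

base_cov = Fraction(base["hits"], base["lines"])
head_cov = Fraction(head["hits"], head["lines"])

print(truncated_pct(base_cov))             # coverage on main
print(truncated_pct(head_cov))             # coverage after merging #37
print(truncated_pct(head_cov - base_cov))  # change in coverage
```

All four added lines are hits, which is why the diff coverage is 100% while misses and partials stay unchanged.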

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update caa9849...6dcd725.

@nnansters nnansters merged commit 6388d0e into main Mar 15, 2022
@nnansters
Contributor

Good job on the added tests 👍

@nikml nikml deleted the multivariate-drift-update-02 branch March 15, 2022 15:19
nnansters added a commit that referenced this pull request Mar 23, 2022
* Created step plot functionality, created artificial endpoint generation, separated legend label arguments from hover label arguments, made small improvements to legend generation, added incomplete target functionality, created reference implementation for: (1) target distribution monitoring (2) realised performance monitoring (3) correct naming of plotting elements.

* multivariate drift bugfix and doc update (#37)

* [skip ci] Updated the changelog

* doc and testing updates (#38)

* 39 continuous distribution plots scaling got wrong (#40)

* fix scaling

* update docs plots

* Check if calibration is needed before performing CBPE estimation (#42)

* - CBPE will check if calibration is beneficial during fitting. If not, calibration will not be performed.
- Calibration is not required when roc_auc_score == 1 (perfect predictor)

* Deal with indexing issues when using StratifiedShuffleSplit indexes on subsets

* needs_calibration threshold with some margin

* Debug results messing up fitting

* Include realized performance in CBPE results

* Plot realized performance for reference period

* Don't exclude analysis data from realized performance calculation (future work)

* Allow for different functionalities of NannyML to set their own minimum chunk size (#43)

* wip1: default min chunk size function for BaseDriftCalculator

* wip2: min default chunk size for BasePerformanceEstimator

* wip3 - perf est?

* Big chunker refactor. Minimum chunk size moved to split function as optional argument.

Multiple other refactors as a consequence + test adjustments.

* wip: update BasePerfEstimator to not have functions regarding minimum chunk size

* make CBPE set its own min chunk size

* wip: min chunk size for multivariate

* add docs for minimum chunk size

* Move chunker.split to inheriting drift calculator classes

* Fix missing target values during (old) _minimum_chunk_size calculation

* - Move chunk splitting to PerformanceEstimator subclasses
- Move roc-auc based min_chunk_size predictor to CBPE (also ROC-AUC based).

Co-authored-by: Niels Nuyttens <[email protected]>

* Updated CHANGELOG.md

* Bump version: 0.2.0 → 0.2.1

* Update CHANGELOG.md

* Update CHANGELOG.md

* Feature: performance calculation (#44)

* Add predicted probabilities to metadata

* Predicted labels should be predicted scores for CBPE

Co-authored-by: Nikolaos Perrakis <[email protected]>
Co-authored-by: jakubnml <[email protected]>

* typo fix (#45)

* Stricter constraints for scipy

* Fixes:
- using predicted labels during univariate continuous drift calculation
- exclude detected predicted probabilities columns from feature list during metadata extraction
- use predicted probabilities during drift results plotting
- use predicted probabilities during drifting features ranking

* Fixes:
- Still using predicted labels instead of predicted probabilities in CBPE
- Added test to run CBPE with synthetic example data

* Fixes:
- Added check for metadata.predicted_probability_column_name in univariate drift calculator construction + test
- Fix some broken tests

* Replace line plots by step plots

Co-authored-by: Wiljan Cools <[email protected]>
Co-authored-by: Nikolaos Perrakis <[email protected]>
Co-authored-by: jakubnml <[email protected]>
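
The commit notes for #42 above describe skipping calibration for a perfect predictor (ROC AUC of 1) and applying a "needs_calibration threshold with some margin". A minimal sketch of that decision follows. It is not NannyML's implementation: the function signature, the `raw_error` and `calibrated_error` inputs, and the default `margin` are assumptions made for illustration, and the AUC is computed with a simple pairwise count rather than a library call.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def needs_calibration(y_true, y_score, raw_error, calibrated_error, margin=0.01):
    """Hypothetical check: calibrate only when it helps by more than `margin`.

    raw_error and calibrated_error are assumed to be calibration errors
    measured on held-out data before and after calibration.
    """
    if roc_auc(y_true, y_score) == 1.0:
        return False  # perfect predictor: calibration is not required
    return calibrated_error < raw_error - margin


labels = [0, 0, 1, 1]
print(needs_calibration(labels, [0.1, 0.2, 0.8, 0.9], 0.20, 0.05))   # perfect ranking
print(needs_calibration(labels, [0.1, 0.4, 0.35, 0.8], 0.20, 0.05))  # imperfect ranking
```

The margin keeps a negligible improvement, which may be noise on a small hold-out set, from triggering calibration.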
Development

Successfully merging this pull request may close these issues.

Data Reconstruction fails when selected features doesn't include a categorical feature.
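
The linked issue concerns a feature selection with no categorical columns. The actual change to `nannyml/drift/data_reconstruction/calculator.py` is not shown on this page, so the following is only a sketch of the general guard such a fix typically needs: skip the categorical encoding step when there is nothing to encode. The function name `encode_categoricals` and the frequency encoding are hypothetical choices for illustration.

```python
from collections import Counter


def encode_categoricals(rows, categorical_features):
    """Replace each categorical value with its relative frequency.

    rows is a list of dicts (one per observation). When no categorical
    features are selected, the rows are returned as copies, unchanged.
    """
    encoded = [dict(row) for row in rows]
    if not categorical_features:
        return encoded  # guard: nothing to encode, skip the step entirely

    for feature in categorical_features:
        counts = Counter(row[feature] for row in rows)
        for row in encoded:
            row[feature] = counts[row[feature]] / len(rows)
    return encoded


data = [
    {"age": 30, "city": "A"},
    {"age": 40, "city": "B"},
    {"age": 50, "city": "A"},
]

# Only continuous features selected: the encoding step is a no-op.
continuous_only = [{"age": row["age"]} for row in data]
print(encode_categoricals(continuous_only, []))

# With a categorical feature selected, values become frequencies.
print(encode_categoricals(data, ["city"]))
```

A test that runs the pipeline with continuous features only, as in the first call, would cover the case the issue describes.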
2 participants