
fix total metric changing with different num bins when using quantile binning on diabetes dataset #1233

Merged (1 commit) on Feb 18, 2022

imatiach-msft (Contributor) commented Feb 17, 2022

Description

Fix the total metric changing with different numbers of bins when using quantile binning on the diabetes dataset.
Fixes issue #1217.

This was fixed by re-binning the quantile-binned dataset, because pd.qcut miscategorizes data at low precision values. Setting the PRECISION parameter of pd.qcut partly fixes this, but it seems that for some datasets even the lowest precision still produces miscategorizations.
Specifically, at low precision it can assign some points to the wrong bin. For example, a point with value
-0.0127796318808497
is binned into the interval
(-0.0309423241359475, -0.012779631880849702]
even though it belongs in the next interval up, since
-0.0127796318808497 > -0.012779631880849702
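To make the comparison concrete, here is a minimal sketch that checks the two floats quoted above and re-bins the point against exact (unrounded) bin edges. It is an illustration only, not the code in this PR: `rebin_right_closed` is a hypothetical helper, and the upper edge `0.05` is a made-up value added so that a second interval exists. In practice the exact edges could come from `pd.qcut(..., retbins=True)`.

```python
import numpy as np


def rebin_right_closed(values, edges):
    """Hypothetical helper: assign each value to a right-closed bin
    (edges[i], edges[i + 1]] by comparing against the exact float edges
    instead of rounded interval labels."""
    values = np.asarray(values, dtype=float)
    edges = np.asarray(edges, dtype=float)
    # side="left" returns the index of the first edge >= value, so a value
    # equal to a right edge stays in the bin that edge closes.
    idx = np.searchsorted(edges, values, side="left") - 1
    # Clipping folds the minimum (== edges[0]) into the first bin, and any
    # value above the last edge into the last bin.
    return np.clip(idx, 0, len(edges) - 2)


value = -0.0127796318808497
# The first two edges are quoted in the description; 0.05 is made up.
edges = [-0.0309423241359475, -0.012779631880849702, 0.05]

print(value > edges[1])                    # True
print(rebin_right_closed([value], edges))  # [1]
```

Because the comparison uses the float edges directly, the point lands in the second bin (index 1), consistent with the strict inequality above.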

During testing I also discovered another error, which this PR also fixes. It is caused by the first quantile bin being resized incorrectly for negative values. For more detail, see PR #898, which describes some limitations of pd.qcut caused by its interval brackets. Calculating the absolute value (abs) fixes this issue. The error message is shown below:

Traceback (most recent call last):
  File "c:\responsible-ai-toolbox\raiwidgets\raiwidgets\responsibleai_dashboard_input.py", line 102, in matrix
    quantile_binning, num_bins)
  File "c:\responsible-ai-toolbox\erroranalysis\erroranalysis\analyzer\error_analyzer.py", line 257, in compute_matrix
    num_bins=num_bins)
  File "c:\responsible-ai-toolbox\erroranalysis\erroranalysis\_internal\matrix_filter.py", line 221, in compute_matrix
    val_err = cut_err.cat.categories[val_err]
  File "C:\Users\ilmat\Miniconda3\envs\sh\lib\site-packages\pandas\core\indexes\extension.py", line 279, in __getitem__
    result = self._data[key]
  File "C:\Users\ilmat\Miniconda3\envs\sh\lib\site-packages\pandas\core\arrays\interval.py", line 633, in __getitem__
    key = check_array_indexer(self, key)
  File "C:\Users\ilmat\Miniconda3\envs\sh\lib\site-packages\pandas\core\indexers.py", line 573, in check_array_indexer
    raise IndexError("arrays used as indices must be of integer or boolean type")
IndexError: arrays used as indices must be of integer or boolean type
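A minimal sketch of why resizing the first bin needs abs when values are negative. It assumes, hypothetically, that the first bin is widened by a small relative amount `rel_eps` so that the left-open interval (edges[0], edges[1]] includes the minimum; the function names and `rel_eps` are illustrative and are not taken from this PR or from #898.

```python
def widen_first_edge_buggy(edges, rel_eps=1e-3):
    """Hypothetical buggy resize: scales the first edge multiplicatively."""
    edges = list(edges)
    # For a negative edge, multiplying by (1 - rel_eps) moves it toward zero,
    # i.e. to the right, so the minimum ends up outside the first bin.
    edges[0] = edges[0] * (1 - rel_eps)
    return edges


def widen_first_edge(edges, rel_eps=1e-3):
    """Hypothetical fixed resize: always moves the first edge to the left."""
    edges = list(edges)
    # abs() makes the shift non-negative regardless of the edge's sign;
    # fall back to rel_eps itself when the edge is exactly zero.
    shift = abs(edges[0]) * rel_eps if edges[0] != 0 else rel_eps
    edges[0] = edges[0] - shift
    return edges


def in_first_bin(value, edges):
    """True if value lies in the right-closed interval (edges[0], edges[1]]."""
    return edges[0] < value <= edges[1]


# Edges quoted in the description; the first edge is the data minimum.
minimum = -0.0309423241359475
edges = [minimum, -0.012779631880849702]

print(in_first_bin(minimum, widen_first_edge_buggy(edges)))  # False
print(in_first_bin(minimum, widen_first_edge(edges)))        # True
```

For positive edges both variants move the edge to the left, which is consistent with the bug only appearing for negative values.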

Areas changed

npm packages changed:

  • responsibleai/causality
  • responsibleai/core-ui
  • responsibleai/counterfactuals
  • responsibleai/dataset-explorer
  • responsibleai/fairness
  • responsibleai/interpret
  • responsibleai/localization
  • responsibleai/mlchartlib
  • responsibleai/model-assessment

Python packages changed:

  • raiwidgets
  • responsibleai
  • erroranalysis
  • rai_core_flask

Tests

  • No new tests required.
  • New tests for the added feature are part of this PR.
  • I validated the changes manually.

Screenshots (if appropriate):

Documentation:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

codecov-commenter commented Feb 17, 2022

Codecov Report

Merging #1233 (256f5bf) into main (1756613) will increase coverage by 0.01%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main    #1233      +/-   ##
==========================================
+ Coverage   65.99%   66.01%   +0.01%     
==========================================
  Files          91       91              
  Lines        4473     4475       +2     
==========================================
+ Hits         2952     2954       +2     
  Misses       1521     1521              
Flag         Coverage Δ
unittests    66.01% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                           Coverage Δ
...ranalysis/erroranalysis/_internal/matrix_filter.py    95.85% <100.00%> (+0.02%) ⬆️
erroranalysis/erroranalysis/version.py                   100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1756613...256f5bf.

imatiach-msft force-pushed the ilmat/fix-quantile-binning-metric-changing branch from 620166c to 256f5bf on February 17, 2022 16:57