Data Imputation Fix in erroranalysis
#2436
Could you add a test case for this scenario?
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2436      +/-   ##
==========================================
+ Coverage   89.74%   92.40%   +2.66%
==========================================
  Files         122      108      -14
  Lines        6747     5415    -1332
==========================================
- Hits         6055     5004    -1051
+ Misses        692      411     -281
Generating the multilabel and OD dashboards from notebooks resulted in an uncaught error:
Input X contains NaN.
Traceback (most recent call last):
  File "c:\workspace\rai\responsible-ai-toolbox\erroranalysis\erroranalysis\analyzer\error_analyzer.py", line 480, in compute_importances
    importances = self._compute_error_correlation(
  File "c:\workspace\rai\responsible-ai-toolbox\erroranalysis\erroranalysis\analyzer\error_analyzer.py", line 519, in _compute_error_correlation
    return mutual_info_classif(
  File "c:\Users\agemawat\Anaconda3\envs\rai\lib\site-packages\sklearn\utils\_param_validation.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "c:\Users\agemawat\Anaconda3\envs\rai\lib\site-packages\sklearn\feature_selection\_mutual_info.py", line 493, in mutual_info_classif
    return _estimate_mi(X, y, discrete_features, True, n_neighbors, copy, random_state)
  File "c:\Users\agemawat\Anaconda3\envs\rai\lib\site-packages\sklearn\feature_selection\_mutual_info.py", line 258, in _estimate_mi
    X, y = check_X_y(X, y, accept_sparse="csc", y_numeric=not discrete_target)
  File "c:\Users\agemawat\Anaconda3\envs\rai\lib\site-packages\sklearn\utils\validation.py", line 1147, in check_X_y
    X = check_array(
  File "c:\Users\agemawat\Anaconda3\envs\rai\lib\site-packages\sklearn\utils\validation.py", line 959, in check_array
    _assert_all_finite(
  File "c:\Users\agemawat\Anaconda3\envs\rai\lib\site-packages\sklearn\utils\validation.py", line 124, in _assert_all_finite
    _assert_all_finite_element_wise(
  File "c:\Users\agemawat\Anaconda3\envs\rai\lib\site-packages\sklearn\utils\validation.py", line 173, in _assert_all_finite_element_wise
    raise ValueError(msg_err)
ValueError: Input X contains NaN.
This happened because the imputer in erroranalysis was not replacing NaNs when the data had a non-numeric dtype. This PR enforces a numeric dtype so that imputation succeeds and the uncaught exception no longer occurs.
Copilot:
This pull request to the erroranalysis/erroranalysis repository adds code to convert the input_data array to the float data type in the compute_importances method of the ErrorAnalyzer class. This change ensures error-free calculation and imputation of numerical data types.
erroranalysis/erroranalysis/analyzer/error_analyzer.py: Added code to convert the input_data array to the float data type in the compute_importances method of the ErrorAnalyzer class.
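For readers unfamiliar with the failure mode, here is a minimal sketch of why casting to float matters before imputation. It is not the code from this PR: the helper name impute_nan_with_column_mean and the column-mean strategy are assumptions made for illustration, and the actual imputer used in erroranalysis may differ. The sketch only shows that NaN checks fail on an object-dtype array and work once the array is cast to float.

```python
import numpy as np


def impute_nan_with_column_mean(input_data):
    """Hypothetical helper: cast to float, then fill NaNs with column means."""
    # astype(float) returns a new float64 array; the caller's data is untouched.
    data = np.asarray(input_data).astype(float)
    col_means = np.nanmean(data, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(data))
    data[nan_rows, nan_cols] = col_means[nan_cols]
    return data


# An object-dtype array, e.g. as produced from mixed-type notebook data.
raw = np.array([[1, 2.0], [np.nan, 4.0], [3, np.nan]], dtype=object)

# NaN detection is not defined for object dtype, so numeric imputation
# logic cannot find the missing values.
try:
    np.isnan(raw)
    nan_check_failed = False
except TypeError:
    nan_check_failed = True

imputed = impute_nan_with_column_mean(raw)
```

After the cast, the array has dtype float64 and contains no NaN values, so a downstream call such as mutual_info_classif would no longer be rejected for that reason.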