fix indexing error when creating a filtered error analysis tree view with a dataset that contains categoricals #2026
Description
Fix indexing error when creating a filtered error analysis tree view with a dataset that contains categoricals.
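Also adds a repro test to validate the fix. The indexing was incorrect: the code indexed the categorical indexed data with the row_index array even though that data had already been indexed/sampled down to the selected rows, so the row positions in row_index could exceed its length. Replacing row_index with : resolved the error.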
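The indexing problem described above can be reproduced in isolation with plain NumPy. This is only a minimal sketch, not the actual erroranalysis code: the variable names row_index, string_indexed_data, input_data, c_i and idx are taken from the traceback, while the array contents and the two helper functions are made up for illustration.

```python
import numpy as np

# Hypothetical full dataset: 5 rows, column 0 numeric, column 1 categorical codes.
full_data = np.array([
    [0.1, 0.0],
    [0.2, 1.0],
    [0.3, 2.0],
    [0.4, 1.0],
    [0.5, 0.0],
])

# Rows selected by the cohort filter (positions in the FULL dataset).
row_index = np.array([1, 4])

# Both arrays below are assumed to be filtered down to the cohort already,
# so they have len(row_index) rows, not len(full_data) rows.
input_data = full_data[row_index].copy()      # shape (2, 2)
string_indexed_data = np.array([[1], [0]])    # shape (2, 1), codes for the cohort rows

c_i = 1   # categorical column position in input_data
idx = 0   # matching column position in string_indexed_data


def assign_buggy():
    """Index the already-filtered data with full-dataset row positions."""
    out = input_data.copy()
    # row_index contains 4, but string_indexed_data has only 2 rows.
    out[:, c_i] = string_indexed_data[row_index, idx]
    return out


def assign_fixed():
    """Take every row of the already-filtered data."""
    out = input_data.copy()
    out[:, c_i] = string_indexed_data[:, idx]
    return out


if __name__ == "__main__":
    try:
        assign_buggy()
    except IndexError as err:
        print("buggy indexing failed:", err)
    print(assign_fixed())
```

In this sketch the buggy variant fails for the same reason as the reported error: the row positions refer to the unfiltered dataset, while the array being indexed only has as many rows as the cohort.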
Original user error information:
I used my existing dashboard, created a cohort (True Y = 0) and switched to that cohort. This created the following error:
Failed to retrieve selected tree map metric: Failed to generate json tree representation,inner error: Traceback (most recent call last):
  File "/azureml-envs/responsibleai-0.24/lib/python3.8/site-packages/raiwidgets/responsibleai_dashboard_input.py", line 140, in debug_ml
    tree = self._error_analyzer.compute_error_tree_on_dataset(
  File "/azureml-envs/responsibleai-0.24/lib/python3.8/site-packages/erroranalysis/analyzer/error_analyzer.py", line 361, in compute_error_tree_on_dataset
    return _compute_error_tree_on_dataset(
  File "/azureml-envs/responsibleai-0.24/lib/python3.8/site-packages/erroranalysis/_internal/surrogate_error_tree.py", line 154, in compute_error_tree_on_dataset
    booster, dataset_indexed_df, cat_info = get_surrogate_booster_local(
  File "/azureml-envs/responsibleai-0.24/lib/python3.8/site-packages/erroranalysis/_internal/surrogate_error_tree.py", line 332, in get_surrogate_booster_local
    input_data[:, c_i] = string_indexed_data[row_index, idx]
IndexError: index 760 is out of bounds for axis 0 with size 759
Checklist