Description

Perform Multivariate Drift Analysis with Data Reconstruction on a dataset without categorical features.
What I Did

I tried to run the code from the Data Reconstruction Deep Dive, but it failed with:
```python
# Let's compute univariate drift
rcerror_calculator = nml.DataReconstructionDriftCalculator(model_metadata=metadata, chunk_size=DPP)
rcerror_calculator.fit(reference_data=reference)
# let's compute (and visualize) results across all the dataset.
rcerror_results = rcerror_calculator.calculate(data=data)
rcerror_results.data
# let's create plot with results
figure = rcerror_results.plot()
figure.show()
figure.write_image(file="butterfly-multivariate-drift.svg")
```

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_10304/2321058283.py in <cell line: 3>()
      1 # Let's compute univariate drift
      2 rcerror_calculator = nml.DataReconstructionDriftCalculator(model_metadata=metadata, chunk_size=DPP)
----> 3 rcerror_calculator.fit(reference_data=reference)
      4 # let's compute (and visualize) results across all the dataset.
      5 rcerror_results = rcerror_calculator.calculate(data=data)

~/Source/nannyml/nannyml/drift/base.py in fit(self, reference_data)
    165             self.chunker = DefaultChunker(minimum_chunk_size=minimum_chunk_size)
    166 
--> 167         self._fit(reference_data)
    168 
    169     def _fit(self, reference_data: pd.DataFrame):

~/Source/nannyml/nannyml/drift/data_reconstruction/calculator.py in _fit(self, reference_data)
    101         # TODO: We duplicate the reference data 3 times, here. Improve to something more memory efficient?
    102         imputed_reference_data = reference_data.copy(deep=True)
--> 103         imputed_reference_data[selected_categorical_column_names] = self._imputer_categorical.fit_transform(
    104             imputed_reference_data[selected_categorical_column_names]
    105         )

~/.cache/pypoetry/virtualenvs/nannyml-RmJkXFBz-py3.10/lib64/python3.10/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    850         if y is None:
    851             # fit method of arity 1 (unsupervised transformation)
--> 852             return self.fit(X, **fit_params).transform(X)
    853         else:
    854             # fit method of arity 2 (supervised transformation)

~/.cache/pypoetry/virtualenvs/nannyml-RmJkXFBz-py3.10/lib64/python3.10/site-packages/sklearn/impute/_base.py in fit(self, X, y)
    317             Fitted estimator.
    318         """
--> 319         X = self._validate_input(X, in_fit=True)
    320 
    321         # default fill_value is 0 for numerical input and "missing_value"

~/.cache/pypoetry/virtualenvs/nannyml-RmJkXFBz-py3.10/lib64/python3.10/site-packages/sklearn/impute/_base.py in _validate_input(self, X, in_fit)
    285                     raise new_ve from None
    286                 else:
--> 287                     raise ve
    288 
    289         _check_inputs_dtype(X, self.missing_values)

~/.cache/pypoetry/virtualenvs/nannyml-RmJkXFBz-py3.10/lib64/python3.10/site-packages/sklearn/impute/_base.py in _validate_input(self, X, in_fit)
    268 
    269         try:
--> 270             X = self._validate_data(
    271                 X,
    272                 reset=in_fit,

~/.cache/pypoetry/virtualenvs/nannyml-RmJkXFBz-py3.10/lib64/python3.10/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
    564             raise ValueError("Validation should be done on X, y or both.")
    565         elif not no_val_X and no_val_y:
--> 566             X = check_array(X, **check_params)
    567             out = X
    568         elif no_val_X and not no_val_y:

~/.cache/pypoetry/virtualenvs/nannyml-RmJkXFBz-py3.10/lib64/python3.10/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    663 
    664     if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
--> 665         dtype_orig = np.result_type(*dtypes_orig)
    666 
    667     if dtype_numeric:

<__array_function__ internals> in result_type(*args, **kwargs)

ValueError: at least one array or dtype is required
```
It's likely that the current implementation cannot handle a selected feature set that contains no categorical features (or, conversely, no continuous features): the categorical imputer is fit on an empty column selection, and scikit-learn raises a `ValueError` when validating it.
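The failure can be reproduced outside NannyML with just scikit-learn. This is a minimal sketch, not NannyML's actual code: the DataFrame contents and the `most_frequent` strategy are assumptions here, but any `SimpleImputer` fails the same way when fit on a zero-column selection, which is what `selected_categorical_column_names` resolves to on a dataset with no categorical features.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Reference data with only continuous features -- no categorical columns at all.
reference = pd.DataFrame({"f1": [0.1, 0.2, 0.3], "f2": [1.0, 2.0, 3.0]})

# With no categorical features, the categorical column selection is empty,
# so the imputer is fit on an empty DataFrame.
selected_categorical_column_names = []

imputer = SimpleImputer(strategy="most_frequent")
try:
    imputer.fit_transform(reference[selected_categorical_column_names])
    error = None
except ValueError as exc:
    error = exc

# scikit-learn refuses to validate a zero-column input and raises ValueError.
print(type(error).__name__)
```

A guard such as `if selected_categorical_column_names:` around the categorical `fit_transform`/`transform` calls (and the analogous guard for continuous features) would presumably skip the imputer when the selection is empty.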