KeyError: '0', Sampling terminated, GaussianCopulaSynthesizer #2376
Comments
After more investigation, it seems like rows containing new categories (unseen during fitting) are prone to this error. Are new categories maybe sometimes incorrectly (?) set to With KeyError: 0. Which leads me to think the label is set to
Hi @PieterKnops, nice to meet you. I understand that your data is too sensitive to share. For debugging purposes, it would be nice to get other information on the code you're running to instantiate, fit, and sample your GaussianCopulaSynthesizer. For example, are you doing something like this? Are you adding any other customizations to your synthesizer (such as constraints, updating transformers, etc.)?

    synthesizer = GaussianCopulaSynthesizer(metadata)
    synthesizer.fit(data)
    synthetic_data = synthesizer.sample_remaining_columns(??)

Additionally, are you able to share your metadata (that just contains column/table names)? You can also anonymize your metadata before sharing. This would greatly speed up the debugging process.
This makes sense, as the conditions that you provide to your synthesizer should be within the bounds of whatever was passed in during However, I'm not sure if that's the root cause of your issue. If I pass in new category values, I just see some warnings -- but I do see that synthetic data for other (valid) rows is correctly sampled.
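One way to check whether the conditions stay within what the synthesizer saw during fitting is to compare the distinct values per column before sampling. The sketch below is a plain-Python illustration, not part of SDV: the helper `find_unseen_categories` and the column names `color` and `code` are hypothetical. It also shows how a string `'0'` would count as unseen if the training data held the integer `0`, which is one possible (unconfirmed) reading of `KeyError: '0'`.

```python
def find_unseen_categories(train_rows, condition_rows, columns):
    """Return {column: values present in the conditions but absent from training}."""
    unseen = {}
    for column in columns:
        seen = {row[column] for row in train_rows}
        new_values = {row[column] for row in condition_rows} - seen
        if new_values:
            # Sort by repr so mixed types (e.g. 0 and '0') can be ordered.
            unseen[column] = sorted(new_values, key=repr)
    return unseen


# Hypothetical training rows and conditions, for illustration only.
train = [{"color": "red", "code": 0}, {"color": "blue", "code": 1}]
conditions = [{"color": "red", "code": "0"}, {"color": "green", "code": 1}]

report = find_unseen_categories(train, conditions, ["color", "code"])
print(report)
```

Any column that shows up in the report contains condition values the fitted transformer has no entry for, so those rows are the first ones to inspect.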
Environment Details
Please indicate the following details about the environment in which you found the bug:
Error Description
I'm generating data with a GaussianCopulaSynthesizer, but during generation, it errors in the following line:
File rdt/transformers/categorical.py:188, in UniformEncoder._transform.<locals>.map_labels(label)
    187 def map_labels(label):
--> 188     return np.random.uniform(self.intervals[label][0], self.intervals[label][1])
KeyError: '0'
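The failing line looks a label up in a mapping of intervals and then draws a uniform value from that interval. A simplified stand-in (not RDT's actual implementation; `fit_intervals` and `map_label` are hypothetical names, and sizing the intervals by frequency is an assumption) shows why a label that was never seen during fitting ends in a `KeyError`:

```python
import random


def fit_intervals(values):
    """Give each distinct category a slice of [0, 1) proportional to its frequency."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1

    intervals = {}
    start = 0.0
    total = len(values)
    for category, count in counts.items():
        end = start + count / total
        intervals[category] = (start, end)
        start = end
    return intervals


def map_label(intervals, label):
    """Mirror the lookup in the traceback: fails if `label` was never fitted."""
    low, high = intervals[label]
    return random.uniform(low, high)


intervals = fit_intervals(["a", "b", "a", "c"])
value = map_label(intervals, "a")  # works: "a" was seen during fitting

try:
    map_label(intervals, "0")  # "0" was never fitted
except KeyError as err:
    missing = err.args[0]
    print(f"KeyError for unseen label: {missing!r}")
```

In this sketch the error depends only on whether the label exists as a key, so a category that is new, or that has a different type than the fitted one, would fail in the same way.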
Steps to reproduce
I find this difficult to provide, as the data is confidential and I haven't succeeded in creating a minimal example. Furthermore, the error happens in a different row every time I run this, even though the input does not change.
The input doesn't contain empty values and is a mix of integer, float and categorical variables.
I run generation with sample_remaining_columns.
Has anyone experienced something like this? How can I best debug this?
Edits: Formatting
Full stack trace: