Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics: ensure ordinal encoder of classes is the same in real and synthetic datasets [type:bug] #257

Merged
merged 8 commits into from
Mar 11, 2024

Conversation

bvanbreugel
Copy link
Contributor

@bvanbreugel bvanbreugel commented Feb 22, 2024

Description

Fixes #250 (comment), by ensuring the ordinal encoders are fitted to the real and synthetic data concatenated.

Affected dependencies

None

Notes

  • A quick fix that ensure the same encoder is used for real and synthetic data ordinal encodings. We may need to have a longer discussion on whether this is always preferential, and whether other optional datasets (e.g. augmented, reference) need to also be included in the encoder processing.
  • Also note, Sklearn recommends against using LabelEncoder for features, I have not changed this to keep the PR minimal.
  • When both datasets have the same classes (which was assumed by the previous code), functionality is identical

How has this been tested?

  • Ran quick testing suit, giving "2159 passed, 13 skipped, 106 deselected, 2 xfailed, 2896 warnings"

Checklist

@bvanbreugel bvanbreugel marked this pull request as draft February 22, 2024 10:29
@bvanbreugel bvanbreugel marked this pull request as ready for review February 22, 2024 11:40
@bvanbreugel
Copy link
Contributor Author

@robsdavis Could you approve this please?

@robsdavis
Copy link
Contributor

Hi @bvanbreugel, Thanks for the PR - LGTM! I'm just doing a bug fix and then I'll merge this in next

@robsdavis robsdavis merged commit 41e6e5a into vanderschaarlab:main Mar 11, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

eval does not share encoding transformers
2 participants