eval does not share encoding transformers #250

Closed
bvanbreugel opened this issue Jan 26, 2024 · 0 comments · Fixed by #257

Description

In metrics/eval.py, each dataset (e.g. X_gt, X_syn) is encoded separately, so a separate sklearn.preprocessing.LabelEncoder is fitted for each one. This causes unexpected behaviour whenever a column's set of unique values differs between X_gt and X_syn: the same integer code then denotes different categories in the two datasets.

How to Reproduce

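import pandas as pd
from sklearn.preprocessing import LabelEncoder

# A fresh encoder is fitted for each dataset, mirroring the separate encoding described above.
df_real = LabelEncoder().fit_transform(pd.DataFrame(["0", "1", "2"])[0])
# array([0, 1, 2])
df_syn = LabelEncoder().fit_transform(pd.DataFrame(["1", "2", "2"])[0])
# array([0, 1, 1])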

Expected Behavior

Evidently, above we want the processed df_syn to be [1, 2, 2], so that its codes refer to the same categories as in df_real.

Fix

It seems we can just obtain the fitted encoders when calling X_gt.encode() and pass them to all other encode calls.
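
A minimal sketch of that idea, using plain pandas and scikit-learn rather than the project's own encode() method. The helpers fit_encoders and apply_encoders are hypothetical names introduced for illustration only; the actual change in #257 may look different.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def fit_encoders(df: pd.DataFrame) -> dict:
    """Fit one LabelEncoder per column (hypothetical helper, not the project's API)."""
    return {col: LabelEncoder().fit(df[col]) for col in df.columns}


def apply_encoders(df: pd.DataFrame, encoders: dict) -> pd.DataFrame:
    """Encode df with already-fitted encoders (hypothetical helper)."""
    out = df.copy()
    for col, enc in encoders.items():
        # LabelEncoder.transform raises ValueError for labels it was not fitted on.
        out[col] = enc.transform(df[col])
    return out


X_gt = pd.DataFrame({"c": ["0", "1", "2"]})
X_syn = pd.DataFrame({"c": ["1", "2", "2"]})

# Fit once on the ground-truth data, then reuse the same encoders everywhere.
encoders = fit_encoders(X_gt)
gt_encoded = apply_encoders(X_gt, encoders)
syn_encoded = apply_encoders(X_syn, encoders)

print(syn_encoded["c"].tolist())  # [1, 2, 2]
```

One caveat with this approach: if X_syn contains a category that never occurs in X_gt, transform raises a ValueError, so a real fix would probably need to decide how to handle unseen values (for example by fitting on the union of categories).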
