This repository has been archived by the owner on Oct 9, 2023. It is now read-only.
How do I ensure no data leakage in the validation split #177
Unanswered
aribornstein asked this question in Data / pipelines
Replies: 0 comments
What is your question?
I'm working on a dataset that contains identifiable information, which would lead to data leakage if the data is not split properly. Currently the validation split is hard-coded in each of the respective DataModules.
Ideally I'd like a flag that ensures there is no overlap between the train and validation data on these fields, by reassigning any overlapping samples to one side of the split. Because dataset initialization is hard-coded, the validation dataset becomes immutable once it has been created.
What is the best way to handle such a check in Flash?
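To make the request concrete, here is a minimal sketch of the kind of split and check I have in mind. It is plain Python and does not use any Flash API; the names `group_aware_split` and `assert_no_leakage` are hypothetical, and `groups` is assumed to hold one identifying value (for example a patient or user ID) per sample.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def group_aware_split(
    groups: Sequence[Hashable], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split sample indices so that no group key lands in both train and validation.

    Hypothetical helper: `groups[i]` is the identifying field of sample i.
    """
    # Collect the sample indices belonging to each group key.
    by_group = defaultdict(list)
    for idx, key in enumerate(groups):
        by_group[key].append(idx)

    # Sort first so the shuffle is reproducible for a given seed.
    keys = sorted(by_group, key=repr)
    random.Random(seed).shuffle(keys)

    # Assign whole groups to validation until the target size is reached,
    # then send every remaining group to train.
    target = val_fraction * len(groups)
    train_idx: List[int] = []
    val_idx: List[int] = []
    for key in keys:
        if len(val_idx) < target:
            val_idx.extend(by_group[key])
        else:
            train_idx.extend(by_group[key])
    return sorted(train_idx), sorted(val_idx)


def assert_no_leakage(
    groups: Sequence[Hashable], train_idx: Sequence[int], val_idx: Sequence[int]
) -> None:
    """Raise ValueError if any group key appears on both sides of the split."""
    overlap = {groups[i] for i in train_idx} & {groups[i] for i in val_idx}
    if overlap:
        raise ValueError(f"Leaking group keys: {sorted(overlap, key=repr)}")


if __name__ == "__main__":
    ids = ["a", "a", "b", "c", "c", "c", "d", "e"]
    train, val = group_aware_split(ids, val_fraction=0.25, seed=0)
    assert_no_leakage(ids, train, val)
    print("train:", train, "val:", val)
```

Because whole groups are moved, the validation set can end up somewhat larger than `val_fraction`. The resulting index lists could be wrapped with something like `torch.utils.data.Subset` before the data reaches a DataModule, assuming the DataModule can accept pre-split datasets, which is exactly the part I'm unsure about. scikit-learn's `GroupShuffleSplit` offers a similar group-based split.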