Commit

Update data/check_smiles_split.py
Co-authored-by: Kevin M Jablonka <[email protected]>
MicPie and kjappelbaum authored Feb 1, 2024
1 parent 5807e44 commit 68b20dc
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion data/check_smiles_split.py
@@ -1,4 +1,10 @@
-"""This script checks for data leakage in the splits of a tabular dataset."""
+"""This script checks for data leakage in the splits of a tabular dataset.
+The checks in this script are more general and focus on the `identifier` defined in the `meta.yaml` files.
+Errors will be thrown if there are identical identifier values in train/val train/test or val/test sets.
+This script uses dask. This might cause some errors with mismatching data types, for which there are currently a few fallbacks.
+"""
import os
from glob import glob
from pathlib import Path
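
The new docstring describes the check as a pairwise comparison of identifier values across splits. The rest of the script is not shown in this hunk, so the following is only a minimal sketch of that idea in plain Python, not the script's actual dask-based implementation. The function names `find_split_leakage` and `assert_no_leakage`, the `split` column name, and the sample rows are all hypothetical.

```python
from itertools import combinations


def find_split_leakage(rows, identifier, split_col="split"):
    """Return {(split_a, split_b): shared identifier values} for each leaking pair.

    `rows` is an iterable of dicts; `identifier` and `split_col` are column
    names (the "split" default is an assumption, not taken from the commit).
    """
    # Collect the set of identifier values seen in each split.
    values_by_split = {}
    for row in rows:
        values_by_split.setdefault(row[split_col], set()).add(row[identifier])

    # Compare every pair of splits and keep the non-empty intersections.
    leaks = {}
    for split_a, split_b in combinations(sorted(values_by_split), 2):
        shared = values_by_split[split_a] & values_by_split[split_b]
        if shared:
            leaks[(split_a, split_b)] = shared
    return leaks


def assert_no_leakage(rows, identifier, split_col="split"):
    """Raise ValueError if any identifier value occurs in more than one split."""
    leaks = find_split_leakage(rows, identifier, split_col)
    if leaks:
        raise ValueError(f"Identifier {identifier!r} leaks across splits: {leaks}")


# Hypothetical sample data: "CCO" appears in both the train and test splits.
rows = [
    {"SMILES": "CCO", "split": "train"},
    {"SMILES": "CCN", "split": "val"},
    {"SMILES": "CCO", "split": "test"},
]
leaks = find_split_leakage(rows, "SMILES")
```

Building one set per split keeps each pairwise comparison a simple set intersection. A dask version would presumably need to align column dtypes first, which may be the source of the type mismatches the docstring mentions.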