Bug description
The function save_time_based_splits in data_utils.py does not handle CPU mode correctly. In particular, _save_time_based_splits_cpu assumes that the RAPIDS libraries are available, and Dask DataFrame appears to be imported incorrectly.
Steps/Code to reproduce bug
Run the code from the getting-started example notebook with the cpu option set to True (https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/main/examples/getting-started-session-based/01-ETL-with-NVTabular.ipynb):
sessions_gdf = df.read_parquet(BASE_PATH / "processed_nvt/part_0.parquet")
from transformers4rec.utils.data_utils import save_time_based_splits
save_time_based_splits(
    data=nvt.Dataset(sessions_gdf),
    output_dir=BASE_PATH / f"session_by_day",
    partition_col="day-first",
    timestamp_col="session_id",
    cpu=True
)
Expected behavior
No exception is thrown and the data is split.
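To illustrate the expected behavior, here is a minimal sketch of how a CPU path could avoid depending on RAPIDS. It is not the library's actual implementation: the helper names backend_name and load_backend are hypothetical, and the choice of pandas as the CPU dataframe library is an assumption.

```python
import importlib
import importlib.util


def backend_name(cpu: bool) -> str:
    """Return the name of the dataframe library for the requested mode.

    Assumption: pandas on CPU, cudf (RAPIDS) on GPU.
    """
    return "pandas" if cpu else "cudf"


def load_backend(cpu: bool):
    """Import and return the dataframe module for the requested mode.

    The CPU path never touches cudf, so it works without RAPIDS installed.
    """
    name = backend_name(cpu)
    # Check availability first so the error message names the missing package.
    if importlib.util.find_spec(name) is None:
        raise ImportError(
            f"cpu={cpu} requires the '{name}' package, which is not installed"
        )
    return importlib.import_module(name)
```

With a guard like this, a call with cpu=True would only ever import the CPU library, and a missing GPU stack would surface as a clear ImportError only when cpu=False.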
Environment details
Additional context