[FEATURE] Replace SASRec TF with PyTorch version #2111

Open
1 of 3 tasks
miguelgfierro opened this issue Jun 17, 2024 · 7 comments
Labels: enhancement (New feature or request)

Comments

@miguelgfierro
Collaborator

miguelgfierro commented Jun 17, 2024

Description

SASRec tests are currently disabled: https://github.com/recommenders-team/recommenders/blob/main/tests/ci/azureml_tests/test_groups.py#L410
We could replace the TensorFlow implementation with the PyTorch one from UniRec: https://github.com/microsoft/UniRec/blob/main/unirec/model/sequential/sasrec.py

Expected behavior with the suggested feature

Branch: https://github.com/recommenders-team/recommenders/tree/miguel/sasrec_unirec

Tasks:

  • Create unit tests of the classes
  • Make sure the original script runs
  • Create a functional test that trains on MovieLens using the minimal parts of the code

Other Comments

@miguelgfierro miguelgfierro added the enhancement New feature or request label Jun 17, 2024
@miguelgfierro
Collaborator Author

pytest -s tests/unit/recommenders/models/test_unirec_model.py::test_sasrec_train --disable-warnings

@miguelgfierro
Collaborator Author

miguelgfierro commented Jul 5, 2024

import cvxpy as cp
E   ModuleNotFoundError: No module named 'cvxpy'

solved with pip install cvxpy

another error:

FAILED tests/unit/recommenders/models/test_unirec_model.py::test_sasrec_train - ModuleNotFoundError: No module named 'feather'

solved with pip install feather-format
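Both failures above are missing imports, so they can be detected before running pytest. A minimal sketch, assuming the import names are cvxpy, feather (provided by the feather-format package) and accelerate (listed later in this thread); the helper name missing_modules is hypothetical and not part of recommenders or UniRec:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names mentioned in this thread (assumption: this list is not exhaustive).
UNIREC_EXTRAS = ["cvxpy", "feather", "accelerate"]

if __name__ == "__main__":
    missing = missing_modules(UNIREC_EXTRAS)
    print("missing:", ", ".join(missing) if missing else "none")
```

Running this before the test gives one list of packages to install instead of one ModuleNotFoundError per run.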

@miguelgfierro
Collaborator Author

miguelgfierro commented Jul 5, 2024

FIXED

 @pytest.mark.gpu
    def test_sasrec_train(base_config, unirec_config_path):
        # config = copy.deepcopy(base_config)
        # yaml_file = os.path.join(unirec_config_path, "model", "SASRec.yaml")
        # config.update(load_yaml(yaml_file))
    
        # model = SASRec(config)
        import copy
        import datetime
        from recommenders.models.unirec.main import main
    
        GLOBAL_CONF = {
            # "config_dir": f"{os.path.join(unirec_config_path, 'unirec', 'config')}",
            "config_dir": unirec_config_path,
            "exp_name": "pytest",
            "checkpoint_dir": f'{datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")}',
            "model": "",
            "dataloader": "SeqRecDataset",
            "dataset": "",
            "dataset_path": os.path.join(unirec_config_path, "tests/.temp/data"),
            "output_path": "",
            "learning_rate": 0.001,
            "dropout_prob": 0.0,
            "embedding_size": 32,
            "hidden_size": 32,
            "use_pre_item_emb": 0,
            "loss_type": "bce",
            "max_seq_len": 10,
            "has_user_bias": 1,
            "has_item_bias": 1,
            "epochs": 1,
            "early_stop": -1,
            "batch_size": 512,
            "n_sample_neg_train": 9,
            "valid_protocol": "one_vs_all",
            "test_protocol": "one_vs_all",
            "grad_clip_value": 0.1,
            "weight_decay": 1e-6,
            "history_mask_mode": "autoagressive",
            "user_history_filename": "user_history",
            "metrics": "['hit@5;10', 'ndcg@5;10']",
            "key_metric": "ndcg@5",
            "num_workers": 4,
            "num_workers_test": 0,
            "verbose": 2,
            "neg_by_pop_alpha": 0.0,
            "conv_size": 10,  # for ConvFormer-series
        }
        config = copy.deepcopy(GLOBAL_CONF)
        config["task"] = "train"
        config["dataset_path"] = os.path.join(config["dataset_path"], "ml-100k")
        config["dataset"] = "ml-100k"
        config["model"] = "SASRec"
        config["output_path"] = os.path.join(unirec_config_path, f"tests/.temp/output/")
>       result = main.run(config)

tests/unit/recommenders/models/test_unirec_model.py:146: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
recommenders/models/unirec/main/main.py:676: in run
    res = main(config, accelerator)
recommenders/models/unirec/main/main.py:357: in main
    user2history, user2history_time = get_user_history(
recommenders/models/unirec/main/main.py:137: in get_user_history
    user2history, user2history_time = general.load_user_history(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

file_path = '/home/u/MS/recommenders/recommenders/models/unirec/config/tests/.temp/data/ml-100k', file_name = 'user_history', n_users = 940, format = 'user-item_seq', time_seq = 0

    def load_user_history(
        file_path, file_name, n_users=None, format="user-item", time_seq=0
    ):
        if os.path.exists(os.path.join(file_path, file_name + ".ftr")):
            df = pd.read_feather(os.path.join(file_path, file_name + ".ftr"))
        elif os.path.exists(os.path.join(file_path, file_name + ".pkl")):
            df = load_pkl_obj(os.path.join(file_path, file_name + ".pkl"))
        else:
>           raise NotImplementedError(
                "Unsupported user history file type: {0}".format(file_name)
            )
E           NotImplementedError: Unsupported user history file type: user_history

recommenders/models/unirec/utils/general.py:134: NotImplementedError
----------------------------------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------------------------------
INFO     SASRec-pytest:logger.py:61 config={'gpu_id': 0, 'use_gpu': True, 'seed': 2022, 'state': 'INFO', 'verbose': 2, 'saved': True, 'use_tensorboard': False, 'use_wandb': False, 'init_method': 'normal', 'init_std': 0.02, 'init_mean': 0.0, 'scheduler': 'reduce', 'scheduler_factor': 0.1, 'time_seq': 0, 'seq_last': False, 'has_user_emb': False, 'has_user_bias': 1, 'has_item_bias': 1, 'use_features': False, 'use_text_emb': False, 'use_position_emb': True, 'load_pretrained_model': False, 'embedding_size': 32, 'hidden_size': 32, 'inner_size': 512, 'dropout_prob': 0.0, 'epochs': 1, 'batch_size': 512, 'learning_rate': 0.001, 'optimizer': 'adam', 'eval_step': 1, 'early_stop': -1, 'clip_grad_norm': None, 'weight_decay': 1e-06, 'num_workers': 4, 'persistent_workers': False, 'pin_memory': False, 'shuffle_train': False, 'use_pre_item_emb': 0, 'loss_type': 'bce', 'ccl_w': 150, 'ccl_m': 0.4, 'distance_type': 'dot', 'metrics': "['hit@5;10', 'ndcg@5;10']", 'key_metric': 'ndcg@5', 'test_protocol': 'one_vs_all', 'valid_protocol': 'one_vs_all', 'test_batch_size': 100, 'model': 'SASRec', 'dataloader': 'SeqRecDataset', 'max_seq_len': 10, 'history_mask_mode': 'autoagressive', 'tau': 1.0, 'enable_morec': 0, 'morec_objectives': ['fairness', 'alignment', 'revenue'], 'morec_objective_controller': 'PID', 'morec_ngroup': [10, 10, -1], 'morec_alpha': 0.1, 'morec_lambda': 0.2, 'morec_expect_loss': 0.2, 'morec_beta_min': 0.6, 'morec_beta_max': 1.3, 'morec_K_p': 0.01, 'morec_K_i': 0.001, 'morec_objective_weights': '[0.3,0.3,0.4]', 'n_layers': 2, 'n_heads': 16, 'hidden_dropout_prob': 0.5, 'attn_dropout_prob': 0.5, 'hidden_act': 'swish', 'layer_norm_eps': '1e-10', 'group_size': -1, 'n_items': 1017, 'n_neg_test_from_sampling': 0, 'n_neg_train_from_sampling': 0, 'n_neg_valid_from_sampling': 0, 'n_users': 940, 'test_file_format': 'user-item', 'train_file_format': 'user-item', 'user_history_file_format': 'user-item_seq', 'valid_file_format': 'user-item', 'base_model': 'GRU', 'freeze': 0, 'train_type': 
'Base', 'config_dir': PosixPath('/home/u/MS/recommenders/recommenders/models/unirec/config'), 'exp_name': 'SASRec-pytest', 'checkpoint_dir': '2024-07-05_12-25-03', 'dataset': 'ml-100k', 'dataset_path': '/home/u/MS/recommenders/recommenders/models/unirec/config/tests/.temp/data/ml-100k', 'output_path': '/home/u/MS/recommenders/recommenders/models/unirec/config/tests/.temp/output/', 'n_sample_neg_train': 9, 'grad_clip_value': 0.1, 'user_history_filename': 'user_history', 'num_workers_test': 0, 'neg_by_pop_alpha': 0.0, 'conv_size': 10, 'task': 'train', 'cmd_args': {'base_model': 'GRU', 'freeze': 0, 'train_type': 'Base', 'config_dir': PosixPath('/home/u/MS/recommenders/recommenders/models/unirec/config'), 'exp_name': 'SASRec-pytest', 'checkpoint_dir': '2024-07-05_12-25-03', 'model': 'SASRec', 'dataloader': 'SeqRecDataset', 'dataset': 'ml-100k', 'dataset_path': '/home/u/MS/recommenders/recommenders/models/unirec/config/tests/.temp/data/ml-100k', 'output_path': '/home/u/MS/recommenders/recommenders/models/unirec/config/tests/.temp/output/', 'learning_rate': 0.001, 'dropout_prob': 0.0, 'embedding_size': 32, 'hidden_size': 32, 'use_pre_item_emb': 0, 'loss_type': 'bce', 'max_seq_len': 10, 'has_user_bias': 1, 'has_item_bias': 1, 'epochs': 1, 'early_stop': -1, 'batch_size': 512, 'n_sample_neg_train': 9, 'valid_protocol': 'one_vs_all', 'test_protocol': 'one_vs_all', 'grad_clip_value': 0.1, 'weight_decay': 1e-06, 'history_mask_mode': 'autoagressive', 'user_history_filename': 'user_history', 'metrics': "['hit@5;10', 'ndcg@5;10']", 'key_metric': 'ndcg@5', 'num_workers': 4, 'num_workers_test': 0, 'verbose': 2, 'neg_by_pop_alpha': 0.0, 'conv_size': 10, 'task': 'train', 'logger_time_str': '2024-07-05_122503', 'logger_rand': 91}, 'device': device(type='cpu'), 'logger_time_str': '2024-07-05_122503', 'logger_rand': 91}
INFO     SASRec-pytest:main.py:136 Loading user history from user_history ...
================================================================================================== short test summary info ==================================================================================================
FAILED tests/unit/recommenders/models/test_unirec_model.py::test_sasrec_train - NotImplementedError: Unsupported user history file type: user_history

If I download the original repo and run the tests, I get the same error:

TOL = 0.05
ABS_TOL = 0.05

GLOBAL_CONF = {
    "config_dir": f"{os.path.join(UNIREC_PATH, 'unirec', 'config')}",
    "exp_name": "pytest",
    "checkpoint_dir": f'{datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")}',
    "model": "",
    "dataloader": "SeqRecDataset",
    "dataset": "",
    "dataset_path": os.path.join(UNIREC_PATH, "tests/.temp/data"),
    "output_path": "",
    "learning_rate": 0.001,
    "dropout_prob": 0.0,
    "embedding_size": 32,
    "hidden_size": 32,
    "use_pre_item_emb": 0,
    "loss_type": "bce",
    "max_seq_len": 10,
    "has_user_bias": 1,
    "has_item_bias": 1,
    "epochs": 1,  # 3, # MIG
    "early_stop": -1,
    "batch_size": 512,
    "n_sample_neg_train": 9,
    "valid_protocol": "one_vs_all",
    "test_protocol": "one_vs_all",
    "grad_clip_value": 0.1,
    "weight_decay": 1e-6,
    "history_mask_mode": "autoagressive",
    "user_history_filename": "user_history",
    "metrics": "['hit@5;10', 'ndcg@5;10']",
    "key_metric": "ndcg@5",
    "num_workers": 4,
    "num_workers_test": 0,
    "verbose": 2,
    "neg_by_pop_alpha": 0.0,
    "conv_size": 10,  # for ConvFormer-series
}

# SEQ_MODELS = ["SVDPlusPlus", "FASTConvFormer", "ConvFormer", "SASRec", "AvgHist", "GRU", "AttHist"]  # Each test is ordered according to the list
# SEQ_MODELS = ["SASRec"]
SEQ_MODELS = ["SVDPlusPlus"]
LOSS_TYPES = ["bce", "bpr", "softmax", "ccl", "fullsoftmax"]
EXPECTED_METRICS = {
    "SVDPlusPlus": {"hit@5": 0.04792, "ndcg@5": 0.03394},
    "FASTConvFormer": {"hit@5": 0.05005, "ndcg@5": 0.03355},
    "ConvFormer": {"hit@5": 0.05005, "ndcg@5": 0.03538},
    "SASRec": {"hit@5": 0.04792, "ndcg@5": 0.03184},
    "AvgHist": {"hit@5": 0.05005, "ndcg@5": 0.03423},
    "GRU": {"hit@5": 0.04686, "ndcg@5": 0.03197},
    "AttHist": {"hit@5": 0.04686, "ndcg@5": 0.03221},
    "SASRec_bce": {"hit@5": 0.04792, "ndcg@5": 0.03184},
    "SASRec_bpr": {"hit@5": 0.04686, "ndcg@5": 0.03122},
    "SASRec_softmax": {"hit@5": 0.04686, "ndcg@5": 0.03066},
    "SASRec_ccl": {"hit@5": 0.02449, "ndcg@5": 0.01318},
    "SASRec_fullsoftmax": {"hit@5": 0.04792, "ndcg@5": 0.03155},
    "SASRec_with_text_emb": {"hit@5": 0.04686, "ndcg@5": 0.03219},
    "SASRec_with_max_len": {"hit@5": 0.04686, "ndcg@5": 0.03122},
}


# >>>>>> Test train pipeline of sequential models and check the performance
# Note: the test instance should be put in the first place because the model checkpoint files generated here are required in following tests
@pytest.mark.parametrize(
    "data, models, expected_values", [("ml-100k", SEQ_MODELS, EXPECTED_METRICS)]
)
def test_train_pipeline(data, models, expected_values):
    all_result = {}
    # finish all training first for following evaluation and infer test
    for model in models:
        config = copy.deepcopy(GLOBAL_CONF)
        config["task"] = "train"
        config["dataset_path"] = os.path.join(config["dataset_path"], data)
        config["dataset"] = data
        config["model"] = model
        config["output_path"] = os.path.join(
            UNIREC_PATH, f"tests/.temp/output/{data}/{model}"
        )
        result = main.run(config)
        all_result[model] = result

    # check the performance
    failed_models = []
    for model in models:
        exp_value = expected_values[model]
        result = all_result[model]
        for k, v in exp_value.items():
            if not result[k] == pytest.approx(v, rel=TOL, abs=ABS_TOL):
                failed_models.append(model)
                break
    assert (
        len(failed_models) == 0
    ), f"performance of [{', '.join(failed_models)}] not correct."


$ tests/test_model/test_seq_model_mig.py F                                                                               [100%]

========================================================================================================= FAILURES =========================================================================================================
__________________________________________________________________________________ test_train_pipeline[ml-100k-models0-expected_values0] ___________________________________________________________________________________

data = 'ml-100k', models = ['SVDPlusPlus']
expected_values = {'AttHist': {'hit@5': 0.04686, 'ndcg@5': 0.03221}, 'AvgHist': {'hit@5': 0.05005, 'ndcg@5': 0.03423}, 'ConvFormer': {'hit@5': 0.05005, 'ndcg@5': 0.03538}, 'FASTConvFormer': {'hit@5': 0.05005, 'ndcg@5': 0.03355}, ...}

    @pytest.mark.parametrize(
        "data, models, expected_values", [("ml-100k", SEQ_MODELS, EXPECTED_METRICS)]
    )
    def test_train_pipeline(data, models, expected_values):
        all_result = {}
        # finish all training first for following evaluation and infer test
        for model in models:
            config = copy.deepcopy(GLOBAL_CONF)
            config["task"] = "train"
            config["dataset_path"] = os.path.join(config["dataset_path"], data)
            config["dataset"] = data
            config["model"] = model
            config["output_path"] = os.path.join(
                UNIREC_PATH, f"tests/.temp/output/{data}/{model}"
            )
>           result = main.run(config)

tests/test_model/test_seq_model_mig.py:97: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
unirec/main/main.py:492: in run
    res = main(config, accelerator)
unirec/main/main.py:272: in main
    user2history, user2history_time = get_user_history(user2history, user2history_time, config, DATA_TRAIN_NAME)
unirec/main/main.py:116: in get_user_history
    user2history, user2history_time = general.load_user_history(file_path, _user_history_filename, config['n_users'], _user_history_data_format, config['time_seq'])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

file_path = '/home/u/MS/UniRec/tests/.temp/data/ml-100k', file_name = 'user_history', n_users = 940, format = 'user-item_seq', time_seq = 0

    def load_user_history(file_path, file_name, n_users=None, format='user-item', time_seq=0):
        if os.path.exists(os.path.join(file_path, file_name + '.ftr')):
            df = pd.read_feather(os.path.join(file_path, file_name + '.ftr'))
        elif os.path.exists(os.path.join(file_path, file_name + '.pkl')):
            df = load_pkl_obj(os.path.join(file_path, file_name + '.pkl'))
        else:
>           raise NotImplementedError("Unsupported user history file type: {0}".format(file_name) )
E           NotImplementedError: Unsupported user history file type: user_history

unirec/utils/general.py:117: NotImplementedError

The error means the data is not there yet: we need to download the dataset (python download_split_ml100k.py) and preprocess it (sh preprocess_ml100k.sh) before we can run the training.
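The NotImplementedError above is raised when neither user_history.ftr nor user_history.pkl exists in the dataset folder, which makes the message misleading. A minimal sketch of a pre-check that a test could run first, assuming the same two extensions that load_user_history looks for; the helper names find_user_history and require_user_history are hypothetical:

```python
from pathlib import Path


def find_user_history(dataset_path, file_name="user_history"):
    """Return the first existing history file (.ftr, then .pkl), or None."""
    for ext in (".ftr", ".pkl"):
        candidate = Path(dataset_path) / f"{file_name}{ext}"
        if candidate.exists():
            return candidate
    return None


def require_user_history(dataset_path, file_name="user_history"):
    """Fail early with a message that points at the missing preprocessing step."""
    path = find_user_history(dataset_path, file_name)
    if path is None:
        raise FileNotFoundError(
            f"No {file_name}.ftr or {file_name}.pkl in {dataset_path}; "
            "download and preprocess the dataset first."
        )
    return path
```

In a pytest test, the same check could be used to skip the training test when the preprocessed data is not available.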

@miguelgfierro
Collaborator Author

miguelgfierro commented Aug 26, 2024

Work so far: staging...miguel/sasrec_unirec

The next step is to create a unit test called test_sasrec_train that trains SASRec with the minimum set of options on a dummy dataset. We should first make sure that the code calling result = main.run(config) runs, and then replace that call with the minimum set of functions.

The steps should follow the structure of https://github.com/recommenders-team/recommenders/blob/main/examples/00_quick_start/sar_movielens.ipynb:

  • Data loading
  • split train and test iterators
  • instantiate the model
  • train the model

If we want, we can also:

  • evaluate
  • get metrics
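The first two steps can be sketched on a dummy dataset without any UniRec code. This is only an illustration under assumptions: the column names, the leave-last-out split, and the helper names make_dummy_interactions and leave_last_out are hypothetical and are not the format produced by preprocess_ml100k.sh. Item ids start at 1 because the model printout later in this thread reserves index 0 for padding.

```python
import numpy as np
import pandas as pd


def make_dummy_interactions(n_users=20, n_items=30, min_len=5, max_len=12, seed=0):
    """Build a small random interaction log with one ordered sequence per user."""
    rng = np.random.default_rng(seed)
    rows = []
    for user in range(1, n_users + 1):
        length = int(rng.integers(min_len, max_len + 1))
        items = rng.integers(1, n_items + 1, size=length)  # ids start at 1
        rows.extend((user, int(item), step) for step, item in enumerate(items))
    return pd.DataFrame(rows, columns=["user_id", "item_id", "timestamp"])


def leave_last_out(df, user_col="user_id", time_col="timestamp"):
    """Hold out each user's most recent interaction as the test target."""
    df = df.sort_values([user_col, time_col])
    is_last = df.groupby(user_col).cumcount(ascending=False) == 0
    return df[~is_last], df[is_last]


if __name__ == "__main__":
    train, test = leave_last_out(make_dummy_interactions())
    print(len(train), "train rows,", len(test), "test rows")
```

A dataset this small keeps the unit test fast; instantiating and training the model would then consume the train frame in whatever format the final data loader expects.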

After this, we will create a notebook explaining an end-to-end case with a real dataset, and we will replace the TF notebook.

Tasks:

  • Understand the extra dependencies I need to add: only accelerate and cvxpy. Both seem to be solid libraries.
  • Understand python download_split_ml100k.py.
  • Understand sh preprocess_ml100k.sh.
  • Understand result = main.run(config)
  • Replicate and adapt python download_split_ml100k.py.
  • Replicate and adapt sh preprocess_ml100k.sh.
  • Replicate and adapt result = main.run(config)

@miguelgfierro miguelgfierro self-assigned this Nov 18, 2024
@miguelgfierro
Collaborator Author

miguelgfierro commented Jan 10, 2025

FIXED

pytest tests/unit/recommenders/datasets/test_pandas_df_utils.py

Review test test_filter_k_interactions:


@pytest.fixture(scope="function")
def sample_df():
    return pd.DataFrame({"user_id": [1, 9, 3, 5, 5, 1], "item_id": [1, 6, 7, 6, 8, 9]})


def test_filter_k_interactions(sample_df):
    # Test with simple filtering
    result = filter_k_interactions(
        sample_df, user_k=2, item_k=2, user_col="user_id", item_col="item_id"
    )
    assert result.shape == (4, 2)  # Only users 1, 5 and items 6, 1 should remain
    assert set(result["user_id"].unique()) == {1, 5}
    assert set(result["item_id"].unique()) == {1, 6}

    # # No change expected
    # result = filter_k_interactions(
    #     sample_df, user_k=1, item_k=1, user_col="user_id", item_col="item_id"
    # )
    # pd.testing.assert_frame_equal(result, sample_df)

    # # High thresholds should result in an empty DataFrame
    # result = filter_k_interactions(
    #     sample_df, user_k=5, item_k=5, user_col="user_id", item_col="item_id"
    # )
    # assert result.empty

    # # Test with max iterations
    # result = filter_k_interactions(
    #     sample_df,
    #     user_k=2,
    #     item_k=2,
    #     max_iter=1,
    #     user_col="user_id",
    #     item_col="item_id",
    # )
    # # Since we're limited to one iteration, only initial filtering occurs
    # assert result.shape == (4, 2)

    # # Test with very few data points
    # small_df = sample_df.iloc[:3]
    # result = filter_k_interactions(
    #     small_df, user_k=2, item_k=2, user_col="user_id", item_col="item_id"
    # )
    # assert result.empty  # Since no user or item has 2 interactions
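To help review the expected values, here is a hypothetical stand-in for the filter. It assumes strict iterative k-core semantics: rows are dropped repeatedly until every remaining user has at least user_k rows and every remaining item has at least item_k rows. The real filter_k_interactions may behave differently, so this is not a statement about its output.

```python
import pandas as pd


def kcore_filter(df, user_k, item_k, user_col="user_id", item_col="item_id", max_iter=100):
    """Hypothetical iterative k-core filter (not the recommenders implementation)."""
    out = df
    for _ in range(max_iter):
        if out.empty:
            break
        # Count how many rows each user and each item has in the current frame.
        user_counts = out[user_col].map(out[user_col].value_counts())
        item_counts = out[item_col].map(out[item_col].value_counts())
        kept = out[(user_counts >= user_k) & (item_counts >= item_k)]
        if len(kept) == len(out):  # nothing was removed, so the result is stable
            break
        out = kept
    return out


sample = pd.DataFrame({"user_id": [1, 9, 3, 5, 5, 1], "item_id": [1, 6, 7, 6, 8, 9]})
print(kcore_filter(sample, user_k=2, item_k=2))
```

Under these assumed semantics the sample frame ends up empty with user_k=2 and item_k=2, because item 6 is the only item with two interactions and its two users do not both survive. If the library function is meant to apply each threshold only once, the expected shape in the test would need to be derived from that rule instead.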

@miguelgfierro
Collaborator Author

How to improve the performance of PyTorch training and data loading: https://x.com/akshay_pachaar/status/1886089698709541357?t=JIFInvkryY_fpZLrT4FOUQ&s=08

  • Set pin_memory=True in the DataLoader object.
  • During data transfer, use: .to(device, non_blocking=True)
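The two tips can be sketched as follows. This is a minimal example on random tensors, not the UniRec trainer; the tensor shapes and the batch size are arbitrary assumptions. Pinned memory only matters when a CUDA device is present, so the flag is enabled conditionally.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_cuda = device.type == "cuda"

# Dummy data: 256 sequences of 10 item ids, each with one target item id.
dataset = TensorDataset(
    torch.randint(1, 100, (256, 10)),
    torch.randint(1, 100, (256,)),
)

# pin_memory=True places batches in page-locked host memory for faster copies to the GPU.
loader = DataLoader(dataset, batch_size=64, shuffle=True, pin_memory=use_cuda)

n_seen = 0
for seqs, targets in loader:
    # non_blocking=True lets the host-to-device copy overlap with computation
    # when the source tensor is pinned; on CPU it has no effect.
    seqs = seqs.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
    n_seen += seqs.shape[0]

print("batches moved to", device, "rows seen:", n_seen)
```

The config dumps in this thread show 'pin_memory': False, so this would be an opt-in change to the existing settings.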

@miguelgfierro
Collaborator Author

The trainer works. This run was configured with epochs=3, and the best validation epoch was 2:

$ sh train_seq_model_ml100k_mig.sh
2025-02-06 14:10:43.004901: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-02-06 14:10:43.005076: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-02-06 14:10:43.156582: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-06 14:10:43.385174: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-02-06 14:10:45.294745: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Load configuration files from unirec/config
Writing logs to /home/u/MS/UniRec/output/ml-100k/SASRec/train/SASRec-SASRec-ml-100k.2025-02-06_141050.20.txt
[INFO] SASRec-SASRec-ml-100k: config={'gpu_id': 0, 'use_gpu': True, 'seed': 2022, 'state': 'INFO', 'verbose': 2, 'saved': True, 'use_tensorboard': False, 'use_wandb': 0, 'init_method': 'normal', 'init_std': 0.02, 'init_mean': 0.0, 'scheduler': 'reduce', 'scheduler_factor': 0.1, 'time_seq': 0, 'seq_last': False, 'has_user_emb': False, 'has_user_bias': 0, 'has_item_bias': 0, 'use_features': False, 'use_text_emb': False, 'use_position_emb': True, 'load_pretrained_model': False, 'embedding_size': 64, 'inner_size': 512, 'dropout_prob': 0.0, 'epochs': 3, 'batch_size': 512, 'learning_rate': 0.001, 'optimizer': 'adam', 'eval_step': 1, 'early_stop': 10, 'clip_grad_norm': None, 'weight_decay': 1e-06, 'num_workers': 4, 'persistent_workers': False, 'pin_memory': False, 'shuffle_train': False, 'use_pre_item_emb': 0, 'loss_type': 'bce', 'ccl_w': 150, 'ccl_m': 0.4, 'distance_type': 'dot', 'metrics': "['hit@10;20', 'ndcg@10;20']", 'key_metric': 'ndcg@10', 'test_protocol': 'one_vs_all', 'valid_protocol': 'one_vs_all', 'test_batch_size': 100, 'model': 'SASRec', 'dataloader': 'SeqRecDataset', 'max_seq_len': 10, 'history_mask_mode': 'autoregressive', 'tau': 1.0, 'enable_morec': 0, 'morec_objectives': ['fairness', 'alignment', 'revenue'], 'morec_objective_controller': 'PID', 'morec_ngroup': [10, 10, -1], 'morec_alpha': 0.1, 'morec_lambda': 0.2, 'morec_expect_loss': 0.2, 'morec_beta_min': 0.6, 'morec_beta_max': 1.3, 'morec_K_p': 0.01, 'morec_K_i': 0.001, 'morec_objective_weights': '[0.3,0.3,0.4]', 'n_layers': 2, 'n_heads': 16, 'hidden_dropout_prob': 0.5, 'attn_dropout_prob': 0.5, 'hidden_act': 'swish', 'layer_norm_eps': '1e-10', 'group_size': -1, 'n_items': 1017, 'n_neg_test_from_sampling': 0, 'n_neg_train_from_sampling': 0, 'n_neg_valid_from_sampling': 0, 'n_users': 940, 'test_file_format': 'user-item', 'train_file_format': 'user-item', 'user_history_file_format': 'user-item_seq', 'valid_file_format': 'user-item', 'base_model': 'GRU', 'config_dir': 'unirec/config', 'dataset': 
'ml-100k', 'dataset_path': '/home/u/MS/UniRec/data/ml-100k', 'exp_name': 'SASRec-SASRec-ml-100k', 'freeze': 0, 'grad_clip_value': -1.0, 'hidden_size': 64, 'n_sample_neg_train': 5, 'neg_by_pop_alpha': 1.0, 'num_workers_test': 0, 'output_path': '/home/u/MS/UniRec/output/ml-100k/SASRec/train', 'train_type': 'Base', 'user_history_filename': 'user_history', 'cmd_args': {'base_model': 'GRU', 'batch_size': 512, 'config_dir': 'unirec/config', 'dataloader': 'SeqRecDataset', 'dataset': 'ml-100k', 'dataset_path': '/home/u/MS/UniRec/data/ml-100k', 'dropout_prob': 0.0, 'early_stop': 10, 'embedding_size': 64, 'epochs': 3, 'exp_name': 'SASRec-SASRec-ml-100k', 'freeze': 0, 'grad_clip_value': -1.0, 'has_item_bias': 0, 'has_user_bias': 0, 'hidden_size': 64, 'history_mask_mode': 'autoregressive', 'key_metric': 'ndcg@10', 'learning_rate': 0.001, 'loss_type': 'bce', 'max_seq_len': 10, 'metrics': "['hit@10;20', 'ndcg@10;20']", 'model': 'SASRec', 'n_sample_neg_train': 5, 'neg_by_pop_alpha': 1.0, 'num_workers': 4, 'num_workers_test': 0, 'output_path': '/home/u/MS/UniRec/output/ml-100k/SASRec/train', 'test_protocol': 'one_vs_all', 'train_type': 'Base', 'use_pre_item_emb': 0, 'use_wandb': 0, 'user_history_file_format': 'user-item_seq', 'user_history_filename': 'user_history', 'valid_protocol': 'one_vs_all', 'verbose': 2, 'weight_decay': 1e-06, 'logger_time_str': '2025-02-06_141050', 'logger_rand': 20}, 'device': device(type='cuda'), 'task': 'train', 'logger_time_str': '2025-02-06_141050', 'logger_rand': 20}
[INFO] SASRec-SASRec-ml-100k: Loading user history from user_history ...
[INFO] SASRec-SASRec-ml-100k: Done. 940 of users have history.
[INFO] SASRec-SASRec-ml-100k: Constructing dataset of task type: train
[DEBUG] SASRec-SASRec-ml-100k: loading train at 06/02/2025 14:10:50
[DEBUG] SASRec-SASRec-ml-100k: Finished loading train at 06/02/2025 14:10:50
[INFO] SASRec-SASRec-ml-100k: Finished initializing <class 'unirec.data.dataset.seqrecdataset.SeqRecDataset'>
[INFO] SASRec-SASRec-ml-100k: Constructing dataset of task type: valid
[DEBUG] SASRec-SASRec-ml-100k: loading valid at 06/02/2025 14:10:50
[DEBUG] SASRec-SASRec-ml-100k: Finished loading valid at 06/02/2025 14:10:50
[INFO] SASRec-SASRec-ml-100k: Finished initializing <class 'unirec.data.dataset.seqrecdataset.SeqRecDataset'>
[INFO] SASRec-SASRec-ml-100k: SASRec(
  (scorer_layers): InnerProductScorer()
  (item_embedding): Embedding(1017, 64, padding_idx=0)
  (position_embedding): Embedding(11, 64)
  (trm_encoder): TransformerEncoder(
    (layer): ModuleList(
      (0-1): 2 x TransformerLayer(
        (multi_head_attention): MultiHeadAttention(
          (query): Linear(in_features=64, out_features=64, bias=True)
          (key): Linear(in_features=64, out_features=64, bias=True)
          (value): Linear(in_features=64, out_features=64, bias=True)
          (attn_dropout): Dropout(p=0.5, inplace=False)
          (dense): Linear(in_features=64, out_features=64, bias=True)
          (LayerNorm): LayerNorm((64,), eps=1e-10, elementwise_affine=True)
          (out_dropout): Dropout(p=0.5, inplace=False)
        )
        (feed_forward): FeedForward(
          (dense_1): Linear(in_features=64, out_features=512, bias=True)
          (intermediate_act_fn): SiLU()
          (dense_2): Linear(in_features=512, out_features=64, bias=True)
          (LayerNorm): LayerNorm((64,), eps=1e-10, elementwise_affine=True)
          (dropout): Dropout(p=0.5, inplace=False)
        )
      )
    )
  )
  (LayerNorm): LayerNorm((64,), eps=1e-10, elementwise_affine=True)
  (dropout): Dropout(p=0.5, inplace=False)
)
Trainable parameter number: 231936
All trainable parameters:
item_embedding.weight : torch.Size([1017, 64])
position_embedding.weight : torch.Size([11, 64])
trm_encoder.layer.0.multi_head_attention.query.weight : torch.Size([64, 64])
trm_encoder.layer.0.multi_head_attention.query.bias : torch.Size([64])
trm_encoder.layer.0.multi_head_attention.key.weight : torch.Size([64, 64])
trm_encoder.layer.0.multi_head_attention.key.bias : torch.Size([64])
trm_encoder.layer.0.multi_head_attention.value.weight : torch.Size([64, 64])
trm_encoder.layer.0.multi_head_attention.value.bias : torch.Size([64])
trm_encoder.layer.0.multi_head_attention.dense.weight : torch.Size([64, 64])
trm_encoder.layer.0.multi_head_attention.dense.bias : torch.Size([64])
trm_encoder.layer.0.multi_head_attention.LayerNorm.weight : torch.Size([64])
trm_encoder.layer.0.multi_head_attention.LayerNorm.bias : torch.Size([64])
trm_encoder.layer.0.feed_forward.dense_1.weight : torch.Size([512, 64])
trm_encoder.layer.0.feed_forward.dense_1.bias : torch.Size([512])
trm_encoder.layer.0.feed_forward.dense_2.weight : torch.Size([64, 512])
trm_encoder.layer.0.feed_forward.dense_2.bias : torch.Size([64])
trm_encoder.layer.0.feed_forward.LayerNorm.weight : torch.Size([64])
trm_encoder.layer.0.feed_forward.LayerNorm.bias : torch.Size([64])
trm_encoder.layer.1.multi_head_attention.query.weight : torch.Size([64, 64])
trm_encoder.layer.1.multi_head_attention.query.bias : torch.Size([64])
trm_encoder.layer.1.multi_head_attention.key.weight : torch.Size([64, 64])
trm_encoder.layer.1.multi_head_attention.key.bias : torch.Size([64])
trm_encoder.layer.1.multi_head_attention.value.weight : torch.Size([64, 64])
trm_encoder.layer.1.multi_head_attention.value.bias : torch.Size([64])
trm_encoder.layer.1.multi_head_attention.dense.weight : torch.Size([64, 64])
trm_encoder.layer.1.multi_head_attention.dense.bias : torch.Size([64])
trm_encoder.layer.1.multi_head_attention.LayerNorm.weight : torch.Size([64])
trm_encoder.layer.1.multi_head_attention.LayerNorm.bias : torch.Size([64])
trm_encoder.layer.1.feed_forward.dense_1.weight : torch.Size([512, 64])
trm_encoder.layer.1.feed_forward.dense_1.bias : torch.Size([512])
trm_encoder.layer.1.feed_forward.dense_2.weight : torch.Size([64, 512])
trm_encoder.layer.1.feed_forward.dense_2.bias : torch.Size([64])
trm_encoder.layer.1.feed_forward.LayerNorm.weight : torch.Size([64])
trm_encoder.layer.1.feed_forward.LayerNorm.bias : torch.Size([64])
LayerNorm.weight : torch.Size([64])
LayerNorm.bias : torch.Size([64])
[DEBUG] SASRec-SASRec-ml-100k: >> Valid before training...
[INFO] SASRec-SASRec-ml-100k: one_vs_all
Evaluate: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.03s/it]
[INFO] SASRec-SASRec-ml-100k: epoch 0 evaluating [time: 4.19s, ndcg@10: 0.004780]
[INFO] SASRec-SASRec-ml-100k: complete scores on valid set:
hit@10:0.013844515441959531 hit@20:0.027689030883919063 ndcg@10:0.004779694781511818 ndcg@20:0.008302492357862928
[INFO] SASRec-SASRec-ml-100k: Saving best model at epoch 0 to /home/u/MS/UniRec/output/ml-100k/SASRec/train/checkpoint_2025-02-06_141050_20/SASRec-SASRec-ml-100k.pth
[INFO] SASRec-SASRec-ml-100k:
>> epoch 1
Train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 154/154 [00:05<00:00, 29.67it/s]
[INFO] SASRec-SASRec-ml-100k: epoch 1 training [time: 5.19s, train loss: 74.5613]
[INFO] SASRec-SASRec-ml-100k: one_vs_all
Evaluate: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 10.08it/s]
[INFO] SASRec-SASRec-ml-100k: epoch 1 evaluating [time: 0.20s, ndcg@10: 0.029712]
[INFO] SASRec-SASRec-ml-100k: complete scores on valid set:
hit@10:0.054313099041533544 hit@20:0.09052183173588925 ndcg@10:0.029712144917684428 ndcg@20:0.03871867505038797
[INFO] SASRec-SASRec-ml-100k: Saving best model at epoch 1 to /home/u/MS/UniRec/output/ml-100k/SASRec/train/checkpoint_2025-02-06_141050_20/SASRec-SASRec-ml-100k.pth
[INFO] SASRec-SASRec-ml-100k: epoch: 1, learning rate: 0.001
[INFO] SASRec-SASRec-ml-100k:
>> epoch 2
Train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 154/154 [00:05<00:00, 30.54it/s]
[INFO] SASRec-SASRec-ml-100k: epoch 2 training [time: 5.04s, train loss: 62.4723]
[INFO] SASRec-SASRec-ml-100k: one_vs_all
Evaluate: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  9.81it/s]
[INFO] SASRec-SASRec-ml-100k: epoch 2 evaluating [time: 0.21s, ndcg@10: 0.072690]
[INFO] SASRec-SASRec-ml-100k: complete scores on valid set:
hit@10:0.1288604898828541 hit@20:0.2161874334398296 ndcg@10:0.07268984035761394 ndcg@20:0.09433535815642462
[INFO] SASRec-SASRec-ml-100k: Saving best model at epoch 2 to /home/u/MS/UniRec/output/ml-100k/SASRec/train/checkpoint_2025-02-06_141050_20/SASRec-SASRec-ml-100k.pth
[INFO] SASRec-SASRec-ml-100k: epoch: 2, learning rate: 0.001
[INFO] SASRec-SASRec-ml-100k:
>> epoch 3
Train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 154/154 [00:04<00:00, 31.77it/s]
[INFO] SASRec-SASRec-ml-100k: epoch 3 training [time: 4.85s, train loss: 54.1195]
[INFO] SASRec-SASRec-ml-100k: Constructing dataset of task type: test
[DEBUG] SASRec-SASRec-ml-100k: loading test at 06/02/2025 14:11:12
[DEBUG] SASRec-SASRec-ml-100k: Finished loading test at 06/02/2025 14:11:12
[INFO] SASRec-SASRec-ml-100k: Finished initializing <class 'unirec.data.dataset.seqrecdataset.SeqRecDataset'>
[INFO] SASRec-SASRec-ml-100k: one_vs_all
[INFO] SASRec-SASRec-ml-100k: Loading model from /home/u/MS/UniRec/output/ml-100k/SASRec/train/checkpoint_2025-02-06_141050_20/SASRec-SASRec-ml-100k.pth. The best epoch was 2
Evaluate: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 27.06it/s]
[INFO] SASRec-SASRec-ml-100k: best valid : {'hit@10': 0.1288604898828541, 'hit@20': 0.2161874334398296, 'ndcg@10': 0.07268984035761394, 'ndcg@20': 0.09433535815642462}
[INFO] SASRec-SASRec-ml-100k: test result: {'hit@10': 0.1395101171458999, 'hit@20': 0.22364217252396165, 'ndcg@10': 0.07270366471075033, 'ndcg@20': 0.09373327528448706}
[INFO] SASRec-SASRec-ml-100k: Saving test result to /home/u/MS/UniRec/output/ml-100k/SASRec/train/result_SASRec-SASRec-ml-100k.2025-02-06_141050.20.tsv ...
[INFO] SASRec-SASRec-ml-100k: Mission complete. Time elapsed: 0.39 minutes.
Logger close successfully.
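As a sanity check, the "Trainable parameter number: 231936" line can be reproduced from the shapes in the printout. The sketch below is plain arithmetic on those shapes; the function names are hypothetical, and the position embedding size of max_seq_len + 1 is read off the Embedding(11, 64) line with max_seq_len = 10.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm_params(dim):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


def sasrec_param_count(n_items, max_seq_len, hidden, inner, n_layers):
    """Count parameters for the architecture shown in the log above."""
    embeddings = n_items * hidden + (max_seq_len + 1) * hidden
    # query, key, value and output projections, followed by a LayerNorm
    attention = 4 * linear_params(hidden, hidden) + layer_norm_params(hidden)
    feed_forward = (
        linear_params(hidden, inner)
        + linear_params(inner, hidden)
        + layer_norm_params(hidden)
    )
    final_norm = layer_norm_params(hidden)
    return embeddings + n_layers * (attention + feed_forward) + final_norm


print(sasrec_param_count(n_items=1017, max_seq_len=10, hidden=64, inner=512, n_layers=2))
# 231936
```

The number of heads does not appear in the count because splitting the 64-dimensional projections into heads does not change their size.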
