[FEATURE] Replace SASRec TF with PyTorch version #2111

miguelgfierro · 2024-06-17T14:56:04Z

Description

SASRec tests are disabled: https://github.com/recommenders-team/recommenders/blob/main/tests/ci/azureml_tests/test_groups.py#L410
We could replace the TF algo with https://github.com/microsoft/UniRec/blob/main/unirec/model/sequential/sasrec.py

Expected behavior with the suggested feature

Branch: https://github.com/recommenders-team/recommenders/tree/miguel/sasrec_unirec

Tasks:

Create unit tests of the classes
Make sure the original script runs
Create a functional test that trains on MovieLens using only the minimal parts of the code

Other Comments

miguelgfierro · 2024-07-05T09:45:17Z

pytest -s tests/unit/recommenders/models/test_unirec_model.py::test_sasrec_train --disable-warnings

miguelgfierro · 2024-07-05T09:46:29Z

import cvxpy as cp
E   ModuleNotFoundError: No module named 'cvxpy'

solved with pip install cvxpy

another error:

FAILED tests/unit/recommenders/models/test_unirec_model.py::test_sasrec_train - ModuleNotFoundError: No module named 'feather'

solved with pip install feather-format
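
To catch both errors up front instead of one test run at a time, a small pre-flight check could look like the sketch below. The helper name missing_packages is hypothetical; the module and package names are the two from the errors above.

```python
import importlib.util

# Import name -> pip package name, taken from the two errors above.
REQUIRED = {"cvxpy": "cvxpy", "feather": "feather-format"}


def missing_packages(required=REQUIRED):
    """Return the pip names of the packages whose module cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

Note that the import name and the pip name differ for feather (feather vs feather-format), which is why the mapping keeps both.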

miguelgfierro · 2024-07-05T10:25:54Z

FIXED

 @pytest.mark.gpu
    def test_sasrec_train(base_config, unirec_config_path):
        # config = copy.deepcopy(base_config)
        # yaml_file = os.path.join(unirec_config_path, "model", "SASRec.yaml")
        # config.update(load_yaml(yaml_file))
    
        # model = SASRec(config)
        import copy
        import datetime
        from recommenders.models.unirec.main import main
    
        GLOBAL_CONF = {
            # "config_dir": f"{os.path.join(unirec_config_path, 'unirec', 'config')}",
            "config_dir": unirec_config_path,
            "exp_name": "pytest",
            "checkpoint_dir": f'{datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")}',
            "model": "",
            "dataloader": "SeqRecDataset",
            "dataset": "",
            "dataset_path": os.path.join(unirec_config_path, "tests/.temp/data"),
            "output_path": "",
            "learning_rate": 0.001,
            "dropout_prob": 0.0,
            "embedding_size": 32,
            "hidden_size": 32,
            "use_pre_item_emb": 0,
            "loss_type": "bce",
            "max_seq_len": 10,
            "has_user_bias": 1,
            "has_item_bias": 1,
            "epochs": 1,
            "early_stop": -1,
            "batch_size": 512,
            "n_sample_neg_train": 9,
            "valid_protocol": "one_vs_all",
            "test_protocol": "one_vs_all",
            "grad_clip_value": 0.1,
            "weight_decay": 1e-6,
            "history_mask_mode": "autoagressive",
            "user_history_filename": "user_history",
            "metrics": "['hit@5;10', 'ndcg@5;10']",
            "key_metric": "ndcg@5",
            "num_workers": 4,
            "num_workers_test": 0,
            "verbose": 2,
            "neg_by_pop_alpha": 0.0,
            "conv_size": 10,  # for ConvFormer-series
        }
        config = copy.deepcopy(GLOBAL_CONF)
        config["task"] = "train"
        config["dataset_path"] = os.path.join(config["dataset_path"], "ml-100k")
        config["dataset"] = "ml-100k"
        config["model"] = "SASRec"
        config["output_path"] = os.path.join(unirec_config_path, f"tests/.temp/output/")
>       result = main.run(config)

tests/unit/recommenders/models/test_unirec_model.py:146: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
recommenders/models/unirec/main/main.py:676: in run
    res = main(config, accelerator)
recommenders/models/unirec/main/main.py:357: in main
    user2history, user2history_time = get_user_history(
recommenders/models/unirec/main/main.py:137: in get_user_history
    user2history, user2history_time = general.load_user_history(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

file_path = '/home/u/MS/recommenders/recommenders/models/unirec/config/tests/.temp/data/ml-100k', file_name = 'user_history', n_users = 940, format = 'user-item_seq', time_seq = 0

    def load_user_history(
        file_path, file_name, n_users=None, format="user-item", time_seq=0
    ):
        if os.path.exists(os.path.join(file_path, file_name + ".ftr")):
            df = pd.read_feather(os.path.join(file_path, file_name + ".ftr"))
        elif os.path.exists(os.path.join(file_path, file_name + ".pkl")):
            df = load_pkl_obj(os.path.join(file_path, file_name + ".pkl"))
        else:
>           raise NotImplementedError(
                "Unsupported user history file type: {0}".format(file_name)
            )
E           NotImplementedError: Unsupported user history file type: user_history

recommenders/models/unirec/utils/general.py:134: NotImplementedError
----------------------------------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------------------------------
INFO     SASRec-pytest:logger.py:61 config={'gpu_id': 0, 'use_gpu': True, 'seed': 2022, 'state': 'INFO', 'verbose': 2, 'saved': True, 'use_tensorboard': False, 'use_wandb': False, 'init_method': 'normal', 'init_std': 0.02, 'init_mean': 0.0, 'scheduler': 'reduce', 'scheduler_factor': 0.1, 'time_seq': 0, 'seq_last': False, 'has_user_emb': False, 'has_user_bias': 1, 'has_item_bias': 1, 'use_features': False, 'use_text_emb': False, 'use_position_emb': True, 'load_pretrained_model': False, 'embedding_size': 32, 'hidden_size': 32, 'inner_size': 512, 'dropout_prob': 0.0, 'epochs': 1, 'batch_size': 512, 'learning_rate': 0.001, 'optimizer': 'adam', 'eval_step': 1, 'early_stop': -1, 'clip_grad_norm': None, 'weight_decay': 1e-06, 'num_workers': 4, 'persistent_workers': False, 'pin_memory': False, 'shuffle_train': False, 'use_pre_item_emb': 0, 'loss_type': 'bce', 'ccl_w': 150, 'ccl_m': 0.4, 'distance_type': 'dot', 'metrics': "['hit@5;10', 'ndcg@5;10']", 'key_metric': 'ndcg@5', 'test_protocol': 'one_vs_all', 'valid_protocol': 'one_vs_all', 'test_batch_size': 100, 'model': 'SASRec', 'dataloader': 'SeqRecDataset', 'max_seq_len': 10, 'history_mask_mode': 'autoagressive', 'tau': 1.0, 'enable_morec': 0, 'morec_objectives': ['fairness', 'alignment', 'revenue'], 'morec_objective_controller': 'PID', 'morec_ngroup': [10, 10, -1], 'morec_alpha': 0.1, 'morec_lambda': 0.2, 'morec_expect_loss': 0.2, 'morec_beta_min': 0.6, 'morec_beta_max': 1.3, 'morec_K_p': 0.01, 'morec_K_i': 0.001, 'morec_objective_weights': '[0.3,0.3,0.4]', 'n_layers': 2, 'n_heads': 16, 'hidden_dropout_prob': 0.5, 'attn_dropout_prob': 0.5, 'hidden_act': 'swish', 'layer_norm_eps': '1e-10', 'group_size': -1, 'n_items': 1017, 'n_neg_test_from_sampling': 0, 'n_neg_train_from_sampling': 0, 'n_neg_valid_from_sampling': 0, 'n_users': 940, 'test_file_format': 'user-item', 'train_file_format': 'user-item', 'user_history_file_format': 'user-item_seq', 'valid_file_format': 'user-item', 'base_model': 'GRU', 'freeze': 0, 'train_type': 
'Base', 'config_dir': PosixPath('/home/u/MS/recommenders/recommenders/models/unirec/config'), 'exp_name': 'SASRec-pytest', 'checkpoint_dir': '2024-07-05_12-25-03', 'dataset': 'ml-100k', 'dataset_path': '/home/u/MS/recommenders/recommenders/models/unirec/config/tests/.temp/data/ml-100k', 'output_path': '/home/u/MS/recommenders/recommenders/models/unirec/config/tests/.temp/output/', 'n_sample_neg_train': 9, 'grad_clip_value': 0.1, 'user_history_filename': 'user_history', 'num_workers_test': 0, 'neg_by_pop_alpha': 0.0, 'conv_size': 10, 'task': 'train', 'cmd_args': {'base_model': 'GRU', 'freeze': 0, 'train_type': 'Base', 'config_dir': PosixPath('/home/u/MS/recommenders/recommenders/models/unirec/config'), 'exp_name': 'SASRec-pytest', 'checkpoint_dir': '2024-07-05_12-25-03', 'model': 'SASRec', 'dataloader': 'SeqRecDataset', 'dataset': 'ml-100k', 'dataset_path': '/home/u/MS/recommenders/recommenders/models/unirec/config/tests/.temp/data/ml-100k', 'output_path': '/home/u/MS/recommenders/recommenders/models/unirec/config/tests/.temp/output/', 'learning_rate': 0.001, 'dropout_prob': 0.0, 'embedding_size': 32, 'hidden_size': 32, 'use_pre_item_emb': 0, 'loss_type': 'bce', 'max_seq_len': 10, 'has_user_bias': 1, 'has_item_bias': 1, 'epochs': 1, 'early_stop': -1, 'batch_size': 512, 'n_sample_neg_train': 9, 'valid_protocol': 'one_vs_all', 'test_protocol': 'one_vs_all', 'grad_clip_value': 0.1, 'weight_decay': 1e-06, 'history_mask_mode': 'autoagressive', 'user_history_filename': 'user_history', 'metrics': "['hit@5;10', 'ndcg@5;10']", 'key_metric': 'ndcg@5', 'num_workers': 4, 'num_workers_test': 0, 'verbose': 2, 'neg_by_pop_alpha': 0.0, 'conv_size': 10, 'task': 'train', 'logger_time_str': '2024-07-05_122503', 'logger_rand': 91}, 'device': device(type='cpu'), 'logger_time_str': '2024-07-05_122503', 'logger_rand': 91}
INFO     SASRec-pytest:main.py:136 Loading user history from user_history ...
================================================================================================== short test summary info ==================================================================================================
FAILED tests/unit/recommenders/models/test_unirec_model.py::test_sasrec_train - NotImplementedError: Unsupported user history file type: user_history

If I download the original repo and run the tests, I get the same error:

TOL = 0.05
ABS_TOL = 0.05

GLOBAL_CONF = {
    "config_dir": f"{os.path.join(UNIREC_PATH, 'unirec', 'config')}",
    "exp_name": "pytest",
    "checkpoint_dir": f'{datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")}',
    "model": "",
    "dataloader": "SeqRecDataset",
    "dataset": "",
    "dataset_path": os.path.join(UNIREC_PATH, "tests/.temp/data"),
    "output_path": "",
    "learning_rate": 0.001,
    "dropout_prob": 0.0,
    "embedding_size": 32,
    "hidden_size": 32,
    "use_pre_item_emb": 0,
    "loss_type": "bce",
    "max_seq_len": 10,
    "has_user_bias": 1,
    "has_item_bias": 1,
    "epochs": 1,  # 3, # MIG
    "early_stop": -1,
    "batch_size": 512,
    "n_sample_neg_train": 9,
    "valid_protocol": "one_vs_all",
    "test_protocol": "one_vs_all",
    "grad_clip_value": 0.1,
    "weight_decay": 1e-6,
    "history_mask_mode": "autoagressive",
    "user_history_filename": "user_history",
    "metrics": "['hit@5;10', 'ndcg@5;10']",
    "key_metric": "ndcg@5",
    "num_workers": 4,
    "num_workers_test": 0,
    "verbose": 2,
    "neg_by_pop_alpha": 0.0,
    "conv_size": 10,  # for ConvFormer-series
}

# SEQ_MODELS = ["SVDPlusPlus", "FASTConvFormer", "ConvFormer", "SASRec", "AvgHist", "GRU", "AttHist"]  # Each test is ordered according to the list
# SEQ_MODELS = ["SASRec"]
SEQ_MODELS = ["SVDPlusPlus"]
LOSS_TYPES = ["bce", "bpr", "softmax", "ccl", "fullsoftmax"]
EXPECTED_METRICS = {
    "SVDPlusPlus": {"hit@5": 0.04792, "ndcg@5": 0.03394},
    "FASTConvFormer": {"hit@5": 0.05005, "ndcg@5": 0.03355},
    "ConvFormer": {"hit@5": 0.05005, "ndcg@5": 0.03538},
    "SASRec": {"hit@5": 0.04792, "ndcg@5": 0.03184},
    "AvgHist": {"hit@5": 0.05005, "ndcg@5": 0.03423},
    "GRU": {"hit@5": 0.04686, "ndcg@5": 0.03197},
    "AttHist": {"hit@5": 0.04686, "ndcg@5": 0.03221},
    "SASRec_bce": {"hit@5": 0.04792, "ndcg@5": 0.03184},
    "SASRec_bpr": {"hit@5": 0.04686, "ndcg@5": 0.03122},
    "SASRec_softmax": {"hit@5": 0.04686, "ndcg@5": 0.03066},
    "SASRec_ccl": {"hit@5": 0.02449, "ndcg@5": 0.01318},
    "SASRec_fullsoftmax": {"hit@5": 0.04792, "ndcg@5": 0.03155},
    "SASRec_with_text_emb": {"hit@5": 0.04686, "ndcg@5": 0.03219},
    "SASRec_with_max_len": {"hit@5": 0.04686, "ndcg@5": 0.03122},
}


# >>>>>> Test train pipeline of sequential models and check the performance
# Note: the test instance should be put in the first place because the model checkpoint files generated here are required in following tests
@pytest.mark.parametrize(
    "data, models, expected_values", [("ml-100k", SEQ_MODELS, EXPECTED_METRICS)]
)
def test_train_pipeline(data, models, expected_values):
    all_result = {}
    # finish all training first for following evaluation and infer test
    for model in models:
        config = copy.deepcopy(GLOBAL_CONF)
        config["task"] = "train"
        config["dataset_path"] = os.path.join(config["dataset_path"], data)
        config["dataset"] = data
        config["model"] = model
        config["output_path"] = os.path.join(
            UNIREC_PATH, f"tests/.temp/output/{data}/{model}"
        )
        result = main.run(config)
        all_result[model] = result

    # check the performance
    failed_models = []
    for model in models:
        exp_value = expected_values[model]
        result = all_result[model]
        for k, v in exp_value.items():
            if not result[k] == pytest.approx(v, rel=TOL, abs=ABS_TOL):
                failed_models.append(model)
                break
    assert (
        len(failed_models) == 0
    ), f"performance of [{', '.join(failed_models)}] not correct."


$ tests/test_model/test_seq_model_mig.py F                                                                               [100%]

========================================================================================================= FAILURES =========================================================================================================
__________________________________________________________________________________ test_train_pipeline[ml-100k-models0-expected_values0] ___________________________________________________________________________________

data = 'ml-100k', models = ['SVDPlusPlus']
expected_values = {'AttHist': {'hit@5': 0.04686, 'ndcg@5': 0.03221}, 'AvgHist': {'hit@5': 0.05005, 'ndcg@5': 0.03423}, 'ConvFormer': {'hit@5': 0.05005, 'ndcg@5': 0.03538}, 'FASTConvFormer': {'hit@5': 0.05005, 'ndcg@5': 0.03355}, ...}

    @pytest.mark.parametrize(
        "data, models, expected_values", [("ml-100k", SEQ_MODELS, EXPECTED_METRICS)]
    )
    def test_train_pipeline(data, models, expected_values):
        all_result = {}
        # finish all training first for following evaluation and infer test
        for model in models:
            config = copy.deepcopy(GLOBAL_CONF)
            config["task"] = "train"
            config["dataset_path"] = os.path.join(config["dataset_path"], data)
            config["dataset"] = data
            config["model"] = model
            config["output_path"] = os.path.join(
                UNIREC_PATH, f"tests/.temp/output/{data}/{model}"
            )
>           result = main.run(config)

tests/test_model/test_seq_model_mig.py:97: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
unirec/main/main.py:492: in run
    res = main(config, accelerator)
unirec/main/main.py:272: in main
    user2history, user2history_time = get_user_history(user2history, user2history_time, config, DATA_TRAIN_NAME)
unirec/main/main.py:116: in get_user_history
    user2history, user2history_time = general.load_user_history(file_path, _user_history_filename, config['n_users'], _user_history_data_format, config['time_seq'])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

file_path = '/home/u/MS/UniRec/tests/.temp/data/ml-100k', file_name = 'user_history', n_users = 940, format = 'user-item_seq', time_seq = 0

    def load_user_history(file_path, file_name, n_users=None, format='user-item', time_seq=0):
        if os.path.exists(os.path.join(file_path, file_name + '.ftr')):
            df = pd.read_feather(os.path.join(file_path, file_name + '.ftr'))
        elif os.path.exists(os.path.join(file_path, file_name + '.pkl')):
            df = load_pkl_obj(os.path.join(file_path, file_name + '.pkl'))
        else:
>           raise NotImplementedError("Unsupported user history file type: {0}".format(file_name) )
E           NotImplementedError: Unsupported user history file type: user_history

unirec/utils/general.py:117: NotImplementedError

The error means that neither user_history.ftr nor user_history.pkl exists in the dataset folder. We need to download the dataset (python download_split_ml100k.py) and preprocess it (sh preprocess_ml100k.sh) before we can run the training.
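
A minimal sketch of a check the test could run before calling main.run(config), so that a missing dataset gives a clearer message than NotImplementedError. The helper find_user_history is hypothetical; the two extensions are the ones load_user_history checks in the traceback above.

```python
import os

# The two file types that load_user_history looks for in the traceback above.
SUPPORTED_EXTENSIONS = (".ftr", ".pkl")


def find_user_history(file_path, file_name="user_history"):
    """Return the path of the first existing user history file, or None."""
    for ext in SUPPORTED_EXTENSIONS:
        candidate = os.path.join(file_path, file_name + ext)
        if os.path.exists(candidate):
            return candidate
    return None
```

In a test this could be used as: if find_user_history(config["dataset_path"]) is None, skip or fail with a message that points to the download and preprocess scripts.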

miguelgfierro · 2024-08-26T14:58:08Z

Work so far: staging...miguel/sasrec_unirec

The next step is to create a unit test called test_sasrec_train, which should train SASRec with the minimum set of options on a dummy dataset. We should first make sure that the code with result = main.run(config) runs, and then replace it with the minimum set of functions.

The steps should follow the structure of https://github.com/recommenders-team/recommenders/blob/main/examples/00_quick_start/sar_movielens.ipynb:

Data loading
split train and test iterators
instantiate the model
train the model

if we want, we can also do:

evaluate
get metrics
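
For the first two steps, a dummy dataset could be sketched as below. This is only an illustration: make_dummy_sequences and leave_one_out_split are hypothetical helpers, and leave-one-out is an assumed split strategy, not necessarily what the UniRec preprocessing does. The model and training steps are left out because that API is not settled yet.

```python
import random


def make_dummy_sequences(n_users=5, n_items=20, min_len=3, max_len=8, seed=0):
    """Build a random item sequence per user.

    Item ids start at 1, on the assumption that 0 is reserved for padding.
    """
    rng = random.Random(seed)
    return {
        user: [rng.randint(1, n_items) for _ in range(rng.randint(min_len, max_len))]
        for user in range(1, n_users + 1)
    }


def leave_one_out_split(sequences):
    """Hold out the last item of each user's sequence as the test target."""
    train, test = {}, {}
    for user, items in sequences.items():
        train[user] = items[:-1]
        test[user] = items[-1]
    return train, test


sequences = make_dummy_sequences()
train, test = leave_one_out_split(sequences)
```

A dataset this small keeps the unit test fast and removes the dependency on downloading ml-100k.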

After this, we will create a notebook explaining an end-to-end case with a real dataset, and we will replace the TF notebook.

Tasks:

Understand the extra dependencies I need to add: only accelerate and cvxpy. Both seem to be solid libraries.
Understand python download_split_ml100k.py.
Understand sh preprocess_ml100k.sh.
Understand result = main.run(config)
Replicate and adapt python download_split_ml100k.py.
Replicate and adapt sh preprocess_ml100k.sh.
Replicate and adapt result = main.run(config)

miguelgfierro · 2025-01-10T11:39:48Z

FIXED

pytest tests/unit/recommenders/datasets/test_pandas_df_utils.py

Review test test_filter_k_interactions:


@pytest.fixture(scope="function")
def sample_df():
    return pd.DataFrame({"user_id": [1, 9, 3, 5, 5, 1], "item_id": [1, 6, 7, 6, 8, 9]})


def test_filter_k_interactions(sample_df):
    # Test with simple filtering
    result = filter_k_interactions(
        sample_df, user_k=2, item_k=2, user_col="user_id", item_col="item_id"
    )
    assert result.shape == (4, 2)  # Only users 1, 5 and items 6, 1 should remain
    assert set(result["user_id"].unique()) == {1, 5}
    assert set(result["item_id"].unique()) == {1, 6}

    # # No change expected
    # result = filter_k_interactions(
    #     sample_df, user_k=1, item_k=1, user_col="user_id", item_col="item_id"
    # )
    # pd.testing.assert_frame_equal(result, sample_df)

    # # High thresholds should result in an empty DataFrame
    # result = filter_k_interactions(
    #     sample_df, user_k=5, item_k=5, user_col="user_id", item_col="item_id"
    # )
    # assert result.empty

    # # Test with max iterations
    # result = filter_k_interactions(
    #     sample_df,
    #     user_k=2,
    #     item_k=2,
    #     max_iter=1,
    #     user_col="user_id",
    #     item_col="item_id",
    # )
    # # Since we're limited to one iteration, only initial filtering occurs
    # assert result.shape == (4, 2)

    # # Test with very few data points
    # small_df = sample_df.iloc[:3]
    # result = filter_k_interactions(
    #     small_df, user_k=2, item_k=2, user_col="user_id", item_col="item_id"
    # )
    # assert result.empty  # Since no user or item has 2 interactions
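
A note for the review: below is a minimal sketch of iterative k-core filtering, assuming that is what filter_k_interactions is meant to do. The function k_core_filter_sketch is hypothetical and is not the implementation in the branch, which may behave differently.

```python
import pandas as pd


def k_core_filter_sketch(
    df, user_k, item_k, user_col="user_id", item_col="item_id", max_iter=100
):
    """Repeatedly drop rows whose user or item has too few interactions."""
    for _ in range(max_iter):
        if df.empty:
            break
        # Number of interactions of each row's user and item in the current frame.
        user_counts = df[user_col].map(df[user_col].value_counts())
        item_counts = df[item_col].map(df[item_col].value_counts())
        keep = (user_counts >= user_k) & (item_counts >= item_k)
        if keep.all():
            break  # nothing left to remove, the filter has converged
        df = df[keep]
    return df.reset_index(drop=True)


sample = pd.DataFrame(
    {"user_id": [1, 9, 3, 5, 5, 1], "item_id": [1, 6, 7, 6, 8, 9]}
)
both = k_core_filter_sketch(sample, user_k=2, item_k=2)
users_only = k_core_filter_sketch(sample, user_k=2, item_k=1)
```

Under this assumed definition, the sample_df fixture reduces to an empty frame with user_k=2, item_k=2: only item 6 has two interactions, and only one of its two users has two interactions. Filtering on users alone keeps 4 rows, but with items {1, 6, 8, 9}. So the expected values in the test may need revisiting, depending on the intended definition.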

miguelgfierro · 2025-02-03T20:54:15Z

How to improve the performance of PyTorch training and data loading: https://x.com/akshay_pachaar/status/1886089698709541357?t=JIFInvkryY_fpZLrT4FOUQ&s=08

Set pin_memory=True in the DataLoader object.
During data transfer, use: .to(device, non_blocking=True)
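
A minimal sketch of both tips with a dummy TensorDataset; the tensors and sizes are made up for illustration and are not the UniRec data pipeline. Pinned memory only helps for host-to-GPU copies, so here it is enabled only when CUDA is available.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_pinned_memory = device.type == "cuda"

# Dummy data standing in for item-id sequences and next-item targets.
seqs = torch.randint(1, 100, (64, 10))
targets = torch.randint(1, 100, (64,))

loader = DataLoader(
    TensorDataset(seqs, targets),
    batch_size=16,
    pin_memory=use_pinned_memory,  # page-locked host memory speeds up GPU copies
)

n_batches = 0
for batch_seqs, batch_targets in loader:
    # non_blocking=True lets the copy overlap with computation when the
    # source tensor is in pinned memory; on CPU it is a no-op.
    batch_seqs = batch_seqs.to(device, non_blocking=True)
    batch_targets = batch_targets.to(device, non_blocking=True)
    n_batches += 1
```

For reference, the configs printed in the logs in this thread show 'pin_memory': False, so this would be a change from the current default.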

miguelgfierro · 2025-02-06T13:12:59Z

The trainer works. In the run below the config has 'epochs': 3 and the best epoch was 2:

$ sh train_seq_model_ml100k_mig.sh
2025-02-06 14:10:43.004901: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-02-06 14:10:43.005076: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-02-06 14:10:43.156582: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-06 14:10:43.385174: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-02-06 14:10:45.294745: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Load configuration files from unirec/config
Writing logs to /home/u/MS/UniRec/output/ml-100k/SASRec/train/SASRec-SASRec-ml-100k.2025-02-06_141050.20.txt
[INFO] SASRec-SASRec-ml-100k: config={'gpu_id': 0, 'use_gpu': True, 'seed': 2022, 'state': 'INFO', 'verbose': 2, 'saved': True, 'use_tensorboard': False, 'use_wandb': 0, 'init_method': 'normal', 'init_std': 0.02, 'init_mean': 0.0, 'scheduler': 'reduce', 'scheduler_factor': 0.1, 'time_seq': 0, 'seq_last': False, 'has_user_emb': False, 'has_user_bias': 0, 'has_item_bias': 0, 'use_features': False, 'use_text_emb': False, 'use_position_emb': True, 'load_pretrained_model': False, 'embedding_size': 64, 'inner_size': 512, 'dropout_prob': 0.0, 'epochs': 3, 'batch_size': 512, 'learning_rate': 0.001, 'optimizer': 'adam', 'eval_step': 1, 'early_stop': 10, 'clip_grad_norm': None, 'weight_decay': 1e-06, 'num_workers': 4, 'persistent_workers': False, 'pin_memory': False, 'shuffle_train': False, 'use_pre_item_emb': 0, 'loss_type': 'bce', 'ccl_w': 150, 'ccl_m': 0.4, 'distance_type': 'dot', 'metrics': "['hit@10;20', 'ndcg@10;20']", 'key_metric': 'ndcg@10', 'test_protocol': 'one_vs_all', 'valid_protocol': 'one_vs_all', 'test_batch_size': 100, 'model': 'SASRec', 'dataloader': 'SeqRecDataset', 'max_seq_len': 10, 'history_mask_mode': 'autoregressive', 'tau': 1.0, 'enable_morec': 0, 'morec_objectives': ['fairness', 'alignment', 'revenue'], 'morec_objective_controller': 'PID', 'morec_ngroup': [10, 10, -1], 'morec_alpha': 0.1, 'morec_lambda': 0.2, 'morec_expect_loss': 0.2, 'morec_beta_min': 0.6, 'morec_beta_max': 1.3, 'morec_K_p': 0.01, 'morec_K_i': 0.001, 'morec_objective_weights': '[0.3,0.3,0.4]', 'n_layers': 2, 'n_heads': 16, 'hidden_dropout_prob': 0.5, 'attn_dropout_prob': 0.5, 'hidden_act': 'swish', 'layer_norm_eps': '1e-10', 'group_size': -1, 'n_items': 1017, 'n_neg_test_from_sampling': 0, 'n_neg_train_from_sampling': 0, 'n_neg_valid_from_sampling': 0, 'n_users': 940, 'test_file_format': 'user-item', 'train_file_format': 'user-item', 'user_history_file_format': 'user-item_seq', 'valid_file_format': 'user-item', 'base_model': 'GRU', 'config_dir': 'unirec/config', 'dataset': 
'ml-100k', 'dataset_path': '/home/u/MS/UniRec/data/ml-100k', 'exp_name': 'SASRec-SASRec-ml-100k', 'freeze': 0, 'grad_clip_value': -1.0, 'hidden_size': 64, 'n_sample_neg_train': 5, 'neg_by_pop_alpha': 1.0, 'num_workers_test': 0, 'output_path': '/home/u/MS/UniRec/output/ml-100k/SASRec/train', 'train_type': 'Base', 'user_history_filename': 'user_history', 'cmd_args': {'base_model': 'GRU', 'batch_size': 512, 'config_dir': 'unirec/config', 'dataloader': 'SeqRecDataset', 'dataset': 'ml-100k', 'dataset_path': '/home/u/MS/UniRec/data/ml-100k', 'dropout_prob': 0.0, 'early_stop': 10, 'embedding_size': 64, 'epochs': 3, 'exp_name': 'SASRec-SASRec-ml-100k', 'freeze': 0, 'grad_clip_value': -1.0, 'has_item_bias': 0, 'has_user_bias': 0, 'hidden_size': 64, 'history_mask_mode': 'autoregressive', 'key_metric': 'ndcg@10', 'learning_rate': 0.001, 'loss_type': 'bce', 'max_seq_len': 10, 'metrics': "['hit@10;20', 'ndcg@10;20']", 'model': 'SASRec', 'n_sample_neg_train': 5, 'neg_by_pop_alpha': 1.0, 'num_workers': 4, 'num_workers_test': 0, 'output_path': '/home/u/MS/UniRec/output/ml-100k/SASRec/train', 'test_protocol': 'one_vs_all', 'train_type': 'Base', 'use_pre_item_emb': 0, 'use_wandb': 0, 'user_history_file_format': 'user-item_seq', 'user_history_filename': 'user_history', 'valid_protocol': 'one_vs_all', 'verbose': 2, 'weight_decay': 1e-06, 'logger_time_str': '2025-02-06_141050', 'logger_rand': 20}, 'device': device(type='cuda'), 'task': 'train', 'logger_time_str': '2025-02-06_141050', 'logger_rand': 20}
[INFO] SASRec-SASRec-ml-100k: Loading user history from user_history ...
[INFO] SASRec-SASRec-ml-100k: Done. 940 of users have history.
[INFO] SASRec-SASRec-ml-100k: Constructing dataset of task type: train
[DEBUG] SASRec-SASRec-ml-100k: loading train at 06/02/2025 14:10:50
[DEBUG] SASRec-SASRec-ml-100k: Finished loading train at 06/02/2025 14:10:50
[INFO] SASRec-SASRec-ml-100k: Finished initializing <class 'unirec.data.dataset.seqrecdataset.SeqRecDataset'>
[INFO] SASRec-SASRec-ml-100k: Constructing dataset of task type: valid
[DEBUG] SASRec-SASRec-ml-100k: loading valid at 06/02/2025 14:10:50
[DEBUG] SASRec-SASRec-ml-100k: Finished loading valid at 06/02/2025 14:10:50
[INFO] SASRec-SASRec-ml-100k: Finished initializing <class 'unirec.data.dataset.seqrecdataset.SeqRecDataset'>
[INFO] SASRec-SASRec-ml-100k: SASRec(
  (scorer_layers): InnerProductScorer()
  (item_embedding): Embedding(1017, 64, padding_idx=0)
  (position_embedding): Embedding(11, 64)
  (trm_encoder): TransformerEncoder(
    (layer): ModuleList(
      (0-1): 2 x TransformerLayer(
        (multi_head_attention): MultiHeadAttention(
          (query): Linear(in_features=64, out_features=64, bias=True)
          (key): Linear(in_features=64, out_features=64, bias=True)
          (value): Linear(in_features=64, out_features=64, bias=True)
          (attn_dropout): Dropout(p=0.5, inplace=False)
          (dense): Linear(in_features=64, out_features=64, bias=True)
          (LayerNorm): LayerNorm((64,), eps=1e-10, elementwise_affine=True)
          (out_dropout): Dropout(p=0.5, inplace=False)
        )
        (feed_forward): FeedForward(
          (dense_1): Linear(in_features=64, out_features=512, bias=True)
          (intermediate_act_fn): SiLU()
          (dense_2): Linear(in_features=512, out_features=64, bias=True)
          (LayerNorm): LayerNorm((64,), eps=1e-10, elementwise_affine=True)
          (dropout): Dropout(p=0.5, inplace=False)
        )
      )
    )
  )
  (LayerNorm): LayerNorm((64,), eps=1e-10, elementwise_affine=True)
  (dropout): Dropout(p=0.5, inplace=False)
)
Trainable parameter number: 231936
All trainable parameters:
item_embedding.weight : torch.Size([1017, 64])
position_embedding.weight : torch.Size([11, 64])
trm_encoder.layer.0.multi_head_attention.query.weight : torch.Size([64, 64])
trm_encoder.layer.0.multi_head_attention.query.bias : torch.Size([64])
trm_encoder.layer.0.multi_head_attention.key.weight : torch.Size([64, 64])
trm_encoder.layer.0.multi_head_attention.key.bias : torch.Size([64])
trm_encoder.layer.0.multi_head_attention.value.weight : torch.Size([64, 64])
trm_encoder.layer.0.multi_head_attention.value.bias : torch.Size([64])
trm_encoder.layer.0.multi_head_attention.dense.weight : torch.Size([64, 64])
trm_encoder.layer.0.multi_head_attention.dense.bias : torch.Size([64])
trm_encoder.layer.0.multi_head_attention.LayerNorm.weight : torch.Size([64])
trm_encoder.layer.0.multi_head_attention.LayerNorm.bias : torch.Size([64])
trm_encoder.layer.0.feed_forward.dense_1.weight : torch.Size([512, 64])
trm_encoder.layer.0.feed_forward.dense_1.bias : torch.Size([512])
trm_encoder.layer.0.feed_forward.dense_2.weight : torch.Size([64, 512])
trm_encoder.layer.0.feed_forward.dense_2.bias : torch.Size([64])
trm_encoder.layer.0.feed_forward.LayerNorm.weight : torch.Size([64])
trm_encoder.layer.0.feed_forward.LayerNorm.bias : torch.Size([64])
trm_encoder.layer.1.multi_head_attention.query.weight : torch.Size([64, 64])
trm_encoder.layer.1.multi_head_attention.query.bias : torch.Size([64])
trm_encoder.layer.1.multi_head_attention.key.weight : torch.Size([64, 64])
trm_encoder.layer.1.multi_head_attention.key.bias : torch.Size([64])
trm_encoder.layer.1.multi_head_attention.value.weight : torch.Size([64, 64])
trm_encoder.layer.1.multi_head_attention.value.bias : torch.Size([64])
trm_encoder.layer.1.multi_head_attention.dense.weight : torch.Size([64, 64])
trm_encoder.layer.1.multi_head_attention.dense.bias : torch.Size([64])
trm_encoder.layer.1.multi_head_attention.LayerNorm.weight : torch.Size([64])
trm_encoder.layer.1.multi_head_attention.LayerNorm.bias : torch.Size([64])
trm_encoder.layer.1.feed_forward.dense_1.weight : torch.Size([512, 64])
trm_encoder.layer.1.feed_forward.dense_1.bias : torch.Size([512])
trm_encoder.layer.1.feed_forward.dense_2.weight : torch.Size([64, 512])
trm_encoder.layer.1.feed_forward.dense_2.bias : torch.Size([64])
trm_encoder.layer.1.feed_forward.LayerNorm.weight : torch.Size([64])
trm_encoder.layer.1.feed_forward.LayerNorm.bias : torch.Size([64])
LayerNorm.weight : torch.Size([64])
LayerNorm.bias : torch.Size([64])
[DEBUG] SASRec-SASRec-ml-100k: >> Valid before training...
[INFO] SASRec-SASRec-ml-100k: one_vs_all
Evaluate: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.03s/it]
[INFO] SASRec-SASRec-ml-100k: epoch 0 evaluating [time: 4.19s, ndcg@10: 0.004780]
[INFO] SASRec-SASRec-ml-100k: complete scores on valid set:
hit@10:0.013844515441959531 hit@20:0.027689030883919063 ndcg@10:0.004779694781511818 ndcg@20:0.008302492357862928
[INFO] SASRec-SASRec-ml-100k: Saving best model at epoch 0 to /home/u/MS/UniRec/output/ml-100k/SASRec/train/checkpoint_2025-02-06_141050_20/SASRec-SASRec-ml-100k.pth
[INFO] SASRec-SASRec-ml-100k:
>> epoch 1
Train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 154/154 [00:05<00:00, 29.67it/s]
[INFO] SASRec-SASRec-ml-100k: epoch 1 training [time: 5.19s, train loss: 74.5613]
[INFO] SASRec-SASRec-ml-100k: one_vs_all
Evaluate: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 10.08it/s]
[INFO] SASRec-SASRec-ml-100k: epoch 1 evaluating [time: 0.20s, ndcg@10: 0.029712]
[INFO] SASRec-SASRec-ml-100k: complete scores on valid set:
hit@10:0.054313099041533544 hit@20:0.09052183173588925 ndcg@10:0.029712144917684428 ndcg@20:0.03871867505038797
[INFO] SASRec-SASRec-ml-100k: Saving best model at epoch 1 to /home/u/MS/UniRec/output/ml-100k/SASRec/train/checkpoint_2025-02-06_141050_20/SASRec-SASRec-ml-100k.pth
[INFO] SASRec-SASRec-ml-100k: epoch: 1, learning rate: 0.001
[INFO] SASRec-SASRec-ml-100k:
>> epoch 2
Train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 154/154 [00:05<00:00, 30.54it/s]
[INFO] SASRec-SASRec-ml-100k: epoch 2 training [time: 5.04s, train loss: 62.4723]
[INFO] SASRec-SASRec-ml-100k: one_vs_all
Evaluate: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  9.81it/s]
[INFO] SASRec-SASRec-ml-100k: epoch 2 evaluating [time: 0.21s, ndcg@10: 0.072690]
[INFO] SASRec-SASRec-ml-100k: complete scores on valid set:
hit@10:0.1288604898828541 hit@20:0.2161874334398296 ndcg@10:0.07268984035761394 ndcg@20:0.09433535815642462
[INFO] SASRec-SASRec-ml-100k: Saving best model at epoch 2 to /home/u/MS/UniRec/output/ml-100k/SASRec/train/checkpoint_2025-02-06_141050_20/SASRec-SASRec-ml-100k.pth
[INFO] SASRec-SASRec-ml-100k: epoch: 2, learning rate: 0.001
[INFO] SASRec-SASRec-ml-100k:
>> epoch 3
Train: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 154/154 [00:04<00:00, 31.77it/s]
[INFO] SASRec-SASRec-ml-100k: epoch 3 training [time: 4.85s, train loss: 54.1195]
[INFO] SASRec-SASRec-ml-100k: Constructing dataset of task type: test
[DEBUG] SASRec-SASRec-ml-100k: loading test at 06/02/2025 14:11:12
[DEBUG] SASRec-SASRec-ml-100k: Finished loading test at 06/02/2025 14:11:12
[INFO] SASRec-SASRec-ml-100k: Finished initializing <class 'unirec.data.dataset.seqrecdataset.SeqRecDataset'>
[INFO] SASRec-SASRec-ml-100k: one_vs_all
[INFO] SASRec-SASRec-ml-100k: Loading model from /home/u/MS/UniRec/output/ml-100k/SASRec/train/checkpoint_2025-02-06_141050_20/SASRec-SASRec-ml-100k.pth. The best epoch was 2
Evaluate: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 27.06it/s]
[INFO] SASRec-SASRec-ml-100k: best valid : {'hit@10': 0.1288604898828541, 'hit@20': 0.2161874334398296, 'ndcg@10': 0.07268984035761394, 'ndcg@20': 0.09433535815642462}
[INFO] SASRec-SASRec-ml-100k: test result: {'hit@10': 0.1395101171458999, 'hit@20': 0.22364217252396165, 'ndcg@10': 0.07270366471075033, 'ndcg@20': 0.09373327528448706}
[INFO] SASRec-SASRec-ml-100k: Saving test result to /home/u/MS/UniRec/output/ml-100k/SASRec/train/result_SASRec-SASRec-ml-100k.2025-02-06_141050.20.tsv ...
[INFO] SASRec-SASRec-ml-100k: Mission complete. Time elapsed: 0.39 minutes.
Logger close successfully.

miguelgfierro added the enhancement New feature or request label Jun 17, 2024

This was referenced Aug 12, 2024

[BUG] tensorflow-estimator is removed from tensorflow 2.16.1 #2072

Open

Changing Function Name to reflect new Tensorflow interface. #2143

Merged

Updated function signatures to comply with new tensorflow requirements #2146

Merged

miguelgfierro self-assigned this Nov 18, 2024
