SAR sparse multiplication modification due to a breaking change in scipy #2083
Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: miguelgfierro <[email protected]>
With scipy 1.11.1 and Python 3.9 it works:
With scipy 1.13.0 and Python 3.9:
For benchmarking purposes, using the previous version, scipy 1.10.1, it is slower:
Error in Python 3.11:
Signed-off-by: Scott Graham <[email protected]>
…ssue flattening matrix so dataframe can be built correctly
@gramhagen it seems we still have the same error
Oh, I see. I didn't realize that only occurred on scipy 1.13.
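To check what a given scipy install actually returns, a small diagnostic like the one below can help. It is only a sketch: `describe_product` and the toy matrices are hypothetical and do not come from `sar_singlenode.py`. It reports the type of a sparse product without assuming any particular version's behaviour.

```python
import numpy as np
import scipy
from scipy import sparse


def describe_product(left, right):
    """Return (type name, is_sparse) for left.dot(right).

    Hypothetical diagnostic helper, not part of the recommenders package.
    """
    out = left.dot(right)
    return type(out).__name__, sparse.issparse(out)


# Toy stand-ins for a user-affinity matrix and an item-similarity matrix.
affinity = sparse.csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
similarity_dense = np.array([[1.0, 0.5], [0.5, 1.0]])
similarity_sparse = sparse.csr_matrix(similarity_dense)

print("scipy", scipy.__version__)
print("sparse . dense  ->", describe_product(affinity, similarity_dense))
print("sparse . sparse ->", describe_product(affinity, similarity_sparse))
```

Running it under each scipy version in the CI matrix shows whether downstream code receives a sparse object or a dense one.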
Replace lines 356-363 of sar_singlenode.py with this (adding toarray()):
    user_min_scores = (
        np.tile(counts.min(axis=1).toarray(), test_scores.shape[1])
        * self.rating_min
    )
    user_max_scores = (
        np.tile(counts.max(axis=1).toarray(), test_scores.shape[1])
        * self.rating_max
    )
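Why `toarray()` matters here can be seen in a small sketch. It assumes `counts` may be a scipy sparse matrix, in which case `counts.min(axis=1)` is itself sparse and must be densified before `np.tile`. The helper name `tile_row_extreme` and the toy data are illustrative only and are not part of `sar_singlenode.py`.

```python
import numpy as np
from scipy import sparse


def tile_row_extreme(counts, n_cols, scale, kind="min"):
    """Tile the per-row min or max of `counts` across `n_cols` columns.

    Hypothetical helper: accepts sparse input (where the row reduction
    comes back sparse) as well as ndarray / np.matrix input.
    """
    reduced = counts.min(axis=1) if kind == "min" else counts.max(axis=1)
    if sparse.issparse(reduced):
        # Sparse reductions return a sparse (n, 1) result; densify it.
        reduced = reduced.toarray()
    # Force a plain (n, 1) ndarray column regardless of the input type.
    column = np.asarray(reduced).reshape(-1, 1)
    return np.tile(column, n_cols) * scale


# Toy data standing in for `counts`, `rating_min` and `rating_max`.
counts = sparse.csr_matrix(np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
user_min_scores = tile_row_extreme(counts, n_cols=3, scale=1.0, kind="min")
user_max_scores = tile_row_extreme(counts, n_cols=3, scale=5.0, kind="max")
```

Both results are plain `(n_users, n_cols)` ndarrays, so they broadcast cleanly against a dense score matrix.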
Signed-off-by: miguelgfierro <[email protected]>
@@ -593,7 +593,7 @@ def predict(self, test):
             {
                 self.col_user: test[self.col_user].values,
                 self.col_item: test[self.col_item].values,
-                self.col_prediction: test_scores[user_ids, item_ids],
+                self.col_prediction: test_scores[user_ids, item_ids].getA1(),
We got an error here:
=================================== FAILURES ===================================
_________________________ test_predict[jaccard-False] __________________________
similarity_type = 'jaccard', timedecay_formula = False
train_test_dummy_timestamp = [ UserId MovieId Rating Timestamp
4 2 5 5.0 1535133522
9 2 10 5.0 1535133622
..., UserId MovieId Rating Timestamp
2 1 3 3.0 1535133482
8 2 9 4.0 1535133602]
header = {'col_item': 'MovieId', 'col_rating': 'Rating', 'col_timestamp': 'Timestamp', 'col_user': 'UserId'}
    @pytest.mark.parametrize(
        "similarity_type, timedecay_formula", [("jaccard", False), ("lift", True)]
    )
    def test_predict(
        similarity_type, timedecay_formula, train_test_dummy_timestamp, header
    ):
        model = SAR(
            similarity_type=similarity_type, timedecay_formula=timedecay_formula, **header
        )
        trainset, testset = train_test_dummy_timestamp
        model.fit(trainset)
>       preds = model.predict(testset)
tests/unit/recommenders/models/test_sar_singlenode.py:53:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <recommenders.models.sar.sar_singlenode.SARSingleNode object at 0x1509b192c400>
test = UserId MovieId Rating Timestamp
2 1 3 3.0 1535133482
8 2 9 4.0 1535133602
    def predict(self, test):
        """Output SAR scores for only the users-items pairs which are in the test set
        Args:
            test (pandas.DataFrame): DataFrame that contains users and items to test
        Returns:
            pandas.DataFrame: DataFrame contains the prediction results
        """
        test_scores = self.score(test)
        user_ids = np.asarray(
            list(
                map(
                    lambda user: self.user2index.get(user, np.NaN),
                    test[self.col_user].values,
                )
            )
        )
        # create mapping of new items to zeros
        item_ids = np.asarray(
            list(
                map(
                    lambda item: self.item2index.get(item, np.NaN),
                    test[self.col_item].values,
                )
            )
        )
        nans = np.isnan(item_ids)
        if any(nans):
            logger.warning(
                "Items found in test not seen during training, new items will have score of 0"
            )
            test_scores = np.append(test_scores, np.zeros((self.n_users, 1)), axis=1)
            item_ids[nans] = self.n_items
            item_ids = item_ids.astype("int64")
        df = pd.DataFrame(
            {
                self.col_user: test[self.col_user].values,
                self.col_item: test[self.col_item].values,
>               self.col_prediction: test_scores[user_ids, item_ids].getA1(),
            }
        )
E       AttributeError: 'numpy.ndarray' object has no attribute 'getA1'
recommenders/models/sar/sar_singlenode.py:596: AttributeError
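The traceback shows that `getA1()` exists only on `np.matrix`, so the call fails whenever `test_scores` is a plain `ndarray`. One type-agnostic way to flatten the indexed scores is sketched below. The function name `flatten_scores` and the toy inputs are hypothetical, and this is not necessarily the fix the PR ended up with.

```python
import numpy as np


def flatten_scores(scores, user_ids, item_ids):
    """Pick scores[user, item] pairs and return them as a 1-D ndarray.

    Hypothetical helper: np.asarray(...).ravel() works for both ndarray
    and np.matrix inputs, whereas .getA1() exists only on np.matrix.
    """
    picked = scores[user_ids, item_ids]
    return np.asarray(picked).ravel()


user_ids = np.array([0, 1])
item_ids = np.array([1, 0])

dense_scores = np.array([[1.0, 2.0], [3.0, 4.0]])
matrix_scores = np.matrix(dense_scores)  # fancy indexing yields a (1, N) matrix

from_ndarray = flatten_scores(dense_scores, user_ids, item_ids)
from_matrix = flatten_scores(matrix_scores, user_ids, item_ids)
```

Either way the result is a one-dimensional array, which is the shape `pd.DataFrame` needs for a column.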
Signed-off-by: miguelgfierro <[email protected]>
New error:
Trying with @anargyri and @SimonYansenZhao to roll back and change the array casting
Small example:
Signed-off-by: Simon Zhao <[email protected]>
Signed-off-by: Simon Zhao <[email protected]>
@miguelgfierro @anargyri I think numpy or scipy behaves differently on Python 3.8 than on other Python versions, because the same tests in And I found that if testing with Python 3.8, the latest supported scipy version is 1.10 which worked with
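If the return type really does vary between Python and scipy combinations, one defensive option is to cast every intermediate result to a plain `ndarray` before indexing it. The sketch below assumes that approach. `to_ndarray` is a hypothetical name and is not taken from the PR.

```python
import numpy as np
from scipy import sparse


def to_ndarray(x):
    """Return `x` as a base-class numpy.ndarray.

    Hypothetical helper: densifies scipy sparse inputs and strips the
    np.matrix subclass, so callers always see the same type.
    """
    if sparse.issparse(x):
        return x.toarray()
    return np.asarray(x)


values = np.array([[1.0, 0.0], [0.0, 2.0]])
as_sparse = to_ndarray(sparse.csr_matrix(values))
as_matrix = to_ndarray(np.matrix(values))
as_array = to_ndarray(values)
```

The cost is that sparse results are materialized densely, which may matter for large user-item matrices.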
Signed-off-by: Simon Zhao <[email protected]>
@SimonYansenZhao the world is falling if one can't trust numpy or scipy anymore.
Signed-off-by: Simon Zhao <[email protected]>
Signed-off-by: Simon Zhao <[email protected]>
Awesome! @SimonYansenZhao @anargyri @loomlike can you guys accept the PR? Since I started it, I can't accept it.
Description
Related Issues
#1954
References
Checklist:
git commit -s -m "your commit message"
staging branch AND NOT TO main branch.