
Commit be6e931

Commit message: addressed comments
1 parent: eaed879

File tree: 2 files changed (+34, -22 lines)

mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala

Lines changed: 3 additions & 0 deletions
@@ -278,6 +278,9 @@ class ALS extends Estimator[ALSModel] with ALSParams {
 
   /** @group setParam */
   def setCheckpointInterval(value: Int): this.type = set(checkpointInterval, value)
 
+  /** @group setParam */
+  def setSeed(value: Long): this.type = set(seed, value)
+
   /**
    * Sets both numUserBlocks and numItemBlocks to the specific value.
    * @group setParam
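The new `setSeed` setter feeds the random initialization of the factor matrices. As a plain-Python sketch (a hypothetical stand-in, not Spark's actual internals), fixing the seed makes that initialization, and hence a repeated fit, reproducible:

```python
import random

def init_factors(num_rows, rank, seed):
    """Initialize a factor matrix with small random values, as ALS does
    before its first iteration (illustrative only, not Spark's code)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(rank)] for _ in range(num_rows)]

# The same seed yields identical starting factors, so two runs converge
# to the same model; a different seed generally does not.
a = init_factors(4, 2, seed=42)
b = init_factors(4, 2, seed=42)
c = init_factors(4, 2, seed=7)
print(a == b)  # True
print(a == c)  # False
```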

python/pyspark/ml/recommendation.py

Lines changed: 31 additions & 22 deletions
@@ -29,30 +29,39 @@ class ALS(JavaEstimator, HasCheckpointInterval, HasMaxIter, HasPredictionCol, Ha
     """
     Alternating Least Squares (ALS) matrix factorization.
 
-    ALS attempts to estimate the ratings matrix `R` as the product of two lower-rank matrices,
-    `X` and `Y`, i.e. `X * Yt = R`. Typically these approximations are called 'factor' matrices.
-    The general approach is iterative. During each iteration, one of the factor matrices is held
-    constant, while the other is solved for using least squares. The newly-solved factor matrix is
-    then held constant while solving for the other factor matrix.
-
-    This is a blocked implementation of the ALS factorization algorithm that groups the two sets
-    of factors (referred to as "users" and "products") into blocks and reduces communication by only
-    sending one copy of each user vector to each product block on each iteration, and only for the
-    product blocks that need that user's feature vector. This is achieved by pre-computing some
-    information about the ratings matrix to determine the "out-links" of each user (which blocks of
-    products it will contribute to) and "in-link" information for each product (which of the feature
-    vectors it receives from each user block it will depend on). This allows us to send only an
-    array of feature vectors between each user block and product block, and have the product block
-    find the users' ratings and update the products based on these messages.
+    ALS attempts to estimate the ratings matrix `R` as the product of
+    two lower-rank matrices, `X` and `Y`, i.e. `X * Yt = R`. Typically
+    these approximations are called 'factor' matrices. The general
+    approach is iterative. During each iteration, one of the factor
+    matrices is held constant, while the other is solved for using least
+    squares. The newly-solved factor matrix is then held constant while
+    solving for the other factor matrix.
+
+    This is a blocked implementation of the ALS factorization algorithm
+    that groups the two sets of factors (referred to as "users" and
+    "products") into blocks and reduces communication by only sending
+    one copy of each user vector to each product block on each
+    iteration, and only for the product blocks that need that user's
+    feature vector. This is achieved by pre-computing some information
+    about the ratings matrix to determine the "out-links" of each user
+    (which blocks of products it will contribute to) and "in-link"
+    information for each product (which of the feature vectors it
+    receives from each user block it will depend on). This allows us to
+    send only an array of feature vectors between each user block and
+    product block, and have the product block find the users' ratings
+    and update the products based on these messages.
 
     For implicit preference data, the algorithm used is based on
-    "Collaborative Filtering for Implicit Feedback Datasets", available at
-    `http://dx.doi.org/10.1109/ICDM.2008.22`, adapted for the blocked approach used here.
-
-    Essentially instead of finding the low-rank approximations to the rating matrix `R`,
-    this finds the approximations for a preference matrix `P` where the elements of `P` are 1 if
-    r > 0 and 0 if r <= 0. The ratings then act as 'confidence' values related to strength of
-    indicated user preferences rather than explicit ratings given to items.
+    "Collaborative Filtering for Implicit Feedback Datasets", available
+    at `http://dx.doi.org/10.1109/ICDM.2008.22`, adapted for the blocked
+    approach used here.
+
+    Essentially instead of finding the low-rank approximations to the
+    rating matrix `R`, this finds the approximations for a preference
+    matrix `P` where the elements of `P` are 1 if r > 0 and 0 if r <= 0.
+    The ratings then act as 'confidence' values related to strength of
+    indicated user preferences rather than explicit ratings given to
+    items.
 
     >>> als = ALS(rank=10, maxIter=5)
     >>> model = als.fit(df)
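The alternating scheme the docstring describes has a simple closed form in the rank-1 case; here is a toy sketch in plain Python (illustrative only, independent of Spark's blocked implementation):

```python
def als_rank1(R, iters=20):
    """Toy rank-1 ALS: approximate R (users x items) as the outer product
    of vectors x and y. Each half-step solves a least-squares problem in
    closed form while the other factor is held fixed, mirroring the
    alternation described in the docstring."""
    m, n = len(R), len(R[0])
    x = [1.0] * m
    y = [1.0] * n
    for _ in range(iters):
        # Hold y fixed, solve for each x_i: x_i = (R_i . y) / (y . y)
        yy = sum(v * v for v in y)
        x = [sum(R[i][j] * y[j] for j in range(n)) / yy for i in range(m)]
        # Hold x fixed, solve for each y_j symmetrically
        xx = sum(v * v for v in x)
        y = [sum(R[i][j] * x[i] for i in range(m)) / xx for j in range(n)]
    return x, y

# A ratings matrix that is exactly rank 1 is recovered (up to scaling).
R = [[2.0, 4.0, 6.0],
     [1.0, 2.0, 3.0],
     [3.0, 6.0, 9.0]]
x, y = als_rank1(R)
approx = [[x[i] * y[j] for j in range(len(y))] for i in range(len(x))]
```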

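For the implicit-feedback variant the docstring mentions, ratings are binarized into preferences while their magnitudes become confidence weights. A minimal sketch of that mapping; the `1 + alpha * r` confidence form follows the cited Hu et al. paper, and `alpha` is a tuning parameter assumed here for illustration, not something this commit touches:

```python
def to_preference_confidence(R, alpha=1.0):
    """Map raw ratings to implicit-feedback inputs as the docstring
    describes: P[i][j] = 1 if r > 0 else 0, with the rating magnitude
    feeding a confidence weight c = 1 + alpha * r (per the cited paper;
    alpha is a hypothetical tuning knob in this sketch)."""
    P = [[1.0 if r > 0 else 0.0 for r in row] for row in R]
    C = [[1.0 + alpha * max(r, 0.0) for r in row] for row in R]
    return P, C

R = [[5.0, 0.0], [0.0, 3.0]]
P, C = to_preference_confidence(R, alpha=2.0)
# P == [[1.0, 0.0], [0.0, 1.0]]
# C == [[11.0, 1.0], [1.0, 7.0]]
```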