@@ -29,30 +29,39 @@ class ALS(JavaEstimator, HasCheckpointInterval, HasMaxIter, HasPredictionCol, Ha
     """
     Alternating Least Squares (ALS) matrix factorization.

-    ALS attempts to estimate the ratings matrix `R` as the product of two lower-rank matrices,
-    `X` and `Y`, i.e. `X * Yt = R`. Typically these approximations are called 'factor' matrices.
-    The general approach is iterative. During each iteration, one of the factor matrices is held
-    constant, while the other is solved for using least squares. The newly-solved factor matrix is
-    then held constant while solving for the other factor matrix.
-
-    This is a blocked implementation of the ALS factorization algorithm that groups the two sets
-    of factors (referred to as "users" and "products") into blocks and reduces communication by only
-    sending one copy of each user vector to each product block on each iteration, and only for the
-    product blocks that need that user's feature vector. This is achieved by pre-computing some
-    information about the ratings matrix to determine the "out-links" of each user (which blocks of
-    products it will contribute to) and "in-link" information for each product (which of the feature
-    vectors it receives from each user block it will depend on). This allows us to send only an
-    array of feature vectors between each user block and product block, and have the product block
-    find the users' ratings and update the products based on these messages.
+    ALS attempts to estimate the ratings matrix `R` as the product of
+    two lower-rank matrices, `X` and `Y`, i.e. `X * Yt = R`. Typically
+    these approximations are called 'factor' matrices. The general
+    approach is iterative. During each iteration, one of the factor
+    matrices is held constant, while the other is solved for using least
+    squares. The newly-solved factor matrix is then held constant while
+    solving for the other factor matrix.
+
+    This is a blocked implementation of the ALS factorization algorithm
+    that groups the two sets of factors (referred to as "users" and
+    "products") into blocks and reduces communication by only sending
+    one copy of each user vector to each product block on each
+    iteration, and only for the product blocks that need that user's
+    feature vector. This is achieved by pre-computing some information
+    about the ratings matrix to determine the "out-links" of each user
+    (which blocks of products it will contribute to) and "in-link"
+    information for each product (which of the feature vectors it
+    receives from each user block it will depend on). This allows us to
+    send only an array of feature vectors between each user block and
+    product block, and have the product block find the users' ratings
+    and update the products based on these messages.

     For implicit preference data, the algorithm used is based on
-    "Collaborative Filtering for Implicit Feedback Datasets", available at
-    `http://dx.doi.org/10.1109/ICDM.2008.22`, adapted for the blocked approach used here.
-
-    Essentially instead of finding the low-rank approximations to the rating matrix `R`,
-    this finds the approximations for a preference matrix `P` where the elements of `P` are 1 if
-    r > 0 and 0 if r <= 0. The ratings then act as 'confidence' values related to strength of
-    indicated user preferences rather than explicit ratings given to items.
+    "Collaborative Filtering for Implicit Feedback Datasets", available
+    at `http://dx.doi.org/10.1109/ICDM.2008.22`, adapted for the blocked
+    approach used here.
+
+    Essentially instead of finding the low-rank approximations to the
+    rating matrix `R`, this finds the approximations for a preference
+    matrix `P` where the elements of `P` are 1 if r > 0 and 0 if r <= 0.
+    The ratings then act as 'confidence' values related to strength of
+    indicated user preferences rather than explicit ratings given to
+    items.

     >>> als = ALS(rank=10, maxIter=5)
     >>> model = als.fit(df)
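The alternating iteration the docstring describes (hold `Y` fixed, solve for `X` by least squares, then swap) can be sketched in a few lines of NumPy. This is a minimal dense illustration only, not Spark's blocked, distributed implementation; the function name, the ridge term `reg`, and the shapes are all assumptions chosen for the sketch.

```python
import numpy as np

def als_sketch(R, rank=5, reg=0.1, iters=10, seed=0):
    """Dense ALS sketch: alternately solve regularized least squares
    for the user factors X and the item factors Y."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.standard_normal((n_users, rank))
    Y = rng.standard_normal((n_items, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Hold Y constant; solve the normal equations for every user row.
        X = np.linalg.solve(Y.T @ Y + ridge, Y.T @ R.T).T
        # Hold X constant; solve for every item row.
        Y = np.linalg.solve(X.T @ X + ridge, X.T @ R).T
    return X, Y

# Synthetic rank-5 ratings matrix, so the factorization can recover it.
rng = np.random.default_rng(42)
R = rng.standard_normal((20, 5)) @ rng.standard_normal((5, 15))
X, Y = als_sketch(R, rank=5)
err = np.linalg.norm(R - X @ Y.T) / np.linalg.norm(R)
```

Because each half-step is an ordinary regularized least-squares solve, the relative reconstruction error drops quickly when the true rank matches `rank`.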
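The implicit-preference transformation in the last paragraph (binary `P` from the sign of `r`, with the raw ratings reused as confidence weights) can be sketched as below. The linear confidence form `1 + alpha * r` follows the cited Hu et al. paper; the value of `alpha` and the clamping of negative ratings to zero confidence are assumptions for illustration.

```python
import numpy as np

# Toy ratings: rows are users, columns are items.
R = np.array([[4.0, 0.0, -1.0],
              [0.0, 2.0, 5.0]])

# Preference matrix P: 1 if r > 0, 0 if r <= 0, as described above.
P = (R > 0).astype(float)

# Confidence weights: c = 1 + alpha * r (alpha is a hypothetical
# scaling constant; negative ratings are clamped to zero here).
alpha = 40.0
C = 1.0 + alpha * np.maximum(R, 0.0)
```

ALS then fits the factors to `P` rather than `R`, with each squared-error term weighted by the corresponding entry of `C`.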