-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-20109][MLlib] Rewrote toBlockMatrix method on IndexedRowMatrix #17459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
14 commits
Select commit
Hold shift + click to select a range
0614ebc
Wrote alternative toBlockMatrix method for IndexedRowMatrix
johnc1231 c3b9b8a
Added to tests, cleaned up code for clarity
johnc1231 25f4989
Added toBlockMatrixDense with no arguments
johnc1231 6adb585
Addressed viirya's comments
johnc1231 06c2b3a
Fixed tests to call toBlockMatrixDense
johnc1231 12e78bf
Fixed style failings, added Since annotations
johnc1231 4582a7e
Decided on just one toBlockMatrix implementation
johnc1231 a38851c
Added tests for sparse matrices
johnc1231 3fe21cf
Addressed new comments
johnc1231 d692d30
Fixed sparse matrix tests
johnc1231 994b457
Fixed style nitpicks
johnc1231 a7a03dc
Addressed srowen comments
johnc1231 289dbdb
Fixed requirement, cleaned up tests with collect
johnc1231 f9c5506
Fixed nits, added dimension check to CoordinateMatrix.toBlockMatrix()
johnc1231 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,7 +20,7 @@ package org.apache.spark.mllib.linalg.distributed | |
| import breeze.linalg.{diag => brzDiag, DenseMatrix => BDM, DenseVector => BDV} | ||
|
|
||
| import org.apache.spark.SparkFunSuite | ||
| import org.apache.spark.mllib.linalg.{Matrices, Vectors} | ||
| import org.apache.spark.mllib.linalg._ | ||
| import org.apache.spark.mllib.util.MLlibTestSparkContext | ||
| import org.apache.spark.rdd.RDD | ||
|
|
||
|
|
@@ -87,19 +87,96 @@ class IndexedRowMatrixSuite extends SparkFunSuite with MLlibTestSparkContext { | |
| assert(coordMat.toBreeze() === idxRowMat.toBreeze()) | ||
| } | ||
|
|
||
| test("toBlockMatrix") { | ||
| val idxRowMat = new IndexedRowMatrix(indexedRows) | ||
| val blockMat = idxRowMat.toBlockMatrix(2, 2) | ||
| test("toBlockMatrix dense backing") { | ||
| val idxRowMatDense = new IndexedRowMatrix(indexedRows) | ||
|
|
||
| // Tests when n % colsPerBlock != 0 | ||
| val blockMat = idxRowMatDense.toBlockMatrix(2, 2) | ||
| assert(blockMat.numRows() === m) | ||
| assert(blockMat.numCols() === n) | ||
| assert(blockMat.toBreeze() === idxRowMat.toBreeze()) | ||
| assert(blockMat.toBreeze() === idxRowMatDense.toBreeze()) | ||
|
|
||
| // Tests when m % rowsPerBlock != 0 | ||
| val blockMat2 = idxRowMatDense.toBlockMatrix(3, 1) | ||
| assert(blockMat2.numRows() === m) | ||
| assert(blockMat2.numCols() === n) | ||
| assert(blockMat2.toBreeze() === idxRowMatDense.toBreeze()) | ||
|
|
||
| intercept[IllegalArgumentException] { | ||
| idxRowMat.toBlockMatrix(-1, 2) | ||
| idxRowMatDense.toBlockMatrix(-1, 2) | ||
| } | ||
| intercept[IllegalArgumentException] { | ||
| idxRowMat.toBlockMatrix(2, 0) | ||
| idxRowMatDense.toBlockMatrix(2, 0) | ||
| } | ||
|
|
||
| assert(blockMat.blocks.map { case (_, matrix: Matrix) => | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: the style looks weird. Maybe: |
||
| matrix.isInstanceOf[DenseMatrix] | ||
| }.reduce(_ && _)) | ||
| assert(blockMat2.blocks.map { case (_, matrix: Matrix) => | ||
| matrix.isInstanceOf[DenseMatrix] | ||
| }.reduce(_ && _)) | ||
| } | ||
|
|
||
| test("toBlockMatrix sparse backing") { | ||
| val sparseData = Seq( | ||
| (15L, Vectors.sparse(12, Seq((0, 4.0)))) | ||
| ).map(x => IndexedRow(x._1, x._2)) | ||
|
|
||
| // Gonna make m and n larger here so the matrices can easily be completely sparse: | ||
| val m = 16 | ||
| val n = 12 | ||
|
|
||
| val idxRowMatSparse = new IndexedRowMatrix(sc.parallelize(sparseData)) | ||
|
|
||
| // Tests when n % colsPerBlock != 0 | ||
| val blockMat = idxRowMatSparse.toBlockMatrix(8, 8) | ||
| assert(blockMat.numRows() === m) | ||
| assert(blockMat.numCols() === n) | ||
| assert(blockMat.toBreeze() === idxRowMatSparse.toBreeze()) | ||
|
|
||
| // Tests when m % rowsPerBlock != 0 | ||
| val blockMat2 = idxRowMatSparse.toBlockMatrix(6, 6) | ||
| assert(blockMat2.numRows() === m) | ||
| assert(blockMat2.numCols() === n) | ||
| assert(blockMat2.toBreeze() === idxRowMatSparse.toBreeze()) | ||
|
|
||
| assert(blockMat.blocks.collect().forall{ case (_, matrix: Matrix) => | ||
| matrix.isInstanceOf[SparseMatrix] | ||
| }) | ||
| assert(blockMat2.blocks.collect().forall{ case (_, matrix: Matrix) => | ||
| matrix.isInstanceOf[SparseMatrix] | ||
| }) | ||
| } | ||
|
|
||
| test("toBlockMatrix mixed backing") { | ||
| val m = 24 | ||
| val n = 18 | ||
|
|
||
| val mixedData = Seq( | ||
| (0L, Vectors.dense((0 to 17).map(_.toDouble).toArray)), | ||
| (1L, Vectors.dense((0 to 17).map(_.toDouble).toArray)), | ||
| (23L, Vectors.sparse(18, Seq((0, 4.0))))) | ||
| .map(x => IndexedRow(x._1, x._2)) | ||
|
|
||
| val idxRowMatMixed = new IndexedRowMatrix( | ||
| sc.parallelize(mixedData)) | ||
|
|
||
| // Tests when n % colsPerBlock != 0 | ||
| val blockMat = idxRowMatMixed.toBlockMatrix(12, 12) | ||
| assert(blockMat.numRows() === m) | ||
| assert(blockMat.numCols() === n) | ||
| assert(blockMat.toBreeze() === idxRowMatMixed.toBreeze()) | ||
|
|
||
| // Tests when m % rowsPerBlock != 0 | ||
| val blockMat2 = idxRowMatMixed.toBlockMatrix(18, 6) | ||
| assert(blockMat2.numRows() === m) | ||
| assert(blockMat2.numCols() === n) | ||
| assert(blockMat2.toBreeze() === idxRowMatMixed.toBreeze()) | ||
|
|
||
| val blocks = blockMat.blocks.collect() | ||
|
|
||
| assert(blocks.forall { case((row, col), matrix) => | ||
| if (row == 0) matrix.isInstanceOf[DenseMatrix] else matrix.isInstanceOf[SparseMatrix]}) | ||
| } | ||
|
|
||
| test("multiply a local matrix") { | ||
|
|
||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's an assumption here that the block index can't be larger than an Int, but it could, right? conceptually the index in an IndexedRow could be huge. Does blockRow need to stay a Long or am I overlooking why it won't happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it is true that IndexedRowMatrix could have a Long number of rows, but BlockMatrix is backed by an RDD of ((Int, Int), Matrix), so we're limited by that. I can just add a check that computes whether it's possible to make a BlockMatrix from the given IndexedRowMatrix.