Skip to content

Conversation

@sethah
Copy link
Contributor

@sethah sethah commented Nov 15, 2016

What changes were proposed in this pull request?

This is a follow up to some of the discussion here. During LogisticRegression training, we store the coefficients combined with intercepts as a flat vector, but a more natural abstraction is a matrix. Here, we refactor the code to use matrix where possible, which makes the code more readable and greatly simplifies the indexing.

Note: We do not use a Breeze matrix for the cost function as was mentioned in the linked PR. This is because LBFGS/OWLQN require an implicit MutableInnerProductModule[DenseMatrix[Double], Double] which is not natively defined in Breeze. We would need to extend Breeze in Spark to define it ourselves. Also, we do not modify the regParamL1Fun because OWLQN in Breeze requires a MutableEnumeratedCoordinateField[(Int, Int), DenseVector[Double]] (since we still use a dense vector for coefficients). Here again we would have to extend Breeze inside Spark.

How was this patch tested?

This is internal code refactoring - the current unit tests passing show us that the change did not break anything. No added functionality in this patch.

@sethah
Copy link
Contributor Author

sethah commented Nov 15, 2016

cc @MLnick @dbtsai

@SparkQA
Copy link

SparkQA commented Nov 16, 2016

Test build #68679 has finished for PR 15893 at commit 28f67fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai
Copy link
Member

dbtsai commented Nov 19, 2016

Only minor naming. LGTM. My interest can not access ssh to merge the code, will merge later tonight. Thanks.

allCoefficients)
val denseCoefficientMatrix = new DenseMatrix(numCoefficientSets, numFeatures,
new Array[Double](numCoefficientSets * numFeatures), isTransposed = true)
val interceptVec = if ($(fitIntercept) || !isMultinomial) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consistently use interceptVector?

@dbtsai
Copy link
Member

dbtsai commented Nov 20, 2016

LGTM. Since this doesn't have impact on performance, and make the codebase cleaner, I merged this PR into master and branch 2.1. Thanks.

asfgit pushed a commit that referenced this pull request Nov 20, 2016
…n LogisticRegression training

## What changes were proposed in this pull request?

This is a follow up to some of the discussion [here](#15593). During LogisticRegression training, we store the coefficients combined with intercepts as a flat vector, but a more natural abstraction is a matrix. Here, we refactor the code to use matrix where possible, which makes the code more readable and greatly simplifies the indexing.

Note: We do not use a Breeze matrix for the cost function as was mentioned in the linked PR. This is because LBFGS/OWLQN require an implicit `MutableInnerProductModule[DenseMatrix[Double], Double]` which is not natively defined in Breeze. We would need to extend Breeze in Spark to define it ourselves. Also, we do not modify the `regParamL1Fun` because OWLQN in Breeze requires a `MutableEnumeratedCoordinateField[(Int, Int), DenseVector[Double]]` (since we still use a dense vector for coefficients). Here again we would have to extend Breeze inside Spark.

## How was this patch tested?

This is internal code refactoring - the current unit tests passing show us that the change did not break anything. No added functionality in this patch.

Author: sethah <[email protected]>

Closes #15893 from sethah/logreg_refactor.

(cherry picked from commit 856e004)
Signed-off-by: DB Tsai <[email protected]>
@asfgit asfgit closed this in 856e004 Nov 20, 2016
@MLnick
Copy link
Contributor

MLnick commented Nov 21, 2016

👍

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…n LogisticRegression training

## What changes were proposed in this pull request?

This is a follow up to some of the discussion [here](apache#15593). During LogisticRegression training, we store the coefficients combined with intercepts as a flat vector, but a more natural abstraction is a matrix. Here, we refactor the code to use matrix where possible, which makes the code more readable and greatly simplifies the indexing.

Note: We do not use a Breeze matrix for the cost function as was mentioned in the linked PR. This is because LBFGS/OWLQN require an implicit `MutableInnerProductModule[DenseMatrix[Double], Double]` which is not natively defined in Breeze. We would need to extend Breeze in Spark to define it ourselves. Also, we do not modify the `regParamL1Fun` because OWLQN in Breeze requires a `MutableEnumeratedCoordinateField[(Int, Int), DenseVector[Double]]` (since we still use a dense vector for coefficients). Here again we would have to extend Breeze inside Spark.

## How was this patch tested?

This is internal code refactoring - the current unit tests passing show us that the change did not break anything. No added functionality in this patch.

Author: sethah <[email protected]>

Closes apache#15893 from sethah/logreg_refactor.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants