Skip to content

Conversation

@yanboliang
Copy link
Contributor

@yanboliang yanboliang commented Mar 20, 2017

What changes were proposed in this pull request?

MLlib LinearRegression should support bound constrained optimization. Users can add bound constraints to coefficients to make the solver produce solution in the specified range.
Under the hood, we call breeze L-BFGS-B as the solver for bound constrained optimization. And we only support L2 regularization currently.

Todo:

  • Support set bound for intercept.

How was this patch tested?

Unit tests.

@SparkQA
Copy link

SparkQA commented Mar 20, 2017

Test build #74876 has finished for PR 17360 at commit aa7e768.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sethah
Copy link
Contributor

sethah commented Mar 20, 2017

I don't think this is the best approach. We're further confounding the algorithm API with parameters of the optimizer used to fit the algorithm.

I strongly prefer to put more effort into getting this right via SPARK-17136. For what it's worth, I have an initial PR basically ready that provides an API that makes adding this functionality trivial.

@yanboliang yanboliang changed the title [WIP][SPARK-20029][ML] ML LinearRegression supports bound constrained optimization. [SPARK-20029][ML] ML LinearRegression supports bound constrained optimization. Mar 21, 2017
@SparkQA
Copy link

SparkQA commented Mar 21, 2017

Test build #74979 has finished for PR 17360 at commit 5af16cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

yanboliang commented Mar 21, 2017

@sethah I left some questions on SPARK-17136. I think the main question we should figure out is whether we still expose the optimizer params as the estimator params after SPARK-17136. I'm more prefer to keep these params in estimators, make the optimizer layer as an internal API, and users can implement their own optimizer like Spark SQL external data source support. Since I found this is more aligned with the original ML pipeline design which stores params outside a pipeline component.
So I think this PR is not conflict with SPARK-17136 and can work parallel. I'm also open to hear your thoughts. Thanks!

@sethah
Copy link
Contributor

sethah commented Mar 21, 2017

@yanboliang Thanks for your feedback! The design of the optimizer interface, or even whether it should be included at all, is definitely open for discussion and your suggestions are much appreciated. If SPARK-17136 proceeds as you suggest (internal optimization API that allows users to register optimizers) then it is possible that this PR does not conflict with that JIRA (though I don't know about the details of that, so even that I'm not sure of). However, that matter is far from settled. If we end up deciding to provide the external optimizer API as is currently suggested in that JIRA, then these two do conflict. If we add the ability to specify parameter bounds on the estimator, then add an optimizer API, we have added yet more optimizer parameters to the estimator that can conflict with parameters of the optimizer provided to the estimator.

My point is that I think these are two competing approaches and we should settle on one over the other before we make API changes that cannot be undone. I'm open to potentially changing the design of SPARK-17136, but we need to decide on something first.

@SparkQA
Copy link

SparkQA commented Oct 22, 2018

Test build #97747 has finished for PR 17360 at commit 5af16cb.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 22, 2018

Test build #97754 has finished for PR 17360 at commit 5af16cb.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 22, 2018

Test build #97789 has finished for PR 17360 at commit 5af16cb.

  • This patch fails build dependency tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 17, 2020
@github-actions github-actions bot closed this Jan 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants