[SPARK-20029][ML] ML LinearRegression supports bound constrained optimization. #17360
Conversation
Test build #74876 has finished for PR 17360 at commit
I don't think this is the best approach. We're further confounding the algorithm API with parameters of the optimizer used to fit the algorithm. I strongly prefer to put more effort into getting this right via SPARK-17136. For what it's worth, I have an initial PR basically ready that provides an API that makes adding this functionality trivial.
Test build #74979 has finished for PR 17360 at commit
@sethah I left some questions on SPARK-17136. I think the main question we should figure out is whether we still expose the optimizer params as estimator params after SPARK-17136. I would prefer to keep these params on the estimators, make the optimizer layer an internal API, and let users implement their own optimizer, similar to Spark SQL's external data source support. I find this more aligned with the original ML pipeline design, which stores params outside a pipeline component.
@yanboliang Thanks for your feedback! The design of the optimizer interface, or even whether it should be included at all, is definitely open for discussion and your suggestions are much appreciated. If SPARK-17136 proceeds as you suggest (an internal optimization API that allows users to register optimizers), then it is possible that this PR does not conflict with that JIRA (though I'm not sure of the details, so even that is uncertain). However, that matter is far from settled. If we end up deciding to provide the external optimizer API as currently suggested in that JIRA, then these two do conflict. If we add the ability to specify parameter bounds on the estimator and then add an optimizer API, we will have added yet more optimizer parameters to the estimator that can conflict with parameters of the optimizer provided to the estimator. My point is that these are two competing approaches, and we should settle on one over the other before we make API changes that cannot be undone. I'm open to potentially changing the design of SPARK-17136, but we need to decide on something first.
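To make the conflict concrete, here is a purely illustrative Scala sketch. Neither the trait nor the field names below come from SPARK-17136 or from this PR; they only show how the same setting could end up configurable in two places.

```scala
import org.apache.spark.ml.linalg.Vector

// Hypothetical pluggable optimizer interface of the kind SPARK-17136 discusses.
// Here the bound constraints live on the optimizer itself.
trait BoundedMinimizer {
  def lowerBounds: Option[Vector]
  def upperBounds: Option[Vector]
  def minimize(initialCoefficients: Vector): Vector
}

// Hypothetical estimator that also carries bound params directly (as this PR proposes).
// With both mechanisms present, fit() would need a precedence rule whenever they disagree.
class BoundedLinearRegressionSketch {
  var lowerBoundsOnCoefficients: Option[Vector] = None  // estimator-level param (this PR)
  var optimizer: Option[BoundedMinimizer] = None        // optimizer-level config (SPARK-17136)
}
```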
Test build #97747 has finished for PR 17360 at commit
Test build #97754 has finished for PR 17360 at commit
Test build #97789 has finished for PR 17360 at commit
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
MLlib LinearRegression should support bound constrained optimization. Users can add bound constraints on the coefficients to make the solver produce a solution within the specified range. Under the hood, we call breeze L-BFGS-B as the solver for bound constrained optimization. Only L2 regularization is supported currently.

Todo:
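Separately from the Todo above (whose items were not filled in here), the following is a minimal usage sketch. The setters setLowerBoundsOnCoefficients / setUpperBoundsOnCoefficients are assumptions modeled on the description in this PR and the analogous constrained LogisticRegression API; they may not match the final API.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression

// Constrain every coefficient of a 2-feature model to [0.0, 10.0].
// NOTE: the bound setters below are hypothetical; they illustrate the proposal
// rather than a confirmed public API.
val lr = new LinearRegression()
  .setRegParam(0.1)
  .setElasticNetParam(0.0) // only L2 regularization is supported with bounds
  .setLowerBoundsOnCoefficients(Vectors.dense(0.0, 0.0))
  .setUpperBoundsOnCoefficients(Vectors.dense(10.0, 10.0))

// `training` is assumed to be a DataFrame with "features" and "label" columns.
val model = lr.fit(training)
println(model.coefficients) // each value should fall within [0.0, 10.0]
```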
How was this patch tested?
Unit tests.
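As a hedged illustration of the kind of check such a unit test might make (the actual test code in the PR is not reproduced here, and the bound setters remain assumptions):

```scala
// ScalaTest-style sketch; `dataset` is assumed to come from the test fixture.
test("linear regression respects coefficient bounds") {
  val lr = new LinearRegression()
    .setRegParam(0.1)
    .setElasticNetParam(0.0) // L2 only, per the PR description
    .setLowerBoundsOnCoefficients(Vectors.dense(0.0, 0.0)) // hypothetical setter
    .setUpperBoundsOnCoefficients(Vectors.dense(5.0, 5.0)) // hypothetical setter
  val model = lr.fit(dataset)
  // Every fitted coefficient should stay within the requested bounds.
  model.coefficients.toArray.foreach { c =>
    assert(c >= 0.0 && c <= 5.0)
  }
}
```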