-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-31925][ML] Summary.totalIterations greater than maxIters #28786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #123774 has finished for PR 28786 at commit
|
| } | ||
| if (instances.getStorageLevel != StorageLevel.NONE) instances.unpersist() | ||
| return createModel(dataset, numClasses, coefMatrix, interceptVec, Array.empty) | ||
| return createModel(dataset, numClasses, coefMatrix, interceptVec, Array(0.0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When training is not needed, LinearRegression set objectiveHistory to Array(0.0). https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L511
I think LogisticRegression should have the same bahavior.
| } | ||
| } | ||
| assert(model2.summary.totalIterations === 1) | ||
| assert(model2.summary.totalIterations === 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InitialModel is set in this case. init state is good so no training is needed. I think totalIterations should be 0 instead of 1.
| } | ||
| } | ||
| assert(model4.summary.totalIterations === 1) | ||
| assert(model4.summary.totalIterations === 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as L2576. I think totalIterations should be 0 instead of 1 since no training.
| assert(allZeroInterceptModel.coefficients ~== Vectors.dense(0.0) absTol 1E-3) | ||
| assert(allZeroInterceptModel.intercept === Double.NegativeInfinity) | ||
| assert(allZeroInterceptModel.summary.totalIterations === 0) | ||
| assert(allZeroInterceptModel.summary.objectiveHistory(0) ~== 0.0 absTol 1e-4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since I change the objectiveHistory from Array.empty to Array(0.0) for no training case, summary.objectiveHistory(0) should be 0.0 here.
| assert(pred === 0.0) | ||
| } | ||
| assert(modelZeroLabel.summary.totalIterations > 0) | ||
| assert(modelZeroLabel.summary.totalIterations === 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No training here so I think the totalIterations should be 0 instead of 1
| Seq("auto", "normal").foreach { solver => | ||
| val trainer = new LinearRegression().setSolver(solver) | ||
| val model = trainer.fit(datasetWithDenseFeature) | ||
| assert(model.summary.totalIterations === 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before my change, summary.totalIterations is 1 for "normal". I think it should be 0 since Normal Equation is not an iterative method. totalIterations should be 0 too for "auto", since "auto" uses Normal Equation in this test (a small dataset).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think summary.totalIterations is 1 is reasonable, since it needs one pass on the dataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no iterative optimizer was run, I think 0 makes sense?
| s = model.summary | ||
| # test that api is callable and returns expected types | ||
| self.assertGreater(s.totalIterations, 0) | ||
| self.assertEqual(s.totalIterations, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
solver="normal" in this case, so I think totalIterations should be 0.
|
Test build #123780 has finished for PR 28786 at commit
|
|
Test build #123784 has finished for PR 28786 at commit
|
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine if the objective history has one additional initial state element, as it does now, but I agree it feels funny to return n+1 as total iterations when n iterations were requested.
Looks fine, the only other thing I can think of is maybe documenting anywhere the history is returned that it will contain one more element, the initial state, than number of iterations
|
Test build #123866 has finished for PR 28786 at commit
|
|
I just check this in ml.clustering, |
|
@zhengruifeng if you don't strongly object to #28786 (comment) I think this one can be merged. Just needs a rebase at your convenience @huaxingao |
|
@srowen I do't feel strongly about it. |
|
Test build #124018 has finished for PR 28786 at commit
|
|
retest this please |
|
Test build #124025 has finished for PR 28786 at commit
|
|
Merged to master. It could go in 3.0.1 too; I dont' feel strongly about it. |
|
Thanks! @srowen @zhengruifeng |
What changes were proposed in this pull request?
In LogisticRegression and LinearRegression, if set maxIter=n, the model.summary.totalIterations returns n+1 if the training procedure does not drop out. This is because we use
objectiveHistory.lengthas totalIterations, butobjectiveHistorycontains init sate, thusobjectiveHistory.lengthis 1 larger than number of training iterations.Why are the changes needed?
correctness
Does this PR introduce any user-facing change?
No
How was this patch tested?
add new tests and also modify existing tests