@smurakozi
Contributor

What changes were proposed in this pull request?

Converting clustering tests to also check code with structured streaming, using the ML testing infrastructure implemented in SPARK-22882.

How was this patch tested?

N/A

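import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

// Test-only implicit encoder so testTransformer can encode a bare Vector column
private[clustering] object Encoders {
  implicit val vectorEncoder: ExpressionEncoder[Vector] = ExpressionEncoder[Vector]()
}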
Contributor Author

@smurakozi smurakozi Jan 18, 2018


Is there a better way to provide an implicit Encoder[Vector] for testTransformer?
Is it OK here, or is there a better place for it, e.g. org.apache.spark.mllib.util.MLlibTestSparkContext.testImplicits?

Member


Thanks for asking; you shouldn't need to do this. I'll comment on BisectingKMeansSuite.scala about using testImplicits instead. You basically just need to import testImplicits._ and use Tuple1 for the type param of testTransformer.
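To make the suggestion concrete, here is a minimal sketch (not code from this PR) of why `Tuple1` removes the need for a hand-written encoder. It assumes spark-sql and spark-mllib are on the classpath, and it uses a local `SparkSession` with `spark.implicits._`, which we assume provides the same implicits as `testImplicits._` in an MLlib test suite. The names `Tuple1EncoderSketch`, `vectorFrame` and `demo` are hypothetical. In a suite extending MLTest, the equivalent change would be calling `testTransformer[Tuple1[Vector]](...)` instead of `testTransformer[Vector](...)`.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, Encoder, SparkSession}

object Tuple1EncoderSketch {
  /** Builds a one-column DataFrame of vectors using only the standard SQL implicits. */
  def vectorFrame(spark: SparkSession): DataFrame = {
    // Assumption: `spark.implicits._` stands in for `testImplicits._` in a test suite.
    import spark.implicits._

    // There is no implicit Encoder[Vector], but Tuple1[Vector] is a Product,
    // so the implicit product encoder covers it (Vector itself is handled as a UDT).
    val enc: Encoder[Tuple1[Vector]] = implicitly[Encoder[Tuple1[Vector]]]
    require(enc.schema.length == 1)

    Seq(
      Tuple1(Vectors.dense(0.0, 0.0)),
      Tuple1(Vectors.sparse(2, Array(1), Array(1.0)))
    ).toDF("features")
  }

  /** Runs the sketch on a local session and returns (column names, row count). */
  def demo(): (Seq[String], Long) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("tuple1-encoder-sketch")
      .getOrCreate()
    try {
      val df = vectorFrame(spark)
      (df.columns.toSeq, df.count())
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

Wrapping the vector in `Tuple1` keeps tests on the shared implicits, so no suite-specific `Encoders` object is needed.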

@squito
Contributor

squito commented Jan 19, 2018

Jenkins, add to whitelist

@SparkQA

SparkQA commented Jan 19, 2018

Test build #86391 has finished for PR 20319 at commit b6e06e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BisectingKMeansSuite extends MLTest with DefaultReadWriteTest
  • class GaussianMixtureSuite extends MLTest with DefaultReadWriteTest

@smurakozi
Contributor Author

@jkbradley could you check out this change, please?

@SparkQA

SparkQA commented Jan 22, 2018

Test build #86479 has finished for PR 20319 at commit dc7e708.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

@smurakozi Thanks for the PR! I have bandwidth to review this now. Do you have time to rebase this to fix the merge conflicts?

@WeichenXu123
Contributor

@smurakozi Thanks for the PR! Could you resolve the conflicts first? Then I will review it. If you're busy, I can also take it over.

@SparkQA

SparkQA commented Apr 9, 2018

Test build #89063 has finished for PR 20319 at commit b2aa3c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@smurakozi
Contributor Author

smurakozi commented Apr 9, 2018

@jkbradley, @WeichenXu123 thanks for checking it out. I've resolved the conflicts; the build is green.

@jkbradley
Member

Reviewing now!

Member

@jkbradley jkbradley left a comment


Done with review; thanks!

-  extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest {
+class BisectingKMeansSuite extends MLTest with DefaultReadWriteTest {

import Encoders._
Member


Import testImplicits._ instead.

// Verify we hit the edge case
assert(numClusters < k && numClusters > 1)

testTransformerByGlobalCheckFunc[Vector](sparseDataset.toDF(), model, "prediction") { rows =>
Member


Use Tuple1[Vector] instead of Vector

val clusters = rows.map(_.getAs[Int](predictionColName)).toSet
assert(clusters.size === k)
assert(clusters === Set(0, 1, 2, 3, 4))
assert(model.computeCost(dataset) < 0.1)
Member


These checks, which do not use "rows", should go outside of testTransformerByGlobalCheckFunc.
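A plain-Scala sketch of the placement suggested in this comment, assuming (as we understand MLTest) that testTransformerByGlobalCheckFunc invokes its check function once for the streaming run and once for the batch run. `testByGlobalCheck` is a hypothetical stand-in for that helper, and `cost` is a placeholder for `model.computeCost(dataset)`. Model-level assertions run once, outside the callback; only row-dependent assertions stay inside it.

```scala
object GlobalCheckPlacementSketch {
  // Hypothetical stand-in for testTransformerByGlobalCheckFunc: runs the same
  // global check once per execution mode (assumed: streaming and batch).
  def testByGlobalCheck(rowsPerMode: Map[String, Seq[Int]])(check: Seq[Int] => Unit): Unit =
    rowsPerMode.values.foreach(check)

  /** Returns (model-level check count, row-level check count). */
  def run(): (Int, Int) = {
    var modelChecks = 0
    var rowChecks = 0
    val k = 5
    val predictions = Seq(0, 1, 2, 3, 4, 4, 0)
    val modes = Map("stream" -> predictions, "batch" -> predictions)

    // Model-level check: independent of the transformed rows, so it runs once.
    val cost = 0.05 // hypothetical placeholder for model.computeCost(dataset)
    assert(cost < 0.1)
    modelChecks += 1

    testByGlobalCheck(modes) { rows =>
      // Row-level checks: these depend on `rows`, so they belong inside the callback.
      val clusters = rows.toSet
      assert(clusters.size == k)
      assert(clusters == Set(0, 1, 2, 3, 4))
      rowChecks += 1
    }
    (modelChecks, rowChecks)
  }
}
```

Keeping row-independent assertions outside the callback avoids repeating them per mode and makes it clear which checks exercise the transformed output.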

@jkbradley
Member

@smurakozi Do you have time to update this? I did a full review, though it now has a small merge conflict. Thanks!

@jkbradley
Member

I'm going to take this over to get it done, but @smurakozi, you'll be the primary author. I'll link the PR here in a minute.

@jkbradley
Member

Done! Here it is: #21358

@smurakozi Could you please close this issue and help review the new PR if you have time? Thanks!

asfgit pushed a commit that referenced this pull request May 17, 2018
## What changes were proposed in this pull request?

Converting clustering tests to also check code with structured streaming, using the ML testing infrastructure implemented in SPARK-22882.

This PR is a new version of #20319

Author: Sandor Murakozi
Author: Joseph K. Bradley

Closes #21358 from jkbradley/smurakozi-SPARK-22884.
@AmplabJenkins

Can one of the admins verify this patch?

@asfgit asfgit closed this in 1a4fda8 Jul 19, 2018
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#17422
Closes apache#17619
Closes apache#18034
Closes apache#18229
Closes apache#18268
Closes apache#17973
Closes apache#18125
Closes apache#18918
Closes apache#19274
Closes apache#19456
Closes apache#19510
Closes apache#19420
Closes apache#20090
Closes apache#20177
Closes apache#20304
Closes apache#20319
Closes apache#20543
Closes apache#20437
Closes apache#21261
Closes apache#21726
Closes apache#14653
Closes apache#13143
Closes apache#17894
Closes apache#19758
Closes apache#12951
Closes apache#17092
Closes apache#21240
Closes apache#16910
Closes apache#12904
Closes apache#21731
Closes apache#21095

Added:
Closes apache#19233
Closes apache#20100
Closes apache#21453
Closes apache#21455
Closes apache#18477

Added:
Closes apache#21812
Closes apache#21787

Author: hyukjinkwon

Closes apache#21781 from HyukjinKwon/closing-prs.

6 participants