Skip to content
Closed
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,7 @@ class BisectingKMeansModel private[clustering] (

@Since("2.0.0")
override def save(sc: SparkContext, path: String): Unit = {
BisectingKMeansModel.SaveLoadV1_0.save(sc, this, path)
BisectingKMeansModel.SaveLoadV2_0.save(sc, this, path)
}

override protected def formatVersion: String = "1.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this formatVersion needed to change too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say yes, but actually this is never used. I think we can actually remove this from the trait.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch! need change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the formatVersion to 2.0. There are quite a few files that implement trait Saveable and have formatVersion. I don't feel comfortable to change other files for this PR. Maybe I will open a separate jira to remove formatVersion from trait Saveable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on that, I already have a patch for removing it (moreover this PR can target 2.4, while removal should be done only on master I think). I am submitting it. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created #22830 for that, thanks.

Expand All @@ -126,7 +126,7 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] {
val model = SaveLoadV1_0.load(sc, path)
model
case (SaveLoadV2_0.thisClassName, SaveLoadV2_0.thisFormatVersion) =>
val model = SaveLoadV1_0.load(sc, path)
val model = SaveLoadV2_0.load(sc, path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, nice catch!

Copy link
Member

@dongjoon-hyun dongjoon-hyun Oct 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a regression, but it looks like a correctness or data loss issue at Spark 2.4.0 new feature.
Do you know why https://issues.apache.org/jira/browse/SPARK-25793 is created as a Minor and Documentation issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no test to verify calling correct load method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can improve the write/load model tests in order to include also a different distance measure from the default one. In this way we should catch this error. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have ever use SaveLoadV2_0 to save model for now? Looks BisectingKMeansModel.save simply calls SaveLoadV1_0.save to save models.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya @mgaido91
I will change the BisectingKMeansModel.save to

BisectingKMeansModel.SaveLoadV2_0.save(sc, this, path)

model
case _ => throw new Exception(
s"BisectingKMeansModel.load did not recognize model with (className, format version):" +
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -187,11 +187,12 @@ class BisectingKMeansSuite extends SparkFunSuite with MLlibTestSparkContext {

val points = (1 until 8).map(i => Vectors.dense(i))
val data = sc.parallelize(points, 2)
val model = new BisectingKMeans().run(data)
val model = new BisectingKMeans().setDistanceMeasure(DistanceMeasure.COSINE).run(data)
try {
model.save(sc, path)
val sameModel = BisectingKMeansModel.load(sc, path)
assert(model.k === sameModel.k)
assert(model.distanceMeasure === sameModel.distanceMeasure)
model.clusterCenters.zip(sameModel.clusterCenters).foreach(c => c._1 === c._2)
} finally {
Utils.deleteRecursively(tempDir)
Expand Down