Skip to content

Conversation

@zhengruifeng
Copy link
Contributor

What changes were proposed in this pull request?

use while-loop instead of the recursive way

Why are the changes needed?

3% ~ 10% faster

Does this PR introduce any user-facing change?

No

How was this patch tested?

existing testsuites

@zhengruifeng
Copy link
Contributor Author

test:

import org.apache.spark.ml.linalg._
import org.apache.spark.ml.classification._
import org.apache.spark.storage.StorageLevel


val df = spark.read.option("numFeatures", "2000").format("libsvm").load("/data1/Datasets/epsilon/epsilon_normalized.t").withColumn("label", (col("label")+1)/2)
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count



val rf = new RandomForestClassifier().setMaxDepth(10).setNumTrees(100)
val model = rf.fit(df)
model.save("/tmp/rf-model")


val rf2 = new RandomForestClassifier().setMaxDepth(20).setNumTrees(100)
val model2 = rf2.fit(df)
model2.save("/tmp/rf-model-d20")



val model = RandomForestClassificationModel.load("/tmp/rf-model")
val model2 = RandomForestClassificationModel.load("/tmp/rf-model-d20")

val vecs = df.select("features").rdd.map(row => row.getAs[Vector](0)).collect

val start = System.currentTimeMillis; Seq.range(0, 20).foreach{_ => vecs.foreach(model.predict)}; val end = System.currentTimeMillis; end - start


val start = System.currentTimeMillis; Seq.range(0, 20).foreach{_ => vecs.foreach(model2.predict)}; val end = System.currentTimeMillis; end - start

Results (durations):
this PR: 167640, 404901
Master: 187645, 416243

@SparkQA
Copy link

SparkQA commented Jul 14, 2020

Test build #125804 has finished for PR 29095 at commit 50510dd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Copy link
Contributor Author

friendly ping @huaxingao @srowen @viirya

Different another attempt to save RAM, this should be a clear optimization. I found that those methods can not be marked @tailrec, so I use while-loop instead.

@viirya
Copy link
Member

viirya commented Jul 16, 2020

Is this also memory optimization? But looks like cpu time optimization from the description?

@zhengruifeng
Copy link
Contributor Author

@viirya This PR is only on cpu.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good as a clean optimization.

@huaxingao
Copy link
Contributor

LGTM

@srowen
Copy link
Member

srowen commented Jul 17, 2020

Merged to master

@srowen srowen closed this in 3a60b41 Jul 17, 2020
@zhengruifeng
Copy link
Contributor Author

Thanks all!

@zhengruifeng zhengruifeng deleted the tree_pred_opt branch July 18, 2020 00:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants