Skip to content

Conversation

@amanomer
Copy link
Contributor

@amanomer amanomer commented Nov 9, 2019

What changes were proposed in this pull request?

Use persist on RDD which is used for more than one action.

Why are the changes needed?

The rdd scoreAndLabels.combineByKey is used by two actions: sortByKey and count, so it needs to be persisted.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested manually.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@amanomer amanomer changed the title [SPARK-29816] Missing persist in mllib.evaluation.BinaryClassificationMetrics.recallByThreshold() [SPARK-29816][MLLIB] Missing persist in mllib.evaluation.BinaryClassificationMetrics.recallByThreshold() Nov 9, 2019
@amanomer
Copy link
Contributor Author

amanomer commented Nov 9, 2019

@MaxGekk Kindly review this PR.

@srowen
Copy link
Member

srowen commented Nov 9, 2019

There's no reason this has to be like 5 JIRAs and PRs. Just combine them into one, please.

@amanomer
Copy link
Contributor Author

amanomer commented Nov 9, 2019

Okay.

mergeCombiners = (c1: BinaryLabelCounter, c2: BinaryLabelCounter) => c1 += c2
).sortByKey(ascending = false)
)
binnedWeights.persist()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why do you persist binnedWeights but not counts? As I can see binnedWeights is used only once ... or I missed something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rdd scoreAndLabels.combineByKey is found to be used more than one time, as mentioned in SPARK-29816.
I think, counts rdd is constructed over scoreAndLabels.combineByKey after action sortByKey. counts is indirectly dependent on scoreAndLabels.combineByKey i.e, binnedWeights?

@srowen
Copy link
Member

srowen commented Nov 10, 2019

Let's please merge this to #26454 after discussing some initial principles there.

@srowen srowen closed this Nov 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants