Skip to content

Conversation

@mridulm
Copy link
Contributor

@mridulm mridulm commented Dec 30, 2022

What changes were proposed in this pull request?

Incorrect merge id is removed from the DB when a newer shuffle merge id is received.

Why are the changes needed?

Bug fix

Does this PR introduce any user-facing change?

No, fixes a corner case bug

How was this patch tested?

Unit test updated

@github-actions github-actions bot added the CORE label Dec 30, 2022
@mridulm
Copy link
Contributor Author

mridulm commented Dec 30, 2022

+CC @zhouyejoe, @otterc, @Ngone51

@mridulm
Copy link
Contributor Author

mridulm commented Dec 30, 2022

+CC @wankunde, this fixes the bug I had described earlier.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the JIRA, is this for both Apache Spark 3.3.2 and 3.4.0, @mridulm ?

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. (BTW, nit, the Java-style 8-space indentation for continued lines looks a little strange to me. Since we used 2-space indentation even in Java, I thought we used 4-space for continued lines)

@dongjoon-hyun
Copy link
Member

Oh, could you take a look at the failures? It looks like relevant.

[info] *** 2 TESTS FAILED ***
[error] Failed tests:
[error] 	org.apache.spark.network.yarn.YarnShuffleServiceWithRocksDBBackendSuite
[error] 	org.apache.spark.network.yarn.YarnShuffleServiceWithLevelDBBackendSuite

@wankunde
Copy link
Contributor

+1, LGTM

Copy link
Contributor

@zhouyejoe zhouyejoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM

closeAndDeleteOutdatedPartitions(
appAttemptShuffleMergeId, mergePartitionsInfo.shuffleMergePartitions));
closeAndDeleteOutdatedPartitions(currrentAppAttemptShuffleMergeId,
mergePartitionsInfo.shuffleMergePartitions));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: 4 spaces for continued lines.

@yabola
Copy link
Contributor

yabola commented Jan 1, 2023

@mridulm
I agree with you. The failed UT is as expected and we should modify the UT.
Actually my previous PR also remove the current mergeId and modify the UT https://github.com/apache/spark/pull/38560/files#r1031350719

In my previous PR, I tried unify this deletion logic: delete mergePartitionsInfo.shuffleMergePartitions at the same time deleting the DB. What more, I also want to unify the deleting the finalized partition information ( if higher mergeId come and merge partitions was finalized, we should clean finalized partitions, please see here https://github.com/apache/spark/pull/38560/files#diff-d544fbb952b61283b3d18ca10a5027528efc4f2f65047130da015ae7754c117fR502)

@mridulm
Copy link
Contributor Author

mridulm commented Jan 1, 2023

Let me take a look, thanks @dongjoon-hyun.
Did not notice this failing in my local box ... weird. I switched branches after the specific test I modified was done and restarted build :-(

@mridulm
Copy link
Contributor Author

mridulm commented Jan 1, 2023

Agree with @yabola, the test is broken: I will fix the test to the expected output.

@github-actions github-actions bot added the YARN label Jan 1, 2023
@mridulm
Copy link
Contributor Author

mridulm commented Jan 1, 2023

@dongjoon-hyun:

According to the JIRA, is this for both Apache Spark 3.3.2 and 3.4.0, @mridulm ?

This is only for master, I have updated the jira ...

@dongjoon-hyun
Copy link
Member

Got it. Thank you, @mridulm

@mridulm
Copy link
Contributor Author

mridulm commented Jan 1, 2023

Merging to master

@mridulm mridulm closed this in 5d6b69f Jan 1, 2023
@mridulm
Copy link
Contributor Author

mridulm commented Jan 1, 2023

Thanks for the review @dongjoon-hyun, @wankunde, @zhouyejoe, @yabola :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants