[SPARK-48394][CORE] Cleanup mapIdToMapIndex on mapoutput unregister #46706
Conversation
```diff
   * Exposed for testing.
   */
-  private[this] val mapIdToMapIndex = new OpenHashMap[Long, Int]()
+  private[spark] val mapIdToMapIndex = new HashMap[Long, Int]()
```
QQ: Why change to HashMap from OpenHashMap? (it is specialized for Long and Int)
+1 for the above question.
OpenHashMap doesn't support the remove operation.
Ah yes!
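For context, a minimal sketch (illustrative, not the actual `MapOutputTracker` code) of why a mutable `HashMap` is needed here: lookups behave the same as before, but stale entries can now be dropped when a map output is unregistered, which Spark's `OpenHashMap` cannot do.

```scala
import scala.collection.mutable.HashMap

// Hypothetical mapId -> mapIndex bookkeeping with removal support.
val mapIdToMapIndex = new HashMap[Long, Int]()

mapIdToMapIndex(12345L) = 0            // register a map output
val idx = mapIdToMapIndex.get(12345L)  // Some(0), same lookup as before
mapIdToMapIndex.remove(12345L)         // supported by HashMap; OpenHashMap has no remove
```

The trade-off, as the question above notes, is losing `OpenHashMap`'s Long/Int specialization.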
```scala
_numAvailableMapOutputs -= 1
mapStatusesDeleted(mapIndex) = mapStatuses(mapIndex)
val currentMapStatus = mapStatuses(mapIndex)
mapIdToMapIndex.remove(currentMapStatus.mapId)
```
Removing it here will mean we can't query for it in `mapStatusesDeleted`, where we are relying on the `mapId -> mapIndex` mapping being in `mapIdToMapIndex` even when the `mapIndex` is in `mapStatusesDeleted`.
We should move this cleanup to when `mapStatusesDeleted` is being cleaned up.
The same applies to the cases below as well.
This is a good point. But IIUC, `mapStatusesDeleted` will only be cleaned up when recovery happens via K8s, so it's not guaranteed to always be cleaned up in the end. I removed the dependency of `mapIdToMapIndex` for `mapStatusesDeleted`, as it's not a common use case.
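As a rough sketch of what "removing the dependency" could look like (hypothetical shape with simplified types, not necessarily the merged code), a lookup into `mapStatusesDeleted` can fall back to a linear scan by `mapId` instead of going through `mapIdToMapIndex`:

```scala
// Hypothetical sketch: resolve a deleted map status by mapId with a linear scan,
// so mapIdToMapIndex no longer needs to keep entries for deleted statuses.
// MapStatusLike stands in for Spark's MapStatus here.
trait MapStatusLike { def mapId: Long }

def findDeletedIndex(mapStatusesDeleted: Array[MapStatusLike], mapId: Long): Option[Int] = {
  val index = mapStatusesDeleted.indexWhere(s => s != null && s.mapId == mapId)
  if (index >= 0) Some(index) else None
}
```

This keeps the (rare) recovery path working while letting `mapIdToMapIndex` track only live map statuses.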
dongjoon-hyun
left a comment
BTW, @Ngone51, this should have a new JIRA ID because the original one is for Apache Spark 3.5.0. This PR cannot be a follow-up of an already-released JIRA issue.
@dongjoon-hyun Thanks for the reminder. Have created a separate ticket: SPARK-48394.
mridulm
left a comment
Looks good to me, thanks for fixing this @Ngone51 !
dongjoon-hyun
left a comment
+1, LGTM (with one minor test case prefix comment).
```scala
    }
  }

  test("mapIdToMapIndex should cleanup unused mapIndexes after removeOutputsByFilter") {
```
Please use the JIRA ID as the test name prefix for this bug fix.
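For example, following the usual Spark convention of prefixing the test name with the JIRA ID (the exact final test name in the PR may differ):

```scala
test("SPARK-48394: mapIdToMapIndex should cleanup unused mapIndexes after removeOutputsByFilter") {
  // ...
}
```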
@dongjoon-hyun @mridulm Sorry, can we make it a bug and backport it to the maintenance release branches? This actually caused an issue for us internally. I was pushing a quick fix before realizing this is the root cause. The issue leads to shuffle fetch failures and, in the end, job failure. It happens this way:
```scala
// updateMapOutput
val mapIndex = mapIdToMapIndex.get(mapId)
val mapStatusOpt = mapIndex.map(mapStatuses(_)).flatMap(Option(_))
```
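To illustrate the failure mode (a hedged sketch of my understanding, with simplified stand-in types; the real `MapOutputTracker` logic is more involved): once the map output at a given `mapIndex` is unregistered and a retried attempt registers a new mapstatus at the same index, a stale `mapId -> mapIndex` entry makes the lookup above resolve the old `mapId` to the new attempt's status, so the location update lands on the wrong mapstatus and later shuffle fetches fail.

```scala
import scala.collection.mutable.HashMap

// Simplified stand-in for MapStatus: only the fields needed for the illustration.
case class Status(mapId: Long, var location: String)

val mapStatuses     = new Array[Status](1)
val mapIdToMapIndex = new HashMap[Long, Int]()

// 1. Attempt with mapId = 1 registers at mapIndex 0.
mapStatuses(0) = Status(mapId = 1L, location = "exec-1")
mapIdToMapIndex(1L) = 0

// 2. That output is unregistered (e.g. executor lost), but without this fix
//    the stale mapId -> mapIndex entry stays behind.
mapStatuses(0) = null

// 3. A retried attempt with mapId = 2 registers at the same mapIndex 0.
mapStatuses(0) = Status(mapId = 2L, location = "exec-2")
mapIdToMapIndex(2L) = 0

// 4. An update arrives for the *old* mapId = 1 (e.g. block migration): the stale
//    entry resolves it to index 0 and overwrites the new attempt's location.
mapIdToMapIndex.get(1L).map(mapStatuses(_)).foreach(_.location = "exec-3")

println(mapStatuses(0)) // Status(2,exec-3): wrong location for mapId 2 => fetch failure
```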
Looks like a valid bug to me - can you raise a backport please?

Thanks. Created the backport PR (#46747) for branch-3.5.
This PR cleans up `mapIdToMapIndex` when the corresponding mapstatus is unregistered, in three places:

* `removeMapOutput`
* `removeOutputsByFilter`
* `addMapOutput` (old mapstatus overwritten)

There is only one valid mapstatus for a given `mapIndex` at any time in Spark, and `mapIdToMapIndex` should follow the same rule to avoid inconsistency. No user-facing change; tested via unit tests; no generative AI tooling was used.

Closes apache#46706 from Ngone51/SPARK-43043-followup.

Lead-authored-by: Yi Wu <[email protected]>
Co-authored-by: wuyi <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
This PR backports #46706 to branch-3.5.

### What changes were proposed in this pull request?

This PR cleans up `mapIdToMapIndex` when the corresponding mapstatus is unregistered, in three places:

* `removeMapOutput`
* `removeOutputsByFilter`
* `addMapOutput` (old mapstatus overwritten)

### Why are the changes needed?

There is only one valid mapstatus for a given `mapIndex` at any time in Spark. `mapIdToMapIndex` should follow the same rule to avoid inconsistency.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46768 from Ngone51/SPARK-48394-3.5.

Authored-by: Yi Wu <[email protected]>
Signed-off-by: Kent Yao <[email protected]>

### What changes were proposed in this pull request?

This PR cleans up `mapIdToMapIndex` when the corresponding mapstatus is unregistered, in three places:

* `removeMapOutput`
* `removeOutputsByFilter`
* `addMapOutput` (old mapstatus overwritten)

### Why are the changes needed?

There is only one valid mapstatus for a given `mapIndex` at any time in Spark. `mapIdToMapIndex` should follow the same rule to avoid inconsistency.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.
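A minimal sketch of the cleanup pattern described above (hypothetical, simplified shape; not a copy of the merged `MapOutputTracker` code): whenever a mapstatus is removed or overwritten, the corresponding `mapId` entry is dropped from `mapIdToMapIndex`, so the map never points at an index whose status has since changed.

```scala
import scala.collection.mutable.HashMap

// Hypothetical, simplified bookkeeping; MapStatusLike stands in for Spark's MapStatus.
case class MapStatusLike(mapId: Long)

class ShuffleBookkeeping(numMaps: Int) {
  val mapStatuses = new Array[MapStatusLike](numMaps)
  val mapIdToMapIndex = new HashMap[Long, Int]()

  // Register a map output; if an old status is overwritten, drop its stale mapId entry.
  def addMapOutput(mapIndex: Int, status: MapStatusLike): Unit = {
    Option(mapStatuses(mapIndex)).foreach(old => mapIdToMapIndex.remove(old.mapId))
    mapStatuses(mapIndex) = status
    mapIdToMapIndex(status.mapId) = mapIndex
  }

  // Unregister a map output and keep mapIdToMapIndex consistent with mapStatuses.
  def removeMapOutput(mapIndex: Int): Unit = {
    Option(mapStatuses(mapIndex)).foreach { old =>
      mapIdToMapIndex.remove(old.mapId)
      mapStatuses(mapIndex) = null
    }
  }
}
```

The same "drop the entry when the status goes away" rule would apply to bulk removal paths such as `removeOutputsByFilter`.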