-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-43404][SS] Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error #41089
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… store to avoid id mismatch error
|
@HeartSaVioR - please take a look. Thx |
HeartSaVioR
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice finding! Only a minor comment.
| // reused for this version | ||
| logInfo(s"Saving RocksDB files to DFS for $version") | ||
| val prevFilesToSizes = versionToRocksDBFiles.values.asScala.flatten.map { f => | ||
| val prevFilesToSizes = versionToRocksDBFiles.asScala.filterKeys(_ != version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ < version to fit with above code comment.
Btw, my preference is actually invalidating entries of cache for the version if we figure out the version will be reprocessed. (Say, invalidating all entries of cache for versions where version in entry > parameter in RocksDB.load().) But I agree this is broader fix which may not be necessary, since listing up cache entries are only performed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Yea, I thought about this too. But listing happens only here and we prob don't expect for this to happen too often. So went with the point fix for now.
HeartSaVioR
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice finding! Only a minor comment.
HeartSaVioR
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
Thanks! Merging to master. |
…tate store to avoid id mismatch error
### What changes were proposed in this pull request?
Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error
### Why are the changes needed?
In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for `versionToRocksDBFiles` for the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error.
```
Mismatch in unique ID on table file 24220. Expected: {9692563551998415634,4655083329411385714} Actual: {9692563551998415639,10299185534092933087} in file /local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212
```
This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test
RocksDBSuite
```
[info] Run completed in 35 seconds, 995 milliseconds.
[info] Total number of tests run: 33
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
Closes apache#41089 from anishshri-db/task/SPARK-43404.
Authored-by: Anish Shrigondekar <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
|
Should this be backported to 3.4? We just encountered this as well and it doesn't seem like there's anyway to recover from it beyond deleting or hacking your checkpoint |
|
@Kimahriman Why not? I missed that this is a non-trivial bugfix which is expected to port back to affected version(s). Thanks for the reminder. @anishshri-db Shall we have a cherry-pick PR against branch-3.4? Thanks in advance! |
|
The error happens on the read during the batch after a checkpoint partition gets "corrupted" (where the ID in the metadata and the SST file don't match). So once you encounter this error as far as I know your checkpoint is corrupted beyond repair. Can only try to re-run the previous version by deleting the |
|
I'll take care of it. If the new PR ends up with pure cherry-pick I can probably self approve and go. |
…tate store to avoid id mismatch error
### What changes were proposed in this pull request?
Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error
### Why are the changes needed?
In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for `versionToRocksDBFiles` for the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error.
```
Mismatch in unique ID on table file 24220. Expected: {9692563551998415634,4655083329411385714} Actual: {9692563551998415639,10299185534092933087} in file /local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212
```
This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test
RocksDBSuite
```
[info] Run completed in 35 seconds, 995 milliseconds.
[info] Total number of tests run: 33
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
Closes apache#41089 from anishshri-db/task/SPARK-43404.
Authored-by: Anish Shrigondekar <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
|
PR for 3.4: #41530 |
…sDB state store to avoid id mismatch error NOTE: This ports back the commit d3b9f4e (PR #41089) to branch-3.4. This is a clean cherry-pick. ### What changes were proposed in this pull request? Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error ### Why are the changes needed? In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for `versionToRocksDBFiles` for the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error. ``` Mismatch in unique ID on table file 24220. Expected: {9692563551998415634,4655083329411385714} Actual: {9692563551998415639,10299185534092933087} in file /local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212 ``` This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit test RocksDBSuite ``` [info] Run completed in 35 seconds, 995 milliseconds. [info] Total number of tests run: 33 [info] Suites: completed 1, aborted 0 [info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. ``` Closes #41530 from HeartSaVioR/SPARK-43404-3.4. Authored-by: Anish Shrigondekar <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]>
…sDB state store to avoid id mismatch error NOTE: This ports back the commit apache@d3b9f4e (PR apache#41089) to branch-3.4. This is a clean cherry-pick. ### What changes were proposed in this pull request? Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error ### Why are the changes needed? In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for `versionToRocksDBFiles` for the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error. ``` Mismatch in unique ID on table file 24220. Expected: {9692563551998415634,4655083329411385714} Actual: {9692563551998415639,10299185534092933087} in file /local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212 ``` This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit test RocksDBSuite ``` [info] Run completed in 35 seconds, 995 milliseconds. [info] Total number of tests run: 33 [info] Suites: completed 1, aborted 0 [info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. ``` Closes apache#41530 from HeartSaVioR/SPARK-43404-3.4. Authored-by: Anish Shrigondekar <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]>
…sDB state store to avoid id mismatch error NOTE: This ports back the commit apache@d3b9f4e (PR apache#41089) to branch-3.4. This is a clean cherry-pick. ### What changes were proposed in this pull request? Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error ### Why are the changes needed? In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for `versionToRocksDBFiles` for the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error. ``` Mismatch in unique ID on table file 24220. Expected: {9692563551998415634,4655083329411385714} Actual: {9692563551998415639,10299185534092933087} in file /local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212 ``` This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit test RocksDBSuite ``` [info] Run completed in 35 seconds, 995 milliseconds. [info] Total number of tests run: 33 [info] Suites: completed 1, aborted 0 [info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. ``` Closes apache#41530 from HeartSaVioR/SPARK-43404-3.4. Authored-by: Anish Shrigondekar <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]>
…sDB state store to avoid id mismatch error NOTE: This ports back the commit apache@d3b9f4e (PR apache#41089) to branch-3.4. This is a clean cherry-pick. ### What changes were proposed in this pull request? Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error ### Why are the changes needed? In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for `versionToRocksDBFiles` for the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error. ``` Mismatch in unique ID on table file 24220. Expected: {9692563551998415634,4655083329411385714} Actual: {9692563551998415639,10299185534092933087} in file /local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212 ``` This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit test RocksDBSuite ``` [info] Run completed in 35 seconds, 995 milliseconds. [info] Total number of tests run: 33 [info] Suites: completed 1, aborted 0 [info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. ``` Closes apache#41530 from HeartSaVioR/SPARK-43404-3.4. Authored-by: Anish Shrigondekar <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]>
What changes were proposed in this pull request?
Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error
Why are the changes needed?
In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for
versionToRocksDBFilesfor the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error.This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test
RocksDBSuite