Skip to content

Conversation

@anishshri-db
Copy link
Contributor

What changes were proposed in this pull request?

Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error

Why are the changes needed?

In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for versionToRocksDBFiles for the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error.

Mismatch in unique ID on table file 24220. Expected: {9692563551998415634,4655083329411385714} Actual: {9692563551998415639,10299185534092933087} in file /local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212

This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test

RocksDBSuite

[info] Run completed in 35 seconds, 995 milliseconds.
[info] Total number of tests run: 33
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

@anishshri-db anishshri-db changed the title [SPARK-43404] Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error [SPARK-43404][SS] Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error May 8, 2023
@anishshri-db
Copy link
Contributor Author

@HeartSaVioR - please take a look. Thx

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice finding! Only a minor comment.

// reused for this version
logInfo(s"Saving RocksDB files to DFS for $version")
val prevFilesToSizes = versionToRocksDBFiles.values.asScala.flatten.map { f =>
val prevFilesToSizes = versionToRocksDBFiles.asScala.filterKeys(_ != version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_ < version to fit with above code comment.

Btw, my preference is actually invalidating entries of cache for the version if we figure out the version will be reprocessed. (Say, invalidating all entries of cache for versions where version in entry > parameter in RocksDB.load().) But I agree this is broader fix which may not be necessary, since listing up cache entries are only performed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Yea, I thought about this too. But listing happens only here and we prob don't expect for this to happen too often. So went with the point fix for now.

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice finding! Only a minor comment.

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@HeartSaVioR
Copy link
Contributor

Thanks! Merging to master.

LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request May 10, 2023
…tate store to avoid id mismatch error

### What changes were proposed in this pull request?
Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error

### Why are the changes needed?
In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for `versionToRocksDBFiles` for the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error.

```
Mismatch in unique ID on table file 24220. Expected: {9692563551998415634,4655083329411385714} Actual: {9692563551998415639,10299185534092933087} in file /local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212
```

This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

RocksDBSuite
```
[info] Run completed in 35 seconds, 995 milliseconds.
[info] Total number of tests run: 33
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

Closes apache#41089 from anishshri-db/task/SPARK-43404.

Authored-by: Anish Shrigondekar <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
@Kimahriman
Copy link
Contributor

Should this be backported to 3.4? We just encountered this as well and it doesn't seem like there's anyway to recover from it beyond deleting or hacking your checkpoint

@HeartSaVioR
Copy link
Contributor

@Kimahriman Why not? I missed that this is a non-trivial bugfix which is expected to port back to affected version(s). Thanks for the reminder.

@anishshri-db Shall we have a cherry-pick PR against branch-3.4? Thanks in advance!

@Kimahriman
Copy link
Contributor

Kimahriman commented Jun 2, 2023

The error happens on the read during the batch after a checkpoint partition gets "corrupted" (where the ID in the metadata and the SST file don't match). So once you encounter this error as far as I know your checkpoint is corrupted beyond repair. Can only try to re-run the previous version by deleting the commit entry, but that would only work if your downstream is idempotent, which in our case writing to a Delta table downstream is not (without trying to rollback the table version which is an option), or if you are ok with potential duplicates

@HeartSaVioR
Copy link
Contributor

I'll take care of it. If the new PR ends up with pure cherry-pick I can probably self approve and go.

HeartSaVioR pushed a commit to HeartSaVioR/spark that referenced this pull request Jun 9, 2023
…tate store to avoid id mismatch error

### What changes were proposed in this pull request?
Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error

### Why are the changes needed?
In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for `versionToRocksDBFiles` for the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error.

```
Mismatch in unique ID on table file 24220. Expected: {9692563551998415634,4655083329411385714} Actual: {9692563551998415639,10299185534092933087} in file /local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212
```

This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

RocksDBSuite
```
[info] Run completed in 35 seconds, 995 milliseconds.
[info] Total number of tests run: 33
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

Closes apache#41089 from anishshri-db/task/SPARK-43404.

Authored-by: Anish Shrigondekar <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
@HeartSaVioR
Copy link
Contributor

PR for 3.4: #41530

HeartSaVioR pushed a commit that referenced this pull request Jun 9, 2023
…sDB state store to avoid id mismatch error

NOTE: This ports back the commit d3b9f4e (PR #41089) to branch-3.4. This is a clean cherry-pick.

### What changes were proposed in this pull request?
Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error

### Why are the changes needed?
In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for `versionToRocksDBFiles` for the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error.

```
Mismatch in unique ID on table file 24220. Expected: {9692563551998415634,4655083329411385714} Actual: {9692563551998415639,10299185534092933087} in file /local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212
```

This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

RocksDBSuite
```
[info] Run completed in 35 seconds, 995 milliseconds.
[info] Total number of tests run: 33
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

Closes #41530 from HeartSaVioR/SPARK-43404-3.4.

Authored-by: Anish Shrigondekar <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
…sDB state store to avoid id mismatch error

NOTE: This ports back the commit apache@d3b9f4e (PR apache#41089) to branch-3.4. This is a clean cherry-pick.

### What changes were proposed in this pull request?
Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error

### Why are the changes needed?
In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for `versionToRocksDBFiles` for the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error.

```
Mismatch in unique ID on table file 24220. Expected: {9692563551998415634,4655083329411385714} Actual: {9692563551998415639,10299185534092933087} in file /local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212
```

This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

RocksDBSuite
```
[info] Run completed in 35 seconds, 995 milliseconds.
[info] Total number of tests run: 33
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

Closes apache#41530 from HeartSaVioR/SPARK-43404-3.4.

Authored-by: Anish Shrigondekar <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
GladwinLee pushed a commit to lyft/spark that referenced this pull request Oct 10, 2023
…sDB state store to avoid id mismatch error

NOTE: This ports back the commit apache@d3b9f4e (PR apache#41089) to branch-3.4. This is a clean cherry-pick.

### What changes were proposed in this pull request?
Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error

### Why are the changes needed?
In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for `versionToRocksDBFiles` for the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error.

```
Mismatch in unique ID on table file 24220. Expected: {9692563551998415634,4655083329411385714} Actual: {9692563551998415639,10299185534092933087} in file /local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212
```

This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

RocksDBSuite
```
[info] Run completed in 35 seconds, 995 milliseconds.
[info] Total number of tests run: 33
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

Closes apache#41530 from HeartSaVioR/SPARK-43404-3.4.

Authored-by: Anish Shrigondekar <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
catalinii pushed a commit to lyft/spark that referenced this pull request Oct 10, 2023
…sDB state store to avoid id mismatch error

NOTE: This ports back the commit apache@d3b9f4e (PR apache#41089) to branch-3.4. This is a clean cherry-pick.

### What changes were proposed in this pull request?
Skip reusing sst file for same version of RocksDB state store to avoid id mismatch error

### Why are the changes needed?
In case of task retry on the same executor, its possible that the original task completed the phase of creating the SST files and uploading them to the object store. In this case, we also might have added an entry to the in-memory map for `versionToRocksDBFiles` for the given version. When the retry task creates the local checkpoint, its possible the file name and size is the same, but the metadata ID embedded within the file may be different. So, when we try to load this version on successful commit, the metadata zip file points to the old SST file which results in a RocksDB mismatch id error.

```
Mismatch in unique ID on table file 24220. Expected: {9692563551998415634,4655083329411385714} Actual: {9692563551998415639,10299185534092933087} in file /local_disk0/spark-f58a741d-576f-400c-9b56-53497745ac01/executor-18e08e59-20e8-4a00-bd7e-94ad4599150b/spark-5d980399-3425-4951-894a-808b943054ea/StateStoreId(opId=2147483648,partId=53,name=default)-d89e082e-4e33-4371-8efd-78d927ad3ba3/workingDir-9928750e-f648-4013-a300-ac96cb6ec139/MANIFEST-024212
```

This change avoids reusing files for the same version on the same host based on the map entries to reduce the chance of running into the error above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

RocksDBSuite
```
[info] Run completed in 35 seconds, 995 milliseconds.
[info] Total number of tests run: 33
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 33, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

Closes apache#41530 from HeartSaVioR/SPARK-43404-3.4.

Authored-by: Anish Shrigondekar <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants