-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-50808][CORE] Fix issue in writeAll with mixed types not getting written properly #49479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-50808][CORE] Fix issue in writeAll with mixed types not getting written properly #49479
Conversation
|
+CC @HeartSaVioR (who reviewed #29149), thanks ! |
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java
Outdated
Show resolved
Hide resolved
common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java
Outdated
Show resolved
Hide resolved
| fullList.addAll(type2List); | ||
|
|
||
| db.writeAll(fullList); | ||
| for (CustomType1 value : type1List) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to understand how this test fails. (Sorry I lost the whole context.)
Because if it's just that writing for the same entity happens multiple times, it depends on how we constuct the key (since LevelDB/RocksDB put is upsert). If the same entity creates the same key, I don't see how this could fail. Maybe even further, even if multiple writes come up with different keys, if we can read any of them, the result should be the same. Seems like I'm unaware of how this works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this test, I purposely ensure both CustomType1 and CustomType2 are written together in a call to writeAll - and subsequently try to read them - which ends up with failing when deserializing CustomType2 (as CustomType1 was written to those records).
Without the fix, it results in deserialization error during read.
For LevelDBSuite, we see this for example - since it is writing CustomType1 instead of CustomType2 as value:
[ERROR] org.apache.spark.util.kvstore.LevelDBSuite.testMultipleTypesWriteAll -- Time elapsed: 0.024 s <<< ERROR!
com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException:
Unrecognized field "name" (class org.apache.spark.util.kvstore.CustomType2), not marked as ignorable (3 known properties: "id", "parentId", "key"])
at [Source: REDACTED (`StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION` disabled); line: 1, column: 34] (through reference chain: org.apache.spark.util.kvstore.CustomType2["name"])
at com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1153)
at com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2241)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1821)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1799)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:316)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:342)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4917)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3897)
at org.apache.spark.util.kvstore.KVStoreSerializer.deserialize(KVStoreSerializer.java:70)
at org.apache.spark.util.kvstore.LevelDB.get(LevelDB.java:136)
at org.apache.spark.util.kvstore.LevelDB.read(LevelDB.java:148)
at org.apache.spark.util.kvstore.LevelDBSuite.testMultipleTypesWriteAll(LevelDBSuite.java:452)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right. I got the point. They are written with "wrong type information". Got it.
HeartSaVioR
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 The change is obvious and the test performs the validation clearly. Thanks!
…g written properly ### What changes were proposed in this pull request? Fix a bug with LevelDB/RocksDB's batched write method (`writeAll`) not using the correct list to serialize values. Luckily, existing use of this api is for the same class - which avoids this bug in practice. This PR fixes the issue to ensure the api contract works as expected, and avoids issues in future. ### Why are the changes needed? Fix existing bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New test introduced. Test fails without proposed changes. ### Was this patch authored or co-authored using generative AI tooling? No Closes #49479 from mridulm/mridulm/fix-kvstore-WriteAll-multiple-types. Authored-by: Mridul Muralidharan <mridulatgmail.com> Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com> (cherry picked from commit 98d9968) Signed-off-by: Mridul Muralidharan <mridulatgmail.com>
|
Thanks @HeartSaVioR ! |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @mridulm and @HeartSaVioR !
…g written properly ### What changes were proposed in this pull request? Fix a bug with LevelDB/RocksDB's batched write method (`writeAll`) not using the correct list to serialize values. Luckily, existing use of this api is for the same class - which avoids this bug in practice. This PR fixes the issue to ensure the api contract works as expected, and avoids issues in future. ### Why are the changes needed? Fix existing bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New test introduced. Test fails without proposed changes. ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#49479 from mridulm/mridulm/fix-kvstore-WriteAll-multiple-types. Authored-by: Mridul Muralidharan <mridulatgmail.com> Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com> (cherry picked from commit 1801b6f) Signed-off-by: Mridul Muralidharan <mridulatgmail.com>
What changes were proposed in this pull request?
Fix a bug with LevelDB/RocksDB's batched write method (
writeAll) not using the correct list to serialize values.Luckily, existing use of this api is for the same class - which avoids this bug in practice.
This PR fixes the issue to ensure the api contract works as expected, and avoids issues in future.
Why are the changes needed?
Fix existing bug.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New test introduced. Test fails without proposed changes.
Was this patch authored or co-authored using generative AI tooling?
No