Conversation

mridulm (Contributor) commented Jan 14, 2025

What changes were proposed in this pull request?

Fix a bug where LevelDB/RocksDB's batched write method (writeAll) did not use the correct list when serializing values.
Fortunately, all existing callers pass values of a single class, which masks this bug in practice.
This PR fixes the issue so the API contract behaves as expected, and avoids problems in the future.
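The bug pattern can be sketched in isolation (hypothetical names, with string tagging standing in for the real serializer; this is not the actual Spark source): `writeAll` groups its input by class so each group can be written with that class's metadata, but the buggy inner loop serialized the *full* input list for every group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the bug pattern (hypothetical names, not the actual
// Spark source). Values are grouped by class; the buggy variant iterates the
// full input list inside each group instead of the per-class sublist.
public class WriteAllSketch {

  private static Map<Class<?>, List<Object>> groupByClass(List<?> values) {
    Map<Class<?>, List<Object>> groups = new LinkedHashMap<>();
    for (Object v : values) {
      groups.computeIfAbsent(v.getClass(), k -> new ArrayList<>()).add(v);
    }
    return groups;
  }

  // BUG: serializes every input value once per class group, so a group can
  // end up holding payloads produced from values of a different class.
  static List<String> serializeGroupsBuggy(List<?> values) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<Class<?>, List<Object>> e : groupByClass(values).entrySet()) {
      List<Object> list = e.getValue(); // the list that SHOULD be used
      for (Object value : values) {     // wrong list: should be `list`
        out.add(e.getKey().getSimpleName() + ":" + value);
      }
    }
    return out;
  }

  // FIX: serialize only the values belonging to the current group.
  static List<String> serializeGroupsFixed(List<?> values) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<Class<?>, List<Object>> e : groupByClass(values).entrySet()) {
      for (Object value : e.getValue()) {
        out.add(e.getKey().getSimpleName() + ":" + value);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<?> mixed = List.of(1, 2, "a"); // two classes: Integer and String
    // Buggy: 2 groups x 3 inputs = 6 entries, and the String group is
    // filled from Integer values (the wrong-type write fixed in this PR).
    System.out.println(serializeGroupsBuggy(mixed)); // 6 entries
    System.out.println(serializeGroupsFixed(mixed)); // 3 entries
  }
}
```

With a single-class input the two variants produce identical output, which is why existing callers never hit the bug.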

Why are the changes needed?

Fix existing bug.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New test introduced. Test fails without proposed changes.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the CORE label Jan 14, 2025
mridulm (Contributor, Author) commented Jan 14, 2025

+CC @HeartSaVioR (who reviewed #29149), thanks!

Review thread on the new test (quoted diff context):

    fullList.addAll(type2List);

    db.writeAll(fullList);
    for (CustomType1 value : type1List) {
Contributor commented:

I'd like to understand how this test fails. (Sorry, I lost the whole context.)

Because if it's just that writes for the same entity happen multiple times, it depends on how we construct the key (since LevelDB/RocksDB put is an upsert). If the same entity produces the same key, I don't see how this could fail. And even if multiple writes produce different keys, as long as we can read any of them, the result should be the same. It seems I'm missing how this works.
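The upsert point can be illustrated with a plain HashMap standing in for the store (a hypothetical stand-in, not the LevelDB/RocksDB API): rewriting a key with the same payload is invisible on read, which is why duplicate same-class writes would indeed be harmless. The actual bug instead stored a payload of the wrong type under the key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a KV store's upsert semantics: put() on an
// existing key overwrites the previous value, so writing the same
// (key, value) pair twice is indistinguishable from writing it once.
public class UpsertSketch {
  public static void main(String[] args) {
    Map<String, String> store = new HashMap<>();
    store.put("key/1", "{\"id\":\"1\"}");
    store.put("key/1", "{\"id\":\"1\"}"); // duplicate write: a harmless no-op
    System.out.println(store.get("key/1"));

    // The bug was different: the payload stored under key/2 was produced
    // from the wrong list, so the bytes describe another type entirely.
    store.put("key/2", "{\"name\":\"wrong type\"}");
    System.out.println(store.get("key/2"));
  }
}
```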

mridulm (Contributor, Author) commented Jan 16, 2025

In this test, I purposely write both CustomType1 and CustomType2 together in a single call to writeAll, and subsequently try to read them back. This fails when deserializing CustomType2, because CustomType1 records were written in its place.

Without the fix, the read results in a deserialization error. For LevelDBSuite, for example, we see the following, since CustomType1 was written as the value instead of CustomType2:

[ERROR] org.apache.spark.util.kvstore.LevelDBSuite.testMultipleTypesWriteAll -- Time elapsed: 0.024 s <<< ERROR!
com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException:
Unrecognized field "name" (class org.apache.spark.util.kvstore.CustomType2), not marked as ignorable (3 known properties: "id", "parentId", "key"])
 at [Source: REDACTED (`StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION` disabled); line: 1, column: 34] (through reference chain: org.apache.spark.util.kvstore.CustomType2["name"])
        at com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
        at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1153)
        at com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2241)
        at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1821)
        at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1799)
        at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:316)
        at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
        at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:342)
        at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4917)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3897)
        at org.apache.spark.util.kvstore.KVStoreSerializer.deserialize(KVStoreSerializer.java:70)
        at org.apache.spark.util.kvstore.LevelDB.get(LevelDB.java:136)
        at org.apache.spark.util.kvstore.LevelDB.read(LevelDB.java:148)
        at org.apache.spark.util.kvstore.LevelDBSuite.testMultipleTypesWriteAll(LevelDBSuite.java:452)
        at java.base/java.lang.reflect.Method.invoke(Method.java:569)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
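The failure mode in the stack trace can be mimicked without Jackson (a hypothetical stand-in; in the real store, values are JSON and Jackson raises UnrecognizedPropertyException when the stored fields don't match the target class): a payload written from the wrong class fails when read back as the expected class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the read path: each payload is tagged with the
// class it was serialized from, and a read as a different class throws,
// mirroring the deserialization error seen in the real store.
public class ReadMismatchSketch {
  private final Map<String, String> store = new HashMap<>();

  void write(String key, Object value) {
    // The bug effectively did this for some keys: it stored a payload
    // produced from a value of the wrong class.
    store.put(key, value.getClass().getName() + "|" + value);
  }

  <T> String read(String key, Class<T> klass) {
    String payload = store.get(key);
    String writtenAs = payload.substring(0, payload.indexOf('|'));
    if (!writtenAs.equals(klass.getName())) {
      throw new IllegalStateException(
          "stored as " + writtenAs + ", read as " + klass.getName());
    }
    return payload.substring(payload.indexOf('|') + 1);
  }

  public static void main(String[] args) {
    ReadMismatchSketch db = new ReadMismatchSketch();
    db.write("2", 42);            // an Integer payload lands under key "2"
    try {
      db.read("2", String.class); // but the caller expects a String
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```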

Contributor replied:

Ah right. I got the point. They are written with "wrong type information". Got it.

HeartSaVioR (Contributor) left a review comment:

+1 The change is obvious and the test performs the validation clearly. Thanks!

@mridulm mridulm closed this in 98d9968 Jan 18, 2025
mridulm pushed a commit that referenced this pull request Jan 18, 2025
…g written properly

### What changes were proposed in this pull request?

Fix a bug where LevelDB/RocksDB's batched write method (`writeAll`) did not use the correct list when serializing values.
Fortunately, all existing callers pass values of a single class, which masks this bug in practice.
This PR fixes the issue so the API contract behaves as expected, and avoids problems in the future.

### Why are the changes needed?
Fix existing bug.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
New test introduced. Test fails without proposed changes.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #49479 from mridulm/mridulm/fix-kvstore-WriteAll-multiple-types.

Authored-by: Mridul Muralidharan <mridul@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul@gmail.com>
(cherry picked from commit 98d9968)
Signed-off-by: Mridul Muralidharan <mridul@gmail.com>
mridulm (Contributor, Author) commented Jan 18, 2025

Thanks @HeartSaVioR!
Merged to master/branch-4.0

dongjoon-hyun (Member) left a comment:

Thank you, @mridulm and @HeartSaVioR !

zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
…g written properly

(commit message identical to the merge commit above)

Closes apache#49479 from mridulm/mridulm/fix-kvstore-WriteAll-multiple-types.

Authored-by: Mridul Muralidharan <mridul@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul@gmail.com>
(cherry picked from commit 1801b6f)
Signed-off-by: Mridul Muralidharan <mridul@gmail.com>