-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Core : remove the UnmodifiableMap #2258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b357568 to
5268dcf
Compare
| if (map != null) { | ||
| Map<K, V> copy = Maps.newHashMapWithExpectedSize(map.size()); | ||
| copy.putAll(map); | ||
| return Collections.unmodifiableMap(copy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that this was for defensive copy, are there any other unmodifiableMap/immutable map implementations that do not cause kryo serialization problem? Or is it possible to register extra kryo serializers in Flink so that we don't potentially run into similar problem in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly, we use the CombinedScanTask class in many places, such as RowDataIterator for reading data. The bottom layer of the CombinedScanTask class contains BaseFile. When we doing deserialize, because we cannot put the data into unmodifiableMap, it causes an exception , the deserialization failed, causing the program to fail. I tried other methods, but I did not find a suitable solution for the time being
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remove the unmodifiableMap ,it is work fine,But I am not sure if it will affect other functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yyanyy is right. This is a method for defensive copies. I think it can be fine to do this, but we want to provide the same assurance that the data won't be modified. We can do that by moving the unmodifiable map creation to other places, but that is a big change and will result in a lot more object creation.
Another option is to use an alternative wrapper class that can be cleanly serialized. Elsewhere in this class we use SerializableByteBufferMap. Does that work with Kryo? If so, can we use the same pattern here?
That uses a proxy object. If that works, then the proxy object can be used to add items to the map and then the actual map could be unmodifiable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will find again if there are other serializers, or if we can custom serializers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came across #775 which seems to be solving the same problem that could potentially be reused here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to fix this BaseFile's kryo serialization issues. I encountered the same issue when preparing the rewrite action for v2 ( https://github.com/apache/iceberg/pull/2303/files#diff-b51f408c19404436a1cf77f1f31b0a87e5ec1404295f576fc3ca90849de90da0R436). I read the patch #775, it just change the com.google.common.collect.ImmutableList to java.util.Collections.UnmodifiableList, both them does not support the add method, so how did the PR fix the kryo issue ? I did not get the point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way that I could think of is replacing the Map<Integer, Long> to an self-defined immutable map which won't implement the java.util.Map interface, in this way we don't have to depend on the kryo's internal MapSerializer (It will call the put to add element into immutable collection).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, for this fix, I think we have to provide an unit test to reproduce the kryo issue so that we don't have to fix it again and again in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Kryo in Spark somehow manages to handle unmodifiable collections. #775 came with a test and I also tested this recently in other places.
Now, in
BaseFile, there are some metrics, likenullValueCounts,nanValueCountsetc, they useUnmodifiableMap, resulting in flink serialization problems. For example, theRewriteDataFilesActionand streaming read function will failed,the issue is here(#2219 ). I checked some information and found that spark has similar problems if using kryo.,so I remove the
UnmodifiableMap, and I tested that, theRewriteDataFilesActionwill not throw exception.what do you think about this ? @rdblue