
Conversation

@zhongyujiang (Contributor) commented Sep 7, 2021

This PR fixes a ClassCastException thrown when using Flink to read map and array data from Parquet data files. It adds a buildGenericMapData() method to FlinkParquetReaders$ReusableMapData and a buildGenericArrayData() method to FlinkParquetReaders$ReusableArrayData, so that the readers return GenericMapData and GenericArrayData instances, which Flink's TypeSerializer can recognize.
Related to #3080.
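
For illustration, here is a minimal sketch of the approach described above (the class, field, and parameter names are assumptions for this example, not the exact PR diff): the reusable readers copy their current contents into Flink's generic containers before returning, so the data survives Flink's serializers.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.flink.table.data.GenericArrayData;
    import org.apache.flink.table.data.GenericMapData;

    class ReusableContainersSketch {
      // Hypothetical helper: copy the first numElements reused slots into a
      // GenericArrayData, which Flink's serializers recognize.
      static GenericArrayData buildGenericArrayData(Object[] values, int numElements) {
        Object[] copy = new Object[numElements];
        System.arraycopy(values, 0, copy, 0, numElements);
        return new GenericArrayData(copy);
      }

      // Hypothetical helper: pair up reused key/value slots into a GenericMapData.
      static GenericMapData buildGenericMapData(Object[] keys, Object[] values, int numElements) {
        Map<Object, Object> map = new HashMap<>(numElements);
        for (int i = 0; i < numElements; i++) {
          map.put(keys[i], values[i]);
        }
        return new GenericMapData(map);
      }
    }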

@github-actions github-actions bot added the flink label Sep 7, 2021
@zhongyujiang (Contributor, Author)
@openinx @chenjunjiedada, could you help to review this PR? Thanks.

@coolderli (Contributor)
Makes sense to me!

@zlzhang0122
LGTM!

import org.apache.flink.table.data.StringData;
import org.apache.flink.table.data.TimestampData;

import org.apache.flink.table.data.*;
(Member) commented on this import:
In Iceberg we usually don't use the * wildcard to import a whole package; it's clearer to import each class explicitly, one by one.
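
For example, instead of the wildcard, the file would list only the classes it actually uses (the exact set below is illustrative, not the PR's real import list):

    import org.apache.flink.table.data.ArrayData;
    import org.apache.flink.table.data.GenericArrayData;
    import org.apache.flink.table.data.GenericMapData;
    import org.apache.flink.table.data.MapData;
    import org.apache.flink.table.data.StringData;
    import org.apache.flink.table.data.TimestampData;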

@openinx openinx added the bug Something isn't working label Sep 7, 2021
@openinx (Member) left a comment:

Thanks for reporting this bug, @zhong-yj! The root cause is that Apache Flink couldn't serialize and deserialize customized MapData and ArrayData implementations, so technically this is an Apache Flink bug, not an Iceberg bug. It's reasonable for us to keep our own ReusableArrayData to reuse array data and save memory, and we shouldn't copy all the elements out of the reusable array data just to work around the serialization issue.

Actually, Apache Flink 1.13 has a fix for this issue: https://issues.apache.org/jira/browse/FLINK-21247. I think you can try upgrading Flink to 1.13 and check whether the bug still reproduces.

Though the fix exists in Flink 1.13, this still exposes an important gap in our Apache Iceberg Flink module: we don't have SQL unit tests covering nested data. I think it's worth publishing a PR to add that test coverage (which may require merging #2629 first).
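
As a rough illustration of the kind of coverage meant here (the table definition, connector options, and data values are assumptions for this sketch, not an existing Iceberg test), a nested-data SQL round trip could look like:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    class NestedDataSqlSketch {
      public static void main(String[] args) throws Exception {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.newInstance().inBatchMode().build());

        // Hypothetical Iceberg-backed table with nested array and map columns.
        tEnv.executeSql(
            "CREATE TABLE nested_t (id BIGINT, tags ARRAY<STRING>, props MAP<STRING, STRING>) "
                + "WITH ('connector'='iceberg', 'catalog-name'='hive_catalog')");
        tEnv.executeSql("INSERT INTO nested_t VALUES (1, ARRAY['a', 'b'], MAP['k', 'v'])").await();

        // Reading the nested columns back is where the ClassCastException showed up.
        tEnv.executeSql("SELECT id, tags, props FROM nested_t").print();
      }
    }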

@openinx (Member) commented Sep 8, 2021

I tried this by building the flink-iceberg-runtime.jar and running the SQL on a Flink 1.13.2 local cluster. Everything works as expected (see the attached screenshot).

Let's just close this PR and instead upgrade the Flink version to 1.13.2 (#2629). Closing this now. Thanks @zhong-yj for the report and contribution!

@openinx (Member) commented Sep 27, 2021

This bug has been addressed now that we've successfully integrated Flink 1.13.2 here: #3116

@zhongyujiang zhongyujiang deleted the fix-flink-read-mapdata-and-arraydata-error branch April 26, 2022 02:02