Support iceberg reader for hive catalog and data stored on HDFS #2225
Conversation
cc @openinx, could you help review the logic related to Iceberg? Thanks.
fe/fe-core/src/main/java/com/starrocks/analysis/CreateResourceStmt.java
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergTable.java
fe/fe-core/src/main/java/com/starrocks/external/iceberg/IcebergMetaCache.java
fe/fe-core/src/main/java/com/starrocks/planner/IcebergScanNode.java
fe/fe-core/src/main/java/com/starrocks/sql/optimizer/statistics/StatisticsCalculator.java
fe/fe-core/src/main/java/com/starrocks/analysis/CreateResourceStmt.java
fe/fe-core/src/main/java/com/starrocks/common/StarRocksFEMetaVersion.java
fe/fe-core/src/main/java/com/starrocks/external/iceberg/IcebergUtil.java
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergTable.java
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergTable.java
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergTable.java
fe/fe-core/src/main/java/com/starrocks/sql/optimizer/operator/OperatorType.java
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergResource.java
fe/fe-core/src/main/java/com/starrocks/external/iceberg/IcebergUtil.java
fe/fe-core/src/main/java/com/starrocks/analysis/CreateTableStmt.java
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergResource.java
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergResource.java
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergResource.java
run starrocks_clang-format
run starrocks_fe_unittest
run starrocks_clang-format
run starrocks_fe_unittest
I have resolved the comments from @openinx. Thanks, PTAL.
fe/fe-core/src/main/java/com/starrocks/analysis/CreateTableStmt.java
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergTable.java
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergTable.java
fe/fe-core/src/main/java/com/starrocks/external/iceberg/IcebergHiveCatalog.java
Here we use the default new Configuration() to initialize the HiveCatalog; could we access an Iceberg table that is backed by a Hadoop filesystem? I raise this question because, in my view, the HiveCatalog will initialize its filesystem FileIO using this Hadoop configuration: https://github.com/apache/iceberg/blob/master/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java#L88
If we use a default Hadoop configuration here, how could we access a customized Hadoop FS?
There is a premise that we must put the Hadoop conf on the classpath, just like for Hive external tables. We will refactor this in a common PR that covers both Hive external tables and Iceberg tables.
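For reference, a minimal sketch of that premise (not code from this PR): a plain new Configuration() only sees the cluster settings that sit on the FE's JVM classpath, which is why the Hadoop conf files must be deployed alongside the FE.

```java
import org.apache.hadoop.conf.Configuration;

public class ClasspathConfCheck {
    public static void main(String[] args) {
        // new Configuration() loads core-default.xml plus any core-site.xml found
        // on the JVM classpath; hdfs-site.xml is picked up once the HDFS client
        // classes are initialized. Nothing cluster-specific is read otherwise.
        Configuration conf = new Configuration();

        // Prints the cluster's fs.defaultFS if core-site.xml is on the classpath,
        // otherwise falls back to the local filesystem default.
        System.out.println(conf.get("fs.defaultFS", "file:///"));
    }
}
```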
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergTable.java
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergTable.java
Here we don't handle the nested schema correctly, right? Because I see that nested fields won't be indexed in this icebergColumns map. Maybe you can try Iceberg's TypeUtil#indexByName method to generate the name -> fieldId index.
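To illustrate the suggestion, a minimal sketch using Iceberg's TypeUtil (the schema below is hypothetical): indexByName flattens nested fields into dotted names, so nested columns also get a name -> fieldId entry.

```java
import java.util.Map;
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.TypeUtil;
import org.apache.iceberg.types.Types;

public class NestedFieldIndexSketch {
    public static void main(String[] args) {
        // Hypothetical schema with one top-level column and one nested struct.
        Schema schema = new Schema(
                Types.NestedField.required(1, "id", Types.LongType.get()),
                Types.NestedField.optional(2, "address", Types.StructType.of(
                        Types.NestedField.optional(3, "city", Types.StringType.get()))));

        // indexByName indexes nested fields too, e.g. "address.city" -> 3,
        // which a flat column map keyed by top-level names would miss.
        Map<String, Integer> nameToFieldId = TypeUtil.indexByName(schema.asStruct());
        nameToFieldId.forEach((name, id) -> System.out.println(name + " -> " + id));
    }
}
```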
Looks like StarRocks does not support nested fields, right?
We do not support nested fields in this version, @openinx.
I will submit another PR for the nested schema.
I see those tasks are FileScanTasks; will StarRocks provide any bin-pack algorithm to balance the splits across different degrees of parallelism?
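For reference, Iceberg ships a bin-packing helper that could serve here; a minimal sketch (the split size, lookback, and open-file-cost values are illustrative, not from this PR):

```java
import org.apache.iceberg.CombinedScanTask;
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.util.TableScanUtil;

public class SplitPlanningSketch {
    // Split large files, then bin-pack the resulting FileScanTasks into
    // CombinedScanTasks of roughly targetSplitSize bytes each, which can be
    // distributed evenly across scan instances.
    static CloseableIterable<CombinedScanTask> planBalancedTasks(Table table, long targetSplitSize) {
        CloseableIterable<FileScanTask> splits =
                TableScanUtil.splitFiles(table.newScan().planFiles(), targetSplitSize);
        return TableScanUtil.planTasks(splits, targetSplitSize, 10 /* lookback */, 4L << 20 /* openFileCost */);
    }
}
```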
Besides, I'm thinking that we may need to introduce an extra data structure to handle the read process for Iceberg v2 tables, because their splits will contain both data file splits and delete file splits.
The statistics here are currently useless; I will delete this code and submit another PR.
fe/fe-core/src/main/java/com/starrocks/planner/IcebergScanNode.java
The division between task.length() and file.fileSizeInBytes() will always be 0, because it is a long/long division and task.length() will always be less than file.fileSizeInBytes(). I would suggest casting task.length() to a double.
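A minimal sketch of the suggested fix (the helper name is illustrative): casting one operand to double before dividing avoids the truncation.

```java
import org.apache.iceberg.FileScanTask;

public class RatioSketch {
    // task.length() and fileSizeInBytes() are both longs; without the cast the
    // quotient is computed with integer division and truncates to 0 whenever
    // the task covers less than the whole file.
    static double taskToFileRatio(FileScanTask task) {
        return (double) task.length() / task.file().fileSizeInBytes();
    }
}
```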
The statistics here are currently useless; I will delete this code and submit another PR.
fe/fe-core/src/main/java/com/starrocks/external/iceberg/IcebergUtil.java
It seems we currently only support expressions of the pattern var op literal; there are more complex expressions like AND, OR, NOT, etc. Is there any plan to support them in the next version?
In fact, I don't suggest adding filter pushdown in this PR, because we would be implementing an incomplete filter pushdown here. It's good to focus on one feature per PR.
If you plan to add filter pushdown for StarRocks, Iceberg's SparkFilters class is a good example to follow.
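For reference, a minimal sketch of how compound predicates could be composed with Iceberg's expression API once the converter goes beyond var op literal, along the lines of what SparkFilters does (column names and literals below are illustrative):

```java
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;

public class PushdownSketch {
    static Expression example() {
        // A simple "column op literal" predicate, the only shape handled so far.
        Expression simple = Expressions.equal("dt", "2021-12-01");

        // AND / OR / NOT compose the same building blocks into richer filters.
        return Expressions.and(
                simple,
                Expressions.not(Expressions.isNull("name")));
    }
}
```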
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergTable.java
fe/fe-core/src/main/java/com/starrocks/catalog/IcebergTable.java
fe/fe-core/src/main/java/com/starrocks/external/iceberg/ExpressionConverter.java
fe/fe-core/src/main/java/com/starrocks/planner/IcebergScanNode.java
fe/fe-core/src/main/java/com/starrocks/planner/IcebergScanNode.java
fe/fe-core/src/main/java/com/starrocks/planner/IcebergScanNode.java
Co-authored-by: caneGuy <[email protected]> Co-authored-by: Ielihs <[email protected]>
imay left a comment:
LGTM
openinx left a comment:
Looks good to me overall. Give my +1.
imay left a comment:
LGTM
Seaven left a comment:
Nice work. I suggest adding some UT cases later to cover querying Iceberg tables, like PlanFragmentTest.
I will add some test cases in
run starrocks_be_unittest
run starrocks_be_unittest
Add support for a custom catalog that users can define themselves when creating an Iceberg external table.
The custom catalog should follow the form of IcebergHiveCatalog, in other words extending BaseMetastoreCatalog and implementing IcebergCatalog. The catalog JAR should be placed into each FE's fe/lib directory, and the FE has to be restarted before the custom catalog works.
Usage:
```sql
CREATE EXTERNAL RESOURCE "iceberg0"
PROPERTIES (
"type" = "iceberg",
"starrocks.catalog-type"="CUSTOM",
"iceberg.catalog-impl"="{The full class name of custom catalog}"
);
```
Extra user-defined config can be added in the table properties when executing CREATE EXTERNAL TABLE; see #2225.
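For illustration, a hypothetical skeleton of such a custom catalog (the class name and stub bodies are placeholders; the StarRocks-side IcebergCatalog interface methods are omitted here, see com.starrocks.external.iceberg.IcebergCatalog for the required signatures):

```java
import java.util.List;
import java.util.Map;
import org.apache.iceberg.BaseMetastoreCatalog;
import org.apache.iceberg.TableOperations;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;

// Hypothetical sketch: the real class must also implement StarRocks'
// IcebergCatalog interface and ship in a JAR placed under fe/lib.
public class MyCustomCatalog extends BaseMetastoreCatalog {

    @Override
    public void initialize(String name, Map<String, String> properties) {
        // Receives the properties passed through the resource / table definition,
        // e.g. connection settings for the backing metastore.
    }

    @Override
    protected TableOperations newTableOps(TableIdentifier identifier) {
        throw new UnsupportedOperationException("TODO: return TableOperations for " + identifier);
    }

    @Override
    protected String defaultWarehouseLocation(TableIdentifier identifier) {
        throw new UnsupportedOperationException("TODO: compute a warehouse location");
    }

    @Override
    public List<TableIdentifier> listTables(Namespace namespace) {
        throw new UnsupportedOperationException("TODO");
    }

    @Override
    public boolean dropTable(TableIdentifier identifier, boolean purge) {
        throw new UnsupportedOperationException("TODO");
    }

    @Override
    public void renameTable(TableIdentifier from, TableIdentifier to) {
        throw new UnsupportedOperationException("TODO");
    }
}
```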
Co-authored-by: caneGuy [email protected]
Co-authored-by: Ielihs [email protected]
This is the first PR for #1030, which adds support for Iceberg tables stored on HDFS, using HiveCatalog as the catalog.
Goals:
Non Goals:
Example:
We have run the TPC-DS queries as a correctness check.