Flink: Implement Flink InputFormat and integrate it to FlinkCatalog #1293
@Override
public Map<String, String> requiredContext() {
  throw new UnsupportedOperationException("Iceberg Table Factory can not loaded from Java SPI");
Nit: probable typo. It should possibly be "can not *be* loaded from Java SPI".
I also think it might help the average reader if you defined the acronym SPI at least once.
openinx left a comment:
I had a rough look and left a few comments. I will take a closer review.
switch (fileFormat) {
  case AVRO:
    appender = Avro.write(Files.localOutput(file))
        .schema(table.schema())
        .createWriterFunc(DataWriter::create)
        .named(fileFormat.name())
        .build();
    break;

  case PARQUET:
    appender = Parquet.write(Files.localOutput(file))
        .schema(table.schema())
        .createWriterFunc(GenericParquetWriter::buildWriter)
        .named(fileFormat.name())
        .build();
    break;

  case ORC:
    appender = ORC.write(Files.localOutput(file))
        .schema(table.schema())
        .createWriterFunc(GenericOrcWriter::buildWriter)
        .build();
    break;

  default:
    throw new UnsupportedOperationException("Cannot write format: " + fileFormat);
}

try {
  appender.addAll(records);
} finally {
  appender.close();
}
It seems we could abstract this appender building into a separate method; I saw that lots of unit tests depend on this common logic. It could be a separate issue to do this.
I think we can have a GenericAppenderFactory
Created #1340
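Something like the following could centralize the per-format switch above so the tests stop repeating it. This is only a sketch of the idea (the class name GenericAppenderFactorySketch is made up here; the real factory was added separately in #1340 and may differ):

```java
import java.io.IOException;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.Schema;
import org.apache.iceberg.avro.Avro;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.data.avro.DataWriter;
import org.apache.iceberg.data.orc.GenericOrcWriter;
import org.apache.iceberg.data.parquet.GenericParquetWriter;
import org.apache.iceberg.io.FileAppender;
import org.apache.iceberg.io.OutputFile;
import org.apache.iceberg.orc.ORC;
import org.apache.iceberg.parquet.Parquet;

// Sketch only: centralizes the per-format appender construction used by many tests.
class GenericAppenderFactorySketch {
  private final Schema schema;

  GenericAppenderFactorySketch(Schema schema) {
    this.schema = schema;
  }

  FileAppender<Record> newAppender(OutputFile outputFile, FileFormat format) throws IOException {
    switch (format) {
      case AVRO:
        return Avro.write(outputFile)
            .schema(schema)
            .createWriterFunc(DataWriter::create)
            .named(format.name())
            .build();
      case PARQUET:
        return Parquet.write(outputFile)
            .schema(schema)
            .createWriterFunc(GenericParquetWriter::buildWriter)
            .named(format.name())
            .build();
      case ORC:
        return ORC.write(outputFile)
            .schema(schema)
            .createWriterFunc(GenericOrcWriter::buildWriter)
            .build();
      default:
        throw new UnsupportedOperationException("Cannot write format: " + format);
    }
  }
}
```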
 * Flink needs to get {@link Table} objects in the cluster (for example, to get splits), not just on the client side.
 * So we need an Iceberg catalog loader to get the {@link Catalog} and get the {@link Table} object.
 */
public interface CatalogLoader extends Serializable {
It seems we could move this class to the hive module?
Maybe not in hive, but somewhere in the Iceberg modules; I don't know whether any other modules want to use CatalogLoader.
We can discuss it in #1332
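Based on how it is used later in this PR (catalogLoader.loadCatalog(hadoopConf)), the contract is roughly the following; treat this as a sketch, since the final shape was settled in #1332:

```java
import java.io.Serializable;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.catalog.Catalog;

// Sketch of the CatalogLoader contract, inferred from its usage in this PR.
public interface CatalogLoader extends Serializable {
  // Re-creates the Catalog on the cluster side (e.g. in the job manager),
  // since Catalog instances themselves are generally not serializable.
  Catalog loadCatalog(Configuration hadoopConf);
}
```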
flink/src/test/java/org/apache/iceberg/flink/data/RandomData.java (outdated review thread, resolved)
this.originalCatalog = icebergCatalog;
this.icebergCatalog = cacheEnabled ? CachingCatalog.wrap(icebergCatalog) : icebergCatalog;
this.originalCatalog = catalogLoader.loadCatalog(
    HadoopUtils.getHadoopConfiguration(GlobalConfiguration.loadConfiguration()));
How about adding a Configuration argument to this constructor? IMO FlinkCatalog shouldn't handle the logic of configuration initialization; moving the configuration loading to FlinkCatalogFactory sounds more reasonable. Besides, unit tests for FlinkCatalog will be easier because they won't depend on how Flink loads its configuration.
We can discuss it in #1332
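A minimal sketch of that suggestion, with made-up names (ExampleFlinkCatalog is not the actual FlinkCatalog constructor): the factory resolves the Hadoop configuration once and passes it in, so the catalog never touches GlobalConfiguration and tests can inject any Configuration.

```java
import org.apache.hadoop.conf.Configuration;

// Hypothetical shape only: the factory loads the configuration, the catalog just stores it.
class ExampleFlinkCatalog {
  private final CatalogLoader catalogLoader;
  private final Configuration hadoopConf;

  ExampleFlinkCatalog(CatalogLoader catalogLoader, Configuration hadoopConf) {
    this.catalogLoader = catalogLoader;
    this.hadoopConf = hadoopConf;
  }

  // Catalog operations would call catalogLoader.loadCatalog(hadoopConf) as needed.
}
```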
}

public FlinkInputFormat build() {
  return new FlinkInputFormat(
We'd better have a Preconditions check for those arguments, in case an NPE is thrown in the following call stack.
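For illustration, a hypothetical builder showing the suggested pattern (Guava-style Preconditions; Iceberg may use its relocated Guava Preconditions class instead):

```java
import com.google.common.base.Preconditions;

// Hypothetical builder, only to illustrate the suggestion: validate required arguments
// in build() so callers get a descriptive error instead of an NPE deeper in the call stack.
class ExampleScanBuilder {
  private String tableLocation;

  ExampleScanBuilder tableLocation(String newTableLocation) {
    this.tableLocation = newTableLocation;
    return this;
  }

  String build() {
    // Fail fast with a clear message rather than letting a later call throw NPE.
    Preconditions.checkNotNull(tableLocation, "Table location should not be null");
    return tableLocation;
  }
}
```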
@Override
protected List<Row> executeWithSnapshotId(Table table, long snapshotId) throws IOException {
  return run(builder.table(table).options(ScanOptions.builder().snapshotId(snapshotId).build()).build());
Should we also add unit tests for the following cases:
- scan with both startSnapshotId and endSnapshotId;
- scan with only asOfTimestamp;
- scan with only startSnapshotId.
/**
 * Prune columns from a {@link Schema} using a projected fields.
 * TODO Why Spark care about filters?
As the javadoc says:
The filters list of {@link Expression} is used to ensure that columns referenced by filters are projected.
In my understanding, when doing filter push-down we need to keep the columns involved in a pushed-down filter even if they are not in the projection column list.
We do not implement the FilterableTableSource interface yet, so I think we don't need to consider it for now.
The columns involved in a pushed-down filter must be in the projection column list, because, just like Spark:
Spark doesn't support residuals per task, so return all filters to get Spark to handle record-level filtering.
The Flink source also doesn't support residuals per task, so this filtering is left to the Flink planner.
CloseableIterable<RowData> iterable = newIterable(task, idToConstant);
ProjectionRowData projectionRow = new ProjectionRowData();
return (finalProjection == null ? iterable : CloseableIterable.transform(
    iterable, rowData -> (RowData) projectionRow.replace(rowData, finalProjection))).iterator();
Q: Why do we need to transform the CloseableIterable<RowData> into a projected RowData iterable again? I mean, we've created an Avro iterable with the projected read schema, which should guarantee that the iterable only contains the projected columns?
You can take a look at FlinkSchemaUtil.pruneWithoutReordering: it just keeps the order from the original schema, but the real Flink projection may change the order.
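To illustrate the point (a self-contained, hypothetical example; the column names and values are made up): pruning preserves the table's column order, so an extra positional remapping, which is the role ProjectionRowData plays above, is still needed when the Flink projection asks for a different order.

```java
import java.util.Arrays;

// Hypothetical illustration: pruneWithoutReordering keeps the Iceberg table's column order,
// but Flink's projection may request the columns in a different order, so a positional
// remapping is still required after reading.
public class ProjectionOrderExample {
  public static void main(String[] args) {
    // Iceberg table columns, in table order: (id, data, ts).
    // Flink projection asks for (data, id); pruning still reads (id, data) in table order.
    Object[] readRow = {42L, "a"};     // values in pruned/table order: (id, data)
    int[] finalProjection = {1, 0};    // read-schema positions in the order Flink wants

    Object[] projectedRow = new Object[finalProjection.length];
    for (int i = 0; i < finalProjection.length; i++) {
      projectedRow[i] = readRow[finalProjection[i]];
    }

    System.out.println(Arrays.toString(projectedRow));  // [a, 42] -> (data, id)
  }
}
```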
@Override
public FlinkInputSplit[] createInputSplits(int minNumSplits) throws IOException {
  // Invoked by Job manager, so it is OK to load table from catalog.
  tableLoader.open(HadoopUtils.getHadoopConfiguration(GlobalConfiguration.loadConfiguration()));
Q: Are we binding the Iceberg table's Hadoop configuration to the Flink job manager's Hadoop configuration? Is it possible to access an Iceberg table in a Hadoop cluster that is different from Flink's Hadoop cluster? It seems a more reasonable way is to pass a customized SerializableConfiguration from the client, so the job manager could access any Hadoop cluster.
I'm OK with passing a configuration to FlinkInputFormat, although there will be some serialization cost.
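A minimal sketch of the "pass the configuration from the client" idea as a hand-rolled wrapper (similar wrappers already exist in several projects, so the real PR may reuse one of those instead; the class name here is illustrative):

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import org.apache.hadoop.conf.Configuration;

// Sketch: makes a Hadoop Configuration Java-serializable so the client can ship it
// to the job manager and task managers along with the input format.
class SerializableConfigurationSketch implements Serializable {
  private transient Configuration conf;

  SerializableConfigurationSketch(Configuration conf) {
    this.conf = conf;
  }

  Configuration get() {
    return conf;
  }

  private void writeObject(ObjectOutputStream out) throws IOException {
    out.defaultWriteObject();
    conf.write(out);  // Configuration is a Hadoop Writable, so it serializes itself
  }

  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    conf = new Configuration(false);
    conf.readFields(in);
  }
}
```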
Thanks for the update, I will take a look today.
}

@Override
public Optional<TableFactory> getTableFactory() {
Q: The javadoc says it's deprecated now. When is the time for us to use getFactory in the future?
 * @deprecated Use {@link #getFactory()} for the new factory stack. The new factory stack uses the
 * new table sources and sinks defined in FLIP-95 and a slightly different discovery mechanism.
 */
@Deprecated
default Optional<TableFactory> getTableFactory() {
Until the new API is really ready...
In 1.11 and master, the FLIP-95 interfaces still lack many things. I am not sure about Flink 1.12; maybe 1.13 is the time.
 * Flink Iceberg table factory to create table source and sink.
 * Only works for catalog, can not be loaded from Java SPI(Service Provider Interface).
 */
class FlinkTableFactory implements TableSourceFactory<RowData> {
We may put the table sink creator in this class too: https://github.com/apache/iceberg/pull/1348/files#diff-0ad7dfff9cfa32fbb760796d976fd650R34
Yes, should be in this class too.
private final CatalogLoader catalogLoader;
private final Configuration hadoopConf;
private final TableSchema schema;
private final Map<String, String> options;
How about renaming it to scanOptions? I thought it was an options map of the Iceberg table.
It also includes table hints.
public TableSource<RowData> createTableSource(Context context) {
  ObjectIdentifier identifier = context.getObjectIdentifier();
  ObjectPath objectPath = new ObjectPath(identifier.getDatabaseName(), identifier.getObjectName());
  TableIdentifier icebergIdentifier = catalog.toIdentifier(objectPath);
Nit: the toIdentifier could be a static method in the catalog.
No, it cannot, because it needs the baseNamespace information.
OK, I did not look into toNamespace; sounds good.
try {
  Table table = catalog.getIcebergTable(objectPath);
  // Excludes computed columns
  TableSchema icebergSchema = TableSchemaUtils.getPhysicalSchema(context.getTable().getSchema());
Q: What if someone queries the Flink table projecting a computed column (which does not exist in the Iceberg table)? Does that work fine in the current version, or will it throw an exception?
It doesn't work.
Even if the Iceberg table supports computed columns in the future, these computed columns will be generated by Flink instead of the Iceberg source. (So this should always be getPhysicalSchema.)
Thanks all for your help, all sub-PRs have been completed. I'll close this PR.
Fixes #1275
This is a proof of concept (POC) for the Flink reader.
The Flink reader is essentially the same as the Spark one.
- InputFormat is similar to the Hive (Hadoop) input format. Its splits are generated in the job manager, therefore an Iceberg catalog loader is needed to obtain the Iceberg Table object.
- TableFactory and TableSource are similar to Spark's TableProvider and SparkScanBuilder. They also provide projection push-down (ProjectableTableSource) and filter push-down (FilterableTableSource).

A simplified skeleton of this InputFormat shape is sketched after the work list below. Work can be divided into:
- (Done) Flink: Using RowData to avro reader and writer (#1232)
- (Done) Introduce CatalogLoader (Flink: Introduce CatalogLoader and TableLoader #1332)
- (Done) Introduce GenericAppenderFactory and GenericAppenderHelper for reusing and testing (#1340)
- (Done) Introduce FlinkInputFormat: implement SplitGenerator and RowDataReader (Flink: Introduce Flink InputFormat #1346)
- (Done) Introduce FlinkTableFactory and FlinkTableSource (Flink: Integrate Flink reader to SQL #1509)
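For readers unfamiliar with Flink's InputFormat lifecycle, here is a hypothetical, heavily simplified skeleton of the shape described above. The names ExampleIcebergInputFormat and ExampleSplit are made up and this is not the code from #1346; it only shows that createInputSplits runs on the job manager while open/nextRecord run on the task managers.

```java
import org.apache.flink.api.common.io.DefaultInputSplitAssigner;
import org.apache.flink.api.common.io.RichInputFormat;
import org.apache.flink.api.common.io.statistics.BaseStatistics;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.io.InputSplit;
import org.apache.flink.core.io.InputSplitAssigner;
import org.apache.flink.table.data.RowData;

// Illustrative skeleton only: a real implementation would plan the Iceberg scan in
// createInputSplits and read data files in open/nextRecord.
public class ExampleIcebergInputFormat
    extends RichInputFormat<RowData, ExampleIcebergInputFormat.ExampleSplit> {

  private boolean end;

  @Override
  public void configure(Configuration parameters) {
    // Table and scan settings would travel with the serialized format itself.
  }

  @Override
  public BaseStatistics getStatistics(BaseStatistics cachedStatistics) {
    return cachedStatistics;
  }

  @Override
  public ExampleSplit[] createInputSplits(int minNumSplits) {
    // Runs on the job manager: load the table via a serializable loader, plan the scan,
    // and wrap each planned task into a serializable split. Stubbed here.
    return new ExampleSplit[] {new ExampleSplit(0)};
  }

  @Override
  public InputSplitAssigner getInputSplitAssigner(ExampleSplit[] inputSplits) {
    return new DefaultInputSplitAssigner(inputSplits);
  }

  @Override
  public void open(ExampleSplit split) {
    // Runs on a task manager: open a reader for the split's scan task.
    this.end = false;
  }

  @Override
  public boolean reachedEnd() {
    return end;
  }

  @Override
  public RowData nextRecord(RowData reuse) {
    // Would return the next RowData from the underlying file reader; this stub terminates.
    this.end = true;
    return reuse;
  }

  @Override
  public void close() {
    // Close the underlying reader.
  }

  /** Serializable split; a real implementation would carry the planned scan task. */
  public static class ExampleSplit implements InputSplit {
    private final int splitNumber;

    ExampleSplit(int splitNumber) {
      this.splitNumber = splitNumber;
    }

    @Override
    public int getSplitNumber() {
      return splitNumber;
    }
  }
}
```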