Flink: FLIP-27 source split and reader #2305
@@ -0,0 +1,81 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

project(':iceberg-flink-runtime') {
  apply plugin: 'com.github.johnrengelman.shadow'

  tasks.jar.dependsOn tasks.shadowJar

  configurations {
    implementation {
      exclude group: 'org.apache.flink'
      // included in Flink
      exclude group: 'org.slf4j'
      exclude group: 'org.apache.commons'
      exclude group: 'commons-pool'
      exclude group: 'commons-codec'
      exclude group: 'org.xerial.snappy'
      exclude group: 'javax.xml.bind'
      exclude group: 'javax.annotation'
    }
  }

  dependencies {
    implementation project(':iceberg-flink')
    implementation project(':iceberg-aws')
    implementation(project(':iceberg-nessie')) {
      exclude group: 'com.google.code.findbugs', module: 'jsr305'
    }

    // flink-connector-base is not part of Flink runtime. Hence,
    // iceberg-flink-runtime should include it as a transitive dependency.
    implementation "org.apache.flink:flink-connector-base"
  }

  shadowJar {
    configurations = [project.configurations.runtimeClasspath]

    zip64 true

    // include the LICENSE and NOTICE files for the shaded Jar
    from(projectDir) {
      include 'LICENSE'
      include 'NOTICE'
    }

    // Relocate dependencies to avoid conflicts
    relocate 'org.apache.avro', 'org.apache.iceberg.shaded.org.apache.avro'
    relocate 'org.apache.parquet', 'org.apache.iceberg.shaded.org.apache.parquet'
    relocate 'com.google', 'org.apache.iceberg.shaded.com.google'
    relocate 'com.fasterxml', 'org.apache.iceberg.shaded.com.fasterxml'
    relocate 'com.github.benmanes', 'org.apache.iceberg.shaded.com.github.benmanes'
    relocate 'org.checkerframework', 'org.apache.iceberg.shaded.org.checkerframework'
    relocate 'shaded.parquet', 'org.apache.iceberg.shaded.org.apache.parquet.shaded'
    relocate 'org.apache.orc', 'org.apache.iceberg.shaded.org.apache.orc'
    relocate 'io.airlift', 'org.apache.iceberg.shaded.io.airlift'
    relocate 'org.threeten.extra', 'org.apache.iceberg.shaded.org.threeten.extra'

    classifier null
  }

  jar {
    enabled = false
  }
}
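The relocate rules above rewrite the packages of every bundled dependency so the shaded runtime jar cannot clash with copies of the same libraries already on a Flink cluster. As a rough illustration of the effect (not part of this PR; the snippet only runs against a built shaded jar on the classpath):

// Illustrative sketch only: probing the shaded runtime jar for a relocated class.
public class ShadingCheck {
  public static void main(String[] args) throws ClassNotFoundException {
    // shadowJar rewrote org.apache.avro -> org.apache.iceberg.shaded.org.apache.avro,
    // so the bundled Avro Schema class resolves under the shaded package.
    Class<?> shaded = Class.forName("org.apache.iceberg.shaded.org.apache.avro.Schema");
    System.out.println(shaded.getName());
  }
}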
@@ -28,6 +28,7 @@ project(':iceberg-flink') {
   implementation project(':iceberg-parquet')
   implementation project(':iceberg-hive-metastore')

+  compileOnly "org.apache.flink:flink-connector-base"
Member
It's strange that the builds for Flink 1.12 & Flink 1.13 have passed, because I don't see the same dependency added to the Flink 1.12 build.gradle and the 1.13 build.gradle. Maybe I need to check 1.12's build.gradle again.
Contributor (Author)
@openinx Maybe follow-up on the other comment discussion here. With the SplitEnumerator API change, looks like I need to put the FLIP-27 source in the version-specific modules.
Contributor
@openinx, the tests run against the iceberg-flink module. They aren't present in the 1.12 or 1.13 modules. If you want them to be run for those modules, you'd need to add the source folder like you do for the other version-specific sources.
Contributor
@stevenzwu, I think that copying the parts that change is reasonable. And once we remove support for 1.12, you can move the files back into the common module.
Contributor (Author)
yeah. that is my plan too. Once 1.12 support is removed, we should be able to move files back to the common module. We just need to be diligent with these efforts.
   compileOnly "org.apache.flink:flink-streaming-java_2.12"
   compileOnly "org.apache.flink:flink-streaming-java_2.12::tests"
   compileOnly "org.apache.flink:flink-table-api-java-bridge_2.12"

@@ -56,6 +57,7 @@ project(':iceberg-flink') {
     exclude group: 'org.apache.hive', module: 'hive-storage-api'
   }

+  testImplementation "org.apache.flink:flink-connector-test-utils"
   testImplementation "org.apache.flink:flink-core"
   testImplementation "org.apache.flink:flink-runtime_2.12"
   testImplementation "org.apache.flink:flink-table-planner-blink_2.12"
@@ -40,4 +40,10 @@ private FlinkConfigOptions() {
       .intType()
       .defaultValue(100)
       .withDescription("Sets max infer parallelism for source operator.");

+  public static final ConfigOption<Integer> SOURCE_READER_FETCH_RECORD_BATCH_SIZE = ConfigOptions
+      .key("source.iceberg.reader.fetch-record-batch-size")
Contributor
Is there precedent for this config key? What other keys are similar? The others in this file start with table.exec.iceberg.
Contributor (Author)
I didn't use the table.exec.iceberg prefix because this option isn't limited to table/SQL execution. I checked the two FLIP-27 source impls (Kafka and file) in the Flink repo. This is following the file source convention. @openinx any suggestion?
Member
Yes, this is unrelated to the table/SQL execution. Both datastream job and table SQL job are using the same configuration keys. So I'm okay to keep the current name. (In fact, if we didn't have to consider Flink's configuration naming, I'd prefer a different name.)
+      .intType()
+      .defaultValue(2048)
+      .withDescription("The target number of records for Iceberg reader fetch batch.");
 }
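For reference, a minimal sketch of how this option resolves from a Flink Configuration — the surrounding class is hypothetical, while getInteger/setInteger and the default-value fallback are standard Flink ConfigOption behavior:

import org.apache.flink.configuration.Configuration;
import org.apache.iceberg.flink.FlinkConfigOptions;

public class FetchBatchSizeExample {
  public static void main(String[] args) {
    Configuration config = new Configuration();

    // Unset keys fall back to the declared default, 2048.
    int batchSize = config.getInteger(FlinkConfigOptions.SOURCE_READER_FETCH_RECORD_BATCH_SIZE);
    System.out.println("default fetch batch size: " + batchSize);

    // Users override it under the documented key.
    config.setInteger(FlinkConfigOptions.SOURCE_READER_FETCH_RECORD_BATCH_SIZE, 4096);
    System.out.println("configured: " +
        config.getInteger(FlinkConfigOptions.SOURCE_READER_FETCH_RECORD_BATCH_SIZE));
  }
}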
@@ -29,6 +29,7 @@
 import org.apache.iceberg.encryption.InputFilesDecryptor;
 import org.apache.iceberg.io.CloseableIterator;
 import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;

 /**
  * Flink data iterator that reads {@link CombinedScanTask} into a {@link CloseableIterator}

@@ -41,18 +42,47 @@ public class DataIterator<T> implements CloseableIterator<T> {
   private final FileScanTaskReader<T> fileScanTaskReader;

   private final InputFilesDecryptor inputFilesDecryptor;
-  private Iterator<FileScanTask> tasks;
+  private final CombinedScanTask combinedTask;
+  private final Position position;
+
+  private Iterator<FileScanTask> fileTasksIterator;
   private CloseableIterator<T> currentIterator;

   public DataIterator(FileScanTaskReader<T> fileScanTaskReader, CombinedScanTask task,
                       FileIO io, EncryptionManager encryption) {
     this.fileScanTaskReader = fileScanTaskReader;

     this.inputFilesDecryptor = new InputFilesDecryptor(task, io, encryption);
-    this.tasks = task.files().iterator();
+    this.combinedTask = task;
+    // fileOffset starts at -1 because we started
+    // from an empty iterator that is not from the split files.
+    this.position = new Position(-1, 0L);
Member
In general, I will suggest introducing a separate SeekableDataIterator to isolate the two code paths. I made a simple commit for this: https://github.com/openinx/incubator-iceberg/commit/b08dde86aae0c718d9d72acb347dffb3a836b336; you may want to take a look.
Contributor (Author)
I wouldn't say that the two code paths need to be isolated. Thanks a lot for the SeekableDataIterator commit. Overall, I still think adding seek support directly to DataIterator is the simpler approach.
+    this.fileTasksIterator = task.files().iterator();
     this.currentIterator = CloseableIterator.empty();
   }

+  public void seek(Position startingPosition) {
+    // skip files
+    Preconditions.checkArgument(startingPosition.fileOffset() < combinedTask.files().size(),
+        "Checkpointed file offset is %d, while CombinedScanTask has %d files",
+        startingPosition.fileOffset(), combinedTask.files().size());
+    for (long i = 0L; i < startingPosition.fileOffset(); ++i) {
Contributor
Is long necessary here?
Contributor (Author)
integer would certainly be sufficient. I was using long to be consistent with the record offset.
Contributor (Author)
same as another comment. will update
+      fileTasksIterator.next();
+    }
+    updateCurrentIterator();
+    // skip records within the file
+    for (long i = 0; i < startingPosition.recordOffset(); ++i) {
+      if (hasNext()) {
+        next();
+      } else {
+        throw new IllegalStateException("Not enough records to skip: " +
+            startingPosition.recordOffset());
+      }
+    }
+    this.position.update(startingPosition.fileOffset(), startingPosition.recordOffset());
Contributor
Can …
Contributor (Author)
yes, will change
+  }

   @Override
   public boolean hasNext() {
     updateCurrentIterator();

@@ -62,18 +92,24 @@ public boolean hasNext() {
   @Override
   public T next() {
     updateCurrentIterator();
+    position.advanceRecord();
     return currentIterator.next();
   }

+  public boolean isCurrentIteratorDone() {
+    return !currentIterator.hasNext();
+  }

   /**
    * Updates the current iterator field to ensure that the current Iterator
    * is not exhausted.
    */
   private void updateCurrentIterator() {
     try {
-      while (!currentIterator.hasNext() && tasks.hasNext()) {
+      while (!currentIterator.hasNext() && fileTasksIterator.hasNext()) {
         currentIterator.close();
-        currentIterator = openTaskIterator(tasks.next());
+        currentIterator = openTaskIterator(fileTasksIterator.next());
+        position.advanceFile();
       }
     } catch (IOException e) {
       throw new UncheckedIOException(e);

@@ -88,6 +124,10 @@ private CloseableIterator<T> openTaskIterator(FileScanTask scanTask) {
   public void close() throws IOException {
     // close the current iterator
     currentIterator.close();
-    tasks = null;
+    fileTasksIterator = null;
   }

+  public Position position() {
It appears that you are using the CheckpointedPosition data structure to communicate the position that the iterator has to seek to from outside. However, in order to communicate the current position to the outside world, you are using the internal Position data structure. Wondering if we can keep this consistent to be either CheckpointedPosition or the mutable Position?
Contributor (Author)
That is a good point. It is also related to your question above. Let me see how to unify them and maybe move away from Flink's CheckpointedPosition.
+    return position;
+  }
 }
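To make the checkpoint/restore contract of seek() concrete, here is a hypothetical reader-side sketch. The SplitReaderSketch class and its fields are assumptions, not code from this PR, and imports mirror those in DataIterator. Note that the snapshot copies the Position, since DataIterator mutates its Position in place; that ties into the CheckpointedPosition consistency discussion above.

// Hypothetical sketch of how a FLIP-27 reader might use DataIterator's
// position()/seek() pair across a checkpoint-restore cycle.
class SplitReaderSketch<T> {
  private final FileScanTaskReader<T> fileScanTaskReader;
  private final FileIO io;
  private final EncryptionManager encryption;

  SplitReaderSketch(FileScanTaskReader<T> fileScanTaskReader, FileIO io,
                    EncryptionManager encryption) {
    this.fileScanTaskReader = fileScanTaskReader;
    this.io = io;
    this.encryption = encryption;
  }

  // On (re)start, open an iterator over the split's task and fast-forward
  // past everything that was already emitted before the checkpoint.
  DataIterator<T> open(CombinedScanTask task, Position restoredPosition) {
    DataIterator<T> iterator = new DataIterator<>(fileScanTaskReader, task, io, encryption);
    if (restoredPosition != null) {
      iterator.seek(restoredPosition);
    }
    return iterator;
  }

  // On checkpoint, snapshot a copy: DataIterator mutates its Position in place,
  // so handing out the live object would let later reads corrupt the checkpoint.
  Position snapshot(DataIterator<T> iterator) {
    Position current = iterator.position();
    return new Position(current.fileOffset(), current.recordOffset());
  }
}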
@@ -22,33 +22,58 @@
 import java.io.IOException;
 import java.io.UncheckedIOException;
 import java.util.List;
+import org.apache.flink.annotation.Internal;
 import org.apache.iceberg.CombinedScanTask;
 import org.apache.iceberg.Table;
 import org.apache.iceberg.TableProperties;
 import org.apache.iceberg.TableScan;
 import org.apache.iceberg.expressions.Expression;
+import org.apache.iceberg.flink.source.split.IcebergSourceSplit;
 import org.apache.iceberg.io.CloseableIterable;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;

-class FlinkSplitGenerator {
-  private FlinkSplitGenerator() {
+@Internal
+public class FlinkSplitPlanner {
+  private FlinkSplitPlanner() {
   }

-  static FlinkInputSplit[] createInputSplits(Table table, ScanContext context) {
-    List<CombinedScanTask> tasks = tasks(table, context);
-    FlinkInputSplit[] splits = new FlinkInputSplit[tasks.size()];
-    for (int i = 0; i < tasks.size(); i++) {
-      splits[i] = new FlinkInputSplit(i, tasks.get(i));
+  static FlinkInputSplit[] planInputSplits(Table table, ScanContext context) {
Contributor
Why change the name of this method?
+    try (CloseableIterable<CombinedScanTask> tasksIterable = planTasks(table, context)) {
+      List<CombinedScanTask> tasks = Lists.newArrayList(tasksIterable);
+      FlinkInputSplit[] splits = new FlinkInputSplit[tasks.size()];
+      for (int i = 0; i < tasks.size(); i++) {
+        splits[i] = new FlinkInputSplit(i, tasks.get(i));
+      }
+      return splits;
+    } catch (IOException e) {
+      throw new UncheckedIOException("Failed to process tasks iterable", e);
+    }
   }

+  /**
+   * This returns splits for the FLIP-27 source.
+   */
+  public static List<IcebergSourceSplit> planIcebergSourceSplits(
Member
Should we add a javadoc (or replace it with a clearer name) to indicate why we need to add an extra split-planning method for the FLIP-27 source?
Member
It's good to align with the existing naming.
Contributor (Author)
actually, I think we should rename planInputSplits as well. Since the method names spell out the split types they return, the distinction should be clear.
       Table table, ScanContext context) {
+    try (CloseableIterable<CombinedScanTask> tasksIterable = planTasks(table, context)) {
+      List<IcebergSourceSplit> splits = Lists.newArrayList();
+      tasksIterable.forEach(task -> splits.add(IcebergSourceSplit.fromCombinedScanTask(task)));
+      return splits;
+    } catch (IOException e) {
+      throw new UncheckedIOException("Failed to process task iterable: ", e);
+    }
-    return splits;
   }

-  private static List<CombinedScanTask> tasks(Table table, ScanContext context) {
+  static CloseableIterable<CombinedScanTask> planTasks(Table table, ScanContext context) {
     TableScan scan = table
         .newScan()
         .caseSensitive(context.caseSensitive())
         .project(context.project());

+    if (context.includeColumnStats()) {
+      scan = scan.includeColumnStats();
+    }

     if (context.snapshotId() != null) {
       scan = scan.useSnapshot(context.snapshotId());
     }

@@ -83,10 +108,6 @@ private static List<CombinedScanTask> tasks(Table table, ScanContext context) {
     }
   }

-    try (CloseableIterable<CombinedScanTask> tasksIterable = scan.planTasks()) {
-      return Lists.newArrayList(tasksIterable);
-    } catch (IOException e) {
-      throw new UncheckedIOException("Failed to close table scan: " + scan, e);
-    }
+    return scan.planTasks();
   }
 }
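One consequence of planTasks now returning the live CloseableIterable instead of a materialized list is that the caller owns closing it. A minimal sketch of the caller side, assuming same-package access to the package-private planTasks and imports mirroring FlinkSplitPlanner's:

// Sketch: callers of planTasks must close the iterable, e.g. via try-with-resources.
static int countPlannedSplits(Table table, ScanContext context) {
  try (CloseableIterable<CombinedScanTask> tasks = FlinkSplitPlanner.planTasks(table, context)) {
    int count = 0;
    for (CombinedScanTask task : tasks) {
      count += 1;  // each CombinedScanTask becomes one IcebergSourceSplit
    }
    return count;
  } catch (IOException e) {
    throw new UncheckedIOException("Failed to close the planned tasks iterable", e);
  }
}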
@@ -0,0 +1,95 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.iceberg.flink.source;

import java.io.Serializable;
import java.util.Objects;
import org.apache.flink.annotation.Internal;
import org.apache.iceberg.CombinedScanTask;
import org.apache.iceberg.relocated.com.google.common.base.MoreObjects;

/**
 * A mutable class that defines the read position
 * <ul>
 *   <li>file offset in the list of files in a {@link CombinedScanTask}</li>
 *   <li>record offset within a file</li>
 * </ul>
 */
@Internal
public class Position implements Serializable {

  private static final long serialVersionUID = 1L;

  private int fileOffset;
  private long recordOffset;

  public Position(int fileOffset, long recordOffset) {
    this.fileOffset = fileOffset;
    this.recordOffset = recordOffset;
  }

  void advanceFile() {
    this.fileOffset += 1;
    this.recordOffset = 0L;
  }

  void advanceRecord() {
    this.recordOffset += 1L;
  }

  public void update(int newFileOffset, long newRecordOffset) {
    this.fileOffset = newFileOffset;
    this.recordOffset = newRecordOffset;
  }

  public int fileOffset() {
    return fileOffset;
  }

  public long recordOffset() {
    return recordOffset;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    final Position that = (Position) o;
    return Objects.equals(fileOffset, that.fileOffset) &&
        Objects.equals(recordOffset, that.recordOffset);
  }

  @Override
  public int hashCode() {
    return Objects.hash(fileOffset, recordOffset);
  }

  @Override
  public String toString() {
    return MoreObjects.toStringHelper(this)
        .add("fileOffset", fileOffset)
        .add("recordOffset", recordOffset)
        .toString();
  }
}
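A short walkthrough of the bookkeeping above — a sketch, assuming same-package access to the package-private advance methods, mirroring how DataIterator drives Position:

// Position starts at (-1, 0): no file opened yet, matching DataIterator's constructor.
Position position = new Position(-1, 0L);

position.advanceFile();    // opened file 0  -> fileOffset=0, recordOffset=0
position.advanceRecord();  // emitted a record -> (0, 1)
position.advanceRecord();  // emitted another  -> (0, 2)
position.advanceFile();    // moved to file 1  -> (1, 0); record offset resets

System.out.println(position);  // Position{fileOffset=1, recordOffset=0}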
For other dependencies that we expect to be present at runtime, we use compileOnly so the dependency doesn't leak into the runtime Jar. Is that something we should do here as well? This looks like it would add the Flink Jar into runtimeClasspath, which would get included in the Jar.

Contributor (Author)
this is the flink-runtime module. Hence we used the implementation dep to pull in flink-connector-base for the iceberg-flink-runtime jar. In the flink module below, it is compileOnly.

Please add a comment. @rdblue, flink-connector-base needs to be a transitive dependency of the iceberg connector (or shaded/relocated). It is not part of the Flink runtime.

added comment

@stevenzwu After we got this PR merged: #3364, we don't need to introduce a common iceberg-flink-runtime for all different flink versions; instead we have a different iceberg-flink:iceberg-flink-<MAJOR.MINOR>-runtime module for different <MAJOR.MINOR> flink releases, so that we could build the features on top of the latest flink API. You may want to add the transitive dependency org.apache.flink:flink-connector-base in this line for flink 1.12, and this line for flink 1.13.