Flink: infer source parallelism for FLIP-27 source in batch execution mode #10832
Conversation
Force-pushed from e6c3a59 to de63e1d.
Review threads on flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java — resolved (two marked outdated).
| "testBasicRead", | ||
| TypeInformation.of(RowData.class)) | ||
| sourceBuilder | ||
| .buildStream(env) |
Switched this class to test the new buildStream API.
Do we have remaining tests that cover the fromSource API for bounded sources?
Yes, I will only switch this class.
Review thread on ...ink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceBoundedGenericRecord.java — resolved (outdated).
    Field privateField =
        MiniClusterExtension.class.getDeclaredField("internalMiniClusterExtension");
    privateField.setAccessible(true);
    InternalMiniClusterExtension internalExtension =
Use reflection to retrieve the InternalMiniClusterExtension, which exposes the MiniCluster; from there we can get the execution graph to verify the source parallelism.
If you call env.getTransformations().get(0).getParallelism() before env.executeAsync(), then you could get the parallelism. Would this help?
I just tried it with the debugger. The value is the default parallelism of 4, while the expected inferred source parallelism is 1 after executeAsync():

    DataStream<Row> dataStream =
        IcebergSource.forRowData()
            .tableLoader(CATALOG_EXTENSION.tableLoader())
            .table(table)
            .flinkConfig(config)
            // force one file per split
            .splitSize(1L)
            .buildStream(env)
            .map(new RowDataToRowMapper(FlinkSchemaUtil.convert(table.schema())));

    int sourceParallelism = env.getTransformations().get(0).getParallelism();
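The reflection trick used by the test can be sketched generically. The classes and field name below are hypothetical stand-ins for illustration, not the actual Flink test classes:

```java
import java.lang.reflect.Field;

// Stand-in for an object with test-relevant private state, analogous to
// MiniClusterExtension's private "internalMiniClusterExtension" field.
class Holder {
    private final int parallelism = 1; // stands in for the mini-cluster internals
}

// Generic helper: read a private field by name, as the test does.
class PrivateFieldReader {
    @SuppressWarnings("unchecked")
    static <T> T read(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // bypass private access; acceptable in test code
            return (T) field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The drawback discussed in this thread is that such reflection couples the test to a private field name that can change between Flink versions.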
     *
     * @return data stream from the Iceberg source
     */
    public DataStream<T> buildStream(StreamExecutionEnvironment env) {
This is a new public API. I also considered createStream as the method name but decided on this one for now; open to other suggestions.
I also think it is better to require the StreamExecutionEnvironment here, rather than as a builder method, so that it is clear it is not required for the build() method.
The argument about env also applies to outputTypeInfo and watermarkStrategy. Maybe adding them as parameters would be reasonable too.
Currently, outputTypeInfo can be inferred from the ReaderFunction when using the provided RowData or Avro reader:

    if (outputTypeInfo == null) {
      this.outputTypeInfo = inferOutputTypeInfo(table, context, readerFunction);
    }

watermarkStrategy defaults to WatermarkStrategy.noWatermarks(), so it is not mandatory either.
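The defaulting behavior described here can be sketched with a minimal builder. All names below are illustrative stand-ins, not the real IcebergSource internals (strings stand in for TypeInformation and WatermarkStrategy):

```java
import java.util.function.Supplier;

// Sketch of the builder defaulting described above: outputTypeInfo falls
// back to a value inferred from the reader function, and the watermark
// strategy falls back to "no watermarks", so neither setting is mandatory.
class SourceBuilderSketch {
    private String outputTypeInfo;    // stands in for TypeInformation<T>
    private String watermarkStrategy; // stands in for WatermarkStrategy<T>

    SourceBuilderSketch outputTypeInfo(String t) {
        this.outputTypeInfo = t;
        return this;
    }

    SourceBuilderSketch watermarkStrategy(String w) {
        this.watermarkStrategy = w;
        return this;
    }

    String build(Supplier<String> inferFromReaderFunction) {
        if (outputTypeInfo == null) {
            // analogue of inferOutputTypeInfo(table, context, readerFunction)
            outputTypeInfo = inferFromReaderFunction.get();
        }
        if (watermarkStrategy == null) {
            watermarkStrategy = "noWatermarks"; // WatermarkStrategy.noWatermarks() analogue
        }
        return outputTypeInfo + "/" + watermarkStrategy;
    }
}
```

Because both fields have sensible fallbacks, only settings with no derivable default (like the execution environment) need to be explicit parameters.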
outputTypeInfo is no longer needed with the Converter interface. Removed the watermark strategy for now; we can always add it back in the future if it is needed.
Force-pushed from 72ea6fb to 3ead397.
    // before:
    if (getRuntimeContext().getTaskInfo().getAttemptNumber() <= 0) {
    // after: simulate slow subtask 0 with attempt 0
    TaskInfo taskInfo = getRuntimeContext().getTaskInfo();
    if (taskInfo.getIndexOfThisSubtask() == 0 && taskInfo.getAttemptNumber() <= 0) {
After this change of inferring source parallelism, this test would hang (even with the parallelism-inference flag turned off); somehow speculative execution would not kick in. This one-line change seems to fix the problem (I tried 50 local runs without a failure), though I am not sure exactly why. Regardless of the reason, it seems like a good change anyway.
@pvary @venkata91 @becketqin let me know if you have any ideas.
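The guard change above can be isolated as a pure predicate. The stub class below is illustrative; its accessors only mirror the names of Flink's TaskInfo:

```java
// Stand-in for Flink's TaskInfo, carrying just the two values the guard reads.
class TaskInfoStub {
    final int subtaskIndex;
    final int attemptNumber;

    TaskInfoStub(int subtaskIndex, int attemptNumber) {
        this.subtaskIndex = subtaskIndex;
        this.attemptNumber = attemptNumber;
    }
}

class SlowSubtaskSimulator {
    // before the fix: attempt 0 of EVERY subtask was artificially slowed
    static boolean slowBefore(TaskInfoStub t) {
        return t.attemptNumber <= 0;
    }

    // after the fix: only subtask 0, attempt 0 is slowed, so all other
    // subtasks (and any speculative attempt > 0) finish quickly
    static boolean slowAfter(TaskInfoStub t) {
        return t.subtaskIndex == 0 && t.attemptNumber <= 0;
    }
}
```

With the old predicate, every first attempt was slow, which plausibly starved the scheduler of the fast finishers that speculative execution compares against; the new predicate leaves exactly one slow straggler.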
Interesting. Thanks for tagging me. Do you mean that adding taskInfo.getIndexOfThisSubtask() == 0 solves the hang issue?
@venkata91 that is correct.
    new Configuration()
        // disable classloader check as Avro may cache class/object in the serializers.
        .set(CoreOptions.CHECK_LEAKED_CLASSLOADER, false)
        // disable inferring source parallelism
Inferred parallelism might interfere with the watermark and record-ordering comparison; disable it to avoid flakiness.
Why is this so?
Do you have any idea? Would it be an issue in prod?
Good question. Let me dig more.
TestIcebergSourceSql assumes the parallelism is 1 for testWatermarkOptionsAscending and testWatermarkOptionsDescending. The table has 2 files, and the test just checks that split assignment is ordered with a single reader:

    tableEnvironment.getConfig().set("table.exec.resource.default-parallelism", "1");

I think TestIcebergSourceWithWatermarkExtractor similarly assumes a parallelism of 4. Inferring parallelism would change the source parallelism to the number of splits and could interfere with the assertions on the ordering of the read records.
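The ordering concern can be illustrated without any Iceberg code. This sketch uses round-robin as a deterministic stand-in for the nondeterministic interleaving of parallel subtasks:

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the ordering assumption discussed above: with a single
// reader, splits are consumed in assignment order; with one reader per
// split (what inferred parallelism produces), records from different
// splits interleave, breaking assertions on read order.
class OrderDemo {
    static List<Integer> readWithSingleReader(List<List<Integer>> splits) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> split : splits) {
            out.addAll(split); // splits consumed one after another, in order
        }
        return out;
    }

    static List<Integer> readWithReaderPerSplit(List<List<Integer>> splits) {
        List<Integer> out = new ArrayList<>();
        int longest = 0;
        for (List<Integer> split : splits) {
            longest = Math.max(longest, split.size());
        }
        for (int i = 0; i < longest; i++) {
            for (List<Integer> split : splits) {
                if (i < split.size()) {
                    out.add(split.get(i)); // records interleave across splits
                }
            }
        }
        return out;
    }
}
```

Tests that assert on global record order therefore need either a pinned parallelism of 1 or order-insensitive assertions.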
Maybe explicitly setting the parallelism in those tests would be better.
WDYT?
Actually, we don't need to disable it for this particular test, since it doesn't go through the buildStream(env) path where parallelism inference happens. Will revert the change.
Force-pushed from 1035dad to aaf3765.
Review threads on flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java — resolved (outdated).
     * Optional. Default is no watermark strategy. Only relevant if using the {@link
     * Builder#buildStream(StreamExecutionEnvironment)}.
     */
    public Builder<T> watermarkStrategy(WatermarkStrategy<T> newStrategy) {
Is there a way to provide a useful WatermarkStrategy?
I think it is possible to provide a useful TimestampAssigner, but I don't see how someone could provide a useful WatermarkGenerator. The only possible way to generate watermarks is when the watermarkColumn is provided, but even then the WatermarkGenerator should not be used.
Am I missing something?
This is only for the buildStream method; it just moves the watermark strategy from env.fromSource to the builder.
For the regular build method, users would also need to set the watermark strategy, which most likely would be none:

    DataStream<RowData> stream =
        env.fromSource(
            sourceBuilder().build(),
            WatermarkStrategy.noWatermarks(),
            "IcebergSource",
            TypeInformation.of(RowData.class));
Will remove WatermarkStrategy from here. We can always add another buildStream overload with a new argument in the future if there is a real need.
    // before:
    ScanContext context = contextBuilder.build();
    // after:
    this.context = contextBuilder.build();
This is ugly as hell; it is a side effect of the build method.
Maybe create an init() with all the side effects, and then a build() which uses the initialized attributes?
Agreed, the side effect is undesirable. Let me think of a way to refactor the code; maybe extract the ScanContext building from build() into a separate method.
The Converter interface will make this problem obsolete.
Review threads on ...9/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceInferParallelism.java — resolved (outdated).
Force-pushed from be4ce67 to 09cb0eb, then 32310d4 to 3aabbad, then 3aabbad to f86449a.
Thanks @pvary for the review.
Merged latest main (208 commits), including:
- Docs: Fix Flink 1.20 support versions (apache#11065)
- Flink: Fix compile warning (apache#11072)
- Docs: Initial committer guidelines and requirements for merging (apache#10780)
- Core: Refactor ZOrderByteUtils (apache#10624)
- API: implement types timestamp_ns and timestamptz_ns (apache#9008)
- Build: Bump com.google.errorprone:error_prone_annotations (apache#11055)
- Build: Bump mkdocs-material from 9.5.33 to 9.5.34 (apache#11062)
- Flink: Backport PR apache#10526 to v1.18 and v1.20 (apache#11018)
- Kafka Connect: Disable publish tasks in runtime project (apache#11032)
- Flink: add unit tests for range distribution on bucket partition column (apache#11033)
- Spark 3.5: Use FileGenerationUtil in PlanningBenchmark (apache#11027)
- Core: Add benchmark for appending files (apache#11029)
- Build: Ignore benchmark output folders across all modules (apache#11030)
- Spec: Add RemovePartitionSpecsUpdate REST update type (apache#10846)
- Docs: bump latest version to 1.6.1 (apache#11036)
- OpenAPI, Build: Apply spotless to testFixtures source code (apache#11024)
- Core: Generate realistic bounds in benchmarks (apache#11022)
- Add REST Compatibility Kit (apache#10908)
- Flink: backport PR apache#10832 of inferring parallelism in FLIP-27 source (apache#11009)
- Docs: Add Druid docs url to sidebar (apache#10997)
- ...
    flinkConf,
    scanContext.limit(),
    () -> {
      List<IcebergSourceSplit> splits = planSplitsForBatch(planningThreadName());
I am not sure whether it is intentional to modify the split list instance field in planSplitsForBatch.
It is intentional. See the comment for the method. We cache it as it will be reused later.
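The caching described in this reply can be sketched as a memoized planner. The class below is illustrative, not the actual IcebergSource code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of planning splits once and caching them in an instance field,
// so that a later consumer (e.g. the enumerator) reuses the same list
// instead of re-running the expensive planning.
class SplitPlannerSketch {
    private List<String> cachedSplits;                // reused on later calls
    final AtomicInteger planCount = new AtomicInteger();

    List<String> planSplitsForBatch(String threadName) { // threadName is illustrative
        if (cachedSplits == null) {
            planCount.incrementAndGet();              // expensive planning happens once
            cachedSplits = Arrays.asList("split-0", "split-1");
        }
        return cachedSplits;
    }
}
```

The trade-off, as the question above hints, is that mutating an instance field from inside a lambda makes the caching easy to miss without the method's comment.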
… mode (apache#10832) (cherry picked from commit 2ed61a1)
…ource (apache#11009) (cherry picked from commit bf00d51)
…h execution mode
Backport apache#10832. This PR backports inferring source parallelism for the FLIP-27 source in batch execution mode. Note: this is not a clean backport; RowDataConverter is not present in our forked version, so I had to add it to make the backport work.