-
Notifications
You must be signed in to change notification settings - Fork 3k
Flink: switch to use SortKey for data statistics #9212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...k/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceWithWatermarkExtractor.java
Outdated
Show resolved
Hide resolved
| import org.junit.Test; | ||
|
|
||
| public class TestDataStatisticsOperator { | ||
| private final Schema schema = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it worth to have an e2e test where we have a wider record and a narrower SortKey?
fa153d6 to
e594f9e
Compare
f2c7246 to
b13439b
Compare
| operator.processElement(new StreamRecord<>(GenericRowData.of(StringData.fromString("a")))); | ||
| operator.processElement(new StreamRecord<>(GenericRowData.of(StringData.fromString("b")))); | ||
| operator.processElement(new StreamRecord<>(genericRowDataA)); | ||
| operator.processElement(new StreamRecord<>(genericRowDataA)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could we make something like this here:
genericRowDataA_2 = GenericRowData.of(StringData.fromString("a"), 2);
operator.processElement(new StreamRecord<>(genericRowDataA_2));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean not reusing the object? if yes, I would just get rid of the pre-constructed objects and always construct them on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - my midnight review was not clear enough. 😢
I was trying to suggest to process a different RowData object, with the same key, like:
private final GenericRowData genericRowDataA_1 = GenericRowData.of(StringData.fromString("a"), 1);
private final GenericRowData genericRowDataA_2 = GenericRowData.of(StringData.fromString("a"), 2);
private final GenericRowData genericRowDataB = GenericRowData.of(StringData.fromString("b"), 3);
[..]
operator.processElement(new StreamRecord<>(genericRowDataA_1));
operator.processElement(new StreamRecord<>(genericRowDataA_2));
operator.processElement(new StreamRecord<>(genericRowDataB));
I know that we have individual test for the correct grouping, but I consider this as an e2e tests, and it would be nice to test this out as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. for readability,I would just get rid of the pre-constructed objects and always construct them on the fly then
pvary
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment, but this is really nice work.
Thanks, Steven!
|
thanks @pvary for the review |
…ata statistics
…ata statistics
| @Override | ||
| public void writeSnapshot(DataOutputView out) throws IOException { | ||
| Preconditions.checkState(schema != null, "Invalid schema: null"); | ||
| Preconditions.checkState(sortOrder != null, "Invalid sort order: null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using Preconditions.checkNotNull to check if schema and sortOrder are null or not
Same for line 329, 330
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer to avoid chechNotNull, as the result is a NullPointerException. Generally NullPointerExceptions are something very unexpected, while IllegalArgumentExceptions are a different category for me (configuration errors, and such)
That is why, I did not flag this at the review, but - taking a second look - this could be strange for someone less biased
…ata statistics
…ata statistics
…ata statistics
…ata statistics
No description provided.