[KafkaConnect] Fix RecordConverter for UUID and Fixed Types #11346
Conversation
```diff
- public class RecordConverterTest {
+ @ExtendWith(ParameterizedTestExtension.class)
+ public class RecordConverterTest extends BaseWriterTest {
```
I was hoping we'd keep this test specific to the conversion functions, and keep writer tests separate. Do you have thoughts on that?
I was thinking along the lines that the conversion functions are no longer format-agnostic, since the file format now factors into deciding the record conversion, so I thought it would be fair to test this end-to-end here.
Please let me know your thoughts considering the above.
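For illustration, a minimal sketch of what format-dependent conversion could look like; the helper name and placement are assumptions, not the PR's exact code:

```java
import java.nio.ByteBuffer;
import java.util.UUID;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.util.UUIDUtil;

class UuidConversionSketch {
  // Parquet's UUID writer expects a 16-byte big-endian ByteBuffer, while other
  // formats accept java.util.UUID directly, so the target file format decides
  // the converted representation.
  static Object convertUUID(UUID value, FileFormat format) {
    return format == FileFormat.PARQUET ? UUIDUtil.convertToByteBuffer(value) : value;
  }
}
```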
Maybe we can create a dedicated test class for the writer?
Shouldn't we have this parameterized by file type? The tests here make sense to me, but this PR is my first look at this module.
Ah, I see: the format only comes into play for UUID, so the other parameterizations are essentially no-ops. Perhaps we just need one specialized test then, `testParquetUUIDSerialization`.
Sure, added this test. Thanks for suggesting it!
I also think we need an E2E test with the write path; I can take that as a follow-up to this PR, since it would require refactoring the writer tests.
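As an illustration of the kind of focused test discussed above, here is a minimal, self-contained sketch; the `convertUUID` helper is a hypothetical stand-in, while the PR's actual test lives in `RecordConverterTest` and exercises the converter directly:

```java
import static org.assertj.core.api.Assertions.assertThat;

import java.util.UUID;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.util.UUIDUtil;
import org.junit.jupiter.api.Test;

class ParquetUuidSerializationSketchTest {

  // Hypothetical helper mirroring the format-aware conversion sketched earlier.
  private static Object convertUUID(UUID value, FileFormat format) {
    return format == FileFormat.PARQUET ? UUIDUtil.convertToByteBuffer(value) : value;
  }

  @Test
  void testParquetUUIDSerialization() {
    UUID uuid = UUID.randomUUID();

    // For Parquet, the converted value must be the ByteBuffer form of the UUID.
    Object converted = convertUUID(uuid, FileFormat.PARQUET);
    assertThat(converted).isEqualTo(UUIDUtil.convertToByteBuffer(uuid));

    // Other formats keep the java.util.UUID value as-is.
    assertThat(convertUUID(uuid, FileFormat.AVRO)).isEqualTo(uuid);
  }
}
```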
This change looks good; I'm just wondering about the test. I would have kept the original test and created a new one dedicated to the writer.
RussellSpitzer left a comment:
This looks good to me on the fix side, but I agree with the others that we need to adjust the tests to be a bit more specific to this fix.
Thanks @singhpk234 for the PR, and @jbonofre, @bryanck, and @ajantha-bhat for the review!
When writing UUIDs, should we handle the conversion directly within BaseParquetWriter, by modifying `BaseParquetWriter#primitive` to check if the LogicalTypeAnnotation is a UUID and then use a UUIDWriter for writing it, instead of performing the conversion based on the file type before writing?

```java
@Override
public ParquetValueWriter<?> primitive(PrimitiveType primitive) {
  // ...
  switch (primitive.getPrimitiveTypeName()) {
    case FIXED_LEN_BYTE_ARRAY:
      if (LogicalTypeAnnotation.uuidType().equals(primitive.getLogicalTypeAnnotation())) {
        return new UUIDWriter(desc);
      }
      return new FixedWriter(desc);
    // ... other cases ...
  }
}

private static class UUIDWriter extends ParquetValueWriters.PrimitiveWriter<UUID> {
  private UUIDWriter(ColumnDescriptor desc) {
    super(desc);
  }

  @Override
  public void write(int repetitionLevel, UUID value) {
    column.writeBinary(repetitionLevel, Binary.fromReusedByteArray(UUIDUtil.convert(value)));
  }
}
```

Similar to the approach taken in #7399 in Apache Iceberg.
About the change
The UUID type in the Parquet writer expects a ByteBuffer rather than a UUID; otherwise the writer fails.
The Fixed type needs a byte array rather than a ByteBuffer; otherwise the writer raises an error. A sketch of that unwrapping follows.
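A minimal sketch of the Fixed-type unwrapping described above, assuming the incoming fixed-length value arrives as a ByteBuffer; the helper name is hypothetical:

```java
import java.nio.ByteBuffer;

class FixedTypeSketch {
  // Parquet's fixed-length writer expects byte[], so a ByteBuffer value has to
  // be unwrapped into a byte array before being handed to the writer.
  static byte[] toFixedBytes(ByteBuffer value) {
    byte[] bytes = new byte[value.remaining()];
    value.duplicate().get(bytes); // duplicate() leaves the caller's position untouched
    return bytes;
  }
}
```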
Testing
Added new tests
cc @bryanck