Conversation

Contributor
@singhpk234 singhpk234 commented Oct 17, 2024

About the change

The UUID type in the Parquet writer expects a ByteBuffer rather than a UUID; otherwise the writer fails with:

class java.util.UUID cannot be cast to class [B (java.util.UUID and [B are in module java.base of loader 'bootstrap')

The fixed-length type needs a byte array rather than a ByteBuffer; otherwise one gets this error:

class java.nio.HeapByteBuffer cannot be cast to class [B (java.nio.HeapByteBuffer and [B are in module java.base of loader 'bootstrap')
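
A minimal sketch of the two conversions the fix performs (the helper class and method names below are illustrative, not the actual RecordConverter code; it assumes Iceberg's UUIDUtil and ByteBuffers utilities):

import java.nio.ByteBuffer;
import java.util.UUID;
import org.apache.iceberg.util.ByteBuffers;
import org.apache.iceberg.util.UUIDUtil;

// Illustrative helpers showing the representation each Parquet writer expects.
class UuidFixedConversions {
  // Parquet's UUID writer takes a 16-byte ByteBuffer, not a java.util.UUID.
  static ByteBuffer uuidForParquet(UUID uuid) {
    return UUIDUtil.convertToByteBuffer(uuid);
  }

  // Parquet's fixed-length writer takes byte[], not a (Heap)ByteBuffer.
  static byte[] fixedForParquet(ByteBuffer buffer) {
    return ByteBuffers.toByteArray(buffer);
  }
}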

Testing

Added new tests

cc @bryanck

@singhpk234 singhpk234 marked this pull request as draft October 17, 2024 22:16
@singhpk234 singhpk234 changed the title from "[KafkaConnect] Fix RecordConverter" to "[KafkaConnect] Fix RecordConverter for UUID and Fixed Types" Oct 18, 2024
@singhpk234 singhpk234 marked this pull request as ready for review October 18, 2024 15:38
@github-actions github-actions bot removed the ORC label Oct 18, 2024
@RussellSpitzer RussellSpitzer added this to the Iceberg 1.7.0 milestone Oct 22, 2024

-public class RecordConverterTest {
+@ExtendWith(ParameterizedTestExtension.class)
+public class RecordConverterTest extends BaseWriterTest {
Contributor

I was hoping we'd keep this test specific to the conversion functions, and keep writer tests separate. Do you have thoughts on that?

Contributor Author

I was thinking along the lines that the conversion functions are no longer format-agnostic, since we now factor format info into deciding the record conversion, hence I thought it would be fair to test this end-to-end here.

Please let me know your thoughts considering the above.

Member

Maybe we can create a dedicated test class for the writer?

Member

Shouldn't we have this parameterized by file type? The tests here make sense to me, but I am only looking at this module for the first time for this PR.

Member
@RussellSpitzer RussellSpitzer Oct 24, 2024

Ah, I see: the format only comes into play for UUID, so the other parameterizations are essentially no-ops. Perhaps we just need one specialized test then, "testParquetUUIDSerialization".

Contributor Author

Sure, added this test. Thanks for suggesting it!
I also think we need an end-to-end test with the writer; I can take that as a follow-up to this PR, as it would require refactoring the writer tests.
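
A rough sketch of what such a focused test might look like (the convertForFormat helper and the assertion style below are assumptions for illustration, not the actual test added in this PR):

@Test
public void testParquetUUIDConversion() {
  UUID expected = UUID.randomUUID();

  // convertForFormat is a hypothetical helper standing in for the record
  // conversion path as it would run for a Parquet table
  Object converted = convertForFormat(expected, FileFormat.PARQUET);

  // for Parquet, the converter should produce a 16-byte ByteBuffer, not a UUID
  assertThat(converted).isEqualTo(UUIDUtil.convertToByteBuffer(expected));
}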

@jbonofre
Member

This change looks good; I'm just wondering about the test. I would have kept the original test and created a new one dedicated to the writer.

Member
@RussellSpitzer RussellSpitzer left a comment

This looks good to me on the fix side, but I agree with the others that we need to adjust the tests to be a bit more specific to this fix.

@RussellSpitzer RussellSpitzer merged commit 9ecd97b into apache:main Oct 25, 2024
@RussellSpitzer
Member

Thanks @singhpk234 for the PR, and @jbonofre, @bryanck, and @ajantha-bhat for the review!

@Gezi-lzq

Gezi-lzq commented Nov 28, 2024

When writing UUIDs, should we handle the conversion directly within BaseParquetWriter, by modifying BaseParquetWriter#primitive to check whether the LogicalTypeAnnotation is a UUID and then using a UUIDWriter to write it, instead of performing the conversion based on the file type before writing?

@Override
public ParquetValueWriter<?> primitive(PrimitiveType primitive) {
  // ...
  switch (primitive.getPrimitiveTypeName()) {
    case FIXED_LEN_BYTE_ARRAY:
      // route UUID-annotated fixed columns to a dedicated writer
      if (LogicalTypeAnnotation.uuidType().equals(primitive.getLogicalTypeAnnotation())) {
        return new UUIDWriter(desc);
      }
      return new FixedWriter(desc);
    // ...
  }
}

private static class UUIDWriter extends ParquetValueWriters.PrimitiveWriter<UUID> {
  private UUIDWriter(ColumnDescriptor desc) {
    super(desc);
  }

  @Override
  public void write(int repetitionLevel, UUID value) {
    // serialize the UUID to its 16-byte form before writing it as binary
    column.writeBinary(repetitionLevel, Binary.fromReusedByteArray(UUIDUtil.convert(value)));
  }
}

Similar to the approach taken in #7399 in Apache Iceberg.
@singhpk234 @bryanck @openinx @RussellSpitzer
