Skip to content

Conversation

@CatOrLeader
Copy link
Contributor

implement automatic replacement of UUID type columns with the FixedType[16]

Why are the changes needed?

There is an issue with Amoro's attempt to optimise tables in Iceberg when using partitions or buckets based on a UUID key. These changes FIX #3796 and do not corrupt any data

Brief change log

  • Added conversion of UUID type in the Iceberg table to the FixedType[16]
  • Conversion is internal only so there is no side effect on the data in the table

How was this patch tested?

  • The patch was tested manually in the production environment of the company I currently work in – the new. jar file of amoro-format-iceberg (with the changes) replace the original one

  • Optimisers work correctly, and table files are optimised successfully.

Documentation

  • Does this pull request introduce a new feature? No
  • If yes, how is the feature documented? Not applicable

implement automatic replacement of UUID type columns with the FixedType[16]
Copy link
Contributor

@xxubai xxubai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this issue.

The UUID records in the data file are persisted as binary. If I understand correctly, the reader cannot directly cast binary to UUID.

You converted the UUID type to a fixed-length 16-byte binary in the schema, which makes sense. But I wonder if there is a more elegant way to fix this, ideally without intrusive changes to the writer code.

@CatOrLeader
Copy link
Contributor Author

@xxubai

Hello! Thanks for your answer; I appreciate it a lot! About the issues you raised:

I was trying to find any other part of the code that can be fixed to resolve this issue; however, it is directly connected to the original Iceberg library, for example:

It is not fixed yet, but in my opinion, it is necessary to give users the ability to have UUID as a partition key even now because in some way it can be more optimised than other types.

Additionally, we can see a similar issue with the UUID in the libraries that use Iceberg:

I am open to any other possibility to fix this issue in the Amoro code; however, for now it seems like this dirty way of casting classes is the single possibility to resolve the issue on the Amoro code side

@zhoujinsong
Copy link
Contributor

@CatOrLeader
Copy link
Contributor Author

@zhoujinsong Hello! Yep, it is necessary to fix it in the iceberg library; however, it seems like we need to wait a lot of time while the bug is fixed. Additionally, when the iceberg library is updated, we need to bump the version.

However, it will be great if there exists a process that will keep my changes at least while a new Iceberg version with corrected code will be included in the amoro

@xxubai
Copy link
Contributor

xxubai commented Sep 28, 2025

@zhoujinsong Hello! Yep, it is necessary to fix it in the iceberg library; however, it seems like we need to wait a lot of time while the bug is fixed. Additionally, when the iceberg library is updated, we need to bump the version.

Hi @CatOrLeader . Thanks for the contribution!

I directly tried modifying InternalRecordWrapper to add the corresponding converter in order to fix this issue (https://github.com/apache/iceberg/pull/14208/files#diff-3be4a9b849bc3dad71d9224f6d8f1b70ac5d6e36db9a1e608273c5df3e127a7c).

I think we could follow @zhoujinsong 's suggestion and create a copy class inside Amoro to replace the current InternalRecordWrapper(Such as src/main/java/org/apache/iceberg/data/IcebergRecordWrapper.java).
Later, once this issue is fixed upstream, we could revert to the native class. What do you think?

@CatOrLeader
Copy link
Contributor Author

@xxubai Hi! I agree with you and @zhoujinsong that this method is more elegant and should be applied. Thank you for the expertise!

I will reflect the conclusions in the code and tag you again when it will be ready and tested in out environment. Is it ok?

Also, thank you for contributing to the iceberg library to fix this issue; I appreciate it

Artur.Mukhutdinov added 3 commits September 28, 2025 22:47
add more elegant way to handle the issue with the Iceberg tables and UUID type
add a little change to `converter` method to always return an instance of `IcebergRecordWrapper` class
Copy link
Contributor

@xxubai xxubai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 56.75676% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 21.99%. Comparing base (43c2ee1) to head (39052f2).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
.../org/apache/iceberg/data/IcebergRecordWrapper.java 55.55% 14 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3797      +/-   ##
============================================
+ Coverage     21.96%   21.99%   +0.02%     
- Complexity     2431     2442      +11     
============================================
  Files           441      442       +1     
  Lines         40790    40826      +36     
  Branches       5756     5758       +2     
============================================
+ Hits           8961     8981      +20     
- Misses        31078    31092      +14     
- Partials        751      753       +2     
Flag Coverage Δ
trino 21.99% <56.75%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@zhoujinsong zhoujinsong merged commit 208399f into apache:master Sep 29, 2025
6 checks passed
@zhoujinsong
Copy link
Contributor

Thanks for the work! @CatOrLeader
Thanks for the review! @xxubai

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: UUID Issue from native iceberg library

4 participants