AWS: Add parameter of excluding non-current fields in Glue #12664
|
This change fixes an issue we have faced for some time in our stack.
|
Displaying non-current columns is intentional in Glue, as users may use Lake Formation and need to access dropped columns. Users should not rely on Glue for the latest table status; Iceberg metadata should always be considered the source of truth.
Hello Xiaoyuan 👋,
To be clear, this PR doesn't aim to change the default behavior, but rather to add a new configuration option that would allow users to choose whether non-current columns are displayed. This provides flexibility for both use cases:
Would you be open to considering this approach, since it preserves the existing functionality while adding an option for users with different requirements?
|
@duoxoud I still believe this change isn't necessary and could break LF integration. For your use case, could you explain why end users rely solely on the latest Glue table status? This approach isn't reliable, since the Glue schema can be modified independently, while the Iceberg schema might remain unchanged. Even if they do depend on Glue, it's still possible to filter out the column using |
|
Hello @jackye1995 👋, do you happen to have time to check this PR? Many thanks 🫡
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
(Reopen #11334)
Closes #7584
This PR addresses a feature request for improving the Glue Schema generation process. It introduces a new configuration option that allows users to exclude non-current fields from the Glue Schema, providing clarity and reducing confusion for Athena users who primarily query current data.
In PR #3888, the Glue schema generation was modified to include all historical fields. This was intended to help users recognize previously used columns and avoid duplicating column names. However, in practice, this approach has led to confusion among users (for example, the same issue explained in #7584).
The current behaviour remains unchanged.
(introduced GLUE_NON_CURRENT_FIELDS_DISABLED_DEFAULT = false to keep the current behaviour)
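To make the intended behaviour concrete, here is a minimal, language-agnostic sketch in Python of the logic described above. It is not the code in this PR (which lives in the Java Glue integration). The `Field` type, the `glue_columns` function, the `non_current_fields_disabled` parameter name, and the `current` marker key are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for an Iceberg schema field."""
    field_id: int
    name: str
    type: str


def glue_columns(current_fields, historical_schemas,
                 non_current_fields_disabled=False):
    """Build a Glue-style column list from an Iceberg-style schema history.

    With the flag left at False (assumed default, mirroring
    GLUE_NON_CURRENT_FIELDS_DISABLED_DEFAULT = false), fields that only
    appear in older schemas are still emitted and marked as non-current.
    With the flag set to True, only current fields are emitted.
    """
    columns = []
    seen = set()

    # Current fields always come first and are marked as current.
    for f in current_fields:
        columns.append({"Name": f.name, "Type": f.type, "current": "true"})
        seen.add(f.name)

    if non_current_fields_disabled:
        return columns

    # Existing behaviour: append fields found only in historical schemas.
    for schema in historical_schemas:
        for f in schema:
            if f.name not in seen:
                columns.append({"Name": f.name, "Type": f.type, "current": "false"})
                seen.add(f.name)
    return columns


# Usage: "name" was dropped and "email" was added in the latest schema.
current = [Field(1, "id", "bigint"), Field(3, "email", "string")]
history = [[Field(1, "id", "bigint"), Field(2, "name", "string")], current]

default_names = [c["Name"] for c in glue_columns(current, history)]
filtered_names = [
    c["Name"]
    for c in glue_columns(current, history, non_current_fields_disabled=True)
]
print(default_names)   # ['id', 'email', 'name']
print(filtered_names)  # ['id', 'email']
```

Because the flag defaults to False in this sketch, callers that do not set it keep seeing dropped columns, which is the compatibility property the description relies on.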