AWS: Add integration with Glue catalog extensions for Amazon SageMaker Lakehouse #11692

Closed
wants to merge 1 commit into from

sachet-saurabh

This PR adds integration with the Glue extensions for Iceberg, enabling access to the new Glue multi-catalog hierarchy in Amazon SageMaker Lakehouse, as announced in Matt Garman's CEO keynote at AWS re:Invent 2024. It allows reading non-Iceberg data sources through the Iceberg GlueCatalog, starting with Amazon Redshift data right away, and support will be extended to many more federated data sources in subsequent Glue service releases in 2025.

The extensions library is used to communicate with the Glue catalog extensions API. This is an additional set of APIs, defined in an OpenAPI specification, that can be viewed as extensions to the Glue Catalog APIs and the Glue Iceberg REST Catalog (IRC) APIs. They enable functionality that has been under discussion in the open source Iceberg IRC spec: scan planning, fine-grained data commits, and long-running transaction support. While these IRC features continue to be discussed in open source, AWS Glue customers will be able to use them early, across all AWS analytics service integrations, through this extensions API, which is integrated with the GlueCatalog component in Iceberg.

@jackye1995 @yyanyy

@@ -189,6 +208,20 @@ void initialize(
    this.warehousePath = Strings.isNullOrEmpty(path) ? null : LocationUtil.stripTrailingSlash(path);
    this.glue = client;
    this.lockManager = lock;
    this.extensionsEnabled =
        PropertyUtil.propertyAsBoolean(
            catalogProperties, GlueExtensionsProperties.GLUE_EXTENSIONS_ENABLED, true);

It looks like you're defaulting the extensions to be enabled, but does this require the glue bundle/runtime to work? Why do we need to disable it for all of the tests?
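
To make the reviewer's point about the default concrete: with a default of `true`, extensions stay on for every catalog unless the properties explicitly set the flag to `false`, which is presumably why the tests must disable it. The following is a standalone, hypothetical sketch of that lookup, not the PR's code. It mirrors the semantics of Iceberg's `PropertyUtil.propertyAsBoolean` as I understand them (absent key returns the default, present key is parsed with `Boolean.parseBoolean`). The key string `glue.extensions.enabled` is an assumption, since the actual value of `GlueExtensionsProperties.GLUE_EXTENSIONS_ENABLED` is not shown in the diff.

```java
import java.util.Map;

/** Hypothetical sketch of resolving a boolean catalog flag that defaults to true. */
final class ExtensionsFlagSketch {
  // Hypothetical key, for illustration only; not the real constant's value.
  static final String EXTENSIONS_ENABLED = "glue.extensions.enabled";

  // Absent key -> defaultValue; present key -> Boolean.parseBoolean(value).
  static boolean propertyAsBoolean(Map<String, String> props, String key, boolean defaultValue) {
    String value = props.get(key);
    return value != null ? Boolean.parseBoolean(value) : defaultValue;
  }

  // Same call shape as the diff: extensions are enabled unless explicitly turned off.
  static boolean extensionsEnabled(Map<String, String> catalogProperties) {
    return propertyAsBoolean(catalogProperties, EXTENSIONS_ENABLED, true);
  }

  public static void main(String[] args) {
    // No property set: the default (true) applies, so extensions are on.
    System.out.println(extensionsEnabled(Map.of()));
    // A test would have to opt out explicitly, as the reviewer observes.
    System.out.println(extensionsEnabled(Map.of(EXTENSIONS_ENABLED, "false")));
  }
}
```

Under a default of `false`, the test suites would need no override, and only users who have the Glue extensions bundle and runtime would opt in.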

@danielcweeks
danielcweeks commented Dec 4, 2024

@sachet-saurabh Can you share a little more context around how the extensions are intended to work? It appears that if extensions are enabled we're actually bypassing the Iceberg glue implementation. Does any of the extension capability work without the glue extension bundle and glue spark runtime?

amogh-jahagirdar left a comment

Thanks @sachet-saurabh.

Much of this catalog implementation resides in an external dependency that the Iceberg project would be taking on. My primary concern is that, with the current approach, every upgrade of that dependency in Iceberg would still need to be reviewed to confirm that the catalog implementation is correct and safe for Iceberg to consume. Keeping much of the critical path of a catalog implementation in a dependency, and having to review that dependency on each upgrade, does not seem like a scalable approach for Iceberg.

As I see it, there are two paths:

  1. We work to move the parts of the catalog implementation that currently reside in the Glue extensions, and that make sense for the Iceberg project, into Iceberg itself.

  2. If the catalog extensions are meant to stay separate, they can live in a separate project that publishes its own jar with the catalog implementations.

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the project's mailing list. Thank you for your contributions.

github-actions bot added the stale label Jan 11, 2025

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions bot closed this Jan 19, 2025

3 participants