Skip to content

Conversation

@jayceslesar
Copy link
Contributor

@jayceslesar jayceslesar commented Jun 7, 2025

3 changes here...

  1. Basically a mirror of Adds support for creating a GlueCatalog with own client #1920 but for dynamo
  2. Allow users to pass in an existing client/pass initialize client with properties in RestCatalog with sigv4 enabled. (closes RestCatalog with sigv4 enabled doesn't pick up creds provided as arguments #2070, closes Allow to pass botocore session to REST Catalog (S3 Tables) like Glue Catalog does #2008)
  3. Re-use of the client in add_headers in RestCatalog with sigv4. (closes RestCatalog with sigv4 enabled instantiates a boto session per each client call #2069 )

@jayceslesar jayceslesar changed the title Adds support for creating a DynamoDB with own client Robustify boto3 session handling (DynamoDB, RestCatalog) Jun 8, 2025
Comment on lines 392 to 397
def __init__(self, **properties: str):
super().__init__()
self._properties = properties
self._boto_session = boto3.Session(
region_name=get_first_property_value(self._properties, AWS_REGION),
botocore_session=self._properties.get(BOTOCORE_SESSION),
aws_access_key_id=get_first_property_value(self._properties, AWS_ACCESS_KEY_ID),
aws_secret_access_key=get_first_property_value(self._properties, AWS_SECRET_ACCESS_KEY),
aws_session_toxken=get_first_property_value(self._properties, AWS_SESSION_TOKEN),
)

def add_headers(self, request: PreparedRequest, **kwargs: Any) -> None: # pylint: disable=W0613
boto_session = boto3.Session()
credentials = boto_session.get_credentials().get_frozen_credentials()
region = self._properties.get(SIGV4_REGION, boto_session.region_name)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fokko @kevinjqliu probably worth refactoring this class out to get some proper test coverage?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jayceslesar yes, I think that makes sense. Let's do that in a separate PR 👍

@rudolfix
Copy link

rudolfix commented Jun 9, 2025

thx for working on this :)

pyproject.toml Outdated
glue = ["boto3", "mypy-boto3-glue"]
adlfs = ["adlfs"]
dynamodb = ["boto3"]
dynamodb = ["boto3", "mypy-boto3-dynamodb"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a dev-dependency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably -- the glue one isnt but I will move both over

)
self.dynamodb = session.client(DYNAMODB_CLIENT)
self.dynamodb_table_name = self.properties.get(DYNAMODB_TABLE_NAME, DYNAMODB_TABLE_NAME_DEFAULT)
self._ensure_catalog_table_exists_or_create()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to execute this when passing in an existing client?

aws_session_token=get_first_property_value(properties, DYNAMODB_SESSION_TOKEN, AWS_SESSION_TOKEN),
)
self.dynamodb = session.client(DYNAMODB_CLIENT)
self.dynamodb_table_name = self.properties.get(DYNAMODB_TABLE_NAME, DYNAMODB_TABLE_NAME_DEFAULT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also want to assign this one when passing in an existing client. Adding the methods to the class for completeness may also be a good thing to do as part of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they do exist (in the MetastoreCatalog) and were just never updated into this class. I will take a look

Copy link
Contributor Author

@jayceslesar jayceslesar Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a second look I dont know if we need to make a change _ensure_catalog_table_exists_or_create makes the dynamo table which is needed to make + interact with iceberg tables

@CharlesAttend
Copy link

Thank you for fixing this!

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @jayceslesar for working on this, one minor comment that needs to be addressed, but apart from that, this looks great 🥳

poetry.lock Outdated
Comment on lines 4494 to 4738
version = "2.44.0"
version = "2.46.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fokko possible causes of CI failing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe need to add

    "ignore::ResourceWarning",
    "ignore::pytest.PytestUnraisableExceptionWarning",

To filterwarnings? idrk

ray-project/ray#10279

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we don't want to bump Ray as part of this PR. Maybe add the <=2.44.0 constraint to Ray for now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@Fokko Fokko merged commit 38ebb19 into apache:main Jun 11, 2025
10 checks passed
@Fokko
Copy link
Contributor

Fokko commented Jun 11, 2025

Finally green, thanks @jayceslesar for working on this 🙌

amitgilad3 pushed a commit to amitgilad3/iceberg-python that referenced this pull request Jul 7, 2025
3 changes here...

1. Basically a mirror of
apache#1920 but for dynamo
2. Allow users to pass in an existing client/pass initialize client with
properties in `RestCatalog` with sigv4 enabled. (closes apache#2070, closes
apache#2008)
3. Re-use of the client in `add_headers` in `RestCatalog` with sigv4.
(closes apache#2069 )

---------

Co-authored-by: Marko Grujic <[email protected]>
gabeiglio pushed a commit to Netflix/iceberg-python that referenced this pull request Aug 13, 2025
3 changes here...

1. Basically a mirror of
apache#1920 but for dynamo
2. Allow users to pass in an existing client/pass initialize client with
properties in `RestCatalog` with sigv4 enabled. (closes apache#2070, closes
apache#2008)
3. Re-use of the client in `add_headers` in `RestCatalog` with sigv4.
(closes apache#2069 )

---------

Co-authored-by: Marko Grujic <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

5 participants