-
Notifications
You must be signed in to change notification settings - Fork 385
Robustify boto3 session handling (DynamoDB, RestCatalog) #2071
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| def __init__(self, **properties: str): | ||
| super().__init__() | ||
| self._properties = properties | ||
| self._boto_session = boto3.Session( | ||
| region_name=get_first_property_value(self._properties, AWS_REGION), | ||
| botocore_session=self._properties.get(BOTOCORE_SESSION), | ||
| aws_access_key_id=get_first_property_value(self._properties, AWS_ACCESS_KEY_ID), | ||
| aws_secret_access_key=get_first_property_value(self._properties, AWS_SECRET_ACCESS_KEY), | ||
| aws_session_toxken=get_first_property_value(self._properties, AWS_SESSION_TOKEN), | ||
| ) | ||
|
|
||
| def add_headers(self, request: PreparedRequest, **kwargs: Any) -> None: # pylint: disable=W0613 | ||
| boto_session = boto3.Session() | ||
| credentials = boto_session.get_credentials().get_frozen_credentials() | ||
| region = self._properties.get(SIGV4_REGION, boto_session.region_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fokko @kevinjqliu probably worth refactoring this class out to get some proper test coverage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @jayceslesar yes, I think that makes sense. Let's do that in a separate PR 👍
|
thx for working on this :) |
pyproject.toml
Outdated
| glue = ["boto3", "mypy-boto3-glue"] | ||
| adlfs = ["adlfs"] | ||
| dynamodb = ["boto3"] | ||
| dynamodb = ["boto3", "mypy-boto3-dynamodb"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a dev-dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably -- the glue one isnt but I will move both over
pyiceberg/catalog/dynamodb.py
Outdated
| ) | ||
| self.dynamodb = session.client(DYNAMODB_CLIENT) | ||
| self.dynamodb_table_name = self.properties.get(DYNAMODB_TABLE_NAME, DYNAMODB_TABLE_NAME_DEFAULT) | ||
| self._ensure_catalog_table_exists_or_create() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to execute this when passing in an existing client?
pyiceberg/catalog/dynamodb.py
Outdated
| aws_session_token=get_first_property_value(properties, DYNAMODB_SESSION_TOKEN, AWS_SESSION_TOKEN), | ||
| ) | ||
| self.dynamodb = session.client(DYNAMODB_CLIENT) | ||
| self.dynamodb_table_name = self.properties.get(DYNAMODB_TABLE_NAME, DYNAMODB_TABLE_NAME_DEFAULT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also want to assign this one when passing in an existing client. Adding the methods to the class for completeness may also be a good thing to do as part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they do exist (in the MetastoreCatalog) and were just never updated into this class. I will take a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a second look I dont know if we need to make a change _ensure_catalog_table_exists_or_create makes the dynamo table which is needed to make + interact with iceberg tables
Co-authored-by: Marko Grujic <[email protected]>
|
Thank you for fixing this! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @jayceslesar for working on this, one minor comment that needs to be addressed, but apart from that, this looks great 🥳
poetry.lock
Outdated
| version = "2.44.0" | ||
| version = "2.46.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fokko possible causes of CI failing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe need to add
"ignore::ResourceWarning",
"ignore::pytest.PytestUnraisableExceptionWarning",
To filterwarnings? idrk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we don't want to bump Ray as part of this PR. Maybe add the <=2.44.0 constraint to Ray for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
|
Finally green, thanks @jayceslesar for working on this 🙌 |
3 changes here... 1. Basically a mirror of apache#1920 but for dynamo 2. Allow users to pass in an existing client/pass initialize client with properties in `RestCatalog` with sigv4 enabled. (closes apache#2070, closes apache#2008) 3. Re-use of the client in `add_headers` in `RestCatalog` with sigv4. (closes apache#2069 ) --------- Co-authored-by: Marko Grujic <[email protected]>
3 changes here... 1. Basically a mirror of apache#1920 but for dynamo 2. Allow users to pass in an existing client/pass initialize client with properties in `RestCatalog` with sigv4 enabled. (closes apache#2070, closes apache#2008) 3. Re-use of the client in `add_headers` in `RestCatalog` with sigv4. (closes apache#2069 ) --------- Co-authored-by: Marko Grujic <[email protected]>
3 changes here...
RestCatalogwith sigv4 enabled. (closesRestCatalogwith sigv4 enabled doesn't pick up creds provided as arguments #2070, closes Allow to pass botocore session to REST Catalog (S3 Tables) like Glue Catalog does #2008)add_headersinRestCatalogwith sigv4. (closesRestCatalogwith sigv4 enabled instantiates a boto session per each client call #2069 )