@ebelgasmi12

[AWS] S3FileIO - Add Cross-Region Bucket Access

Made corresponding updates to main and test.

Resolves #9785

CC @nastra

ArgumentCaptor.forClass(S3Configuration.class);

Mockito.doReturn(mockA).when(mockA).dualstackEnabled(Mockito.anyBoolean());
Mockito.doReturn(mockA).when(mockA).crossRegionAccessEnabled(Mockito.anyBoolean());

Contributor

Please also add Assertions.assertThat(s3Configuration.crossRegionAccessEnabled()).isTrue() at the bottom of this test method.

@nastra
crossRegionAccessEnabled is an S3Client attribute rather than an S3Configuration attribute (same as dualstackEnabled).
As with the dualstackEnabled attribute (which is not asserted in this test method either), s3Configuration.crossRegionAccessEnabled() does not exist.

Contributor

Yeah unfortunately I think getting the value of the attributes from the client isn't possible without doing some reflection hacks.

What I think we can do is add S3 integration tests though to verify the actual cross region access behavior. That seems like a more useful test to verify that when the property is set we can perform operations across regions as expected. cc @jackye1995 @geruh @rahil-c
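
As a complement to integration tests, the wiring itself can be checked at the builder call instead of on the built client, which avoids reflection. The following is only a minimal sketch of that idea: `CrossRegionCapableBuilder`, `RecordingBuilder`, and `CrossRegionWiring` are hypothetical stand-ins written for illustration, not the AWS SDK's builder types or Iceberg's classes.

```java
// Hypothetical stand-in for the single builder method under discussion.
// It is NOT the AWS SDK's S3ClientBuilder.
interface CrossRegionCapableBuilder {
  CrossRegionCapableBuilder crossRegionAccessEnabled(Boolean enabled);
}

// Hand-rolled test double that remembers the value it was called with.
final class RecordingBuilder implements CrossRegionCapableBuilder {
  Boolean recorded; // stays null until crossRegionAccessEnabled is called

  @Override
  public CrossRegionCapableBuilder crossRegionAccessEnabled(Boolean enabled) {
    this.recorded = enabled;
    return this;
  }
}

// Hypothetical code under test: pushes the configured flag onto the builder.
final class CrossRegionWiring {
  private CrossRegionWiring() {}

  static void apply(boolean enabled, CrossRegionCapableBuilder builder) {
    builder.crossRegionAccessEnabled(enabled);
  }

  public static void main(String[] args) {
    RecordingBuilder builder = new RecordingBuilder();
    apply(true, builder);
    // The assertion targets what the builder received, not the built client.
    System.out.println(builder.recorded); // true
  }
}
```

This only shows that the flag reaches the builder. Whether operations actually succeed across regions still needs the integration test suggested above.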

Contributor

+1 for an integration test; there is already an AWS_CROSS_REGION config there that can be used.

There's also an AWS_TEST_CROSS_REGION_BUCKET config (although it was originally added for S3 Access Point tests).
Can anybody help / provide guidance on this feature's integration tests? Thanks.

github-actions bot added the docs label Feb 26, 2024

* Determines if S3 client will allow Cross-Region bucket access, default to false.
*
* <p>For more details, see
* https://docs.aws.amazon.com/AmazonS3/latest/userguide/dual-stack-endpoints.html

Contributor

Is this the wrong doc link?

My bad, fixed.

docs/docs/aws.md Outdated

S3 Cross-Region bucket access can be turned on by setting catalog property `s3.cross-region-access-enabled` to `true`.
This is turned off by default to avoid first S3 API call increased latency.
For more details, please refer to [Cross-Region access for Amazon S3

Contributor

Can we follow the convention of the other sections and add an example here?

Something like:

spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket2/my/key/prefix \
    --conf spark.sql.catalog.my_catalog.type=glue \
    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.s3.cross-region-access-enabled=true

I did the same as in the S3 Access Control List and S3 Write Checksum Verification sections, since it's a single-parameter configuration. But I can definitely change it.

Doc updated.
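
For readers following the docs thread, here is a minimal sketch of how a boolean catalog property such as `s3.cross-region-access-enabled` might be read with a default of `false`. The property key and the default come from the docs text above; the class name `CrossRegionAccessProperty` and its method are hypothetical and are not Iceberg's actual implementation.

```java
import java.util.Map;

// Hypothetical helper for illustration; not Iceberg's S3FileIOProperties.
final class CrossRegionAccessProperty {
  // Property key as written in the docs change under review.
  static final String KEY = "s3.cross-region-access-enabled";
  // The docs state that the feature is turned off by default.
  static final boolean DEFAULT = false;

  private CrossRegionAccessProperty() {}

  // Returns the configured value, or the default when the key is absent.
  static boolean isEnabled(Map<String, String> catalogProperties) {
    String value = catalogProperties.get(KEY);
    return value == null ? DEFAULT : Boolean.parseBoolean(value.trim());
  }

  public static void main(String[] args) {
    System.out.println(isEnabled(Map.of()));            // false
    System.out.println(isEnabled(Map.of(KEY, "true"))); // true
  }
}
```

With the spark-sql example above, the catalog property map would carry the key with the value `true`, so a helper like this would return `true`.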

@sfc-gh-schen

Hi, what is blocking this change from being merged?

@munendrasn
Contributor

+1 to adding this support

@ebelgasmi12 @nastra @amogh-jahagirdar @jackye1995
Can we please consider adding this property to the RESTCatalog spec for S3 as well?

@singhpk234
Contributor

Sounds like a reasonable change to add. For adding an integration test, which is what this PR is pending on, please check this PR.

@munendrasn
Contributor

@elmehdibelgasmi
Based on the above, I have added a sample test for cross-region access and rebased. I have created PR #11259 (to see if the tests succeed in the CI environment) on top of the changes in this PR. Please let me know if it looks good, and I will create a PR against your fork.
This feature would unblock us, hence the attempt to bring this PR to a close. Apologies in advance if I have overstepped.

@munendrasn
Contributor

@singhpk234 Created a separate PR for the spec update: #11260

@github-actions

github-actions bot commented Nov 5, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

github-actions bot added the stale label Nov 5, 2024

@github-actions

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions bot closed this Nov 13, 2024

Development

Successfully merging this pull request may close these issues.

S3FileIO does not support Iceberg Cross-Region API Calls to Amazon S3 buckets