-
Notifications
You must be signed in to change notification settings - Fork 3k
AWS: Add support for s3 access points #4334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2985f72 to
8e85276
Compare
|
FYI @flyrain @anuragmantri, this is based on our discussion in https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1645066803099319 |
anuragmantri
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me, pending integration tests requested by @jackye1995.
Thanks for adding this @singhpk234
4af17c5 to
d123c94
Compare
aws/src/integration/java/org/apache/iceberg/aws/AwsIntegTestUtil.java
Outdated
Show resolved
Hide resolved
aws/src/integration/java/org/apache/iceberg/aws/s3/TestS3FileIOIntegration.java
Outdated
Show resolved
Hide resolved
aws/src/integration/java/org/apache/iceberg/aws/s3/TestS3FileIOIntegration.java
Show resolved
Hide resolved
80a82d8 to
cf04f67
Compare
aws/src/integration/java/org/apache/iceberg/aws/AwsIntegTestUtil.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @Test | ||
| public void testNewInputStreamWithCrossRegionAccessPoint() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a side note that in aws/aws-sdk-java-v2@51632cb, a new config multiRegionEnabled is added, which defaults to true and controls cross-region access when using MRAP. The default works for us, so we probably don't need to add it for now.
4209080 to
8535e82
Compare
jackye1995
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me!
aws/src/main/java/org/apache/iceberg/aws/AssumeRoleAwsClientFactory.java
Outdated
Show resolved
Hide resolved
| ); | ||
| S3URI uri1 = new S3URI(p1, bucketToAccessPointMapping); | ||
|
|
||
| assertEquals("access-point", uri1.bucket()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be any validation that the access point is in a valid format? Or is that handled by the SDK and we don't want to duplicate logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be 2 possiblities:
- an ARN, something like
arn:aws:s3:region:111122223333:accesspoint/my-access-point - an alias, something like
my-access-point-aqfqprnstn7aefdfbarligizwgyfouse1a-s3alias
because of 2, the string can really be anything that satisfies the bucket regex, so it's a bit hard to validate on service side, given the fact that we do not really check for normal bucket name correctness either. I think it should be fine to delegate to the S3 service to check for name correctness.
2eea753 to
63d1e8a
Compare
jackye1995
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for addressing all the comments, @rdblue I think all the comments are fixed, let us know if there is any more concern, thanks!
aws/src/integration/java/org/apache/iceberg/aws/s3/TestS3FileIOIntegration.java
Outdated
Show resolved
Hide resolved
aws/src/integration/java/org/apache/iceberg/aws/s3/TestS3FileIOIntegration.java
Outdated
Show resolved
Hide resolved
5f3ad24 to
bb31773
Compare
|
This looks good to me, and I think there has been no comment for a few days, I will merge it to unblock other feature contributions, and for people to test out the access point usages. @rdblue @rajarshisarkar let us know if you have any additional concern, we can address before the upcoming release. |
This tries to add support of s3 access points
ref : https://aws.amazon.com/s3/features/access-points/
spark-shell \ --conf spark.sql.catalog.test=org.apache.iceberg.spark.SparkCatalog \ --conf spark.sql.catalog.test.warehouse=s3://my-bucket \ --conf spark.sql.catalog.test.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \ --conf spark.sql.catalog.test.io-impl=org.apache.iceberg.aws.s3.S3FileIO \ --conf spark.sql.catalog.test.s3.access-points.my-bucket=arn:aws:s3::123456789012:accesspoint:mfzwi23gnjvgw.mrapThe s3.access-points config provides a map of bucket to endpoint mapping for paths stored in Iceberg, so that the S3FileIO can use the access point to get the bucket instead when the mapping is configured.
cc: @jackye1995 @rajarshisarkar @arminnajafi @amogh-jahagirdar @xiaoxuandev @yyanyy