Skip to content

Add S3 security mapping for native S3#22559

Merged
electrum merged 6 commits intotrinodb:masterfrom
electrum:s3security
Jul 30, 2024
Merged

Add S3 security mapping for native S3#22559
electrum merged 6 commits intotrinodb:masterfrom
electrum:s3security

Conversation

@electrum
Copy link
Copy Markdown
Member

@electrum electrum commented Jul 2, 2024

Description

Release notes

(x) Release notes are required, with the following suggested text:

# Object storage
* Add S3 security mapping for native S3. ({issue}`22559`)

@cla-bot cla-bot bot added the cla-signed label Jul 2, 2024
@electrum
Copy link
Copy Markdown
Member Author

electrum commented Jul 2, 2024

@mosabua This is compatible with the legacy S3 feature and uses the same config properties (minus the hive. prefix). How do you want to handle the docs? Should we just copy/paste?

@electrum electrum requested review from anusudarsan and dain July 2, 2024 08:18
Copy link
Copy Markdown
Member

@nineinchnick nineinchnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the tests based on some existing ones in the legacy FS? I haven't reviewed them for coverage.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add @NotNull to getConfigFilePath()?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed because the code directly returns a non-null Optional. The validation annotations are for the Bean Validation framework to validate user-supplied values.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be checked in the config class?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use @FileExists because the file path can also be a URL. But we could use @AssertTrue. I'll add that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm going to make two configs.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be easier to review if moving of this code to the loader was extracted to a separate commit.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree, and tried to do that originally, but the code is restructured significantly. I didn't see a good way to do a clean refactoring.

@mosabua
Copy link
Copy Markdown
Member

mosabua commented Jul 2, 2024

@mosabua This is compatible with the legacy S3 feature and uses the same config properties (minus the hive. prefix). How do you want to handle the docs? Should we just copy/paste?

Yeah .. we can just have it in the new docs as copy.. then cleaning up later will be easier. You want to do that or should I send a separate PR @electrum ?

@electrum
Copy link
Copy Markdown
Member Author

electrum commented Jul 2, 2024

I copied over the documentation and cleaned it up a bit.

Copy link
Copy Markdown
Member

@mosabua mosabua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of doc suggestions

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to say what type of regex?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should explain that the examples uses an OR and therefore matches for alice and bob

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we go into this level of detail in the other documentation? Note that this documentation is copied over from the original, so we could do a more major cleanup as a separate task.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should .. just have not gotten around to fixing that up.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's follow-up with the docs improvement after this PR is merged.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path to the JSON

relative to ? or absolute? or both

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be relative or absolute. I don't think we discuss "relative to what" in every place that uses additional files, but it's relative to the server configuration directory. Do we have a place that discusses that to link to?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we have a generic place .. maybe just go with Absolute or relative path to the JSON configuration file.

@electrum electrum force-pushed the s3security branch 2 times, most recently from 0e0743d to 18b0e6f Compare July 2, 2024 21:18
@electrum
Copy link
Copy Markdown
Member Author

electrum commented Jul 2, 2024

@mosabua Thanks for the great review of the documentation. I believe I've addressed all of your comments.

@electrum
Copy link
Copy Markdown
Member Author

electrum commented Jul 2, 2024

@nineinchnick The tests are based on the legacy S3 support.

TestS3SecurityMappinghas extensive tests to verify that we produce the correctS3SecurityMappingResult` record, which is used to configure the S3 client.

What is not tested, either here or in the legacy version, is that the client configuration is correct. We would like to verify that the client is making the correct requests, i.e. using the correct access key, IAM role, KMS key ID, etc. This is what I investigated:

  • Tracing: Doesn't work because AWS SDK tracing support for STS doesn't include the access key, role, etc.
  • LocalStack: Create users and roles, then setup S3 policies such that different objects are only accessible to certain users or roles. Test via side effect that the correct values are sent -- the requests should fail if the wrong values are sent. I see that LocalStack supports IAM and STS, but I don't know if their security implementation is complete enough for this.
  • Mocking: Use MockWebServer to validate requests and send mock responses for STS and S3. I have an initial version of this working, but didn't have time to complete it.

@anusudarsan
Copy link
Copy Markdown
Member

@electrum can we merge this PR, and the integration tests can be a followup?

@electrum electrum requested a review from wendigo July 15, 2024 22:44
@electrum
Copy link
Copy Markdown
Member Author

@anusudarsan yes, I don’t plan to work on integration tests

import static java.util.concurrent.Executors.newCachedThreadPool;

final class S3FileSystemLoader
implements Function<Location, TrinoFileSystemFactory>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this could be a dedicated interface

Copy link
Copy Markdown
Member Author

@electrum electrum Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can introduce one later if it's need

Location location,
Executor uploadExecutor)
{
this.mappingProvider = mappingProvider;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rnn mappingProvider

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I missed that

private static Supplier<S3SecurityMappings> mappingsProvider(Supplier<S3SecurityMappings> supplier, Optional<Duration> refreshPeriod)
{
return refreshPeriod
.map(refresh -> Suppliers.memoizeWithExpiration(supplier::get, refresh.toMillis(), MILLISECONDS))
Copy link
Copy Markdown
Contributor

@wendigo wendigo Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: static import: memoize*

@wendigo
Copy link
Copy Markdown
Contributor

wendigo commented Jul 16, 2024

@electrum let's merge it and follow-up with code and test improvements

- External ID for the IAM role trust policy when connecting to S3.
:::

## Security mapping
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this "copy from legacy docs" stuff to a separate commit

@electrum electrum merged commit 90ea793 into trinodb:master Jul 30, 2024
@electrum electrum deleted the s3security branch July 30, 2024 21:04
@github-actions github-actions bot added this to the 454 milestone Jul 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

6 participants