Merged
137 changes: 137 additions & 0 deletions docs/src/main/sphinx/object-storage/file-system-s3.md
@@ -123,3 +123,140 @@ and secret keys, STS, or an IAM role:
* - `s3.external-id`
  - External ID for the IAM role trust policy when connecting to S3.
:::

## Security mapping

Member
Maybe move this "copy from legacy docs" stuff to a separate commit


Trino supports flexible security mapping for S3, allowing for separate
credentials or IAM roles for specific users or S3 locations. The IAM role
for a specific query can be selected from a list of allowed roles by providing
it as an *extra credential*.

Each security mapping entry may specify one or more match criteria.
If multiple criteria are specified, all criteria must match.
The following match criteria are available:

- `user`: Regular expression to match against username. Example: `alice|bob`

Member
do we need to say what type of regex?

Member
And we should explain that the example uses an OR and therefore matches both alice and bob

Member Author
Do we go into this level of detail in the other documentation? Note that this documentation is copied over from the original, so we could do a more major cleanup as a separate task.

Member
We should .. just have not gotten around to fixing that up.

Contributor
Let's follow-up with the docs improvement after this PR is merged.

- `group`: Regular expression to match against any of the groups that the user
  belongs to. Example: `finance|sales`
- `prefix`: S3 URL prefix. You can specify an entire bucket or a path within a
  bucket. The URL must start with `s3://`, but it also matches locations that
  use the `s3a` or `s3n` scheme. Example: `s3://bucket-name/abc/xyz/`

The security mapping must provide one or more configuration settings:

- `accessKey` and `secretKey`: AWS access key and secret key. This overrides
  any globally configured credentials, such as access key or instance credentials.
- `iamRole`: IAM role to use if no user-provided role is specified as an
  extra credential. This overrides any globally configured IAM role. This role
  is allowed to be specified as an extra credential, although specifying it
  explicitly has no effect.
- `roleSessionName`: Optional role session name to use with `iamRole`. This can only
  be used when `iamRole` is specified. If `roleSessionName` includes the string
  `${USER}`, then the `${USER}` portion of the string is replaced with the
  current session's username. If `roleSessionName` is not specified, it defaults
  to `trino-session`.
- `allowedIamRoles`: IAM roles that are allowed to be specified as an extra
  credential. This is useful because a particular AWS account may have permissions
  to use many roles, but a specific user should only be allowed to use a subset
  of those roles.
- `kmsKeyId`: ID of KMS-managed key to be used for client-side encryption.
- `allowedKmsKeyIds`: KMS-managed key IDs that are allowed to be specified as an extra
  credential. If the list contains `*`, then any key can be specified via an extra credential.
- `endpoint`: The S3 storage endpoint server. This optional property can be used
  to override S3 endpoints on a per-bucket basis.
- `region`: The S3 region to connect to. This optional property can be used
  to override S3 regions on a per-bucket basis.
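
The `${USER}` substitution described for `roleSessionName` can be sketched as follows. This is a minimal illustration under stated assumptions, not Trino's implementation: the class and method names (`RoleSessionNameSketch`, `resolve`) are hypothetical, and only the `${USER}` placeholder and the `trino-session` default are taken from the description above.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Trino.
final class RoleSessionNameSketch
{
    // Default taken from the documentation text above.
    private static final String DEFAULT_SESSION_NAME = "trino-session";

    private RoleSessionNameSketch() {}

    // Replaces every literal "${USER}" with the session user,
    // or falls back to the default when no name is configured.
    static String resolve(Optional<String> roleSessionName, String user)
    {
        return roleSessionName
                .map(name -> name.replace("${USER}", user))
                .orElse(DEFAULT_SESSION_NAME);
    }

    public static void main(String[] args)
    {
        System.out.println(resolve(Optional.of("trino-${USER}"), "alice"));
        System.out.println(resolve(Optional.empty(), "alice"));
    }
}
```

With a configured value of `trino-${USER}` and the user `alice`, the sketch yields `trino-alice`. Without a configured value it yields `trino-session`.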

The security mapping entries are processed in the order listed in the JSON configuration,
and the first matching entry is used. Therefore, more specific mappings must be listed
before less specific mappings. For example, the mapping list might have URL prefix
`s3://abc/xyz/` followed by `s3://abc/` to allow a different configuration for a specific
path within a bucket than for other paths within the bucket. You can specify the default
configuration by not including any match criteria for the last entry in the list.

In addition to the preceding rules, the default mapping can contain the optional
`useClusterDefault` boolean property set to `true` to use the default S3 configuration.
It cannot be used with any other configuration settings.

If no mapping entry matches and no default is configured, access is denied.
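
The matching rules above (all specified criteria must match, entries are checked in order, and access is denied when nothing matches) can be sketched in a few lines. This is a simplified model, not Trino's implementation: `SecurityMappingSketch`, `Mapping`, `selectRole`, and the role names are hypothetical, the regular expressions are assumed to be matched against the whole user or group name, and the `s3a`/`s3n` scheme handling is left out.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical model of the matching rules; not Trino's implementation.
final class SecurityMappingSketch
{
    private SecurityMappingSketch() {}

    // One mapping entry. An absent criterion matches everything.
    record Mapping(Optional<Pattern> user, Optional<Pattern> group, Optional<String> prefix, String iamRole)
    {
        // All criteria that are present must match.
        boolean matches(String userName, Set<String> groups, String location)
        {
            boolean userMatches = user.map(pattern -> pattern.matcher(userName).matches()).orElse(true);
            boolean groupMatches = group
                    .map(pattern -> groups.stream().anyMatch(name -> pattern.matcher(name).matches()))
                    .orElse(true);
            boolean prefixMatches = prefix.map(location::startsWith).orElse(true);
            return userMatches && groupMatches && prefixMatches;
        }
    }

    // Entries are checked in list order and the first match wins.
    // An empty result stands for "access denied".
    static Optional<String> selectRole(List<Mapping> mappings, String user, Set<String> groups, String location)
    {
        return mappings.stream()
                .filter(mapping -> mapping.matches(user, groups, location))
                .map(Mapping::iamRole)
                .findFirst();
    }

    // Specific prefix first, then the broader prefix, then a user rule. No default entry.
    static List<Mapping> exampleMappings()
    {
        return List.of(
                new Mapping(Optional.empty(), Optional.empty(), Optional.of("s3://abc/xyz/"), "role-for-xyz"),
                new Mapping(Optional.empty(), Optional.empty(), Optional.of("s3://abc/"), "role-for-abc"),
                new Mapping(Optional.of(Pattern.compile("alice|bob")), Optional.empty(), Optional.empty(), "role-for-alice-and-bob"));
    }

    public static void main(String[] args)
    {
        List<Mapping> mappings = exampleMappings();
        System.out.println(selectRole(mappings, "carol", Set.of(), "s3://abc/xyz/data.orc"));
        System.out.println(selectRole(mappings, "carol", Set.of(), "s3://abc/other/data.orc"));
        System.out.println(selectRole(mappings, "alice", Set.of(), "s3://elsewhere/data.orc"));
        System.out.println(selectRole(mappings, "carol", Set.of(), "s3://elsewhere/data.orc"));
    }
}
```

Because the `s3://abc/xyz/` entry comes first, a location under that path never reaches the broader `s3://abc/` entry. The last call finds no matching entry and, with no default entry in the list, returns an empty result, which corresponds to denied access.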

The configuration JSON is read from a file via `s3.security-mapping.config-file`
or from an HTTP endpoint via `s3.security-mapping.config-uri`.

Example JSON configuration:

```json
{
  "mappings": [
    {
      "prefix": "s3://bucket-name/abc/",
      "iamRole": "arn:aws:iam::123456789101:role/test_path"
    },
    {
      "user": "bob|charlie",
      "iamRole": "arn:aws:iam::123456789101:role/test_default",
      "allowedIamRoles": [
        "arn:aws:iam::123456789101:role/test1",
        "arn:aws:iam::123456789101:role/test2",
        "arn:aws:iam::123456789101:role/test3"
      ]
    },
    {
      "prefix": "s3://special-bucket/",
      "accessKey": "AKIAxxxaccess",
      "secretKey": "iXbXxxxsecret"
    },
    {
      "prefix": "s3://regional-bucket/",
      "iamRole": "arn:aws:iam::123456789101:role/regional-user",
      "endpoint": "https://bucket.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com",
      "region": "us-east-1"
    },
    {
      "prefix": "s3://encrypted-bucket/",
      "kmsKeyId": "kmsKey_10"
    },
    {
      "user": "test.*",
      "iamRole": "arn:aws:iam::123456789101:role/test_users"
    },
    {
      "group": "finance",
      "iamRole": "arn:aws:iam::123456789101:role/finance_users"
    },
    {
      "iamRole": "arn:aws:iam::123456789101:role/default"
    }
  ]
}
```

:::{list-table} Security mapping properties
:header-rows: 1

* - Property name
  - Description
* - `s3.security-mapping.enabled`
  - Activate the security mapping feature. Defaults to `false`.
    Must be set to `true` for all other properties to be used.
* - `s3.security-mapping.config-file`
  - Path to the JSON configuration file containing security mappings.
* - `s3.security-mapping.config-uri`
  - HTTP endpoint URI containing security mappings.
* - `s3.security-mapping.json-pointer`
  - A JSON pointer (RFC 6901) to mappings inside the JSON retrieved from the
    configuration file or HTTP endpoint. The default is the root of the document.
* - `s3.security-mapping.iam-role-credential-name`
  - The name of the *extra credential* used to provide the IAM role.
* - `s3.security-mapping.kms-key-id-credential-name`
  - The name of the *extra credential* used to provide the KMS-managed key ID.
* - `s3.security-mapping.refresh-period`
  - How often to refresh the security mapping configuration, specified as a
    {ref}`prop-type-duration`. By default, the configuration is not refreshed.
* - `s3.security-mapping.colon-replacement`
  - The character or characters to be used instead of a colon character
    when specifying an IAM role name as an extra credential.
    Any instances of this replacement value in the extra credential value
    are converted to a colon.
    Choose a value not used in any of your IAM ARNs.
:::
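
The conversion described for `s3.security-mapping.colon-replacement` amounts to a plain string substitution on the extra credential value. The following is a sketch under stated assumptions rather than Trino's code: `ColonReplacementSketch` and `decodeRole` are hypothetical names, and `#` is an arbitrary replacement value chosen for the example.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Trino.
final class ColonReplacementSketch
{
    private ColonReplacementSketch() {}

    // Converts every occurrence of the configured replacement back to a colon.
    // When no replacement is configured, the value is used unchanged.
    static String decodeRole(String extraCredentialValue, Optional<String> colonReplacement)
    {
        return colonReplacement
                .map(replacement -> extraCredentialValue.replace(replacement, ":"))
                .orElse(extraCredentialValue);
    }

    public static void main(String[] args)
    {
        // "#" is an assumed replacement value that does not occur in the ARN itself.
        System.out.println(decodeRole("arn#aws#iam##123456789101#role/test1", Optional.of("#")));
    }
}
```

The example also shows why the replacement value should not occur in any of your IAM ARNs: every occurrence is converted, so a replacement that is part of a role name would corrupt the ARN.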
@@ -15,6 +15,7 @@

import com.google.inject.Binder;
import com.google.inject.Module;
import com.google.inject.Scopes;
import io.airlift.configuration.AbstractConfigurationAwareModule;

public class AzureFileSystemModule
@@ -23,6 +24,7 @@ public class AzureFileSystemModule
    @Override
    protected void setup(Binder binder)
    {
        binder.bind(AzureFileSystemFactory.class).in(Scopes.SINGLETON);
        Module module = switch (buildConfigObject(AzureFileSystemConfig.class).getAuthType()) {
            case ACCESS_KEY -> new AzureAuthAccessKeyModule();
            case OAUTH -> new AzureAuthOAuthModule();
5 changes: 0 additions & 5 deletions lib/trino-filesystem-manager/pom.xml
@@ -27,11 +27,6 @@
             <artifactId>guice</artifactId>
         </dependency>

-        <dependency>
-            <groupId>io.airlift</groupId>
-            <artifactId>bootstrap</artifactId>
-        </dependency>
-
         <dependency>
             <groupId>io.airlift</groupId>
             <artifactId>configuration</artifactId>
@@ -14,13 +14,14 @@
 package io.trino.filesystem.manager;

 import com.google.inject.Binder;
+import com.google.inject.Key;
 import com.google.inject.Provides;
 import com.google.inject.Scopes;
 import com.google.inject.Singleton;
-import io.airlift.bootstrap.LifeCycleManager;
 import io.airlift.configuration.AbstractConfigurationAwareModule;
 import io.opentelemetry.api.OpenTelemetry;
 import io.opentelemetry.api.trace.Tracer;
+import io.trino.filesystem.Location;
 import io.trino.filesystem.TrinoFileSystemFactory;
 import io.trino.filesystem.alluxio.AlluxioFileSystemCacheModule;
 import io.trino.filesystem.azure.AzureFileSystemFactory;
@@ -33,13 +34,15 @@
 import io.trino.filesystem.cache.TrinoFileSystemCache;
 import io.trino.filesystem.gcs.GcsFileSystemFactory;
 import io.trino.filesystem.gcs.GcsFileSystemModule;
-import io.trino.filesystem.s3.S3FileSystemFactory;
+import io.trino.filesystem.s3.FileSystemS3;
 import io.trino.filesystem.s3.S3FileSystemModule;
 import io.trino.filesystem.switching.SwitchingFileSystemFactory;
 import io.trino.filesystem.tracing.TracingFileSystemFactory;
 import io.trino.spi.NodeManager;

 import java.util.Map;
 import java.util.Optional;
+import java.util.function.Function;

 import static com.google.inject.multibindings.MapBinder.newMapBinder;
 import static com.google.inject.multibindings.OptionalBinder.newOptionalBinder;
@@ -90,9 +93,9 @@ protected void setup(Binder binder)

         if (config.isNativeS3Enabled()) {
             install(new S3FileSystemModule());
-            factories.addBinding("s3").to(S3FileSystemFactory.class);
-            factories.addBinding("s3a").to(S3FileSystemFactory.class);
-            factories.addBinding("s3n").to(S3FileSystemFactory.class);
+            factories.addBinding("s3").to(Key.get(TrinoFileSystemFactory.class, FileSystemS3.class));
+            factories.addBinding("s3a").to(Key.get(TrinoFileSystemFactory.class, FileSystemS3.class));
+            factories.addBinding("s3n").to(Key.get(TrinoFileSystemFactory.class, FileSystemS3.class));
         }

         if (config.isNativeGcsEnabled()) {
@@ -112,18 +115,21 @@ protected void setup(Binder binder)

     @Provides
     @Singleton
-    public TrinoFileSystemFactory createFileSystemFactory(
+    static TrinoFileSystemFactory createFileSystemFactory(
             Optional<HdfsFileSystemLoader> hdfsFileSystemLoader,
-            LifeCycleManager lifeCycleManager,
             Map<String, TrinoFileSystemFactory> factories,
             Optional<TrinoFileSystemCache> fileSystemCache,
             Optional<CacheKeyProvider> keyProvider,
             Tracer tracer)
     {
         Optional<TrinoFileSystemFactory> hdfsFactory = hdfsFileSystemLoader.map(HdfsFileSystemLoader::create);
-        hdfsFactory.ifPresent(lifeCycleManager::addInstance);

-        TrinoFileSystemFactory delegate = new SwitchingFileSystemFactory(hdfsFactory, factories);
+        Function<Location, TrinoFileSystemFactory> loader = location -> location.scheme()
+                .map(factories::get)
+                .or(() -> hdfsFactory)
+                .orElseThrow(() -> new IllegalArgumentException("No factory for location: " + location));
+
+        TrinoFileSystemFactory delegate = new SwitchingFileSystemFactory(loader);
         if (fileSystemCache.isPresent()) {
             delegate = new CacheFileSystemFactory(tracer, delegate, fileSystemCache.orElseThrow(), keyProvider.orElseThrow());
         }
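
The `loader` function introduced in the hunk above chains three steps: look up a factory by the location's scheme, fall back to the HDFS factory if there is none, and fail otherwise. The standalone sketch below reproduces only that `Optional` chain. `SchemeLookupSketch`, its `Location` record, and its `FileSystemFactory` interface are simplified, hypothetical stand-ins for the Trino types, so this is an illustration of the control flow and not the actual classes.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Simplified stand-ins for the Trino types; names are hypothetical.
final class SchemeLookupSketch
{
    private SchemeLookupSketch() {}

    // Stand-in for a location with an optional scheme such as "s3".
    record Location(Optional<String> scheme, String path) {}

    // Stand-in for a file system factory; name() only identifies it in the demo.
    interface FileSystemFactory
    {
        String name();
    }

    static Function<Location, FileSystemFactory> loader(
            Map<String, FileSystemFactory> factories,
            Optional<FileSystemFactory> hdfsFactory)
    {
        return location -> location.scheme()
                // A missing map entry yields null, which map() turns into an empty Optional.
                .map(factories::get)
                // No scheme, or no factory registered for it: fall back to HDFS if available.
                .or(() -> hdfsFactory)
                .orElseThrow(() -> new IllegalArgumentException("No factory for location: " + location));
    }

    public static void main(String[] args)
    {
        FileSystemFactory s3 = () -> "s3";
        FileSystemFactory hdfs = () -> "hdfs";
        Function<Location, FileSystemFactory> loader = loader(Map.of("s3", s3), Optional.of(hdfs));

        System.out.println(loader.apply(new Location(Optional.of("s3"), "bucket/key")).name());
        System.out.println(loader.apply(new Location(Optional.of("gs"), "bucket/key")).name());
        System.out.println(loader.apply(new Location(Optional.empty(), "/local/path")).name());
    }
}
```

In the sketch, an unregistered scheme and a location without a scheme both resolve to the fallback factory. With an empty fallback, the same calls would throw `IllegalArgumentException`.
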
46 changes: 46 additions & 0 deletions lib/trino-filesystem-s3/pom.xml
@@ -17,6 +17,11 @@
    </properties>

    <dependencies>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-annotations</artifactId>
        </dependency>

        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
@@ -37,6 +42,11 @@
            <artifactId>configuration</artifactId>
        </dependency>

        <dependency>
            <groupId>io.airlift</groupId>
            <artifactId>http-client</artifactId>
        </dependency>

        <dependency>
            <groupId>io.airlift</groupId>
            <artifactId>units</artifactId>
@@ -62,6 +72,17 @@
            <artifactId>trino-memory-context</artifactId>
        </dependency>

        <dependency>
            <groupId>io.trino</groupId>
            <artifactId>trino-plugin-toolkit</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>io.airlift</groupId>
                    <artifactId>bootstrap</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>io.trino</groupId>
            <artifactId>trino-spi</artifactId>
@@ -103,6 +124,11 @@
            <artifactId>http-client-spi</artifactId>
        </dependency>

        <dependency>
            <groupId>software.amazon.awssdk</groupId>
            <artifactId>identity-spi</artifactId>
        </dependency>

        <dependency>
            <groupId>software.amazon.awssdk</groupId>
            <artifactId>regions</artifactId>
@@ -162,6 +188,12 @@
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>io.airlift</groupId>
            <artifactId>testing</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>io.trino</groupId>
            <artifactId>trino-filesystem</artifactId>
@@ -206,6 +238,20 @@
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <configuration>
                    <ignoredNonTestScopedDependencies>
                        <ignoredDependency>software.amazon.awssdk:identity-spi</ignoredDependency>
                    </ignoredNonTestScopedDependencies>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <profiles>
        <profile>
            <id>default</id>
@@ -0,0 +1,29 @@
/*
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package io.trino.filesystem.s3;

import com.google.inject.BindingAnnotation;

import java.lang.annotation.Retention;
import java.lang.annotation.Target;

import static java.lang.annotation.ElementType.FIELD;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.ElementType.PARAMETER;
import static java.lang.annotation.RetentionPolicy.RUNTIME;

@Retention(RUNTIME)
@Target({FIELD, PARAMETER, METHOD})
@BindingAnnotation
public @interface FileSystemS3 {}