Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -191,4 +191,12 @@ public static <T> Builder<T> builder() {
"If set to true, allows tables to be dropped with the purge parameter set to true.")
.defaultValue(true)
.build();

public static final PolarisConfiguration<Boolean> SUPPORT_WASB_CATALOG =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should re-examine this approach. It's not clear to me how this pattern can extend to having more than 2 schemes -- for example S3 has s3n, s3a, and s3. Also, conceivably, you might want to allow just abfss and not abfs or vice versa.

Copy link
Contributor Author

@tzuan16 tzuan16 Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that the current approach is not at all extensible. I was hoping we have a test annotation like @WithConfig(configName="true") that could directly modify the config store, but that would require a larger test framework change that would fall out of scope of this PR. Currently the config store only contains getter methods and there's also no way to modify its content via code via a setter method, so I had to proceed with this pattern of initiating two DropwizardAppExtension instances.

Do you think we could wait until the test framework to be more interactable with the config store that we refactor this test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually thought that test with the two DropwizardAppExtension was pretty good given what we have now. The suggestion you made for an annotation sounds great, but I definitely don't think that's a blocker for this PR.

My comment here is specifically about having a single boolean config specific to WASB (+WASBS) and how that doesn't scale.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that we need to extend this configuration to support s3n, etc. I do think we could push the check into the StorageLocation class so that it can be responsible for tracking which scheme(s) it supports and which should be rejected at creation. So, e.g., the AzureLocation class can implement a canCreateNew() method or something and it can, internally, determine whether wasb is supported or not. Some future S3StorageLocation can track s3x URLs, though AFAIK there's no reason to explicitly add/deny support for any of those variations are supported by the existing Iceberg S3FileIO.

Copy link
Contributor

@eric-maynard eric-maynard Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we probably should support it. Indeed, don't we already block http paths?

Also, why is this specific to catalog creation? The approach here just seems very specific to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that rationalizing this with the StorageLocation or FileIO or StorageConfigurationInfo or StorageType types is a good idea. Right now it seems like the logic for dealing with different schemes and file systems is not centralized enough.

PolarisConfiguration.<Boolean>builder()
.key("SUPPORT_WASB_CATALOG")
.description(
"If set to true, allows the creation of catalogs with storage locations using Azure WASB prefix.")
.defaultValue(false)
Copy link
Contributor

@eric-maynard eric-maynard Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering we support FILE storage by default, I don't think we need to be so strict so as to not support WASB by default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think FILE is different, as it works out of the box. wasb does not. In my mind, opting in to something that doesn't work quite right is different from opting out of something you don't intend to work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it the case that WASB does work out of the box for Polaris just like FILE, but the concern is that some clients might not be able to use credential vending with WASB (until the next Iceberg release)? Clients also can't use credential vending with FILE.

.build();
}
Original file line number Diff line number Diff line change
Expand Up @@ -28,9 +28,13 @@ public class AzureLocation extends StorageLocation {
private static final Pattern URI_PATTERN = Pattern.compile("^(abfss?|wasbs?)://([^/?#]+)(.*)?$");

public static final String ADLS_ENDPOINT = "dfs.core.windows.net";

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spurious change?

public static final String BLOB_ENDPOINT = "blob.core.windows.net";

public static final String ABFS_SCHEME = "abfs://";
public static final String ABFSS_SCHEME = "abfss://";
public static final String WASB_SCHEME = "wasb://";
public static final String WASBS_SCHEME = "wasbs://";

Comment on lines +33 to +37
Copy link
Contributor

@eric-maynard eric-maynard Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically the scheme is not considered to include the ://. I also think there might be a more extensible way to capture this than 4 constants.

Furthermore, isn't this duplicative of the constants in WasbTranslatingFileIO as well as StorageType?

private final String scheme;
private final String storageAccount;
private final String container;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,7 @@
import org.apache.polaris.core.storage.PolarisStorageConfigurationInfo;
import org.apache.polaris.core.storage.StorageLocation;
import org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo;
import org.apache.polaris.core.storage.azure.AzureLocation;
import org.apache.polaris.core.storage.azure.AzureStorageConfigurationInfo;
import org.jetbrains.annotations.NotNull;
import org.jetbrains.annotations.Nullable;
Expand Down Expand Up @@ -533,6 +534,24 @@ private boolean catalogOverlapsWithExistingCatalog(CatalogEntity catalogEntity)
});
}

private boolean catalogLocationsContainWasb(CatalogEntity catalogEntity) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a thin method and only used in one place. If we are going to make it its own method, can we at least make it more visible to test it?

boolean supportWasbCatalog =
callContext
.getPolarisCallContext()
.getConfigurationStore()
.getConfiguration(
callContext.getPolarisCallContext(), PolarisConfiguration.SUPPORT_WASB_CATALOG);
if (supportWasbCatalog) {
return false;
}

return getCatalogLocations(catalogEntity).stream()
.anyMatch(
location ->
location.startsWith(AzureLocation.WASB_SCHEME)
|| location.startsWith(AzureLocation.WASBS_SCHEME));
}

public PolarisEntity createCatalog(PolarisEntity entity) {
PolarisAuthorizableOperation op = PolarisAuthorizableOperation.CREATE_CATALOG;
authorizeBasicRootOperationOrThrow(op);
Expand All @@ -543,6 +562,12 @@ public PolarisEntity createCatalog(PolarisEntity entity) {
entity.getName());
}

if (catalogLocationsContainWasb((CatalogEntity) entity)) {
throw new ValidationException(
"Cannot create Catalog %s. Polaris does not support catalog locations containing WASB paths",
entity.getName());
}

long id =
entity.getId() <= 0
? entityManager
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,181 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.polaris.service.admin;

import static org.apache.polaris.service.context.DefaultContextResolver.REALM_PROPERTY_KEY;
import static org.assertj.core.api.Assertions.assertThat;

import io.dropwizard.testing.ConfigOverride;
import io.dropwizard.testing.ResourceHelpers;
import io.dropwizard.testing.junit5.DropwizardAppExtension;
import io.dropwizard.testing.junit5.DropwizardExtensionsSupport;
import jakarta.ws.rs.client.Entity;
import jakarta.ws.rs.client.Invocation;
import jakarta.ws.rs.core.Response;
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.UUID;
import java.util.stream.Stream;
import org.apache.polaris.core.admin.model.AzureStorageConfigInfo;
import org.apache.polaris.core.admin.model.Catalog;
import org.apache.polaris.core.admin.model.CatalogProperties;
import org.apache.polaris.core.admin.model.CreateCatalogRequest;
import org.apache.polaris.core.admin.model.StorageConfigInfo;
import org.apache.polaris.service.PolarisApplication;
import org.apache.polaris.service.config.PolarisApplicationConfig;
import org.apache.polaris.service.test.PolarisConnectionExtension;
import org.apache.polaris.service.test.PolarisRealm;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.extension.ExtendWith;
import org.junit.jupiter.api.extension.ExtensionContext;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.Arguments;
import org.junit.jupiter.params.provider.ArgumentsProvider;
import org.junit.jupiter.params.provider.ArgumentsSource;

/** Tests for the validations performed in {@link PolarisAdminService}. */
@ExtendWith({DropwizardExtensionsSupport.class, PolarisConnectionExtension.class})
public class PolarisAdminServiceValidationTest {
private static final DropwizardAppExtension<PolarisApplicationConfig> EXT_BLOCK_WASB =
initExt(ConfigOverride.config("featureConfiguration.SUPPORT_WASB_CATALOG", "false"));
private static final DropwizardAppExtension<PolarisApplicationConfig> EXT_SUPPORT_WASB =
initExt(ConfigOverride.config("featureConfiguration.SUPPORT_WASB_CATALOG", "true"));

private static String userToken;
private static String realm;

private static DropwizardAppExtension<PolarisApplicationConfig> initExt(
ConfigOverride... overrides) {
return new DropwizardAppExtension<>(
PolarisApplication.class,
ResourceHelpers.resourceFilePath("polaris-server-integrationtest.yml"),
Stream.concat(
Stream.of(
// Bind to random port to support parallelism
ConfigOverride.config("server.applicationConnectors[0].port", "0"),
ConfigOverride.config("server.adminConnectors[0].port", "0")),
Stream.of(overrides))
.toArray(ConfigOverride[]::new));
}

@BeforeAll
public static void setup(
PolarisConnectionExtension.PolarisToken adminToken, @PolarisRealm String polarisRealm)
throws IOException {
userToken = adminToken.token();
realm = polarisRealm;

// Set up the database location
PolarisConnectionExtension.createTestDir(realm);
}

private static Invocation.Builder request(DropwizardAppExtension<PolarisApplicationConfig> ext) {
return ext.client()
.target(String.format("http://localhost:%d/api/management/v1/catalogs", ext.getLocalPort()))
.request("application/json")
.header("Authorization", "Bearer " + userToken)
.header(REALM_PROPERTY_KEY, realm);
}

private Response createAzureCatalog(
DropwizardAppExtension<PolarisApplicationConfig> ext,
String defaultBaseLocation,
boolean isExternal,
List<String> allowedLocations) {
String uuid = UUID.randomUUID().toString();
StorageConfigInfo config =
AzureStorageConfigInfo.builder()
.setTenantId("tenantId")
.setConsentUrl("https://consentUrl")
.setMultiTenantAppName("multiTenantAppName")
.setStorageType(StorageConfigInfo.StorageTypeEnum.AZURE)
.setAllowedLocations(allowedLocations)
.build();
Catalog catalog =
new Catalog(
isExternal ? Catalog.TypeEnum.EXTERNAL : Catalog.TypeEnum.INTERNAL,
String.format("overlap_catalog_%s", uuid),
new CatalogProperties(defaultBaseLocation),
System.currentTimeMillis(),
System.currentTimeMillis(),
1,
config);
try (Response response = request(ext).post(Entity.json(new CreateCatalogRequest(catalog)))) {
return response;
}
}

@ParameterizedTest
@ArgumentsSource(AzurePathArgs.class)
public void testWasbCatalogCreationBlocked(String defaultBaseLocation, String allowedLocation) {
int expectedStatusCode =
(defaultBaseLocation.startsWith("wasbs") || allowedLocation.startsWith("wasbs"))
? Response.Status.BAD_REQUEST.getStatusCode()
: Response.Status.CREATED.getStatusCode();

assertThat(
createAzureCatalog(
EXT_BLOCK_WASB,
defaultBaseLocation,
false,
Collections.singletonList(allowedLocation)))
.returns(expectedStatusCode, Response::getStatus);
assertThat(
createAzureCatalog(
EXT_BLOCK_WASB,
defaultBaseLocation,
true,
Collections.singletonList(allowedLocation)))
.returns(expectedStatusCode, Response::getStatus);
}

@ParameterizedTest
@ArgumentsSource(AzurePathArgs.class)
public void testWasbCatalogCreationAllowed(String defaultBaseLocation, String allowedLocation) {
assertThat(
createAzureCatalog(
EXT_SUPPORT_WASB,
defaultBaseLocation,
false,
Collections.singletonList(allowedLocation)))
.returns(Response.Status.CREATED.getStatusCode(), Response::getStatus);
assertThat(
createAzureCatalog(
EXT_SUPPORT_WASB,
defaultBaseLocation,
true,
Collections.singletonList(allowedLocation)))
.returns(Response.Status.CREATED.getStatusCode(), Response::getStatus);
}

private static class AzurePathArgs implements ArgumentsProvider {
@Override
public Stream<? extends Arguments> provideArguments(ExtensionContext extensionContext) {
return Stream.of(createArguments(), createArguments(), createArguments(), createArguments());
}

private Arguments createArguments() {
String prefix = UUID.randomUUID().toString();
String wasbPath = String.format("wasbs://[email protected]/%s/wasb", prefix);
String abfsPath = String.format("abfss://[email protected]/%s/abfs", prefix);
return Arguments.of(wasbPath, abfsPath);
}
}
}
4 changes: 3 additions & 1 deletion spec/polaris-management-service.yml
Original file line number Diff line number Diff line change
Expand Up @@ -903,7 +903,9 @@ components:

AzureStorageConfigInfo:
type: object
description: azure storage configuration info
description:
azure storage configuration info; base location supports abfs[s] schemes and only supports wasb[s] schemes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the code, it's not just the base location

when featureConfiguration.SUPPORT_WASB_CATALOG is true
allOf:
- $ref: '#/components/schemas/StorageConfigInfo'
properties:
Expand Down