Fix createViewWithCustomMetadataLocation tests for cloudTest task #2405

Closed
tmater wants to merge 3 commits into apache:main from tmater:cloud_test_fix

Conversation

@tmater
Contributor

@tmater tmater commented Aug 20, 2025

This patch fixes the createViewWithCustomMetadataLocation tests for cloudTest tasks. The original test was generating temp directories internally, causing cloudTests to fail with BadRequestException instead of the expected ForbiddenException.

Changes:

  • Switched to Hadoop's Path (java.nio.file.Path collapses repeated slashes, e.g. s3://bucket/path -> s3:/bucket/path)
  • Made base classes abstract to avoid running them
  • Implemented createViewWithCustomMetadataLocation to allow passing a custom location

Testing:

  • Verified locally
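
To illustrate the slash handling mentioned in the change list, here is a minimal sketch; it is not code from this patch. The class name SlashCollapseDemo and the file name view.metadata.json are made up for illustration, and the java.nio.file result assumes a POSIX default file system (on Windows the same string is expected to be rejected because of the ':').

```java
import java.net.URI;
import java.nio.file.Paths;

// Hypothetical demo class, not part of the Polaris code base.
final class SlashCollapseDemo {

  // Treats the location as a local file name. On a POSIX file system
  // redundant slashes are collapsed, which damages the "s3://" prefix.
  static String viaNioPath(String location) {
    return Paths.get(location).toString();
  }

  // Resolves a child name against a base URI; the "//" in front of the
  // authority (the bucket) is preserved. The base must end with "/".
  static String viaUri(String baseLocation, String child) {
    return URI.create(baseLocation).resolve(child).toString();
  }

  public static void main(String[] args) {
    // s3:/bucket/path on Linux/macOS
    System.out.println(viaNioPath("s3://bucket/path"));
    // s3://bucket/path/view.metadata.json
    System.out.println(viaUri("s3://bucket/path/", "view.metadata.json"));
  }
}
```

Plain string concatenation would preserve the location equally well; java.net.URI is used here only to show that the JDK can keep the scheme intact without a Hadoop dependency.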

@github-project-automation github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Aug 20, 2025
@tmater tmater marked this pull request as ready for review August 20, 2025 09:00

/** Runs PolarisRestCatalogViewIntegrationTest on AWS. */
- public class PolarisRestCatalogViewAwsIntegrationTest
+ public abstract class PolarisRestCatalogViewAwsIntegrationTestBase
Member

Since we're renaming this type anyway, let's use the protocol name rather than the vendor name.

Suggested change
- public abstract class PolarisRestCatalogViewAwsIntegrationTestBase
+ public abstract class PolarisRestCatalogViewS3IntegrationTestBase

Contributor Author

Shall I change these 3 (non-view test bases) as well in one go:

  • PolarisRestCatalogAwsIntegrationTestBase
  • PolarisRestCatalogAzureIntegrationTestBase
  • PolarisRestCatalogGcpIntegrationTestBase

Also the 6 implementations:

  • RestCatalogAwsIT
  • RestCatalogAzureIT
  • RestCatalogGcpIT
  • RestCatalogViewAwsIT
  • RestCatalogViewAzureIT
  • RestCatalogViewGcpIT

Member

I'd support it, yeah.

Member

I mean, we support MinIO and a bunch of other S3-compatible systems, not only AWS. Sure, we could argue about GCS or ADLS. However, GCS and ADLS are specific, whereas a vendor name or "product suite" name is not.

Contributor Author

@tmater tmater Aug 24, 2025

Makes sense, renamed the classes and related parts.

import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;
import org.apache.hadoop.fs.Path;
Member

Hm, sure, this is just a test, but could we avoid Hadoop dependencies?

Contributor Author

I reached for Hadoop's Path because Iceberg does something "funky" with metadataFileLocation() that resembles Hadoop Path behavior: it normalizes file:///... to file:/..., but keeps s3://... as s3://....
For the assertion later I couldn't use java.nio.file.Path, because it collapses the double slash and produces an incorrect s3:/... path.

I dug a bit more and found a reference to Hadoop Path in Iceberg's LocationProviders:24, so maybe that is how they do it as well, but I could not find the exact place where the location gets normalized.

I’m open to other approaches, but the only solution that came to mind was writing a custom assertion, which I’m also happy with.
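
For reference, a custom comparison of the kind mentioned above could look roughly like the sketch below. This is only one possible shape, not what the patch does: the class and method names (LocationComparison, sameLocation) are hypothetical, and it assumes both locations are syntactically valid URIs (URI.create throws on characters such as spaces).

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper, not part of Polaris or Iceberg.
final class LocationComparison {

  // Compares two locations by scheme, authority and path instead of by
  // raw string, so "file:///tmp/a" and "file:/tmp/a" are treated as the
  // same location while "s3://bucket/a" and "s3:/bucket/a" are not.
  static boolean sameLocation(String expected, String actual) {
    URI e = URI.create(expected);
    URI a = URI.create(actual);
    return Objects.equals(e.getScheme(), a.getScheme())
        && Objects.equals(e.getAuthority(), a.getAuthority())
        && Objects.equals(e.getPath(), a.getPath());
  }

  public static void main(String[] args) {
    System.out.println(sameLocation("file:///tmp/a", "file:/tmp/a")); // true
    System.out.println(sameLocation("s3://bucket/a", "s3:/bucket/a")); // false
  }
}
```

In the second call the bucket is parsed as the authority on one side and as part of the path on the other, which is why a collapsed s3:/ location does not pass.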

Member

Argh - the damn file URI scheme is so ambiguous (IMHO, because of multiple representations for the same thing, legacy representations, risk of interpreting those wrong).

If it's just about eliminating all host-related parts, I'd say let's just use a regex to "fix" the paths?
Something like

protected static final Pattern FILE_LOCATION_PATTERN =
  Pattern.compile("file:/*(.*)");

protected String fixFileUri(String location) {
  var m = FILE_LOCATION_PATTERN.matcher(location);
  return m.matches() ? "file:/" + m.group(1) : location;
}

WDYT? Would that work?
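
As a quick check of what the suggested regex does, here is a self-contained sketch that wraps it in a hypothetical class (FileUriFixer is an assumed name, and the method is made static only so it can be called without a test instance). The pattern and replacement are the ones from the suggestion.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex suggested in the review.
final class FileUriFixer {

  // "file:" followed by any number of slashes, then the rest of the path.
  static final Pattern FILE_LOCATION_PATTERN = Pattern.compile("file:/*(.*)");

  // Rewrites any file URI to the single-slash form; other schemes are
  // returned unchanged because the pattern does not match them.
  static String fixFileUri(String location) {
    Matcher m = FILE_LOCATION_PATTERN.matcher(location);
    return m.matches() ? "file:/" + m.group(1) : location;
  }

  public static void main(String[] args) {
    System.out.println(fixFileUri("file:///tmp/view")); // file:/tmp/view
    System.out.println(fixFileUri("file:/tmp/view")); // file:/tmp/view
    System.out.println(fixFileUri("s3://bucket/view")); // s3://bucket/view
  }
}
```

Note that a URI with a real host, such as file://host/tmp, would also be rewritten to file:/host/tmp, so this only fits the case where no host part is expected.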

Contributor

@adutra adutra Aug 20, 2025

Just FYI, in Polaris we have:

  • org.apache.polaris.core.storage.StorageLocation, which standardizes file URIs to "file:///"
  • but InMemoryStorageIntegration does the opposite 😅 :

Member

Hm - maybe let's ignore the file scheme specialty and just assume that it's "correct" if supplied.

Contributor Author

@tmater tmater Aug 24, 2025

Good point, I used java.nio.file.Path to remove the scheme.

tmater added 3 commits August 28, 2025 09:42
Comment on lines +185 to +187
protected String getCustomMetadataLocationDir() {
return "";
}
Contributor

Suggested change
- protected String getCustomMetadataLocationDir() {
- return "";
- }
+ protected abstract String getCustomMetadataLocationDir();
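
The effect of this suggestion can be sketched as follows. Only the method name getCustomMetadataLocationDir comes from the diff; the class names, the helper customLocationFor and the bucket name are invented for illustration. With an abstract method, a subclass that forgets to supply a location no longer compiles, instead of silently inheriting an empty string.

```java
// Hypothetical stand-in for the shared test base; not the real class.
abstract class ViewTestBaseSketch {

  // Each storage-specific subclass must provide its own directory.
  protected abstract String getCustomMetadataLocationDir();

  // Illustrative helper: builds a per-view location under that directory.
  String customLocationFor(String viewName) {
    return getCustomMetadataLocationDir() + "/" + viewName;
  }
}

// Hypothetical S3 flavor; the bucket name is a placeholder.
final class S3ViewTestSketch extends ViewTestBaseSketch {
  @Override
  protected String getCustomMetadataLocationDir() {
    return "s3://example-bucket/polaris_it_custom";
  }

  public static void main(String[] args) {
    // s3://example-bucket/polaris_it_custom/v1
    System.out.println(new S3ViewTestSketch().customLocationFor("v1"));
  }
}
```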

private static PolarisClient client;
private static ManagementApi managementApi;
protected static final String POLARIS_IT_SUBDIR = "polaris_it";
protected static final String POLARIS_IT_CUSTOM_SUBDIR = "polaris_it_custom";
Contributor

This constant seems unused.

protected boolean shouldSkip() {
return Stream.of(BASE_LOCATION, TENANT_ID).anyMatch(Strings::isNullOrEmpty);
protected String getCustomMetadataLocationDir() {
return StorageUtil.concatFilePrefixes(BASE_LOCATION, POLARIS_IT_SUBDIR, File.separator);
Contributor

Shouldn't this be:

Suggested change
- return StorageUtil.concatFilePrefixes(BASE_LOCATION, POLARIS_IT_SUBDIR, File.separator);
+ return StorageUtil.concatFilePrefixes(BASE_LOCATION, POLARIS_IT_CUSTOM_SUBDIR, File.separator);

@github-actions

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Sep 30, 2025
@github-actions github-actions bot closed this Oct 5, 2025
@github-project-automation github-project-automation bot moved this from PRs In Progress to Done in Basic Kanban Board Oct 5, 2025
tmater added a commit to tmater/polaris that referenced this pull request Oct 23, 2025
snazy added a commit to snazy/polaris that referenced this pull request Nov 20, 2025
* Update dependency com.fasterxml.jackson:jackson-bom to v2.20.1 (apache#2949)

* Update dependency com.github.jk1:gradle-license-report to v3 (apache#2958)

* Fix typos and spelling issues identified by codespell (apache#2959)

* Fix findings with codespell

* Fix findings with codespell

* Fix regtest doc (apache#2955)

* fix(deps): update dependency software.amazon.awssdk:bom to v2.37.3 (apache#2960)

* Refactor Polaris cloudTests and adopt changes from PR apache#2405 (apache#2871)

* Last merged commit 4a80c51

---------

Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: Yong Zheng <yongzheng0809@gmail.com>
Co-authored-by: Tamas Mate <50709850+tmater@users.noreply.github.com>