Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RESTTableOperations does not support table metadata swap like others TableOperations did #12134

Open
1 of 3 tasks
dramaticlly opened this issue Jan 30, 2025 · 3 comments
Open
1 of 3 tasks
Labels
bug Something isn't working

Comments

@dramaticlly
Copy link
Contributor

dramaticlly commented Jan 30, 2025

Apache Iceberg version

1.7.1 (latest release)

Query engine

Spark

Please describe the bug 🐞

Before migrate to REST catalog, we rely on following TableOperations.commit API call to swap table metadata atomically.

String deisredMetadataPath = "/var/newdb/table/metadata/00003-579b23d1-4ca5-4acf-85ec-081e1699cb83.metadata.json""
ops.commit(ops.current(), TableMetadataParser.read(ops.io(), dedeisredMetadataPath));

However this is no longer working in REST based catalog, I suspect it might relate to how update type was modeled here where metadata.changes() return empty when read from parser and end up with empty changeset in update table POST call.

This can be reproduced by adding following test case in org.apache.iceberg.catalog.CatalogTests.java where all other catalogs are functioning as expected but only failure for TestRESTCatalog

Image

repro:

  @Test
  public void testTableOperationCommit() {
    C catalog = catalog();

    if (requiresNamespaceCreate()) {
      catalog.createNamespace(TABLE.namespace());
    }

    Map<String, String> properties =
        ImmutableMap.of("user", "someone", "created-at", "2023-01-15T00:00:01");
    Table originalTable =
        catalog
            .buildTable(TABLE, SCHEMA)
            .withPartitionSpec(SPEC)
            .withSortOrder(WRITE_ORDER)
            .withProperties(properties)
            .create();
    TableOperations ops = ((BaseTable) originalTable).operations();
    String original = ops.current().metadataFileLocation();
    FileIO io = ops.io();

    originalTable.newFastAppend().appendFile(FILE_A).commit();
    originalTable.newFastAppend().appendFile(FILE_B).commit();

    String metadataLocation = ops.refresh().metadataFileLocation();
    System.out.printf("After write, metadata location is:" + metadataLocation);

    ops.commit(ops.refresh(), TableMetadataParser.read(originalTable.io(), original));

    originalTable.refresh();
    TableMetadata actual = ((BaseTable) originalTable).operations().current();
    TableMetadata expected = new StaticTableOperations(original, io).current();

    assertThat(actual.properties())
        .as("Props must match")
        .containsAllEntriesOf(expected.properties());
    assertThat(actual.schema().asStruct())
        .as("Schema must match")
        .isEqualTo(expected.schema().asStruct());
    assertThat(actual.specs()).as("Specs must match").isEqualTo(expected.specs());
    assertThat(actual.sortOrders()).as("Sort orders must match").isEqualTo(expected.sortOrders());
    assertThat(actual.currentSnapshot())
        .as("Current snapshot must match")
        .isEqualTo(expected.currentSnapshot());
    assertThat(actual.snapshots()).as("Snapshots must match").isEqualTo(expected.snapshots());
    assertThat(actual.snapshotLog()).as("History must match").isEqualTo(expected.snapshotLog());

    TestHelpers.assertSameSchemaMap(actual.schemasById(), expected.schemasById());

    assertThat(actual)
        .isEqualToIgnoringGivenFields(
            expected,
            "metadataFileLocation",
            "schemas",
            "specs",
            "sortOrders",
            "properties",
            "schemasById",
            "specsById",
            "sortOrdersById",
            "snapshots");
  }

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
@dramaticlly dramaticlly added the bug Something isn't working label Jan 30, 2025
@dramaticlly
Copy link
Contributor Author

@rdblue @nastra appreciate your insight on this!

@danielcweeks
Copy link
Contributor

I added this on the mailing list but thought I'd drop it here as well:

I think the issue here is that you're using the commit api in table operations to perform a non-incremental/linear change to the metadata. The REST implementation is a little more strict in that it builds a set of updates based on the mutations made to the metadata and the commit process applies those changes. In this scenario, no changes have been made and the call is attempting a complete replacement.

The other implementations are just blindly swapping the location, so while that operation does achieve the effect you're looking for, it's not the right semantics for the commit.

You might want to consider using the "register table" operation instead, which takes the table identifier and location to perform this type of swap.

@dramaticlly
Copy link
Contributor Author

I added this on the mailing list but thought I'd drop it here as well:

I think the issue here is that you're using the commit api in table operations to perform a non-incremental/linear change to the metadata. The REST implementation is a little more strict in that it builds a set of updates based on the mutations made to the metadata and the commit process applies those changes. In this scenario, no changes have been made and the call is attempting a complete replacement.

The other implementations are just blindly swapping the location, so while that operation does achieve the effect you're looking for, it's not the right semantics for the commit.

You might want to consider using the "register table" operation instead, which takes the table identifier and location to perform this type of swap.

Thank you @danielcweeks for the detailed reply. I actually trying to rely on this as a replacement for "drop + register table" operation as TableOperations.commit can be done atomically suggested by @rdblue way back in #5327 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants