Skip to content

Conversation

@slfan1989
Copy link
Contributor

Related Issues

  • #TODO: update partition statistics file paths (completes the table path rewrite functionality).

@slfan1989 slfan1989 force-pushed the spark-4.0-partition-stats-rewrite branch from 9c651b8 to c695ae2 Compare September 1, 2025 01:47
@huaxingao
Copy link
Contributor

also cc @szehon-ho @dramaticlly

@slfan1989
Copy link
Contributor Author

also cc @szehon-ho @dramaticlly

@huaxingao, thank you for helping to review the code! @dramaticlly @szehon-ho, could you please take a look at this PR? Thank you very much!

@pvary
Copy link
Contributor

pvary commented Sep 4, 2025

CC: @gaborkaszab - Another place where the stat files need to be considered

Copy link
Contributor

@dramaticlly dramaticlly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one small nit on tests

Comment on lines 1390 to 1393
if (partitionColumn != null && !partitionColumn.isEmpty()) {
sqlStr = String.format("%s USING iceberg PARTITIONED BY (%s)", sqlStr, partitionColumn);
}
if (!location.isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this starts to look a bit hard to read with many configurable parameter on table creation, Can we try move it out to a separate method for table creation, something like

  private Table createMetastoreTable(
      String location,
      Map<String, String> properties,
      String namespace,
      String tableName,
      int snapshotNumber,
      String partitionColumn) {
    spark.conf().set("spark.sql.catalog.hive", SparkCatalog.class.getName());
    spark.conf().set("spark.sql.catalog.hive.type", "hive");
    spark.conf().set("spark.sql.catalog.hive.default-namespace", "default");
    spark.conf().set("spark.sql.catalog.hive.cache-enabled", "false");

    sql("DROP TABLE IF EXISTS hive.%s.%s", namespace, tableName);

    // create table
    sql(tableDDL(location, properties, namespace, tableName, partitionColumn));

    // Insert test data
    for (int i = 0; i < snapshotNumber; i++) {
      sql("INSERT INTO hive.%s.%s VALUES (%s, 'AAAAAAAAAA', 'AAAA')", namespace, tableName, i);
    }

    return catalog.loadTable(TableIdentifier.of(namespace, tableName));
  }

  private String tableDDL(
      String location,
      Map<String, String> properties,
      String namespace,
      String tableName,
      String partitionColumn) {
    String tblProperties =
        properties.entrySet().stream()
            .map(entry -> String.format("'%s'='%s'", entry.getKey(), entry.getValue()))
            .collect(Collectors.joining(","));

    StringBuilder createTableSql = new StringBuilder();
    createTableSql
        .append("CREATE TABLE hive.")
        .append(namespace)
        .append(".")
        .append(tableName)
        .append(" (c1 bigint, c2 string, c3 string)");

    if (partitionColumn != null && !partitionColumn.isEmpty()) {
      createTableSql.append(" USING iceberg PARTITIONED BY (").append(partitionColumn).append(")");
    } else {
      createTableSql.append(" USING iceberg");
    }

    if (!location.isEmpty()) {
      createTableSql.append(" LOCATION '").append(location).append("'");
    }

    if (!tblProperties.isEmpty()) {
      createTableSql.append(" TBLPROPERTIES (").append(tblProperties).append(")");
    }
    return createTableSql.toString();
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dramaticlly Thank you for the feedback! Optimized according to suggestions:

  • Method renamed to generateCreateTableSQL (more semantic)
  • Added complete method comments

@slfan1989
Copy link
Contributor Author

@huaxingao @dramaticlly Could you please review this PR again? Thank you very much!

Copy link
Contributor

@huaxingao huaxingao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@slfan1989
Copy link
Contributor Author

LGTM

@huaxingao @dramaticlly Thanks a lot for reviewing this PR! Are we good to merge it into the master branch?

@huaxingao huaxingao merged commit bc56756 into apache:main Sep 8, 2025
42 checks passed
@huaxingao
Copy link
Contributor

Thanks @slfan1989 for the PR! Thanks @dramaticlly for the review!

@slfan1989
Copy link
Contributor Author

Thanks @slfan1989 for the PR! Thanks @dramaticlly for the review!

Thank you again for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants