Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -195,7 +195,7 @@ public void acceptDataFile(DataFile dataFile, PartitionSpec partitionSpec)
Conversions.fromByteBuffer(column.type(), Optional.ofNullable(dataFile.lowerBounds()).map(a -> a.get(id)).orElse(null)));
Object upperBound = convertIcebergValueToTrino(column.type(),
Conversions.fromByteBuffer(column.type(), Optional.ofNullable(dataFile.upperBounds()).map(a -> a.get(id)).orElse(null)));
Optional<Long> nullCount = Optional.ofNullable(dataFile.nullValueCounts().get(id));
Optional<Long> nullCount = Optional.ofNullable(dataFile.nullValueCounts()).map(nullCounts -> nullCounts.get(id));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test to showcase this issue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a naive test case showcasing the bug you fixed

    @Test
    public void testNoMetrics()
    {
        assertUpdate("create table table_no_metrics_partitioned (name varchar, country varchar) WITH (partitioning = ARRAY['country'])");
        Table table = IcebergUtil.loadIcebergTable(trinoCatalog, tableOperationsProvider, TestingConnectorSession.SESSION,
                new SchemaTableName("tpch", "table_no_metrics_partitioned"));
        // skip metrics
        table.updateProperties()
                .set("write.metadata.metrics.default", "none")
                .commit();
        assertUpdate("INSERT INTO table_no_metrics_partitioned VALUES ('Christoph', 'AT'), (NULL, 'RO')", 2);
        assertEquals(computeActual("SELECT count(*) FROM \"table_no_metrics_partitioned$partitions\" WHERE data IS NOT NULL").getOnlyValue(), 0L);
    }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, this probably would exercise missing min/max as well, right? IIRC we didn't add a test then

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this out, but I'm getting an empty Map for min/max/null-counts instead of null. I'd have to look at the Iceberg code, but maybe it changed at some point?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also want to file an Issue to respect write.metadata.metrics.default and the other stats tuning properties at some point

Copy link
Copy Markdown
Contributor

@findinpath findinpath Apr 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did run the test on your branch @alexjo2144 and I inspected specifically to see whether the nullCounts map is NULL.

BTW I did the test on TestIcebergMergeAppend - it wouldn't fit for this PR, but I had there the trinoCatalog and tableOperationsProvider at hand.
I'm thinking that this is a good reason to handle #11828 in the near future to be covered against regressions that we can't catch when creating content with the latest version of Iceberg.

updateMinMaxStats(
id,
trinoType,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1455,6 +1455,20 @@ public void testTrinoReadsSparkRowLevelDeletesWithRowTypes(StorageFormat tableSt
onSpark().executeQuery("DROP TABLE " + sparkTableName);
}

@Test(groups = {ICEBERG, PROFILE_SPECIFIC_TESTS})
public void testMissingMetrics()
{
String tableName = "test_missing_metrics_" + randomTableSuffix();
String sparkTableName = sparkTableName(tableName);
onSpark().executeQuery("CREATE TABLE " + sparkTableName + " (name STRING, country STRING) USING ICEBERG " +
"PARTITIONED BY (country) TBLPROPERTIES ('write.metadata.metrics.default'='none')");
onSpark().executeQuery("INSERT INTO " + sparkTableName + " VALUES ('Christoph', 'AT'), (NULL, 'RO')");
assertThat(onTrino().executeQuery(format("SELECT count(*) FROM %s.%s.\"%s$partitions\" WHERE data IS NOT NULL", TRINO_CATALOG, TEST_SCHEMA_NAME, tableName)))
.containsOnly(row(0));

onSpark().executeQuery("DROP TABLE " + sparkTableName);
}

private static String escapeSparkString(String value)
{
return value.replace("\\", "\\\\").replace("'", "\\'");
Expand Down