Skip to content

Handle Iceberg data file missing null count stats#11832

Merged
findepi merged 1 commit intotrinodb:masterfrom
alexjo2144:null-counts-npe
Apr 7, 2022
Merged

Handle Iceberg data file missing null count stats#11832
findepi merged 1 commit intotrinodb:masterfrom
alexjo2144:null-counts-npe

Conversation

@alexjo2144
Copy link
Copy Markdown
Member

Description

Null counts is an optional field and may be null. Handle that case appropriately by invalidating stats rather than throwing a NPE.

Is this change a fix, improvement, new feature, refactoring, or other?

Fix

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Iceberg connector

How would you describe this change to a non-technical end user or system administrator?

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Iceberg
* Fix NPE when an Iceberg data file is missing null count statistics

@cla-bot cla-bot bot added the cla-signed label Apr 6, 2022
@alexjo2144 alexjo2144 requested review from findepi and findinpath April 6, 2022 18:34
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test to showcase this issue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a naive test case showcasing the bug you fixed

    @Test
    public void testNoMetrics()
    {
        assertUpdate("create table table_no_metrics_partitioned (name varchar, country varchar) WITH (partitioning = ARRAY['country'])");
        Table table = IcebergUtil.loadIcebergTable(trinoCatalog, tableOperationsProvider, TestingConnectorSession.SESSION,
                new SchemaTableName("tpch", "table_no_metrics_partitioned"));
        // skip metrics
        table.updateProperties()
                .set("write.metadata.metrics.default", "none")
                .commit();
        assertUpdate("INSERT INTO table_no_metrics_partitioned VALUES ('Christoph', 'AT'), (NULL, 'RO')", 2);
        assertEquals(computeActual("SELECT count(*) FROM \"table_no_metrics_partitioned$partitions\" WHERE data IS NOT NULL").getOnlyValue(), 0L);
    }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, this probably would exercise missing min/max as well, right? IIRC we didn't add a test then

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this out, but I'm getting an empty Map for min/max/null-counts instead of null. I'd have to look at the Iceberg code, but maybe it changed at some point?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also want to file an Issue to respect write.metadata.metrics.default and the other stats tuning properties at some point

Copy link
Copy Markdown
Contributor

@findinpath findinpath Apr 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did run the test on your branch @alexjo2144 and I inspected specifically to see whether the nullCounts map is NULL.

BTW I did the test on TestIcebergMergeAppend - it wouldn't fit for this PR, but I had there the trinoCatalog and tableOperationsProvider at hand.
I'm thinking that this is a good reason to handle #11828 in the near future to be covered against regressions that we can't catch when creating content with the latest version of Iceberg.

@findepi
Copy link
Copy Markdown
Member

findepi commented Apr 7, 2022

@alexjo2144 there are test failures

@alexjo2144 alexjo2144 force-pushed the null-counts-npe branch 2 times, most recently from f8a2961 to 342343f Compare April 7, 2022 17:46
@findepi findepi merged commit 85b9e76 into trinodb:master Apr 7, 2022
@findepi
Copy link
Copy Markdown
Member

findepi commented Apr 7, 2022

@alexjo2144 do you want RN for this?

@mosabua
Copy link
Copy Markdown
Member

mosabua commented Apr 7, 2022

RN entry is in description .. I will use that

@github-actions github-actions bot added this to the 376 milestone Apr 7, 2022
@alexjo2144 alexjo2144 deleted the null-counts-npe branch April 7, 2022 21:20
@findepi
Copy link
Copy Markdown
Member

findepi commented Apr 8, 2022

RN entry is in description .. I will use that

@mosabua thanks, missed that

* Fix NPE when an Iceberg data file is missing null count statistics

@alexjo2144 this should spell out NPE and should specify what operation could fail (SELECT? INSERT? SELECT from a system table?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants