Skip to content

Conversation

@okumin
Copy link
Contributor

@okumin okumin commented Apr 21, 2024

What changes were proposed in this pull request?

Add a null check when using RelMetadataQuery#areColumnsUnique.
https://issues.apache.org/jira/browse/HIVE-28207

Why are the changes needed?

RelMetadataQuery#areColumnsUnique returns true, false, or null. Some of our implementations skip testing whether it is null and throw NPE.

Does this PR introduce any user-facing change?

No. It is a bug fix to resolve NPE.

Is the change a dependency upgrade?

No.

How was this patch tested?

I added integration tests.
Without this patch, cbo_join_constraints.q fails with NPE thrown by HiveJoinConstraintsRule.
cbo_row_count_non_simple_filter.q hits the problem of HiveJoinConstraintsRule and HiveRelMdRowCount.
I have not found a real case where the issue happens with HiveAggregateSplitRule, but I modified it as it is likely to hit the same problem.

@ayushtkn
Copy link
Member

RelMetadataQuery#areColumnsUnique returns true, false, or null.

why is it returning null, columns can be either unique or not unique, what null denotes here?

@okumin
Copy link
Contributor Author

okumin commented Apr 22, 2024

@ayushtkn Calcite can express the three statuses, either of "it is definitely unique", "it is not definitely unique", or unknown. Null stands for the last case. If it is null, this PR assumes a more general case, not unique.
https://github.com/apache/calcite/blob/0551b8903391c1706422a2c1b8b648a6941f39a2/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L523-L524

@okumin
Copy link
Contributor Author

okumin commented Apr 22, 2024

I may also check why and when it can return null if it is unexpected

Copy link
Member

@zabetak zabetak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. I think it makes sense to add checks for nulls when necessary since from an API perspective it is a valid value. I left some minor comments in .q files but other than that I am fine to merge this change.

Independently (maybe as a separate JIRA ticket) it may be worth checking why the null value appears in the first place and tackle that problem as well since it could lead to better query plans. In many cases the NPE raised in metadata can be treated by finding the origin of the null value.

@okumin
Copy link
Contributor Author

okumin commented Apr 22, 2024

I attached a remote debugger and found RelMetadataQuery#areColumnsUnique invoked recursively against nested RelNodes. RelMdColumnUniqueness has many paths returning null, so I think adding null checks like this PR seems valid.

I will follow zabetak's comments and update the PR.

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sonarqubecloud
Copy link

Quality Gate Passed Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@okumin
Copy link
Contributor Author

okumin commented Apr 23, 2024

CI succeeded 💯

@zabetak zabetak closed this in 2d855b2 Apr 24, 2024
mr3project pushed a commit to mr3project/hive-mr3 that referenced this pull request Aug 21, 2024
…eness (Shohei Okumiya reviewed by Stamatis Zampetakis, Ayush Saxena)

Close apache/hive#5207
dengzhhu653 pushed a commit to dengzhhu653/hive that referenced this pull request Sep 20, 2024
…eness (Shohei Okumiya reviewed by Stamatis Zampetakis, Ayush Saxena)

Close apache#5207
dengzhhu653 pushed a commit to dengzhhu653/hive that referenced this pull request Sep 20, 2024
…eness (Shohei Okumiya reviewed by Stamatis Zampetakis, Ayush Saxena)

Close apache#5207
mr3project pushed a commit to mr3project/hive-mr3 that referenced this pull request Nov 29, 2024
…eness (Shohei Okumiya reviewed by Stamatis Zampetakis, Ayush Saxena)

Close apache/hive#5207
@okumin okumin deleted the HIVE-28207-column-uniqueness branch October 11, 2025 01:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants