-
Notifications
You must be signed in to change notification settings - Fork 5.5k
fix: Null pointer exception when there is no valid column metrics #26316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -247,26 +247,29 @@ private TableStatistics makeTableStatistics(StatisticsFileCache statisticsFileCa | |
| for (IcebergColumnHandle columnHandle : selectedColumns) { | ||
| int fieldId = columnHandle.getId(); | ||
| ColumnStatistics.Builder columnBuilder = tableStats.getOrDefault(fieldId, ColumnStatistics.builder()); | ||
| Long nullCount = summary.getNullCounts().get(fieldId); | ||
| if (nullCount != null) { | ||
| columnBuilder.setNullsFraction(Estimate.of(nullCount / recordCount)); | ||
| } | ||
|
|
||
| Object min = summary.getMinValues().get(fieldId); | ||
| Object max = summary.getMaxValues().get(fieldId); | ||
| if (min instanceof Number && max instanceof Number) { | ||
| DoubleRange range = new DoubleRange(((Number) min).doubleValue(), ((Number) max).doubleValue()); | ||
| columnBuilder.setRange(Optional.of(range)); | ||
|
|
||
| // the histogram is generated by scanning the entire dataset. It is possible that | ||
| // the constraint prevents scanning portions of the table. Given that we know the | ||
| // range that the scan provides for a particular column, bound the histogram to the | ||
| // scanned range. | ||
|
|
||
| final DoubleRange histRange = range; | ||
| columnBuilder.setHistogram(columnBuilder.getHistogram() | ||
| .map(histogram -> DisjointRangeDomainHistogram | ||
| .addConjunction(histogram, Range.range(DOUBLE, histRange.getMin(), true, histRange.getMax(), true)))); | ||
| if (summary.hasValidColumnMetrics()) { | ||
| Long nullCount = summary.getNullCounts().get(fieldId); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A little curious, in what scenario did you encounter this problem, and is there a way to reproduce it? Even when I set "write.metadata.metrics.max-inferred-column-defaults" to 0, the column statistics still return an empty map rather than null.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @hantangwangd Thanks for the comment.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The basic insertion in velox does not return any metrics to coordinator node. And it was planned to be added in following PR.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @hantangwangd I think it is possible to have a null instead of empty nullcount map.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @PingLiuPing Thanks for the details. I was considering whether we could add a test case for this scenario, and found that this example works: Could you add a similar test case for this change?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, I will add a case for this.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @hantangwangd I cannot find this case in prestodb repo, but I think the case is suitable for this scenario, thank you very much. And I add this case into
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Without the fix, |
||
| if (nullCount != null) { | ||
| columnBuilder.setNullsFraction(Estimate.of(nullCount / recordCount)); | ||
| } | ||
|
|
||
| Object min = summary.getMinValues().get(fieldId); | ||
| Object max = summary.getMaxValues().get(fieldId); | ||
| if (min instanceof Number && max instanceof Number) { | ||
| DoubleRange range = new DoubleRange(((Number) min).doubleValue(), ((Number) max).doubleValue()); | ||
| columnBuilder.setRange(Optional.of(range)); | ||
|
|
||
| // the histogram is generated by scanning the entire dataset. It is possible that | ||
| // the constraint prevents scanning portions of the table. Given that we know the | ||
| // range that the scan provides for a particular column, bound the histogram to the | ||
| // scanned range. | ||
|
|
||
| final DoubleRange histRange = range; | ||
| columnBuilder.setHistogram(columnBuilder.getHistogram() | ||
| .map(histogram -> DisjointRangeDomainHistogram | ||
| .addConjunction(histogram, Range.range(DOUBLE, histRange.getMin(), true, histRange.getMax(), true)))); | ||
| } | ||
| } | ||
| result.setColumnStatistics(columnHandle, columnBuilder.build()); | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Potential loss of precision in nulls fraction calculation.
Cast nullCount and recordCount to double before dividing to avoid integer division and ensure correct fraction calculation.