[Iceberg] Fix bugs in Iceberg statistics caching #24480
Merged
ZacBlanco merged 2 commits into prestodb:master on Feb 7, 2025
hantangwangd (Member) previously approved these changes on Feb 6, 2025 and left a comment:
Thanks for this fix, LGTM. Just one little nit.
presto-iceberg/src/test/java/com/facebook/presto/iceberg/hive/TestIcebergHiveStatistics.java
Outdated, resolved
f78fc09 to 0a1a136
hantangwangd previously approved these changes on Feb 6, 2025
tdcmeehan previously approved these changes on Feb 6, 2025
imjalpreet reviewed on Feb 6, 2025
presto-iceberg/src/test/java/com/facebook/presto/iceberg/hive/TestIcebergHiveStatistics.java
Outdated, resolved
In the case of a partial miss on the StatisticsFileCache, the newly loaded statistics were not combined with the already cached statistics, causing discrepancies in query planning.
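To make the partial-miss bug concrete, here is a minimal sketch of the lookup-then-merge pattern, assuming a plain `HashMap` keyed by column name as a stand-in for StatisticsFileCache. The class `PartialMissCacheSketch`, its `getStatistics` method, and the loader function are hypothetical; the real cache uses `StatisticsFileCacheKey` and different value types. The essential line is the final `putAll(cachedStats)`: without it, only the freshly loaded subset is returned.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for StatisticsFileCache: an unbounded map keyed by column name.
final class PartialMissCacheSketch
{
    private final Map<String, Long> cache = new HashMap<>();

    Map<String, Long> getStatistics(Set<String> columns, Function<Set<String>, Map<String, Long>> loader)
    {
        Map<String, Long> cachedStats = new HashMap<>();
        Set<String> missing = new HashSet<>();
        for (String column : columns) {
            Long value = cache.get(column);
            if (value != null) {
                cachedStats.put(column, value);
            }
            else {
                missing.add(column);
            }
        }

        Map<String, Long> finalResult = new HashMap<>();
        if (!missing.isEmpty()) {
            // Load only the missing entries, populate the cache, and record them in the result.
            loader.apply(missing).forEach((key, value) -> {
                cache.put(key, value);
                finalResult.put(key, value);
            });
        }
        // Include already cached stats (those that were not missing).
        // Dropping this line reproduces the bug: a partial miss returns only the loaded subset.
        finalResult.putAll(cachedStats);
        return finalResult;
    }

    public static void main(String[] args)
    {
        PartialMissCacheSketch sketch = new PartialMissCacheSketch();
        // Fake loader: the "statistic" for each column is just its name length.
        Function<Set<String>, Map<String, Long>> loader = keys -> {
            Map<String, Long> loaded = new HashMap<>();
            keys.forEach(k -> loaded.put(k, (long) k.length()));
            return loaded;
        };
        sketch.getStatistics(Set.of("orderkey"), loader);  // full miss: caches orderkey
        Map<String, Long> stats = sketch.getStatistics(Set.of("orderkey", "partkey"), loader);  // partial miss
        System.out.println(stats.size());
    }
}
```

The second call hits the cache for `orderkey` and misses for `partkey`; the merged result must contain both.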
Previously, only the cache stats were available through the JMX plugin because only the CacheStatsMBean was exported; the file size and column count distributions were not available. This fixes the problem by exporting the StatisticsFileCache object instead and embedding the cache stats object in it.
723d643
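The idea behind the JMX fix (export the cache object itself so that every metric it carries is visible, rather than only a separate stats MBean) can be sketched with the JDK's standard MBean convention. This is an illustration under stated assumptions, not Presto's export mechanism: `JmxExportSketch`, `StatisticsCacheSketch`, its counters, and the `ObjectName` are all hypothetical, and simple running sums stand in for the real file size and column count distributions.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxExportSketch
{
    // Standard MBean contract: the interface name must equal the implementing class name plus "MBean".
    public interface StatisticsCacheSketchMBean
    {
        long getHitCount();

        long getMissCount();

        long getLoadedFileBytes();   // hypothetical stand-in for the file size distribution

        long getLoadedColumnCount(); // hypothetical stand-in for the column count distribution
    }

    // Hypothetical cache object that embeds its own counters, so registering this one
    // object exposes hit/miss counts and the size/column metrics together.
    public static class StatisticsCacheSketch implements StatisticsCacheSketchMBean
    {
        private final AtomicLong hits = new AtomicLong();
        private final AtomicLong misses = new AtomicLong();
        private final AtomicLong fileBytes = new AtomicLong();
        private final AtomicLong columns = new AtomicLong();

        public void recordHit()
        {
            hits.incrementAndGet();
        }

        public void recordMiss(long fileSizeBytes, int columnCount)
        {
            misses.incrementAndGet();
            fileBytes.addAndGet(fileSizeBytes);
            columns.addAndGet(columnCount);
        }

        @Override
        public long getHitCount() { return hits.get(); }

        @Override
        public long getMissCount() { return misses.get(); }

        @Override
        public long getLoadedFileBytes() { return fileBytes.get(); }

        @Override
        public long getLoadedColumnCount() { return columns.get(); }
    }

    public static void main(String[] args) throws Exception
    {
        StatisticsCacheSketch cache = new StatisticsCacheSketch();
        cache.recordHit();
        cache.recordMiss(2048, 16);

        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.example.sketch:type=StatisticsCacheSketch");
        server.registerMBean(cache, name);
        // Attribute names derive from the getters, e.g. getLoadedFileBytes -> "LoadedFileBytes".
        System.out.println(server.getAttribute(name, "HitCount") + " " + server.getAttribute(name, "LoadedFileBytes"));
    }
}
```

Exporting only a separate stats object would hide any getter that lives on the cache class itself, which is the gap the commit message describes.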
0a1a136 to 723d643
imjalpreet approved these changes on Feb 6, 2025
hantangwangd approved these changes on Feb 6, 2025
aaneja reviewed on Feb 7, 2025
```java
    statisticsFileCache.put(new StatisticsFileCacheKey(file, key), value);
    finalResult.put(key, value);
});
finalResult.putAll(cachedStats);
```
Contributor: nit: Let's add a comment `// Include already cached stats (those that were not missing)`
```java
statistics = getTableStatistics(queryRunner, session, "lineitem");
RuntimeMetric partialMiss = runtimeStats.getMetrics().keySet().stream().filter(name -> name.contains("PartialMiss")).findFirst()
        .map(runtimeStats::getMetric)
        .orElseThrow(() -> new RuntimeException("partial miss on statistics cache should have occurred"));
```
Contributor: Instead, let's use `Assert.fail` and log the `runtimeStats`.
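A minimal sketch of what the reviewer suggests, assuming the runtime metrics can be viewed as a plain name-to-sum map (Presto's `RuntimeStats` is not reproduced here). The helper `requireMetric` and the metric name used in `main` are hypothetical. TestNG's `Assert.fail(message)` throws an `AssertionError`, so the sketch throws one directly to stay dependency-free, and it puts the full metric map in the message so a failure shows what was actually recorded.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

final class PartialMissAssertionSketch
{
    // Hypothetical helper: finds the first metric whose name contains the fragment,
    // or fails the test with every recorded metric included in the message.
    static long requireMetric(Map<String, Long> metrics, String fragment)
    {
        Optional<String> name = metrics.keySet().stream()
                .filter(n -> n.contains(fragment))
                .findFirst();
        if (!name.isPresent()) {
            // Same effect as org.testng.Assert.fail(message), which throws AssertionError.
            throw new AssertionError("partial miss on statistics cache should have occurred; runtime metrics: " + metrics);
        }
        return metrics.get(name.get());
    }

    public static void main(String[] args)
    {
        Map<String, Long> metrics = new TreeMap<>();
        metrics.put("hypothetical.StatisticsFileCache.PartialMiss", 1L);
        System.out.println(requireMetric(metrics, "PartialMiss"));
    }
}
```

Compared with `orElseThrow(() -> new RuntimeException(...))`, the failure is reported as an assertion and carries the context needed to debug it from the test log.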
```java
TransactionId txid = getQueryRunner().getTransactionManager().beginTransaction(false);
Session txnSession = session.beginTransactionId(txid, getQueryRunner().getTransactionManager(), new AllowAllAccessControl());
Map<String, ColumnHandle> columnHandles = getColumnHandles(table, txnSession);
Metadata meta = queryRunner.getMetadata();
```
Contributor: nit: Use the full variable name `metadata`.
```java
        .ifPresent(stat -> assertEquals(32, runtimeStats.getMetric(stat).getSum()));
runtimeStats.getMetrics().keySet().stream().filter(name -> name.contains("PuffinFileSize")).findFirst()
        .ifPresent(stat -> assertTrue(runtimeStats.getMetric(stat).getSum() > 1024));
// get them again to trigger retrieval of _some_ cached statistics
```
Contributor: Can we assert that the eviction count is greater than 0? That would prove that the cache was full and that not all of the statistics that were read have been cached.
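The reasoning behind the eviction-count check can be shown with a small bounded cache. This is a sketch under an explicit assumption: `EvictionCountingCache` is a hypothetical LRU built on `LinkedHashMap.removeEldestEntry`, not the StatisticsFileCache implementation. Once more entries are read than the cache can hold, the eviction count becomes positive, which is exactly the evidence that some statistics were dropped and a later read must partially miss.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded LRU cache that counts evictions; not Presto's implementation.
final class EvictionCountingCache<K, V> extends LinkedHashMap<K, V>
{
    private final int maxEntries;
    private long evictionCount;

    EvictionCountingCache(int maxEntries)
    {
        super(16, 0.75f, true); // access order, so the eldest entry is the least recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
    {
        boolean evict = size() > maxEntries;
        if (evict) {
            evictionCount++;
        }
        return evict;
    }

    long evictionCount()
    {
        return evictionCount;
    }

    public static void main(String[] args)
    {
        EvictionCountingCache<String, Long> cache = new EvictionCountingCache<>(2);
        for (String column : new String[] {"orderkey", "partkey", "suppkey", "quantity"}) {
            cache.put(column, (long) column.length());
        }
        // Four entries read into a two-entry cache: evictions occurred, so not every statistic stayed cached.
        if (cache.evictionCount() <= 0) {
            throw new AssertionError("expected evictions; the cache was never full");
        }
        System.out.println(cache.evictionCount() + " " + cache.size());
    }
}
```

If the real cache exposes Guava-style cache stats, the corresponding check would read the eviction count from there; that API choice is an assumption not confirmed by this PR.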
This was referenced Mar 10, 2025
This was referenced May 6, 2025
Description
This PR has two commits which fix two minor bugs in statistics file caching. The first is related to returning wrong statistics on partial cache misses. The second has to do with stats missing from the JMX statistics reported for the StatisticsFileCache object.
Motivation and Context
Bug fixes. See the commit messages for details.
Impact
Test Plan
Contributor checklist
Release Notes