feat(testing): Add HiveDistributedBenchmarkRunner for optimizer benchmarks #27344
Conversation
Reviewer's Guide

Introduces a reusable HiveDistributedBenchmarkRunner utility for running and comparing Hive-backed optimizer benchmarks with correctness verification, and a specific BenchmarkGroupingSetsPreAggregation test that benchmarks grouping sets pre-aggregation strategies under different session configurations.

File-Level Changes
…chmarks Add a reusable HiveDistributedBenchmarkRunner utility that runs SQL benchmarks against a Hive-backed DistributedQueryRunner, supporting side-by-side comparison of multiple session property configurations with warmup/measured iterations and correctness verification. Also add BenchmarkGroupingSetsPreAggregation which uses the runner to compare PreAggregateBeforeGroupId vs AddExchangesBelowPartialAggregation vs baseline on a GROUP BY CUBE query.
Hey - I've found 1 issue, and left some high level feedback:
- In `HiveDistributedBenchmarkRunner.run()`, `baselineAvg` is taken from `averages.values().iterator().next()` without checking that any scenarios were registered; consider throwing a clear exception if `scenarios` is empty to avoid a confusing `NoSuchElementException`.
- The `resultsMatch` method compares `MaterializedResult` instances via a `HashSet` of rows, which ignores both ordering and duplicate row counts; if duplicates are possible, consider a multiset-style comparison or an existing Presto result equality helper instead of `HashSet`.
- For the filesystem write in `HiveDistributedBenchmarkRunner.run()`, consider using `import` statements for `Files` and `Paths` (rather than fully qualified names) and optionally including a timestamp or query identifier in the filename to avoid overwriting previous benchmark results.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `HiveDistributedBenchmarkRunner.run()`, `baselineAvg` is taken from `averages.values().iterator().next()` without checking that any scenarios were registered; consider throwing a clear exception if `scenarios` is empty to avoid a confusing `NoSuchElementException`.
- The `resultsMatch` method compares `MaterializedResult` instances via `HashSet` of rows, which ignores both ordering and duplicate row counts; if duplicates are possible, consider a multiset-style comparison or using an existing Presto result equality helper instead of `HashSet`.
- For the filesystem write in `HiveDistributedBenchmarkRunner.run()`, consider using local `import` statements for `Files` and `Paths` (rather than fully qualified names) and optionally including a timestamp or query identifier in the filename to avoid overwriting previous benchmark results.
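The first suggestion could be implemented as a fail-fast guard before the baseline average is read. The sketch below is illustrative only: `BaselineGuard` and `baselineAverage` are hypothetical names, not the runner's actual API; the point is replacing a confusing `NoSuchElementException` with a clear `IllegalStateException`.

```java
import java.util.Map;

final class BaselineGuard
{
    // Sketch of the suggested check: fail fast with a clear message instead of
    // letting iterator().next() throw NoSuchElementException on an empty map.
    static double baselineAverage(Map<String, Double> averages)
    {
        if (averages.isEmpty()) {
            throw new IllegalStateException(
                    "No benchmark scenarios registered; add at least one scenario before calling run()");
        }
        // First scenario registered is treated as the baseline
        return averages.values().iterator().next();
    }
}
```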
## Individual Comments
### Comment 1
<location path="presto-hive/src/test/java/com/facebook/presto/hive/benchmark/HiveDistributedBenchmarkRunner.java" line_range="152-156" />
<code_context>
+ return avg;
+ }
+
+ private static boolean resultsMatch(MaterializedResult a, MaterializedResult b)
+ {
+ return a.getMaterializedRows().size() == b.getMaterializedRows().size()
+ && new java.util.HashSet<>(a.getMaterializedRows())
+ .equals(new java.util.HashSet<>(b.getMaterializedRows()));
+ }
+
</code_context>
<issue_to_address>
**issue (bug_risk):** The correctness check ignores row multiplicity; consider tightening it and adding tests to lock in the intended semantics.
`resultsMatch` compares only row count and `HashSet` equality, so it treats results that differ only in row multiplicities as equal. Since this underpins `runWithVerification` used by benchmarks, this can hide optimizer correctness issues.
Please either:
- Tighten the comparison to reflect the intended contract (e.g., multiset or ordered equality), or
- Clearly document that multiplicities are intentionally ignored and add unit tests that cover differing multiplicities and assert the expected outcome.
This will make the verification contract explicit and guard against accidental changes in future edits.
</issue_to_address>
private static boolean resultsMatch(MaterializedResult a, MaterializedResult b)
{
    return a.getMaterializedRows().size() == b.getMaterializedRows().size()
            && new java.util.HashSet<>(a.getMaterializedRows())
            .equals(new java.util.HashSet<>(b.getMaterializedRows()));
}
**issue (bug_risk):** The correctness check ignores row multiplicity; consider tightening it and adding tests to lock in the intended semantics.
`resultsMatch` compares only row count and `HashSet` equality, so it treats results that differ only in row multiplicities as equal. Since this underpins `runWithVerification`, which the benchmarks use, this can hide optimizer correctness issues.
Please either:
- Tighten the comparison to reflect the intended contract (e.g., multiset or ordered equality), or
- Clearly document that multiplicities are intentionally ignored and add unit tests that cover differing multiplicities and assert the expected outcome.
This will make the verification contract explicit and guard against accidental changes in future edits.
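A multiset-style comparison along the lines the review suggests could look like the sketch below. It is a generic illustration, not the runner's actual code: it assumes the row type has well-behaved `equals`/`hashCode`, and counts occurrences so that results differing only in duplicate counts no longer compare equal.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MultisetComparison
{
    // Multiset equality: same elements with the same multiplicities,
    // independent of order. Relies on the element type's equals/hashCode.
    static <T> boolean multisetEquals(List<T> a, List<T> b)
    {
        if (a.size() != b.size()) {
            return false;
        }
        Map<T, Integer> counts = new HashMap<>();
        for (T row : a) {
            counts.merge(row, 1, Integer::sum);
        }
        for (T row : b) {
            // A negative count means b contains an element more often than a
            if (counts.merge(row, -1, Integer::sum) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

In the runner this would replace the `HashSet` comparison over `getMaterializedRows()`; Guava's `HashMultiset` would be an equivalent, shorter alternative if it is already on the test classpath.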
Force-pushed from 7967758 to 2c6763e (compare)
Summary
`HiveDistributedBenchmarkRunner`, a reusable utility that runs SQL benchmarks against a Hive-backed `DistributedQueryRunner` with side-by-side comparison of multiple session property configurations. The runner supports warmup/measured iterations, writes results to a temp file, and includes correctness verification (all scenarios must produce identical results).
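The warmup/measured-iteration pattern the summary describes can be sketched as below. This is a minimal illustration with hypothetical names (`IterationTimer`, `averageMillis`), not the runner's actual implementation: warmup runs are executed to let caches and JIT settle and are discarded, and only the measured runs contribute to the average.

```java
final class IterationTimer
{
    // Run the query `warmup` times without timing, then average
    // wall-clock time over `measured` timed runs, in milliseconds.
    static double averageMillis(Runnable query, int warmup, int measured)
    {
        for (int i = 0; i < warmup; i++) {
            query.run();
        }
        long totalNanos = 0;
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            query.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e6 / measured;
    }
}
```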
Test plan
- `HiveDistributedBenchmarkRunner` compiles successfully
- Runs against a Hive-backed `DistributedQueryRunner` and supports multiple named scenarios
- `runWithVerification` works as expected

== NO RELEASE NOTE ==