Add timeout for HBO stats fetch #19933
When the HBO register function fails to load statistics, add a warning and also record the failure in the logging.
Here we treat an empty stats result as a failure to load history.
Remove query ID when we invalidate the cache
Pass timeout limit to stats provider
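The comments above describe passing a timeout limit to the stats provider and treating an empty result as a failed load. A minimal sketch of that pattern, using only the JDK, might look like the following. The class `TimedStatsFetch` and its methods are hypothetical names chosen for illustration; they are not taken from the Presto code in this PR.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of the Presto code base.
final class TimedStatsFetch
{
    private TimedStatsFetch() {}

    // Runs the loader asynchronously and waits at most timeoutMillis for it.
    // Returns an empty map if the load times out, throws, or is interrupted.
    static <K, V> Map<K, V> fetchWithTimeout(Supplier<Map<K, V>> loader, long timeoutMillis)
    {
        CompletableFuture<Map<K, V>> future = CompletableFuture.supplyAsync(loader);
        try {
            Map<K, V> result = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return result == null ? Collections.<K, V>emptyMap() : result;
        }
        catch (TimeoutException | ExecutionException e) {
            // cancel() marks the future as cancelled; it does not interrupt the loader thread.
            future.cancel(true);
            return Collections.<K, V>emptyMap();
        }
        catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe the interruption.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Collections.<K, V>emptyMap();
        }
    }

    // An empty result is treated the same as a failed load.
    static boolean loadFailed(Map<?, ?> stats)
    {
        return stats.isEmpty();
    }
}
```

A caller would check `loadFailed(...)` on the result and, on failure, emit the warning and skip the history-based path for that query.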
Return directly if loading the history failed.
Store the ID of each query that failed to get history stats.
pranjalssh
left a comment
Can you check whether it catches the timeout appropriately for the pipeline in D46899410?
A lot of the cost will come from HistoricalStatisticsEquivalentPlanMarkingOptimizer, where we prefetch all HBO stats.
In unit tests, we want to test for queries which do not have join/aggregation too.
If we only care about query plans with join/aggregation, return false here, and skip adding stats equivalent node for this query plan.
Added for unit tests, where queries without join/aggregation are tested.
If false is returned, we skip adding the stats equivalent node in the HBO marking optimizer.
Even if fetching history times out or returns empty, we still need StatsEquivalentPlanNode, as it will be used in populating HBO history.
I debugged the pipeline in this diff. The problem is not in fetching stats but in calculating plan node hashes, which involves accessing input table metadata. This PR does not solve that problem, as I could not find a way to pass a timeout to the metadata access here; solving it should be done in a separate PR, imo. Still, this PR addresses the following: 1) it reduces overhead, since I added a new field to track queries that fail to fetch history stats, and we skip trying to get HBO stats in that case; 2) it adds a new logging field for timed-out queries, so that we can track them.
presto-spi/src/main/java/com/facebook/presto/spi/eventlistener/PlanOptimizerInformation.java
do we want to log additional information about the failure?
do we plan to use this for other usecases beyond HBO timeout?
perhaps make the function more general and pass in the optimizer name as argument
> do we want to log additional information about the failure?
Maybe just this for now; we can add more later if needed.
> do we plan to use this for other usecases beyond HBO timeout?
Other optimizers can populate this field as well.
> perhaps make the function more general and pass in the optimizer name as argument
This is a private method in the HBO optimizer; it will not be used by other optimizers.
why is this a set? do we expect to have more than 1 item there? or is it so we can quickly check for containment without worrying about whether it is set or not?
The query ID will be removed after the query completes. Yes, there can be more than 1 item, when multiple queries are running concurrently.
The life span of the query ID will be the same as in other Maps in the same class.
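The lifecycle described in this reply (record a query ID when its history load fails, check it to skip further HBO lookups, remove it when the query completes) can be sketched with a concurrent set. The class `FailedHistoryTracker` and its method names are hypothetical and only illustrate the idea; the actual field in the PR may be structured differently.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; names are not taken from the PR.
final class FailedHistoryTracker
{
    // A concurrent set, because several queries can be running at the same time.
    private final Set<String> queriesWithFailedHistoryLoad = ConcurrentHashMap.newKeySet();

    // Called when the history load for a query times out or returns nothing.
    void recordFailure(String queryId)
    {
        queriesWithFailedHistoryLoad.add(queryId);
    }

    // Later stages can use this to skip further HBO stats lookups for the query.
    boolean shouldSkipHistoryLookup(String queryId)
    {
        return queriesWithFailedHistoryLoad.contains(queryId);
    }

    // Called when the query completes, so the set does not grow without bound.
    void queryCompleted(String queryId)
    {
        queriesWithFailedHistoryLoad.remove(queryId);
    }
}
```

A set fits here because the only operations needed are insertion, containment, and removal keyed by query ID.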
that's cool!
I think another use case for this field could be tracking bugs in disabled optimizers: when checking whether an optimizer is applicable (when verbose_optimizer_info_enabled=true), I added a try/catch block to prevent a buggy (but disabled) optimizer from crashing the main loop. Instead of silently ignoring the failure, we can record that we found a bug for an applicable optimizer (see PlanOptimizer::isApplicable).
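A rough sketch of the suggestion above: wrap the applicability check so that an exception is recorded rather than silently swallowed. `ApplicabilityCheck`, `isApplicableSafely`, and the `BooleanSupplier` stand-in for the real check are all assumptions made for illustration; they do not reflect the actual signature of PlanOptimizer::isApplicable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical illustration; the real check lives behind PlanOptimizer::isApplicable.
final class ApplicabilityCheck
{
    private final List<String> failedOptimizers = new ArrayList<>();

    // Runs the check; if it throws, records the optimizer name and reports "not applicable".
    boolean isApplicableSafely(String optimizerName, BooleanSupplier check)
    {
        try {
            return check.getAsBoolean();
        }
        catch (RuntimeException e) {
            // Record the failure instead of dropping it, so it can be surfaced in logging.
            failedOptimizers.add(optimizerName);
            return false;
        }
    }

    // Read-only view of the optimizers whose applicability check threw.
    List<String> failedOptimizers()
    {
        return Collections.unmodifiableList(failedOptimizers);
    }
}
```

The recorded names could then be written to the same kind of logging field discussed in this thread.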
Let's call it "optimizer.history-based-optimizer-timeout", and use Duration instead of int. The config can then read values like "1s" or "100ms".
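To illustrate why a duration-typed config is more convenient than an int, here is a small JDK-only sketch that parses strings such as "1s" and "100ms". `TimeoutParser` is a hypothetical stand-in written for this example; it is not the Duration class the reviewer refers to and supports only three units.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for illustration; supports only "ms", "s", and "m".
final class TimeoutParser
{
    private static final Pattern PATTERN = Pattern.compile("\\s*(\\d+)\\s*(ms|s|m)\\s*");

    private TimeoutParser() {}

    // Converts a value such as "100ms" or "1s" into a java.time.Duration.
    static Duration parse(String value)
    {
        Matcher matcher = PATTERN.matcher(value);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Invalid duration: " + value);
        }
        long amount = Long.parseLong(matcher.group(1));
        switch (matcher.group(2)) {
            case "ms":
                return Duration.ofMillis(amount);
            case "s":
                return Duration.ofSeconds(amount);
            default:
                // The pattern only allows "m" here.
                return Duration.ofMinutes(amount);
        }
    }
}
```

With an int, the unit has to be documented separately; with a duration string, the unit travels with the value.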
Add a timeout to the HBO marking optimizer. The timeout value is specified by the session property history_based_optimizer_timeout_limit.
Fix issue #20355
Depended on by https://github.com/facebookexternal/presto-facebook/pull/2394
Test plan
Tested locally end to end.