Skip to content

Add session property to enforce timeout for query registration in HBO#23354

Merged
feilong-liu merged 1 commit intoprestodb:masterfrom
feilong-liu:hbo_timeout
Aug 7, 2024
Merged

Add session property to enforce timeout for query registration in HBO#23354
feilong-liu merged 1 commit intoprestodb:masterfrom
feilong-liu:hbo_timeout

Conversation

@feilong-liu
Copy link
Contributor

@feilong-liu feilong-liu commented Aug 1, 2024

Description

HBO optimizer has several external dependency, it retrieves history statistics from external storage, get input table size information from metastore. It also canonicalize and hash the whole query plan, which can also be expensive when the query plan is large.
While latency of HBO optimizer is generally low, we have observed high latency for some queries occasionally. This has prohibited the usage of HBO optimizer in interactive workload with high latency requirement. Since HBO is an opportunistic optimization, i.e. queries still work without HBO, we can choose to skip it if it takes a long time to complete.
In this PR I add a session property to enforce timeout so that we can cap the maximum time spent in HBO optimizer

Motivation and Context

To enable HBO for interactive workload with high latency requirement

Impact

Enable HBO for interactive workload

Test Plan

Run end to end with production traffic, and also add unit tests

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add session property `enforce_history_based_optimizer_register_timeout` to enforce the maximum time HBO query registration can take :pr:`23354 `

@feilong-liu feilong-liu requested a review from presto-oss August 1, 2024 17:46
@feilong-liu feilong-liu marked this pull request as draft August 1, 2024 17:46
Comment on lines +128 to +139
boolean registerSucceed = false;
if (enforceHistoryBasedOptimizerRegistrationTimeout(session)) {
ExecutorService executor = Executors.newSingleThreadExecutor();
Future<Boolean> future = executor.submit(() -> statsCalculator.registerPlan(newPlan, session, startTimeInNano, timeoutInMilliseconds));
try {
registerSucceed = future.get(timeoutInMilliseconds, MILLISECONDS);
}
catch (Exception ignored) {
}
finally {
executor.shutdownNow();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main change

@steveburnett
Copy link
Contributor

== RELEASE NOTES ==

General Changes
* Add session property `enforce_history_based_optimizer_register_timeout` to enforce the maximum time HBO query registration can take :pr:`23354 `

@feilong-liu feilong-liu requested a review from rschlussel August 1, 2024 18:41
@feilong-liu feilong-liu marked this pull request as ready for review August 1, 2024 18:53
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the doc!

The config properties table is malformed, throwing this error in a local doc build:

/Users/steveburnett/Documents/GitHub/presto/presto-docs/src/main/sphinx/optimizer/history-based-optimization.rst:34: ERROR: Malformed table. Column span alignment problem in table line 3.

See my comment in PR #23335 and my own PR #23322 for how to fix the table.

@feilong-liu
Copy link
Contributor Author

Thanks for adding the doc!

The config properties table is malformed, throwing this error in a local doc build:

/Users/steveburnett/Documents/GitHub/presto/presto-docs/src/main/sphinx/optimizer/history-based-optimization.rst:34: ERROR: Malformed table. Column span alignment problem in table line 3.

See my comment in PR #23335 and my own PR #23322 for how to fix the table.

For the two PRs referenced above, I see imbalance between line 32 and line34. However in my case these two lines match. Is there any fix which can be done here?

@steveburnett
Copy link
Contributor

For the two PRs referenced above, I see imbalance between line 32 and line34. However in my case these two lines match. Is there any fix which can be done here?

I apologize, I must have been on a wrong branch from another PR. I have restarted my review confirming that I am working with your branch in this PR and you are correct, you don't have this error. Please ignore my previous comment.

steveburnett
steveburnett previously approved these changes Aug 1, 2024
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pulled this branch, made a new local doc build, and the new property entries in both tables look good. Thanks!

}

@Test
public void testHistoryBasedStatsCalculatorEnforceTimeOut()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like it tests cases where the timeout is enabled, but never exceeded. Can you add a test case where the timeout gets enforced, eg. you set the timeout really low (like to 0?) and then check that we don't use hbo stats?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test with time limit to be 0

"SELECT * FROM nation where substr(name, 1, 1) = 'A'",
anyTree(node(FilterNode.class, any()).withOutputRowCount(Double.NaN)));

// HBO Statistics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this using hbo statistics? Shouldn't registration fail due to exceeding the time limit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to test the case where HBO statistics are written successfully (with large timeout limit), and later it tests that read of HBO statistics either succeed (if timeout limit is large) or fail (if timeout limit is 0).

I just added a few new test cases too, which is HBO statistics write failed (with 0 timeout limit), and no HBO statistics will be read.

public void noProcessQueryEvents()
{
try {
assertFalse(semaphore.tryAcquire(10, TimeUnit.SECONDS));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

semaphore is released during the putStats function call. If HBO failed, there will be no putStats function call, hence we will not get the semaphore here. Use this function to assert the case where HBO timeout

@feilong-liu feilong-liu requested review from rschlussel and removed request for rschlussel August 1, 2024 22:22
@feilong-liu feilong-liu merged commit 39e74cc into prestodb:master Aug 7, 2024
@feilong-liu feilong-liu deleted the hbo_timeout branch August 7, 2024 21:00
@tdcmeehan tdcmeehan mentioned this pull request Aug 23, 2024
34 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants