Contributor

@mwd410 mwd410 commented Aug 7, 2025

Description

When a query executed by DirectTrinoClient takes longer than query.client.timeout, the query is canceled. This client is used in some cases when executing queries from within the coordinator.

Additional context and related issues

The QueryTracker uses this timeout in its private boolean isAbandoned(T query) method, comparing it against the time of the last heartbeat. @lukasz-stec recommended that I periodically record a heartbeat while waiting for this future, to avoid falling into that case.
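
The abandonment check described above can be illustrated with a minimal sketch. The names `HeartbeatSketch` and `TrackedQuery` are hypothetical and do not reproduce Trino's actual QueryTracker; the sketch only assumes that a query counts as abandoned once the time since its last heartbeat exceeds the client timeout.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration; not Trino's actual QueryTracker implementation.
class HeartbeatSketch {
    // Minimal stand-in for a tracked query that remembers its last heartbeat.
    static final class TrackedQuery {
        private volatile Instant lastHeartbeat;

        TrackedQuery(Instant createdAt) {
            this.lastHeartbeat = createdAt;
        }

        void recordHeartbeat(Instant now) {
            lastHeartbeat = now;
        }

        Instant getLastHeartbeat() {
            return lastHeartbeat;
        }
    }

    // Assumed rule: a query is abandoned when its last heartbeat
    // is older than the client timeout.
    static boolean isAbandoned(TrackedQuery query, Duration clientTimeout, Instant now) {
        Instant deadline = query.getLastHeartbeat().plus(clientTimeout);
        return deadline.isBefore(now);
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2025-08-07T00:00:00Z");
        Duration clientTimeout = Duration.ofSeconds(1);
        TrackedQuery query = new TrackedQuery(start);

        // No heartbeat for 5 seconds with a 1 second timeout.
        System.out.println(isAbandoned(query, clientTimeout, start.plusSeconds(5)));

        // A heartbeat recorded 500 ms before the check keeps the query alive.
        query.recordHeartbeat(start.plusMillis(4500));
        System.out.println(isAbandoned(query, clientTimeout, start.plusSeconds(5)));
    }
}
```

Passing the current time as a parameter, rather than reading the system clock, is a choice made here only to keep the sketch deterministic.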

Release notes

(X) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Aug 7, 2025
@mwd410 mwd410 requested a review from lukasz-stec August 7, 2025 18:52
Member

@lukasz-stec lukasz-stec left a comment

Looks good. Please add a test with a small query.client.timeout and a query that runs longer than that timeout.

Member

@kokosing kokosing left a comment

Is it possible to test that it works? For example, you can have a query that takes 5 seconds to complete with query.client.timeout set to 1 second, and verify that you are able to run such a query.

@mwd410 mwd410 force-pushed the mdeady/directTrinoClientHeartbeat branch 3 times, most recently from dbba1b5 to 1245cde August 8, 2025 13:55
@mwd410 mwd410 requested review from kokosing and lukasz-stec August 8, 2025 13:56
Member

@kokosing kokosing left a comment

% comments

@mwd410 mwd410 force-pushed the mdeady/directTrinoClientHeartbeat branch from 1245cde to 3a9fe4a August 8, 2025 14:21
Member

@lukasz-stec lukasz-stec left a comment

lgtm

Member

kokosing commented Aug 8, 2025

Ping me when it is green

@mwd410 mwd410 force-pushed the mdeady/directTrinoClientHeartbeat branch 4 times, most recently from 8ff4dbd to b4da9b4 August 8, 2025 17:10
@mwd410 mwd410 force-pushed the mdeady/directTrinoClientHeartbeat branch from b4da9b4 to 8d4cdb9 August 8, 2025 17:33
@kokosing kokosing merged commit 026c712 into trinodb:master Aug 8, 2025
95 checks passed
Member

kokosing commented Aug 8, 2025

Thanks!

@github-actions github-actions bot added this to the 477 milestone Aug 8, 2025
catch (TimeoutException e) {
    // continue waiting until the query state changes or the exchange client is blocked.
    // we need to periodically record the heartbeat to prevent the query from being canceled
    dispatchQuery.recordHeartbeat();
Contributor

Nice! I've seen abandoned queries before, especially when the cluster was under high pressure. This might have been the actual cause back then.
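
The heartbeat-on-timeout pattern in the snippet above can be sketched in isolation. This is a minimal sketch, not the actual DirectTrinoClient code: `HeartbeatWaitSketch`, `waitWithHeartbeat`, and the `Runnable` callback standing in for `dispatchQuery.recordHeartbeat()` are hypothetical names, and the durations are scaled down to milliseconds so the example finishes quickly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of waiting on a future while recording heartbeats.
class HeartbeatWaitSketch {
    // Waits for the future in short slices. Each time a slice times out,
    // the heartbeat callback runs and the wait continues.
    static <T> T waitWithHeartbeat(Future<T> future, long sliceMillis, Runnable recordHeartbeat)
            throws InterruptedException, ExecutionException {
        while (true) {
            try {
                return future.get(sliceMillis, TimeUnit.MILLISECONDS);
            }
            catch (TimeoutException e) {
                // Still running: signal liveness so the query is not treated as abandoned.
                recordHeartbeat.run();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger heartbeats = new AtomicInteger();

        // Simulated long-running query: completes after 500 ms,
        // far longer than the 50 ms wait slice used below.
        CompletableFuture<String> query = CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(500);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "done";
        });

        String result = waitWithHeartbeat(query, 50, heartbeats::incrementAndGet);
        System.out.println(result + " after " + heartbeats.get() + " heartbeats");
    }
}
```

The slice length should be noticeably shorter than the client timeout, so that at least one heartbeat is recorded within every timeout window.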

4 participants