Fix blocking resource groups when query queue full #18910

Merged
tdcmeehan merged 1 commit into prestodb:master from rmarduga:Fix-blocking-resource-groups-when-query-queue-full on Jan 18, 2023

Conversation

Contributor

@rmarduga rmarduga commented Jan 10, 2023

When the Presto coordinator can neither run nor queue a query, it fails the query, and it does so while holding the root resource group lock. Failing a query triggers a query state change, which in turn triggers event listener handlers that can be expensive. Doing this under the resource group lock essentially blocks the resource group and severely degrades performance.
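To illustrate the locking pattern described above, here is a minimal sketch of deciding a query's fate under the lock but delivering the failure only after the lock is released. It is not the code from this PR: `ResourceGroupSketch`, `submit`, `onQueryFailed`, and `lockHeldByCurrentThread` are hypothetical names, and the string outcomes stand in for real query states.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for a resource group; not Presto's real classes.
class ResourceGroupSketch {
    // Plays the role of the root resource group monitor.
    private final Object lock = new Object();
    private final int maxRunning;
    private final int maxQueued;
    private final Queue<String> queued = new ArrayDeque<>();
    private final Consumer<String> onQueryFailed;
    private int running;

    ResourceGroupSketch(int maxRunning, int maxQueued, Consumer<String> onQueryFailed) {
        this.maxRunning = maxRunning;
        this.maxQueued = maxQueued;
        this.onQueryFailed = onQueryFailed;
    }

    // Returns "RUNNING", "QUEUED", or "FAILED" for the submitted query.
    String submit(String queryId) {
        String outcome;
        synchronized (lock) {
            // Only cheap bookkeeping happens while the lock is held.
            if (running < maxRunning) {
                running++;
                outcome = "RUNNING";
            } else if (queued.size() < maxQueued) {
                queued.add(queryId);
                outcome = "QUEUED";
            } else {
                outcome = "FAILED";
            }
        }
        if (outcome.equals("FAILED")) {
            // The lock is released here, so a slow listener cannot stall other submissions.
            onQueryFailed.accept(queryId);
        }
        return outcome;
    }

    // Lets a callback check whether the calling thread still holds the group lock.
    boolean lockHeldByCurrentThread() {
        return Thread.holdsLock(lock);
    }
}
```

With this shape, a potentially expensive failure callback (standing in for the event listener handler) runs without the group lock, so other threads can keep submitting queries while it executes.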

Test plan
I ran the coordinator both without and with the change. Within 1 hour I sent around 60K queries to the cluster. Fewer than 2K queries executed successfully and all the others failed. Around 75% of the query failures were due to the QUERY_QUEUE_FULL error. I recorded the sessions with Java Flight Recorder. The picture below shows selected threads that were blocked on the RootInternalResourceGroup lock.
image

Here, the yellow blocks depict threads waiting on the lock.
The picture below shows the same setup with this code change applied.
image

Overall, the locking time for the RootInternalResourceGroup monitor dropped from 3 hours 24 min to 1 min 11 sec, and for the InternalResourceGroupManager monitor it dropped from 4 days 4 hours to 9 sec.

== RELEASE NOTES ==

Resource Groups Changes
* Fix blocking resource groups when query queue full

@rmarduga rmarduga requested a review from swapsmagic January 10, 2023 18:00
@rmarduga rmarduga force-pushed the Fix-blocking-resource-groups-when-query-queue-full branch from 005fdb0 to 892aadb Compare January 10, 2023 18:25
@rmarduga rmarduga marked this pull request as ready for review January 10, 2023 23:47
@rmarduga rmarduga requested a review from a team as a code owner January 10, 2023 23:47
@rmarduga rmarduga requested a review from presto-oss January 10, 2023 23:47
@rmarduga rmarduga requested a review from tdcmeehan January 11, 2023 00:47
@rmarduga rmarduga force-pushed the Fix-blocking-resource-groups-when-query-queue-full branch 2 times, most recently from c659e4b to b84a51d Compare January 11, 2023 22:36
@rmarduga rmarduga requested a review from tdcmeehan January 11, 2023 22:41
@rmarduga rmarduga force-pushed the Fix-blocking-resource-groups-when-query-queue-full branch from b84a51d to 5bf3073 Compare January 17, 2023 16:59
@tdcmeehan tdcmeehan merged commit f2808c2 into prestodb:master Jan 18, 2023
@rmarduga rmarduga deleted the Fix-blocking-resource-groups-when-query-queue-full branch January 18, 2023 15:48
@wanglinsong wanglinsong mentioned this pull request Feb 25, 2023
12 tasks
3 participants