Skip to content

Decrease lock contention in PipelinedStageExecution#14030

Closed
radek-kondziolka wants to merge 1 commit intotrinodb:masterfrom
radek-kondziolka:rk/mitigate_lock_contention_by_tracker2
Closed

Decrease lock contention in PipelinedStageExecution#14030
radek-kondziolka wants to merge 1 commit intotrinodb:masterfrom
radek-kondziolka:rk/mitigate_lock_contention_by_tracker2

Conversation

@radek-kondziolka
Copy link
Copy Markdown
Contributor

@radek-kondziolka radek-kondziolka commented Sep 7, 2022

Description

We observed that there is a high lock contention on io.trino.execution.scheduler.PipelinedStageExecution's monitor.
Lock contention before:

new_master

Lock contention after:

pr

We measured the throughput (40 workers r5.4xlarge, 64 concurrent queries) before/after

Throughput (queries/h) before:
12230 queries / h

Throughput (queries/h) after:
13897 queries / h (12% difference)

Non-technical explanation

The Trino is able to process more queries within one hour.

Release notes

(* ) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Sep 7, 2022
@radek-kondziolka radek-kondziolka requested review from arhimondr, losipiuk, lukasz-stec and sopel39 and removed request for losipiuk September 7, 2022 09:05
@losipiuk
Copy link
Copy Markdown
Member

losipiuk commented Sep 7, 2022

please set the PR title so it is more eye-pleasing :)

@radek-kondziolka radek-kondziolka changed the title Rk/mitigate lock contention by tracker2 Mitigate lock contention in PipelinedStageExecution and SqlStage Sep 7, 2022
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is super disturbing :) This commit increases complexity a lot (I think above my PR review confidence). I am not convinced we want that.

Copy link
Copy Markdown
Contributor Author

@radek-kondziolka radek-kondziolka Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the most safe, readable and easy to reason about approach is just to use synchronized, I agree. But, as a side effect we can observe high contention, what we was actually did. The code was rewritten to use lock-free data structures and atomic counters. The decision to move the machine state from ANY_STATE to FLUSHING/FINISHED is basing on the one specific counter (runningTasks / finishedTasks) to make it easier to reason about.

We have a high lock contention what limits scaling capabilities. In some moment we need to do something with that. I would say that it is expected that lock-free code is more complex that the "lock-full" one.

Taking a monitor of io.trino.execution.scheduler.PipelinedStageExection
in the updateTaskStatus method causes a high lock contention. Make this
method lock-free.
@radek-kondziolka radek-kondziolka force-pushed the rk/mitigate_lock_contention_by_tracker2 branch from 0b188b9 to 445cd35 Compare September 7, 2022 10:36
@radek-kondziolka radek-kondziolka changed the title Mitigate lock contention in PipelinedStageExecution and SqlStage Mitigate lock contention in PipelinedStageExecution Sep 7, 2022
@radek-kondziolka radek-kondziolka changed the title Mitigate lock contention in PipelinedStageExecution Decrease lock contention in PipelinedStageExecution Sep 7, 2022
@losipiuk
Copy link
Copy Markdown
Member

losipiuk commented Sep 15, 2022

Replaced with #14138

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants