Decrease lock contention in PipelinedStageExecution #14138
Conversation
losipiuk
left a comment
Looks good. Some questions and editorials.
Force-pushed from b6067e5 to 35a509f
Force-pushed from 35a509f to 84b53f7
Taking the monitor of io.trino.execution.scheduler.PipelinedStageExecution in the updateTaskStatus method causes high lock contention. Make this method lock-free.
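The lock-free direction can be illustrated with `ConcurrentHashMap.newKeySet()`: `Set.add` already reports atomically whether an element was newly added, so duplicate status updates can be filtered without taking a monitor. A minimal self-contained sketch (class and field names are hypothetical, not the actual PR code):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class LockFreeTaskTracker
{
    // Thread-safe sets; membership updates need no synchronized blocks
    private final Set<String> flushingTasks = ConcurrentHashMap.newKeySet();
    private final Set<String> finishedTasks = ConcurrentHashMap.newKeySet();

    // Returns true only for the first FLUSHING update of a given task
    public boolean addFlushingTask(String taskId)
    {
        return flushingTasks.add(taskId);
    }

    // Returns true only for the first FINISHED update of a given task
    public boolean addFinishedTask(String taskId)
    {
        boolean added = finishedTasks.add(taskId);
        if (added) {
            // a finished task is no longer flushing
            flushingTasks.remove(taskId);
        }
        return added;
    }

    public static void main(String[] args)
    {
        LockFreeTaskTracker tracker = new LockFreeTaskTracker();
        System.out.println(tracker.addFlushingTask("task-1")); // true: first update
        System.out.println(tracker.addFlushingTask("task-1")); // false: duplicate ignored
        System.out.println(tracker.addFinishedTask("task-1")); // true
        System.out.println(tracker.flushingTasks.contains("task-1")); // false: moved to finished
    }
}
```

The boolean result of `Set.add` is what later lets the caller skip state-machine transitions for duplicate updates.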
Force-pushed from 84b53f7 to b09f551
lukasz-stec
left a comment
Some comments added.
Additionally, I think using both allTasksLock and a concurrent map plus AtomicIntegers does not increase thread safety.
What do you mean?
private void updateTaskStatus(TaskStatus taskStatus)
{
    State stageState = stateMachine.getState();
I think this can be simplified without introducing overly complex multi-threaded code. A lot of the code in updateTaskStatus does not need to be synchronized:
private void updateTaskStatus(TaskStatus taskStatus)
{
    State stageState = stateMachine.getState();
    if (stageState.isDone()) {
        return;
    }
    TaskState taskState = taskStatus.getState();
    switch (taskState) {
        case FAILED:
            RuntimeException failure = taskStatus.getFailures().stream()
                    .findFirst()
                    .map(this::rewriteTransportFailure)
                    .map(ExecutionFailureInfo::toException)
                    .orElse(new TrinoException(GENERIC_INTERNAL_ERROR, "A task failed for an unknown reason"));
            fail(failure);
            break;
        case CANCELED:
            // A task should only be in the CANCELED state if the stage is cancelled
            fail(new TrinoException(GENERIC_INTERNAL_ERROR, "A task is in the CANCELED state but stage is " + stageState));
            break;
        case ABORTED:
            // A task should only be in the ABORTED state if the stage is done (ABORTED or FAILED)
            fail(new TrinoException(GENERIC_INTERNAL_ERROR, "A task is in the ABORTED state but stage is " + stageState));
            break;
        case FLUSHING:
            addFlushingTask(taskStatus.getTaskId());
            break;
        case FINISHED:
            addFinishedTask(taskStatus.getTaskId());
            break;
        default:
            break;
    }
    if (stageState == SCHEDULED || stageState == RUNNING || stageState == FLUSHING) {
        if (taskState == TaskState.RUNNING) {
            stateMachine.transitionToRunning();
        }
        if (isFlushing()) {
            stateMachine.transitionToFlushing();
        }
        if (isAllTaskFinished()) {
            stateMachine.transitionToFinished();
        }
    }
}

private synchronized void addFlushingTask(TaskId taskId)
{
    flushingTasks.add(taskId);
}

private synchronized void addFinishedTask(TaskId taskId)
{
    finishedTasks.add(taskId);
    flushingTasks.remove(taskId);
}

private synchronized boolean isAllTaskFinished()
{
    return finishedTasks.containsAll(allTasks);
}
Then in a subsequent commit I would probably make flushingTasks and finishedTasks lock-free, e.g. use ConcurrentHashMap.newKeySet() (still @GuardedBy("this") if a method touches both finishedTasks and flushingTasks at the same time) and:
private void addFlushingTask(TaskId taskId)
{
    flushingTasks.add(taskId);
}

private void addFinishedTask(TaskId taskId)
{
    if (!finishedTasks.contains(taskId)) {
        synchronized (this) {
            // atomically move the task to the finished set
            // nit: MAYBE the synchronization is not needed
            finishedTasks.add(taskId);
            flushingTasks.remove(taskId);
        }
    }
}
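The check-then-lock pattern suggested above keeps the two sets consistent under concurrent, duplicated updates while letting the common duplicate case skip the lock. A runnable sketch (names hypothetical) that races 4 duplicate FINISHED updates per task across 8 threads:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FinishMoveDemo
{
    private final Set<Integer> flushingTasks = ConcurrentHashMap.newKeySet();
    private final Set<Integer> finishedTasks = ConcurrentHashMap.newKeySet();

    void addFinishedTask(int taskId)
    {
        // fast path: duplicate FINISHED updates usually skip the lock entirely
        if (!finishedTasks.contains(taskId)) {
            synchronized (this) {
                // atomically move the task from flushing to finished
                finishedTasks.add(taskId);
                flushingTasks.remove(taskId);
            }
        }
    }

    // Runs 100 tasks x 4 duplicate updates on 8 threads; returns {finished, flushing} sizes
    static int[] run() throws InterruptedException
    {
        FinishMoveDemo demo = new FinishMoveDemo();
        for (int i = 0; i < 100; i++) {
            demo.flushingTasks.add(i);
        }
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int taskId = 0; taskId < 100; taskId++) {
            int id = taskId;
            for (int dup = 0; dup < 4; dup++) {
                pool.execute(() -> demo.addFinishedTask(id));
            }
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[] {demo.finishedTasks.size(), demo.flushingTasks.size()};
    }

    public static void main(String[] args) throws InterruptedException
    {
        int[] sizes = run();
        System.out.println(sizes[0] + " finished, " + sizes[1] + " flushing"); // 100 finished, 0 flushing
    }
}
```

The final state is deterministic: every task ends up in finishedTasks exactly once and flushingTasks drains to empty, regardless of how the duplicate updates interleave.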
Yes, it is easier, but it does not resolve the root problem. You are still locking this on every call of updateTaskStatus. I tried that approach and there was still high contention on this's monitor.
You can make addFlushingTask and addFinishedTask return boolean (true if the element was added), e.g.:
boolean taskStateChanged = addFlushingTask(taskStatus.getTaskId());
...
if (!taskStateChanged) {
    return;
}
if (stageState == SCHEDULED || stageState == RUNNING || stageState == FLUSHING) {
    if (taskState == TaskState.RUNNING) {
        stateMachine.transitionToRunning();
    }
    if (isFlushing()) {
        stateMachine.transitionToFlushing();
    }
    if (isAllTaskFinished()) {
        stateMachine.transitionToFinished();
    }
}
Alternatively we could probably make isFlushing() and isAllTaskFinished() lock-free somehow
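One lock-free variant of such a completion check (names hypothetical, not the PR's actual code): count first-time FINISHED updates with an AtomicInteger and compare against the expected task count, relying on `Set.add`'s atomic "newly added" result so each task is counted at most once:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class LockFreeCompletion
{
    private final int expectedTasks;
    private final Set<Integer> finishedTasks = ConcurrentHashMap.newKeySet();
    private final AtomicInteger finishedCount = new AtomicInteger();

    public LockFreeCompletion(int expectedTasks)
    {
        this.expectedTasks = expectedTasks;
    }

    public void addFinishedTask(int taskId)
    {
        // Set.add returns true exactly once per distinct taskId,
        // so the counter is incremented at most once per task
        if (finishedTasks.add(taskId)) {
            finishedCount.incrementAndGet();
        }
    }

    // Lock-free: a volatile read of the counter, no monitor acquired
    public boolean isAllTasksFinished()
    {
        return finishedCount.get() == expectedTasks;
    }

    public static void main(String[] args)
    {
        LockFreeCompletion completion = new LockFreeCompletion(3);
        completion.addFinishedTask(1);
        completion.addFinishedTask(1); // duplicate update, counted once
        completion.addFinishedTask(2);
        System.out.println(completion.isAllTasksFinished()); // false
        completion.addFinishedTask(3);
        System.out.println(completion.isAllTasksFinished()); // true
    }
}
```

This replaces the synchronized `finishedTasks.containsAll(allTasks)` check with a single atomic counter comparison, which is the kind of change that removes the monitor from the hot status-update path.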
I've implemented the version where updateTaskStatus was not called when the task state had not changed.
It helped a bit, but the lock contention was still too high to be accepted (in total, about 1 day).
Alternatively we could probably make isFlushing() and isAllTaskFinished() lock-free somehow
This is what I did in this PR.
I've implemented the version where updateTaskStatus was not called when the task state had not changed.
It helped a bit, but the lock contention was still too high to be accepted (in total, about 1 day).
It could be because the whole updateTaskStatus was synchronized. That seems like a waste. There is no reason why transitionToXX should be synchronized, and they do some non-trivial work like firing executor tasks.
Well, it might or it might not be; it is hard to say. I did not check the version combining (1) decreasing the number of updateTaskStatus calls and (2) changing the scope of the synchronized section.
When I checked that (1) does not help and (2) does not help (separately), I decided to make this method completely lock-free.
Closed in favor of #14395
Description
We observed that there is high lock contention on io.trino.execution.scheduler.PipelinedStageExecution's monitor.

Lock contention before: (profiler screenshot)

Lock contention after: (profiler screenshot)
We measured the throughput (80 Trino workers on 40 r5.4xlarge nodes, 64 concurrent queries) before/after.

Throughput before:
6000 queries / h

Throughput after:
6600 queries / h (a 10% improvement)
Non-technical explanation
Trino is able to process more queries within one hour.
Release notes
( ) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:
Increase throughput when running highly concurrent workloads on big Trino clusters