add Init stage to SqlTask by jklamer · Pull Request #19962 · trinodb/trino

jklamer · 2023-11-30T00:43:33Z

Description

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

losipiuk · 2023-12-01T16:39:31Z

core/trino-main/src/main/java/io/trino/execution/SqlTask.java

-            // notify that task state changed (apart from initial RUNNING state notification)
-            if (newState != RUNNING) {
+            // notify that task state changed (apart from initial INITIALIZING state notification)
+            if (newState != INITIALIZING) {


Do we want to notify about RUNNING now?

Good question. If notifying is useful only for terminating states, then probably not?

If we don't care about the notification for the INITIALIZING -> RUNNING state change, then why is an extra state needed?

The extra state is needed to block execution of the query until all catalogs are loaded on the worker.

I don't fully understand the reasoning for not notifying on certain changes.
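To make the notification question concrete, here is a minimal sketch. It assumes, as the code comment in the diff implies, that a newly registered state-change listener is invoked once with the current state and then on every transition. `SketchStateMachine`, `SketchTaskState` and `StatusNotifier` are hypothetical names, not Trino classes. With the filter changed to skip INITIALIZING, only the initial notification is suppressed, and the INITIALIZING -> RUNNING transition is now reported like any other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified task states (not Trino's TaskState).
enum SketchTaskState { INITIALIZING, RUNNING, FINISHED, FAILED }

// Hypothetical state machine: a listener is called once with the current state
// when registered, then once per transition.
final class SketchStateMachine
{
    private SketchTaskState state = SketchTaskState.INITIALIZING;
    private final List<Consumer<SketchTaskState>> listeners = new ArrayList<>();

    synchronized void addStateChangeListener(Consumer<SketchTaskState> listener)
    {
        listeners.add(listener);
        listener.accept(state); // initial notification with the current state
    }

    synchronized void transition(SketchTaskState newState)
    {
        if (state == newState) {
            return;
        }
        state = newState;
        for (Consumer<SketchTaskState> listener : listeners) {
            listener.accept(newState);
        }
    }
}

final class StatusNotifier
{
    // Records which states caused a "task status changed" notification.
    final List<SketchTaskState> notified = new ArrayList<>();

    void register(SketchStateMachine machine)
    {
        machine.addStateChangeListener(newState -> {
            // Skip only the initial INITIALIZING notification; RUNNING is now a
            // real transition and gets reported.
            if (newState != SketchTaskState.INITIALIZING) {
                notified.add(newState);
            }
        });
    }
}
```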

losipiuk · 2023-12-01T16:41:37Z

core/trino-main/src/main/java/io/trino/execution/SqlTask.java

    {
        requireNonNull(catalogs, "catalogs is null");
-        return this.catalogs.compareAndSet(null, requireNonNull(catalogs, "catalogs is null"));
+        return this.catalogs.compareAndSet(null, catalogs);


Can it be called twice? Should you verify that compareAndSet returns true?

This is allowed to be called more than once: for example, after a coordinator-to-worker connection drop, after a timeout, or when polling while waiting for task state == RUNNING.

I think split assignments also come through here?
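One possible middle ground between the two positions, sketched under the assumption that a repeated update always carries the same catalogs: tolerate repeated calls, but reject a repeat that tries to change the catalogs. `SketchCatalogHolder` and the `Set<String>` catalog representation are hypothetical stand-ins for the real catalog handles.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

import static java.util.Objects.requireNonNull;

// Hypothetical sketch: updates can arrive more than once (retries, status polling,
// later split assignments), so the first call stores the catalogs and later calls
// are accepted only if they carry the same catalogs.
final class SketchCatalogHolder
{
    private final AtomicReference<Set<String>> catalogs = new AtomicReference<>();

    /** Returns true if this call stored the catalogs, false if they were already set. */
    boolean setCatalogs(Set<String> newCatalogs)
    {
        requireNonNull(newCatalogs, "newCatalogs is null");
        if (catalogs.compareAndSet(null, newCatalogs)) {
            return true;
        }
        // A repeated update must not silently change the catalogs of an existing task.
        Set<String> existing = catalogs.get();
        if (!existing.equals(newCatalogs)) {
            throw new IllegalStateException("catalogs already set to " + existing + ", got " + newCatalogs);
        }
        return false;
    }
}
```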

losipiuk · 2023-12-01T16:50:27Z

core/trino-main/src/main/java/io/trino/execution/SqlTaskManager.java

+                    ReentrantReadWriteLock.ReadLock catalogInitLock = catalogsLock.readLock();
+                    catalogInitLock.lock();
+                    try {
+                        connectorServicesProvider.ensureCatalogsLoaded(session, activeCatalogs);


Looks like ensureCatalogsLoaded could return a ListenableFuture.

I wasn't sure how to ensure that the lock is acquired for the execution of the Future only when it's needed.

Or can it not, because it has to complete its work within the lock boundary? What are those locks guarding? Shouldn't they be an internal detail of ensureCatalogsLoaded?

There is a race condition with pruneCatalog, where a catalog can be pruned immediately after creation but before use. This lock guards against that.

Add a code comment.
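A minimal sketch of the read/write-lock pattern being described, assuming that "loading" for a task also records the task as a user of those catalogs. `SketchCatalogManager` and its methods are hypothetical. Loading holds the read lock (many tasks can load concurrently), and pruning holds the write lock, so a prune never observes a catalog that is loaded but not yet recorded as in use. The lock does not fix the order of a load and a prune; it makes each one atomic, and either order is then safe: a prune that runs first is followed by a load that re-adds the catalog, and a prune that runs second sees the task.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not Trino's catalog manager.
final class SketchCatalogManager
{
    private final ReadWriteLock catalogsLock = new ReentrantReadWriteLock();
    private final Set<String> loaded = ConcurrentHashMap.newKeySet();
    private final Map<String, Set<String>> catalogsByTask = new ConcurrentHashMap<>();

    void loadForTask(String taskId, Set<String> catalogs)
    {
        // Load and "mark in use" are one step with respect to pruning.
        catalogsLock.readLock().lock();
        try {
            loaded.addAll(catalogs);
            catalogsByTask.put(taskId, catalogs);
        }
        finally {
            catalogsLock.readLock().unlock();
        }
    }

    void removeTask(String taskId)
    {
        catalogsByTask.remove(taskId);
    }

    /** Drops every loaded catalog no task uses and returns the dropped names. */
    Set<String> prune()
    {
        catalogsLock.writeLock().lock();
        try {
            Set<String> active = new HashSet<>();
            catalogsByTask.values().forEach(active::addAll);
            Set<String> pruned = new HashSet<>(loaded);
            pruned.removeAll(active);
            loaded.removeAll(pruned);
            return pruned;
        }
        finally {
            catalogsLock.writeLock().unlock();
        }
    }

    boolean isLoaded(String catalog)
    {
        return loaded.contains(catalog);
    }
}
```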

losipiuk · 2023-12-01T16:56:00Z

core/trino-main/src/main/java/io/trino/execution/SqlTaskManager.java

+                try {
+                    catalogLoading.get(5, SECONDS);
+                    sqlTask.setCatalogsLoaded(immediateVoidFuture());
+                }


Why special-case this with an active wait? Why not always use a listener on the future?

To avoid adding a round trip to most task creations.

In most cases, catalogs will be "loaded" in well under 5 seconds because they are already loaded.

I am not a big fan of blocking the HTTP server thread for 5 seconds. But maybe this is not a big deal. I will let others chime in.

cc: @dain, @sopel39, @findepi

losipiuk

This is probably mostly fine. But I want more eyes on this one.

@sopel39 @dain @findepi

losipiuk · 2023-12-11T10:38:14Z

core/trino-main/src/main/java/io/trino/execution/TaskState.java

Update the comment to explain what this state means.

losipiuk · 2023-12-11T10:42:45Z

core/trino-main/src/main/java/io/trino/execution/SqlTask.java

nit: use proper variable name in error message

losipiuk · 2023-12-11T10:45:03Z

core/trino-main/src/main/java/io/trino/execution/SqlTask.java

I think this will be triggered on failure of the future too. You do not want that.

Replace those two lines with:

addCallback(catalogsLoadedFuture, new FutureCallback<>()
{
    @Override
    public void onSuccess(Void result)
    {
        taskStateMachine.transitionToRunning();
    }

    @Override
    public void onFailure(Throwable t)
    {
        taskStateMachine.failed(t);
    }
}, directExecutor());
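For readers without Guava at hand, the point can be sketched with `java.util.concurrent.CompletableFuture` as a stand-in for `ListenableFuture`: `whenComplete` receives both outcomes, so it must branch explicitly, just as `FutureCallback` separates `onSuccess` and `onFailure`. `SketchTaskLifecycle` is a hypothetical, simplified stand-in for the task state machine, not Trino code.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified task lifecycle used only to illustrate the callback split.
final class SketchTaskLifecycle
{
    enum State { INITIALIZING, RUNNING, FAILED }

    private State state = State.INITIALIZING;
    private Throwable failureCause;

    synchronized State state()
    {
        return state;
    }

    synchronized Throwable failureCause()
    {
        return failureCause;
    }

    synchronized void transitionToRunning()
    {
        // Only an initializing task may start running.
        if (state == State.INITIALIZING) {
            state = State.RUNNING;
        }
    }

    synchronized void failed(Throwable cause)
    {
        if (state != State.FAILED) {
            failureCause = cause;
            state = State.FAILED;
        }
    }

    void setCatalogsLoaded(CompletableFuture<Void> catalogsLoadedFuture)
    {
        // Analogue of addCallback(future, callback, directExecutor()): the action runs in
        // the completing thread (or immediately if the future is already done). Calling
        // transitionToRunning() for every completion would start a task whose catalogs
        // failed to load, so the outcome is checked first.
        catalogsLoadedFuture.whenComplete((ignored, failure) -> {
            if (failure == null) {
                transitionToRunning();
            }
            else {
                failed(failure);
            }
        });
    }
}
```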

losipiuk · 2023-12-11T10:46:51Z

core/trino-main/src/main/java/io/trino/execution/SqlTaskManager.java

+                try {
+                    catalogLoading.get(5, SECONDS);
+                    sqlTask.setCatalogsLoaded(immediateVoidFuture());
+                }


I am not a big fan of blocking the HTTP server thread for 5 seconds. But maybe this is not a big deal. I will let others chime in.

cc: @dain, @sopel39, @findepi

losipiuk · 2023-12-11T10:54:45Z

core/trino-main/src/main/java/io/trino/execution/SqlTaskManager.java

+                    ReentrantReadWriteLock.ReadLock catalogInitLock = catalogsLock.readLock();
+                    catalogInitLock.lock();
+                    try {
+                        connectorServicesProvider.ensureCatalogsLoaded(session, activeCatalogs);


Add a code comment.

losipiuk · 2023-12-11T10:56:03Z

core/trino-main/src/test/java/io/trino/execution/BaseTestSqlTaskManager.java

The + 1 is not obvious here. Explain in a code comment what is going on.

losipiuk · 2023-12-11T10:57:27Z

@pettyjamesm you can take a look too

sopel39 · 2023-12-12T10:35:51Z

core/trino-main/src/main/java/io/trino/execution/SqlTask.java

-            // notify that task state changed (apart from initial RUNNING state notification)
-            if (newState != RUNNING) {
+            // notify that task state changed (apart from initial INITIALIZING state notification)
+            if (newState != INITIALIZING) {


If we don't care about the notification for the INITIALIZING -> RUNNING state change, then why is an extra state needed?

sopel39 · 2023-12-12T10:36:54Z

core/trino-main/src/main/java/io/trino/execution/SqlTask.java

Throw an error if compareAndSet failed. Can it be called multiple times?

Yes. Update task can be called multiple times. I believe split assignments also come through this endpoint.

sopel39 · 2023-12-12T10:37:39Z

core/trino-main/src/main/java/io/trino/execution/SqlTaskExecutionFactory.java

When using this factory without SqlTaskManager

Is it only for testing? Add a comment.

Correct. Will do.

sopel39 · 2023-12-12T10:38:45Z

core/trino-main/src/main/java/io/trino/execution/SqlTaskManager.java

nit: Opt -> Optional

sopel39 · 2023-12-12T10:40:31Z

core/trino-main/src/main/java/io/trino/server/remotetask/HttpRemoteTask.java

You shouldn't update when pendingSourceSplitCount == 0, should you?

BTW: why is if (taskStatusFetcher.getTaskStatus().getState() == INITIALIZING) needed? It seems it would work without it. Also, ignoring splitAssignments looks wrong. An assertion (that it's empty) would be useful at the very least.

The task isn't running, and the coordinator shouldn't treat it as running, so my plan was to loop on updates until it is running.

I felt that continuing to assign splits to a task that is still in the init stage didn't make sense, because they won't get processed.

sopel39 · 2023-12-12T10:44:52Z

core/trino-main/src/main/java/io/trino/server/remotetask/HttpRemoteTask.java

Add a comment why you return here. Is it needed?

Additionally, you need to schedule an update when the task transitions from INITIALIZING to RUNNING.

You should add a test that the query won't deadlock, e.g.:

1. The task is scheduled in the INITIALIZING state.
2. All splits (a single split) are scheduled in HttpRemoteTask.
3. The task transitions to the RUNNING state.

Expected: HttpRemoteTask schedules a task update on the transition from INITIALIZING -> RUNNING.

I believe calling pendingRequestsCounter.incrementAndGet() causes a reschedule of the update and will spin the update until the state is transitioned away. (The call site for this function has that logic.)

pendingRequestsCounter.incrementAndGet();

It won't cause a reschedule on its own. You need to call triggerUpdate.

and will spin the update until the state is transitioned away

We shouldn't actively spin TaskUpdateRequest until the state changes, because TaskUpdateRequests (and TaskInfo responses) are expensive. The proper way to do it is to listen for state changes from taskStatusFetcher and act on them (e.g. call triggerUpdate).
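A minimal sketch of the event-driven alternative, under the assumption (not settled in this thread) that splits are held on the coordinator while the task is INITIALIZING. A status listener, rather than a retry loop, schedules the update on the INITIALIZING -> RUNNING edge, and no update is sent when there are no pending splits. `SketchRemoteTask` and its members are hypothetical, not `HttpRemoteTask` fields. The test mirrors the deadlock scenario proposed above: one split is scheduled while the task is initializing, then the task transitions to RUNNING.

```java
// Hypothetical sketch, not HttpRemoteTask: names and fields are illustrative only.
final class SketchRemoteTask
{
    enum State { INITIALIZING, RUNNING }

    private int pendingSplits;
    private boolean running;
    private int updatesSent;
    private int splitsDelivered;

    synchronized void addSplits(int count)
    {
        pendingSplits += count;
        triggerUpdate();
    }

    // Invoked by the (long-polling) status fetcher each time it observes a task state.
    synchronized void onTaskStatus(State state)
    {
        // React to the INITIALIZING -> RUNNING edge once; repeated RUNNING reports are no-ops.
        if (state == State.RUNNING && !running) {
            running = true;
            triggerUpdate();
        }
    }

    // Callers hold the monitor. Sends one update, and only when there is work to send.
    private void triggerUpdate()
    {
        if (!running || pendingSplits == 0) {
            return; // a later split or state event will call this again
        }
        splitsDelivered += pendingSplits;
        pendingSplits = 0;
        updatesSent++; // stands in for sending one TaskUpdateRequest
    }

    synchronized int updatesSent()
    {
        return updatesSent;
    }

    synchronized int splitsDelivered()
    {
        return splitsDelivered;
    }
}
```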

BTW: is it OK to send splits while task is initializing? If so, maybe you don't need special handling in HttpRemoteTask

Is the taskStatusFetcher always polling? And can it be used to block query execution until the state is no longer INITIALIZING?

I think I found a way to block stage execution until initialization is complete, so I can probably remove this. But would our latency then be limited by the latency of ContinuousTaskStatusFetcher?

Is it a problem if a task fails to initialize after accepting splits?

Is the taskStatusFetcher always polling? And can it be used to block query execution until the state is no longer INITIALIZING?

Yes. It's using long polling.

Is it a problem if a task fails to initialize after accepting splits?

Maybe not, but I don't know exactly how dynamic catalogs work

sopel39 · 2023-12-12T10:51:14Z

core/trino-main/src/main/java/io/trino/execution/SqlTaskManager.java

Why is the lock needed? Can catalogs be updated concurrently? If so, how do you guarantee that they are updated in the correct order?

The lock is needed because of a race condition with catalog pruning. The lock ensures proper ordering.

The lock ensures proper ordering

What prevents the following execution order:

init of catalogs
prune catalogs

or:

prune catalogs
init of catalogs

A lock on its own doesn't enforce order. Am I missing something?

This is the PR for it: #19683. I was able to replicate the issue in a unit test. The problematic ordering was that while pruning was collecting the catalogs for a task, a task could be assigned catalogs and load them, only for the pruning to run right afterwards, before the query executed.

#19683 was able to replicate in unit test.

How does that PR relate to the change here?

That PR introduced the lock. This PR just moves the location of the lock.

sopel39 · 2023-12-12T10:52:47Z

core/trino-main/src/main/java/io/trino/execution/SqlTaskManager.java

This will introduce additional query latency (the task won't be updated with splits for 5s, as only one request at a time is allowed). We need to figure out another way to do this.

Let's chat with @dain; this comes directly from conversations with him. My understanding is that this will only wait 5 seconds if the catalogs are still loading (and the query cannot execute anyway).

My understanding is that this will only wait 5 seconds if the catalogs are still loading (and the query cannot execute anyway)

Do I understand correctly that catalogs can be different per query? What if catalogs load in 10 milliseconds? In that case, split assignments will be blocked for the remainder of the 5s.

Lastly, is 5s always sufficient?

Is that how the future timeout works? Does it always block for 5s?

5s is not always sufficient for loading all catalogs, which is why I need to block execution until the task has initialized all catalogs.

Just tested. I believe this terminates early.

5s is not always sufficient for loading all catalogs, which is why I need to block execution until the task has initialized all catalogs.

I think this means it should not be time-based, but event-based (e.g. some future).
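The early termination is standard `Future.get(timeout, unit)` behavior: the timeout is an upper bound, so a future that completes after 10 ms unblocks the caller after roughly 10 ms. The sketch below uses `CompletableFuture` in place of Guava's `ListenableFuture`; `SketchBoundedWait` is hypothetical. It does not remove the concern above: while loading is still pending, the calling thread stays blocked for the full timeout, which is the argument for an event-based approach.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: a bounded wait that returns as soon as loading completes.
final class SketchBoundedWait
{
    /**
     * Returns true if catalog loading completed successfully within the timeout,
     * false if it is still running (the caller would then fall back to a callback).
     */
    static boolean loadedWithin(CompletableFuture<Void> catalogLoading, long timeout, TimeUnit unit)
    {
        try {
            catalogLoading.get(timeout, unit); // returns early once the future completes
            return true;
        }
        catch (TimeoutException e) {
            return false;
        }
        catch (ExecutionException e) {
            throw new IllegalStateException("catalog loading failed", e.getCause());
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for catalogs", e);
        }
    }
}
```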

losipiuk · 2023-12-14T14:46:09Z

@dain can you please take a look at this one?

findepi · 2023-12-15T12:47:21Z

I don't have an opinion about the code changes yet, but it would be nice to fill in the PR description to explain why this is a good change to have.

jklamer · 2023-12-19T22:21:58Z

Will try and get some testing started to prove this out as a POC and see what's missing.

jklamer · 2023-12-20T00:16:07Z

testing/trino-tests/src/test/java/io/trino/tests/TestTaskInitialization.java

These execute, but I can't tell if they are sufficient as tests.

jklamer · 2024-01-26T22:12:19Z

scrapping

cla-bot bot added the cla-signed label Nov 30, 2023

losipiuk reviewed Dec 1, 2023


jklamer force-pushed the jklamer/TaskInitingState branch 3 times, most recently from 98549d3 to ed5fc1d on December 6, 2023 23:47

losipiuk reviewed Dec 11, 2023


sopel39 reviewed Dec 12, 2023


losipiuk requested a review from dain December 14, 2023 14:46

jklamer commented Dec 20, 2023


add Init stage to SqlTask

be6899b

jklamer force-pushed the jklamer/TaskInitingState branch 3 times, most recently from e17ba4c to d1492fa on December 20, 2023 16:23

SQL stage blocks on initing tasks

c802901

jklamer force-pushed the jklamer/TaskInitingState branch from d1492fa to c18746d on December 26, 2023 22:52

Soln to block task execution until catalogs loaded

1c9b52d

jklamer force-pushed the jklamer/TaskInitingState branch from c18746d to 1c9b52d on December 26, 2023 22:55

jklamer closed this Jan 26, 2024

