[ML] Adds progress reporting for transforms #41278

benwtrent · 2019-04-16T22:09:13Z

This admittedly large PR adds progress reporting to data frame transforms. Most of its size comes from refactoring caused by yak-shaving[0] :(.

Design decisions

I opted to put the progress reporting into its own object (so we can add more fields in the future) and put it directly under the State object. I kept it separate from the checkpoint information for simplicity's sake, as checkpoint status and progress are two separate pieces of information.
Also, due to yak-shaving, much refactoring was done in this PR. All of it would have had to be done eventually, when we cancel the task on _stop, and I needed part of it now to gather progress information when the ES node executes the task.
The task now starts automatically when the node executor kicks it off. This is part of the yak-shaving refactoring. It makes sense that if _start creates the task (and the executor sees that it is a new task), it should start automatically without having to call start() on the allocated task on the node.
Progress information is now stored in the state. Gathering the "remaining docs" via a query could be very costly; specifically, range queries against terms are very expensive.
The total number of docs is a simple enough query.

Considerations

This is "good enough" progress reporting. No guarantees are made, since the index could be updated so that the cursor actually hits more or fewer docs than were initially counted.
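As a rough illustration of the approach described above (count the total once, then track what is left instead of re-querying), here is a minimal Java sketch. The class `ProgressSketch` and its method names are hypothetical and are not the PR's `DataFrameTransformProgress` API. Clamping at zero is an assumption about how one might cope with the index changing underneath the cursor.

```java
// Hypothetical sketch; names are illustrative, not the PR's actual classes.
final class ProgressSketch {
    private final long totalDocs;  // gathered once, up front, with a cheap count query
    private long remainingDocs;    // decremented as the indexer processes documents

    ProgressSketch(long totalDocs) {
        if (totalDocs < 0) {
            throw new IllegalArgumentException("totalDocs must be non-negative");
        }
        this.totalDocs = totalDocs;
        this.remainingDocs = totalDocs;
    }

    // Record a processed batch. Clamp at zero (an assumption of this sketch)
    // because the source index may have grown since the total was gathered.
    void docsProcessed(long count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be non-negative");
        }
        remainingDocs = Math.max(0, remainingDocs - count);
    }

    long getTotalDocs() {
        return totalDocs;
    }

    long getRemainingDocs() {
        return remainingDocs;
    }

    // Percentage in [0, 100]; an empty source is treated as fully complete.
    double getPercentComplete() {
        if (totalDocs == 0) {
            return 100.0;
        }
        return 100.0 * (totalDocs - remainingDocs) / totalDocs;
    }
}
```

Because the total is only a snapshot, the percentage is an estimate: if documents are added while the transform runs, the sketch simply saturates at 100 rather than reporting a negative remainder.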

Future work

Have the total docs query take checkpointing into account. Right now, it only utilizes the dataframe source query. As new checkpoints are executed, the query will have to change to give an accurate count of the total docs expected to be processed in that checkpoint.

[0] https://en.wiktionary.org/wiki/yak_shaving


elasticmachine · 2019-04-16T22:09:15Z

Pinging @elastic/ml-core

...n/java/org/elasticsearch/xpack/core/dataframe/transforms/pivot/DateHistogramGroupSource.java

.../main/java/org/elasticsearch/xpack/core/dataframe/transforms/pivot/HistogramGroupSource.java

benwtrent · 2019-04-16T22:14:37Z

x-pack/plugin/data-frame/qa/single-node-tests/build.gradle

  testCompile project(path: xpackModule('data-frame'), configuration: 'runtime')
 }

+integTestRunner {


I know very little about how our build system works. I added fields from what is in the ml native integration tests until the tests started executing. If anybody wants to give me a rundown and show me how most of these changes are unnecessary for my native client tests, please do :).

I have since pared this down to only what is necessary. The next commit will have fewer changes to the build file.

...rc/test/java/org/elasticsearch/xpack/dataframe/integration/DataFrameTransformProgressIT.java

benwtrent · 2019-04-16T22:18:04Z

...in/java/org/elasticsearch/xpack/dataframe/action/TransportStartDataFrameTransformAction.java

-                        persistentTaskActionListener.onResponse(existingTask);
+                        // If the task already exists but is not assigned to a node, something is weird
+                        // return a failure that includes the current assignment explanation (if one exists)
+                        if (existingTask.isAssigned() == false) {


This is a check we should have been making to begin with, and not making it causes a bug. If the task exists, is not started, and is NOT assigned, that is an issue. We should not have tried to wait for its allocation and then cancel it.

Instead, I am opting to start it directly if it is allocated and, if it is not, to return an error saying that the allocation failed.
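The decision described in this comment can be sketched roughly as follows. This is only an illustration: `TaskView`, `Decision`, and `StartDecisionSketch` are hypothetical stand-ins and not the Elasticsearch persistent task types used in the actual change.

```java
// Hypothetical stand-ins for the persistent task types; not the Elasticsearch API.
final class StartDecisionSketch {
    enum Action { START_ON_NODE, FAIL_UNASSIGNED }

    // Minimal view of an existing task: is it assigned, and why or why not.
    static final class TaskView {
        final boolean assigned;
        final String assignmentExplanation; // may be null

        TaskView(boolean assigned, String assignmentExplanation) {
            this.assigned = assigned;
            this.assignmentExplanation = assignmentExplanation;
        }
    }

    static final class Decision {
        final Action action;
        final String message; // null when the task can simply be started

        Decision(Action action, String message) {
            this.action = action;
            this.message = message;
        }
    }

    // If the task exists but has no node, fail fast and surface the explanation
    // instead of waiting for an allocation that may never happen.
    static Decision decide(TaskView existingTask) {
        if (existingTask.assigned == false) {
            String reason = existingTask.assignmentExplanation == null
                ? ""
                : " Assignment explanation: " + existingTask.assignmentExplanation;
            return new Decision(Action.FAIL_UNASSIGNED,
                "Task exists but is not assigned to a node." + reason);
        }
        return new Decision(Action.START_ON_NODE, null);
    }
}
```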

benwtrent · 2019-04-16T22:19:55Z

...in/java/org/elasticsearch/xpack/dataframe/action/TransportStartDataFrameTransformAction.java

+        // If it is not able to be assigned to a node all together, we should just close the task completely
+        private boolean isNotStopped(PersistentTasksCustomMetaData.PersistentTask<?> task) {
+            DataFrameTransformState state = (DataFrameTransformState)task.getState();
+            return state != null && state.getTaskState().equals(DataFrameTransformTaskState.STOPPED) == false;


Since I am now starting the indexer in the node executor class, a new task should move away from the state of STOPPED eventually, after being allocated. Of course, if it is not allocated, that will never happen (hence keeping the allocation checks).

As for only verifying that the task is NOT stopped: the task ending up in a failed state is a real possibility, and since the executor has no direct means of notifying us, setting the task to failed with a reason is good enough.

benwtrent · 2019-04-16T22:21:43Z

...frame/src/main/java/org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformTask.java

+        indexer.set(indexerBuilder.build(this));
+    }
+
+    static class ClientDataFrameIndexerBuilder {


This class was necessary as the executor gathers the necessary dependencies in individual async calls. I needed a place to store them. Initially I kept them in local state in the executor, but that conflated ownership.
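To illustrate the idea of a builder that accumulates dependencies as separate async calls complete, here is a minimal sketch. The names `IndexerSketch` and `IndexerBuilderSketch`, and the use of plain strings as placeholder dependencies, are assumptions for illustration and do not mirror the real `ClientDataFrameIndexerBuilder` fields.

```java
// Hypothetical sketch: plain strings stand in for the real dependencies.
final class IndexerSketch {
    final String transformConfig;
    final String initialStats;
    final String progress; // may be null when no progress has been gathered yet

    IndexerSketch(String transformConfig, String initialStats, String progress) {
        this.transformConfig = transformConfig;
        this.initialStats = initialStats;
        this.progress = progress;
    }
}

final class IndexerBuilderSketch {
    private String transformConfig;
    private String initialStats;
    private String progress;

    // Each setter would be called from the listener of one async lookup.
    // This sketch is not thread-safe; it assumes the callbacks run one after another.
    IndexerBuilderSketch setTransformConfig(String transformConfig) {
        this.transformConfig = transformConfig;
        return this;
    }

    IndexerBuilderSketch setInitialStats(String initialStats) {
        this.initialStats = initialStats;
        return this;
    }

    IndexerBuilderSketch setProgress(String progress) {
        this.progress = progress;
        return this;
    }

    // Refuse to build until the required pieces have arrived.
    IndexerSketch build() {
        if (transformConfig == null) {
            throw new IllegalStateException("transform config has not been gathered yet");
        }
        if (initialStats == null) {
            throw new IllegalStateException("initial stats have not been gathered yet");
        }
        return new IndexerSketch(transformConfig, initialStats, progress);
    }
}
```

Keeping the partially gathered values in the builder, rather than in fields of the executor, means the executor only coordinates the calls and the builder owns the intermediate state.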

benwtrent · 2019-04-16T22:23:04Z

...frame/src/main/java/org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformTask.java

+        }
+    }
+
+    static class ClientDataFrameIndexer extends DataFrameIndexer {


I moved this to a static class, as it seems to me that it should have been one from the get-go. Conflating where the task's responsibilities end and the indexer's start has been a continuous problem with this design. This is just a first step in trying to make sure that there is at least some separation of concerns between the two.

...frame/src/main/java/org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformTask.java

benwtrent · 2019-04-16T22:24:49Z

...frame/src/main/java/org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformTask.java

            ActionListener<PersistentTasksCustomMetaData.PersistentTask<?>> updateClusterStateListener = ActionListener.wrap(
                task -> {
-                    // Make a copy of the previousStats so that they are not constantly updated when `merge` is called
-                    DataFrameIndexerTransformStats tempStats = new DataFrameIndexerTransformStats(previousStats).merge(getStats());


There is no longer a need to keep track of previous stats and merge them, as the stats dependency is now given to the indexer constructor and increments can continue as normal.
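A small sketch of why injecting the stats object removes the need for a merge step: any indexer constructed with the same stats instance keeps incrementing the same counters. `StatsSketch` and `StatsAwareIndexerSketch` are hypothetical names, not the PR's `DataFrameIndexerTransformStats` or indexer classes, and the sketch is single-threaded.

```java
// Hypothetical sketch; not the PR's actual stats or indexer classes.
final class StatsSketch {
    private long docsIndexed;

    void incrementDocsIndexed(long count) {
        docsIndexed += count;
    }

    long getDocsIndexed() {
        return docsIndexed;
    }
}

final class StatsAwareIndexerSketch {
    private final StatsSketch stats;

    // The stats object is supplied by the caller, so earlier counts are
    // already part of it and no separate "previous stats" copy is needed.
    StatsAwareIndexerSketch(StatsSketch stats) {
        this.stats = stats;
    }

    void onBatchIndexed(long docs) {
        stats.incrementDocsIndexed(docs);
    }

    StatsSketch getStats() {
        return stats;
    }
}
```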

hendrikmuhs

Looks good, I like the separation of the task and client indexer.

Mostly nits, just 1 thing I think needs to be addressed.

...java/org/elasticsearch/client/dataframe/transforms/hlrc/DataFrameTransformProgressTests.java

.../org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformPersistentTasksExecutor.java

...frame/src/main/java/org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformTask.java

hendrikmuhs · 2019-04-17T12:32:35Z

...frame/src/main/java/org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformTask.java

-            logger.info("Updating persistent state of transform [" + transform.getId() + "] to [" + state.toString() + "]");
+                transformTask.currentCheckpoint.get(),
+                transformTask. stateReason.get());
+            logger.info("Updating persistent state of transform [" + transformConfig.getId() + "] to [" + state.toString() + "]");


I know, not changed, but would be good to reduce from info to debug.

...me/src/main/java/org/elasticsearch/xpack/dataframe/transforms/TransformProgressGatherer.java

.../main/java/org/elasticsearch/xpack/core/dataframe/transforms/DataFrameTransformProgress.java

...me/src/main/java/org/elasticsearch/xpack/dataframe/transforms/TransformProgressGatherer.java


...src/main/java/org/elasticsearch/xpack/core/dataframe/transforms/DataFrameTransformState.java

.../main/java/org/elasticsearch/xpack/core/dataframe/transforms/pivot/HistogramGroupSource.java

...s/src/test/java/org/elasticsearch/xpack/dataframe/integration/DataFrameGetAndGetStatsIT.java

...rc/test/java/org/elasticsearch/xpack/dataframe/integration/DataFrameTransformProgressIT.java

...in/java/org/elasticsearch/xpack/dataframe/action/TransportStartDataFrameTransformAction.java

.../org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformPersistentTasksExecutor.java

hendrikmuhs

LGTM

benwtrent · 2019-04-23T13:26:40Z

run elasticsearch-ci/1

davidkyle

LGTM


hendrikmuhs · 2019-04-25T05:35:57Z

run elasticsearch-ci/1

* [ML] Adds progress reporting for transforms
* fixing after master merge
* Addressing PR comments
* removing unused imports
* Adjusting afterKey handling and percentage to be 100*
* Making sure it is a linked hashmap for serialization
* removing unused import
* addressing PR comments
* removing unused import
* simplifying code, only storing total docs and decrementing
* adjusting for rewrite
* removing initial progress gathering from executor

benwtrent added 3 commits April 16, 2019 16:06

[ML] Adds progress reporting for transforms

69f7f31

Merge remote-tracking branch 'upstream/master' into feature/ml-df-cal…

3ea17b1

…culate-docs-left

fixing after master merge

3494666

benwtrent added >non-issue v8.0.0 v7.2.0 :ml/Transform Transform labels Apr 16, 2019

benwtrent commented Apr 16, 2019

hendrikmuhs reviewed Apr 17, 2019

davidkyle reviewed Apr 17, 2019

.../main/java/org/elasticsearch/xpack/core/dataframe/transforms/DataFrameTransformProgress.java

...me/src/main/java/org/elasticsearch/xpack/dataframe/transforms/TransformProgressGatherer.java

benwtrent added 4 commits April 17, 2019 09:02

Addressing PR comments

0cb84d9

removing unused imports

c2a3b5c

Merge remote-tracking branch 'upstream/master' into feature/ml-df-cal…

23fb708

…culate-docs-left

Adjusting afterKey handling and percentage to be 100*

5ac8e5b

benwtrent requested review from davidkyle and hendrikmuhs April 17, 2019 16:15

davidkyle reviewed Apr 17, 2019

...src/main/java/org/elasticsearch/xpack/core/dataframe/transforms/DataFrameTransformState.java

benwtrent added 2 commits April 17, 2019 13:22

Making sure it is a linked hashmap for serialization

2e4adf3

removing unused import

aeb5fbe

davidkyle reviewed Apr 18, 2019

benwtrent added 4 commits April 18, 2019 09:03

addressing PR comments

edeb74b

removing unused import

1d8f80a

simplifying code, only storing total docs and decrementing

d6530fa

adjusting for rewrite

ba37849

benwtrent requested a review from davidkyle April 18, 2019 19:12

benwtrent added 2 commits April 23, 2019 07:37

Merge branch 'master' into feature/ml-df-calculate-docs-left

8e5aba4

removing initial progress gathering from executor

fccbf68

hendrikmuhs approved these changes Apr 23, 2019

davidkyle approved these changes Apr 23, 2019

Merge remote-tracking branch 'upstream/master' into feature/ml-df-cal…

8912717

…culate-docs-left

benwtrent merged commit 9bf8b5a into elastic:master Apr 25, 2019

benwtrent deleted the feature/ml-df-calculate-docs-left branch April 25, 2019 12:54

benwtrent mentioned this pull request Apr 25, 2019

[ML] Adds progress reporting for transforms (#41278) #41529

Merged

benwtrent mentioned this pull request May 1, 2019

[ML] Correct indexer state on task re-allocation #41724

Merged

hendrikmuhs mentioned this pull request May 2, 2019

[ML-DataFrame] simplify indexer by moving members to base class #41741

Merged

benwtrent mentioned this pull request May 2, 2019

[ML] Correct indexer state on task re-allocation (#41724) #41751

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
