[ML] Adds progress reporting for transforms #41278
Pinging @elastic/ml-core
...n/java/org/elasticsearch/xpack/core/dataframe/transforms/pivot/DateHistogramGroupSource.java
.../main/java/org/elasticsearch/xpack/core/dataframe/transforms/pivot/HistogramGroupSource.java
| testCompile project(path: xpackModule('data-frame'), configuration: 'runtime') | ||
| } | ||
|
|
||
| integTestRunner { |
I know very little about how our build system works. I added fields from the ml native integration tests until the tests started executing. If anybody wants to give me a rundown and show me how most of these changes are unnecessary for my native client tests, please do :).
I have since pared it down to only what is necessary. The next commit will have fewer changes to the build file.
...rc/test/java/org/elasticsearch/xpack/dataframe/integration/DataFrameTransformProgressIT.java
| persistentTaskActionListener.onResponse(existingTask); | ||
| // If the task already exists but is not assigned to a node, something is weird | ||
| // return a failure that includes the current assignment explanation (if one exists) | ||
| if (existingTask.isAssigned() == false) { |
This is a check we should have been making from the start; not making it causes a bug. If the task exists, is not started, and is NOT assigned, that is a problem. We should not have tried to wait for its allocation and then cancel it.
Instead, I am opting to start it directly if it is allocated and, if it is not, to return an error saying that the allocation failed.
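A minimal sketch of the check described above, assuming a simplified stand-in for the persistent task. `TaskSketch`, `StartDecision`, and `checkAssignment` are hypothetical names for illustration only; they are not the Elasticsearch classes touched by this PR.

```java
import java.util.Optional;

// Hypothetical stand-in for a persistent task; NOT the Elasticsearch class.
final class TaskSketch {
    final String id;
    final boolean assigned;
    final String assignmentExplanation; // may be null when no explanation exists

    TaskSketch(String id, boolean assigned, String assignmentExplanation) {
        this.id = id;
        this.assigned = assigned;
        this.assignmentExplanation = assignmentExplanation;
    }
}

final class StartDecision {
    // Returns an error message when the existing task is not assigned to a node,
    // or an empty Optional when the task is assigned and can be started directly.
    static Optional<String> checkAssignment(TaskSketch existing) {
        if (existing.assigned == false) {
            String reason = existing.assignmentExplanation == null
                ? "no assignment explanation available"
                : existing.assignmentExplanation;
            return Optional.of("task [" + existing.id + "] is not assigned to a node: " + reason);
        }
        return Optional.empty();
    }
}
```

A caller would fail the request with the returned message instead of waiting for an allocation that may never happen.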
| // If it is not able to be assigned to a node all together, we should just close the task completely | ||
| private boolean isNotStopped(PersistentTasksCustomMetaData.PersistentTask<?> task) { | ||
| DataFrameTransformState state = (DataFrameTransformState)task.getState(); | ||
| return state != null && state.getTaskState().equals(DataFrameTransformTaskState.STOPPED) == false; |
Since I am now starting the indexer in the node executor class, a new task should move away from the state of STOPPED eventually, after being allocated. Of course, if it is not allocated, that will never happen (hence keeping the allocation checks).
As for only verifying that the task is NOT stopped, setting it to a failed state is a real possibility and since the executor has no direct means of notifying us, setting the task to failed with a reason is good enough.
| indexer.set(indexerBuilder.build(this)); | ||
| } | ||
|
|
||
| static class ClientDataFrameIndexerBuilder { |
This class was necessary as the executor gathers the necessary dependencies in individual async calls. I needed a place to store them. Initially I kept them in local state in the executor, but that conflated ownership.
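To illustrate the idea, here is a rough sketch of a builder that collects dependencies as separate callbacks complete and refuses to build until all are present. `IndexerSketch`, `IndexerBuilderSketch`, and the two fields are assumptions made for the example, not the actual contents of `ClientDataFrameIndexerBuilder`.

```java
import java.util.Objects;

// Hypothetical product type; the real indexer takes different dependencies.
final class IndexerSketch {
    final String config;
    final long checkpoint;

    IndexerSketch(String config, long checkpoint) {
        this.config = config;
        this.checkpoint = checkpoint;
    }
}

// Holds dependencies that arrive from individual async calls, so the
// executor itself does not need to keep them as local state.
final class IndexerBuilderSketch {
    private String config;   // set when the (hypothetical) config lookup completes
    private Long checkpoint; // set when the (hypothetical) checkpoint lookup completes

    IndexerBuilderSketch setConfig(String config) {
        this.config = config;
        return this;
    }

    IndexerBuilderSketch setCheckpoint(long checkpoint) {
        this.checkpoint = checkpoint;
        return this;
    }

    // Fails fast if a dependency has not been supplied yet.
    IndexerSketch build() {
        Objects.requireNonNull(config, "config must be set before build()");
        Objects.requireNonNull(checkpoint, "checkpoint must be set before build()");
        return new IndexerSketch(config, checkpoint);
    }
}
```

Each async listener calls one setter, and the last listener in the chain calls `build()`.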
| } | ||
| } | ||
|
|
||
| static class ClientDataFrameIndexer extends DataFrameIndexer { |
I moved this to a static class, as it seems to me that it should have been one from the get-go. Conflating where the task's responsibilities end and the indexer's begin has been a continual problem with this design. This is just a first step toward at least some separation of concerns between the two.
...frame/src/main/java/org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformTask.java
| ActionListener<PersistentTasksCustomMetaData.PersistentTask<?>> updateClusterStateListener = ActionListener.wrap( | ||
| task -> { | ||
| // Make a copy of the previousStats so that they are not constantly updated when `merge` is called | ||
| DataFrameIndexerTransformStats tempStats = new DataFrameIndexerTransformStats(previousStats).merge(getStats()); |
There is no longer a need to keep track of previous stats and merge them, because the stats dependency is now passed to the indexer constructor and increments can continue as normal.
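Roughly, the idea looks like the sketch below: the indexer receives an existing stats object and keeps incrementing it, so nothing has to be copied or merged later. `StatsSketch` and `StatsIndexerSketch` are hypothetical names; the real `DataFrameIndexerTransformStats` tracks more counters than this.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, single-counter stand-in for the transform stats.
final class StatsSketch {
    private final AtomicLong docsIndexed = new AtomicLong();

    void incrementDocsIndexed(long n) {
        docsIndexed.addAndGet(n);
    }

    long getDocsIndexed() {
        return docsIndexed.get();
    }
}

// The indexer is handed the stats it should continue from, instead of
// starting at zero and having a caller merge in previous values.
final class StatsIndexerSketch {
    private final StatsSketch stats;

    StatsIndexerSketch(StatsSketch initialStats) {
        this.stats = initialStats;
    }

    void onBatchIndexed(long docs) {
        stats.incrementDocsIndexed(docs);
    }

    StatsSketch getStats() {
        return stats;
    }
}
```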
hendrikmuhs
left a comment
Looks good, I like the separation of the task and client indexer.
Mostly nits, just 1 thing I think needs to be addressed.
...java/org/elasticsearch/client/dataframe/transforms/hlrc/DataFrameTransformProgressTests.java
.../org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformPersistentTasksExecutor.java
.../org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformPersistentTasksExecutor.java
.../org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformPersistentTasksExecutor.java
...frame/src/main/java/org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformTask.java
...frame/src/main/java/org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformTask.java
...frame/src/main/java/org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformTask.java
| logger.info("Updating persistent state of transform [" + transform.getId() + "] to [" + state.toString() + "]"); | ||
| transformTask.currentCheckpoint.get(), | ||
| transformTask. stateReason.get()); | ||
| logger.info("Updating persistent state of transform [" + transformConfig.getId() + "] to [" + state.toString() + "]"); |
I know, not changed, but would be good to reduce from info to debug.
...me/src/main/java/org/elasticsearch/xpack/dataframe/transforms/TransformProgressGatherer.java
.../main/java/org/elasticsearch/xpack/core/dataframe/transforms/DataFrameTransformProgress.java
...me/src/main/java/org/elasticsearch/xpack/dataframe/transforms/TransformProgressGatherer.java
...src/main/java/org/elasticsearch/xpack/core/dataframe/transforms/DataFrameTransformState.java
.../main/java/org/elasticsearch/xpack/core/dataframe/transforms/pivot/HistogramGroupSource.java
...s/src/test/java/org/elasticsearch/xpack/dataframe/integration/DataFrameGetAndGetStatsIT.java
...rc/test/java/org/elasticsearch/xpack/dataframe/integration/DataFrameTransformProgressIT.java
...rc/test/java/org/elasticsearch/xpack/dataframe/integration/DataFrameTransformProgressIT.java
...in/java/org/elasticsearch/xpack/dataframe/action/TransportStartDataFrameTransformAction.java
...in/java/org/elasticsearch/xpack/dataframe/action/TransportStartDataFrameTransformAction.java
.../org/elasticsearch/xpack/dataframe/transforms/DataFrameTransformPersistentTasksExecutor.java
hendrikmuhs
left a comment
LGTM
run elasticsearch-ci/1
davidkyle
left a comment
LGTM
run elasticsearch-ci/1
* [ML] Adds progress reporting for transforms
* fixing after master merge
* Addressing PR comments
* removing unused imports
* Adjusting afterKey handling and percentage to be 100*
* Making sure it is a linked hashmap for serialization
* removing unused import
* addressing PR comments
* removing unused import
* simplifying code, only storing total docs and decrementing
* adjusting for rewrite
* removing initial progress gathering from executor
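The commit notes mention storing only the total docs and decrementing, with the percentage scaled by 100. A minimal sketch of that bookkeeping follows, under the assumption that progress is derived from a total and a remaining count. `ProgressSketch` and its methods are hypothetical and are not the `DataFrameTransformProgress` class added by this PR.

```java
// Hypothetical progress tracker: stores the total once and decrements
// a remaining counter as documents are processed.
final class ProgressSketch {
    private final long totalDocs;
    private long remainingDocs;

    ProgressSketch(long totalDocs) {
        if (totalDocs < 0) {
            throw new IllegalArgumentException("totalDocs must be non-negative");
        }
        this.totalDocs = totalDocs;
        this.remainingDocs = totalDocs;
    }

    // Decrements the remaining count, never going below zero.
    void docsProcessed(long n) {
        remainingDocs = Math.max(0, remainingDocs - n);
    }

    long getRemainingDocs() {
        return remainingDocs;
    }

    // Percentage in the range [0, 100]; an empty source counts as complete.
    double percentComplete() {
        if (totalDocs == 0) {
            return 100.0;
        }
        return 100.0 * (totalDocs - remainingDocs) / totalDocs;
    }
}
```

For example, a tracker created with 200 total docs reports 25 percent after `docsProcessed(50)`.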
This admittedly large PR adds progress reporting to data frame transforms. Most of its size is due to refactoring caused by yak-shaving[0] :(.
Design decisions
_stop and I needed part of it for gathering progress information when the ES Node executes the task.
_start creates the task (and the executor sees that it is a new task) that it should automatically start without having to call start() on the allocated task on the node.
Considerations
Future work
[0] https://en.wiktionary.org/wiki/yak_shaving