[ML] handles compressed model stream from native process #58009
Conversation
Force-pushed c772433 to f3ccd19 (…ics-handle-compressed-model-stream)
Pinging @elastic/ml-core (:ml)
…ics-handle-compressed-model-stream
run elasticsearch-ci/2
hendrikmuhs left a comment:
added some comments
Resolved review threads:
- ...ClusterTest/java/org/elasticsearch/xpack/ml/integration/ChunkedTrainedMoodelPersisterIT.java (outdated)
- ...rc/main/java/org/elasticsearch/xpack/ml/inference/persistence/TrainedModelDefinitionDoc.java
- ...src/main/java/org/elasticsearch/xpack/ml/dataframe/process/ChunkedTrainedModelPersister.java (outdated)
    Consumer<Exception> failureHandler,
    ExtractedFields extractedFields) {
        this.provider = provider;
        this.currentModelId = new AtomicReference<>("");
Why not initialize it empty?
It is empty?
Well... I mean `new AtomicReference<>()`.
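The distinction being discussed is between seeding the AtomicReference with an empty string and leaving it unset. A minimal sketch (class name hypothetical, not the PR's code):

```java
import java.util.concurrent.atomic.AtomicReference;

public class AtomicRefInitSketch {
    public static void main(String[] args) {
        // Seeded with an empty string: get() returns "" until a model id is set.
        AtomicReference<String> seeded = new AtomicReference<>("");
        // No-arg constructor: get() returns null until a model id is set.
        AtomicReference<String> unseeded = new AtomicReference<>();

        System.out.println(seeded.get().isEmpty());
        System.out.println(unseeded.get() == null);
    }
}
```

With the no-arg form, callers must null-check before any string operation on the current model id, which is presumably why the seeded form was chosen.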
Resolved review threads:
- ...src/main/java/org/elasticsearch/xpack/ml/dataframe/process/ChunkedTrainedModelPersister.java (outdated)
- ...src/main/java/org/elasticsearch/xpack/ml/dataframe/process/ChunkedTrainedModelPersister.java (outdated)
    try {
        readyToStoreNewModel = false;
        if (latch.await(30, TimeUnit.SECONDS) == false) {
            LOGGER.error("[{}] Timed out (30s) waiting for inference model metadata to be stored", analytics.getId());
If this happens, it seems the persister can get stuck, because the doc never gets stored and readyToStoreNewModel is never reset? Correct me if I am wrong.
Yeah, it should reset. Good catch.
I do not think it should switch back in this timeout check. If it times out, it just took a long time to persist.
If the persistence itself fails, then I will reset the boolean flag.
Similar behavior for the persistence of the definition docs, the exception being: if the definition doc is the eos, then I will reset the flag.
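The behavior agreed on above can be sketched as follows: a timeout alone does not flip the flag back; only a persistence failure (or the eos doc) does. Names are hypothetical, not the PR's actual code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class FlagResetSketch {
    final AtomicBoolean readyToStoreNewModel = new AtomicBoolean(true);

    // Returns true if the store completed before the timeout.
    boolean awaitStore(CountDownLatch latch, long timeoutSec) throws InterruptedException {
        readyToStoreNewModel.set(false);
        if (latch.await(timeoutSec, TimeUnit.SECONDS) == false) {
            // Timed out: persistence may still finish later, so do NOT
            // reset the flag here; it just took a long time.
            return false;
        }
        return true;
    }

    // Called by the failure handler (or on the eos doc): only here does
    // the flag go back to true so a new model can be stored.
    void onPersistenceFailureOrEos() {
        readyToStoreNewModel.set(true);
    }
}
```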
tveasey left a comment:
Looks good (the expected format all looks correct from the C++ side), just a few minor comments.
Resolved review threads:
- ...ClusterTest/java/org/elasticsearch/xpack/ml/integration/ChunkedTrainedMoodelPersisterIT.java (outdated)
- ...ClusterTest/java/org/elasticsearch/xpack/ml/integration/ChunkedTrainedMoodelPersisterIT.java (outdated)
- ...src/main/java/org/elasticsearch/xpack/ml/dataframe/process/ChunkedTrainedModelPersister.java (outdated)
- ...est/java/org/elasticsearch/xpack/ml/dataframe/process/ChunkedTrainedModelPersisterTests.java
- ...ClusterTest/java/org/elasticsearch/xpack/ml/integration/ChunkedTrainedMoodelPersisterIT.java (outdated)
- ...src/main/java/org/elasticsearch/xpack/ml/dataframe/process/ChunkedTrainedModelPersister.java (outdated)
- ...src/main/java/org/elasticsearch/xpack/ml/dataframe/process/ChunkedTrainedModelPersister.java (outdated)
- ...src/main/java/org/elasticsearch/xpack/ml/dataframe/process/ChunkedTrainedModelPersister.java (outdated)
- .../ml/src/main/java/org/elasticsearch/xpack/ml/inference/persistence/TrainedModelProvider.java (outdated)
…ics-handle-compressed-model-stream
tveasey left a comment:
Nice tidy up! However, I think the readyToStoreNewModel flag gets reset too soon.
Resolved review thread:
- ...src/main/java/org/elasticsearch/xpack/ml/dataframe/process/ChunkedTrainedModelPersister.java (outdated)
tveasey left a comment:
LGTM
davidkyle left a comment:
LGTM
    .setCompressedString(chunks.get(i))
    .setCompressionVersion(TrainedModelConfig.CURRENT_DEFINITION_COMPRESSION_VERSION)
    .setDefinitionLength(chunks.get(i).length())
    .setEos(i == chunks.size() - 1)
IntStream.range is end-exclusive, so we will never get to i == chunks.size() - 1.
Should it be set to false, since eos is set on the last doc in the list on line 223?
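The end-exclusive behavior of IntStream.range can be checked directly; a small sketch with hypothetical chunk values:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class RangeExclusivitySketch {
    public static void main(String[] args) {
        List<String> chunks = List.of("c0", "c1", "c2");

        // range(0, n) yields indices 0 .. n-1; the end bound itself is excluded.
        List<Integer> full = IntStream.range(0, chunks.size())
            .boxed().collect(Collectors.toList());

        // With an upper bound of chunks.size() - 1, the last index (2) is
        // never produced, so a predicate like i == chunks.size() - 1 can
        // never be true inside that loop.
        List<Integer> truncated = IntStream.range(0, chunks.size() - 1)
            .boxed().collect(Collectors.toList());

        System.out.println(full);      // indices 0, 1, 2
        System.out.println(truncated); // indices 0, 1
    }
}
```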
@elasticmachine update branch
…ics-handle-compressed-model-stream
hendrikmuhs left a comment:
LGTM, two more suggestions.
    CountDownLatch latch = storeTrainedModelDoc(trainedModelDefinitionDoc);
    try {
        if (latch.await(STORE_TIMEOUT_SEC, TimeUnit.SECONDS) == false) {
            LOGGER.error("[{}] Timed out (30s) waiting for chunked inference definition to be stored", analytics.getId());
Nit: now that STORE_TIMEOUT_SEC is a constant, the log message can use it as an argument (in other places, too).
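The nit amounts to interpolating the constant instead of hardcoding "30s". A minimal sketch using String.format rather than the project's actual logger (class and method names hypothetical):

```java
public class TimeoutMessageSketch {
    static final long STORE_TIMEOUT_SEC = 30;

    // Builds the message from the constant so the text stays in sync
    // if the timeout value is ever changed.
    static String timeoutMessage(String jobId) {
        return String.format(
            "[%s] Timed out (%ds) waiting for chunked inference definition to be stored",
            jobId, STORE_TIMEOUT_SEC);
    }

    public static void main(String[] args) {
        System.out.println(timeoutMessage("my-analytics-job"));
    }
}
```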
    private final String definition;
    private final int docNum;
    private final Boolean eos;
Do we need a 3rd state (null)?
In the code it looks like null and false are both treated as false.
It seems simpler to me to use boolean and handle null as part of parsing.
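The suggestion amounts to collapsing the three states at parse time; a hypothetical sketch:

```java
public class EosParseSketch {
    // Collapse the Boolean's three states (null/false/true) into a primitive
    // boolean at parse time: an absent eos field is treated as false, so the
    // rest of the code only ever sees two states.
    static boolean parseEos(Boolean rawEos) {
        return rawEos != null && rawEos;
    }

    public static void main(String[] args) {
        System.out.println(parseEos(null));
        System.out.println(parseEos(Boolean.FALSE));
        System.out.println(parseEos(Boolean.TRUE));
    }
}
```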
This moves model storage from handling the fully parsed JSON string to handling two separate types of documents:
1. ModelSizeInfo, which contains model size information.
2. TrainedModelDefinitionChunk, which contains a particular chunk of the compressed model definition string.
`model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in the correct order for concatenating the chunks to get a compressed definition.
Native side change: elastic/ml-cpp#1349
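The ordering assumption in the description above can be sketched as follows (class names, method names, and chunk values are hypothetical, not the PR's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkAssemblySketch {
    // The size-info document arrives first and establishes the model id;
    // definition chunks then arrive in order and are concatenated to
    // rebuild the full compressed definition string.
    private final List<String> chunks = new ArrayList<>();

    void onDefinitionChunk(String compressedChunk) {
        chunks.add(compressedChunk);
    }

    String assembleCompressedDefinition() {
        StringBuilder definition = new StringBuilder();
        for (String chunk : chunks) {
            definition.append(chunk);
        }
        return definition.toString();
    }

    public static void main(String[] args) {
        ChunkAssemblySketch persister = new ChunkAssemblySketch();
        persister.onDefinitionChunk("H4sIA");
        persister.onDefinitionChunk("AAAAA");
        persister.onDefinitionChunk("A==");
        System.out.println(persister.assembleCompressedDefinition());
    }
}
```

Because the chunks are appended in arrival order, any reordering upstream would corrupt the compressed definition, which is why the description stresses the ordering assumption.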
… (#58836) * [ML] handles compressed model stream from native process (#58009)