
Conversation

@benwtrent (Member) commented Jun 11, 2020

This moves model storage from handling the fully parsed JSON string to handling two separate types of documents.

  1. ModelSizeInfo which contains model size information
  2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string.

model_size_info is assumed to arrive first. Handling it generates the model_id and stores the initial trained model config object. Each subsequent chunk is assumed to arrive in order, so concatenating the chunks yields the compressed definition (see the sketch below).

Native side change: elastic/ml-cpp#1349
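
Purely for illustration, a minimal sketch of the reassembly this describes. The compression scheme (base64 of a gzip stream) and the class name are assumptions made for the sketch; the actual scheme and document classes live in the Elasticsearch codebase.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Base64;
import java.util.List;
import java.util.zip.GZIPInputStream;

public class ChunkReassemblySketch {
    // Concatenate the ordered chunks, then decode; order matters because each
    // TrainedModelDefinitionChunk is a slice of one compressed string.
    static String reassemble(List<String> orderedChunks) throws Exception {
        StringBuilder compressed = new StringBuilder();
        for (String chunk : orderedChunks) {
            compressed.append(chunk);
        }
        byte[] raw = Base64.getDecoder().decode(compressed.toString());
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(raw));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            gzip.transferTo(out);
            return out.toString("UTF-8");
        }
    }
}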

@benwtrent benwtrent force-pushed the feature/ml-analytics-handle-compressed-model-stream branch from c772433 to f3ccd19 Compare June 11, 2020 19:24
@elasticmachine (Collaborator)

Pinging @elastic/ml-core (:ml)

@benwtrent (Member, Author)

run elasticsearch-ci/2

@hendrikmuhs left a comment

added some comments

Consumer<Exception> failureHandler,
ExtractedFields extractedFields) {
    this.provider = provider;
    this.currentModelId = new AtomicReference<>("");

@hendrikmuhs

why not initialize it empty?

@benwtrent (Member, Author)

it is empty?

@hendrikmuhs

well, ... I mean new AtomicReference<>()
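
For clarity, a minimal illustration of the distinction being discussed here: new AtomicReference<>("") starts out holding an empty string, while the no-arg new AtomicReference<>() starts out holding null.

import java.util.concurrent.atomic.AtomicReference;

public class AtomicRefInitDemo {
    public static void main(String[] args) {
        AtomicReference<String> emptyString = new AtomicReference<>(""); // holds ""
        AtomicReference<String> noArg = new AtomicReference<>();         // holds null
        System.out.println(emptyString.get().isEmpty()); // true
        System.out.println(noArg.get() == null);         // true; callers must null-check
    }
}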

try {
    readyToStoreNewModel = false;
    if (latch.await(30, TimeUnit.SECONDS) == false) {
        LOGGER.error("[{}] Timed out (30s) waiting for inference model metadata to be stored", analytics.getId());

@hendrikmuhs

If this happens, it seems the persister can get stuck, because the doc never gets stored and readyToStoreNewModel is never reset? Correct me if I am wrong.

@benwtrent (Member, Author)

yeah, it should reset. Good catch

@benwtrent (Member, Author)

I do not think it should switch back in this timeout check. If it times out, it just took a long time to persist.

If the persistence itself fails, then I will reset the boolean flag.

Similar behavior for the persistence of the definition docs, the exception being: if the definition doc is the eos, then I will reset the flag.
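
A minimal sketch of the flow being described, with hypothetical names (storeDoc, readyToStoreNewModel): the flag is reset in the failure handler, not in the timeout check, because a timeout only means persistence is slow, not that it failed.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class PersisterFlagSketch {
    private volatile boolean readyToStoreNewModel = true;

    void persist(Object doc) throws InterruptedException {
        readyToStoreNewModel = false;
        CountDownLatch latch = new CountDownLatch(1);
        storeDoc(doc,
            r -> latch.countDown(),
            e -> {
                readyToStoreNewModel = true; // persistence failed: allow a new model
                latch.countDown();
            });
        if (latch.await(30, TimeUnit.SECONDS) == false) {
            // timed out: the store may still complete, so the flag is NOT reset here
            System.err.println("Timed out (30s) waiting for doc to be stored");
        }
    }

    // hypothetical async store; the real persister indexes into Elasticsearch
    void storeDoc(Object doc, Consumer<Object> onSuccess, Consumer<Exception> onFailure) {
        onSuccess.accept(doc);
    }
}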

@tveasey (Contributor) left a comment

Looks good (the expected format all looks correct from the C++ side), just a few minor comments.

@tveasey (Contributor) left a comment

Nice tidy up! However, I think the readyToStoreNewModel flag gets reset too soon.

@benwtrent benwtrent requested a review from tveasey June 30, 2020 14:59
@tveasey (Contributor) left a comment

LGTM

@davidkyle (Member) left a comment

LGTM

.setCompressedString(chunks.get(i))
.setCompressionVersion(TrainedModelConfig.CURRENT_DEFINITION_COMPRESSION_VERSION)
.setDefinitionLength(chunks.get(i).length())
.setEos(i == chunks.size() - 1)
@davidkyle (Member)

IntStream.range is end-exclusive, so we will never get to i == chunks.size() - 1.

Should it be set to false, as Eos is set on the last doc in the list on line 223?
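
A quick illustration of the end-exclusive point (the loop bound here is hypothetical; the actual bound is in the PR diff):

import java.util.stream.IntStream;

public class RangeDemo {
    public static void main(String[] args) {
        int size = 3; // e.g. chunks.size()
        // range(0, size - 1) yields 0, 1 — i never equals size - 1 (the last index)
        IntStream.range(0, size - 1).forEach(i -> System.out.println("range: " + i));
        // rangeClosed(0, size - 1) yields 0, 1, 2 — the end is included
        IntStream.rangeClosed(0, size - 1).forEach(i -> System.out.println("closed: " + i));
    }
}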

@benwtrent (Member, Author)

@elasticmachine update branch

@hendrikmuhs left a comment

LGTM, 2 more suggestions

CountDownLatch latch = storeTrainedModelDoc(trainedModelDefinitionDoc);
try {
    if (latch.await(STORE_TIMEOUT_SEC, TimeUnit.SECONDS) == false) {
        LOGGER.error("[{}] Timed out (30s) waiting for chunked inference definition to be stored", analytics.getId());

@hendrikmuhs

nit: now that STORE_TIMEOUT_SEC is a constant, the log message can take it as an argument (in other places, too)
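
The suggestion, sketched against the quoted line (log4j-style parameterized messages, as used above):

// before: the timeout value is hard-coded into the message text
LOGGER.error("[{}] Timed out (30s) waiting for chunked inference definition to be stored", analytics.getId());

// after: the constant is passed as a message argument
LOGGER.error("[{}] Timed out ({}s) waiting for chunked inference definition to be stored",
    analytics.getId(), STORE_TIMEOUT_SEC);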


private final String definition;
private final int docNum;
private final Boolean eos;

@hendrikmuhs

Do we need a 3rd state (null)? In the code it looks like null and false are both treated as false.

It seems simpler to me to use boolean and handle null as part of parsing.
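
A minimal sketch of the suggested shape, with hypothetical class and constructor names: store a primitive boolean and collapse null to false once, at parse/construction time.

public class DefinitionDocSketch {
    private final String definition;
    private final int docNum;
    private final boolean eos; // primitive: no third (null) state downstream

    DefinitionDocSketch(String definition, int docNum, Boolean parsedEos) {
        this.definition = definition;
        this.docNum = docNum;
        // null from the parser is treated as false here, exactly once
        this.eos = parsedEos != null && parsedEos;
    }
}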

@benwtrent benwtrent merged commit e881ea4 into elastic:master Jul 1, 2020
@benwtrent benwtrent deleted the feature/ml-analytics-handle-compressed-model-stream branch July 1, 2020 13:01
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jul 1, 2020
benwtrent added a commit that referenced this pull request Jul 1, 2020
… (#58836)

* [ML] handles compressed model stream from native process (#58009)
