
Table not initialized when serving model #237

Open
rcrowe-google opened this issue Apr 27, 2021 · 11 comments

rcrowe-google commented Apr 27, 2021

Posting for @awadalaa

We are blocked on experimenting with a new TensorFlow model in production because it fails at inference with this error:

tensorflow.python.framework.errors_impl.FailedPreconditionError: Table not initialized.

We have narrowed the issue down to a bit of our code that applies a BM25 transformation in a TensorFlow Transform job. As part of applying that transformation, it learns and applies a vocabulary; however, when we run inference with the model, it fails to initialize the table from that vocabulary file. Here is the BM25 code we are using and the line where it fails:
https://gist.github.com/awadalaa/e9290cf6674884d8e197fe315ed7d832#file-gistfile1-txt-L176-L177

More background:
We run a TensorFlow Transform Beam/Dataflow job that executes this transformation and saves the transform graph. Later, when we train our model, we save it with a signature that applies the TFT layer: transformed_features = model.tft_layer(parsed_features). We noticed that the exported model/assets directory does not include the intermediate vocabulary used by the above BM25 transformation, although it does include every other vocabulary file learned in the TFT job. Any ideas why the above transformation would fail to export the vocabulary assets for a saved model?
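
For reference, the serving signature follows the standard TFT pattern. A simplified sketch (not our exact export code; model, tft_transform_output, and raw_feature_spec are placeholder names):

import tensorflow as tf

def make_serving_signature(model, tft_transform_output, raw_feature_spec):
    # Attach the TFT layer so the signature can apply the transform graph.
    model.tft_layer = tft_transform_output.transform_features_layer()

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string, name="examples")])
    def serve_tf_examples_fn(serialized_examples):
        # Parse raw tf.Examples and apply the transformations learned in the TFT job.
        parsed_features = tf.io.parse_example(serialized_examples, raw_feature_spec)
        transformed_features = model.tft_layer(parsed_features)
        return model(transformed_features)

    return serve_tf_examples_fn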

Stack trace here:

Traceback (most recent call last):
  File "/Users/aawad/Desktop/keras_predict.py", line 174, in <module>
    print("prediction_output", predict(inference_data))
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1655, in __call__
    return self._call_impl(args, kwargs)
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1673, in _call_impl
    return self._call_with_flat_signature(args, kwargs, cancellation_manager)
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1722, in _call_with_flat_signature
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py", line 106, in _call_flat
    cancellation_manager)
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Table not initialized.
	 [[{{node StatefulPartitionedCall/StatefulPartitionedCall/transform_features_layer/StatefulPartitionedCall/transform/apply_haystack_vocabulary_query_ngram_substrings_tags_ngram_substrings/hash_table_Lookup/LookupTableFindV2}}]] [Op:__inference_signature_wrapper_23443]

Function call stack:
signature_wrapper

varshaan commented Apr 27, 2021

What is the version of TFT being used?

@varshaan varshaan self-assigned this Apr 27, 2021
@abhijeetrao1988

apache-beam[gcp]==2.28.0
tensorflow-transform==0.28.0
tensorflow==2.4.1

@varshaan

Re: "We noticed that the exported model/assets directory does not include the intermediate vocabulary used by the above BM25 transformation" --> is this the model exported post training or the output of TFT? If the former, could you clarify if the file exists in the transform output?

@awadalaa

Hi @varshaan! Thank you for looking into this. The TFT Dataflow job does export the assets; I see the vocab file under transform_fn/assets/needle_vocabulary.

However, these vocab files do not appear in the trained model's model/assets/ directory. Both the TFT and training jobs were successful; we only noticed the error when attempting to reload the model and run inference.
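
For anyone checking the same thing, listing the two asset directories (the paths below are placeholders) shows the difference:

import tensorflow as tf

# Vocabulary assets written by the TFT job -- the vocab files are present here:
print(tf.io.gfile.listdir("<tft_output_dir>/transform_fn/assets/"))
# Assets exported with the trained model -- the BM25/TF-IDF vocab files are missing here:
print(tf.io.gfile.listdir("<exported_model_dir>/assets/"))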

I also managed to reproduce the issue using this transformation:

# Assumes the usual imports: tensorflow as tf, tensorflow_transform as tft, typing.Dict.
def get_tfidf(self, feature_dict: Dict[str, tf.Tensor]) -> Dict[str, tf.Tensor]:
    outputs = dict()
    VOCAB_SIZE = 100000
    DELIMITERS = ".,!?() "
    for key, feature in feature_dict.items():
        # Tokenize, then learn and apply a vocabulary (this creates the vocab asset file).
        word_tokens = tf.compat.v1.string_split(feature, DELIMITERS)
        word_indices = tft.compute_and_apply_vocabulary(
            word_tokens, top_k=VOCAB_SIZE
        )
        # Compute TF-IDF weights and reduce them to a single score per example.
        bow_indices, tfidf_weight = tft.tfidf(word_indices, VOCAB_SIZE + 1)
        tfidf_score = tf.math.reduce_mean(tf.sparse.to_dense(tfidf_weight), axis=-1)
        # Replace NaNs (e.g. from empty inputs) with zeros.
        outputs[f"{key}_tfidf_score"] = tf.where(
            tf.math.is_nan(tfidf_score), tf.zeros_like(tfidf_score), tfidf_score
        )
    return outputs

In both cases (BM25 and TF-IDF), it seems to fail at prediction time on the apply_vocabulary step. For example, the above transformation failed with:

tensorflow.python.framework.errors_impl.FailedPreconditionError:  Table not initialized.
	 [[{{node StatefulPartitionedCall/StatefulPartitionedCall/transform_features_layer/StatefulPartitionedCall/transform/compute_and_apply_vocabulary_1/apply_vocab/hash_table_Lookup/LookupTableFindV2}}]] [Op:__inference_signature_wrapper_2787]

@varshaan

Since the table does exist in the Transform output, do you mind sharing the code snippet for how the trained model is being exported? In particular, is the tft_layer assigned to an attribute of the exported model [1]? I am assuming this is a Keras model from the stacktrace.

[1] https://github.com/tensorflow/transform/blob/master/examples/census_example_v2.py#L120

@awadalaa

Yep, it's a Keras model. The TFT layer is attached as an attribute of the Keras model:

model.tft_layer = self.tft_transform_output.transform_features_layer()

This is the bit of code where we export the model: https://gist.github.com/awadalaa/bcafb5da46ced7d9373f0d51ce389aa3#file-gistfile1-txt-L24

@awadalaa

Hi @varshaan, I put together a small example repository, based on the census example you linked, that consistently reproduces the issue: https://github.com/awadalaa/TFTReproduceIssue

You can clone the repo and run this to reproduce the problem:

pip install -r requirements.txt
python -m data.task
python -m trainer.task
python -m inference.task

varshaan commented May 4, 2021

Hi, that repro has two Keras models. The "full_model" [1] does not track the TFT layer. Adding full_model.tft_layer = self.tft_transform_output.transform_features_layer() after line 69 in [1] fixes the repro (sketched below). Normally, no asset files would have been exported with the trainer model. However, since you define categorical feature columns for all the vocabularies other than the ones used to compute the TF-IDF features, the feature columns ended up tracking those asset files in the full_model, and hence they got exported fine. The missing asset files are used to compute features defined as numeric columns, so this tracking through the feature columns did not exist for them.

[1] https://github.com/awadalaa/TFTReproduceIssue/blob/main/trainer/model.py#L69
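
In other words, the fix is just to make sure the model that actually gets saved also tracks the layer. A minimal sketch, using the names from the repro (serving_model_dir and signatures are placeholders):

# Attach the TFT layer to the model that is actually saved, so its vocabulary
# assets are tracked and exported with the SavedModel.
full_model.tft_layer = self.tft_transform_output.transform_features_layer()

# Build the serving signatures as before, then export; the missing asset files
# should now appear under the exported model's assets directory.
full_model.save(serving_model_dir, save_format="tf", signatures=signatures)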

@rcrowe-google

@awadalaa Does that fix the problem? If so then we should close this issue.

awadalaa commented May 6, 2021

Thank you @rcrowe-google and @varshaan! Attaching the tft_layer to the full_model does unblock us!

I'm not sure the issue should be closed, though. It was unexpected because the tft_layer was attached through the prediction signature, yet predictions failed even when made through that signature. I would have expected that failure mode if I had made predictions with model.predict or model.__call__ directly, but not when using the prediction signature. Any reason why the full_model needs to track the tft_layer here rather than relying on the prediction signature's tft_layer?

@varshaan

My understanding is that Keras expects all resources that need to be tracked to be tracked by the main object being saved (in this case, the full_model). I suspect it isn't common for the signatures to be defined on a model different from the one being saved. I will try to verify this and get back to you.
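
As a standalone illustration of that tracking behaviour, here is a toy sketch (not TFT-specific, and not code from this repro) of a table backed by a vocabulary file that is attached as an attribute of the object being saved, so its asset is tracked, exported, and re-initialized on load:

import os
import tempfile

import tensorflow as tf

# Toy vocabulary asset; stands in for a TFT vocabulary file.
vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_path, "w") as f:
    f.write("foo\nbar\nbaz\n")

class Lookup(tf.Module):
    def __init__(self, vocab_file):
        super().__init__()
        # The table is tracked as an attribute of the saved root, so the
        # SavedModel copies vocab.txt into its assets directory.
        self.table = tf.lookup.StaticHashTable(
            tf.lookup.TextFileInitializer(
                vocab_file,
                key_dtype=tf.string,
                key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
                value_dtype=tf.int64,
                value_index=tf.lookup.TextFileIndex.LINE_NUMBER),
            default_value=-1)

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def lookup(self, tokens):
        return self.table.lookup(tokens)

export_dir = os.path.join(tempfile.mkdtemp(), "saved_lookup")
tf.saved_model.save(Lookup(vocab_path), export_dir)
print(tf.io.gfile.listdir(os.path.join(export_dir, "assets")))  # contains vocab.txt

reloaded = tf.saved_model.load(export_dir)
print(reloaded.lookup(tf.constant(["foo", "baz", "unknown"])))  # [0, 2, -1]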
