
Table not initialized when serving model #237

Open
rcrowe-google opened this issue Apr 27, 2021 · 11 comments

rcrowe-google commented Apr 27, 2021

Posting for @awadalaa

We are blocked on experimenting with a new TensorFlow model in production because it fails at inference with this error:

tensorflow.python.framework.errors_impl.FailedPreconditionError: Table not initialized.

We have narrowed the issue down to a bit of our code that applies a BM25 transformation in a TensorFlow Transform job. As part of applying that transformation, it learns and applies a vocabulary; however, when we run inference with the model, it fails to initialize the table from that vocabulary file. Here is the BM25 code we are using and the line where it fails:
https://gist.github.com/awadalaa/e9290cf6674884d8e197fe315ed7d832#file-gistfile1-txt-L176-L177

More background:
We run a TensorFlow Transform Beam/Dataflow job that executes this transformation and saves the transform graph. Later, when we train our model, we save it with a signature that applies the TFT layer: transformed_features = model.tft_layer(parsed_features). We noticed that the exported model/assets directory does not include the intermediate vocabulary used by the above BM25 transformation, although it does include every other vocabulary file learned in the TFT job. Any ideas why the above transformation would fail to export the vocabulary assets for a saved model?
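
For reference, the serving signature follows the standard TFT pattern. A simplified sketch (not our exact export code; model, tft_transform_output, and raw_feature_spec are placeholder names):

import tensorflow as tf

def make_serving_signature(model, tft_transform_output, raw_feature_spec):
    # Attach the TFT layer so the signature can apply the transform graph.
    model.tft_layer = tft_transform_output.transform_features_layer()

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string, name="examples")])
    def serve_tf_examples_fn(serialized_examples):
        # Parse raw tf.Examples and apply the transformations learned in the TFT job.
        parsed_features = tf.io.parse_example(serialized_examples, raw_feature_spec)
        transformed_features = model.tft_layer(parsed_features)
        return model(transformed_features)

    return serve_tf_examples_fn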

Stack trace here:

Traceback (most recent call last):
  File "/Users/aawad/Desktop/keras_predict.py", line 174, in <module>
    print("prediction_output", predict(inference_data))
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1655, in __call__
    return self._call_impl(args, kwargs)
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1673, in _call_impl
    return self._call_with_flat_signature(args, kwargs, cancellation_manager)
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1722, in _call_with_flat_signature
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py", line 106, in _call_flat
    cancellation_manager)
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Table not initialized.
	 [[{{node StatefulPartitionedCall/StatefulPartitionedCall/transform_features_layer/StatefulPartitionedCall/transform/apply_haystack_vocabulary_query_ngram_substrings_tags_ngram_substrings/hash_table_Lookup/LookupTableFindV2}}]] [Op:__inference_signature_wrapper_23443]

Function call stack:
signature_wrapper

varshaan commented Apr 27, 2021

What is the version of TFT being used?

@varshaan varshaan self-assigned this Apr 27, 2021
@abhijeetrao1988

apache-beam[gcp]==2.28.0
tensorflow-transform==0.28.0
tensorflow==2.4.1

@varshaan

Re: "We noticed that the exported model/assets directory does not include the intermediate vocabulary used by the above BM25 transformation" --> is this the model exported post training or the output of TFT? If the former, could you clarify if the file exists in the transform output?

@awadalaa

Hi @varshaan! Thank you for looking into this. The TFT Dataflow job does export the assets; I see the vocab file under transform_fn/assets/needle_vocabulary.

However, these vocab files do not appear in the trained model's model/assets/ directory. Both the TFT and training jobs were successful; we only noticed the error when attempting to reload the model and run inference.
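
For anyone checking the same thing, listing the two asset directories (the paths below are placeholders) shows the difference:

import tensorflow as tf

# Vocabulary assets written by the TFT job -- the vocab files are present here:
print(tf.io.gfile.listdir("<tft_output_dir>/transform_fn/assets/"))
# Assets exported with the trained model -- the BM25/TF-IDF vocab files are missing here:
print(tf.io.gfile.listdir("<exported_model_dir>/assets/"))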

I also managed to reproduce the issue using this transformation:

# Assumes the usual imports: tensorflow as tf, tensorflow_transform as tft, typing.Dict.
def get_tfidf(self, feature_dict: Dict[str, tf.Tensor]) -> Dict[str, tf.Tensor]:
    outputs = dict()
    VOCAB_SIZE = 100000
    DELIMITERS = ".,!?() "
    for key, feature in feature_dict.items():
        # Tokenize, then learn and apply a vocabulary (this creates the vocab asset file).
        word_tokens = tf.compat.v1.string_split(feature, DELIMITERS)
        word_indices = tft.compute_and_apply_vocabulary(
            word_tokens, top_k=VOCAB_SIZE
        )
        # Compute TF-IDF weights and reduce them to a single score per example.
        bow_indices, tfidf_weight = tft.tfidf(word_indices, VOCAB_SIZE + 1)
        tfidf_score = tf.math.reduce_mean(tf.sparse.to_dense(tfidf_weight), axis=-1)
        # Replace NaNs (e.g. from empty inputs) with zeros.
        outputs[f"{key}_tfidf_score"] = tf.where(
            tf.math.is_nan(tfidf_score), tf.zeros_like(tfidf_score), tfidf_score
        )
    return outputs

In both cases (BM25 and TF-IDF), it seems to fail at prediction time on the apply_vocabulary step. For example, the above transformation failed with:

tensorflow.python.framework.errors_impl.FailedPreconditionError:  Table not initialized.
	 [[{{node StatefulPartitionedCall/StatefulPartitionedCall/transform_features_layer/StatefulPartitionedCall/transform/compute_and_apply_vocabulary_1/apply_vocab/hash_table_Lookup/LookupTableFindV2}}]] [Op:__inference_signature_wrapper_2787]

@varshaan

Since the table does exist in the Transform output, do you mind sharing the code snippet for how the trained model is being exported? In particular, is the tft_layer assigned to an attribute of the exported model [1]? I am assuming this is a Keras model from the stacktrace.

[1] https://github.com/tensorflow/transform/blob/master/examples/census_example_v2.py#L120

@awadalaa

Yep, it's a Keras model. The TFT layer is attached as an attribute of the Keras model:

model.tft_layer = self.tft_transform_output.transform_features_layer()

This is the bit of code where we export the model: https://gist.github.com/awadalaa/bcafb5da46ced7d9373f0d51ce389aa3#file-gistfile1-txt-L24

@awadalaa

Hi @varshaan, I put together a small example repository, based on the census example you linked, that consistently reproduces the issue: https://github.com/awadalaa/TFTReproduceIssue

You can clone the repo and run this to reproduce the problem:

pip install -r requirements.txt
python -m data.task
python -m trainer.task
python -m inference.task

varshaan commented May 4, 2021

Hi, that repro has two Keras models. The "full_model" [1] does not track the TFT layer. Adding full_model.tft_layer = self.tft_transform_output.transform_features_layer() after line 69 in [1] fixes the repro (sketched below). Normally, no asset files would have been exported with the trainer model. However, since you define categorical feature columns for all the vocabularies other than the ones used to compute the TF-IDF features, the feature columns ended up tracking those asset files in the full_model, and hence they got exported fine. The missing asset files are used to compute features defined as numeric columns, so this tracking through the feature columns did not exist for them.

[1] https://github.com/awadalaa/TFTReproduceIssue/blob/main/trainer/model.py#L69
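
In other words, the fix is just to make sure the model that actually gets saved also tracks the layer. A minimal sketch, using the names from the repro (serving_model_dir and signatures are placeholders):

# Attach the TFT layer to the model that is actually saved, so its vocabulary
# assets are tracked and exported with the SavedModel.
full_model.tft_layer = self.tft_transform_output.transform_features_layer()

# Build the serving signatures as before, then export; the missing asset files
# should now appear under the exported model's assets directory.
full_model.save(serving_model_dir, save_format="tf", signatures=signatures)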

@rcrowe-google

@awadalaa Does that fix the problem? If so then we should close this issue.

awadalaa commented May 6, 2021

Thank you @rcrowe-google and @varshaan! Attaching the tft_layer to the full_model does unblock us!

I'm not sure the issue should be closed, though. It was unexpected because the tft_layer was attached through the prediction signature, yet predictions failed even when made through that signature. I would have expected that failure mode if I had made predictions with model.predict or model.__call__ directly, but not when using the prediction signature. Any reason why the full_model needs to track the tft_layer here rather than relying on the prediction signature's tft_layer?

@varshaan

My understanding is that Keras expects all resources that need to be tracked to be tracked by the main object being saved (in this case, the full_model). I suspect it isn't common for the signatures to be defined on a model different from the one being saved. I will try to verify this and get back to you.
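
As a standalone illustration of that tracking behaviour, here is a toy sketch (not TFT-specific, and not code from this repro) of a table backed by a vocabulary file that is attached as an attribute of the object being saved, so its asset is tracked, exported, and re-initialized on load:

import os
import tempfile

import tensorflow as tf

# Toy vocabulary asset; stands in for a TFT vocabulary file.
vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_path, "w") as f:
    f.write("foo\nbar\nbaz\n")

class Lookup(tf.Module):
    def __init__(self, vocab_file):
        super().__init__()
        # The table is tracked as an attribute of the saved root, so the
        # SavedModel copies vocab.txt into its assets directory.
        self.table = tf.lookup.StaticHashTable(
            tf.lookup.TextFileInitializer(
                vocab_file,
                key_dtype=tf.string,
                key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
                value_dtype=tf.int64,
                value_index=tf.lookup.TextFileIndex.LINE_NUMBER),
            default_value=-1)

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def lookup(self, tokens):
        return self.table.lookup(tokens)

export_dir = os.path.join(tempfile.mkdtemp(), "saved_lookup")
tf.saved_model.save(Lookup(vocab_path), export_dir)
print(tf.io.gfile.listdir(os.path.join(export_dir, "assets")))  # contains vocab.txt

reloaded = tf.saved_model.load(export_dir)
print(reloaded.lookup(tf.constant(["foo", "baz", "unknown"])))  # [0, 2, -1]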
