`tft.compute_and_apply_vocabulary` is not robust to `RaggedTensor` type #265
Comments
Could you please attach the full traceback (the one that shows the tensorflow_transform stack)? The transformed data that you've pasted above is the flat representation that's useful when encoding it again.
Sorry for the delay; the full traceback is appended. The example above was mostly for my own purposes, to try to understand the problem. If TFT / TFX can handle that like any normal tensor, that's fine; I was just concerned that if the transformed output had keys which the input training graph didn't recognize, it could lead to trouble.
TensorFlow version 2.8.0, TFT version 1.7.0.

I am currently constructing a module which has some multivalent inputs, as well as a multi-hot label endpoint. Both need similar transform and feature engineering: convert a string into tokens, then map the tokens into a sequence of integers which are fed to an embedding table. However, the number of tokens in a given example string is not constant, and `tft.compute_and_apply_vocabulary` seems unable to parse the output of `tf.strings.split`. In the context of the full model, this lands me at (snipped for brevity):
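For context, the transform described above (split each string into tokens, then map tokens to integer ids, with a single out-of-vocabulary bucket) can be sketched in plain Python. This is only an illustration of the intent, not TFT's actual implementation; the helper names `build_vocab` and `apply_vocab` are hypothetical:

```python
# Pure-Python sketch of the intended transform: split each example string
# into tokens, then map tokens to integer ids via a learned vocabulary.
# In the real pipeline the same two steps are tf.strings.split followed by
# tft.compute_and_apply_vocabulary.

from collections import Counter

def build_vocab(examples):
    """Assign an integer id to each token in the corpus, most frequent first."""
    counts = Counter(tok for ex in examples for tok in ex.split())
    return {tok: i for i, (tok, _) in enumerate(counts.most_common())}

def apply_vocab(example, vocab):
    """Map one example string to a variable-length list of token ids."""
    oov_id = len(vocab)  # single overflow bucket, like num_oov_buckets=1
    return [vocab.get(tok, oov_id) for tok in example.split()]

examples = ["red green", "green blue red", "yellow"]
vocab = build_vocab(examples)
ids = [apply_vocab(ex, vocab) for ex in examples]
# Rows have different lengths -- exactly the "ragged" shape that
# tf.strings.split produces and that the vocabulary op chokes on.
```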
Commenting out the `tf.strings.split` call (thus leaving the inputs as whole strings) allows the pipeline execution to continue. Despite my efforts, I cannot reproduce this exactly with a smaller example (this is work to scale up the pipeline through TFX, and providing that is out of scope for this issue). I am able to produce a `RaggedTensor` with the output of a similar function in a working example with faked inputs. However, I have a hard time believing that the tensor produced by that example would be usable by the Embedding layer it is putatively going to be connected to. It would be undesirable, though perhaps acceptable, to generate a normal tensor from the ragged one, if there is a way to do so. The change:
However, that runs into the same problem, since it appears earlier, inside `compute_and_apply_vocabulary`.

Any advice is appreciated.
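One workaround to consider, sketched here in plain Python as an analogue of `tf.RaggedTensor.to_tensor(default_value=...)`, is to densify the ragged rows only after the vocabulary has been applied (densifying beforehand hits the same error inside `compute_and_apply_vocabulary`, as noted above). The `pad_id` value here is a hypothetical choice and would need to be masked out (e.g. via a dedicated padding index) before reaching the Embedding layer:

```python
# Sketch of densifying per-row token ids: right-pad each variable-length
# row with pad_id so every row matches the longest row's length. This is
# the plain-Python analogue of tf.RaggedTensor.to_tensor(default_value=pad_id).

def ragged_to_dense(rows, pad_id=0):
    """Right-pad each row with pad_id to the length of the longest row."""
    width = max((len(r) for r in rows), default=0)
    return [r + [pad_id] * (width - len(r)) for r in rows]

rows = [[3, 1], [1, 2, 3], [4]]
dense = ragged_to_dense(rows, pad_id=0)
# dense == [[3, 1, 0], [1, 2, 3], [4, 0, 0]]
```

Whatever padding value is chosen must not collide with a real vocabulary id, which is why reserving index 0 (or using an explicit mask) is the usual design choice.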