TFX.components.transform id #6278

raminmohammadi · 2023-09-13T16:22:30Z

If the bug is related to a specific library below, please raise an issue in the
respective repo directly:

TensorFlow Data Validation Repo

TensorFlow Model Analysis Repo

TensorFlow Transform Repo

TensorFlow Serving Repo

System information

Have I specified the code to reproduce the issue (Yes, No): yes
Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows),
Interactive Notebook, Google Cloud, etc): Linux, Notebook, Colab
TensorFlow version: 2.13.0
TFX Version: 1.14.0
Python version: 3.8
Python dependencies (from pip freeze output):
requirements.txt

Describe the current behavior:

This problem only happens when I use the transform as part of TFX. I'm encountering an issue while working with the "transform" function, which involves processing individual input data items. Each of these data inputs consists of two keys: 'entities' and 'text'.

My specific task is to perform a transformation on the "text" dimension of the input tensor, breaking it down into individual characters. For example, given the input "This is a test," I intend to follow these steps:

Split the text into words, then split each word into an array of characters: [['T', 'h', 'i', 's'], ['i', 's'], ['a'], ['t', 'e', 's', 't']]

Code 1: tf.strings.unicode_split(tf.strings.split('This is a test'), input_encoding='UTF-8')

Look up each character in a dictionary to obtain its index, and pad each word to a width of 12 characters.

Code 2: tf.map_fn(get_index, text, fn_output_signature=tf.TensorSpec(shape=(1, Wlength), dtype=tf.int64, name=None))
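For comparison, the two steps above can be sketched without tf.map_fn, using ragged tensors and a lookup table. This is only a rough sketch under stated assumptions: the vocabulary below (0 for padding/unknown, letters starting at 13) is hypothetical and is not the dictionary from the notebook, and the names `encode`, `VOCAB` and `WORD_LENGTH` are made up for illustration. It also encodes a single sentence in eager mode; inside a Transform preprocessing_fn the inputs arrive batched, so the extra batch dimension would still need to be handled.

```python
import string

import tensorflow as tf

WORD_LENGTH = 12  # pad width taken from the issue text

# Hypothetical vocabulary (an assumption, not the notebook's dictionary):
# index 0 is reserved for padding/unknown characters, letters start at 13.
VOCAB = list(string.ascii_lowercase + string.ascii_uppercase)
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(VOCAB),
        values=tf.range(13, 13 + len(VOCAB), dtype=tf.int64),
    ),
    default_value=0,
)


def encode(sentence):
    """Encode one sentence as a (num_words, 1, WORD_LENGTH) int64 tensor."""
    # Step 1: words, then characters (a RaggedTensor of shape (num_words, None)).
    words = tf.strings.split(sentence)
    chars = tf.strings.unicode_split(words, input_encoding="UTF-8")
    # Step 2: look up every character, then pad each word to WORD_LENGTH.
    ids = tf.ragged.map_flat_values(table.lookup, chars)
    padded = ids.to_tensor(default_value=0, shape=[None, WORD_LENGTH])
    # Add the middle axis so the layout matches shape=(1, Wlength) per word.
    return tf.expand_dims(padded, axis=1)


encoded = encode(tf.constant("This is a test"))
print(encoded.shape)
```

Because the lookup is applied to the flat character values in one call, no per-word Python function or fn_output_signature is needed in this variant.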

Currently, Transform returns only a single vector that starts with 1 and is 0 everywhere else:
example = [[1, 0, 0, 0, 0, 0, 0, 0, 0]]

Describe the expected behavior

The expected output should be:

<tf.Tensor: shape=(4, 1, 12), dtype=int64, numpy=
array([[[58, 20, 21, 31,  0,  0,  0,  0,  0,  0,  0,  0]],

       [[21, 31,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0]],

       [[13,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0]],

       [[32, 17, 31, 32,  0,  0,  0,  0,  0,  0,  0,  0]]])>
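As a cross-check that does not depend on TensorFlow, the expected values can be reproduced with a small plain-Python reference. The character-to-index mapping here is an assumption: it is a hypothetical layout (0 for padding, lowercase letters from 13, uppercase letters from 39) picked only because it yields the indices shown above, and the notebook's real dictionary may differ. `encode_reference` is likewise an illustrative name.

```python
import string

WORD_LENGTH = 12  # pad width taken from the issue text

# Hypothetical mapping (an assumption): lowercase letters get 13..38 and
# uppercase letters get 39..64; anything else falls back to 0 (padding).
CHAR_TO_INDEX = {
    ch: idx
    for idx, ch in enumerate(string.ascii_lowercase + string.ascii_uppercase, start=13)
}


def encode_reference(sentence, width=WORD_LENGTH):
    """Return a nested list laid out as (num_words, 1, width)."""
    rows = []
    for word in sentence.split():
        # Map characters to indices and truncate overly long words.
        ids = [CHAR_TO_INDEX.get(ch, 0) for ch in word][:width]
        # Right-pad with zeros up to the fixed width.
        ids += [0] * (width - len(ids))
        # Wrap in an extra list to mirror the middle axis of size 1.
        rows.append([ids])
    return rows


expected = encode_reference("This is a test")
for row in expected:
    print(row)
```

Comparing a nested list like this against the Transform output for the same sentence is one way to see whether the difference is in the values, in the shape, or in both.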

Standalone code to reproduce the issue

https://colab.research.google.com/drive/1ap8Gycu7s--mz0VAxp4W2DphAd1HW1yi?usp=sharing


singhniraj08 · 2023-09-19T08:58:10Z

@raminmohammadi,

I am unable to run the shared notebook. My environment crashes while using tf.data.experimental.TFRecordWriter to write the TFRecord file. Looking at the Transform component, it should produce similar results inside or outside a TFX pipeline.

Can you please make sure the example notebook works so that we can replicate the issue on our end? Thank you!

raminmohammadi · 2023-09-26T20:26:18Z

Not sure how to run this! I am able to run the Jupyter notebook on a local machine, but on Colab it fails at the moment. I would appreciate any feedback on this, or it would help if you could run it locally.

singhniraj08 · 2023-09-29T09:29:29Z

@raminmohammadi, I tried but was unable to create a local setup to test your notebook because of some permission issues.

@zoyahav, can you please give some feedback on why the transform output in a TFX pipeline is different from the expected output when running the transformation outside a TFX pipeline? Thanks.

raminmohammadi · 2023-10-10T15:58:34Z

Any updates on this issue? Thanks.

raminmohammadi added the type:bug label Sep 13, 2023

singhniraj08 self-assigned this Sep 14, 2023

singhniraj08 added the stat:awaiting response label Sep 19, 2023

singhniraj08 mentioned this issue Sep 19, 2023

tfx.components.Transform returns invalid results tensorflow/transform#308

Closed

google-ml-butler bot removed the stat:awaiting response label Sep 26, 2023

singhniraj08 assigned zoyahav Sep 29, 2023

singhniraj08 added the stat:awaiting tensorflower label Sep 29, 2023
