Add tfio style test for tensorflow_io/BigTable #171

Open
henrytansetiawan opened this issue Apr 4, 2019 · 17 comments
Labels
enhancement Enhancement request

Comments

@henrytansetiawan
Contributor

The original contrib.BigTable has its own internal test, but it would be fantastic if we added a tensorflow_io-style test using the BigTable emulator.

@yongtang yongtang added the enhancement Enhancement request label Apr 4, 2019
@yongtang
Member

yongtang commented Apr 4, 2019

@henrytansetiawan 👍 The PubSub emulator is similar; I could help with the BigTable emulator if needed.

@ghost

ghost commented Dec 6, 2019

Hello @yongtang, can I work on adding tests for Google Cloud Bigtable using the Bigtable emulator, if that is not already being taken up?
Also, I see that the PubSub-related tests are skipped for now; is there a particular reason for that?

@yongtang
Member

yongtang commented Dec 6, 2019

@ricky3dec That would be great! 👍 Let me know if you run into any issues.

For PubSub, at one point I ran into an issue with the credential file on macOS, so those tests are disabled for the moment. We definitely plan to re-enable them very soon; help would be appreciated as well.

@ghost

ghost commented Dec 8, 2019

Sure @yongtang, please assign the issue to me.

Could you please point me to the "original contrib.BigTable internal test"? I would like to know what kind of tests were written earlier. I tried to find it, but it looks like the structure has changed over the last few releases.

Thanks!

@yongtang yongtang assigned ghost Dec 8, 2019
@yongtang
Member

yongtang commented Dec 8, 2019

Thanks @ricky3dec!

It seems that Google Bigtable has an emulator (similar to PubSub):
https://cloud.google.com/bigtable/docs/emulator

I think that might make our lives a little easier.

I am not sure about the original tensorflow.contrib.bigtable internal test, though from my past experience quite a few of tensorflow.contrib's Google Cloud implementations use a C++ mock.

I am not sure we need to implement a C++ mock, as it may not exactly match the Google Cloud behavior, and it is (unnecessarily) harder to implement.

I would suggest looking into the Bigtable emulator and adding Python tests against it. That might be a better way to test Bigtable.
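
(For a rough idea of the shape such a test could take: below is a minimal sketch, assuming the google-cloud-bigtable Python client and an emulator already listening on localhost:8086, e.g. started with `gcloud beta emulators bigtable start`. The project, instance, and table names are placeholders, not part of any existing tfio test.)

    import os

    from google.cloud import bigtable
    from google.cloud.bigtable import column_family

    # Point the client at the local emulator instead of the real service.
    os.environ["BIGTABLE_EMULATOR_HOST"] = "localhost:8086"

    # Project/instance names are arbitrary when talking to the emulator.
    client = bigtable.Client(project="test-project", admin=True)
    instance = client.instance("test-instance")
    table = instance.table("test-table")
    table.create(column_families={"cf1": column_family.MaxVersionsGCRule(1)})

    # Write one row and read it back; a tfio-style test would assert on the
    # values coming out of the dataset instead.
    row = table.direct_row(b"row-001")
    row.set_cell("cf1", b"col", b"value")
    row.commit()
    assert table.read_row(b"row-001") is not None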

@ghost

ghost commented Dec 8, 2019

Yes @yongtang, I had gone through the emulator part, as you originally suggested above.

I just wanted to know if there are already some baseline tests that we should have at a minimum. Given your comments, I will try to write tests that cover basic usage of the operations that interface with Bigtable, along with some exception scenarios.

I will send out the initial review soon. Thanks again!

@yongtang
Member

yongtang commented Dec 8, 2019

Thanks @ricky3dec! We have ongoing discussions about standardizing tests in #65

So far, io/tests/test_io_dataset_eager.py is the initial effort that tries to standardize the dataset tests.

You can also suggest any input to make the tests more consistent.

There was some confusion about operations on tf.data.Dataset, but it is now mostly sorted out:

  1. To make tf.data.Dataset work with tf.keras inference, a dataset should support iter at a minimum:

         dataset = ...
         entries = [e for e in dataset]  # <= this line should work

  2. To make tf.data.Dataset work with tf.keras training, a dataset should support multiple runs of iter with consistent results:

         dataset = ...
         entries_1 = [e for e in dataset]
         entries_2 = [e for e in dataset]  # <= if this line fails, the dataset cannot be fed into model.fit
         assert entries_1 == entries_2     # <= if this assertion fails, the dataset will not work with model.fit

  3. Ideally a tf.data.Dataset could be used inside another tf.data.Dataset, which means everything runs in graph mode. This can be helpful for complex operations. The following is a simple test that exercises this:

         @tf.function  # <= tf.function is optional, as tf.data.Dataset.map will use autograph anyway
         def f(filename):
             return tfio.IODataset.from_...

         dataset = tf.data.Dataset.from_tensor_slices(['filename'])
         dataset = dataset.map(f)

I would suggest starting with one of these (maybe test 1, the inference-compatible test); then we can expand with more tests to support more features. You can submit multiple PRs to make the process easier.
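
(As a concrete illustration of tests 1 and 2 above, a minimal eager-mode test could look like the sketch below; tf.data.Dataset.from_tensor_slices stands in for the eventual Bigtable-backed IODataset, and the test name is illustrative only.)

    import tensorflow as tf


    def test_dataset_basic_and_repeatable():
        # Stand-in source; a Bigtable-backed IODataset would replace this line.
        dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])

        # Test 1: iteration works at all (inference compatibility).
        entries_1 = [e.numpy() for e in dataset]
        assert entries_1 == [1, 2, 3]

        # Test 2: a second pass yields the same entries (training compatibility,
        # i.e. the dataset can be fed to model.fit more than once).
        entries_2 = [e.numpy() for e in dataset]
        assert entries_1 == entries_2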

@yongtang
Member

yongtang commented Dec 8, 2019

@ricky3dec Sorry, the discussion about standardizing tests is in #651 (not #65).

@ghost

ghost commented Dec 8, 2019

Sure @yongtang, I will go through the guidelines above, and yes, it makes sense to have reviews in mini-batch mode :-)

@yongtang
Member

yongtang commented Dec 8, 2019

@ricky3dec I added PR #667 to split the tests into groups by supported feature. It should now be easier to add additional tests (for different feature compatibilities). You can take a look and see if it makes sense. You can also add parametrized tests on top of it one by one (for Bigtable, etc.).
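
(The parametrization could look roughly like the following sketch; the fixture and ids here are illustrative, not the actual ones from #667. A Bigtable case would be added as one more pytest.param once the emulator-backed dataset exists.)

    import pytest
    import tensorflow as tf


    def fixture_tensor_slices():
        # Placeholder source; an emulator-backed Bigtable fixture would build
        # its dataset here and return the expected entries alongside it.
        dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
        return dataset, [1, 2, 3]


    @pytest.mark.parametrize(
        "fixture", [pytest.param(fixture_tensor_slices, id="tensor_slices")]
    )
    def test_io_dataset_basic(fixture):
        dataset, expected = fixture()
        assert [e.numpy() for e in dataset] == expected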

@ghost

ghost commented Dec 8, 2019

Sure @yongtang, thanks for following up on this and guiding me. 🔖

@yongtang
Member

@ricky3dec The PubSub tests have been re-enabled.

@ghost

ghost commented Dec 13, 2019

Thanks @yongtang. I hope to send the initial review soon; I didn't get a chance to work on this until recently.

@ghost

ghost commented Mar 29, 2020

Hello @yongtang, I finally got a chance to work on this now. :-)

I need some clarification. I was adding a simple test using the Bigtable emulator. I have already created the script for starting a Docker instance of the Bigtable emulator and was adding a test in "test_io_dataset_eager.py", as you suggested earlier.

However, I see that most of the tests in this file are for operations under tensorflow_io/core/python/experimental, and all of them have their stream methods defined, e.g. tfio.experimental.IODataset.stream().from_video_capture.

So I just wanted to know whether I need to create a similar invocation, from_bigtable, and if so, where the right place is:

tensorflow_io/core/python/ops/io_dataset.py OR
tensorflow_io/core/python/ops/io_dataset_ops.py OR
tensorflow_io/core/python/experimental/io_dataset_ops.py

Please let me know if I am missing anything. Thanks!
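
(For context, a pytest fixture that starts the emulator for the duration of a test module might look roughly like the sketch below; the Docker image, port, and fixture name are assumptions, not the actual script mentioned above.)

    import subprocess

    import pytest


    @pytest.fixture(scope="module")
    def bigtable_emulator():
        # Start the Cloud Bigtable emulator in a throwaway container.
        container = subprocess.check_output([
            "docker", "run", "-d", "-p", "8086:8086", "google/cloud-sdk",
            "gcloud", "beta", "emulators", "bigtable", "start",
            "--host-port=0.0.0.0:8086",
        ]).decode().strip()
        try:
            # Tests should export BIGTABLE_EMULATOR_HOST with this value.
            yield "localhost:8086"
        finally:
            subprocess.check_call(["docker", "rm", "-f", container])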

@yongtang
Member

@ricky3dec I think tensorflow_io/core/python/experimental/io_dataset_ops.py would be the place to start. Once the API stabilizes, we can move it to the non-experimental API.

On a separate note (not immediately related to Bigtable now, but soon to be), I am reconsidering the grouping of APIs. At the beginning of the project, we started with one op per directory/module, so we ended up with many modules: tensorflow_io.kafka, tensorflow_io.kinesis, tensorflow_io.azfs, tensorflow_io.lmdb, ..., which quickly grew into an unmanageable situation. Since then, we have tried to consolidate the API by grouping datasets under IODataset.from_..., which more or less keeps the modules clean. But now IODataset.from_... is getting more and more crowded as well. At some point, we need to find a balance between modules and APIs. I am thinking of grouping related ops into modules. For example, Arrow, Avro, Parquet, Feather, etc. could all be grouped under tfio.columnar, as they are essentially all columnar. BigTable, BigQuery, and other Google Cloud related ops could be grouped into tfio.google. /cc @terrytangyuan @BryanCutler do you think this makes sense?

@ricky3dec For BigTable, I think staying with tensorflow_io/core/python/experimental/io_dataset_ops.py would be good for now; we can adjust the namespace as we move forward.

@ghost

ghost commented Mar 31, 2020

Sure, thanks @yongtang!

@ghost

ghost commented Apr 8, 2020

Hello @yongtang, I raised an initial PR for review only. Could you please help with some clarifications I asked for in the PR? Thanks!
