Add tfio style test for tensorflow_io/BigTable #171

Open
henrytansetiawan opened this issue Apr 4, 2019 · 17 comments
Labels
enhancement Enhancement request

Comments

@henrytansetiawan
Contributor

The original contrib.BigTable has its own internal test, but it would be fantastic if we added a tensorflow_io-style test using the BigTable emulator.

@yongtang yongtang added the enhancement Enhancement request label Apr 4, 2019
@yongtang
Member

yongtang commented Apr 4, 2019

@henrytansetiawan 👍 The PubSub emulator is similar; I could help with the BigTable emulator if needed.

@ghost

ghost commented Dec 6, 2019

Hello @yongtang, can I work on adding tests for Google Cloud Bigtable using the Bigtable emulator, if that is not already being taken up?
Also, I see that the PubSub-related tests are skipped for now; is there a particular reason for that?

@yongtang
Member

yongtang commented Dec 6, 2019

@ricky3dec That would be great! 👍 Let me know if you run into any issues.

For PubSub, at one point I ran into an issue with the credential file on macOS, so those tests are disabled for the moment. We definitely plan to re-enable them very soon; help would be appreciated as well.

@ghost

ghost commented Dec 8, 2019

Sure @yongtang, please assign the issue to me.

Could you please point me to the "original contrib.BigTable internal test"? I would like to know what kind of tests were written earlier. I tried to find it, but it looks like the structure has changed over the last few releases.

Thanks!

@yongtang yongtang assigned ghost Dec 8, 2019
@yongtang
Member

yongtang commented Dec 8, 2019

Thanks @ricky3dec!

It seems that Google Bigtable has an emulator (similar to PubSub):
https://cloud.google.com/bigtable/docs/emulator

I think that might make our lives a little easier.

I am not sure about the original tensorflow.contrib.bigtable internal test, though from my past experience quite a few of tensorflow.contrib's Google Cloud implementations use a C++ mock.

I am not sure we need to implement a C++ mock, as it may not exactly match the Google Cloud behavior, and it is (unnecessarily) harder to implement.

I would suggest looking into the Bigtable emulator and adding Python tests against it. That might be a better way to test Bigtable.
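
(For a rough idea of the shape such a test could take: below is a minimal sketch, assuming the google-cloud-bigtable Python client and an emulator already listening on localhost:8086, e.g. started with `gcloud beta emulators bigtable start`. The project, instance, and table names are placeholders, not part of any existing tfio test.)

    import os

    from google.cloud import bigtable
    from google.cloud.bigtable import column_family

    # Point the client at the local emulator instead of the real service.
    os.environ["BIGTABLE_EMULATOR_HOST"] = "localhost:8086"

    # Project/instance names are arbitrary when talking to the emulator.
    client = bigtable.Client(project="test-project", admin=True)
    instance = client.instance("test-instance")
    table = instance.table("test-table")
    table.create(column_families={"cf1": column_family.MaxVersionsGCRule(1)})

    # Write one row and read it back; a tfio-style test would assert on the
    # values coming out of the dataset instead.
    row = table.direct_row(b"row-001")
    row.set_cell("cf1", b"col", b"value")
    row.commit()
    assert table.read_row(b"row-001") is not None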

@ghost

ghost commented Dec 8, 2019

Yes @yongtang, I had gone through the emulator part, as you originally suggested above.

I just wanted to know if there are already some baseline tests that we should have at a minimum. Given your comments, I will try to write tests that cover basic usage of the operations that interface with Bigtable, along with some exception scenarios.

I will send out the initial review soon. Thanks again!

@yongtang
Member

yongtang commented Dec 8, 2019

Thanks @ricky3dec! We have ongoing discussions about standardizing tests in #65

So far, io/tests/test_io_dataset_eager.py is the initial effort that tries to standardize the dataset tests.

You can also suggest any input to make the tests more consistent.

There was some confusion about operations on tf.data.Dataset, but it is now mostly sorted out:

  1. To make tf.data.Dataset work with tf.keras inference, a dataset should support iter at a minimum:

         dataset = ...
         entries = [e for e in dataset]  # <= this line should work

  2. To make tf.data.Dataset work with tf.keras training, a dataset should support multiple runs of iter with consistent results:

         dataset = ...
         entries_1 = [e for e in dataset]
         entries_2 = [e for e in dataset]  # <= if this line fails, the dataset cannot be fed into model.fit
         assert entries_1 == entries_2     # <= if this assertion fails, the dataset will not work with model.fit

  3. Ideally a tf.data.Dataset could be used inside another tf.data.Dataset, which means everything runs in graph mode. This can be helpful for complex operations. The following is a simple test that exercises this:

         @tf.function  # <= tf.function is optional, as tf.data.Dataset.map will use autograph anyway
         def f(filename):
             return tfio.IODataset.from_...

         dataset = tf.data.Dataset.from_tensor_slices(['filename'])
         dataset = dataset.map(f)

I would suggest starting with one of these (maybe test 1, the inference-compatible test); then we can expand with more tests to support more features. You can submit multiple PRs to make the process easier.
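
(As a concrete illustration of tests 1 and 2 above, a minimal eager-mode test could look like the sketch below; tf.data.Dataset.from_tensor_slices stands in for the eventual Bigtable-backed IODataset, and the test name is illustrative only.)

    import tensorflow as tf


    def test_dataset_basic_and_repeatable():
        # Stand-in source; a Bigtable-backed IODataset would replace this line.
        dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])

        # Test 1: iteration works at all (inference compatibility).
        entries_1 = [e.numpy() for e in dataset]
        assert entries_1 == [1, 2, 3]

        # Test 2: a second pass yields the same entries (training compatibility,
        # i.e. the dataset can be fed to model.fit more than once).
        entries_2 = [e.numpy() for e in dataset]
        assert entries_1 == entries_2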

@yongtang
Member

yongtang commented Dec 8, 2019

@ricky3dec Sorry, the discussion about standardizing tests is in #651 (not #65).

@ghost

ghost commented Dec 8, 2019

Sure @yongtang, I will go through the guidelines above, and yes, it makes sense to have reviews in mini-batch mode :-)

@yongtang
Member

yongtang commented Dec 8, 2019

@ricky3dec I added PR #667 to split the tests into groups by supported feature. It should now be easier to add additional tests (for different feature compatibilities). You can take a look and see if it makes sense. You can also add parametrized tests on top of it one by one (for Bigtable, etc.).
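
(The parametrization could look roughly like the following sketch; the fixture and ids here are illustrative, not the actual ones from #667. A Bigtable case would be added as one more pytest.param once the emulator-backed dataset exists.)

    import pytest
    import tensorflow as tf


    def fixture_tensor_slices():
        # Placeholder source; an emulator-backed Bigtable fixture would build
        # its dataset here and return the expected entries alongside it.
        dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
        return dataset, [1, 2, 3]


    @pytest.mark.parametrize(
        "fixture", [pytest.param(fixture_tensor_slices, id="tensor_slices")]
    )
    def test_io_dataset_basic(fixture):
        dataset, expected = fixture()
        assert [e.numpy() for e in dataset] == expected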

@ghost

ghost commented Dec 8, 2019

Sure @yongtang, thanks for following up on this and guiding me. 🔖

@yongtang
Member

@ricky3dec The PubSub tests have been re-enabled.

@ghost

ghost commented Dec 13, 2019

Thanks @yongtang. I hope to send the initial review soon; I didn't get a chance to work on this until recently.

@ghost

ghost commented Mar 29, 2020

Hello @yongtang, I finally got a chance to work on this now. :-)

I need some clarification. I was adding a simple test using the Bigtable emulator. I have already created the script for starting a Docker instance of the Bigtable emulator and was adding a test in "test_io_dataset_eager.py", as you suggested earlier.

However, I see that most of the tests in this file are for operations under tensorflow_io/core/python/experimental, and all of them have their stream methods defined, e.g. tfio.experimental.IODataset.stream().from_video_capture.

So I just wanted to know whether I need to create a similar invocation, from_bigtable, and if so, where the right place is:

tensorflow_io/core/python/ops/io_dataset.py OR
tensorflow_io/core/python/ops/io_dataset_ops.py OR
tensorflow_io/core/python/experimental/io_dataset_ops.py

Please let me know if I am missing anything. Thanks!
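
(For context, a pytest fixture that starts the emulator for the duration of a test module might look roughly like the sketch below; the Docker image, port, and fixture name are assumptions, not the actual script mentioned above.)

    import subprocess

    import pytest


    @pytest.fixture(scope="module")
    def bigtable_emulator():
        # Start the Cloud Bigtable emulator in a throwaway container.
        container = subprocess.check_output([
            "docker", "run", "-d", "-p", "8086:8086", "google/cloud-sdk",
            "gcloud", "beta", "emulators", "bigtable", "start",
            "--host-port=0.0.0.0:8086",
        ]).decode().strip()
        try:
            # Tests should export BIGTABLE_EMULATOR_HOST with this value.
            yield "localhost:8086"
        finally:
            subprocess.check_call(["docker", "rm", "-f", container])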

@yongtang
Member

@ricky3dec I think tensorflow_io/core/python/experimental/io_dataset_ops.py would be the place to start. Once the API stabilizes, we can move it to the non-experimental API.

On a separate note (not immediately related to Bigtable now, but soon to be), I am reconsidering the grouping of APIs. At the beginning of the project, we started with one op per directory/module, so we ended up with many modules: tensorflow_io.kafka, tensorflow_io.kinesis, tensorflow_io.azfs, tensorflow_io.lmdb, ..., which quickly grew into an unmanageable situation. Since then, we have tried to consolidate the API by grouping datasets under IODataset.from_..., which more or less keeps the modules clean. But now IODataset.from_... is getting more and more crowded as well. At some point, we need to find a balance between modules and APIs. I am thinking of grouping related ops into modules. For example, Arrow, Avro, Parquet, Feather, etc. could all be grouped under tfio.columnar, as they are essentially all columnar. BigTable, BigQuery, and other Google Cloud related ops could be grouped into tfio.google. /cc @terrytangyuan @BryanCutler do you think this makes sense?

@ricky3dec For BigTable, I think staying with tensorflow_io/core/python/experimental/io_dataset_ops.py would be good for now; we can adjust the namespace as we move forward.

@ghost

ghost commented Mar 31, 2020

Sure, thanks @yongtang!

@ghost

ghost commented Apr 8, 2020

Hello @yongtang, I raised an initial PR for review only. Could you please help with some clarifications I asked for in the PR? Thanks!
