Plugin API for generator support with tensorflow-io #151

Open
yongtang opened this issue Mar 15, 2019 · 3 comments

Comments

@yongtang
Member

yongtang commented Mar 15, 2019

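It is possible to use a Python generator as the source of a tf.data dataset in TensorFlow through tf.data.Dataset.from_generator. The implementation follows a similar path to tf.py_func, so it has some limitations. For example, it must run in the same process as the Python program, it holds the Python GIL while producing elements, and the generator body is not serialized into the GraphDef.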

It should not be difficult to implement a plugin-style API that produces tf.data output from C/C++. Essentially, we would define a C API that each plugin exposes. Each plugin is built as a dynamic shared library (.so or .dll). tensorflow-io loads the shared library dynamically and calls its C API to generate the data, and a tensorflow-io kernel then emits that data as a tf.data dataset (to be consumed by tf.keras, etc.).
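
To make the loading side concrete, here is a minimal sketch in C. It assumes a plugin exports three functions named tfio_plugin_open, tfio_plugin_next and tfio_plugin_close; these names and signatures are hypothetical, not an existing tensorflow-io API. The loader resolves them with dlopen/dlsym on POSIX systems (Windows would use LoadLibrary/GetProcAddress instead), and tfio_plugin_drain stands in for what a tensorflow-io kernel might do with each record.

```c
/* Loader for a HYPOTHETICAL plugin C API. The exported symbol names
 * tfio_plugin_open/next/close are illustrative assumptions only. */
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>

typedef void *(*tfio_open_fn)(const char *config);
/* Returns 1 and sets *buf/*len to the next record, or 0 at end of data. */
typedef int (*tfio_next_fn)(void *state, const uint8_t **buf, size_t *len);
typedef void (*tfio_close_fn)(void *state);

typedef struct {
  void *lib; /* dlopen handle, NULL when nothing is loaded */
  tfio_open_fn open;
  tfio_next_fn next;
  tfio_close_fn close;
} tfio_plugin;

/* Load a plugin shared library and resolve its three entry points.
 * Returns 0 on success, -1 if the library or any symbol is missing. */
int tfio_plugin_load(const char *path, tfio_plugin *p) {
  p->lib = dlopen(path, RTLD_NOW | RTLD_LOCAL);
  if (p->lib == NULL) return -1;
  /* POSIX permits converting the void * from dlsym to a function pointer. */
  p->open = (tfio_open_fn)dlsym(p->lib, "tfio_plugin_open");
  p->next = (tfio_next_fn)dlsym(p->lib, "tfio_plugin_next");
  p->close = (tfio_close_fn)dlsym(p->lib, "tfio_plugin_close");
  if (p->open == NULL || p->next == NULL || p->close == NULL) {
    dlclose(p->lib);
    p->lib = NULL;
    return -1;
  }
  return 0;
}

void tfio_plugin_unload(tfio_plugin *p) {
  if (p->lib != NULL) dlclose(p->lib);
  p->lib = NULL;
}

/* Roughly what a kernel would do: pull every record from the plugin and
 * pass it to a sink (in tensorflow-io, each record would become a tf.data
 * element). Returns the record count, or -1 if open() fails. */
long tfio_plugin_drain(const tfio_plugin *p, const char *config,
                       void (*sink)(const uint8_t *, size_t, void *),
                       void *ctx) {
  void *state = p->open(config);
  if (state == NULL) return -1;
  const uint8_t *buf;
  size_t len;
  long count = 0;
  while (p->next(state, &buf, &len)) {
    sink(buf, len, ctx);
    count++;
  }
  p->close(state);
  return count;
}
```

Keeping the resolved functions in a plain struct means the kernel has no link-time dependency on any particular plugin. On glibc versions before 2.34 the loader also needs to be linked with -ldl.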

Note that we only define the C API that a plugin exposes; the implementation itself is unrestricted. For example, a plugin could be written in C++ internally and expose the C API through extern "C" functions.
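
On the plugin side, what keeps the implementation unrestricted is an opaque void * state handle. The sketch below is a toy plugin in C. It assumes the plugin exports three hypothetical functions (tfio_plugin_open, tfio_plugin_next, tfio_plugin_close) and uses a made-up config format, a decimal record count; each record is the decimal text of a counter.

```c
/* Toy plugin for a HYPOTHETICAL C API; names and config format are
 * illustrative assumptions, not an existing tensorflow-io interface. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Private state, never visible to the caller: emits "0", "1", ... "limit-1". */
typedef struct {
  long next_value;
  long limit;
  char text[32]; /* backing storage for the record returned by next() */
} counter_state;

/* config is a decimal record count such as "3"; returns NULL on bad input. */
void *tfio_plugin_open(const char *config) {
  char *end;
  long limit = strtol(config, &end, 10);
  if (end == config || *end != '\0' || limit < 0) return NULL;
  counter_state *s = calloc(1, sizeof *s);
  if (s == NULL) return NULL;
  s->limit = limit;
  return s;
}

/* Returns 1 with the next record, 0 when exhausted. The buffer stays valid
 * only until the next call on the same state. */
int tfio_plugin_next(void *state, const uint8_t **buf, size_t *len) {
  counter_state *s = state;
  if (s->next_value >= s->limit) return 0;
  int n = snprintf(s->text, sizeof s->text, "%ld", s->next_value++);
  *buf = (const uint8_t *)s->text;
  *len = (size_t)n;
  return 1;
}

void tfio_plugin_close(void *state) { free(state); }
```

It could be built as a shared library with something like `cc -shared -fPIC counter_plugin.c -o libcounter_plugin.so`. Because callers only ever hold the void * handle, counter_state could be replaced by a C++ object allocated with new inside extern "C" wrappers without changing the exported interface.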

Note: This could be part of the GSoC (TensorFlow) project:
https://docs.google.com/document/d/1zT57PFMGZ04A4CvHxAKVpMTgXjsO92_oKeSKwZMc0Gs/edit

@yongtang
Member Author

The idea comes from a problem I ran into before. I was doing research on defending against IDN homograph attacks, and we needed an image of each Unicode character for analysis, around 1 million pictures in total. We ended up running many processes on quite a few machines to generate the pictures concurrently, which was tedious. We probably could have used TensorFlow to generate the images and then used TensorFlow directly for training.

@suphoff
Contributor

suphoff commented Mar 15, 2019

I really like the idea.
TF does not make it easy to use initializers that run at library load time to register ops. In addition, the library has to be loaded by TensorFlow itself; otherwise, the registrations performed by those initializers are ignored.
(I implemented a module with both TF custom ops and a SWIG interface, and I had to make sure TF loaded the module first so that the ops worked. Python uses tables, so it was fine with being loaded second.)
The best workaround I could come up with is a "trampoline" library that both the external C interface module and the TF generator ops depend on. This prevents external code (libraries) from automatically loading tensorflow/io modules that contain ops at inappropriate times.
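
One possible reading of the trampoline workaround, sketched in C under the assumption that the trampoline is a small shared library holding only a registry (all names here are hypothetical): the external module registers its generator functions with the trampoline, and the generator op kernel looks them up by name. Neither side links against the other, so loading the external module never pulls in the tensorflow-io op library.

```c
/* HYPOTHETICAL trampoline registry; names are illustrative only.
 * Both the external C module and the TF generator op kernel would link
 * against this small library instead of against each other. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRAMP_MAX 16

typedef struct {
  void *(*open)(const char *config);
  int (*next)(void *state, const uint8_t **buf, size_t *len);
  void (*close)(void *state);
} tramp_api;

typedef struct {
  const char *name;     /* caller must keep this string alive */
  const tramp_api *api; /* caller must keep this table alive */
} tramp_entry;

static tramp_entry registry[TRAMP_MAX];
static size_t registry_size = 0;

/* Called by the external C interface module whenever it chooses.
 * Returns 0 on success, -1 if the registry is full or the name is taken. */
int tramp_register(const char *name, const tramp_api *api) {
  if (registry_size == TRAMP_MAX) return -1;
  for (size_t i = 0; i < registry_size; i++)
    if (strcmp(registry[i].name, name) == 0) return -1;
  registry[registry_size].name = name;
  registry[registry_size].api = api;
  registry_size++;
  return 0;
}

/* Called by the generator op kernel, i.e. only after TF itself has loaded
 * the op library. Returns NULL if nothing is registered under name. */
const tramp_api *tramp_lookup(const char *name) {
  for (size_t i = 0; i < registry_size; i++)
    if (strcmp(registry[i].name, name) == 0) return registry[i].api;
  return NULL;
}
```

This sketch is not thread-safe; a real trampoline would need a mutex around the registry if registration and lookup can happen on different threads.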

@yongtang yongtang changed the title Plugin API for generator support with tensorflow-io [GSoC] Plugin API for generator support with tensorflow-io Mar 28, 2019
@terrytangyuan terrytangyuan changed the title [GSoC] Plugin API for generator support with tensorflow-io Plugin API for generator support with tensorflow-io May 13, 2019
@yongtang
Member Author

Note that #246 could be related.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants