Planned support for Hive tables? #54

Open
zhaiyuyong opened this issue Mar 11, 2019 · 2 comments

Comments

@zhaiyuyong

No description provided.

@paulgc
Member

paulgc commented Mar 20, 2019

@zhaiyuyong TFDV uses Apache Beam to read input data, and Beam Python does not currently support reading Hive tables out of the box. There are two options at the moment:

  1. Export your Hive table as a CSV or TFRecord file, then run TFDV on the exported file.
  2. Write a custom Beam transform that reads the Hive table and decodes each row into TFDV's in-memory dictionary representation. Follow the instructions here to construct a pipeline with a custom decoder.
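
For option 2, the per-row decoding step might look roughly like the sketch below. It is only an illustration: the column names and the `decode_hive_row` helper are hypothetical, the rows are assumed to arrive as plain tuples from whatever Hive client you use, and plain Python lists stand in for the NumPy arrays that TFDV's dictionary representation used at the time of this thread. Reading from Hive itself is not shown.

```python
from typing import Any, Dict, List, Optional, Sequence

# Hypothetical column list; the real one would come from the Hive table schema.
COLUMN_NAMES = ["user_id", "country", "purchase_amount"]


def decode_hive_row(
    row: Sequence[Any], column_names: Sequence[str] = COLUMN_NAMES
) -> Dict[str, Optional[List[Any]]]:
    """Turn one row tuple into a {feature name: values} dictionary.

    A NULL column (None in Python) becomes a missing feature (None);
    any other value becomes a one-element list. In a real pipeline the
    lists would likely need to be wrapped in NumPy arrays (assumption
    based on the TFDV version current in early 2019).
    """
    if len(row) != len(column_names):
        raise ValueError(
            "expected %d columns, got %d" % (len(column_names), len(row))
        )
    return {
        name: None if value is None else [value]
        for name, value in zip(column_names, row)
    }


example = decode_hive_row((42, "DE", None))
print(example)
```

Inside a Beam pipeline, a function like this could be applied to a PCollection of rows with `beam.Map(decode_hive_row)`. The part that produces those rows is the custom source you would still have to write.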

@paulgc
Member

paulgc commented Mar 20, 2019

@katsiapis @aaltay

4 participants