Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue using allowlist_features and denylist_features in visualize_statistics #212

Open
wronk opened this issue May 16, 2022 · 0 comments
Open

Comments

@wronk
Copy link

wronk commented May 16, 2022

Overview

I'm having issues specifying the features to include/exclude when visualizing stats in TFDV. It seems like the allowlist_features and denylist_features require a tensorflow_data_validation.types.FeaturePath object, which took a bit to figure out how to construct. This doesn't seem that user friendly -- was it intended to allow a list of strings to be passed?

Code to reproduce

I can reproduce the problem in the public colab example. In the "Compute and Visualize Statistics" section of the above notebook, update the visualize_statistics call to be:
tfdv.visualize_statistics(train_stats, denylist_features=['pickup_community_area']). The first feature shouldn't exist in the visualized example (if I'm calling this correctly).

image

Workaround code

To make this work, I have to manually construct a tensorflow_data_validation.types.FeaturePath object. Perhaps it would be better to do the filter comparison on each feature's path string?

# Show string name of feature
first_feat = train_stats.datasets[0].features[0]
print(first_feat.path)

# Construct necessary object to make `allowlist_feature` filter work
from tensorflow_data_validation import types
print(types.FeaturePath.from_proto(first_feat.path))

# docs-infra: no-execute
tfdv.visualize_statistics(train_stats, allowlist_features=[types.FeaturePath.from_proto(first_feat.path)])

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants