Add correlations to Facets charts/tables #66

Open
ianhellstrom opened this issue May 17, 2019 · 6 comments
Comments

@ianhellstrom

TensorFlow Data Validation is a great tool for inspecting data. One feature that might make it even better is computing correlations among the variables, so that if two variables are highly correlated you can avoid multicollinearity by dropping one of them. Having that available in the Facets visualization would make it easier to spot issues with the data.
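To make the request concrete, here is a minimal sketch of the kind of check being asked for: compute the Pearson correlation for every pair of numeric columns and flag pairs above a threshold. It is plain Python and does not use TFDV or Facets; the function names, the 0.9 threshold, and the sample columns are all illustrative assumptions.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        # A NaN result compares False here, so undefined pairs are skipped.
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical data: the two height columns carry the same information.
columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "unrelated": [3.0, 9.0, 1.0, 7.0, 4.0],
}
flagged = highly_correlated_pairs(columns)
```

With this data only the `height_cm` / `height_in` pair should be flagged, which is the sort of redundancy a correlation view would surface.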

@paulgc
Member

paulgc commented May 17, 2019

@jameswex

@jameswex

Two pieces here:

  1. Calculating correlations between features. Does anything in TFDV do this currently? If not, would need to define a proto format for capturing this data, and build a pipeline to calculate it.
  2. Once that is done, would need a visualization to best show this information. It's possible it could be part of Facets Overview, but also possible that it might work best as a new visualization, as Facets Overview hasn't been designed with cross-feature statistics (such as correlation) in mind.
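As a rough illustration of the first point, a correlation pipeline over partitioned data usually needs a per-pair summary that can be built on each partition and then merged. The sketch below is one possible shape for that summary, in plain Python; the `PairMoments` class and its fields are hypothetical and are not the TFDV implementation or any existing proto.

```python
import math
from dataclasses import dataclass


@dataclass
class PairMoments:
    """Hypothetical mergeable summary for one (x, y) feature pair."""
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_yy: float = 0.0
    sum_xy: float = 0.0

    def add_batch(self, xs, ys):
        """Fold one batch of paired values into the running sums."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_yy += y * y
            self.sum_xy += x * y
        return self

    def merge(self, other):
        """Combine summaries computed on two disjoint partitions."""
        return PairMoments(
            self.n + other.n,
            self.sum_x + other.sum_x,
            self.sum_y + other.sum_y,
            self.sum_xx + other.sum_xx,
            self.sum_yy + other.sum_yy,
            self.sum_xy + other.sum_xy,
        )

    def correlation(self):
        """Pearson correlation from the accumulated sums."""
        if self.n < 2:
            return float("nan")
        cov = self.sum_xy - self.sum_x * self.sum_y / self.n
        var_x = self.sum_xx - self.sum_x ** 2 / self.n
        var_y = self.sum_yy - self.sum_y ** 2 / self.n
        if var_x <= 0 or var_y <= 0:
            return float("nan")  # constant column or rounding noise
        return cov / math.sqrt(var_x * var_y)


# Two partitions of the same pair, where y = 2x + 1 exactly.
part_a = PairMoments().add_batch([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
part_b = PairMoments().add_batch([4.0, 5.0, 6.0], [9.0, 11.0, 13.0])
merged = part_a.merge(part_b)
r = merged.correlation()
```

Raw sums like these are simple to merge but can lose precision on large or badly scaled values; a production pipeline would likely use a more numerically stable update. The six fields are also a plausible starting point for what a stored per-pair record would need to hold.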

@paulgc
Member

paulgc commented May 17, 2019

@jameswex We are currently planning to compute correlation statistics in TFDV, and will probably update the TF.Metadata statistics proto to capture them.

@robinvanschaik

Any updates on this / where it is on the roadmap?

I agree, this tool is excellent and the correlations are the only thing missing at the moment.

As such, I was happy to see that it was already raised.

Cheers.

@brills
Contributor

brills commented Apr 7, 2021

We already have a stats generator (tensorflow_data_validation/statistics/generators/cross_feature_stats_generator.py). You can try enabling it by specifying it in StatsOptions.generators.
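For anyone wanting to try this, the following is an untested sketch of what enabling the generator might look like. It assumes the module exposes a class named `CrossFeatureStatsGenerator` that can be constructed with default arguments, and that the results land in the `cross_features` field of the statistics proto with a `num_cross_stats.correlation` value; check these against your installed TFDV version. The wrapper function name is hypothetical, and the imports are deferred into it so the snippet loads even where TFDV is not installed.

```python
def cross_feature_correlations(df):
    """Return (path_x, path_y, correlation) tuples for a pandas DataFrame.

    Assumption: the class name, its default constructor, and the proto
    field names used below match the installed TFDV version.
    """
    import tensorflow_data_validation as tfdv
    from tensorflow_data_validation.statistics.generators import (
        cross_feature_stats_generator,
    )

    # Add the cross-feature generator through StatsOptions.generators.
    options = tfdv.StatsOptions(
        generators=[cross_feature_stats_generator.CrossFeatureStatsGenerator()]
    )
    stats = tfdv.generate_statistics_from_dataframe(df, stats_options=options)

    results = []
    for cross in stats.datasets[0].cross_features:
        results.append(
            (
                tuple(cross.path_x.step),
                tuple(cross.path_y.step),
                cross.num_cross_stats.correlation,
            )
        )
    return results
```

The returned list could then be inspected or plotted directly, since the values are not shown by the Facets visualization. If the generator samples rows, small datasets may produce few or no cross-feature entries.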

But currently Facets does not visualize the results.

We could attach the cross stats as custom stats (like the LiftStatsGenerator does).

@AndresMontero

Hello, is there an update on the possibility of having correlations in tfdv.visualize_statistics()?
Great tool, but I think this is needed.
Thanks

8 participants