Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Risk samples #1411

Merged
merged 3 commits into from
Mar 19, 2018
Merged

add Risk samples #1411

merged 3 commits into from
Mar 19, 2018

Conversation

averikitsch
Copy link
Contributor

No description provided.

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Mar 16, 2018
dlp/risk.py Outdated
# See the License for the specific language governing permissions and
# limitations under the License.

"""Sample app that uses the Data Loss Prevent API to """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is unfinished. Maybe "to perform risk anaylsis"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

dlp/risk.py Outdated
def numerical_risk_analysis(project, table_project_id, dataset_id, table_id,
column_name, topic_id, subscription_id,
timeout=180):
"""Uses the Data Loss Prevention API to computes risk metrics of a column
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to compute risk metrics"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

dlp/risk.py Outdated
print('Value Range: [{}, {}]'.format(
results.min_value.integer_value,
results.max_value.integer_value))
print('25% quantile value: {}\n' # NOTE: change this to unique
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this #NOTE mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a note to ask you about switching this to be more like the NodeJS samples. The NodeJS samples only print the quantiles with unique values. Here I am summarizing by providing the quartiles.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on the output, since I don't really understand the functionality of the endpoint very well. = ) It should be fine as it is.

dlp/risk.py Outdated
.categorical_stats_result
.value_frequency_histogram_buckets)
# Print bucket stats
for i in range(len(histogram_buckets)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally in python you would do "for bucket in histogram_buckets" instead of indexing them by number. Likewise in the examples below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

dlp/risk.py Outdated

def k_map_estimate_analysis(project, table_project_id, dataset_id, table_id,
topic_id, subscription_id, quasi_ids, info_types,
region_code='USA', timeout=180):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default for region_code also needs to be added as "default=" to the corresponding command-line flag. Otherwise, the flag's default value of None will override this default when it's actually called.

Alternatively, if region_code can be safely omitted, maybe this should default to None in the first place?

GCLOUD_PROJECT = os.getenv('GCLOUD_PROJECT')
TOPIC_ID = 'dlp-test'
SUBSCRIPTION_ID = 'dlp-test-subscription'
DATASET_ID = 'integration_tests_dlp'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these datasets public? (So, will they work for our test runner as well?)

@andrewsg andrewsg merged commit 099d7c5 into GoogleCloudPlatform:dlp Mar 19, 2018
@averikitsch averikitsch deleted the dlp-risks branch July 16, 2018 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants