google.api_core.exceptions.InvalidArgument: 400 The requested routine projects/c2c-dwh-dev/datasets/jsons_rep_data_warehouse/routines/mask_name is not found or an invalid masking routine #13313

Open
1 task done
nyck33 opened this issue Nov 30, 2024 · 1 comment

nyck33 commented Nov 30, 2024

Determine this is the right repository

  • I determined this is the correct repository in which to report this bug.

Summary of the issue

Context
I am trying to create a data policy that maps a policy tag to a UDF in my BigQuery dataset, but I get this error:

(bigquery) nyck33@lenovo-gtx1650:/mnt/d/c2c/sdp-masking-nov28-2024/policy_tags_udfs/data_policies$ python map_datapolicy_to_udf.py
Traceback (most recent call last):
  File "/mnt/d/c2c/sdp-masking-nov28-2024/policy_tags_udfs/data_policies/map_datapolicy_to_udf.py", line 69, in <module>
    response = client.create_data_policy(
  File "/home/nyck33/miniconda3/envs/bigquery/lib/python3.9/site-packages/google/cloud/bigquery_datapolicies_v1/services/data_policy_service/client.py", line 769, in create_data_policy
    response = rpc(
  File "/home/nyck33/miniconda3/envs/bigquery/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/home/nyck33/miniconda3/envs/bigquery/lib/python3.9/site-packages/google/api_core/retry/retry_unary.py", line 293, in retry_wrapped_func
    return retry_target(
  File "/home/nyck33/miniconda3/envs/bigquery/lib/python3.9/site-packages/google/api_core/retry/retry_unary.py", line 153, in retry_target
    _retry_error_helper(
  File "/home/nyck33/miniconda3/envs/bigquery/lib/python3.9/site-packages/google/api_core/retry/retry_base.py", line 212, in _retry_error_helper
    raise final_exc from source_exc
  File "/home/nyck33/miniconda3/envs/bigquery/lib/python3.9/site-packages/google/api_core/retry/retry_unary.py", line 144, in retry_target

Expected Behavior:
I expected this code to map the policy tag applied to columns of my table to the JavaScript UDF stored in my dataset:

from google.cloud import bigquery_datapolicies_v1
import json
import logging
import os
from google.cloud import bigquery
from google.cloud.bigquery.dataset import AccessEntry
from google.oauth2 import service_account

from dotenv import load_dotenv

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)

# Load environment variables from a .env file
load_dotenv()

# Retrieve environment variables
SERVICE_ACCOUNT_JSON = os.getenv("GCP_C2C_SERVICE_ACCOUNT_JSON")
PROJECT_ID = os.getenv("GCP_C2C_PROJECT_ID")
DATASET_ID = "jsons_rep_data_warehouse"
#MASKED_READER_EMAIL = os.getenv("MASKEDREADER_EMAIL")
#FINEGRAINEDREADER_EMAIL = os.getenv("FINEGRAINEDREADER_EMAIL")

credentials = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_JSON)
# Initialize the Data Policy client
client = bigquery_datapolicies_v1.DataPolicyServiceClient(credentials=credentials)

# Define the project and location
project_id = PROJECT_ID
location = 'us'  # e.g., 'us-central1'

# Specify the policy tag and UDF
policy_tag = 'projects/c2c-dwh-dev/locations/us/taxonomies/2093754720879615579/policyTags/2621461886793509723'
udf_resource_name = 'projects/c2c-dwh-dev/datasets/jsons_rep_data_warehouse/routines/mask_name'

# Define the data policy
'''
data_policy = bigquery_datapolicies_v1.DataPolicy(
    policy_tag=policy_tag,
    data_masking_policy=bigquery_datapolicies_v1.DataMaskingPolicy(
        predefined_expression=bigquery_datapolicies_v1.DataMaskingPolicy.PredefinedExpression.CUSTOMER_SUPPLIED, # does not exist
        routine=udf_resource_name
    )
)
'''
# Define the data masking policy with the custom routine
data_masking_policy = bigquery_datapolicies_v1.DataMaskingPolicy(
    routine=udf_resource_name
)



# Create the data policy with a valid data_policy_id
parent = f'projects/{project_id}/locations/{location}'
data_policy_id = 'mask_jp_name_fields'  # Replace with a valid identifier

# Define the data policy with a unique name
data_policy = bigquery_datapolicies_v1.DataPolicy(
    name=f'projects/{project_id}/locations/{location}/dataPolicies/{data_policy_id}',
    data_policy_id=data_policy_id,
    policy_tag=policy_tag,
    data_masking_policy=data_masking_policy,
    data_policy_type=bigquery_datapolicies_v1.DataPolicy.DataPolicyType.DATA_MASKING_POLICY,

)
response = client.create_data_policy(
    parent=parent,
    data_policy=data_policy
)


print(f'Data Policy created: {response.name}')

Actual Behavior:
I get the error above. Are JavaScript UDFs not allowed as masking routines? The message says "invalid masking routine", but the UDF works when I invoke it in the Console.

API client name and version

google-cloud-bigquery-datapolicies 0.6.10

Reproduction steps: code

The script is identical to the one listed under "Expected Behavior" above.

### Reproduction steps: supporting files

_No response_

### Reproduction steps: actual results

_No response_

### Reproduction steps: expected results

_No response_

### OS & version + platform

Ubuntu 24.04 WSL

### Python environment

Python 3.9.20

### Python dependencies

(bigquery) nyck33@lenovo-gtx1650:/mnt/d/c2c/sdp-masking-nov28-2024/policy_tags_udfs/data_policies$ conda list

packages in environment at /home/nyck33/miniconda3/envs/bigquery:

Name Version Build Channel

_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
annotated-types 0.7.0 pypi_0 pypi
anyio 4.6.2.post1 pypi_0 pypi
asttokens 2.4.1 pyhd8ed1ab_0 conda-forge
attrs 24.2.0 pypi_0 pypi
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
ca-certificates 2024.8.30 hbcca054_0 conda-forge
cachetools 5.5.0 pypi_0 pypi
certifi 2024.8.30 pypi_0 pypi
charset-normalizer 3.4.0 pypi_0 pypi
comm 0.2.2 pyhd8ed1ab_0 conda-forge
contourpy 1.3.0 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
db-dtypes 1.3.0 pypi_0 pypi
debugpy 1.6.7 py39h6a678d5_0
decorator 5.1.1 pyhd8ed1ab_0 conda-forge
distro 1.9.0 pypi_0 pypi
dnspython 1.16.0 pypi_0 pypi
entrypoints 0.4 pyhd8ed1ab_0 conda-forge
exceptiongroup 1.2.2 pypi_0 pypi
executing 2.1.0 pyhd8ed1ab_0 conda-forge
fastjsonschema 2.20.0 pypi_0 pypi
fonttools 4.54.1 pypi_0 pypi
google-api-core 2.21.0 pypi_0 pypi
google-api-python-client 2.154.0 pypi_0 pypi
google-auth 2.35.0 pypi_0 pypi
google-auth-httplib2 0.2.0 pypi_0 pypi
google-auth-oauthlib 1.2.1 pypi_0 pypi
google-cloud-bigquery 3.26.0 pypi_0 pypi
google-cloud-bigquery-datapolicies 0.6.10 pypi_0 pypi
google-cloud-bigquery-storage 2.27.0 pypi_0 pypi
google-cloud-core 2.4.1 pypi_0 pypi
google-cloud-datacatalog 3.23.0 pypi_0 pypi
google-cloud-dlp 3.25.1 pypi_0 pypi
google-cloud-translate 3.18.0 pypi_0 pypi
google-crc32c 1.6.0 pypi_0 pypi
google-resumable-media 2.7.2 pypi_0 pypi
googleapis-common-protos 1.65.0 pypi_0 pypi
grpc-google-iam-v1 0.13.1 pypi_0 pypi
grpcio 1.67.0 pypi_0 pypi
grpcio-status 1.67.0 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
httpcore 1.0.7 pypi_0 pypi
httplib2 0.22.0 pypi_0 pypi
httpx 0.27.2 pypi_0 pypi
idna 3.10 pypi_0 pypi
importlib-resources 6.4.5 pypi_0 pypi
iniconfig 2.0.0 pypi_0 pypi
ipykernel 6.29.5 pyh3099207_0 conda-forge
ipython 8.12.0 pyh41d4057_0 conda-forge
jedi 0.19.1 pyhd8ed1ab_0 conda-forge
jiter 0.7.1 pypi_0 pypi
joblib 1.4.2 pypi_0 pypi
jsonschema 4.23.0 pypi_0 pypi
jsonschema-specifications 2024.10.1 pypi_0 pypi
jupyter_client 7.3.4 pyhd8ed1ab_0 conda-forge
jupyter_core 5.7.2 pyh31011fe_1 conda-forge
kiwisolver 1.4.7 pypi_0 pypi
ld_impl_linux-64 2.40 h12ee557_0
libffi 3.4.4 h6a678d5_1
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libsodium 1.0.18 h36c2ea0_1 conda-forge
libstdcxx-ng 11.2.0 h1234567_1
matplotlib 3.9.2 pypi_0 pypi
matplotlib-inline 0.1.7 pyhd8ed1ab_0 conda-forge
nbformat 5.10.4 pypi_0 pypi
ncurses 6.4 h6a678d5_0
nest-asyncio 1.6.0 pyhd8ed1ab_0 conda-forge
numpy 2.0.2 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
openai 1.55.0 pypi_0 pypi
openssl 3.0.15 h5eee18b_0
packaging 24.1 pyhd8ed1ab_0 conda-forge
pandas 2.2.3 pypi_0 pypi
pandas-gbq 0.24.0 pypi_0 pypi
parso 0.8.4 pyhd8ed1ab_0 conda-forge
pexpect 4.9.0 pyhd8ed1ab_0 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 11.0.0 pypi_0 pypi
pip 24.2 py39h06a4308_0
platformdirs 4.3.6 pyhd8ed1ab_0 conda-forge
plotly 5.24.1 pypi_0 pypi
pluggy 1.5.0 pypi_0 pypi
prompt-toolkit 3.0.48 pyha770c72_0 conda-forge
prompt_toolkit 3.0.48 hd8ed1ab_0 conda-forge
proto-plus 1.25.0 pypi_0 pypi
protobuf 5.28.3 pypi_0 pypi
psutil 5.9.0 py39h5eee18b_0
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pure_eval 0.2.3 pyhd8ed1ab_0 conda-forge
pyarrow 17.0.0 pypi_0 pypi
pyasn1 0.6.1 pypi_0 pypi
pyasn1-modules 0.4.1 pypi_0 pypi
pydantic 2.10.1 pypi_0 pypi
pydantic-core 2.27.1 pypi_0 pypi
pydata-google-auth 1.8.2 pypi_0 pypi
pygments 2.18.0 pyhd8ed1ab_0 conda-forge
pymongo 3.9.0 pypi_0 pypi
pyparsing 3.2.0 pypi_0 pypi
pytest 8.3.3 pypi_0 pypi
python 3.9.20 he870216_1
python-dateutil 2.9.0.post0 pypi_0 pypi
python-dotenv 1.0.1 pypi_0 pypi
python_abi 3.9 2_cp39 conda-forge
pytz 2024.2 pypi_0 pypi
pyzmq 25.1.2 py39h6a678d5_0
readline 8.2 h5eee18b_0
referencing 0.35.1 pypi_0 pypi
requests 2.32.3 pypi_0 pypi
requests-oauthlib 2.0.0 pypi_0 pypi
rpds-py 0.20.0 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scikit-learn 1.5.2 pypi_0 pypi
scipy 1.13.1 pypi_0 pypi
seaborn 0.13.2 pypi_0 pypi
setuptools 75.1.0 py39h06a4308_0
six 1.16.0 pyh6c4a22f_0 conda-forge
sniffio 1.3.1 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0
stack_data 0.6.2 pyhd8ed1ab_0 conda-forge
tenacity 9.0.0 pypi_0 pypi
threadpoolctl 3.5.0 pypi_0 pypi
tk 8.6.14 h39e8969_0
tomli 2.0.2 pypi_0 pypi
tornado 6.1 py39hb9d737c_3 conda-forge
tqdm 4.67.0 pypi_0 pypi
traitlets 5.14.3 pyhd8ed1ab_0 conda-forge
typing_extensions 4.12.2 pyha770c72_0 conda-forge
tzdata 2024.2 pypi_0 pypi
uritemplate 4.1.1 pypi_0 pypi
urllib3 2.2.3 pypi_0 pypi
wcwidth 0.2.13 pyhd8ed1ab_0 conda-forge
wheel 0.44.0 py39h06a4308_0
xz 5.4.6 h5eee18b_1
zeromq 4.3.5 h6a678d5_0
zipp 3.20.2 pypi_0 pypi
zlib 1.2.13 h5eee18b_1
(bigquery) nyck33@lenovo-gtx1650:/mnt/d/c2c/sdp-masking-nov28-2024/policy_tags_udfs/data_policies$ python --version
Python 3.9.20


### Additional context

_No response_
chalmerlowe (Contributor) commented

@nyck33
Thanks for submitting a question.
The error message (shown in the title) gives two possible causes for the failure:
google.api_core.exceptions.InvalidArgument: 400 The requested routine projects/c2c-dwh-dev/datasets/jsons_rep_data_warehouse/routines/mask_name

  • is not found
  • or an invalid masking routine

If the routine works in the Console but not when referenced through the client, I would start by checking whether the routine path used in the client has a subtle error. Are we sure about the path to the routine and the routine's name?

'projects/c2c-dwh-dev/datasets/jsons_rep_data_warehouse/routines/mask_name'
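
One way to check the path is to look the routine up through the BigQuery API using the same credentials. The following is only a sketch: `parse_routine_name` and `describe_routine` are hypothetical helper names, not part of any Google library, and the sketch assumes `google-cloud-bigquery` is installed and that the service account is allowed to read routines in the dataset.

```python
import re

# Resource-name shape used in the issue: projects/<p>/datasets/<d>/routines/<r>
_ROUTINE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/datasets/(?P<dataset>[^/]+)"
    r"/routines/(?P<routine>[^/]+)$"
)


def parse_routine_name(resource_name):
    """Split a routine resource name into (project, dataset, routine).

    Hypothetical helper: raises ValueError if the string does not have
    the expected projects/.../datasets/.../routines/... shape.
    """
    match = _ROUTINE_RE.match(resource_name)
    if match is None:
        raise ValueError(f"Not a routine resource name: {resource_name!r}")
    return match.group("project"), match.group("dataset"), match.group("routine")


def describe_routine(resource_name, credentials=None):
    """Fetch the routine via the BigQuery API and report basic facts about it.

    Hypothetical helper: get_routine raises google.api_core.exceptions.NotFound
    when no routine exists at that path for these credentials.
    """
    # Imported lazily so the parsing helper works without the client installed.
    from google.cloud import bigquery

    project, dataset, routine = parse_routine_name(resource_name)
    client = bigquery.Client(project=project, credentials=credentials)
    found = client.get_routine(f"{project}.{dataset}.{routine}")
    return {
        "path": found.path,          # API path of the routine that was found
        "language": found.language,  # for example "SQL" or "JAVASCRIPT"
        "type": found.type_,         # for example "SCALAR_FUNCTION"
    }
```

Calling `describe_routine(udf_resource_name, credentials=credentials)` with the values from the script would separate the two cases in the error message: a `NotFound` exception points at the path, while a successful lookup suggests comparing the reported language and type against the requirements that the BigQuery documentation lists for custom masking routines.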

@chalmerlowe chalmerlowe self-assigned this Dec 2, 2024