@david-zlai commented Jan 29, 2025

Summary

python distribution/run_zipline_quickstart.py

This runs the full Zipline suite of commands (compile, backfill, group-by upload, and upload-to-kv) against a test quickstart GroupBy.
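
For reference, here is a minimal sketch of how such a runner could be structured. This is an illustration only: the actual distribution/run_zipline_quickstart.py may differ, and the wrapped shell script name and its working-directory argument are assumptions inferred from the log below.

import subprocess
import tempfile

# Hypothetical sketch: create a scratch directory and drive a traced shell script
# that issues the zipline compile / run commands, mirroring the example output below.
QUICKSTART_SCRIPT = "distribution/run_zipline_quickstart.sh"  # assumed name, not confirmed

def run_quickstart():
    with tempfile.TemporaryDirectory() as working_dir:
        print(f"Created temporary directory: {working_dir}")
        # 'bash -x' echoes each command, which produces the '+ ...' lines in the example output.
        subprocess.run(["bash", "-x", QUICKSTART_SCRIPT, working_dir], check=True)

if __name__ == "__main__":
    run_quickstart()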

Example:

davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py 
Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ GREEN='\033[0;32m'
+ RED='\033[0;31m'
+ WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload
+ '[' -z '' ']'
+ wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
--2025-01-30 10:16:21--  https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132
Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 400879762 (382M) [application/x-gzip]
Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’

spark-3.5.4-bin-hadoop3.tgz                                 100%[==========================================================================================================================================>] 382.31M  50.2MB/s    in 8.4s    

2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762]

+ tar -xzf spark-3.5.4-bin-hadoop3.tgz
++ pwd
+ export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ git clone [email protected]:zipline-ai/cananry-confs.git
Cloning into 'cananry-confs'...
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0)
Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done.
Resolving deltas: 100% (63/63), done.
+ cd cananry-confs
+ git fetch origin davidhan/canary
From github.com:zipline-ai/cananry-confs
 * branch            davidhan/canary -> FETCH_HEAD
+ git checkout davidhan/canary
branch 'davidhan/canary' set up to track 'origin/davidhan/canary'.
Switched to a new branch 'davidhan/canary'
+ python3 -m venv tmp_chronon
+ source tmp_chronon/bin/activate
++ deactivate nondestructive
++ '[' -n '' ']'
++ '[' -n '' ']'
++ hash -r
++ '[' -n '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ case "$(uname)" in
+++ uname
++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ export PATH
++ VIRTUAL_ENV_PROMPT=tmp_chronon
++ export VIRTUAL_ENV_PROMPT
++ '[' -n '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(tmp_chronon) '
++ export PS1
++ hash -r
+ gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl .
Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl
  Completed files 1/1 | 371.1kiB/371.1kiB                                                                                                                                                                                                     
+ pip uninstall zipline-ai
WARNING: Skipping zipline-ai as it is not installed.
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl
Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl
Collecting click (from zipline-ai==0.1.0.dev0)
  Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0)
  Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl
Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0)
  Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB)
Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB)
Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB)
Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB)
Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_crc32c-1.6.0-py3-none-any.whl
Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0)
  Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB)
Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB)
Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB)
Using cached click-8.1.8-py3-none-any.whl (98 kB)
Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB)
Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB)
Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB)
Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB)
Using cached certifi-2024.12.14-py3-none-any.whl (164 kB)
Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB)
Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB)
Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Using cached urllib3-2.3.0-py3-none-any.whl (128 kB)
Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB)
Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai
Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0

[notice] A new release of pip is available: 24.2 -> 25.0
[notice] To update, run: pip install --upgrade pip
++ pwd
+ export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id'
+ echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m'
<<<<<.....................................COMPILE.....................................>>>>>
+ zipline compile --conf=group_bys/quickstart/purchases.py
  Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
     Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1_test
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production
+ echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m'
<<<<<.....................................BACKFILL.....................................>>>>>
+ touch tmp_backfill.out
+ zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc
+ tee /dev/tty tmp_backfill.out
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b

Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_backfill.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b
+ JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']'
+ gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1
Waiting for job output...
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse
25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:16:51 INFO Configuration: resource-types.xml not found
25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011
25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:16:55 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions.
2025/01/30 18:17:15 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:15 INFO  TableUtils.scala:622 - 
Unfilled range computation:
   Output table: canary-443022.data.quickstart_purchases_v1_test
   Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024
-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Input tables: data.purchases
   Missing input partitions: [2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-
10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30]
   Unfilled ranges: [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30])
2025/01/30 18:17:15 INFO  GroupBy.scala:738 - Group By ranges to compute: 
    [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1]
2025/01/30 18:17:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:17:20 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:20 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-11-01...2023-11-30]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-11-30]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-11-30]
   intersected range: [2023-11-01...2023-11-30]

2025/01/30 18:17:20 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:17:20 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-11-30])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:17:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-11-30'

2025/01/30 18:17:20 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-11-01 00:00:00
2025/01/30 18:17:20 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:17:22 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:17:25 INFO  TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:25 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition
2025/01/30 18:17:25 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:17:33 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:33 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33
2025/01/30 18:17:33 INFO  GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30]
2025/01/30 18:17:33 INFO  GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30]
Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput
jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: 945d836f-20d8-4768-97fb-0889c00ed87b
  projectId: canary-443022
sparkJob:
  args:
  - group-by-backfill
  - --conf-path=purchases.v1_test
  - --end-date=2025-01-30
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:17:38.722934Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:16:43.326557Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:16:43.353624Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:16:43.597231Z'
yarnApplications:
- name: groupBy_quickstart.purchases.v1_test_backfill
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m'
<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>
+ touch tmp_gbu.out
+ zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc
+ tee /dev/tty tmp_gbu.out
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_gbu.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f
+ JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']'
+ gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1
Waiting for job output...
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse
25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:17:52 INFO Configuration: resource-types.xml not found
25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012
25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:17:56 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:17:57 INFO  GroupByUpload.scala:229 - 
GroupBy upload for: quickstart.quickstart.purchases.v1_test
Accuracy: SNAPSHOT
Data Model: Events

2025/01/30 18:17:57 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:14 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:14 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:14 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:14 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:14 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:14 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-12-01 00:00:00
2025/01/30 18:18:14 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:18:15 INFO  KvRdd.scala:102 - 
key schema:
  {
  "type" : "record",
  "name" : "Key",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "user_id",
    "type" : [ "null", "long" ],
    "doc" : ""
  } ]
}
value schema:
  {
  "type" : "record",
  "name" : "Value",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "purchase_price_sum_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_3d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_14d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_30d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_last10",
    "type" : [ "null", {
      "type" : "array",
      "items" : "long"
    } ],
    "doc" : ""
  } ]
}

2025/01/30 18:18:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:19 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:19 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:19 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:19 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:20 INFO  GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined.
2025/01/30 18:18:20 INFO  GroupByUpload.scala:188 - 
Built GroupByServingInfo for quickstart.purchases.v1_test:
table: data.purchases / data-model: Events
     keySchema: Success(struct<user_id:bigint>)
   valueSchema: Success(struct<purchase_price:bigint>)
mutationSchema: Failure(java.lang.NullPointerException)
   inputSchema: Failure(java.lang.NullPointerException)
selectedSchema: Success(struct<purchase_price:bigint>)
  streamSchema: Failure(java.lang.NullPointerException)

2025/01/30 18:18:20 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:18:24 INFO  TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:24 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition
2025/01/30 18:18:24 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:18:30 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:30 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30
Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput
jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c672008e-7380-4a82-a121-4bb0cb46503f
  projectId: canary-443022
sparkJob:
  args:
  - group-by-upload
  - --conf-path=purchases.v1_test
  - --end-date=2023-12-01
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
           {"qualifierString": "value", "fieldName": ""}
        ]
      }
   ]
}'''
) AS
SELECT
  CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey,
  value_bytes as cf,
  TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP
FROM canary-443022.data.quickstart_purchases_v1_test_upload
WHERE ds = '2023-12-01'

25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1
25/01/30 18:18:48 INFO BigTableKVStoreImpl: We will wait for PT6H for the job to complete
25/01/30 18:18:49 INFO BigTableKVStoreImpl: Export job completed successfully
25/01/30 18:18:49 INFO Driver$GroupByUploadToKVBulkLoad$: Uploaded GroupByUpload data to KV store for GroupBy: quickstart.purchases.v1_test; partition: 2023-12-01 in 1 seconds
Job [c29097e9-b845-4ad7-843a-c89b622c5cfe] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c29097e9-b845-4ad7-843a-c89b622c5cfe/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c29097e9-b845-4ad7-843a-c89b622c5cfe/driveroutput
jobUuid: c29097e9-b845-4ad7-843a-c89b622c5cfe
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c29097e9-b845-4ad7-843a-c89b622c5cfe
  projectId: canary-443022
sparkJob:
  args:
  - groupby-upload-bulk-load
  - --conf-path=purchases.v1_test
  - --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  - --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl
  - --conf-type=group_bys
  - --partition-string=2023-12-01
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:49.641298Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:18:38.893434Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:18:38.924869Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:18:39.144132Z'
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<< .....................................METADATA-UPLOAD.....................................>>>>>\033[0m'
<<<<< .....................................METADATA-UPLOAD.....................................>>>>>
+ touch tmp_metadata_upload.out
+ zipline run --mode metadata-upload --conf production/group_bys/quickstart/purchases.v1_test --dataproc
+ tee /dev/tty tmp_metadata_upload.out
Running with args: {'mode': 'metadata-upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'metadata-upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(metadata-upload, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(metadata-upload, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: ac577d8c-95e2-4dda-a863-2d9fb94f022f
Dataproc submitter job id: ac577d8c-95e2-4dda-a863-2d9fb94f022f
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_metadata-upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter metadata-upload --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_metadata-upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter metadata-upload --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_metadata_upload.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ METADATA_UPLOAD_JOB_ID=ac577d8c-95e2-4dda-a863-2d9fb94f022f
+ check_dataproc_job_state ac577d8c-95e2-4dda-a863-2d9fb94f022f
+ JOB_ID=ac577d8c-95e2-4dda-a863-2d9fb94f022f
+ '[' -z ac577d8c-95e2-4dda-a863-2d9fb94f022f ']'
+ gcloud dataproc jobs wait ac577d8c-95e2-4dda-a863-2d9fb94f022f --region=us-central1
Waiting for job output...
25/01/30 18:19:15 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:19:15 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:19:17 INFO MetadataDirWalker: Uploading Chronon configs from purchases.v1_test
25/01/30 18:19:19 INFO MetadataStore: Creating dataset: CHRONON_METADATA
25/01/30 18:19:19 INFO BigTableKVStoreImpl: Table CHRONON_METADATA already exists
25/01/30 18:19:19 INFO MetadataStore: Successfully created dataset: CHRONON_METADATA
25/01/30 18:19:20 INFO MetadataStore: Creating dataset: CHRONON_ENTITY_BY_TEAM
25/01/30 18:19:20 INFO BigTableKVStoreImpl: Table CHRONON_ENTITY_BY_TEAM already exists
25/01/30 18:19:20 INFO MetadataStore: Successfully created dataset: CHRONON_ENTITY_BY_TEAM
25/01/30 18:19:20 ERROR ManagedChannelOrphanWrapper: *~*~*~ Previous channel ManagedChannelImpl{logId=43, target=bigtableadmin.googleapis.com:443} was garbage collected without being shut down! ~*~*~*
    Make sure to call shutdown()/shutdownNow()
java.lang.RuntimeException: ManagedChannel allocation site
        at io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:102) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:60) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:51) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.internal.ManagedChannelImplBuilder.build(ManagedChannelImplBuilder.java:710) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.ForwardingChannelBuilder2.build(ForwardingChannelBuilder2.java:272) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:497) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.ChannelPool.<init>(ChannelPool.java:106) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.ChannelPool.create(ChannelPool.java:84) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:267) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:260) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:225) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.cloud.bigtable.admin.v2.stub.EnhancedBigtableTableAdminStub.createEnhanced(EnhancedBigtableTableAdminStub.java:61) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.cloud.bigtable.admin.v2.BigtableTableAdminClient.create(BigtableTableAdminClient.java:158) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.integrations.cloud_gcp.GcpApiImpl.genKvStore(GcpApiImpl.scala:65) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$OnlineSubcommand.metaDataStore(Driver.scala:621) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$OnlineSubcommand.metaDataStore$(Driver.scala:620) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$MetadataUploader$Args.metaDataStore(Driver.scala:749) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$MetadataUploader$.$anonfun$run$17(Driver.scala:760) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$MetadataUploader$.$anonfun$run$17$adapted(Driver.scala:760) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at scala.collection.immutable.List.foreach(List.scala:431) ~[scala-library-2.12.18.jar:?]
        at ai.chronon.spark.Driver$MetadataUploader$.run(Driver.scala:760) [cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$.main(Driver.scala:1054) [cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver.main(Driver.scala) [cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1032) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1124) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1133) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.12-3.5.1.jar:3.5.1]
25/01/30 18:19:20 ERROR ManagedChannelOrphanWrapper: *~*~*~ Previous channel ManagedChannelImpl{logId=87, target=bigtableadmin.googleapis.com:443} was garbage collected without being shut down! ~*~*~*
    Make sure to call shutdown()/shutdownNow()
java.lang.RuntimeException: ManagedChannel allocation site
        at io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:102) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:60) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:51) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.internal.ManagedChannelImplBuilder.build(ManagedChannelImplBuilder.java:710) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.ForwardingChannelBuilder2.build(ForwardingChannelBuilder2.java:272) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:497) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.ChannelPool.<init>(ChannelPool.java:106) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.ChannelPool.create(ChannelPool.java:84) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:267) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:260) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:225) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.cloud.bigtable.admin.v2.stub.EnhancedBigtableTableAdminStub.createEnhanced(EnhancedBigtableTableAdminStub.java:61) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.cloud.bigtable.admin.v2.BigtableTableAdminClient.create(BigtableTableAdminClient.java:158) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.integrations.cloud_gcp.GcpApiImpl.genKvStore(GcpApiImpl.scala:65) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$OnlineSubcommand.metaDataStore(Driver.scala:621) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$OnlineSubcommand.metaDataStore$(Driver.scala:620) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$MetadataUploader$Args.metaDataStore(Driver.scala:749) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$MetadataUploader$.$anonfun$run$17(Driver.scala:760) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$MetadataUploader$.$anonfun$run$17$adapted(Driver.scala:760) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at scala.collection.immutable.List.foreach(List.scala:431) ~[scala-library-2.12.18.jar:?]
        at ai.chronon.spark.Driver$MetadataUploader$.run(Driver.scala:760) [cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$.main(Driver.scala:1054) [cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver.main(Driver.scala) [cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1032) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1124) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1133) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.12-3.5.1.jar:3.5.1]
25/01/30 18:19:20 INFO MetadataStore: Putting metadata for
dataset: CHRONON_METADATA
key: purchases.v1_test
conf: List({"metaData":{"name":"quickstart.purchases.v1_test","online":1,"customJson":"{\"lag\": 0, \"groupby_tags\": null, \"column_tags\": {}}","dependencies":["{\"name\": \"wait_for_data.purchases_ds\", \"spec\": \"data.purchases/ds={{ ds }}\", \"start\": null, \"end\": null}"],"tableProperties":{"source":"chronon"},"outputNamespace":"canary-443022.data","team":"quickstart","offlineSchedule":"@daily"},"sources":[{"events":{"table":"data.purchases","query":{"selects":{"user_id":"user_id","purchase_price":"purchase_price"},"timeColumn":"ts","setups":[]}}}],"keyColumns":["user_id"],"aggregations":[{"inputColumn":"purchase_price","operation":7,"argMap":{},"windows":[{"length":3,"timeUnit":1},{"length":14,"timeUnit":1},{"length":30,"timeUnit":1}]},{"inputColumn":"purchase_price","operation":6,"argMap":{},"windows":[{"length":3,"timeUnit":1},{"length":14,"timeUnit":1},{"length":30,"timeUnit":1}]},{"inputColumn":"purchase_price","operation":8,"argMap":{},"windows":[{"length":3,"timeUnit":1},{"length":14,"timeUnit":1},{"length":30,"timeUnit":1}]},{"inputColumn":"purchase_price","operation":13,"argMap":{"k":"10"}}],"backfillStartDate":"2023-11-01"})
25/01/30 18:19:20 INFO MetadataStore: Putting 1 configs to KV Store, dataset=CHRONON_METADATA
25/01/30 18:19:20 INFO BigTableKVStoreImpl: Performing multi-put for 1 requests
25/01/30 18:19:21 INFO MetadataStore: Putting metadata for
dataset: CHRONON_ENTITY_BY_TEAM
key: group_bys/quickstart
conf: List(purchases.v1_test)
25/01/30 18:19:21 INFO MetadataStore: Putting 1 configs to KV Store, dataset=CHRONON_ENTITY_BY_TEAM
25/01/30 18:19:21 INFO BigTableKVStoreImpl: Performing multi-put for 1 requests
25/01/30 18:19:21 INFO Driver$MetadataUploader$: Uploaded Chronon Configs to the KV store, success count = 2, failure count = 0
Job [ac577d8c-95e2-4dda-a863-2d9fb94f022f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/ac577d8c-95e2-4dda-a863-2d9fb94f022f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/ac577d8c-95e2-4dda-a863-2d9fb94f022f/driveroutput
jobUuid: ac577d8c-95e2-4dda-a863-2d9fb94f022f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: ac577d8c-95e2-4dda-a863-2d9fb94f022f
  projectId: canary-443022
sparkJob:
  args:
  - metadata-upload
  - --conf-path=purchases.v1_test
  - --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  - --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:19:22.536103Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:19:11.138458Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:19:11.162118Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:19:11.387274Z'
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe ac577d8c-95e2-4dda-a863-2d9fb94f022f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................FETCH.....................................>>>>>\033[0m'
<<<<<.....................................FETCH.....................................>>>>>
+ touch tmp_fetch.out
+ zipline run --mode fetch --type group-by --name quickstart/purchases.v1_test -k '{"user_id":"5"}'
+ tee /dev/tty tmp_fetch.out
+ grep -q purchase_price_average_14d
Running with args: {'mode': 'fetch', 'conf': None, 'env': 'dev', 'dataproc': False, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
--- [FETCHED RESULT] ---
{
  "purchase_price_average_14d" : 72.5,
  "purchase_price_average_30d" : 250.6,
  "purchase_price_average_3d" : null,
  "purchase_price_count_14d" : 2,
  "purchase_price_count_30d" : 5,
  "purchase_price_count_3d" : null,
  "purchase_price_last10" : [ 76, 69, 367, 466, 275 ],
  "purchase_price_sum_14d" : 145,
  "purchase_price_sum_30d" : 1253,
  "purchase_price_sum_3d" : null
}
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_fetch
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp/target/scala-2.12/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.spark.Driver fetch --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --type group-by --name quickstart/purchases.v1_test -k {"user_id":"5"}
+ cat tmp_fetch.out
+ grep purchase_price_average_14d
  "purchase_price_average_14d" : 72.5,
+ '[' 0 -ne 0 ']'
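
The tail of the trace above is the actual assertion of the run: the fetch output is captured and checked for the expected feature column. Roughly (a sketch mirroring the trace; the exact error handling in run_zipline_quickstart.sh may differ):

```
# Fetch online features for one key and assert the expected column came back.
zipline run --mode fetch --type group-by --name quickstart/purchases.v1_test -k '{"user_id":"5"}' \
  | tee /dev/tty tmp_fetch.out | grep -q purchase_price_average_14d
# Fail the run if the column is missing from the captured output.
grep purchase_price_average_14d tmp_fetch.out || exit 1
```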

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features

    • Added a Python script to automate Zipline quickstart setup.
    • Introduced a Bash script for managing Zipline workflow and Google Cloud Dataproc jobs.
    • Enhanced error handling and logging in the Bash script for improved visibility during execution.
    • Implemented automated data processing and configuration management.
  • Chores

    • Set up environment preparation and job execution utilities for Zipline workflow.

@coderabbitai
Contributor

coderabbitai bot commented Jan 29, 2025

Walkthrough

The pull request introduces two new scripts, run_zipline_quickstart.py and run_zipline_quickstart.sh, designed to streamline the setup and execution of Zipline workflows. The Python script manages a temporary directory and launches a Bash script that automates Zipline configuration, data processing, and Google Cloud Dataproc job management.
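
The Dataproc job management mentioned here boils down to a small idiom visible in the trace in the PR description: scrape the submitter job id from the captured output, block on gcloud dataproc jobs wait, then confirm the recorded state. A rough sketch (the helper name mirrors check_dataproc_job_state(); the real function in run_zipline_quickstart.sh may be structured differently):

```
# Sketch: wait for a Dataproc job submitted via `zipline run ... --dataproc` and verify it finished.
check_dataproc_job_state() {
  local job_id=$1
  # An empty id means the submission output never printed one; treat as failure.
  [ -z "$job_id" ] && { echo "Missing Dataproc job id" >&2; exit 1; }
  # Blocking call; returns once the job reaches a terminal state.
  gcloud dataproc jobs wait "$job_id" --region=us-central1
  # Confirm the recorded state on the job resource.
  local state
  state=$(gcloud dataproc jobs describe "$job_id" --region=us-central1 --format=flattened | grep status.state:)
  echo "$state"
  if [ -z "$state" ]; then
    echo "Dataproc job $job_id did not report a state" >&2
    exit 1
  fi
}

# The id itself is scraped from the captured zipline output, e.g.:
#   JOB_ID=$(grep 'Dataproc submitter job id' tmp_upload_to_kv.out | cut -d ' ' -f5)
#   check_dataproc_job_state "$JOB_ID"
```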

Changes

  • distribution/run_zipline_quickstart.py: Added main() function to create a temp directory and execute the shell script
  • distribution/run_zipline_quickstart.sh: Added check_dataproc_job_state() function for job state management; implemented Zipline workflow automation

Possibly related PRs

Suggested reviewers

  • chewy-zlai
  • tchow-zlai

Poem

🚀 Zipline's quickstart dance begins,
Temp dirs and scripts, where magic spins,
Cloud jobs flow like river's might,
Automation takes its playful flight!
Code's symphony, a joyful grin 🌈

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.



@tchow-zlai tchow-zlai changed the base branch from main to tchow/avro-date January 30, 2025 01:31
@tchow-zlai tchow-zlai force-pushed the davidhan/zipline_integration_script branch from 6d35a1b to ee99668 Compare January 30, 2025 01:31
Base automatically changed from tchow/avro-date to main January 30, 2025 02:01
@tchow-zlai tchow-zlai force-pushed the davidhan/zipline_integration_script branch from ee99668 to 939858b Compare January 30, 2025 17:20
@david-zlai david-zlai force-pushed the davidhan/zipline_integration_script branch from 939858b to c44606c Compare January 30, 2025 18:28
@david-zlai david-zlai marked this pull request as ready for review January 30, 2025 18:29
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (5)
distribution/run_zipline_quickstart.py (1)

10-11: Improve line continuation style.

Use parentheses for line continuation.

-        quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__))
-                                     , "run_zipline_quickstart.sh")
+        quickstart_sh = os.path.join(
+            os.path.dirname(os.path.realpath(__file__)),
+            "run_zipline_quickstart.sh"
+        )
distribution/run_zipline_quickstart.sh (4)

10-11: Remove unused color variable.

RED variable is defined but never used.

 GREEN='\033[0;32m'
-RED='\033[0;31m'
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


25-25: Fix environment variable assignments.

Split declaration and assignment to avoid masking return values.

-  export SPARK_HOME=$(pwd)/spark-3.5.4-bin-hadoop3
+  SPARK_HOME=$(pwd)/spark-3.5.4-bin-hadoop3
+  export SPARK_HOME
-export PYTHONPATH="${PYTHONPATH}:$(pwd)"
+PYTHONPATH_NEW="${PYTHONPATH}:$(pwd)"
+export PYTHONPATH="$PYTHONPATH_NEW"

Also applies to: 47-47

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 25-25: Declare and assign separately to avoid masking return values.

(SC2155)


13-13: Extract configuration values.

Move hard-coded values to configuration variables at the top.

+# Configuration
+WHEEL_VERSION="0.1.0.dev0"
+PROJECT_ID="canary-443022"
+DATASET="data"
+GIT_BRANCH="davidhan/canary"
+
-WHEEL_FILE="zipline_ai-0.1.0.dev0-py3-none-any.whl"
+WHEEL_FILE="zipline_ai-${WHEEL_VERSION}-py3-none-any.whl"
-bq rm -f -t canary-443022:data.quickstart_purchases_v1_test
-bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload
+bq rm -f -t "${PROJECT_ID}:${DATASET}.quickstart_purchases_v1_test"
+bq rm -f -t "${PROJECT_ID}:${DATASET}.quickstart_purchases_v1_test_upload"
-git fetch origin davidhan/canary
-git checkout davidhan/canary
+git fetch origin "${GIT_BRANCH}"
+git checkout "${GIT_BRANCH}"

Also applies to: 16-17, 33-34


18-18: Address TODO comment about bigtable rows.

Implement bigtable row deletion or create a tracking issue.

Would you like me to help implement the bigtable row deletion or create a tracking issue?
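
One possible shape for that cleanup, assuming the cbt CLI is available and that clearing the entire test table in the canary project is acceptable (a prefix-scoped delete would instead go through the Bigtable Admin API's DropRowRange):

```
# Hypothetical cleanup for the TODO: drop every row in the GROUPBY_BATCH test table.
cbt -project canary-443022 -instance zipline-canary-instance deleteallrows GROUPBY_BATCH
```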

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between d0ccf56 and c44606c.

📒 Files selected for processing (2)
  • distribution/run_zipline_quickstart.py (1 hunks)
  • distribution/run_zipline_quickstart.sh (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart.sh

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 25-25: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 30-30: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 47-47: Declare and assign separately to avoid masking return values.

(SC2155)

Comment on lines +6 to +16
def main():
with tempfile.TemporaryDirectory() as temp_dir:
print(f"Created temporary directory: {temp_dir}")

quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__))
, "run_zipline_quickstart.sh")

# run the bash script run_zipline_quickstart.sh subprocess command
# with the temporary directory as the argument
subprocess.run([f"bash {quickstart_sh} {temp_dir}"], shell=True)


🛠️ Refactor suggestion

Add error handling for subprocess execution.

Add try-except block to handle subprocess failures gracefully.

 def main():
     with tempfile.TemporaryDirectory() as temp_dir:
         print(f"Created temporary directory: {temp_dir}")
 
         quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__)),
                                    "run_zipline_quickstart.sh")
 
-        subprocess.run([f"bash {quickstart_sh} {temp_dir}"], shell=True)
+        try:
+            subprocess.run(["bash", quickstart_sh, temp_dir], check=True)
+        except subprocess.CalledProcessError as e:
+            print(f"Script execution failed with exit code {e.returncode}")
+            raise



WORKING_DIR=$1
cd $WORKING_DIR

⚠️ Potential issue

Add error handling for cd commands.

Add error handling for directory changes.

-cd $WORKING_DIR
+cd "$WORKING_DIR" || exit 1
-cd cananry-confs
+cd cananry-confs || exit 1

Also applies to: 30-30

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

@david-zlai david-zlai requested a review from tchow-zlai January 30, 2025 18:38
@david-zlai david-zlai force-pushed the davidhan/zipline_integration_script branch from 60ab641 to ca1f98d Compare January 31, 2025 02:53
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
distribution/run_zipline_quickstart.sh (3)

11-11: Remove unused RED color variable.

The RED color variable is defined but never used.

-RED='\033[0;31m'
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


18-18: Address TODO comment.

The TODO comment about deleting bigtable rows needs to be implemented.

Would you like me to help implement the bigtable row deletion?


39-39: Declare and assign PYTHONPATH separately.

Split declaration and assignment to avoid masking return values.

-export PYTHONPATH="${PYTHONPATH}:$(pwd)"
+old_path="${PYTHONPATH}"
+export PYTHONPATH="${old_path}:$(pwd)"
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 39-39: Declare and assign separately to avoid masking return values.

(SC2155)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 60ab641 and ca1f98d.

📒 Files selected for processing (2)
  • distribution/run_zipline_quickstart.py (1 hunks)
  • distribution/run_zipline_quickstart.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • distribution/run_zipline_quickstart.py
🧰 Additional context used
📓 Learnings (1)
distribution/run_zipline_quickstart.sh (1)
Learnt from: david-zlai
PR: zipline-ai/chronon#292
File: distribution/run_zipline_quickstart.sh:72-75
Timestamp: 2025-01-30T19:37:27.646Z
Learning: The `gcloud dataproc jobs wait` command is a blocking call that waits for job completion, so adding an external timeout wrapper is unnecessary.
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart.sh

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 39-39: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (5)
distribution/run_zipline_quickstart.sh (5)

8-8: Add error handling for cd command.

Add error handling for directory change.

-cd $WORKING_DIR
+cd "$WORKING_DIR" || exit 1
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


22-22: Add error handling for cd command.

Add error handling for directory change.

-cd cananry-confs
+cd cananry-confs || exit 1
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


42-56: LGTM! Job state checking implementation is solid.

Good error handling and proper use of gcloud dataproc jobs wait.


60-97: LGTM! Job execution flow is well-structured.

Good use of logging and proper job state verification.


99-99: LGTM! Clear success indication.

#TODO: delete bigtable rows

# Clone the cananry configs
git clone [email protected]:zipline-ai/cananry-confs.git
Collaborator


maybe add an rm -rf cananry-confs before this? otherwise, if it already exists, you just cd into it and git fetch

Contributor Author


ah so the way the integration test should run is via the python file, which creates the temp directory first; that directory is then used for the full installation and runs.

The temp directory should have nothing in it.
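
For completeness, the guard being suggested would look roughly like this if the script ever had to tolerate a reused working directory (not needed today, since the wrapper always hands the script a fresh temp directory):

```
# Hypothetical guard before cloning, in case the working directory is reused.
rm -rf cananry-confs
git clone [email protected]:zipline-ai/cananry-confs.git
```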

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
distribution/run_zipline_quickstart.sh (2)

8-8: ⚠️ Potential issue

Add error handling for directory change.

-cd $WORKING_DIR
+cd "$WORKING_DIR" || exit 1
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


22-22: ⚠️ Potential issue

Add error handling for directory change.

-cd cananry-confs
+cd cananry-confs || exit 1
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

🧹 Nitpick comments (3)
distribution/run_zipline_quickstart.sh (3)

11-11: Remove unused RED variable.

The RED color variable is defined but never used.

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


18-18: Implement bigtable rows deletion.

I can help implement the missing bigtable cleanup functionality.


39-39: Separate PYTHONPATH declaration and assignment.

-export PYTHONPATH="${PYTHONPATH}:$(pwd)"
+pwd_dir=$(pwd)
+export PYTHONPATH="${PYTHONPATH}:${pwd_dir}"
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 39-39: Declare and assign separately to avoid masking return values.

(SC2155)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between ca1f98d and 28e8889.

📒 Files selected for processing (1)
  • distribution/run_zipline_quickstart.sh (1 hunks)
🧰 Additional context used
📓 Learnings (1)
distribution/run_zipline_quickstart.sh (1)
Learnt from: david-zlai
PR: zipline-ai/chronon#292
File: distribution/run_zipline_quickstart.sh:72-75
Timestamp: 2025-01-30T19:37:27.646Z
Learning: The `gcloud dataproc jobs wait` command is a blocking call that waits for job completion, so adding an external timeout wrapper is unnecessary.
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart.sh

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 39-39: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (3)
distribution/run_zipline_quickstart.sh (3)

42-56: LGTM! Job state checking looks good.

The implementation correctly uses gcloud's built-in blocking wait.


60-97: LGTM! Command execution flow is well structured.

Good error handling and output verification.


99-99: LGTM! Clear success indication.

@david-zlai david-zlai merged commit 4fef52c into main Feb 1, 2025
4 checks passed
@david-zlai david-zlai deleted the davidhan/zipline_integration_script branch February 1, 2025 00:03
nikhil-zlai pushed a commit that referenced this pull request Feb 4, 2025
## Summary

```
python distribution/run_zipline_quickstart.py
```

This runs the full zipline suite of commands against a test quickstart
groupby.

Example:
```
davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py 
Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ GREEN='\033[0;32m'
+ RED='\033[0;31m'
+ WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload
+ '[' -z '' ']'
+ wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
--2025-01-30 10:16:21--  https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132
Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 400879762 (382M) [application/x-gzip]
Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’

spark-3.5.4-bin-hadoop3.tgz                                 100%[==========================================================================================================================================>] 382.31M  50.2MB/s    in 8.4s    

2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762]

+ tar -xzf spark-3.5.4-bin-hadoop3.tgz
++ pwd
+ export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ git clone [email protected]:zipline-ai/cananry-confs.git
Cloning into 'cananry-confs'...
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0)
Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done.
Resolving deltas: 100% (63/63), done.
+ cd cananry-confs
+ git fetch origin davidhan/canary
From github.com:zipline-ai/cananry-confs
 * branch            davidhan/canary -> FETCH_HEAD
+ git checkout davidhan/canary
branch 'davidhan/canary' set up to track 'origin/davidhan/canary'.
Switched to a new branch 'davidhan/canary'
+ python3 -m venv tmp_chronon
+ source tmp_chronon/bin/activate
++ deactivate nondestructive
++ '[' -n '' ']'
++ '[' -n '' ']'
++ hash -r
++ '[' -n '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ case "$(uname)" in
+++ uname
++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ export PATH
++ VIRTUAL_ENV_PROMPT=tmp_chronon
++ export VIRTUAL_ENV_PROMPT
++ '[' -n '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(tmp_chronon) '
++ export PS1
++ hash -r
+ gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl .
Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl
  Completed files 1/1 | 371.1kiB/371.1kiB                                                                                                                                                                                                     
+ pip uninstall zipline-ai
WARNING: Skipping zipline-ai as it is not installed.
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl
Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl
Collecting click (from zipline-ai==0.1.0.dev0)
  Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0)
  Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl
Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0)
  Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB)
Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB)
Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB)
Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB)
Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_crc32c-1.6.0-py3-none-any.whl
Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0)
  Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB)
Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB)
Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB)
Using cached click-8.1.8-py3-none-any.whl (98 kB)
Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB)
Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB)
Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB)
Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB)
Using cached certifi-2024.12.14-py3-none-any.whl (164 kB)
Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB)
Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB)
Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Using cached urllib3-2.3.0-py3-none-any.whl (128 kB)
Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB)
Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai
Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0

[notice] A new release of pip is available: 24.2 -> 25.0
[notice] To update, run: pip install --upgrade pip
++ pwd
+ export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id'
+ echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m'
<<<<<.....................................COMPILE.....................................>>>>>
+ zipline compile --conf=group_bys/quickstart/purchases.py
  Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
     Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1_test
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production
+ echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m'
<<<<<.....................................BACKFILL.....................................>>>>>
+ touch tmp_backfill.out
+ zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc
+ tee /dev/tty tmp_backfill.out
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b

Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_backfill.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b
+ JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']'
+ gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1
Waiting for job output...
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse
25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:16:51 INFO Configuration: resource-types.xml not found
25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011
25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:16:55 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions.
2025/01/30 18:17:15 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:15 INFO  TableUtils.scala:622 - 
Unfilled range computation:
   Output table: canary-443022.data.quickstart_purchases_v1_test
   Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024
-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Input tables: data.purchases
   Missing input partitions: [2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-
10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30]
   Unfilled ranges: [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30])
2025/01/30 18:17:15 INFO  GroupBy.scala:738 - Group By ranges to compute: 
    [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1]
2025/01/30 18:17:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:17:20 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:20 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-11-01...2023-11-30]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-11-30]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-11-30]
   intersected range: [2023-11-01...2023-11-30]

2025/01/30 18:17:20 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:17:20 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-11-30])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:17:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-11-30'

2025/01/30 18:17:20 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-11-01 00:00:00
2025/01/30 18:17:20 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:17:22 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:17:25 INFO  TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:25 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition
2025/01/30 18:17:25 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:17:33 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:33 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33
2025/01/30 18:17:33 INFO  GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30]
2025/01/30 18:17:33 INFO  GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30]
Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput
jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: 945d836f-20d8-4768-97fb-0889c00ed87b
  projectId: canary-443022
sparkJob:
  args:
  - group-by-backfill
  - --conf-path=purchases.v1_test
  - --end-date=2025-01-30
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:17:38.722934Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:16:43.326557Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:16:43.353624Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:16:43.597231Z'
yarnApplications:
- name: groupBy_quickstart.purchases.v1_test_backfill
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m'
<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>
+ touch tmp_gbu.out
+ zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc
+ tee /dev/tty tmp_gbu.out
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_gbu.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f
+ JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']'
+ gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1
Waiting for job output...
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse
25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:17:52 INFO Configuration: resource-types.xml not found
25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012
25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:17:56 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:17:57 INFO  GroupByUpload.scala:229 - 
GroupBy upload for: quickstart.quickstart.purchases.v1_test
Accuracy: SNAPSHOT
Data Model: Events

2025/01/30 18:17:57 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:14 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:14 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:14 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:14 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:14 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:14 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-12-01 00:00:00
2025/01/30 18:18:14 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:18:15 INFO  KvRdd.scala:102 - 
key schema:
  {
  "type" : "record",
  "name" : "Key",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "user_id",
    "type" : [ "null", "long" ],
    "doc" : ""
  } ]
}
value schema:
  {
  "type" : "record",
  "name" : "Value",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "purchase_price_sum_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_3d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_14d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_30d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_last10",
    "type" : [ "null", {
      "type" : "array",
      "items" : "long"
    } ],
    "doc" : ""
  } ]
}

2025/01/30 18:18:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:19 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:19 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:19 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:19 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:20 INFO  GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined.
2025/01/30 18:18:20 INFO  GroupByUpload.scala:188 - 
Built GroupByServingInfo for quickstart.purchases.v1_test:
table: data.purchases / data-model: Events
     keySchema: Success(struct<user_id:bigint>)
   valueSchema: Success(struct<purchase_price:bigint>)
mutationSchema: Failure(java.lang.NullPointerException)
   inputSchema: Failure(java.lang.NullPointerException)
selectedSchema: Success(struct<purchase_price:bigint>)
  streamSchema: Failure(java.lang.NullPointerException)

2025/01/30 18:18:20 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:18:24 INFO  TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:24 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition
2025/01/30 18:18:24 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:18:30 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:30 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30
Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput
jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c672008e-7380-4a82-a121-4bb0cb46503f
  projectId: canary-443022
sparkJob:
  args:
  - group-by-upload
  - --conf-path=purchases.v1_test
  - --end-date=2023-12-01
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
           {"qualifierString": "value", "fieldName": ""}
        ]
      }
   ]
}'''
) AS
SELECT
  CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey,
  value_bytes as cf,
  TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP
FROM canary-443022.data.quickstart_purchases_v1_test_upload
WHERE ds = '2023-12-01'

25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1
25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
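Each wrapped `zipline run ... --dataproc` step in the trace follows the same pattern: the output is `tee`'d to a temp file, the job id is extracted with `grep 'Dataproc submitter job id' | cut -d ' ' -f5`, and the id is handed to a `check_dataproc_job_state` helper whose body isn't expanded above. A minimal sketch of that helper, reconstructed only from the traced commands (the actual code in `distribution/run_zipline_quickstart.py` may differ):

```bash
# Sketch only -- reconstructed from the `set -x` trace above, not the actual helper.
check_dataproc_job_state() {
  JOB_ID=$1
  # The then-branch isn't visible in the trace; assuming it aborts the run.
  if [ -z "$JOB_ID" ]; then
    echo "No Dataproc job id to check" && exit 1
  fi
  # Block until the Dataproc job finishes, streaming its driver output.
  gcloud dataproc jobs wait "$JOB_ID" --region=us-central1
  echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
  # Pull the terminal state line (e.g. "status.state: DONE") out of the job description.
  JOB_STATE=$(gcloud dataproc jobs describe "$JOB_ID" --region=us-central1 --format=flattened | grep "status.state:")
  echo "$JOB_STATE"
  # Assumed failure branch: abort the quickstart run if no state came back.
  if [ -z "$JOB_STATE" ]; then
    exit 1
  fi
}
```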
@coderabbitai[bot] mentioned this pull request Feb 14, 2025
@coderabbitai[bot] mentioned this pull request Mar 28, 2025
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
           {"qualifierString": "value", "fieldName": ""}
        ]
      }
   ]
}'''
) AS
SELECT
  CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey,
  value_bytes as cf,
  TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP
FROM canary-443022.data.quickstart_purchases_v1_test_upload
WHERE ds = '2023-12-01'

25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1
25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
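For readers following along, the `EXPORT DATA` statement in the upload-to-kv log above writes each upload row to Bigtable with a row key made of the GroupBy's batch dataset name, a `#` separator, and the Avro-encoded key bytes, with the value bytes landing in column family `cf` under qualifier `value`. A minimal sketch of that rowkey layout is below; it is illustrative only, and `key_bytes` is a placeholder rather than real Avro output:

```python
# Sketch of the rowkey layout used by the EXPORT DATA statement above (illustrative).
batch_dataset = "QUICKSTART_PURCHASES_V1_TEST_BATCH"  # batch dataset name from the query
key_bytes = b"..."  # placeholder for the Avro-encoded key (user_id) bytes in key_bytes

rowkey = batch_dataset.encode("utf-8") + b"#" + key_bytes
# value_bytes is stored under column family "cf", qualifier "value";
# _CHANGE_TIMESTAMP (1701475200000 ms) is the end of the uploaded 2023-12-01 partition.
```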
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ export PATH
++ VIRTUAL_ENV_PROMPT=tmp_chronon
++ export VIRTUAL_ENV_PROMPT
++ '[' -n '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(tmp_chronon) '
++ export PS1
++ hash -r
+ gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl .
Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl
  Completed files 1/1 | 371.1kiB/371.1kiB                                                                                                                                                                                                     
+ pip uninstall zipline-ai
WARNING: Skipping zipline-ai as it is not installed.
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl
Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl
Collecting click (from zipline-ai==0.1.0.dev0)
  Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0)
  Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl
Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0)
  Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB)
Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB)
Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB)
Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB)
Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_crc32c-1.6.0-py3-none-any.whl
Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0)
  Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB)
Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB)
Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB)
Using cached click-8.1.8-py3-none-any.whl (98 kB)
Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB)
Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB)
Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB)
Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB)
Using cached certifi-2024.12.14-py3-none-any.whl (164 kB)
Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB)
Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB)
Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Using cached urllib3-2.3.0-py3-none-any.whl (128 kB)
Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB)
Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai
Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0

[notice] A new release of pip is available: 24.2 -> 25.0
[notice] To update, run: pip install --upgrade pip
++ pwd
+ export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id'
+ echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m'
<<<<<.....................................COMPILE.....................................>>>>>
+ zipline compile --conf=group_bys/quickstart/purchases.py
  Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
     Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1_test
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production
+ echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m'
<<<<<.....................................BACKFILL.....................................>>>>>
+ touch tmp_backfill.out
+ zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc
+ tee /dev/tty tmp_backfill.out
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b

Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_backfill.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b
+ JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']'
+ gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1
Waiting for job output...
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse
25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:16:51 INFO Configuration: resource-types.xml not found
25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011
25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:16:55 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions.
2025/01/30 18:17:15 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:15 INFO  TableUtils.scala:622 - 
Unfilled range computation:
   Output table: canary-443022.data.quickstart_purchases_v1_test
   Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024
-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Input tables: data.purchases
   Missing input partitions: [2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-
10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30]
   Unfilled ranges: [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30])
2025/01/30 18:17:15 INFO  GroupBy.scala:738 - Group By ranges to compute: 
    [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1]
2025/01/30 18:17:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:17:20 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:20 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-11-01...2023-11-30]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-11-30]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-11-30]
   intersected range: [2023-11-01...2023-11-30]

2025/01/30 18:17:20 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:17:20 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-11-30])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:17:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-11-30'

2025/01/30 18:17:20 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-11-01 00:00:00
2025/01/30 18:17:20 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:17:22 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:17:25 INFO  TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:25 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition
2025/01/30 18:17:25 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:17:33 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:33 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33
2025/01/30 18:17:33 INFO  GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30]
2025/01/30 18:17:33 INFO  GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30]
Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput
jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: 945d836f-20d8-4768-97fb-0889c00ed87b
  projectId: canary-443022
sparkJob:
  args:
  - group-by-backfill
  - --conf-path=purchases.v1_test
  - --end-date=2025-01-30
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:17:38.722934Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:16:43.326557Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:16:43.353624Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:16:43.597231Z'
yarnApplications:
- name: groupBy_quickstart.purchases.v1_test_backfill
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m'
<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>
+ touch tmp_gbu.out
+ zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc
+ tee /dev/tty tmp_gbu.out
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_gbu.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f
+ JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']'
+ gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1
Waiting for job output...
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse
25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:17:52 INFO Configuration: resource-types.xml not found
25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012
25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:17:56 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:17:57 INFO  GroupByUpload.scala:229 - 
GroupBy upload for: quickstart.quickstart.purchases.v1_test
Accuracy: SNAPSHOT
Data Model: Events

2025/01/30 18:17:57 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:14 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:14 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]
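(The "intersected range" above is just the overlap between the range the query needs and the partitions the source table actually has; since this GroupBy has no window, the query pulls every source partition up to its end date. A minimal sketch of that, not Chronon's actual code:)

```python
# Minimal sketch of the scan-range intersection logged above (illustrative, not Chronon's code).
# With an unbounded window, the GroupBy needs every source partition up to the query end date.
from datetime import date

def intersect_scan_range(query_end: date, source_start: date, source_end: date) -> tuple[date, date]:
    return source_start, min(source_end, query_end)

# Matches the log: query end 2023-12-01 over source data [2023-11-01...2023-12-01]
# yields the intersected range [2023-11-01...2023-12-01].
```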

2025/01/30 18:18:14 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:14 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:14 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:14 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-12-01 00:00:00
2025/01/30 18:18:14 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:18:15 INFO  KvRdd.scala:102 - 
key schema:
  {
  "type" : "record",
  "name" : "Key",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "user_id",
    "type" : [ "null", "long" ],
    "doc" : ""
  } ]
}
value schema:
  {
  "type" : "record",
  "name" : "Value",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "purchase_price_sum_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_3d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_14d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_30d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_last10",
    "type" : [ "null", {
      "type" : "array",
      "items" : "long"
    } ],
    "doc" : ""
  } ]
}
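The value schema above mirrors the aggregations defined on the quickstart purchases GroupBy: sums, counts and averages of `purchase_price` over 3/14/30-day windows plus the last 10 purchase prices per user. As a rough mental model only (the real computation runs in Spark and the exact window-boundary semantics may differ), the uploaded per-user snapshot could be approximated like this:

```python
# Illustrative pandas approximation of the snapshot aggregates named in the value schema
# above; assumes an events frame with user_id, ts (epoch millis) and purchase_price.
import pandas as pd

def snapshot_features(purchases: pd.DataFrame, end_ds: str) -> pd.DataFrame:
    end = pd.Timestamp(end_ds) + pd.Timedelta(days=1)  # treat end_ds as an inclusive day
    ts = pd.to_datetime(purchases["ts"], unit="ms")
    cols = {}
    for days in (3, 14, 30):
        in_window = purchases[(ts >= end - pd.Timedelta(days=days)) & (ts < end)]
        grouped = in_window.groupby("user_id")["purchase_price"]
        cols[f"purchase_price_sum_{days}d"] = grouped.sum()
        cols[f"purchase_price_count_{days}d"] = grouped.count()
        cols[f"purchase_price_average_{days}d"] = grouped.mean()
    cols["purchase_price_last10"] = (
        purchases[ts < end]
        .sort_values("ts")
        .groupby("user_id")["purchase_price"]
        .apply(lambda s: s.tail(10).tolist())
    )
    return pd.DataFrame(cols).reset_index()
```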

2025/01/30 18:18:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:19 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:19 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:19 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:19 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:20 INFO  GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined.
2025/01/30 18:18:20 INFO  GroupByUpload.scala:188 - 
Built GroupByServingInfo for quickstart.purchases.v1_test:
table: data.purchases / data-model: Events
     keySchema: Success(struct<user_id:bigint>)
   valueSchema: Success(struct<purchase_price:bigint>)
mutationSchema: Failure(java.lang.NullPointerException)
   inputSchema: Failure(java.lang.NullPointerException)
selectedSchema: Success(struct<purchase_price:bigint>)
  streamSchema: Failure(java.lang.NullPointerException)

2025/01/30 18:18:20 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:18:24 INFO  TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:24 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition
2025/01/30 18:18:24 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:18:30 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:30 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30
Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput
jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c672008e-7380-4a82-a121-4bb0cb46503f
  projectId: canary-443022
sparkJob:
  args:
  - group-by-upload
  - --conf-path=purchases.v1_test
  - --end-date=2023-12-01
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
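The job-id plumbing above is the same pattern for every step: pull the `Dataproc submitter job id` line out of the tee'd output, block on `gcloud dataproc jobs wait`, then read `status.state` back from `gcloud dataproc jobs describe`. A rough Python equivalent (function names are illustrative, not what the quickstart script actually defines):

```python
# Sketch of the shell plumbing traced above: extract the job id, wait, then read the state.
import re
import subprocess

def extract_dataproc_job_id(out_file: str) -> str:
    """Mirrors `grep 'Dataproc submitter job id' | cut -d ' ' -f5` on the tee'd output."""
    with open(out_file) as f:
        for line in f:
            if "Dataproc submitter job id" in line:
                return line.split()[4]  # 5th whitespace-separated field is the job id
    raise RuntimeError(f"no Dataproc job id found in {out_file}")

def check_dataproc_job_state(job_id: str, region: str = "us-central1") -> str:
    """Blocks until the job finishes, then returns the status.state that gcloud reports."""
    subprocess.run(["gcloud", "dataproc", "jobs", "wait", job_id, f"--region={region}"], check=True)
    described = subprocess.run(
        ["gcloud", "dataproc", "jobs", "describe", job_id, f"--region={region}", "--format=flattened"],
        check=True, capture_output=True, text=True,
    ).stdout
    match = re.search(r"status\.state:\s+(\S+)", described)
    return match.group(1) if match else "UNKNOWN"

# e.g. check_dataproc_job_state(extract_dataproc_job_id("tmp_gbu.out"))
```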
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
           {"qualifierString": "value", "fieldName": ""}
        ]
      }
   ]
}'''
) AS
SELECT
  CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey,
  value_bytes as cf,
  TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP
FROM canary-443022.data.quickstart_purchases_v1_test_upload
WHERE ds = '2023-12-01'

25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1
25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
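The bulk load is a plain BigQuery `EXPORT DATA` job with `format='CLOUD_BIGTABLE'` over the upload table: each row lands in Bigtable under a row key made of a per-GroupBy prefix (here `QUICKSTART_PURCHASES_V1_TEST_BATCH`), a `#` separator, and the Avro-encoded key bytes, with the value bytes stored under column family `cf`, qualifier `value`. A tiny illustration of the row-key layout implied by the query text (inferred from the query, not from the Chronon source):

```python
# Row-key layout implied by the EXPORT DATA query above.
def batch_row_key(dataset: str, key_bytes: bytes) -> bytes:
    """Reproduces CONCAT(CAST(CONCAT(dataset, '#') AS BYTES), key_bytes)."""
    return dataset.encode("utf-8") + b"#" + key_bytes

# e.g. rows for this upload land under keys like
# batch_row_key("QUICKSTART_PURCHASES_V1_TEST_BATCH", avro_encoded_user_id)
```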
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
## Summary

```
python distribution/run_zipline_quickstart.py
```

This runs the full zipline suite of commands against a test quickstart
groupby.

Example:
```
davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py 
Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ GREEN='\033[0;32m'
+ RED='\033[0;31m'
+ WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload
+ '[' -z '' ']'
+ wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
--2025-01-30 10:16:21--  https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132
Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 400879762 (382M) [application/x-gzip]
Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’

spark-3.5.4-bin-hadoop3.tgz                                 100%[==========================================================================================================================================>] 382.31M  50.2MB/s    in 8.4s    

2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762]

+ tar -xzf spark-3.5.4-bin-hadoop3.tgz
++ pwd
+ export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ git clone [email protected]:zipline-ai/cananry-confs.git
Cloning into 'cananry-confs'...
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0)
Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done.
Resolving deltas: 100% (63/63), done.
+ cd cananry-confs
+ git fetch origin davidhan/canary
From github.com:zipline-ai/cananry-confs
 * branch            davidhan/canary -> FETCH_HEAD
+ git checkout davidhan/canary
branch 'davidhan/canary' set up to track 'origin/davidhan/canary'.
Switched to a new branch 'davidhan/canary'
+ python3 -m venv tmp_chronon
+ source tmp_chronon/bin/activate
++ deactivate nondestructive
++ '[' -n '' ']'
++ '[' -n '' ']'
++ hash -r
++ '[' -n '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ case "$(uname)" in
+++ uname
++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ export PATH
++ VIRTUAL_ENV_PROMPT=tmp_chronon
++ export VIRTUAL_ENV_PROMPT
++ '[' -n '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(tmp_chronon) '
++ export PS1
++ hash -r
+ gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl .
Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl
  Completed files 1/1 | 371.1kiB/371.1kiB                                                                                                                                                                                                     
+ pip uninstall zipline-ai
WARNING: Skipping zipline-ai as it is not installed.
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl
Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl
Collecting click (from zipline-ai==0.1.0.dev0)
  Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0)
  Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl
Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0)
  Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB)
Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB)
Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB)
Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB)
Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_crc32c-1.6.0-py3-none-any.whl
Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0)
  Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB)
Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB)
Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB)
Using cached click-8.1.8-py3-none-any.whl (98 kB)
Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB)
Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB)
Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB)
Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB)
Using cached certifi-2024.12.14-py3-none-any.whl (164 kB)
Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB)
Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB)
Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Using cached urllib3-2.3.0-py3-none-any.whl (128 kB)
Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB)
Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai
Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0

[notice] A new release of pip is available: 24.2 -> 25.0
[notice] To update, run: pip install --upgrade pip
++ pwd
+ export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id'
+ echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m'
<<<<<.....................................COMPILE.....................................>>>>>
+ zipline compile --conf=group_bys/quickstart/purchases.py
  Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
     Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1_test
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production
+ echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m'
<<<<<.....................................BACKFILL.....................................>>>>>
+ touch tmp_backfill.out
+ zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc
+ tee /dev/tty tmp_backfill.out
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b

Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_backfill.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b
+ JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']'
+ gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1
Waiting for job output...
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse
25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:16:51 INFO Configuration: resource-types.xml not found
25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011
25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:16:55 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions.
2025/01/30 18:17:15 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:15 INFO  TableUtils.scala:622 - 
Unfilled range computation:
   Output table: canary-443022.data.quickstart_purchases_v1_test
   Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024
-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Input tables: data.purchases
   Missing input partitions: [2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-
10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30]
   Unfilled ranges: [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30])
2025/01/30 18:17:15 INFO  GroupBy.scala:738 - Group By ranges to compute: 
    [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1]
2025/01/30 18:17:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:17:20 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:20 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-11-01...2023-11-30]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-11-30]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-11-30]
   intersected range: [2023-11-01...2023-11-30]

2025/01/30 18:17:20 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:17:20 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-11-30])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:17:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-11-30'

2025/01/30 18:17:20 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-11-01 00:00:00
2025/01/30 18:17:20 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:17:22 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:17:25 INFO  TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:25 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition
2025/01/30 18:17:25 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:17:33 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:33 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33
2025/01/30 18:17:33 INFO  GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30]
2025/01/30 18:17:33 INFO  GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30]
Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput
jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: 945d836f-20d8-4768-97fb-0889c00ed87b
  projectId: canary-443022
sparkJob:
  args:
  - group-by-backfill
  - --conf-path=purchases.v1_test
  - --end-date=2025-01-30
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:17:38.722934Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:16:43.326557Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:16:43.353624Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:16:43.597231Z'
yarnApplications:
- name: groupBy_quickstart.purchases.v1_test_backfill
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
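
Aside, not from the run output: each stage of the quickstart follows the same check seen in the trace above — capture the "Dataproc submitter job id" line, block on `gcloud dataproc jobs wait`, then verify the terminal state via `describe --format=flattened`. A minimal Python sketch of that pattern (an illustration, not the quickstart script's actual code):

```python
import re
import subprocess

def check_dataproc_job_state(zipline_output: str, region: str = "us-central1") -> None:
    # Pull the job id out of the captured `zipline run` output, as the
    # `grep 'Dataproc submitter job id' | cut -d ' ' -f5` in the trace does.
    match = re.search(r"Dataproc submitter job id: (\S+)", zipline_output)
    if match is None:
        raise RuntimeError("No Dataproc submitter job id found")
    job_id = match.group(1)

    # Block until the job finishes, streaming its driver output (same as the trace).
    subprocess.run(["gcloud", "dataproc", "jobs", "wait", job_id, f"--region={region}"], check=True)

    # Confirm the terminal state, mirroring `describe ... --format=flattened | grep status.state:`.
    described = subprocess.run(
        ["gcloud", "dataproc", "jobs", "describe", job_id, f"--region={region}", "--format=flattened"],
        capture_output=True, text=True, check=True,
    ).stdout
    state_line = next(line for line in described.splitlines() if line.startswith("status.state:"))
    print(state_line)
    if "DONE" not in state_line:
        raise RuntimeError(f"Dataproc job {job_id} did not reach DONE: {state_line}")
```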
+ echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m'
<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>
+ touch tmp_gbu.out
+ zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc
+ tee /dev/tty tmp_gbu.out
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_gbu.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f
+ JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']'
+ gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1
Waiting for job output...
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse
25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:17:52 INFO Configuration: resource-types.xml not found
25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012
25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:17:56 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:17:57 INFO  GroupByUpload.scala:229 - 
GroupBy upload for: quickstart.quickstart.purchases.v1_test
Accuracy: SNAPSHOT
Data Model: Events

2025/01/30 18:17:57 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:14 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:14 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:14 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:14 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:14 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:14 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-12-01 00:00:00
2025/01/30 18:18:14 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:18:15 INFO  KvRdd.scala:102 - 
key schema:
  {
  "type" : "record",
  "name" : "Key",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "user_id",
    "type" : [ "null", "long" ],
    "doc" : ""
  } ]
}
value schema:
  {
  "type" : "record",
  "name" : "Value",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "purchase_price_sum_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_3d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_14d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_30d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_last10",
    "type" : [ "null", {
      "type" : "array",
      "items" : "long"
    } ],
    "doc" : ""
  } ]
}
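
Aside, not from the run output: the value-schema fields above appear to follow the `<input_column>_<operation>_<window>` naming for the windowed aggregations (sum/count/average over 3d/14d/30d), plus `purchase_price_last10` for the last-10 aggregation. A small illustrative parse of those names:

```python
# Field names copied from the value schema above; illustrative only.
fields = [
    "purchase_price_sum_3d", "purchase_price_sum_14d", "purchase_price_sum_30d",
    "purchase_price_count_3d", "purchase_price_count_14d", "purchase_price_count_30d",
    "purchase_price_average_3d", "purchase_price_average_14d", "purchase_price_average_30d",
]
for name in fields:
    column, operation, window = name.rsplit("_", 2)
    print(f"{column:16} {operation:8} {window}")
# purchase_price_last10 carries no window suffix: it is the unwindowed last-10 aggregation.
```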

2025/01/30 18:18:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:19 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:19 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:19 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:19 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:20 INFO  GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined.
2025/01/30 18:18:20 INFO  GroupByUpload.scala:188 - 
Built GroupByServingInfo for quickstart.purchases.v1_test:
table: data.purchases / data-model: Events
     keySchema: Success(struct<user_id:bigint>)
   valueSchema: Success(struct<purchase_price:bigint>)
mutationSchema: Failure(java.lang.NullPointerException)
   inputSchema: Failure(java.lang.NullPointerException)
selectedSchema: Success(struct<purchase_price:bigint>)
  streamSchema: Failure(java.lang.NullPointerException)

2025/01/30 18:18:20 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:18:24 INFO  TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:24 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition
2025/01/30 18:18:24 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:18:30 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:30 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30
Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput
jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c672008e-7380-4a82-a121-4bb0cb46503f
  projectId: canary-443022
sparkJob:
  args:
  - group-by-upload
  - --conf-path=purchases.v1_test
  - --end-date=2023-12-01
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
           {"qualifierString": "value", "fieldName": ""}
        ]
      }
   ]
}'''
) AS
SELECT
  CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey,
  value_bytes as cf,
  TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP
FROM canary-443022.data.quickstart_purchases_v1_test_upload
WHERE ds = '2023-12-01'
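
Aside, not from the run output: the EXPORT DATA statement above writes each upload row into the GROUPBY_BATCH table keyed as `<DATASET>#<key_bytes>` (here the dataset prefix is QUICKSTART_PURCHASES_V1_TEST_BATCH), with the value bytes under column family `cf` / qualifier `value`, and the cell timestamp pinned to 1701475200000 ms, i.e. 2023-12-02 00:00:00 UTC — the end of the uploaded 2023-12-01 partition. A rough Python sketch of the row-key shape (illustrative; `key_bytes` is a placeholder for the real Avro-encoded key):

```python
from datetime import datetime, timezone

# Illustrative only; key_bytes stands in for the Avro-encoded key written by the upload job.
dataset = "QUICKSTART_PURCHASES_V1_TEST_BATCH"
key_bytes = b"\x00\x02\x86\xaf"

row_key = dataset.encode("utf-8") + b"#" + key_bytes

# 1701475200000 ms == 2023-12-02 00:00:00 UTC, the end of the 2023-12-01 partition.
cell_ts = datetime.fromtimestamp(1_701_475_200_000 / 1000, tz=timezone.utc)
print(row_key, cell_ts.isoformat())
```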

25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1
25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30]
   Unfilled ranges: [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30])
2025/01/30 18:17:15 INFO  GroupBy.scala:738 - Group By ranges to compute: 
    [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1]
2025/01/30 18:17:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:17:20 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:20 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-11-01...2023-11-30]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-11-30]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-11-30]
   intersected range: [2023-11-01...2023-11-30]

2025/01/30 18:17:20 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:17:20 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-11-30])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:17:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-11-30'

2025/01/30 18:17:20 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-11-01 00:00:00
2025/01/30 18:17:20 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:17:22 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:17:25 INFO  TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:25 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition
2025/01/30 18:17:25 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:17:33 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:33 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33
2025/01/30 18:17:33 INFO  GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30]
2025/01/30 18:17:33 INFO  GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30]
Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput
jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: 945d836f-20d8-4768-97fb-0889c00ed87b
  projectId: canary-443022
sparkJob:
  args:
  - group-by-backfill
  - --conf-path=purchases.v1_test
  - --end-date=2025-01-30
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:17:38.722934Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:16:43.326557Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:16:43.353624Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:16:43.597231Z'
yarnApplications:
- name: groupBy_quickstart.purchases.v1_test_backfill
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m'
<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>
+ touch tmp_gbu.out
+ zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc
+ tee /dev/tty tmp_gbu.out
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_gbu.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f
+ JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']'
+ gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1
Waiting for job output...
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse
25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:17:52 INFO Configuration: resource-types.xml not found
25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012
25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:17:56 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:17:57 INFO  GroupByUpload.scala:229 - 
GroupBy upload for: quickstart.quickstart.purchases.v1_test
Accuracy: SNAPSHOT
Data Model: Events

2025/01/30 18:17:57 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:14 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:14 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:14 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:14 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:14 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:14 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-12-01 00:00:00
2025/01/30 18:18:14 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:18:15 INFO  KvRdd.scala:102 - 
key schema:
  {
  "type" : "record",
  "name" : "Key",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "user_id",
    "type" : [ "null", "long" ],
    "doc" : ""
  } ]
}
value schema:
  {
  "type" : "record",
  "name" : "Value",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "purchase_price_sum_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_3d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_14d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_30d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_last10",
    "type" : [ "null", {
      "type" : "array",
      "items" : "long"
    } ],
    "doc" : ""
  } ]
}

2025/01/30 18:18:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:19 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:19 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:19 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:19 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:20 INFO  GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined.
2025/01/30 18:18:20 INFO  GroupByUpload.scala:188 - 
Built GroupByServingInfo for quickstart.purchases.v1_test:
table: data.purchases / data-model: Events
     keySchema: Success(struct<user_id:bigint>)
   valueSchema: Success(struct<purchase_price:bigint>)
mutationSchema: Failure(java.lang.NullPointerException)
   inputSchema: Failure(java.lang.NullPointerException)
selectedSchema: Success(struct<purchase_price:bigint>)
  streamSchema: Failure(java.lang.NullPointerException)

2025/01/30 18:18:20 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:18:24 INFO  TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:24 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition
2025/01/30 18:18:24 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:18:30 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:30 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30
Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput
jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c672008e-7380-4a82-a121-4bb0cb46503f
  projectId: canary-443022
sparkJob:
  args:
  - group-by-upload
  - --conf-path=purchases.v1_test
  - --end-date=2023-12-01
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
           {"qualifierString": "value", "fieldName": ""}
        ]
      }
   ]
}'''
) AS
SELECT
  CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey,
  value_bytes as cf,
  TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP
FROM canary-443022.data.quickstart_purchases_v1_test_upload
WHERE ds = '2023-12-01'

25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1
25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
chewy-zlai pushed a commit that referenced this pull request May 16, 2025
## Summary

```
python distribution/run_zipline_quickstart.py
```

This runs the full zipline suite of commands against a test quickstart
groupby.

Example:
```
davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py 
Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ GREEN='\033[0;32m'
+ RED='\033[0;31m'
+ WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload
+ '[' -z '' ']'
+ wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
--2025-01-30 10:16:21--  https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132
Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 400879762 (382M) [application/x-gzip]
Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’

spark-3.5.4-bin-hadoop3.tgz                                 100%[==========================================================================================================================================>] 382.31M  50.2MB/s    in 8.4s    

2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762]

+ tar -xzf spark-3.5.4-bin-hadoop3.tgz
++ pwd
+ export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ git clone [email protected]:zipline-ai/cananry-confs.git
Cloning into 'cananry-confs'...
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0)
Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done.
Resolving deltas: 100% (63/63), done.
+ cd cananry-confs
+ git fetch origin davidhan/canary
From github.com:zipline-ai/cananry-confs
 * branch            davidhan/canary -> FETCH_HEAD
+ git checkout davidhan/canary
branch 'davidhan/canary' set up to track 'origin/davidhan/canary'.
Switched to a new branch 'davidhan/canary'
+ python3 -m venv tmp_chronon
+ source tmp_chronon/bin/activate
++ deactivate nondestructive
++ '[' -n '' ']'
++ '[' -n '' ']'
++ hash -r
++ '[' -n '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ case "$(uname)" in
+++ uname
++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ export PATH
++ VIRTUAL_ENV_PROMPT=tmp_chronon
++ export VIRTUAL_ENV_PROMPT
++ '[' -n '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(tmp_chronon) '
++ export PS1
++ hash -r
+ gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl .
Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl
  Completed files 1/1 | 371.1kiB/371.1kiB                                                                                                                                                                                                     
+ pip uninstall zipline-ai
WARNING: Skipping zipline-ai as it is not installed.
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl
Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl
Collecting click (from zipline-ai==0.1.0.dev0)
  Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0)
  Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl
Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0)
  Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB)
Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB)
Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB)
Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB)
Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_crc32c-1.6.0-py3-none-any.whl
Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0)
  Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB)
Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB)
Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB)
Using cached click-8.1.8-py3-none-any.whl (98 kB)
Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB)
Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB)
Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB)
Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB)
Using cached certifi-2024.12.14-py3-none-any.whl (164 kB)
Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB)
Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB)
Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Using cached urllib3-2.3.0-py3-none-any.whl (128 kB)
Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB)
Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai
Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0

[notice] A new release of pip is available: 24.2 -> 25.0
[notice] To update, run: pip install --upgrade pip
++ pwd
+ export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id'
+ echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m'
<<<<<.....................................COMPILE.....................................>>>>>
+ zipline compile --conf=group_bys/quickstart/purchases.py
  Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
     Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1_test
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production
+ echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m'
<<<<<.....................................BACKFILL.....................................>>>>>
+ touch tmp_backfill.out
+ zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc
+ tee /dev/tty tmp_backfill.out
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b

Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_backfill.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b
+ JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']'
+ gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1
Waiting for job output...
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse
25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:16:51 INFO Configuration: resource-types.xml not found
25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011
25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:16:55 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions.
2025/01/30 18:17:15 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:15 INFO  TableUtils.scala:622 - 
Unfilled range computation:
   Output table: canary-443022.data.quickstart_purchases_v1_test
   Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024
-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Input tables: data.purchases
   Missing input partitions: [2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-
10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30]
   Unfilled ranges: [2023-11-01...2023-11-30]
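For reference, the unfilled range the job settles on is just the set difference the log prints: partitions missing from the output table that the input can still supply. A rough Python sketch of that arithmetic (an illustration only, not Chronon's actual TableUtils code):

from datetime import date, timedelta

def date_range(start: str, end: str) -> list[str]:
    """All daily ds partitions between start and end, inclusive (yyyy-MM-dd)."""
    d, stop = date.fromisoformat(start), date.fromisoformat(end)
    days = []
    while d <= stop:
        days.append(d.isoformat())
        d += timedelta(days=1)
    return days

missing_output = date_range("2023-11-01", "2025-01-30")  # "Missing output partitions" above
missing_input = date_range("2023-12-01", "2025-01-30")   # "Missing input partitions" above

# Unfilled = missing from the output but present in the input.
unfilled = sorted(set(missing_output) - set(missing_input))
print(f"[{unfilled[0]}...{unfilled[-1]}]")  # [2023-11-01...2023-11-30]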

2025/01/30 18:17:15 INFO  GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30])
2025/01/30 18:17:15 INFO  GroupBy.scala:738 - Group By ranges to compute: 
    [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1]
2025/01/30 18:17:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:17:20 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:20 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-11-01...2023-11-30]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-11-30]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-11-30]
   intersected range: [2023-11-01...2023-11-30]

2025/01/30 18:17:20 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:17:20 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-11-30])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:17:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-11-30'

2025/01/30 18:17:20 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-11-01 00:00:00
2025/01/30 18:17:20 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:17:22 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:17:25 INFO  TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:25 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition
2025/01/30 18:17:25 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:17:33 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:33 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33
2025/01/30 18:17:33 INFO  GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30]
2025/01/30 18:17:33 INFO  GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30]
Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput
jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: 945d836f-20d8-4768-97fb-0889c00ed87b
  projectId: canary-443022
sparkJob:
  args:
  - group-by-backfill
  - --conf-path=purchases.v1_test
  - --end-date=2025-01-30
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:17:38.722934Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:16:43.326557Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:16:43.353624Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:16:43.597231Z'
yarnApplications:
- name: groupBy_quickstart.purchases.v1_test_backfill
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
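The job-id extraction and status check above repeat for each mode. A hypothetical Python equivalent of the check_dataproc_job_state helper the trace is running (the real logic lives in distribution/run_zipline_quickstart.py and may differ):

import subprocess
import sys

def check_dataproc_job_state(job_id: str, region: str = "us-central1") -> None:
    # Mirrors the trace: bail out if no job id was parsed from the submitter output.
    if not job_id:
        sys.exit("No Dataproc submitter job id found")
    # Block until the job finishes, streaming driver output to the console.
    subprocess.run(["gcloud", "dataproc", "jobs", "wait", job_id, f"--region={region}"], check=True)
    # Read back the terminal state, like `describe --format=flattened | grep status.state:`.
    describe = subprocess.run(
        ["gcloud", "dataproc", "jobs", "describe", job_id, f"--region={region}", "--format=flattened"],
        capture_output=True, text=True, check=True,
    )
    state_lines = [line for line in describe.stdout.splitlines() if line.startswith("status.state:")]
    if not state_lines:
        sys.exit(f"Could not determine status for Dataproc job {job_id}")
    print(state_lines[0])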
+ echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m'
<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>
+ touch tmp_gbu.out
+ zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc
+ tee /dev/tty tmp_gbu.out
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_gbu.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f
+ JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']'
+ gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1
Waiting for job output...
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse
25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:17:52 INFO Configuration: resource-types.xml not found
25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012
25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:17:56 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:17:57 INFO  GroupByUpload.scala:229 - 
GroupBy upload for: quickstart.quickstart.purchases.v1_test
Accuracy: SNAPSHOT
Data Model: Events

2025/01/30 18:17:57 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:14 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:14 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:14 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:14 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:14 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:14 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-12-01 00:00:00
2025/01/30 18:18:14 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:18:15 INFO  KvRdd.scala:102 - 
key schema:
  {
  "type" : "record",
  "name" : "Key",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "user_id",
    "type" : [ "null", "long" ],
    "doc" : ""
  } ]
}
value schema:
  {
  "type" : "record",
  "name" : "Value",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "purchase_price_sum_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_3d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_14d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_30d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_last10",
    "type" : [ "null", {
      "type" : "array",
      "items" : "long"
    } ],
    "doc" : ""
  } ]
}

2025/01/30 18:18:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:19 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:19 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:19 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:19 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:20 INFO  GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined.
2025/01/30 18:18:20 INFO  GroupByUpload.scala:188 - 
Built GroupByServingInfo for quickstart.purchases.v1_test:
table: data.purchases / data-model: Events
     keySchema: Success(struct<user_id:bigint>)
   valueSchema: Success(struct<purchase_price:bigint>)
mutationSchema: Failure(java.lang.NullPointerException)
   inputSchema: Failure(java.lang.NullPointerException)
selectedSchema: Success(struct<purchase_price:bigint>)
  streamSchema: Failure(java.lang.NullPointerException)

2025/01/30 18:18:20 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:18:24 INFO  TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:24 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition
2025/01/30 18:18:24 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:18:30 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:30 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30
Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput
jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c672008e-7380-4a82-a121-4bb0cb46503f
  projectId: canary-443022
sparkJob:
  args:
  - group-by-upload
  - --conf-path=purchases.v1_test
  - --end-date=2023-12-01
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
       …