@david-zlai commented Jan 29, 2025

Summary

python distribution/run_zipline_quickstart.py

This runs the full Zipline suite of commands (compile, backfill, group-by upload, and upload-to-kv) against a test quickstart GroupBy.
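
For reference, here is a minimal sketch of how such a runner could be structured. This is an illustration only: the actual distribution/run_zipline_quickstart.py may differ, and the wrapped shell script name and its working-directory argument are assumptions inferred from the log below.

import subprocess
import tempfile

# Hypothetical sketch: create a scratch directory and drive a traced shell script
# that issues the zipline compile / run commands, mirroring the example output below.
QUICKSTART_SCRIPT = "distribution/run_zipline_quickstart.sh"  # assumed name, not confirmed

def run_quickstart():
    with tempfile.TemporaryDirectory() as working_dir:
        print(f"Created temporary directory: {working_dir}")
        # 'bash -x' echoes each command, which produces the '+ ...' lines in the example output.
        subprocess.run(["bash", "-x", QUICKSTART_SCRIPT, working_dir], check=True)

if __name__ == "__main__":
    run_quickstart()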

Example:

davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py 
Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ GREEN='\033[0;32m'
+ RED='\033[0;31m'
+ WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload
+ '[' -z '' ']'
+ wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
--2025-01-30 10:16:21--  https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132
Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 400879762 (382M) [application/x-gzip]
Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’

spark-3.5.4-bin-hadoop3.tgz                                 100%[==========================================================================================================================================>] 382.31M  50.2MB/s    in 8.4s    

2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762]

+ tar -xzf spark-3.5.4-bin-hadoop3.tgz
++ pwd
+ export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ git clone [email protected]:zipline-ai/cananry-confs.git
Cloning into 'cananry-confs'...
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0)
Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done.
Resolving deltas: 100% (63/63), done.
+ cd cananry-confs
+ git fetch origin davidhan/canary
From github.com:zipline-ai/cananry-confs
 * branch            davidhan/canary -> FETCH_HEAD
+ git checkout davidhan/canary
branch 'davidhan/canary' set up to track 'origin/davidhan/canary'.
Switched to a new branch 'davidhan/canary'
+ python3 -m venv tmp_chronon
+ source tmp_chronon/bin/activate
++ deactivate nondestructive
++ '[' -n '' ']'
++ '[' -n '' ']'
++ hash -r
++ '[' -n '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ case "$(uname)" in
+++ uname
++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ export PATH
++ VIRTUAL_ENV_PROMPT=tmp_chronon
++ export VIRTUAL_ENV_PROMPT
++ '[' -n '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(tmp_chronon) '
++ export PS1
++ hash -r
+ gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl .
Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl
  Completed files 1/1 | 371.1kiB/371.1kiB                                                                                                                                                                                                     
+ pip uninstall zipline-ai
WARNING: Skipping zipline-ai as it is not installed.
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl
Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl
Collecting click (from zipline-ai==0.1.0.dev0)
  Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0)
  Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl
Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0)
  Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB)
Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB)
Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB)
Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB)
Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_crc32c-1.6.0-py3-none-any.whl
Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0)
  Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB)
Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB)
Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB)
Using cached click-8.1.8-py3-none-any.whl (98 kB)
Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB)
Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB)
Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB)
Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB)
Using cached certifi-2024.12.14-py3-none-any.whl (164 kB)
Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB)
Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB)
Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Using cached urllib3-2.3.0-py3-none-any.whl (128 kB)
Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB)
Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai
Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0

[notice] A new release of pip is available: 24.2 -> 25.0
[notice] To update, run: pip install --upgrade pip
++ pwd
+ export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id'
+ echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m'
<<<<<.....................................COMPILE.....................................>>>>>
+ zipline compile --conf=group_bys/quickstart/purchases.py
  Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
     Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1_test
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production
+ echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m'
<<<<<.....................................BACKFILL.....................................>>>>>
+ touch tmp_backfill.out
+ zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc
+ tee /dev/tty tmp_backfill.out
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b

Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_backfill.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b
+ JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']'
+ gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1
Waiting for job output...
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse
25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:16:51 INFO Configuration: resource-types.xml not found
25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011
25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:16:55 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions.
2025/01/30 18:17:15 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:15 INFO  TableUtils.scala:622 - 
Unfilled range computation:
   Output table: canary-443022.data.quickstart_purchases_v1_test
   Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024
-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Input tables: data.purchases
   Missing input partitions: [2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-
10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30]
   Unfilled ranges: [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30])
2025/01/30 18:17:15 INFO  GroupBy.scala:738 - Group By ranges to compute: 
    [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1]
2025/01/30 18:17:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:17:20 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:20 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-11-01...2023-11-30]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-11-30]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-11-30]
   intersected range: [2023-11-01...2023-11-30]

2025/01/30 18:17:20 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:17:20 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-11-30])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:17:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-11-30'

2025/01/30 18:17:20 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-11-01 00:00:00
2025/01/30 18:17:20 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:17:22 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:17:25 INFO  TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:25 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition
2025/01/30 18:17:25 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:17:33 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:33 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33
2025/01/30 18:17:33 INFO  GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30]
2025/01/30 18:17:33 INFO  GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30]
Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput
jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: 945d836f-20d8-4768-97fb-0889c00ed87b
  projectId: canary-443022
sparkJob:
  args:
  - group-by-backfill
  - --conf-path=purchases.v1_test
  - --end-date=2025-01-30
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:17:38.722934Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:16:43.326557Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:16:43.353624Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:16:43.597231Z'
yarnApplications:
- name: groupBy_quickstart.purchases.v1_test_backfill
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m'
<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>
+ touch tmp_gbu.out
+ zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc
+ tee /dev/tty tmp_gbu.out
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_gbu.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f
+ JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']'
+ gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1
Waiting for job output...
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse
25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:17:52 INFO Configuration: resource-types.xml not found
25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012
25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:17:56 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:17:57 INFO  GroupByUpload.scala:229 - 
GroupBy upload for: quickstart.quickstart.purchases.v1_test
Accuracy: SNAPSHOT
Data Model: Events

2025/01/30 18:17:57 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:14 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:14 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:14 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:14 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:14 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:14 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-12-01 00:00:00
2025/01/30 18:18:14 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:18:15 INFO  KvRdd.scala:102 - 
key schema:
  {
  "type" : "record",
  "name" : "Key",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "user_id",
    "type" : [ "null", "long" ],
    "doc" : ""
  } ]
}
value schema:
  {
  "type" : "record",
  "name" : "Value",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "purchase_price_sum_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_3d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_14d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_30d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_last10",
    "type" : [ "null", {
      "type" : "array",
      "items" : "long"
    } ],
    "doc" : ""
  } ]
}

2025/01/30 18:18:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:19 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:19 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:19 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:19 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:20 INFO  GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined.
2025/01/30 18:18:20 INFO  GroupByUpload.scala:188 - 
Built GroupByServingInfo for quickstart.purchases.v1_test:
table: data.purchases / data-model: Events
     keySchema: Success(struct<user_id:bigint>)
   valueSchema: Success(struct<purchase_price:bigint>)
mutationSchema: Failure(java.lang.NullPointerException)
   inputSchema: Failure(java.lang.NullPointerException)
selectedSchema: Success(struct<purchase_price:bigint>)
  streamSchema: Failure(java.lang.NullPointerException)

2025/01/30 18:18:20 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:18:24 INFO  TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:24 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition
2025/01/30 18:18:24 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:18:30 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:30 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30
Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput
jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c672008e-7380-4a82-a121-4bb0cb46503f
  projectId: canary-443022
sparkJob:
  args:
  - group-by-upload
  - --conf-path=purchases.v1_test
  - --end-date=2023-12-01
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
           {"qualifierString": "value", "fieldName": ""}
        ]
      }
   ]
}'''
) AS
SELECT
  CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey,
  value_bytes as cf,
  TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP
FROM canary-443022.data.quickstart_purchases_v1_test_upload
WHERE ds = '2023-12-01'

25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1
25/01/30 18:18:48 INFO BigTableKVStoreImpl: We will wait for PT6H for the job to complete
25/01/30 18:18:49 INFO BigTableKVStoreImpl: Export job completed successfully
25/01/30 18:18:49 INFO Driver$GroupByUploadToKVBulkLoad$: Uploaded GroupByUpload data to KV store for GroupBy: quickstart.purchases.v1_test; partition: 2023-12-01 in 1 seconds
Job [c29097e9-b845-4ad7-843a-c89b622c5cfe] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c29097e9-b845-4ad7-843a-c89b622c5cfe/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c29097e9-b845-4ad7-843a-c89b622c5cfe/driveroutput
jobUuid: c29097e9-b845-4ad7-843a-c89b622c5cfe
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c29097e9-b845-4ad7-843a-c89b622c5cfe
  projectId: canary-443022
sparkJob:
  args:
  - groupby-upload-bulk-load
  - --conf-path=purchases.v1_test
  - --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  - --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl
  - --conf-type=group_bys
  - --partition-string=2023-12-01
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:49.641298Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:18:38.893434Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:18:38.924869Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:18:39.144132Z'
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<< .....................................METADATA-UPLOAD.....................................>>>>>\033[0m'
<<<<< .....................................METADATA-UPLOAD.....................................>>>>>
+ touch tmp_metadata_upload.out
+ zipline run --mode metadata-upload --conf production/group_bys/quickstart/purchases.v1_test --dataproc
+ tee /dev/tty tmp_metadata_upload.out
Running with args: {'mode': 'metadata-upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'metadata-upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(metadata-upload, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(metadata-upload, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: ac577d8c-95e2-4dda-a863-2d9fb94f022f
Dataproc submitter job id: ac577d8c-95e2-4dda-a863-2d9fb94f022f
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_metadata-upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter metadata-upload --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_metadata-upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter metadata-upload --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_metadata_upload.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ METADATA_UPLOAD_JOB_ID=ac577d8c-95e2-4dda-a863-2d9fb94f022f
+ check_dataproc_job_state ac577d8c-95e2-4dda-a863-2d9fb94f022f
+ JOB_ID=ac577d8c-95e2-4dda-a863-2d9fb94f022f
+ '[' -z ac577d8c-95e2-4dda-a863-2d9fb94f022f ']'
+ gcloud dataproc jobs wait ac577d8c-95e2-4dda-a863-2d9fb94f022f --region=us-central1
Waiting for job output...
25/01/30 18:19:15 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:19:15 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:19:17 INFO MetadataDirWalker: Uploading Chronon configs from purchases.v1_test
25/01/30 18:19:19 INFO MetadataStore: Creating dataset: CHRONON_METADATA
25/01/30 18:19:19 INFO BigTableKVStoreImpl: Table CHRONON_METADATA already exists
25/01/30 18:19:19 INFO MetadataStore: Successfully created dataset: CHRONON_METADATA
25/01/30 18:19:20 INFO MetadataStore: Creating dataset: CHRONON_ENTITY_BY_TEAM
25/01/30 18:19:20 INFO BigTableKVStoreImpl: Table CHRONON_ENTITY_BY_TEAM already exists
25/01/30 18:19:20 INFO MetadataStore: Successfully created dataset: CHRONON_ENTITY_BY_TEAM
25/01/30 18:19:20 ERROR ManagedChannelOrphanWrapper: *~*~*~ Previous channel ManagedChannelImpl{logId=43, target=bigtableadmin.googleapis.com:443} was garbage collected without being shut down! ~*~*~*
    Make sure to call shutdown()/shutdownNow()
java.lang.RuntimeException: ManagedChannel allocation site
        at io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:102) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:60) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:51) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.internal.ManagedChannelImplBuilder.build(ManagedChannelImplBuilder.java:710) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.ForwardingChannelBuilder2.build(ForwardingChannelBuilder2.java:272) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:497) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.ChannelPool.<init>(ChannelPool.java:106) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.ChannelPool.create(ChannelPool.java:84) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:267) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:260) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:225) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.cloud.bigtable.admin.v2.stub.EnhancedBigtableTableAdminStub.createEnhanced(EnhancedBigtableTableAdminStub.java:61) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.cloud.bigtable.admin.v2.BigtableTableAdminClient.create(BigtableTableAdminClient.java:158) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.integrations.cloud_gcp.GcpApiImpl.genKvStore(GcpApiImpl.scala:65) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$OnlineSubcommand.metaDataStore(Driver.scala:621) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$OnlineSubcommand.metaDataStore$(Driver.scala:620) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$MetadataUploader$Args.metaDataStore(Driver.scala:749) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$MetadataUploader$.$anonfun$run$17(Driver.scala:760) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$MetadataUploader$.$anonfun$run$17$adapted(Driver.scala:760) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at scala.collection.immutable.List.foreach(List.scala:431) ~[scala-library-2.12.18.jar:?]
        at ai.chronon.spark.Driver$MetadataUploader$.run(Driver.scala:760) [cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$.main(Driver.scala:1054) [cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver.main(Driver.scala) [cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1032) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1124) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1133) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.12-3.5.1.jar:3.5.1]
25/01/30 18:19:20 ERROR ManagedChannelOrphanWrapper: *~*~*~ Previous channel ManagedChannelImpl{logId=87, target=bigtableadmin.googleapis.com:443} was garbage collected without being shut down! ~*~*~*
    Make sure to call shutdown()/shutdownNow()
java.lang.RuntimeException: ManagedChannel allocation site
        at io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:102) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:60) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:51) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.internal.ManagedChannelImplBuilder.build(ManagedChannelImplBuilder.java:710) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at io.grpc.ForwardingChannelBuilder2.build(ForwardingChannelBuilder2.java:272) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:497) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.ChannelPool.<init>(ChannelPool.java:106) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.ChannelPool.create(ChannelPool.java:84) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:267) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:260) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:225) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.cloud.bigtable.admin.v2.stub.EnhancedBigtableTableAdminStub.createEnhanced(EnhancedBigtableTableAdminStub.java:61) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at com.google.cloud.bigtable.admin.v2.BigtableTableAdminClient.create(BigtableTableAdminClient.java:158) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.integrations.cloud_gcp.GcpApiImpl.genKvStore(GcpApiImpl.scala:65) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$OnlineSubcommand.metaDataStore(Driver.scala:621) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$OnlineSubcommand.metaDataStore$(Driver.scala:620) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$MetadataUploader$Args.metaDataStore(Driver.scala:749) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$MetadataUploader$.$anonfun$run$17(Driver.scala:760) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$MetadataUploader$.$anonfun$run$17$adapted(Driver.scala:760) ~[cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at scala.collection.immutable.List.foreach(List.scala:431) ~[scala-library-2.12.18.jar:?]
        at ai.chronon.spark.Driver$MetadataUploader$.run(Driver.scala:760) [cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver$.main(Driver.scala:1054) [cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at ai.chronon.spark.Driver.main(Driver.scala) [cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1032) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1124) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1133) [spark-core_2.12-3.5.1.jar:3.5.1]
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.12-3.5.1.jar:3.5.1]
25/01/30 18:19:20 INFO MetadataStore: Putting metadata for
dataset: CHRONON_METADATA
key: purchases.v1_test
conf: List({"metaData":{"name":"quickstart.purchases.v1_test","online":1,"customJson":"{\"lag\": 0, \"groupby_tags\": null, \"column_tags\": {}}","dependencies":["{\"name\": \"wait_for_data.purchases_ds\", \"spec\": \"data.purchases/ds={{ ds }}\", \"start\": null, \"end\": null}"],"tableProperties":{"source":"chronon"},"outputNamespace":"canary-443022.data","team":"quickstart","offlineSchedule":"@daily"},"sources":[{"events":{"table":"data.purchases","query":{"selects":{"user_id":"user_id","purchase_price":"purchase_price"},"timeColumn":"ts","setups":[]}}}],"keyColumns":["user_id"],"aggregations":[{"inputColumn":"purchase_price","operation":7,"argMap":{},"windows":[{"length":3,"timeUnit":1},{"length":14,"timeUnit":1},{"length":30,"timeUnit":1}]},{"inputColumn":"purchase_price","operation":6,"argMap":{},"windows":[{"length":3,"timeUnit":1},{"length":14,"timeUnit":1},{"length":30,"timeUnit":1}]},{"inputColumn":"purchase_price","operation":8,"argMap":{},"windows":[{"length":3,"timeUnit":1},{"length":14,"timeUnit":1},{"length":30,"timeUnit":1}]},{"inputColumn":"purchase_price","operation":13,"argMap":{"k":"10"}}],"backfillStartDate":"2023-11-01"})
25/01/30 18:19:20 INFO MetadataStore: Putting 1 configs to KV Store, dataset=CHRONON_METADATA
25/01/30 18:19:20 INFO BigTableKVStoreImpl: Performing multi-put for 1 requests
25/01/30 18:19:21 INFO MetadataStore: Putting metadata for
dataset: CHRONON_ENTITY_BY_TEAM
key: group_bys/quickstart
conf: List(purchases.v1_test)
25/01/30 18:19:21 INFO MetadataStore: Putting 1 configs to KV Store, dataset=CHRONON_ENTITY_BY_TEAM
25/01/30 18:19:21 INFO BigTableKVStoreImpl: Performing multi-put for 1 requests
25/01/30 18:19:21 INFO Driver$MetadataUploader$: Uploaded Chronon Configs to the KV store, success count = 2, failure count = 0
Job [ac577d8c-95e2-4dda-a863-2d9fb94f022f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/ac577d8c-95e2-4dda-a863-2d9fb94f022f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/ac577d8c-95e2-4dda-a863-2d9fb94f022f/driveroutput
jobUuid: ac577d8c-95e2-4dda-a863-2d9fb94f022f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: ac577d8c-95e2-4dda-a863-2d9fb94f022f
  projectId: canary-443022
sparkJob:
  args:
  - metadata-upload
  - --conf-path=purchases.v1_test
  - --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  - --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:19:22.536103Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:19:11.138458Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:19:11.162118Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:19:11.387274Z'
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe ac577d8c-95e2-4dda-a863-2d9fb94f022f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................FETCH.....................................>>>>>\033[0m'
<<<<<.....................................FETCH.....................................>>>>>
+ touch tmp_fetch.out
+ zipline run --mode fetch --type group-by --name quickstart/purchases.v1_test -k '{"user_id":"5"}'
+ tee /dev/tty tmp_fetch.out
+ grep -q purchase_price_average_14d
Running with args: {'mode': 'fetch', 'conf': None, 'env': 'dev', 'dataproc': False, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
--- [FETCHED RESULT] ---
{
  "purchase_price_average_14d" : 72.5,
  "purchase_price_average_30d" : 250.6,
  "purchase_price_average_3d" : null,
  "purchase_price_count_14d" : 2,
  "purchase_price_count_30d" : 5,
  "purchase_price_count_3d" : null,
  "purchase_price_last10" : [ 76, 69, 367, 466, 275 ],
  "purchase_price_sum_14d" : 145,
  "purchase_price_sum_30d" : 1253,
  "purchase_price_sum_3d" : null
}
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_fetch
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp/target/scala-2.12/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.spark.Driver fetch --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --type group-by --name quickstart/purchases.v1_test -k {"user_id":"5"}
+ cat tmp_fetch.out
+ grep purchase_price_average_14d
  "purchase_price_average_14d" : 72.5,
+ '[' 0 -ne 0 ']'
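
The tail of the trace above is the actual assertion of the run: the fetch output is captured and checked for the expected feature column. Roughly (a sketch mirroring the trace; the exact error handling in run_zipline_quickstart.sh may differ):

```
# Fetch online features for one key and assert the expected column came back.
zipline run --mode fetch --type group-by --name quickstart/purchases.v1_test -k '{"user_id":"5"}' \
  | tee /dev/tty tmp_fetch.out | grep -q purchase_price_average_14d
# Fail the run if the column is missing from the captured output.
grep purchase_price_average_14d tmp_fetch.out || exit 1
```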

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features

    • Added a Python script to automate Zipline quickstart setup.
    • Introduced a Bash script for managing Zipline workflow and Google Cloud Dataproc jobs.
    • Enhanced error handling and logging in the Bash script for improved visibility during execution.
    • Implemented automated data processing and configuration management.
  • Chores

    • Set up environment preparation and job execution utilities for Zipline workflow.

@coderabbitai
Contributor

coderabbitai bot commented Jan 29, 2025

Walkthrough

The pull request introduces two new scripts, run_zipline_quickstart.py and run_zipline_quickstart.sh, designed to streamline the setup and execution of Zipline workflows. The Python script manages a temporary directory and launches a Bash script that automates Zipline configuration, data processing, and Google Cloud Dataproc job management.
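
The Dataproc job management mentioned here boils down to a small idiom visible in the trace in the PR description: scrape the submitter job id from the captured output, block on gcloud dataproc jobs wait, then confirm the recorded state. A rough sketch (the helper name mirrors check_dataproc_job_state(); the real function in run_zipline_quickstart.sh may be structured differently):

```
# Sketch: wait for a Dataproc job submitted via `zipline run ... --dataproc` and verify it finished.
check_dataproc_job_state() {
  local job_id=$1
  # An empty id means the submission output never printed one; treat as failure.
  [ -z "$job_id" ] && { echo "Missing Dataproc job id" >&2; exit 1; }
  # Blocking call; returns once the job reaches a terminal state.
  gcloud dataproc jobs wait "$job_id" --region=us-central1
  # Confirm the recorded state on the job resource.
  local state
  state=$(gcloud dataproc jobs describe "$job_id" --region=us-central1 --format=flattened | grep status.state:)
  echo "$state"
  if [ -z "$state" ]; then
    echo "Dataproc job $job_id did not report a state" >&2
    exit 1
  fi
}

# The id itself is scraped from the captured zipline output, e.g.:
#   JOB_ID=$(grep 'Dataproc submitter job id' tmp_upload_to_kv.out | cut -d ' ' -f5)
#   check_dataproc_job_state "$JOB_ID"
```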

Changes

  • distribution/run_zipline_quickstart.py: Added main() function to create a temp directory and execute the shell script
  • distribution/run_zipline_quickstart.sh: Added check_dataproc_job_state() function for job state management; implemented Zipline workflow automation

Possibly related PRs

Suggested reviewers

  • chewy-zlai
  • tchow-zlai

Poem

🚀 Zipline's quickstart dance begins,
Temp dirs and scripts, where magic spins,
Cloud jobs flow like river's might,
Automation takes its playful flight!
Code's symphony, a joyful grin 🌈

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.



@tchow-zlai tchow-zlai changed the base branch from main to tchow/avro-date January 30, 2025 01:31
@tchow-zlai tchow-zlai force-pushed the davidhan/zipline_integration_script branch from 6d35a1b to ee99668 Compare January 30, 2025 01:31
Base automatically changed from tchow/avro-date to main January 30, 2025 02:01
@tchow-zlai tchow-zlai force-pushed the davidhan/zipline_integration_script branch from ee99668 to 939858b Compare January 30, 2025 17:20
@david-zlai david-zlai force-pushed the davidhan/zipline_integration_script branch from 939858b to c44606c Compare January 30, 2025 18:28
@david-zlai david-zlai marked this pull request as ready for review January 30, 2025 18:29
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (5)
distribution/run_zipline_quickstart.py (1)

10-11: Improve line continuation style.

Use parentheses for line continuation.

-        quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__))
-                                     , "run_zipline_quickstart.sh")
+        quickstart_sh = os.path.join(
+            os.path.dirname(os.path.realpath(__file__)),
+            "run_zipline_quickstart.sh"
+        )
distribution/run_zipline_quickstart.sh (4)

10-11: Remove unused color variable.

RED variable is defined but never used.

 GREEN='\033[0;32m'
-RED='\033[0;31m'
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


25-25: Fix environment variable assignments.

Split declaration and assignment to avoid masking return values.

-  export SPARK_HOME=$(pwd)/spark-3.5.4-bin-hadoop3
+  SPARK_HOME=$(pwd)/spark-3.5.4-bin-hadoop3
+  export SPARK_HOME
-export PYTHONPATH="${PYTHONPATH}:$(pwd)"
+PYTHONPATH_NEW="${PYTHONPATH}:$(pwd)"
+export PYTHONPATH="$PYTHONPATH_NEW"

Also applies to: 47-47

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 25-25: Declare and assign separately to avoid masking return values.

(SC2155)


13-13: Extract configuration values.

Move hard-coded values to configuration variables at the top.

+# Configuration
+WHEEL_VERSION="0.1.0.dev0"
+PROJECT_ID="canary-443022"
+DATASET="data"
+GIT_BRANCH="davidhan/canary"
+
-WHEEL_FILE="zipline_ai-0.1.0.dev0-py3-none-any.whl"
+WHEEL_FILE="zipline_ai-${WHEEL_VERSION}-py3-none-any.whl"
-bq rm -f -t canary-443022:data.quickstart_purchases_v1_test
-bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload
+bq rm -f -t "${PROJECT_ID}:${DATASET}.quickstart_purchases_v1_test"
+bq rm -f -t "${PROJECT_ID}:${DATASET}.quickstart_purchases_v1_test_upload"
-git fetch origin davidhan/canary
-git checkout davidhan/canary
+git fetch origin "${GIT_BRANCH}"
+git checkout "${GIT_BRANCH}"

Also applies to: 16-17, 33-34


18-18: Address TODO comment about bigtable rows.

Implement bigtable row deletion or create a tracking issue.

Would you like me to help implement the bigtable row deletion or create a tracking issue?
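
One possible shape for that cleanup, assuming the cbt CLI is available and that clearing the entire test table in the canary project is acceptable (a prefix-scoped delete would instead go through the Bigtable Admin API's DropRowRange):

```
# Hypothetical cleanup for the TODO: drop every row in the GROUPBY_BATCH test table.
cbt -project canary-443022 -instance zipline-canary-instance deleteallrows GROUPBY_BATCH
```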

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between d0ccf56 and c44606c.

📒 Files selected for processing (2)
  • distribution/run_zipline_quickstart.py (1 hunks)
  • distribution/run_zipline_quickstart.sh (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart.sh

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 25-25: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 30-30: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 47-47: Declare and assign separately to avoid masking return values.

(SC2155)

Comment on lines +6 to +16
def main():
with tempfile.TemporaryDirectory() as temp_dir:
print(f"Created temporary directory: {temp_dir}")

quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__))
, "run_zipline_quickstart.sh")

# run the bash script run_zipline_quickstart.sh subprocess command
# with the temporary directory as the argument
subprocess.run([f"bash {quickstart_sh} {temp_dir}"], shell=True)


🛠️ Refactor suggestion

Add error handling for subprocess execution.

Add try-except block to handle subprocess failures gracefully.

 def main():
     with tempfile.TemporaryDirectory() as temp_dir:
         print(f"Created temporary directory: {temp_dir}")
 
         quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__)),
                                    "run_zipline_quickstart.sh")
 
-        subprocess.run([f"bash {quickstart_sh} {temp_dir}"], shell=True)
+        try:
+            subprocess.run(["bash", quickstart_sh, temp_dir], check=True)
+        except subprocess.CalledProcessError as e:
+            print(f"Script execution failed with exit code {e.returncode}")
+            raise



WORKING_DIR=$1
cd $WORKING_DIR

⚠️ Potential issue

Add error handling for cd commands.

Add error handling for directory changes.

-cd $WORKING_DIR
+cd "$WORKING_DIR" || exit 1
-cd cananry-confs
+cd cananry-confs || exit 1

Also applies to: 30-30

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

@david-zlai david-zlai requested a review from tchow-zlai January 30, 2025 18:38
@david-zlai david-zlai force-pushed the davidhan/zipline_integration_script branch from 60ab641 to ca1f98d Compare January 31, 2025 02:53
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
distribution/run_zipline_quickstart.sh (3)

11-11: Remove unused RED color variable.

The RED color variable is defined but never used.

-RED='\033[0;31m'
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


18-18: Address TODO comment.

The TODO comment about deleting bigtable rows needs to be implemented.

Would you like me to help implement the bigtable row deletion?


39-39: Declare and assign PYTHONPATH separately.

Split declaration and assignment to avoid masking return values.

-export PYTHONPATH="${PYTHONPATH}:$(pwd)"
+old_path="${PYTHONPATH}"
+export PYTHONPATH="${old_path}:$(pwd)"
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 39-39: Declare and assign separately to avoid masking return values.

(SC2155)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 60ab641 and ca1f98d.

📒 Files selected for processing (2)
  • distribution/run_zipline_quickstart.py (1 hunks)
  • distribution/run_zipline_quickstart.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • distribution/run_zipline_quickstart.py
🧰 Additional context used
📓 Learnings (1)
distribution/run_zipline_quickstart.sh (1)
Learnt from: david-zlai
PR: zipline-ai/chronon#292
File: distribution/run_zipline_quickstart.sh:72-75
Timestamp: 2025-01-30T19:37:27.646Z
Learning: The `gcloud dataproc jobs wait` command is a blocking call that waits for job completion, so adding an external timeout wrapper is unnecessary.
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart.sh

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 39-39: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (5)
distribution/run_zipline_quickstart.sh (5)

8-8: Add error handling for cd command.

Add error handling for directory change.

-cd $WORKING_DIR
+cd "$WORKING_DIR" || exit 1
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


22-22: Add error handling for cd command.

Add error handling for directory change.

-cd cananry-confs
+cd cananry-confs || exit 1
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


42-56: LGTM! Job state checking implementation is solid.

Good error handling and proper use of gcloud dataproc jobs wait.


60-97: LGTM! Job execution flow is well-structured.

Good use of logging and proper job state verification.


99-99: LGTM! Clear success indication.

#TODO: delete bigtable rows

# Clone the cananry configs
git clone [email protected]:zipline-ai/cananry-confs.git
Collaborator


maybe add an rm -rf cananry-confs before this? otherwise, if it already exists, you just cd into it and git fetch

Contributor Author


ah so the way the integration test should run is via the python file, which creates the temp directory first; that directory is then used for the full installation and runs.

The temp directory should have nothing in it.
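
For completeness, the guard being suggested would look roughly like this if the script ever had to tolerate a reused working directory (not needed today, since the wrapper always hands the script a fresh temp directory):

```
# Hypothetical guard before cloning, in case the working directory is reused.
rm -rf cananry-confs
git clone [email protected]:zipline-ai/cananry-confs.git
```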

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
distribution/run_zipline_quickstart.sh (2)

8-8: ⚠️ Potential issue

Add error handling for directory change.

-cd $WORKING_DIR
+cd "$WORKING_DIR" || exit 1
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


22-22: ⚠️ Potential issue

Add error handling for directory change.

-cd cananry-confs
+cd cananry-confs || exit 1
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

🧹 Nitpick comments (3)
distribution/run_zipline_quickstart.sh (3)

11-11: Remove unused RED variable.

The RED color variable is defined but never used.

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


18-18: Implement bigtable rows deletion.

I can help implement the missing bigtable cleanup functionality.


39-39: Separate PYTHONPATH declaration and assignment.

-export PYTHONPATH="${PYTHONPATH}:$(pwd)"
+pwd_dir=$(pwd)
+export PYTHONPATH="${PYTHONPATH}:${pwd_dir}"
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 39-39: Declare and assign separately to avoid masking return values.

(SC2155)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between ca1f98d and 28e8889.

📒 Files selected for processing (1)
  • distribution/run_zipline_quickstart.sh (1 hunks)
🧰 Additional context used
📓 Learnings (1)
distribution/run_zipline_quickstart.sh (1)
Learnt from: david-zlai
PR: zipline-ai/chronon#292
File: distribution/run_zipline_quickstart.sh:72-75
Timestamp: 2025-01-30T19:37:27.646Z
Learning: The `gcloud dataproc jobs wait` command is a blocking call that waits for job completion, so adding an external timeout wrapper is unnecessary.
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart.sh

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 39-39: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (3)
distribution/run_zipline_quickstart.sh (3)

42-56: LGTM! Job state checking looks good.

The implementation correctly uses gcloud's built-in blocking wait.


60-97: LGTM! Command execution flow is well structured.

Good error handling and output verification.


99-99: LGTM! Clear success indication.

@david-zlai david-zlai merged commit 4fef52c into main Feb 1, 2025
4 checks passed
@david-zlai david-zlai deleted the davidhan/zipline_integration_script branch February 1, 2025 00:03
nikhil-zlai pushed a commit that referenced this pull request Feb 4, 2025
## Summary

```
python distribution/run_zipline_quickstart.py
```

This runs the full zipline suite of commands against a test quickstart
groupby.

Example:
```
davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py 
Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ GREEN='\033[0;32m'
+ RED='\033[0;31m'
+ WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload
+ '[' -z '' ']'
+ wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
--2025-01-30 10:16:21--  https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132
Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 400879762 (382M) [application/x-gzip]
Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’

spark-3.5.4-bin-hadoop3.tgz                                 100%[==========================================================================================================================================>] 382.31M  50.2MB/s    in 8.4s    

2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762]

+ tar -xzf spark-3.5.4-bin-hadoop3.tgz
++ pwd
+ export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ git clone [email protected]:zipline-ai/cananry-confs.git
Cloning into 'cananry-confs'...
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0)
Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done.
Resolving deltas: 100% (63/63), done.
+ cd cananry-confs
+ git fetch origin davidhan/canary
From github.com:zipline-ai/cananry-confs
 * branch            davidhan/canary -> FETCH_HEAD
+ git checkout davidhan/canary
branch 'davidhan/canary' set up to track 'origin/davidhan/canary'.
Switched to a new branch 'davidhan/canary'
+ python3 -m venv tmp_chronon
+ source tmp_chronon/bin/activate
++ deactivate nondestructive
++ '[' -n '' ']'
++ '[' -n '' ']'
++ hash -r
++ '[' -n '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ case "$(uname)" in
+++ uname
++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ export PATH
++ VIRTUAL_ENV_PROMPT=tmp_chronon
++ export VIRTUAL_ENV_PROMPT
++ '[' -n '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(tmp_chronon) '
++ export PS1
++ hash -r
+ gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl .
Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl
  Completed files 1/1 | 371.1kiB/371.1kiB                                                                                                                                                                                                     
+ pip uninstall zipline-ai
WARNING: Skipping zipline-ai as it is not installed.
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl
Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl
Collecting click (from zipline-ai==0.1.0.dev0)
  Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0)
  Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl
Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0)
  Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB)
Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB)
Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB)
Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB)
Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_crc32c-1.6.0-py3-none-any.whl
Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0)
  Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB)
Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB)
Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB)
Using cached click-8.1.8-py3-none-any.whl (98 kB)
Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB)
Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB)
Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB)
Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB)
Using cached certifi-2024.12.14-py3-none-any.whl (164 kB)
Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB)
Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB)
Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Using cached urllib3-2.3.0-py3-none-any.whl (128 kB)
Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB)
Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai
Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0

[notice] A new release of pip is available: 24.2 -> 25.0
[notice] To update, run: pip install --upgrade pip
++ pwd
+ export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id'
+ echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m'
<<<<<.....................................COMPILE.....................................>>>>>
+ zipline compile --conf=group_bys/quickstart/purchases.py
  Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
     Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1_test
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production
+ echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m'
<<<<<.....................................BACKFILL.....................................>>>>>
+ touch tmp_backfill.out
+ zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc
+ tee /dev/tty tmp_backfill.out
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b

Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_backfill.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b
+ JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']'
+ gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1
Waiting for job output...
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse
25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:16:51 INFO Configuration: resource-types.xml not found
25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011
25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:16:55 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions.
2025/01/30 18:17:15 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:15 INFO  TableUtils.scala:622 - 
Unfilled range computation:
   Output table: canary-443022.data.quickstart_purchases_v1_test
   Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024
-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Input tables: data.purchases
   Missing input partitions: [2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-
10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30]
   Unfilled ranges: [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30])
2025/01/30 18:17:15 INFO  GroupBy.scala:738 - Group By ranges to compute: 
    [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1]
2025/01/30 18:17:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:17:20 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:20 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-11-01...2023-11-30]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-11-30]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-11-30]
   intersected range: [2023-11-01...2023-11-30]

2025/01/30 18:17:20 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:17:20 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-11-30])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:17:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-11-30'

2025/01/30 18:17:20 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-11-01 00:00:00
2025/01/30 18:17:20 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:17:22 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:17:25 INFO  TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:25 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition
2025/01/30 18:17:25 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:17:33 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:33 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33
2025/01/30 18:17:33 INFO  GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30]
2025/01/30 18:17:33 INFO  GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30]
Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput
jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: 945d836f-20d8-4768-97fb-0889c00ed87b
  projectId: canary-443022
sparkJob:
  args:
  - group-by-backfill
  - --conf-path=purchases.v1_test
  - --end-date=2025-01-30
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:17:38.722934Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:16:43.326557Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:16:43.353624Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:16:43.597231Z'
yarnApplications:
- name: groupBy_quickstart.purchases.v1_test_backfill
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m'
<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>
+ touch tmp_gbu.out
+ zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc
+ tee /dev/tty tmp_gbu.out
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_gbu.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f
+ JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']'
+ gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1
Waiting for job output...
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse
25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:17:52 INFO Configuration: resource-types.xml not found
25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012
25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:17:56 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:17:57 INFO  GroupByUpload.scala:229 - 
GroupBy upload for: quickstart.quickstart.purchases.v1_test
Accuracy: SNAPSHOT
Data Model: Events

2025/01/30 18:17:57 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:14 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:14 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:14 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:14 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:14 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:14 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-12-01 00:00:00
2025/01/30 18:18:14 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:18:15 INFO  KvRdd.scala:102 - 
key schema:
  {
  "type" : "record",
  "name" : "Key",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "user_id",
    "type" : [ "null", "long" ],
    "doc" : ""
  } ]
}
value schema:
  {
  "type" : "record",
  "name" : "Value",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "purchase_price_sum_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_3d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_14d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_30d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_last10",
    "type" : [ "null", {
      "type" : "array",
      "items" : "long"
    } ],
    "doc" : ""
  } ]
}

2025/01/30 18:18:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:19 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:19 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:19 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:19 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:20 INFO  GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined.
2025/01/30 18:18:20 INFO  GroupByUpload.scala:188 - 
Built GroupByServingInfo for quickstart.purchases.v1_test:
table: data.purchases / data-model: Events
     keySchema: Success(struct<user_id:bigint>)
   valueSchema: Success(struct<purchase_price:bigint>)
mutationSchema: Failure(java.lang.NullPointerException)
   inputSchema: Failure(java.lang.NullPointerException)
selectedSchema: Success(struct<purchase_price:bigint>)
  streamSchema: Failure(java.lang.NullPointerException)

2025/01/30 18:18:20 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:18:24 INFO  TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:24 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition
2025/01/30 18:18:24 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:18:30 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:30 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30
Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput
jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c672008e-7380-4a82-a121-4bb0cb46503f
  projectId: canary-443022
sparkJob:
  args:
  - group-by-upload
  - --conf-path=purchases.v1_test
  - --end-date=2023-12-01
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
           {"qualifierString": "value", "fieldName": ""}
        ]
      }
   ]
}'''
) AS
SELECT
  CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey,
  value_bytes as cf,
  TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP
FROM canary-443022.data.quickstart_purchases_v1_test_upload
WHERE ds = '2023-12-01'

25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1
25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
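Each wrapped `zipline run ... --dataproc` step in the trace follows the same pattern: the output is `tee`'d to a temp file, the job id is extracted with `grep 'Dataproc submitter job id' | cut -d ' ' -f5`, and the id is handed to a `check_dataproc_job_state` helper whose body isn't expanded above. A minimal sketch of that helper, reconstructed only from the traced commands (the actual code in `distribution/run_zipline_quickstart.py` may differ):

```bash
# Sketch only -- reconstructed from the `set -x` trace above, not the actual helper.
check_dataproc_job_state() {
  JOB_ID=$1
  # The then-branch isn't visible in the trace; assuming it aborts the run.
  if [ -z "$JOB_ID" ]; then
    echo "No Dataproc job id to check" && exit 1
  fi
  # Block until the Dataproc job finishes, streaming its driver output.
  gcloud dataproc jobs wait "$JOB_ID" --region=us-central1
  echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
  # Pull the terminal state line (e.g. "status.state: DONE") out of the job description.
  JOB_STATE=$(gcloud dataproc jobs describe "$JOB_ID" --region=us-central1 --format=flattened | grep "status.state:")
  echo "$JOB_STATE"
  # Assumed failure branch: abort the quickstart run if no state came back.
  if [ -z "$JOB_STATE" ]; then
    exit 1
  fi
}
```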
@coderabbitai[bot] mentioned this pull request Feb 14, 2025
@coderabbitai[bot] mentioned this pull request Mar 28, 2025
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
           {"qualifierString": "value", "fieldName": ""}
        ]
      }
   ]
}'''
) AS
SELECT
  CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey,
  value_bytes as cf,
  TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP
FROM canary-443022.data.quickstart_purchases_v1_test_upload
WHERE ds = '2023-12-01'

25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1
25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
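For readers following along, the `EXPORT DATA` statement in the upload-to-kv log above writes each upload row to Bigtable with a row key made of the GroupBy's batch dataset name, a `#` separator, and the Avro-encoded key bytes, with the value bytes landing in column family `cf` under qualifier `value`. A minimal sketch of that rowkey layout is below; it is illustrative only, and `key_bytes` is a placeholder rather than real Avro output:

```python
# Sketch of the rowkey layout used by the EXPORT DATA statement above (illustrative).
batch_dataset = "QUICKSTART_PURCHASES_V1_TEST_BATCH"  # batch dataset name from the query
key_bytes = b"..."  # placeholder for the Avro-encoded key (user_id) bytes in key_bytes

rowkey = batch_dataset.encode("utf-8") + b"#" + key_bytes
# value_bytes is stored under column family "cf", qualifier "value";
# _CHANGE_TIMESTAMP (1701475200000 ms) is the end of the uploaded 2023-12-01 partition.
```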
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ export PATH
++ VIRTUAL_ENV_PROMPT=tmp_chronon
++ export VIRTUAL_ENV_PROMPT
++ '[' -n '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(tmp_chronon) '
++ export PS1
++ hash -r
+ gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl .
Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl
  Completed files 1/1 | 371.1kiB/371.1kiB                                                                                                                                                                                                     
+ pip uninstall zipline-ai
WARNING: Skipping zipline-ai as it is not installed.
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl
Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl
Collecting click (from zipline-ai==0.1.0.dev0)
  Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0)
  Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl
Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0)
  Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB)
Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB)
Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB)
Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB)
Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_crc32c-1.6.0-py3-none-any.whl
Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0)
  Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB)
Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB)
Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB)
Using cached click-8.1.8-py3-none-any.whl (98 kB)
Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB)
Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB)
Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB)
Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB)
Using cached certifi-2024.12.14-py3-none-any.whl (164 kB)
Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB)
Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB)
Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Using cached urllib3-2.3.0-py3-none-any.whl (128 kB)
Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB)
Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai
Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0

[notice] A new release of pip is available: 24.2 -> 25.0
[notice] To update, run: pip install --upgrade pip
++ pwd
+ export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id'
+ echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m'
<<<<<.....................................COMPILE.....................................>>>>>
+ zipline compile --conf=group_bys/quickstart/purchases.py
  Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
     Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1_test
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production
+ echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m'
<<<<<.....................................BACKFILL.....................................>>>>>
+ touch tmp_backfill.out
+ zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc
+ tee /dev/tty tmp_backfill.out
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b

Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_backfill.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b
+ JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']'
+ gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1
Waiting for job output...
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse
25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:16:51 INFO Configuration: resource-types.xml not found
25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011
25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:16:55 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions.
2025/01/30 18:17:15 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:15 INFO  TableUtils.scala:622 - 
Unfilled range computation:
   Output table: canary-443022.data.quickstart_purchases_v1_test
   Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024
-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Input tables: data.purchases
   Missing input partitions: [2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-
10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30]
   Unfilled ranges: [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30])
2025/01/30 18:17:15 INFO  GroupBy.scala:738 - Group By ranges to compute: 
    [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1]
2025/01/30 18:17:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:17:20 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:20 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-11-01...2023-11-30]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-11-30]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-11-30]
   intersected range: [2023-11-01...2023-11-30]

2025/01/30 18:17:20 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:17:20 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-11-30])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:17:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-11-30'

2025/01/30 18:17:20 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-11-01 00:00:00
2025/01/30 18:17:20 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:17:22 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:17:25 INFO  TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:25 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition
2025/01/30 18:17:25 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:17:33 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:33 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33
2025/01/30 18:17:33 INFO  GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30]
2025/01/30 18:17:33 INFO  GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30]
Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput
jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: 945d836f-20d8-4768-97fb-0889c00ed87b
  projectId: canary-443022
sparkJob:
  args:
  - group-by-backfill
  - --conf-path=purchases.v1_test
  - --end-date=2025-01-30
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:17:38.722934Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:16:43.326557Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:16:43.353624Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:16:43.597231Z'
yarnApplications:
- name: groupBy_quickstart.purchases.v1_test_backfill
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m'
<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>
+ touch tmp_gbu.out
+ zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc
+ tee /dev/tty tmp_gbu.out
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_gbu.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f
+ JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']'
+ gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1
Waiting for job output...
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse
25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:17:52 INFO Configuration: resource-types.xml not found
25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012
25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:17:56 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:17:57 INFO  GroupByUpload.scala:229 - 
GroupBy upload for: quickstart.quickstart.purchases.v1_test
Accuracy: SNAPSHOT
Data Model: Events

2025/01/30 18:17:57 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:14 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:14 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]
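(The "intersected range" above is just the overlap between the range the query needs and the partitions the source table actually has; since this GroupBy has no window, the query pulls every source partition up to its end date. A minimal sketch of that, not Chronon's actual code:)

```python
# Minimal sketch of the scan-range intersection logged above (illustrative, not Chronon's code).
# With an unbounded window, the GroupBy needs every source partition up to the query end date.
from datetime import date

def intersect_scan_range(query_end: date, source_start: date, source_end: date) -> tuple[date, date]:
    return source_start, min(source_end, query_end)

# Matches the log: query end 2023-12-01 over source data [2023-11-01...2023-12-01]
# yields the intersected range [2023-11-01...2023-12-01].
```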

2025/01/30 18:18:14 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:14 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:14 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:14 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-12-01 00:00:00
2025/01/30 18:18:14 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:18:15 INFO  KvRdd.scala:102 - 
key schema:
  {
  "type" : "record",
  "name" : "Key",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "user_id",
    "type" : [ "null", "long" ],
    "doc" : ""
  } ]
}
value schema:
  {
  "type" : "record",
  "name" : "Value",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "purchase_price_sum_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_3d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_14d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_30d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_last10",
    "type" : [ "null", {
      "type" : "array",
      "items" : "long"
    } ],
    "doc" : ""
  } ]
}
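The value schema above mirrors the aggregations defined on the quickstart purchases GroupBy: sums, counts and averages of `purchase_price` over 3/14/30-day windows plus the last 10 purchase prices per user. As a rough mental model only (the real computation runs in Spark and the exact window-boundary semantics may differ), the uploaded per-user snapshot could be approximated like this:

```python
# Illustrative pandas approximation of the snapshot aggregates named in the value schema
# above; assumes an events frame with user_id, ts (epoch millis) and purchase_price.
import pandas as pd

def snapshot_features(purchases: pd.DataFrame, end_ds: str) -> pd.DataFrame:
    end = pd.Timestamp(end_ds) + pd.Timedelta(days=1)  # treat end_ds as an inclusive day
    ts = pd.to_datetime(purchases["ts"], unit="ms")
    cols = {}
    for days in (3, 14, 30):
        in_window = purchases[(ts >= end - pd.Timedelta(days=days)) & (ts < end)]
        grouped = in_window.groupby("user_id")["purchase_price"]
        cols[f"purchase_price_sum_{days}d"] = grouped.sum()
        cols[f"purchase_price_count_{days}d"] = grouped.count()
        cols[f"purchase_price_average_{days}d"] = grouped.mean()
    cols["purchase_price_last10"] = (
        purchases[ts < end]
        .sort_values("ts")
        .groupby("user_id")["purchase_price"]
        .apply(lambda s: s.tail(10).tolist())
    )
    return pd.DataFrame(cols).reset_index()
```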

2025/01/30 18:18:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:19 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:19 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:19 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:19 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:20 INFO  GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined.
2025/01/30 18:18:20 INFO  GroupByUpload.scala:188 - 
Built GroupByServingInfo for quickstart.purchases.v1_test:
table: data.purchases / data-model: Events
     keySchema: Success(struct<user_id:bigint>)
   valueSchema: Success(struct<purchase_price:bigint>)
mutationSchema: Failure(java.lang.NullPointerException)
   inputSchema: Failure(java.lang.NullPointerException)
selectedSchema: Success(struct<purchase_price:bigint>)
  streamSchema: Failure(java.lang.NullPointerException)

2025/01/30 18:18:20 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:18:24 INFO  TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:24 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition
2025/01/30 18:18:24 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:18:30 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:30 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30
Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput
jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c672008e-7380-4a82-a121-4bb0cb46503f
  projectId: canary-443022
sparkJob:
  args:
  - group-by-upload
  - --conf-path=purchases.v1_test
  - --end-date=2023-12-01
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
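The job-id plumbing above is the same pattern for every step: pull the `Dataproc submitter job id` line out of the tee'd output, block on `gcloud dataproc jobs wait`, then read `status.state` back from `gcloud dataproc jobs describe`. A rough Python equivalent (function names are illustrative, not what the quickstart script actually defines):

```python
# Sketch of the shell plumbing traced above: extract the job id, wait, then read the state.
import re
import subprocess

def extract_dataproc_job_id(out_file: str) -> str:
    """Mirrors `grep 'Dataproc submitter job id' | cut -d ' ' -f5` on the tee'd output."""
    with open(out_file) as f:
        for line in f:
            if "Dataproc submitter job id" in line:
                return line.split()[4]  # 5th whitespace-separated field is the job id
    raise RuntimeError(f"no Dataproc job id found in {out_file}")

def check_dataproc_job_state(job_id: str, region: str = "us-central1") -> str:
    """Blocks until the job finishes, then returns the status.state that gcloud reports."""
    subprocess.run(["gcloud", "dataproc", "jobs", "wait", job_id, f"--region={region}"], check=True)
    described = subprocess.run(
        ["gcloud", "dataproc", "jobs", "describe", job_id, f"--region={region}", "--format=flattened"],
        check=True, capture_output=True, text=True,
    ).stdout
    match = re.search(r"status\.state:\s+(\S+)", described)
    return match.group(1) if match else "UNKNOWN"

# e.g. check_dataproc_job_state(extract_dataproc_job_id("tmp_gbu.out"))
```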
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
           {"qualifierString": "value", "fieldName": ""}
        ]
      }
   ]
}'''
) AS
SELECT
  CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey,
  value_bytes as cf,
  TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP
FROM canary-443022.data.quickstart_purchases_v1_test_upload
WHERE ds = '2023-12-01'

25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1
25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
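The bulk load is a plain BigQuery `EXPORT DATA` job with `format='CLOUD_BIGTABLE'` over the upload table: each row lands in Bigtable under a row key made of a per-GroupBy prefix (here `QUICKSTART_PURCHASES_V1_TEST_BATCH`), a `#` separator, and the Avro-encoded key bytes, with the value bytes stored under column family `cf`, qualifier `value`. A tiny illustration of the row-key layout implied by the query text (inferred from the query, not from the Chronon source):

```python
# Row-key layout implied by the EXPORT DATA query above.
def batch_row_key(dataset: str, key_bytes: bytes) -> bytes:
    """Reproduces CONCAT(CAST(CONCAT(dataset, '#') AS BYTES), key_bytes)."""
    return dataset.encode("utf-8") + b"#" + key_bytes

# e.g. rows for this upload land under keys like
# batch_row_key("QUICKSTART_PURCHASES_V1_TEST_BATCH", avro_encoded_user_id)
```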
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
## Summary

```
python distribution/run_zipline_quickstart.py
```

This runs the full zipline suite of commands against a test quickstart
groupby.

Example:
```
davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py 
Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ GREEN='\033[0;32m'
+ RED='\033[0;31m'
+ WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload
+ '[' -z '' ']'
+ wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
--2025-01-30 10:16:21--  https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132
Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 400879762 (382M) [application/x-gzip]
Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’

spark-3.5.4-bin-hadoop3.tgz                                 100%[==========================================================================================================================================>] 382.31M  50.2MB/s    in 8.4s    

2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762]

+ tar -xzf spark-3.5.4-bin-hadoop3.tgz
++ pwd
+ export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ git clone [email protected]:zipline-ai/cananry-confs.git
Cloning into 'cananry-confs'...
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0)
Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done.
Resolving deltas: 100% (63/63), done.
+ cd cananry-confs
+ git fetch origin davidhan/canary
From github.com:zipline-ai/cananry-confs
 * branch            davidhan/canary -> FETCH_HEAD
+ git checkout davidhan/canary
branch 'davidhan/canary' set up to track 'origin/davidhan/canary'.
Switched to a new branch 'davidhan/canary'
+ python3 -m venv tmp_chronon
+ source tmp_chronon/bin/activate
++ deactivate nondestructive
++ '[' -n '' ']'
++ '[' -n '' ']'
++ hash -r
++ '[' -n '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ case "$(uname)" in
+++ uname
++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ export PATH
++ VIRTUAL_ENV_PROMPT=tmp_chronon
++ export VIRTUAL_ENV_PROMPT
++ '[' -n '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(tmp_chronon) '
++ export PS1
++ hash -r
+ gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl .
Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl
  Completed files 1/1 | 371.1kiB/371.1kiB                                                                                                                                                                                                     
+ pip uninstall zipline-ai
WARNING: Skipping zipline-ai as it is not installed.
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl
Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl
Collecting click (from zipline-ai==0.1.0.dev0)
  Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0)
  Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl
Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0)
  Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB)
Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB)
Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB)
Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB)
Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_crc32c-1.6.0-py3-none-any.whl
Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0)
  Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB)
Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB)
Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB)
Using cached click-8.1.8-py3-none-any.whl (98 kB)
Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB)
Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB)
Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB)
Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB)
Using cached certifi-2024.12.14-py3-none-any.whl (164 kB)
Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB)
Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB)
Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Using cached urllib3-2.3.0-py3-none-any.whl (128 kB)
Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB)
Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai
Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0

[notice] A new release of pip is available: 24.2 -> 25.0
[notice] To update, run: pip install --upgrade pip
++ pwd
+ export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id'
+ echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m'
<<<<<.....................................COMPILE.....................................>>>>>
+ zipline compile --conf=group_bys/quickstart/purchases.py
  Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
     Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1_test
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production
+ echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m'
<<<<<.....................................BACKFILL.....................................>>>>>
+ touch tmp_backfill.out
+ zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc
+ tee /dev/tty tmp_backfill.out
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b

Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_backfill.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b
+ JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']'
+ gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1
Waiting for job output...
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse
25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:16:51 INFO Configuration: resource-types.xml not found
25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011
25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:16:55 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions.
2025/01/30 18:17:15 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:15 INFO  TableUtils.scala:622 - 
Unfilled range computation:
   Output table: canary-443022.data.quickstart_purchases_v1_test
   Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024
-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Input tables: data.purchases
   Missing input partitions: [2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-
10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30]
   Unfilled ranges: [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30])
2025/01/30 18:17:15 INFO  GroupBy.scala:738 - Group By ranges to compute: 
    [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1]
2025/01/30 18:17:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:17:20 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:20 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-11-01...2023-11-30]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-11-30]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-11-30]
   intersected range: [2023-11-01...2023-11-30]

2025/01/30 18:17:20 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:17:20 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-11-30])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:17:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-11-30'

2025/01/30 18:17:20 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-11-01 00:00:00
2025/01/30 18:17:20 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:17:22 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:17:25 INFO  TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:25 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition
2025/01/30 18:17:25 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:17:33 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:33 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33
2025/01/30 18:17:33 INFO  GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30]
2025/01/30 18:17:33 INFO  GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30]
Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput
jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: 945d836f-20d8-4768-97fb-0889c00ed87b
  projectId: canary-443022
sparkJob:
  args:
  - group-by-backfill
  - --conf-path=purchases.v1_test
  - --end-date=2025-01-30
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:17:38.722934Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:16:43.326557Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:16:43.353624Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:16:43.597231Z'
yarnApplications:
- name: groupBy_quickstart.purchases.v1_test_backfill
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
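
Aside, not from the run output: each stage of the quickstart follows the same check seen in the trace above — capture the "Dataproc submitter job id" line, block on `gcloud dataproc jobs wait`, then verify the terminal state via `describe --format=flattened`. A minimal Python sketch of that pattern (an illustration, not the quickstart script's actual code):

```python
import re
import subprocess

def check_dataproc_job_state(zipline_output: str, region: str = "us-central1") -> None:
    # Pull the job id out of the captured `zipline run` output, as the
    # `grep 'Dataproc submitter job id' | cut -d ' ' -f5` in the trace does.
    match = re.search(r"Dataproc submitter job id: (\S+)", zipline_output)
    if match is None:
        raise RuntimeError("No Dataproc submitter job id found")
    job_id = match.group(1)

    # Block until the job finishes, streaming its driver output (same as the trace).
    subprocess.run(["gcloud", "dataproc", "jobs", "wait", job_id, f"--region={region}"], check=True)

    # Confirm the terminal state, mirroring `describe ... --format=flattened | grep status.state:`.
    described = subprocess.run(
        ["gcloud", "dataproc", "jobs", "describe", job_id, f"--region={region}", "--format=flattened"],
        capture_output=True, text=True, check=True,
    ).stdout
    state_line = next(line for line in described.splitlines() if line.startswith("status.state:"))
    print(state_line)
    if "DONE" not in state_line:
        raise RuntimeError(f"Dataproc job {job_id} did not reach DONE: {state_line}")
```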
+ echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m'
<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>
+ touch tmp_gbu.out
+ zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc
+ tee /dev/tty tmp_gbu.out
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_gbu.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f
+ JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']'
+ gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1
Waiting for job output...
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse
25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:17:52 INFO Configuration: resource-types.xml not found
25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012
25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:17:56 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:17:57 INFO  GroupByUpload.scala:229 - 
GroupBy upload for: quickstart.quickstart.purchases.v1_test
Accuracy: SNAPSHOT
Data Model: Events

2025/01/30 18:17:57 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:14 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:14 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:14 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:14 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:14 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:14 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-12-01 00:00:00
2025/01/30 18:18:14 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:18:15 INFO  KvRdd.scala:102 - 
key schema:
  {
  "type" : "record",
  "name" : "Key",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "user_id",
    "type" : [ "null", "long" ],
    "doc" : ""
  } ]
}
value schema:
  {
  "type" : "record",
  "name" : "Value",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "purchase_price_sum_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_3d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_14d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_30d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_last10",
    "type" : [ "null", {
      "type" : "array",
      "items" : "long"
    } ],
    "doc" : ""
  } ]
}
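
Aside, not from the run output: the value-schema fields above appear to follow the `<input_column>_<operation>_<window>` naming for the windowed aggregations (sum/count/average over 3d/14d/30d), plus `purchase_price_last10` for the last-10 aggregation. A small illustrative parse of those names:

```python
# Field names copied from the value schema above; illustrative only.
fields = [
    "purchase_price_sum_3d", "purchase_price_sum_14d", "purchase_price_sum_30d",
    "purchase_price_count_3d", "purchase_price_count_14d", "purchase_price_count_30d",
    "purchase_price_average_3d", "purchase_price_average_14d", "purchase_price_average_30d",
]
for name in fields:
    column, operation, window = name.rsplit("_", 2)
    print(f"{column:16} {operation:8} {window}")
# purchase_price_last10 carries no window suffix: it is the unwindowed last-10 aggregation.
```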

2025/01/30 18:18:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:19 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:19 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:19 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:19 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:20 INFO  GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined.
2025/01/30 18:18:20 INFO  GroupByUpload.scala:188 - 
Built GroupByServingInfo for quickstart.purchases.v1_test:
table: data.purchases / data-model: Events
     keySchema: Success(struct<user_id:bigint>)
   valueSchema: Success(struct<purchase_price:bigint>)
mutationSchema: Failure(java.lang.NullPointerException)
   inputSchema: Failure(java.lang.NullPointerException)
selectedSchema: Success(struct<purchase_price:bigint>)
  streamSchema: Failure(java.lang.NullPointerException)

2025/01/30 18:18:20 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:18:24 INFO  TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:24 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition
2025/01/30 18:18:24 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:18:30 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:30 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30
Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput
jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c672008e-7380-4a82-a121-4bb0cb46503f
  projectId: canary-443022
sparkJob:
  args:
  - group-by-upload
  - --conf-path=purchases.v1_test
  - --end-date=2023-12-01
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
           {"qualifierString": "value", "fieldName": ""}
        ]
      }
   ]
}'''
) AS
SELECT
  CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey,
  value_bytes as cf,
  TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP
FROM canary-443022.data.quickstart_purchases_v1_test_upload
WHERE ds = '2023-12-01'
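
Aside, not from the run output: the EXPORT DATA statement above writes each upload row into the GROUPBY_BATCH table keyed as `<DATASET>#<key_bytes>` (here the dataset prefix is QUICKSTART_PURCHASES_V1_TEST_BATCH), with the value bytes under column family `cf` / qualifier `value`, and the cell timestamp pinned to 1701475200000 ms, i.e. 2023-12-02 00:00:00 UTC — the end of the uploaded 2023-12-01 partition. A rough Python sketch of the row-key shape (illustrative; `key_bytes` is a placeholder for the real Avro-encoded key):

```python
from datetime import datetime, timezone

# Illustrative only; key_bytes stands in for the Avro-encoded key written by the upload job.
dataset = "QUICKSTART_PURCHASES_V1_TEST_BATCH"
key_bytes = b"\x00\x02\x86\xaf"

row_key = dataset.encode("utf-8") + b"#" + key_bytes

# 1701475200000 ms == 2023-12-02 00:00:00 UTC, the end of the 2023-12-01 partition.
cell_ts = datetime.fromtimestamp(1_701_475_200_000 / 1000, tz=timezone.utc)
print(row_key, cell_ts.isoformat())
```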

25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1
25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30]
   Unfilled ranges: [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30])
2025/01/30 18:17:15 INFO  GroupBy.scala:738 - Group By ranges to compute: 
    [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1]
2025/01/30 18:17:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:17:20 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:20 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-11-01...2023-11-30]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-11-30]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-11-30]
   intersected range: [2023-11-01...2023-11-30]

2025/01/30 18:17:20 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:17:20 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-11-30])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:17:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-11-30'

2025/01/30 18:17:20 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-11-01 00:00:00
2025/01/30 18:17:20 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:17:22 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:17:25 INFO  TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:25 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition
2025/01/30 18:17:25 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:17:33 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:33 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33
2025/01/30 18:17:33 INFO  GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30]
2025/01/30 18:17:33 INFO  GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30]
Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput
jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: 945d836f-20d8-4768-97fb-0889c00ed87b
  projectId: canary-443022
sparkJob:
  args:
  - group-by-backfill
  - --conf-path=purchases.v1_test
  - --end-date=2025-01-30
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:17:38.722934Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:16:43.326557Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:16:43.353624Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:16:43.597231Z'
yarnApplications:
- name: groupBy_quickstart.purchases.v1_test_backfill
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m'
<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>
+ touch tmp_gbu.out
+ zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc
+ tee /dev/tty tmp_gbu.out
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_gbu.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f
+ JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']'
+ gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1
Waiting for job output...
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse
25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:17:52 INFO Configuration: resource-types.xml not found
25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012
25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:17:56 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:17:57 INFO  GroupByUpload.scala:229 - 
GroupBy upload for: quickstart.quickstart.purchases.v1_test
Accuracy: SNAPSHOT
Data Model: Events

2025/01/30 18:17:57 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:14 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:14 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:14 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:14 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:14 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:14 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-12-01 00:00:00
2025/01/30 18:18:14 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:18:15 INFO  KvRdd.scala:102 - 
key schema:
  {
  "type" : "record",
  "name" : "Key",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "user_id",
    "type" : [ "null", "long" ],
    "doc" : ""
  } ]
}
value schema:
  {
  "type" : "record",
  "name" : "Value",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "purchase_price_sum_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_3d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_14d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_30d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_last10",
    "type" : [ "null", {
      "type" : "array",
      "items" : "long"
    } ],
    "doc" : ""
  } ]
}

2025/01/30 18:18:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:19 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:19 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:19 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:19 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:20 INFO  GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined.
2025/01/30 18:18:20 INFO  GroupByUpload.scala:188 - 
Built GroupByServingInfo for quickstart.purchases.v1_test:
table: data.purchases / data-model: Events
     keySchema: Success(struct<user_id:bigint>)
   valueSchema: Success(struct<purchase_price:bigint>)
mutationSchema: Failure(java.lang.NullPointerException)
   inputSchema: Failure(java.lang.NullPointerException)
selectedSchema: Success(struct<purchase_price:bigint>)
  streamSchema: Failure(java.lang.NullPointerException)

2025/01/30 18:18:20 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:18:24 INFO  TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:24 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition
2025/01/30 18:18:24 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:18:30 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:30 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30
Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput
jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c672008e-7380-4a82-a121-4bb0cb46503f
  projectId: canary-443022
sparkJob:
  args:
  - group-by-upload
  - --conf-path=purchases.v1_test
  - --end-date=2023-12-01
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
           {"qualifierString": "value", "fieldName": ""}
        ]
      }
   ]
}'''
) AS
SELECT
  CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey,
  value_bytes as cf,
  TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP
FROM canary-443022.data.quickstart_purchases_v1_test_upload
WHERE ds = '2023-12-01'

25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1
25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
chewy-zlai pushed a commit that referenced this pull request May 16, 2025
## Summary

```
python distribution/run_zipline_quickstart.py
```

This runs the full zipline suite of commands against a test quickstart
groupby.

Example:
```
davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py 
Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ GREEN='\033[0;32m'
+ RED='\033[0;31m'
+ WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload
+ '[' -z '' ']'
+ wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
--2025-01-30 10:16:21--  https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132
Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 400879762 (382M) [application/x-gzip]
Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’

spark-3.5.4-bin-hadoop3.tgz                                 100%[==========================================================================================================================================>] 382.31M  50.2MB/s    in 8.4s    

2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762]

+ tar -xzf spark-3.5.4-bin-hadoop3.tgz
++ pwd
+ export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ git clone [email protected]:zipline-ai/cananry-confs.git
Cloning into 'cananry-confs'...
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0)
Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done.
Resolving deltas: 100% (63/63), done.
+ cd cananry-confs
+ git fetch origin davidhan/canary
From github.com:zipline-ai/cananry-confs
 * branch            davidhan/canary -> FETCH_HEAD
+ git checkout davidhan/canary
branch 'davidhan/canary' set up to track 'origin/davidhan/canary'.
Switched to a new branch 'davidhan/canary'
+ python3 -m venv tmp_chronon
+ source tmp_chronon/bin/activate
++ deactivate nondestructive
++ '[' -n '' ']'
++ '[' -n '' ']'
++ hash -r
++ '[' -n '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ case "$(uname)" in
+++ uname
++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon
++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin
++ export PATH
++ VIRTUAL_ENV_PROMPT=tmp_chronon
++ export VIRTUAL_ENV_PROMPT
++ '[' -n '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(tmp_chronon) '
++ export PS1
++ hash -r
+ gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl .
Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl
  Completed files 1/1 | 371.1kiB/371.1kiB                                                                                                                                                                                                     
+ pip uninstall zipline-ai
WARNING: Skipping zipline-ai as it is not installed.
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl
Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl
Collecting click (from zipline-ai==0.1.0.dev0)
  Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0)
  Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl
Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0)
  Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB)
Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB)
Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB)
Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB)
Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached google_crc32c-1.6.0-py3-none-any.whl
Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0)
  Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes)
Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB)
Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB)
Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0)
  Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB)
Using cached click-8.1.8-py3-none-any.whl (98 kB)
Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB)
Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB)
Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB)
Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB)
Using cached certifi-2024.12.14-py3-none-any.whl (164 kB)
Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB)
Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB)
Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Using cached urllib3-2.3.0-py3-none-any.whl (128 kB)
Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB)
Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai
Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0

[notice] A new release of pip is available: 24.2 -> 25.0
[notice] To update, run: pip install --upgrade pip
++ pwd
+ export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
+ DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id'
+ echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m'
<<<<<.....................................COMPILE.....................................>>>>>
+ zipline compile --conf=group_bys/quickstart/purchases.py
  Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs
     Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1
             GroupBy Team - quickstart
             GroupBy Name - purchases.v1_test
       Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production
+ echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m'
<<<<<.....................................BACKFILL.....................................>>>>>
+ touch tmp_backfill.out
+ zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc
+ tee /dev/tty tmp_backfill.out
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b

Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_backfill.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b
+ JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b
+ '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']'
+ gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1
Waiting for job output...
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse
25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:16:51 INFO Configuration: resource-types.xml not found
25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011
25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:16:55 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions.
2025/01/30 18:17:15 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:15 INFO  TableUtils.scala:622 - 
Unfilled range computation:
   Output table: canary-443022.data.quickstart_purchases_v1_test
   Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024
-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Input tables: data.purchases
   Missing input partitions: [2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-
10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30]
   Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30]
   Unfilled ranges: [2023-11-01...2023-11-30]
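For reference, the unfilled range the job settles on is just the set difference the log prints: partitions missing from the output table that the input can still supply. A rough Python sketch of that arithmetic (an illustration only, not Chronon's actual TableUtils code):

from datetime import date, timedelta

def date_range(start: str, end: str) -> list[str]:
    """All daily ds partitions between start and end, inclusive (yyyy-MM-dd)."""
    d, stop = date.fromisoformat(start), date.fromisoformat(end)
    days = []
    while d <= stop:
        days.append(d.isoformat())
        d += timedelta(days=1)
    return days

missing_output = date_range("2023-11-01", "2025-01-30")  # "Missing output partitions" above
missing_input = date_range("2023-12-01", "2025-01-30")   # "Missing input partitions" above

# Unfilled = missing from the output but present in the input.
unfilled = sorted(set(missing_output) - set(missing_input))
print(f"[{unfilled[0]}...{unfilled[-1]}]")  # [2023-11-01...2023-11-30]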

2025/01/30 18:17:15 INFO  GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30])
2025/01/30 18:17:15 INFO  GroupBy.scala:738 - Group By ranges to compute: 
    [2023-11-01...2023-11-30]

2025/01/30 18:17:15 INFO  GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1]
2025/01/30 18:17:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:17:20 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:17:20 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-11-01...2023-11-30]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-11-30]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-11-30]
   intersected range: [2023-11-01...2023-11-30]

2025/01/30 18:17:20 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:17:20 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-11-30])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:17:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-11-30'

2025/01/30 18:17:20 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-11-01 00:00:00
2025/01/30 18:17:20 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:17:22 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:17:25 INFO  TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:25 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition
2025/01/30 18:17:25 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:17:33 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test
2025/01/30 18:17:33 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33
2025/01/30 18:17:33 INFO  GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30]
2025/01/30 18:17:33 INFO  GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30]
Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput
jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: 945d836f-20d8-4768-97fb-0889c00ed87b
  projectId: canary-443022
sparkJob:
  args:
  - group-by-backfill
  - --conf-path=purchases.v1_test
  - --end-date=2025-01-30
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:17:38.722934Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:16:43.326557Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:16:43.353624Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:16:43.597231Z'
yarnApplications:
- name: groupBy_quickstart.purchases.v1_test_backfill
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
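The job-id extraction and status check above repeat for each mode. A hypothetical Python equivalent of the check_dataproc_job_state helper the trace is running (the real logic lives in distribution/run_zipline_quickstart.py and may differ):

import subprocess
import sys

def check_dataproc_job_state(job_id: str, region: str = "us-central1") -> None:
    # Mirrors the trace: bail out if no job id was parsed from the submitter output.
    if not job_id:
        sys.exit("No Dataproc submitter job id found")
    # Block until the job finishes, streaming driver output to the console.
    subprocess.run(["gcloud", "dataproc", "jobs", "wait", job_id, f"--region={region}"], check=True)
    # Read back the terminal state, like `describe --format=flattened | grep status.state:`.
    describe = subprocess.run(
        ["gcloud", "dataproc", "jobs", "describe", job_id, f"--region={region}", "--format=flattened"],
        capture_output=True, text=True, check=True,
    )
    state_lines = [line for line in describe.stdout.splitlines() if line.startswith("status.state:")]
    if not state_lines:
        sys.exit(f"Could not determine status for Dataproc job {job_id}")
    print(state_lines[0])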
+ echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m'
<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>
+ touch tmp_gbu.out
+ zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc
+ tee /dev/tty tmp_gbu.out
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01  --conf-type=group_bys    --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_gbu.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f
+ JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']'
+ gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1
Waiting for job output...
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse
25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:17:52 INFO Configuration: resource-types.xml not found
25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012
25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:17:56 INFO  SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:17:57 INFO  GroupByUpload.scala:229 - 
GroupBy upload for: quickstart.quickstart.purchases.v1_test
Accuracy: SNAPSHOT
Data Model: Events

2025/01/30 18:17:57 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:14 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:14 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:14 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:14 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:14 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:14 INFO  HopsAggregator.scala:147 - Left bounds: 1d->unbounded 
minQueryTs = 2023-12-01 00:00:00
2025/01/30 18:18:14 INFO  FastHashing.scala:52 - Generating key builder over keys:
  bigint : user_id

2025/01/30 18:18:15 INFO  KvRdd.scala:102 - 
key schema:
  {
  "type" : "record",
  "name" : "Key",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "user_id",
    "type" : [ "null", "long" ],
    "doc" : ""
  } ]
}
value schema:
  {
  "type" : "record",
  "name" : "Value",
  "namespace" : "ai.chronon.data",
  "doc" : "",
  "fields" : [ {
    "name" : "purchase_price_sum_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_sum_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_3d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_14d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_count_30d",
    "type" : [ "null", "long" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_3d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_14d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_average_30d",
    "type" : [ "null", "double" ],
    "doc" : ""
  }, {
    "name" : "purchase_price_last10",
    "type" : [ "null", {
      "type" : "array",
      "items" : "long"
    } ],
    "doc" : ""
  } ]
}

2025/01/30 18:18:15 INFO  GroupBy.scala:492 - 
----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:19 INFO  TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:19 INFO  GroupBy.scala:618 - 
Computing intersected range as:
   query range: [2023-12-01...2023-12-01]
   query window: None
   source table: data.purchases
   source data range: [2023-11-01...2023-12-01]
   source start/end: null/null
   source data model: Events
   queryable data range: [null...2023-12-01]
   intersected range: [2023-11-01...2023-12-01]

2025/01/30 18:18:19 INFO  GroupBy.scala:658 - 
Time Mapping: Some((ts,ts))

2025/01/30 18:18:19 INFO  GroupBy.scala:668 - 
Rendering source query:
   intersected/effective scan range: Some([2023-11-01...2023-12-01])
   partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01')
   metaColumns: Map(ds -> null, ts -> ts)

2025/01/30 18:18:20 INFO  TableUtils.scala:759 -  Scanning data:
  table: data.purchases
  options: Map()
  format: Some(bigquery)
  selects:
    `ds`
    `ts`
    `user_id`
    `purchase_price`
  wheres:
    
  partition filters:
    ds >= '2023-11-01',
    ds <= '2023-12-01'

2025/01/30 18:18:20 INFO  GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined.
2025/01/30 18:18:20 INFO  GroupByUpload.scala:188 - 
Built GroupByServingInfo for quickstart.purchases.v1_test:
table: data.purchases / data-model: Events
     keySchema: Success(struct<user_id:bigint>)
   valueSchema: Success(struct<purchase_price:bigint>)
mutationSchema: Failure(java.lang.NullPointerException)
   inputSchema: Failure(java.lang.NullPointerException)
selectedSchema: Success(struct<purchase_price:bigint>)
  streamSchema: Failure(java.lang.NullPointerException)

2025/01/30 18:18:20 INFO  TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:18:24 INFO  TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:24 INFO  TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition
2025/01/30 18:18:24 INFO  TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:18:30 INFO  TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:30 INFO  TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30
Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput
jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c672008e-7380-4a82-a121-4bb0cb46503f
  projectId: canary-443022
sparkJob:
  args:
  - group-by-upload
  - --conf-path=purchases.v1_test
  - --end-date=2023-12-01
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --partition-string=2023-12-01  --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:

EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
   "columnFamilies" : [
      {
        "familyId": "cf",
        "encoding": "BINARY",
        "columns": [
       …