Spark opt: only carry required columns for join #581
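The optimization named in the title is about pruning each join input down to just the key and value columns the join actually consumes, rather than carrying every source column through the shuffle. A minimal, self-contained Spark sketch of that idea (illustrative only; the table, column, and object names are made up and this is not the PR's actual diff):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RequiredColumnsJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-pruning-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical inputs with extra columns the join does not need.
    val left  = Seq((1L, "2023-11-01", 10.0, "unused_a")).toDF("user_id", "ds", "purchase_price", "extra_left")
    val right = Seq((1L, "2023-11-01", "US", "unused_b")).toDF("user_id", "ds", "country", "extra_right")

    // Keep only the join keys plus the values downstream computation needs,
    // instead of carrying every column through the shuffle.
    val requiredLeft  = Seq("user_id", "ds", "purchase_price")
    val requiredRight = Seq("user_id", "ds", "country")

    val joined: DataFrame = left
      .select(requiredLeft.map(left(_)): _*)
      .join(right.select(requiredRight.map(right(_)): _*), Seq("user_id", "ds"))

    joined.show()
    spark.stop()
  }
}
```

Spark's optimizer can sometimes push this pruning down on its own, but selecting the required columns explicitly before the join keeps the shuffled rows narrow regardless of how the plan gets optimized.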
Conversation
…table. (#312)

## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Logging**
  - Adjusted log levels for table accessibility and partition availability messages from error to info.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
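The log-level note above corresponds to messages like the `TableUtils` lines visible in the job output further down: an unreachable output table or missing partitions before the first backfill is expected, not an error. A minimal sketch of that kind of change, assuming an SLF4J-style logger; the method and message here are illustrative, not the actual `TableUtils` code:

```scala
import org.slf4j.LoggerFactory

object TableLoggingSketch {
  private val logger = LoggerFactory.getLogger(getClass)

  // Hypothetical stand-in for the real reachability check in TableUtils.
  private def tableReachable(tableName: String): Boolean = false

  def partitions(tableName: String): Seq[String] = {
    if (!tableReachable(tableName)) {
      // Previously logged at ERROR; an absent output table is normal before the first backfill.
      logger.info(s"Table $tableName is not reachable. Returning empty partitions.")
      Seq.empty
    } else {
      Seq("2023-11-01", "2023-11-02") // placeholder partition values
    }
  }

  def main(args: Array[String]): Unit =
    println(partitions("canary-443022.data.quickstart_purchases_v1_test"))
}
```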
## Summary
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Added `bazelisk` plugin with version `1.25.0`
- Configured `bazelisk` plugin repository and commit hash
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: Thomas Chow <[email protected]>
## Summary
^^^
```
davidhan@Mac: ~/zipline/chronon/api/py/test/sample (davidhan/streaming_verb) $ echo $RUN_PY
/Users/davidhan/zipline/chronon/api/py/ai/chronon/repo/run.py
davidhan@Mac: ~/zipline/chronon/api/py/test/sample (davidhan/streaming_verb) $ python $RUN_PY --mode streaming --dataproc --groupby-name=etsy.listing_canary.actions_v1 --kafka-bootstrap=bootstrap.zipline-kafka-cluster.us-central1.managedkafka.canary-443022.cloud.goog:9092
```
Produced job: https://console.cloud.google.com/dataproc/jobs/db59b95d-adae-4802-8d96-5613b0799c04/monitoring?region=us-central1&project=canary-443022

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **New Features**
  - Added support for Flink streaming jobs alongside existing Spark jobs.
  - Introduced new command-line options for Flink-specific job configurations.
  - Enhanced job submission capabilities for both Spark and Flink job types.
- **Improvements**
  - Refined argument handling for job submissions.
  - Updated build and upload scripts to include Flink JAR artifacts.
- **Technical Updates**
  - Introduced new job type enumeration.
  - Added methods to support Flink streaming job execution.
  - Enhanced clarity and maintainability of argument handling in job submission logic.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
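The "Technical Updates" above mention a new job type enumeration and Flink-specific submission methods. A rough Scala sketch of what such a dispatch can look like, with entirely hypothetical names (the real enumeration and the `DataprocSubmitter` API in this repo may be shaped differently):

```scala
// Hypothetical sketch of a job-type dispatch; not the actual DataprocSubmitter API.
sealed trait JobType
case object SparkJob extends JobType
case object FlinkJob extends JobType

object SubmitterSketch {
  // Builds a submission command string for the given runtime.
  def submit(jobType: JobType, mainClass: String, args: Seq[String]): String = jobType match {
    case SparkJob => s"spark-submit --class $mainClass ${args.mkString(" ")}"
    case FlinkJob => s"flink run --class $mainClass ${args.mkString(" ")}"
  }

  def main(argv: Array[String]): Unit =
    // Placeholder main class and args, purely for illustration.
    println(submit(FlinkJob, "com.example.StreamingMain", Seq("--groupby-name=etsy.listing_canary.actions_v1")))
}
```

Keeping both runtimes behind one enumeration is one way the CLI flags described above could map onto either submission path without duplicating the argument-handling logic.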
## Summary ``` python distribution/run_zipline_quickstart.py ``` This runs the full zipline suite of commands against a test quickstart groupby. Example: ``` davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + GREEN='\033[0;32m' + RED='\033[0;31m' + WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl + bq rm -f -t canary-443022:data.quickstart_purchases_v1_test + bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload + '[' -z '' ']' + wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz --2025-01-30 10:16:21-- https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132 Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 400879762 (382M) [application/x-gzip] Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’ spark-3.5.4-bin-hadoop3.tgz 100%[==========================================================================================================================================>] 382.31M 50.2MB/s in 8.4s 2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762] + tar -xzf spark-3.5.4-bin-hadoop3.tgz ++ pwd + export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3 + SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3 + git clone [email protected]:zipline-ai/cananry-confs.git Cloning into 'cananry-confs'... remote: Enumerating objects: 148, done. remote: Counting objects: 100% (148/148), done. remote: Compressing objects: 100% (77/77), done. remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0) Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done. Resolving deltas: 100% (63/63), done. + cd cananry-confs + git fetch origin davidhan/canary From github.com:zipline-ai/cananry-confs * branch davidhan/canary -> FETCH_HEAD + git checkout davidhan/canary branch 'davidhan/canary' set up to track 'origin/davidhan/canary'. Switched to a new branch 'davidhan/canary' + python3 -m venv tmp_chronon + source tmp_chronon/bin/activate ++ deactivate nondestructive ++ '[' -n '' ']' ++ '[' -n '' ']' ++ hash -r ++ '[' -n '' ']' ++ unset VIRTUAL_ENV ++ unset VIRTUAL_ENV_PROMPT ++ '[' '!' 
nondestructive = nondestructive ']' ++ case "$(uname)" in +++ uname ++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon ++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon ++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin ++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin ++ export PATH ++ VIRTUAL_ENV_PROMPT=tmp_chronon ++ export VIRTUAL_ENV_PROMPT ++ '[' -n '' ']' ++ '[' -z '' ']' ++ _OLD_VIRTUAL_PS1= ++ PS1='(tmp_chronon) ' ++ export PS1 ++ hash -r + gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl . Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl Completed files 1/1 | 371.1kiB/371.1kiB + pip uninstall zipline-ai WARNING: Skipping zipline-ai as it is not installed. 
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl Collecting click (from zipline-ai==0.1.0.dev0) Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB) Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0) Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0) Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB) Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB) Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB) Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB) Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB) Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB) Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_crc32c-1.6.0-py3-none-any.whl Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0) Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB) Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB) Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes) Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB) Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB) Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB) Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB) Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB) Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached idna-3.10-py3-none-any.whl.metadata (10 kB) Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB) Collecting certifi>=2017.4.17 (from 
requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB) Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB) Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB) Using cached click-8.1.8-py3-none-any.whl (98 kB) Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB) Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB) Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB) Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB) Using cached requests-2.32.3-py3-none-any.whl (64 kB) Using cached six-1.17.0-py2.py3-none-any.whl (11 kB) Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB) Using cached certifi-2024.12.14-py3-none-any.whl (164 kB) Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB) Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB) Using cached idna-3.10-py3-none-any.whl (70 kB) Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB) Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB) Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB) Using cached rsa-4.9-py3-none-any.whl (34 kB) Using cached urllib3-2.3.0-py3-none-any.whl (128 kB) Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB) Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0 [notice] A new release of pip is available: 24.2 -> 25.0 [notice] To update, run: pip install --upgrade pip ++ pwd + export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs + PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs + DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id' + echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m' <<<<<.....................................COMPILE.....................................>>>>> + zipline compile --conf=group_bys/quickstart/purchases.py Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py GroupBy Team - quickstart GroupBy Name - purchases.v1 Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1 GroupBy Team - quickstart GroupBy Name - purchases.v1_test Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test 
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production + echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m' <<<<<.....................................BACKFILL.....................................>>>>> + touch tmp_backfill.out + zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc + tee /dev/tty tmp_backfill.out Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. 
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. 
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_backfill.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b + check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b + JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b + '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']' + gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 Waiting for job output... 25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse 25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml 25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker 25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster 25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator 25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics 25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032 25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200 25/01/30 18:16:51 INFO Configuration: resource-types.xml not found 25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'. 25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011 25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030 25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state. 
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ] 2025/01/30 18:16:55 INFO SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration 2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions. 2025/01/30 18:17:15 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:17:15 INFO TableUtils.scala:622 - Unfilled range computation: Output table: canary-443022.data.quickstart_purchases_v1_test Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-
27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30] Input tables: data.purchases Missing input partitions: 
[2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2
024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30] Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30] Unfilled ranges: [2023-11-01...2023-11-30] 2025/01/30 18:17:15 INFO GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30]) 2025/01/30 18:17:15 INFO GroupBy.scala:738 - Group By ranges to compute: [2023-11-01...2023-11-30] 2025/01/30 18:17:15 INFO GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1] 2025/01/30 18:17:15 INFO GroupBy.scala:492 - ----[Processing GroupBy: quickstart.purchases.v1_test]---- 2025/01/30 18:17:20 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:17:20 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-11-01...2023-11-30] query window: None source table: data.purchases source data range: [2023-11-01...2023-11-30] source start/end: null/null source data model: Events queryable data range: [null...2023-11-30] intersected range: [2023-11-01...2023-11-30] 2025/01/30 18:17:20 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:17:20 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-11-30]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:17:20 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-11-30' 2025/01/30 18:17:20 INFO HopsAggregator.scala:147 - Left bounds: 1d->unbounded minQueryTs = 2023-11-01 00:00:00 2025/01/30 18:17:20 INFO FastHashing.scala:52 - Generating key builder over keys: bigint : user_id 2025/01/30 18:17:22 INFO TableUtils.scala:459 - Repartitioning before writing... 
2025/01/30 18:17:25 INFO TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test 2025/01/30 18:17:25 INFO TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition 2025/01/30 18:17:25 INFO TableUtils.scala:536 - Sorting within partitions with cols: List(ds) 2025/01/30 18:17:33 INFO TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test 2025/01/30 18:17:33 INFO TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33 2025/01/30 18:17:33 INFO GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30] 2025/01/30 18:17:33 INFO GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30] Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully. done: true driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/ driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b placement: clusterName: zipline-canary-cluster clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0 reference: jobId: 945d836f-20d8-4768-97fb-0889c00ed87b projectId: canary-443022 sparkJob: args: - group-by-backfill - --conf-path=purchases.v1_test - --end-date=2025-01-30 - --conf-type=group_bys - --additional-conf-path=additional-confs.yaml - --is-gcp - --gcp-project-id=canary-443022 - --gcp-bigtable-instance-id=zipline-canary-instance fileUris: - gs://zipline-warehouse-canary/metadata/purchases.v1_test - gs://zipline-artifacts-canary/confs/additional-confs.yaml jarFileUris: - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar mainClass: ai.chronon.spark.Driver status: state: DONE stateStartTime: '2025-01-30T18:17:38.722934Z' statusHistory: - state: PENDING stateStartTime: '2025-01-30T18:16:43.326557Z' - state: SETUP_DONE stateStartTime: '2025-01-30T18:16:43.353624Z' - details: Agent reported job success state: RUNNING stateStartTime: '2025-01-30T18:16:43.597231Z' yarnApplications: - name: groupBy_quickstart.purchases.v1_test_backfill progress: 1.0 state: FINISHED trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/ + echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m' <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>> ++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened ++ grep status.state: + JOB_STATE='status.state: DONE' + echo status.state: DONE status.state: DONE + '[' -z 'status.state: DONE' ']' + echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m' <<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>> + touch tmp_gbu.out + zipline run --mode upload --conf 
production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc + tee /dev/tty tmp_gbu.out Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. 
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f Setting env variables: From <default_env> setting EXECUTOR_CORES=1 From <default_env> setting EXECUTOR_MEMORY=8G From <default_env> setting PARALLELISM=1000 From <default_env> setting MAX_EXECUTORS=1000 From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <default_env> setting EXECUTOR_CORES=1 From <default_env> setting EXECUTOR_MEMORY=8G From <default_env> setting PARALLELISM=1000 From <default_env> setting MAX_EXECUTORS=1000 From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test From <cli_args> setting 
CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_gbu.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f + check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f + JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f + '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']' + gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 Waiting for job output... 25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse 25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml 25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker 25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster 25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator 25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics 25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032 25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200 25/01/30 18:17:52 INFO Configuration: resource-types.xml not found 25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'. 25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012 25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030 25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state. 25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ] 2025/01/30 18:17:56 INFO SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration 2025/01/30 18:17:57 INFO GroupByUpload.scala:229 - GroupBy upload for: quickstart.quickstart.purchases.v1_test Accuracy: SNAPSHOT Data Model: Events 2025/01/30 18:17:57 INFO GroupBy.scala:492 - ----[Processing GroupBy: quickstart.purchases.v1_test]---- 2025/01/30 18:18:14 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:18:14 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-12-01...2023-12-01] query window: None source table: data.purchases source data range: [2023-11-01...2023-12-01] source start/end: null/null source data model: Events queryable data range: [null...2023-12-01] intersected range: [2023-11-01...2023-12-01] 2025/01/30 18:18:14 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:18:14 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-12-01]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:18:14 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01' 2025/01/30 18:18:14 INFO HopsAggregator.scala:147 - Left bounds: 1d->unbounded minQueryTs = 2023-12-01 00:00:00 2025/01/30 18:18:14 INFO FastHashing.scala:52 - Generating key builder over keys: bigint : user_id 2025/01/30 18:18:15 INFO KvRdd.scala:102 - key schema: { "type" : "record", "name" : "Key", "namespace" : "ai.chronon.data", "doc" : "", "fields" : [ { "name" : "user_id", "type" : [ "null", "long" ], "doc" : "" } ] } value schema: { "type" : "record", "name" : "Value", "namespace" : "ai.chronon.data", "doc" : "", "fields" : [ { "name" : "purchase_price_sum_3d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_sum_14d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_sum_30d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_3d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_14d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_30d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_average_3d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_average_14d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_average_30d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_last10", "type" : [ "null", { "type" : "array", "items" : "long" } ], "doc" : "" } ] } 2025/01/30 18:18:15 INFO GroupBy.scala:492 - ----[Processing GroupBy: 
quickstart.purchases.v1_test]---- 2025/01/30 18:18:19 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:18:19 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-12-01...2023-12-01] query window: None source table: data.purchases source data range: [2023-11-01...2023-12-01] source start/end: null/null source data model: Events queryable data range: [null...2023-12-01] intersected range: [2023-11-01...2023-12-01] 2025/01/30 18:18:19 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:18:19 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-12-01]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:18:20 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01' 2025/01/30 18:18:20 INFO GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined. 2025/01/30 18:18:20 INFO GroupByUpload.scala:188 - Built GroupByServingInfo for quickstart.purchases.v1_test: table: data.purchases / data-model: Events keySchema: Success(struct<user_id:bigint>) valueSchema: Success(struct<purchase_price:bigint>) mutationSchema: Failure(java.lang.NullPointerException) inputSchema: Failure(java.lang.NullPointerException) selectedSchema: Success(struct<purchase_price:bigint>) streamSchema: Failure(java.lang.NullPointerException) 2025/01/30 18:18:20 INFO TableUtils.scala:459 - Repartitioning before writing... 2025/01/30 18:18:24 INFO TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload 2025/01/30 18:18:24 INFO TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition 2025/01/30 18:18:24 INFO TableUtils.scala:536 - Sorting within partitions with cols: List(ds) 2025/01/30 18:18:30 INFO TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload 2025/01/30 18:18:30 INFO TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30 Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully. 
done: true driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/ driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f placement: clusterName: zipline-canary-cluster clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0 reference: jobId: c672008e-7380-4a82-a121-4bb0cb46503f projectId: canary-443022 sparkJob: args: - group-by-upload - --conf-path=purchases.v1_test - --end-date=2023-12-01 - --conf-type=group_bys - --additional-conf-path=additional-confs.yaml - --is-gcp - --gcp-project-id=canary-443022 - --gcp-bigtable-instance-id=zipline-canary-instance fileUris: - gs://zipline-warehouse-canary/metadata/purchases.v1_test - gs://zipline-artifacts-canary/confs/additional-confs.yaml jarFileUris: - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar mainClass: ai.chronon.spark.Driver status: state: DONE stateStartTime: '2025-01-30T18:18:33.742458Z' statusHistory: - state: PENDING stateStartTime: '2025-01-30T18:17:44.197477Z' - state: SETUP_DONE stateStartTime: '2025-01-30T18:17:44.223246Z' - details: Agent reported job success state: RUNNING stateStartTime: '2025-01-30T18:17:44.438240Z' yarnApplications: - name: group-by-upload progress: 1.0 state: FINISHED trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/ + echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m' <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>> ++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened ++ grep status.state: + JOB_STATE='status.state: DONE' + echo status.state: DONE status.state: DONE + '[' -z 'status.state: DONE' ']' + echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m' <<<<<.....................................UPLOAD-TO-KV.....................................>>>>> + touch tmp_upload_to_kv.out + zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc + tee /dev/tty tmp_upload_to_kv.out Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 
'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. 
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl --conf-type=group_bys --partition-string=2023-12-01 --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl --conf-type=group_bys --partition-string=2023-12-01 --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_upload_to_kv.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe + check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe + JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe + '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']' + gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1 Waiting for job output... 25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload 25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query: EXPORT DATA OPTIONS ( format='CLOUD_BIGTABLE', overwrite=true, uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH", bigtable_options='''{ "columnFamilies" : [ { "familyId": "cf", "encoding": "BINARY", "columns": [ {"qualifierString": "value", "fieldName": ""} ] } ] }''' ) AS SELECT CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey, value_bytes as cf, TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP FROM canary-443022.data.quickstart_purchases_v1_test_upload WHERE ds = '2023-12-01' 25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1 25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
## Summary Fixed all the test failures after bazel migration. Known test failures are 3 tests in Spark which deal with Resource loading of test data so we plan to temporarily disable them for now, also we have a way out to fix them after killing sbt. ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit Based on the comprehensive summary of changes, here are the concise release notes: ## Release Notes - **New Features** - Added support for Bazelisk installation in Docker environment - Enhanced Scala and Java dependency management - Expanded Protobuf and gRPC support - Introduced new testing utilities and frameworks - **Dependency Updates** - Updated Flink, Spark, and Kafka-related dependencies - Added new Maven artifacts for Avro, Thrift, and testing libraries - Upgraded various library versions - **Testing Improvements** - Introduced `scala_junit_suite` for more flexible test execution - Added new test resources and configurations - Enhanced test coverage and dependency management - **Build System** - Updated Bazel build configurations - Improved dependency resolution and repository management - Added new build rules and scripts - **Code Quality** - Refactored package imports and type conversions - Improved code formatting and readability - Streamlined dependency handling across modules <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: nikhil-zlai <[email protected]>
## Summary Some updates to get our Flink jobs running on the Etsy side: * Configure schema registry via host/port/scheme instead of URL * Explicitly set the task slots per task manager * Configure checkpoint directory based on teams.json ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [X] Integration tested - Kicked off the job on the Etsy cluster and confirmed it's up and running - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced an enhanced configuration option for Flink job submissions by adding a state URI parameter for improved job state management. - Expanded schema registry configuration, enabling greater flexibility with host, port, and scheme settings. - **Chores** - Adjusted logging levels and refined error messaging to support better troubleshooting. - **Documentation** - Updated configuration guidance to aid in setting up schema registry integration. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- This migrates over our artifact upload + run.py to leverage
bazel-built jars. Only for the batch side for now, streaming will
follow.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Chores**
- Updated default JAR file names for online and Dataproc submissions.
- Migrated build process from `sbt` to `bazel` for GCP artifact
generation.
- Added new `submitter` binary target for Dataproc submission.
- Added dependency for Scala-specific features of the Jackson library in
the online library.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
---------
Co-authored-by: Thomas Chow <[email protected]>
## Summary
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Chores**
- Streamlined the build process with more consistent naming conventions
for cloud deployment artifacts.
- **New Features**
- Enhanced support for macOS environments by introducing
platform-specific handling during the build, ensuring improved
compatibility.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: Thomas Chow <[email protected]>
## Summary To add missing dependencies for flink module coming from our recent changes to keep it in sync with sbt Tested locally by running Flink jobs from DataprocSubmitterTest ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Expanded Kafka integration with enhanced authentication, client functionality, and Protobuf serialization support. - Improved JSON processing support for Scala-based operations. - Adjusted dependency versions to ensure better compatibility and stability with Kafka and cloud services. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Newer version with better ergonomics, and more maintainable codebase - smaller files, simpler logic - with little indirection etc with user facing simplification - [x] always compile the whole repo - [x] teams thrift and teams python - [x] compile context as its own object - [ ] integration test on sample - [ ] progress bar for compile - [ ] sync to remote w/ progress bar - [ ] display workflow - [ ] display workflow progress ## Checklist - [x] Added Unit Tests - [ ] Covered by existing CI - [x] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New CLI Commands** - Introduced streamlined commands for synchronizing, backfilling, and deploying operations for easier management. - **Enhanced Logging** - Improved colored, structured log outputs for clearer real-time monitoring and debugging. - **Configuration & Validation Upgrades** - Strengthened configuration management and validation processes to ensure reliable operations. - Added a comprehensive validation framework for Chronon API thrift objects. - **Build & Infrastructure Improvements** - Transitioned to a new container base and modernized testing/build systems for better performance and stability. - **Team & Utility Enhancements** - Expanded team configuration options and refined utility processes to streamline overall workflows. - Introduced new data classes and methods for improved configuration and compilation context management. - Enhanced templating functionality for dynamic replacements based on object properties. - Improved handling of Git operations and enhanced error logging for better traceability. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Ken Morton <[email protected]> Co-authored-by: Kumar Teja Chippala <[email protected]> Co-authored-by: Kumar Teja Chippala <[email protected]>
## Summary Update the Flink job code on the tiling path to use the TileKey. I haven't wired up the KV store side of things yet (can do the write and read side of the KV store collaboratively with Thomas as they need to go together to keep the tests happy). The tiling version of the Flink job isn't in use so these changes should be safe to go and keeps things incremental. ## Checklist - [ ] Added Unit Tests - [X] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Added a utility to determine the start timestamp for a defined time window. - **Refactor/Enhancements** - Streamlined time window handling by providing a default one-day resolution when none is specified. - Improved tiled data processing with consistent tiling window sizing and enriched metadata management. - **Tests** - Updated integration tests to validate the new tile processing and time window behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Updated dev notes with instructions for Bazel setup and some useful commands. Also updated bazel target names so the intemediate/uber jar names don't conflict ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Refactor** - Streamlined naming conventions and updated dependency references across multiple project modules for improved consistency. - **Documentation** - Expanded build documentation with a new, comprehensive Bazel Setup section detailing configuration, caching, testing, and deployment instructions. - **Build System Enhancements** - Introduced updated source-generation rules to support future multi-language integration and more robust build workflows. - **Workflow Updates** - Modified test target names in CI workflows to reflect updated naming conventions, enhancing clarity and consistency in test execution. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
… fix (#322) ## Summary Updated dev notes with Bazel installation instructions and java error fix ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Documentation** - Enhanced setup guidance for Bazel installation on both Mac and Linux. - Provided clear instructions for resolving Java-related issues on Mac. - Updated testing procedures by replacing previous instructions with streamlined Bazel commands. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Increased timeout and java heap size for spark tests to avoid flaky test failures ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Tests** - Extended test timeout settings to 900 seconds for enhanced testing robustness. - Updated job names and workflow references for better clarity and consistency in testing workflows. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Add jvm_binary targets for service and hub modules to build final assembly jars for deployment ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update
## Summary ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Chores** - Updated the dependency supporting Apache Spark functionality to boost backend data processing efficiency. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Remove flink streaming scala dependency as we no longer need it otherwise we will run into runtime error saying flink-shaded-guava package not found ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Bug Fixes** - Improved dependency handling to reduce the risk of runtime errors such as class incompatibilities. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Solves sync failures ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Chores** - Streamlined the Java build configuration by removing legacy integration and testing support. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Hit some errors as our Spark deps pull in rocksdbjni 8.3.2 whereas we expect an older version in Flink (6.20.3-ververica-2.0). As we rely on user class first it seems like this newer version gets priority and when Flink is closing tiles we hit an error - ``` 2025-02-05 21:14:53,614 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Tiling for etsy.listing_canary.actions_v1 -> (Tiling Side Output Late Data for etsy.listing_canary.actions_v1, Avro conversion for etsy.listing_canary.actions_v1 -> async kvstore writes for etsy.listing_canary.actions_v1 -> Sink: Metrics Sink for etsy.listing_canary.actions_v1) (2/3) (a107444db4dad3eb79d9d02631d8696e_5627cd3c4e8c9c02fa4f114c4b3607f4_1_56) switched from RUNNING to FAILED on container_1738197659103_0039_01_000004 @ zipline-canary-cluster-w-1.us-central1-c.c.canary-443022.internal (dataPort=33465). java.lang.NoSuchMethodError: 'void org.rocksdb.WriteBatch.remove(org.rocksdb.ColumnFamilyHandle, byte[])' at org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapper.remove(RocksDBWriteBatchWrapper.java:105) ``` Yanked it out from the two jars and confirmed that the Flink job seems to be running fine + crossing over across hours (and hence tile closures). ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [X] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Chores - Made internal adjustments to dependency management to improve compatibility between libraries and enhance overall application stability. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Reverts #330 It's breaking our build as we are using `java_test_suite` for service_commons module <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Enhanced configuration to support improved Java testing capabilities. - Expanded build system functionality with additional integrations for JVM-based test suites. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Some of the benefits using LayerChart over echarts/uplot - Broad support of chart types (cartesian, polar, hierarchy, graph, force, geo) - Simplicity in setup and customization (composable chart components) - Responsive charts, both for viewport/container, and also theming (light/dark, etc) - Flexibility in design/stying (CSS variables, classes, color scales, etc) including transitions - Ability to opt into canvas or svg rendering context as the use case requires. LayerChart's canvas support also has CSS variable/styling support as well (which is unique as far as I'm aware). Html layers are also available, which are great for multiline text (with truncation). ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced new interactive charts, including `FeaturesLineChart` and `PercentileLineChart`, enhancing data visualization with detailed tooltips. - **Refactor** - Replaced legacy ECharts components with LayerChart components, streamlining chart interactions and state management. - **Chores** - Updated dependency configurations and Tailwind CSS settings for improved styling and performance. - Removed unused ECharts-related components and functions to simplify the codebase. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --- - To see the specific tasks where the Asana app for GitHub is being used, see below: - https://app.asana.com/0/0/1209163154826936 <!-- av pr metadata This information is embedded by the av CLI when creating PRs to track the status of stacks when using Aviator. Please do not delete or edit this section of the PR. ``` {"parent":"main","parentHead":"","trunk":"main"} ``` --> --------- Co-authored-by: Sean Lynch <[email protected]>
## Summary Allows connecting to `app` docker container on `localhost:5005` via IntelliJ. See [tutorial](https://www.jetbrains.com/help/idea/tutorial-remote-debug.html) for more details, but the summary is - Recreate the docker containers (`docker-init/build.sh --all`) - This will pass the `-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005` CLI arguments and expose `5005` on the host (i.e. your laptop) - You will see `Listening for transport dt_socket at address: 5005` in the container logs  - In IntelliJ, add a new `Remote JVM Debug` config with default settings  - Set a breakpoint and then trigger it (from the frontend or calling endpoints directly)  This has been useful to understand the [java.lang.NullPointerException](https://app.asana.com/0/home/1208932362205799/1209321714844239) error when fetching summaries for some columns/features (ex. `dim_merchant_account_type`, `dim_merchant_country`)  ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- av pr metadata This information is embedded by the av CLI when creating PRs to track the status of stacks when using Aviator. Please do not delete or edit this section of the PR. ``` {"parent":"main","parentHead":"","trunk":"main"} ``` --> Co-authored-by: Sean Lynch <[email protected]>
## Summary Added a new workflow to verify the bazel config setup. This essentially validates all our bazel config by pulling all the necessary dependencies for all targets with out actually building them which is similar to the `Sync project` option in IntelliJ. This should help us capture errors in our bazel config setup for the CI. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Chores - Added a new automated CI workflow to run configuration tests with improved concurrency management. - Expanded CI triggers to include additional modules for broader testing coverage. - Tests - Temporarily disabled a dynamic class loading test in the cloud integrations to improve overall test stability. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary The codebase previously operated under the assumption that partition listing is a cheap operation. It is not the case for GCS Format - partition listing is expensive on GCS external tables. ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Bug Fixes** - Improved error handling to provide clearer messaging when data retrieval issues occur. - **Refactor** - Streamlined internal processing for data partitions, resulting in more consistent behavior. - Enhanced logging during data scanning for easier troubleshooting. - Simplified logic for handling intersected ranges, ensuring consistent definitions. - Reduced the volume of test data for improved performance and resource utilization during tests. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary the default column reader batch size is 4096 - reads that many rows into memory buffer at once. that causes ooms on large columns, for catalyst we only need to read one row at a time. for interactive we set the limit to 16. tested on etsy data. ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Enhanced data processing performance by adding an optimized configuration for reading Parquet files in Spark sessions. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
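For reference, a minimal sketch of what this tuning looks like when constructing a Spark session; `spark.sql.parquet.columnarReaderBatchSize` is the standard Spark SQL property for the vectorized Parquet reader batch size, and the value of 16 mirrors the interactive limit described above. The session-builder usage here is illustrative, not the repo's actual setup code.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: shrink the vectorized Parquet reader batch size so very wide or
// very large columns are not buffered 4096 rows at a time (Spark's default),
// which is what caused the OOMs described above.
val spark = SparkSession
  .builder()
  .appName("interactive-read-example") // hypothetical app name
  .config("spark.sql.parquet.columnarReaderBatchSize", "16")
  .getOrCreate()
```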
## Summary regarding https://app.asana.com/0/1208277377735902/1209321714844239 No longer are nulls put into the histogram array, instead we use the `Constants.magicNullLong`. We will have to filter this out from the charts as empty value in the FE. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced an additional summary retrieval for merchant account types, providing enhanced insights. - **Bug Fixes** - Improved data processing reliability by substituting missing values with default placeholders, ensuring more consistent results. - Enhanced handling of null values in histogram data for more accurate representation. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
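As a rough illustration of the sentinel substitution described above (a sketch only: the real sentinel is Chronon's `Constants.magicNullLong`, whose actual value is not shown here, and the helper name is made up):

```scala
// Sketch: replace nulls in a histogram array with a sentinel long so the
// frontend can later filter the sentinel out as an empty value.
val magicNullLong: java.lang.Long = java.lang.Long.MIN_VALUE // stand-in value

def sanitizeHistogram(values: Seq[java.lang.Long]): Seq[java.lang.Long] =
  values.map(v => if (v == null) magicNullLong else v)

// sanitizeHistogram(Seq(1L, null, 3L)) => Seq(1, <sentinel>, 3)
```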
## Summary
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Documentation**
- Added a new section for connecting remotely to a Java process in
Docker using IntelliJ's remote debugging feature.
- Clarified commit message formatting instructions for better
consistency.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
---------
Co-authored-by: Sean Lynch <[email protected]>
## Summary We need to make sure no nulls get passed thru the transpose method. should wrap up this ticket: https://app.asana.com/0/1208277377735902/1209343355165466 ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Bug Fixes** - Enhanced error handling ensures that invalid or missing values are consistently processed, leading to more reliable data outcomes. - **Tests** - Added test cases to validate the updated handling of null and non-numeric values, ensuring robustness in data processing. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced an enhanced drift metric toggle for selecting and exploring
drift data.
- Updated chart visualizations now accurately display drift column data.
- Join page headers have been refreshed to highlight drift series
details.
- **Bug Fixes**
- Resolved issues with data fetching and display related to drift
metrics.
- **Refactor**
- Consolidated data endpoints to focus on drift metrics, replacing older
timeseries methods.
- Improved sorting and data processing, resulting in a more consistent
and reliable display of data distributions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
---------
Co-authored-by: Sean Lynch <[email protected]>
## Summary ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced the “Inter” font for enhanced typography, now available in both regular and italic styles. The default sans-serif stack was updated to incorporate Inter for a more modern visual experience. - Documentation - Provided comprehensive documentation that includes installation instructions and licensing details for the new font, ensuring clarity on usage rights and guidelines. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
…ant cols from left (#577) ## Summary spark optimizations ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a new configuration option that lets users control time range validation during join backfill operations, providing more consistent data processing. - Added a method for converting partition ranges to time ranges, enhancing flexibility in time range handling. - Added a new test case for data generation and validation during join operations. - **Refactor** - Optimized the way data merging queries are constructed and how time ranges are applied during join operations, ensuring enhanced precision and performance across event data processing. - Simplified method signatures by removing unnecessary parameters, streamlining the process of generating Bloom filters. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: ezvz <[email protected]>
## Summary Adding step days of 1 to source job ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Data processing is now handled in daily segments, providing more precise and timely results. - **Bug Fixes** - Error messages have been refined to clearly indicate the specific day when a query yields no results, improving clarity during troubleshooting. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: ezvz <[email protected]>
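To make the stepping concrete, here is a small, self-contained sketch of splitting an inclusive date range into single-day steps; the helper and types below are illustrative and are not the codebase's actual `PartitionRange` API.

```scala
import java.time.LocalDate

// Sketch: break an inclusive [start, end] range into one-day steps, so each
// day runs as its own source query and an empty result can be reported for
// the specific day that produced it.
def dailySteps(start: LocalDate, end: LocalDate): Seq[(LocalDate, LocalDate)] =
  Iterator
    .iterate(start)(_.plusDays(1))
    .takeWhile(!_.isAfter(end))
    .map(day => (day, day))
    .toSeq

// dailySteps(LocalDate.parse("2023-11-01"), LocalDate.parse("2023-11-03"))
//   => Seq((2023-11-01, 2023-11-01), (2023-11-02, 2023-11-02), (2023-11-03, 2023-11-03))
```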
## Summary Using tableUtils.partitionColumn rather than hardcoded ds ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update Co-authored-by: ezvz <[email protected]>
## Summary Implements the simple label join logic to create a materialized table by joining forward looking partitions from the snapshot table back of labelJoinParts back to join output. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced an updated mechanism for formatting and standardizing label outputs with the addition of `outputLabelTableV2`. - Added a new distributed label join operation with the `LabelJoinV2` class, featuring robust validations, comprehensive error handling, and detailed logging for improved data integration. - Implemented a comprehensive test suite for the `LabelJoinV2` functionality to ensure accuracy and reliability of label joins. - **Updates** - Replaced the existing `LabelJoin` class with the new `LabelJoinV2` class, enhancing the label join process. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: ezvz <[email protected]>
## Summary Right now, the gcp integration tests do not recognize failures from dataproc and continue running the commands. This has led to the group_by_upload job hanging indefinitely. This change ensures that if the dataproc job state is "ERROR" we fail immediately. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Bug Fixes** - Enhanced job status checking by ensuring that both empty and explicitly error-indicating states trigger consistent failure responses, resulting in more robust job evaluation. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Walkthrough

This pull request enhances DataFrame processing in Spark jobs.
```scala
  requiredColumnsDf
} else {
  // Else we just need the base DF without the UUID
  leftDfWithUUID
```
doesn't this include the UUID?
Co-authored-by: Thomas Chow <[email protected]>
## Summary Right now, the gcp integration tests do not recognize failures from dataproc and continue running the commands. This has led to the group_by_upload job hanging indefinitely. This change ensures that if the dataproc job state is "ERROR" we fail immediately. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Bug Fixes** - Enhanced job status checking by ensuring that both empty and explicitly error-indicating states trigger consistent failure responses, resulting in more robust job evaluation. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]> Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>
```scala
def run(): Unit = {
  val leftDfForSchema = tableUtils.scanDf(query = null, table = leftInputTable, range = Some(dateRange))
  val leftSchema = leftDfForSchema.schema
  val bootstrapInfo =
```
btw, can bootstrapInfo be computed per stepRange? I'm wondering if we should just push this logic into the dateRange.steps iteration below.
Co-authored-by: Thomas Chow <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (3)
spark/src/main/scala/ai/chronon/spark/MergeJob.scala (3)
79-82: Bootstrap cost might be high.
91-107: Limit log size if sets are large.
121-121: Consider caching partialDf.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
spark/src/main/scala/ai/chronon/spark/MergeJob.scala (6 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
spark/src/main/scala/ai/chronon/spark/MergeJob.scala (1)
spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (3)
BootstrapInfo (51-74), BootstrapInfo (76-346), from (80-345)
⏰ Context from checks skipped due to timeout of 90000ms (14)
- GitHub Check: streaming_tests
- GitHub Check: streaming_tests
- GitHub Check: spark_tests
- GitHub Check: join_tests
- GitHub Check: analyzer_tests
- GitHub Check: groupby_tests
- GitHub Check: join_tests
- GitHub Check: fetcher_tests
- GitHub Check: groupby_tests
- GitHub Check: analyzer_tests
- GitHub Check: spark_tests
- GitHub Check: fetcher_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (14)
spark/src/main/scala/ai/chronon/spark/MergeJob.scala (14)
3-3: No conflicts observed.
32-32: No issues found.
51-56: Logic is clear.
65-71: Confirm entity vs. event columns.
86-86: UUID could be expensive.
88-90: Clean approach to limit columns.
108-112: Implementation looks solid.
132-134: No concerns.
136-148: Inner join drops unmatched rows; verify.
241-243: All good here.
244-244: No problems noted.
247-249: Verify no user fields are skipped.
271-271: Simplified signature is good.
274-277: Confirm custom filtering.
## Summary Update Integration Tests to update aws-crew and gcp-crew for integration test failures. ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Added automated Slack notifications that alert the team immediately upon integration test failures in both AWS and GCP environments. - **Chores** - Updated notification settings to ensure alerts are routed to the correct Slack channel for effective troubleshooting. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
… submission logic (#540) ## Summary Created PubSub interfaces with GCP implementation. Updated NodeExecutionActivity implementation for job submission using PubSub publisher to publish messages for agent to pick up. Added necessary unit and integration tests ## Checklist - [x] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a robust Pub/Sub module with administration, configuration, publisher, subscriber, and manager components. The module now supports asynchronous job submission with improved error handling and offers flexible production/emulator configurations. - **Documentation** - Added comprehensive usage guidance with examples for the Pub/Sub integration. - **Tests** - Expanded unit and integration tests to cover all Pub/Sub operations, including message publishing, retrieval, and error handling scenarios. - **Chores** - Updated dependency versions and build configurations, including new Maven artifacts for enhanced Google Cloud API support. Updated git ignore rules to exclude additional directories. - **Refactor** - Streamlined the persistence layer by removing obsolete key definitions. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary ^^^ Ran both AWS and GCP integration tests. Following this PR, i'll work on rebasing #549 which applies the confs to the submitter interface. Previously, to compile it would be: ``` zipline compile --conf=group_bys/aws/purchases.py ``` but now `--conf` or `--input-path` is not required, just: ``` zipline compile ``` and this compiles the entire repo. In addition, the compiled files gets persisted in the `compiled` folder. No longer are compiled files persisted by environment like `production/...` ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [x] Integration tested - [ ] Documentation update GCP and AWS integration tests pass ``` <<<<<.....................................COMPILE.....................................>>>>> + zipline compile --chronon_root=/Users/davidhan/zipline/chronon/api/python/test/canary 17:49:15 INFO parse_teams.py:56 - Processing teams from path /Users/davidhan/zipline/chronon/api/python/test/canary/teams.py INFO:ai.chronon.cli.logger:Processing teams from path /Users/davidhan/zipline/chronon/api/python/test/canary/teams.py GroupBy-s: Compiled 5 objects from 3 files. Added aws.plaid_fv.v1 Added aws.purchases.v1_dev Added aws.purchases.v1_test Added gcp.purchases.v1_dev Added gcp.purchases.v1_test .... <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>> + aws emr wait step-complete --cluster-id j-13BASWFP15TLR --step-id s-02878283D08B3510ZGCZ ++ aws emr describe-step --cluster-id j-13BASWFP15TLR --step-id s-02878283D08B3510ZGCZ --query Step.Status.State ++ tr -d '"' + STEP_STATE=COMPLETED + '[' COMPLETED '!=' COMPLETED ']' + echo succeeded succeeded + echo -e '\033[0;32m<<<<<.....................................SUCCEEDED!!!.....................................>>>>>\033[0m' <<<<<.....................................SUCCEEDED!!!.....................................>>>>> ``` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Summary by CodeRabbit - **New Features** - Introduced an enhanced CLI compile command with an option to specify a custom configuration root. - Enabled dynamic loading of team configurations from both JSON and Python formats. - Added support for multiple error reporting in various components. - **Improvements** - Streamlined error and status messages for clearer, simplified feedback. - Enhanced error handling mechanisms across the application. - Updated quickstart scripts for AWS and GCP to utilize a unified, precompiled configuration set. - Modified the way environment variables are structured for improved clarity and management. - **Dependency Updates** - Added new dependencies to improve functionality. - Upgraded several existing dependencies to boost performance and enhance output formatting. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Nikhil Simha <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>
## Summary
Adds a flag to turn on "required columns only" for the merge job, to optimize cases where the left side has large columns that are not required for the join and that we don't want to carry through every shuffle.
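The rough shape of the optimization, as a hedged sketch rather than the actual `MergeJob` implementation (column and helper names below are illustrative): tag each left row with a UUID, push only the join keys plus that UUID through the shuffles, and join the wide left columns back once at the end.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, expr}

// Sketch of "required columns only": shuffle just the keys + a row id, then
// recover the wide, non-required left columns with one final join.
def mergeRequiredColumnsOnly(leftDf: DataFrame, rightDf: DataFrame, keys: Seq[String]): DataFrame = {
  val rowId = "left_row_uuid" // hypothetical column name
  val leftDfWithUUID = leftDf.withColumn(rowId, expr("uuid()"))

  // Only the join keys and the row id ride through the (possibly repeated) shuffles.
  val requiredColumnsDf = leftDfWithUUID.select((keys :+ rowId).map(col): _*)

  val joined = requiredColumnsDf.join(rightDf, keys)

  // Bring back all original left columns with a single join on the row id.
  leftDfWithUUID.join(joined.drop(keys: _*), Seq(rowId)).drop(rowId)
}
```

Whether the UUID tag and the extra final join pay off depends on how wide the non-required left columns are relative to the number of shuffles, which is presumably why the behavior is gated behind a flag.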
## Checklist
## Summary by CodeRabbit
- **New Features**
- **Refactor**
- **Bug Fixes**
- **Tests**