
Conversation

@david-zlai
Contributor

@david-zlai david-zlai commented Jan 26, 2025

Summary

Setup:

Download Spark and set SPARK_HOME.

wget https://www.apache.org/dyn/closer.lua/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
tar -xzf spark-3.5.4-bin-hadoop3.tgz

In your ~/.bashrc or ~/.zshrc, add the SPARK_HOME env var:

export SPARK_HOME=/Users/nikhilsimha/spark-3.5.4-bin-hadoop3                      # choose dir where you unpacked spark

We need Spark here for the moment because of the way Driver.scala is set up: it is one big melting pot of Spark subcommands alongside others, and those Spark subcommands require the Spark jars on the classpath even when we only want to fetch.
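Concretely, the classpath assembly behind the `Running command:` line in the test output below boils down to something like this (a simplified sketch; `build_fetch_command` is a hypothetical name, not the actual run.py helper):

```python
import os

def build_fetch_command(chronon_jar: str, driver_args: str) -> str:
    # Fetch mode runs locally via `java -cp`, so the Spark classes that
    # Driver.scala links against must come from the user's SPARK_HOME.
    spark_home = os.environ["SPARK_HOME"]
    classpath = f"{chronon_jar}:{spark_home}/jars/*"
    return f"java -cp {classpath} ai.chronon.spark.Driver fetch {driver_args}"
```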

Test:

(dev_chronon) davidhan@Davids-MacBook-Pro: ~/zipline/chronon/api/py/test/sample (davidhan/do_fetch_2) $ echo $RUN_PY
/Users/davidhan/zipline/chronon/api/py/ai/chronon/repo/run.py
(dev_chronon) davidhan@Davids-MacBook-Pro: ~/zipline/chronon/api/py/test/sample (davidhan/do_fetch_2) $ echo $SPARK_HOME
/Users/davidhan/zipline/chronon/spark-3.5.4-bin-hadoop3
(dev_chronon) davidhan@Davids-MacBook-Pro: ~/zipline/chronon/api/py/test/sample (davidhan/do_fetch_2) $ python $RUN_PY --mode fetch --type group-by --name quickstart/purchases.v1 -k '{"user_id":"5"}'
Running with args: {'mode': 'fetch', 'conf': None, 'env': 'dev', 'dataproc': False, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_fetch
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
Downloading chronon gcp jar from GCS...
Downloaded storage object jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar from bucket zipline-artifacts-canary to local file /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmp9ejwzuc_/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar.
Running command: java -cp /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmp9ejwzuc_/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar:/Users/davidhan/zipline/chronon/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.spark.Driver fetch --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl  --conf-type=group_bys  --type group-by --name quickstart/purchases.v1 -k {"user_id":"5"}
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
--- [FETCHED RESULT] ---
{
  "purchase_price_average_14d" : 72.5,
  "purchase_price_average_30d" : 250.6,
  "purchase_price_average_3d" : null,
  "purchase_price_count_14d" : 2,
  "purchase_price_count_30d" : 5,
  "purchase_price_count_3d" : null,
  "purchase_price_last10" : [ 76, 69, 367, 466, 275 ],
  "purchase_price_sum_14d" : 145,
  "purchase_price_sum_30d" : 1253,
  "purchase_price_sum_3d" : null
}

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • Configuration Updates

    • Updated GCP Dataproc cluster name
    • Modified namespace settings in test configuration
  • Runtime Improvements

    • Enhanced JAR download and management functionality
    • Improved runtime configuration handling
    • Added new methods for JAR retrieval
  • Fetcher Enhancements

    • Updated fetcher initialization process
    • Added result printing for better visibility

@coderabbitai
Contributor

coderabbitai bot commented Jan 26, 2025

Walkthrough

The pull request introduces modifications to multiple files across the Chronon project. In run.py, the changes focus on enhancing JAR download mechanisms, runtime configuration handling, and job management. The teams.json file sees minor configuration updates for GCP Dataproc cluster and namespace settings. In Driver.scala, the fetcher initialization and result reporting process has been refined, with a new approach to building fetchers and improved result visibility.

Changes

File: api/py/ai/chronon/repo/run.py
  • Added conf_type parameter to Runner class
  • Renamed download_dataproc_jar to download_dataproc_submitter_jar
  • Added download_chronon_gcp_jar method
  • Updated JAR download and runtime environment logic

File: api/py/test/sample/teams.json
  • Updated GCP_DATAPROC_CLUSTER_NAME to "zipline-canary-cluster"
  • Changed namespace to "canary-443022.data"

File: spark/src/main/scala/ai/chronon/spark/Driver.scala
  • Modified fetcher initialization in FetcherCli
  • Added formatted result printing
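In effect, run.py now picks which jar to download based on the run mode; roughly (a hypothetical dispatch written for illustration, using the two function names above — the real signatures may differ):

```python
def fetch_jar(dataproc: bool, destination_dir: str, customer_id: str) -> str:
    # Sketch only: Dataproc submissions need the submitter jar, while a
    # local fetch pulls the cloud_gcp assembly jar from GCS.
    if dataproc:
        return download_dataproc_submitter_jar(destination_dir, customer_id)
    return download_chronon_gcp_jar(destination_dir, customer_id)
```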

Suggested Reviewers

  • piyush-zlai
  • nikhil-zlai

Poem

In the realm of code, where bits dance free,
JAR downloads sing a new melody 🎵
Clusters shift, configurations bright,
Chronon's magic takes playful flight! ✨
Refactored paths, a developer's glee 🚀

Warning

Review ran into problems

🔥 Problems

GitHub Actions: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


else:
    # Always download the jar for now so that we can pull
    # in any fixes or latest changes
    # if self.dataproc:
Collaborator

Is this meant to be commented out?

Collaborator

@varant-zlai varant-zlai left a comment

Confirming I was able to get this to work locally.

varant-zlai added a commit that referenced this pull request Jan 28, 2025
## Summary

Manual cherry-pick from https://github.com/airbnb/chronon/pull/909/files

Should be rebased and tested off of
#276

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
- [x] Not yet tested, should test after merging 276



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added a builder pattern for `JavaFetcher` to enhance object creation
flexibility.
- Introduced an optional `ExecutionContext` parameter across multiple
classes to improve execution context management.

- **Improvements**
- Updated constructor signatures in Java and Scala classes to support
more configurable execution contexts.
- Enhanced instantiation process with more structured object creation
methods.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: ezvz <[email protected]>
Co-authored-by: Nikhil Simha <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>
@david-zlai david-zlai marked this pull request as ready for review January 30, 2025 18:23
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
api/py/ai/chronon/repo/run.py (1)

415-416: Consider reducing condition complexity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between c0f1645 and a566189.

📒 Files selected for processing (3)
  • api/py/ai/chronon/repo/run.py (10 hunks)
  • api/py/test/sample/teams.json (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (2 hunks)
🔇 Additional comments (13)
api/py/ai/chronon/repo/run.py (9)

208-208: All good.


411-411: Looks fine.


452-458: SPARK_HOME logic looks fine.


474-490: Remove commented-out code if not needed.


491-547: No issues spotted.


Line range hint 548-654: Line 550 is commented out; remove if unneeded.


655-664: Parallel mode looks fine.


665-665: Single call usage is fine.


900-908: Temp directory logic is clear.

spark/src/main/scala/ai/chronon/spark/Driver.scala (2)

693-693: Shifting to args.api is OK.


726-727: Neat alignment of result logging.

api/py/test/sample/teams.json (2)

18-18: Dataproc cluster name updated successfully.


64-64: Namespace change looks fine.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
api/py/ai/chronon/repo/run.py (2)

414-416: Enhance JAR validation logic.

Consider extracting the complex condition into a descriptive method for better readability.

-        if (self.mode in ONLINE_MODES) and (not args["sub_help"] and not self.dataproc) and not valid_jar and (
-                args.get("online_jar_fetch")):
+        def should_download_online_jar():
+            return (self.mode in ONLINE_MODES and 
+                    not args["sub_help"] and 
+                    not self.dataproc and 
+                    not valid_jar and 
+                    args.get("online_jar_fetch"))
+
+        if should_download_online_jar():

808-819: Enhance error messages in GCP JAR download.

Add more context to error messages to help troubleshoot GCP-specific issues. Note the enriched print has to come after bucket_name and source_blob_name are assigned:

 def download_chronon_gcp_jar(destination_dir: str, customer_id: str):
-    print("Downloading chronon gcp jar from GCS...")
     bucket_name = f"zipline-artifacts-{customer_id}"
     file_name = "cloud_gcp-assembly-0.1.0-SNAPSHOT.jar"
     source_blob_name = f"jars/{file_name}"
+    print(f"Downloading chronon GCP JAR from gs://{bucket_name}/{source_blob_name}...")
     chronon_gcp_jar_destination_path = f"{destination_dir}/{file_name}"
     download_gcs_blob(bucket_name, source_blob_name,
                       chronon_gcp_jar_destination_path)
     return chronon_gcp_jar_destination_path
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between a566189 and 0bae733.

📒 Files selected for processing (3)
  • api/py/ai/chronon/repo/run.py (10 hunks)
  • api/py/test/sample/teams.json (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • api/py/test/sample/teams.json
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
api/py/ai/chronon/repo/run.py (2)

474-474: Remove commented code.

The commented line appears to be debugging code. Either remove it or add a clear comment explaining its purpose.


453-458: Validate SPARK_HOME path more thoroughly.

The current validation only checks if the jars directory exists. Consider validating essential JAR files too.
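A minimal sketch of what that stricter check could look like (the helper name and the spark-core heuristic are assumptions, not code from this PR):

```python
import glob
import os

def validate_spark_home(spark_home: str) -> None:
    jars_dir = os.path.join(spark_home, "jars")
    if not os.path.isdir(jars_dir):
        raise RuntimeError(f"SPARK_HOME={spark_home} has no jars/ directory")
    # Spark distributions bundle spark-core; its absence suggests a bad unpack.
    if not glob.glob(os.path.join(jars_dir, "spark-core_*.jar")):
        raise RuntimeError(f"{jars_dir} does not look like a Spark distribution")
```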

Comment on lines 607 to 608
# command = f"java -cp {dataproc_jar} {DATAPROC_ENTRY} {dataproc_command}"
command = f"java -cp {self.jar_path} {DATAPROC_ENTRY} {dataproc_command}"
Contributor

🛠️ Refactor suggestion

Eliminate command generation duplication.

The Dataproc command generation is duplicated. Consider extracting it into a helper method.

+    def _build_dataproc_command(self, dataproc_command: str) -> str:
+        return f"java -cp {self.jar_path} {DATAPROC_ENTRY} {dataproc_command}"
+
-            command = f"java -cp {self.jar_path} {DATAPROC_ENTRY} {dataproc_command}"
+            command = self._build_dataproc_command(dataproc_command)

Also applies to: 651-652

Comment on lines +506 to +516
running_app_map = {}
for app in running_apps:
    try:
        app_json = json.loads(app.strip())
        app_name = app_json["app_name"].strip()
        if app_name not in running_app_map:
            running_app_map[app_name] = []
        running_app_map[app_name].append(app_json)
    except Exception as ex:
        print("failed to process line into app: " + app)
        print(ex)
Contributor

🛠️ Refactor suggestion

Improve error handling in app processing.

The exception handling swallows errors silently. Consider logging the full stack trace for debugging.

         running_app_map = {}
         for app in running_apps:
             try:
                 app_json = json.loads(app.strip())
                 app_name = app_json["app_name"].strip()
                 if app_name not in running_app_map:
                     running_app_map[app_name] = []
                 running_app_map[app_name].append(app_json)
             except Exception as ex:
-                print("failed to process line into app: " + app)
-                print(ex)
+                logging.error("Failed to process app: %s", app, exc_info=True)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
api/py/ai/chronon/repo/run.py (1)

411-416: Improve JAR download condition readability.

The nested conditions make the logic hard to follow.

-        if (self.mode in ONLINE_MODES) and (not args["sub_help"] and not self.dataproc) and not valid_jar and (
-                args.get("online_jar_fetch")):
+        should_download = (
+            self.mode in ONLINE_MODES
+            and not args["sub_help"]
+            and not self.dataproc
+            and not valid_jar
+            and args.get("online_jar_fetch")
+        )
+        if should_download:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 0bae733 and e8a79c2.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/run.py (10 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (5)
api/py/ai/chronon/repo/run.py (5)

453-458: LGTM! Good enhancement for SPARK_HOME integration.

The code properly checks for SPARK_HOME existence and appends jars directory to classpath.


474-534: Improve error handling in streaming job management.

The error handling for app processing is still basic.

-                        print("failed to process line into app: " + app)
-                        print(ex)
+                        logging.error("Failed to process app: %s", app, exc_info=True)

578-584: Eliminate command generation duplication.

The Dataproc command generation logic is duplicated.

+    def _build_dataproc_command(self, user_args: str, local_files: List[str]) -> str:
+        dataproc_command = generate_dataproc_submitter_args(
+            local_files_to_upload_to_gcs=local_files,
+            user_args=user_args
+        )
+        return f"java -cp {self.jar_path} {DATAPROC_ENTRY} {dataproc_command}"

Also applies to: 619-634


892-899: LGTM! Good use of temporary directory.

Using tempfile.TemporaryDirectory ensures proper cleanup.
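For reference, the pattern being called out is roughly this (a sketch; `run_driver` is a hypothetical stand-in for building and running the java command):

```python
import tempfile

# The temporary directory, and the jar downloaded into it, are cleaned up
# automatically when the block exits, even if the command fails.
with tempfile.TemporaryDirectory() as tmp_dir:
    jar_path = download_chronon_gcp_jar(tmp_dir, customer_id="canary")
    run_driver(jar_path)  # hypothetical helper
```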


Line range hint 786-812: Add version validation for GCP jar downloads.

The hardcoded version "0.1.0-SNAPSHOT" should be validated.

✅ Verification successful

Version "0.1.0-SNAPSHOT" is the project's standard development version

The version is correctly defined in version.sbt and consistently used across the project's build system and deployment scripts.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the jar version exists in GCS
gsutil ls gs://zipline-artifacts-*/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar

Length of output: 178


Script:

#!/bin/bash
# Search for jar version references
rg -g '!*.jar' "0.1.0-SNAPSHOT"

# Look for build files that might define versions
fd "build.sbt|pom.xml|build.gradle" --type f

Length of output: 6627

@david-zlai david-zlai merged commit 6793d35 into main Jan 30, 2025
12 checks passed
@david-zlai david-zlai deleted the davidhan/do_fetch_2 branch January 30, 2025 19:36
nikhil-zlai added a commit that referenced this pull request Feb 4, 2025
nikhil-zlai pushed a commit that referenced this pull request Feb 4, 2025
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 16, 2025
chewy-zlai pushed a commit that referenced this pull request May 16, 2025