Add script to run full zipline quickstart suite #292
Conversation
Walkthrough: The pull request introduces two new scripts, distribution/run_zipline_quickstart.py and distribution/run_zipline_quickstart.sh, which run the full zipline quickstart suite against a test quickstart groupby from a temporary working directory.
Force-pushed from 6d35a1b to ee99668
Force-pushed from 68b8740 to d97daf4
Force-pushed from ee99668 to 939858b
Force-pushed from 939858b to c44606c
Actionable comments posted: 4
🧹 Nitpick comments (5)
distribution/run_zipline_quickstart.py (1)
10-11: Improve line continuation style. Use parentheses for line continuation.

-        quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__))
-                                     , "run_zipline_quickstart.sh")
+        quickstart_sh = os.path.join(
+            os.path.dirname(os.path.realpath(__file__)),
+            "run_zipline_quickstart.sh"
+        )

distribution/run_zipline_quickstart.sh (4)
10-11: Remove unused color variable. RED variable is defined but never used.

 GREEN='\033[0;32m'
-RED='\033[0;31m'

🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
25-25: Fix environment variable assignments. Split declaration and assignment to avoid masking return values.

-export SPARK_HOME=$(pwd)/spark-3.5.4-bin-hadoop3
+SPARK_HOME=$(pwd)/spark-3.5.4-bin-hadoop3
+export SPARK_HOME

-export PYTHONPATH="${PYTHONPATH}:$(pwd)"
+PYTHONPATH_NEW="${PYTHONPATH}:$(pwd)"
+export PYTHONPATH="$PYTHONPATH_NEW"

Also applies to: 47-47
🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 25-25: Declare and assign separately to avoid masking return values.
(SC2155)
13-13: Extract configuration values. Move hard-coded values to configuration variables at the top.

+# Configuration
+WHEEL_VERSION="0.1.0.dev0"
+PROJECT_ID="canary-443022"
+DATASET="data"
+GIT_BRANCH="davidhan/canary"
+
-WHEEL_FILE="zipline_ai-0.1.0.dev0-py3-none-any.whl"
+WHEEL_FILE="zipline_ai-${WHEEL_VERSION}-py3-none-any.whl"

-bq rm -f -t canary-443022:data.quickstart_purchases_v1_test
-bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload
+bq rm -f -t "${PROJECT_ID}:${DATASET}.quickstart_purchases_v1_test"
+bq rm -f -t "${PROJECT_ID}:${DATASET}.quickstart_purchases_v1_test_upload"

-git fetch origin davidhan/canary
-git checkout davidhan/canary
+git fetch origin "${GIT_BRANCH}"
+git checkout "${GIT_BRANCH}"

Also applies to: 16-17, 33-34
18-18: Address TODO comment about bigtable rows. Implement bigtable row deletion or create a tracking issue.
Would you like me to help implement the bigtable row deletion or create a tracking issue?
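Not part of this PR, but a minimal sketch of what that cleanup could look like, assuming the google-cloud-bigtable Python client; the table id and row-key prefix below are placeholders, not names taken from the repo:

```python
# Hedged sketch only: table_id and row_key_prefix are illustrative placeholders.
from google.cloud import bigtable

def delete_quickstart_rows(project_id: str, instance_id: str,
                           table_id: str, row_key_prefix: bytes) -> None:
    # admin=True is needed for table admin operations like drop_by_prefix.
    client = bigtable.Client(project=project_id, admin=True)
    table = client.instance(instance_id).table(table_id)
    # Deletes every row whose key starts with the given prefix.
    table.drop_by_prefix(row_key_prefix)

# Example invocation (project/instance mirror the canary setup seen in this PR's logs;
# the table and prefix are hypothetical):
# delete_quickstart_rows("canary-443022", "zipline-canary-instance",
#                        "some_groupby_table", b"quickstart.purchases.v1_test")
```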
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
distribution/run_zipline_quickstart.py (1 hunks)
distribution/run_zipline_quickstart.sh (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart.sh
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 25-25: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 30-30: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 47-47: Declare and assign separately to avoid masking return values.
(SC2155)
def main():
    with tempfile.TemporaryDirectory() as temp_dir:
        print(f"Created temporary directory: {temp_dir}")

        quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__))
                                     , "run_zipline_quickstart.sh")

        # run the bash script run_zipline_quickstart.sh subprocess command
        # with the temporary directory as the argument
        subprocess.run([f"bash {quickstart_sh} {temp_dir}"], shell=True)
🛠️ Refactor suggestion
Add error handling for subprocess execution.
Add try-except block to handle subprocess failures gracefully.
 def main():
     with tempfile.TemporaryDirectory() as temp_dir:
         print(f"Created temporary directory: {temp_dir}")
         quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__)),
                                      "run_zipline_quickstart.sh")
-        subprocess.run([f"bash {quickstart_sh} {temp_dir}"], shell=True)
+        try:
+            subprocess.run(["bash", quickstart_sh, temp_dir], check=True)
+        except subprocess.CalledProcessError as e:
+            print(f"Script execution failed with exit code {e.returncode}")
+            raise

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def main():
    with tempfile.TemporaryDirectory() as temp_dir:
        print(f"Created temporary directory: {temp_dir}")
        quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__)),
                                     "run_zipline_quickstart.sh")
        # run the bash script run_zipline_quickstart.sh subprocess command
        # with the temporary directory as the argument
        try:
            subprocess.run(["bash", quickstart_sh, temp_dir], check=True)
        except subprocess.CalledProcessError as e:
            print(f"Script execution failed with exit code {e.returncode}")
            raise
WORKING_DIR=$1
cd $WORKING_DIR
Add error handling for cd commands.
Add error handling for directory changes.
-cd $WORKING_DIR
+cd "$WORKING_DIR" || exit 1
-cd cananry-confs
+cd cananry-confs || exit 1

Also applies to: 30-30
🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
Force-pushed from 60ab641 to ca1f98d
Actionable comments posted: 0
🧹 Nitpick comments (3)
distribution/run_zipline_quickstart.sh (3)
11-11: Remove unused RED color variable. The RED color variable is defined but never used.

-RED='\033[0;31m'

🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
18-18: Address TODO comment. The TODO comment about deleting bigtable rows needs to be implemented.
Would you like me to help implement the bigtable row deletion?
39-39: Declare and assign PYTHONPATH separately. Split declaration and assignment to avoid masking return values.

-export PYTHONPATH="${PYTHONPATH}:$(pwd)"
+old_path="${PYTHONPATH}"
+export PYTHONPATH="${old_path}:$(pwd)"

🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 39-39: Declare and assign separately to avoid masking return values.
(SC2155)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
distribution/run_zipline_quickstart.py (1 hunks)
distribution/run_zipline_quickstart.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- distribution/run_zipline_quickstart.py
🧰 Additional context used
📓 Learnings (1)
distribution/run_zipline_quickstart.sh (1)
Learnt from: david-zlai
PR: zipline-ai/chronon#292
File: distribution/run_zipline_quickstart.sh:72-75
Timestamp: 2025-01-30T19:37:27.646Z
Learning: The `gcloud dataproc jobs wait` command is a blocking call that waits for job completion, so adding an external timeout wrapper is unnecessary.
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart.sh
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 39-39: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (5)
distribution/run_zipline_quickstart.sh (5)
8-8: Add error handling for cd command. Add error handling for directory change.

-cd $WORKING_DIR
+cd "$WORKING_DIR" || exit 1

🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
22-22: Add error handling for cd command. Add error handling for directory change.

-cd cananry-confs
+cd cananry-confs || exit 1

🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
42-56: LGTM! Job state checking implementation is solid. Good error handling and proper use of gcloud dataproc jobs wait.

60-97: LGTM! Job execution flow is well-structured. Good use of logging and proper job state verification.
99-99: LGTM! Clear success indication.
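For context, here is a rough Python sketch of the same job-state check, in case the wait/describe logic ever moves from the shell script into run_zipline_quickstart.py; the gcloud invocations mirror what the script's log output shows, while the function itself is illustrative rather than the PR's implementation:

```python
import subprocess

def check_dataproc_job_state(job_id: str, region: str = "us-central1") -> None:
    """Block until the Dataproc job finishes, then verify it reached DONE."""
    if not job_id:
        raise ValueError("no Dataproc submitter job id was captured from the zipline output")
    # `gcloud dataproc jobs wait` already blocks until completion (see the learning above),
    # so no extra timeout wrapper is needed.
    subprocess.run(["gcloud", "dataproc", "jobs", "wait", job_id, f"--region={region}"],
                   check=True)
    state = subprocess.run(
        ["gcloud", "dataproc", "jobs", "describe", job_id,
         f"--region={region}", "--format=value(status.state)"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if state != "DONE":
        raise RuntimeError(f"Dataproc job {job_id} ended in state {state}")
```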
#TODO: delete bigtable rows

# Clone the cananry configs
git clone [email protected]:zipline-ai/cananry-confs.git
maybe add an rm -rf cananry-confs before this? otherwise you just cd into it if it exists and git fetch
ah so the way the integration test should run is via the python file which creates the temp directory first that'll be used to do the full installation and runs.
temp directory should have nothing in it
Actionable comments posted: 0
♻️ Duplicate comments (2)
distribution/run_zipline_quickstart.sh (2)
8-8: ⚠️ Potential issue: Add error handling for directory change.

-cd $WORKING_DIR
+cd "$WORKING_DIR" || exit 1

🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
22-22: ⚠️ Potential issue: Add error handling for directory change.

-cd cananry-confs
+cd cananry-confs || exit 1

🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
🧹 Nitpick comments (3)
distribution/run_zipline_quickstart.sh (3)
11-11: Remove unused RED variable. The RED color variable is defined but never used.
🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
18-18: Implement bigtable rows deletion. I can help implement the missing bigtable cleanup functionality.

39-39: Separate PYTHONPATH declaration and assignment.

-export PYTHONPATH="${PYTHONPATH}:$(pwd)"
+pwd_dir=$(pwd)
+export PYTHONPATH="${PYTHONPATH}:${pwd_dir}"

🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 39-39: Declare and assign separately to avoid masking return values.
(SC2155)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
distribution/run_zipline_quickstart.sh (1 hunks)
🧰 Additional context used
📓 Learnings (1)
distribution/run_zipline_quickstart.sh (1)
Learnt from: david-zlai
PR: zipline-ai/chronon#292
File: distribution/run_zipline_quickstart.sh:72-75
Timestamp: 2025-01-30T19:37:27.646Z
Learning: The `gcloud dataproc jobs wait` command is a blocking call that waits for job completion, so adding an external timeout wrapper is unnecessary.
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart.sh
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 39-39: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (3)
distribution/run_zipline_quickstart.sh (3)
42-56: LGTM! Job state checking looks good. The implementation correctly uses gcloud's built-in blocking wait.

60-97: LGTM! Command execution flow is well structured. Good error handling and output verification.
99-99: LGTM! Clear success indication.
## Summary ``` python distribution/run_zipline_quickstart.py ``` This runs the full zipline suite of commands against a test quickstart groupby. Example: ``` davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + GREEN='\033[0;32m' + RED='\033[0;31m' + WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl + bq rm -f -t canary-443022:data.quickstart_purchases_v1_test + bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload + '[' -z '' ']' + wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz --2025-01-30 10:16:21-- https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132 Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 400879762 (382M) [application/x-gzip] Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’ spark-3.5.4-bin-hadoop3.tgz 100%[==========================================================================================================================================>] 382.31M 50.2MB/s in 8.4s 2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762] + tar -xzf spark-3.5.4-bin-hadoop3.tgz ++ pwd + export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3 + SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3 + git clone [email protected]:zipline-ai/cananry-confs.git Cloning into 'cananry-confs'... remote: Enumerating objects: 148, done. remote: Counting objects: 100% (148/148), done. remote: Compressing objects: 100% (77/77), done. remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0) Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done. Resolving deltas: 100% (63/63), done. + cd cananry-confs + git fetch origin davidhan/canary From github.com:zipline-ai/cananry-confs * branch davidhan/canary -> FETCH_HEAD + git checkout davidhan/canary branch 'davidhan/canary' set up to track 'origin/davidhan/canary'. Switched to a new branch 'davidhan/canary' + python3 -m venv tmp_chronon + source tmp_chronon/bin/activate ++ deactivate nondestructive ++ '[' -n '' ']' ++ '[' -n '' ']' ++ hash -r ++ '[' -n '' ']' ++ unset VIRTUAL_ENV ++ unset VIRTUAL_ENV_PROMPT ++ '[' '!' 
nondestructive = nondestructive ']' ++ case "$(uname)" in +++ uname ++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon ++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon ++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin ++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin ++ export PATH ++ VIRTUAL_ENV_PROMPT=tmp_chronon ++ export VIRTUAL_ENV_PROMPT ++ '[' -n '' ']' ++ '[' -z '' ']' ++ _OLD_VIRTUAL_PS1= ++ PS1='(tmp_chronon) ' ++ export PS1 ++ hash -r + gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl . Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl Completed files 1/1 | 371.1kiB/371.1kiB + pip uninstall zipline-ai WARNING: Skipping zipline-ai as it is not installed. 
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl Collecting click (from zipline-ai==0.1.0.dev0) Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB) Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0) Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0) Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB) Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB) Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB) Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB) Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB) Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB) Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_crc32c-1.6.0-py3-none-any.whl Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0) Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB) Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB) Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes) Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB) Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB) Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB) Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB) Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB) Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached idna-3.10-py3-none-any.whl.metadata (10 kB) Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB) Collecting certifi>=2017.4.17 (from 
requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB) Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB) Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB) Using cached click-8.1.8-py3-none-any.whl (98 kB) Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB) Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB) Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB) Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB) Using cached requests-2.32.3-py3-none-any.whl (64 kB) Using cached six-1.17.0-py2.py3-none-any.whl (11 kB) Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB) Using cached certifi-2024.12.14-py3-none-any.whl (164 kB) Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB) Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB) Using cached idna-3.10-py3-none-any.whl (70 kB) Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB) Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB) Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB) Using cached rsa-4.9-py3-none-any.whl (34 kB) Using cached urllib3-2.3.0-py3-none-any.whl (128 kB) Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB) Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0 [notice] A new release of pip is available: 24.2 -> 25.0 [notice] To update, run: pip install --upgrade pip ++ pwd + export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs + PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs + DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id' + echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m' <<<<<.....................................COMPILE.....................................>>>>> + zipline compile --conf=group_bys/quickstart/purchases.py Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py GroupBy Team - quickstart GroupBy Name - purchases.v1 Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1 GroupBy Team - quickstart GroupBy Name - purchases.v1_test Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test 
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production + echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m' <<<<<.....................................BACKFILL.....................................>>>>> + touch tmp_backfill.out + zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc + tee /dev/tty tmp_backfill.out Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. 
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. 
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_backfill.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b + check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b + JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b + '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']' + gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 Waiting for job output... 25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse 25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml 25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker 25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster 25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator 25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics 25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032 25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200 25/01/30 18:16:51 INFO Configuration: resource-types.xml not found 25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'. 25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011 25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030 25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state. 
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ] 2025/01/30 18:16:55 INFO SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration 2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions. 2025/01/30 18:17:15 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:17:15 INFO TableUtils.scala:622 - Unfilled range computation: Output table: canary-443022.data.quickstart_purchases_v1_test Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-
27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30] Input tables: data.purchases Missing input partitions: 
[2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2
024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30] Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30] Unfilled ranges: [2023-11-01...2023-11-30] 2025/01/30 18:17:15 INFO GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30]) 2025/01/30 18:17:15 INFO GroupBy.scala:738 - Group By ranges to compute: [2023-11-01...2023-11-30] 2025/01/30 18:17:15 INFO GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1] 2025/01/30 18:17:15 INFO GroupBy.scala:492 - ----[Processing GroupBy: quickstart.purchases.v1_test]---- 2025/01/30 18:17:20 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:17:20 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-11-01...2023-11-30] query window: None source table: data.purchases source data range: [2023-11-01...2023-11-30] source start/end: null/null source data model: Events queryable data range: [null...2023-11-30] intersected range: [2023-11-01...2023-11-30] 2025/01/30 18:17:20 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:17:20 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-11-30]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:17:20 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-11-30' 2025/01/30 18:17:20 INFO HopsAggregator.scala:147 - Left bounds: 1d->unbounded minQueryTs = 2023-11-01 00:00:00 2025/01/30 18:17:20 INFO FastHashing.scala:52 - Generating key builder over keys: bigint : user_id 2025/01/30 18:17:22 INFO TableUtils.scala:459 - Repartitioning before writing... 
2025/01/30 18:17:25 INFO TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test 2025/01/30 18:17:25 INFO TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition 2025/01/30 18:17:25 INFO TableUtils.scala:536 - Sorting within partitions with cols: List(ds) 2025/01/30 18:17:33 INFO TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test 2025/01/30 18:17:33 INFO TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33 2025/01/30 18:17:33 INFO GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30] 2025/01/30 18:17:33 INFO GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30] Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully. done: true driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/ driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b placement: clusterName: zipline-canary-cluster clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0 reference: jobId: 945d836f-20d8-4768-97fb-0889c00ed87b projectId: canary-443022 sparkJob: args: - group-by-backfill - --conf-path=purchases.v1_test - --end-date=2025-01-30 - --conf-type=group_bys - --additional-conf-path=additional-confs.yaml - --is-gcp - --gcp-project-id=canary-443022 - --gcp-bigtable-instance-id=zipline-canary-instance fileUris: - gs://zipline-warehouse-canary/metadata/purchases.v1_test - gs://zipline-artifacts-canary/confs/additional-confs.yaml jarFileUris: - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar mainClass: ai.chronon.spark.Driver status: state: DONE stateStartTime: '2025-01-30T18:17:38.722934Z' statusHistory: - state: PENDING stateStartTime: '2025-01-30T18:16:43.326557Z' - state: SETUP_DONE stateStartTime: '2025-01-30T18:16:43.353624Z' - details: Agent reported job success state: RUNNING stateStartTime: '2025-01-30T18:16:43.597231Z' yarnApplications: - name: groupBy_quickstart.purchases.v1_test_backfill progress: 1.0 state: FINISHED trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/ + echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m' <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>> ++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened ++ grep status.state: + JOB_STATE='status.state: DONE' + echo status.state: DONE status.state: DONE + '[' -z 'status.state: DONE' ']' + echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m' <<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>> + touch tmp_gbu.out + zipline run --mode upload --conf 
production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc + tee /dev/tty tmp_gbu.out Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. 
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f Setting env variables: From <default_env> setting EXECUTOR_CORES=1 From <default_env> setting EXECUTOR_MEMORY=8G From <default_env> setting PARALLELISM=1000 From <default_env> setting MAX_EXECUTORS=1000 From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <default_env> setting EXECUTOR_CORES=1 From <default_env> setting EXECUTOR_MEMORY=8G From <default_env> setting PARALLELISM=1000 From <default_env> setting MAX_EXECUTORS=1000 From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test From <cli_args> setting 
CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_gbu.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f + check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f + JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f + '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']' + gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 Waiting for job output... 25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse 25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml 25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker 25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster 25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator 25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics 25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032 25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200 25/01/30 18:17:52 INFO Configuration: resource-types.xml not found 25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'. 25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012 25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030 25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state. 25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ] 2025/01/30 18:17:56 INFO SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration 2025/01/30 18:17:57 INFO GroupByUpload.scala:229 - GroupBy upload for: quickstart.quickstart.purchases.v1_test Accuracy: SNAPSHOT Data Model: Events 2025/01/30 18:17:57 INFO GroupBy.scala:492 - ----[Processing GroupBy: quickstart.purchases.v1_test]---- 2025/01/30 18:18:14 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:18:14 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-12-01...2023-12-01] query window: None source table: data.purchases source data range: [2023-11-01...2023-12-01] source start/end: null/null source data model: Events queryable data range: [null...2023-12-01] intersected range: [2023-11-01...2023-12-01] 2025/01/30 18:18:14 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:18:14 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-12-01]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:18:14 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01' 2025/01/30 18:18:14 INFO HopsAggregator.scala:147 - Left bounds: 1d->unbounded minQueryTs = 2023-12-01 00:00:00 2025/01/30 18:18:14 INFO FastHashing.scala:52 - Generating key builder over keys: bigint : user_id 2025/01/30 18:18:15 INFO KvRdd.scala:102 - key schema: { "type" : "record", "name" : "Key", "namespace" : "ai.chronon.data", "doc" : "", "fields" : [ { "name" : "user_id", "type" : [ "null", "long" ], "doc" : "" } ] } value schema: { "type" : "record", "name" : "Value", "namespace" : "ai.chronon.data", "doc" : "", "fields" : [ { "name" : "purchase_price_sum_3d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_sum_14d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_sum_30d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_3d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_14d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_30d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_average_3d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_average_14d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_average_30d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_last10", "type" : [ "null", { "type" : "array", "items" : "long" } ], "doc" : "" } ] } 2025/01/30 18:18:15 INFO GroupBy.scala:492 - ----[Processing GroupBy: 
quickstart.purchases.v1_test]---- 2025/01/30 18:18:19 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:18:19 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-12-01...2023-12-01] query window: None source table: data.purchases source data range: [2023-11-01...2023-12-01] source start/end: null/null source data model: Events queryable data range: [null...2023-12-01] intersected range: [2023-11-01...2023-12-01] 2025/01/30 18:18:19 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:18:19 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-12-01]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:18:20 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01' 2025/01/30 18:18:20 INFO GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined. 2025/01/30 18:18:20 INFO GroupByUpload.scala:188 - Built GroupByServingInfo for quickstart.purchases.v1_test: table: data.purchases / data-model: Events keySchema: Success(struct<user_id:bigint>) valueSchema: Success(struct<purchase_price:bigint>) mutationSchema: Failure(java.lang.NullPointerException) inputSchema: Failure(java.lang.NullPointerException) selectedSchema: Success(struct<purchase_price:bigint>) streamSchema: Failure(java.lang.NullPointerException) 2025/01/30 18:18:20 INFO TableUtils.scala:459 - Repartitioning before writing... 2025/01/30 18:18:24 INFO TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload 2025/01/30 18:18:24 INFO TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition 2025/01/30 18:18:24 INFO TableUtils.scala:536 - Sorting within partitions with cols: List(ds) 2025/01/30 18:18:30 INFO TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload 2025/01/30 18:18:30 INFO TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30 Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully. 
done: true driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/ driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f placement: clusterName: zipline-canary-cluster clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0 reference: jobId: c672008e-7380-4a82-a121-4bb0cb46503f projectId: canary-443022 sparkJob: args: - group-by-upload - --conf-path=purchases.v1_test - --end-date=2023-12-01 - --conf-type=group_bys - --additional-conf-path=additional-confs.yaml - --is-gcp - --gcp-project-id=canary-443022 - --gcp-bigtable-instance-id=zipline-canary-instance fileUris: - gs://zipline-warehouse-canary/metadata/purchases.v1_test - gs://zipline-artifacts-canary/confs/additional-confs.yaml jarFileUris: - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar mainClass: ai.chronon.spark.Driver status: state: DONE stateStartTime: '2025-01-30T18:18:33.742458Z' statusHistory: - state: PENDING stateStartTime: '2025-01-30T18:17:44.197477Z' - state: SETUP_DONE stateStartTime: '2025-01-30T18:17:44.223246Z' - details: Agent reported job success state: RUNNING stateStartTime: '2025-01-30T18:17:44.438240Z' yarnApplications: - name: group-by-upload progress: 1.0 state: FINISHED trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/ + echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m' <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>> ++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened ++ grep status.state: + JOB_STATE='status.state: DONE' + echo status.state: DONE status.state: DONE + '[' -z 'status.state: DONE' ']' + echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m' <<<<<.....................................UPLOAD-TO-KV.....................................>>>>> + touch tmp_upload_to_kv.out + zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc + tee /dev/tty tmp_upload_to_kv.out Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 
'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. 
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl --conf-type=group_bys --partition-string=2023-12-01 --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl --conf-type=group_bys --partition-string=2023-12-01 --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_upload_to_kv.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe + check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe + JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe + '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']' + gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1 Waiting for job output... 25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload 25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query: EXPORT DATA OPTIONS ( format='CLOUD_BIGTABLE', overwrite=true, uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH", bigtable_options='''{ "columnFamilies" : [ { "familyId": "cf", "encoding": "BINARY", "columns": [ {"qualifierString": "value", "fieldName": ""} ] } ] }''' ) AS SELECT CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey, value_bytes as cf, TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP FROM canary-443022.data.quickstart_purchases_v1_test_upload WHERE ds = '2023-12-01' 25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1 25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
## Summary

```
python distribution/run_zipline_quickstart.py
```

This runs the full zipline suite of commands (compile, backfill, upload, upload-to-kv, and the rest of the quickstart flow) against a test quickstart GroupBy.

Example:

```
davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py
Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l
+ GREEN='\033[0;32m'
+ RED='\033[0;31m'
+ WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test
+ bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload
+ '[' -z '' ']'
+ wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
--2025-01-30 10:16:21-- https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz
Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132
Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 400879762 (382M) [application/x-gzip]
Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’
spark-3.5.4-bin-hadoop3.tgz 100%[==========================================================================================================================================>] 382.31M 50.2MB/s in 8.4s
2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762]
+ tar -xzf spark-3.5.4-bin-hadoop3.tgz
++ pwd
+ export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3
+ git clone [email protected]:zipline-ai/cananry-confs.git
Cloning into 'cananry-confs'...
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0)
Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done.
Resolving deltas: 100% (63/63), done.
+ cd cananry-confs
+ git fetch origin davidhan/canary
From github.com:zipline-ai/cananry-confs
 * branch            davidhan/canary -> FETCH_HEAD
+ git checkout davidhan/canary
branch 'davidhan/canary' set up to track 'origin/davidhan/canary'.
Switched to a new branch 'davidhan/canary'
+ python3 -m venv tmp_chronon
+ source tmp_chronon/bin/activate
++ deactivate nondestructive
++ '[' -n '' ']'
++ '[' -n '' ']'
++ hash -r
++ '[' -n '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!'
nondestructive = nondestructive ']' ++ case "$(uname)" in +++ uname ++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon ++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon ++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin ++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin ++ export PATH ++ VIRTUAL_ENV_PROMPT=tmp_chronon ++ export VIRTUAL_ENV_PROMPT ++ '[' -n '' ']' ++ '[' -z '' ']' ++ _OLD_VIRTUAL_PS1= ++ PS1='(tmp_chronon) ' ++ export PS1 ++ hash -r + gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl . Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl Completed files 1/1 | 371.1kiB/371.1kiB + pip uninstall zipline-ai WARNING: Skipping zipline-ai as it is not installed. 
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl Collecting click (from zipline-ai==0.1.0.dev0) Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB) Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0) Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0) Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB) Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB) Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB) Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB) Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB) Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB) Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_crc32c-1.6.0-py3-none-any.whl Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0) Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB) Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB) Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes) Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB) Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB) Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB) Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB) Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB) Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached idna-3.10-py3-none-any.whl.metadata (10 kB) Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB) Collecting certifi>=2017.4.17 (from 
requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB) Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB) Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB) Using cached click-8.1.8-py3-none-any.whl (98 kB) Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB) Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB) Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB) Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB) Using cached requests-2.32.3-py3-none-any.whl (64 kB) Using cached six-1.17.0-py2.py3-none-any.whl (11 kB) Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB) Using cached certifi-2024.12.14-py3-none-any.whl (164 kB) Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB) Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB) Using cached idna-3.10-py3-none-any.whl (70 kB) Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB) Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB) Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB) Using cached rsa-4.9-py3-none-any.whl (34 kB) Using cached urllib3-2.3.0-py3-none-any.whl (128 kB) Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB) Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0 [notice] A new release of pip is available: 24.2 -> 25.0 [notice] To update, run: pip install --upgrade pip ++ pwd + export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs + PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs + DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id' + echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m' <<<<<.....................................COMPILE.....................................>>>>> + zipline compile --conf=group_bys/quickstart/purchases.py Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py GroupBy Team - quickstart GroupBy Name - purchases.v1 Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1 GroupBy Team - quickstart GroupBy Name - purchases.v1_test Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test 
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production + echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m' <<<<<.....................................BACKFILL.....................................>>>>> + touch tmp_backfill.out + zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc + tee /dev/tty tmp_backfill.out Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. 
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. 
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_backfill.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b + check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b + JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b + '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']' + gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 Waiting for job output... 25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse 25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml 25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker 25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster 25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator 25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics 25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032 25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200 25/01/30 18:16:51 INFO Configuration: resource-types.xml not found 25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'. 25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011 25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030 25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state. 
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ] 2025/01/30 18:16:55 INFO SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration 2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions. 2025/01/30 18:17:15 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:17:15 INFO TableUtils.scala:622 - Unfilled range computation: Output table: canary-443022.data.quickstart_purchases_v1_test Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-
27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30] Input tables: data.purchases Missing input partitions: 
[2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2
024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30] Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30] Unfilled ranges: [2023-11-01...2023-11-30] 2025/01/30 18:17:15 INFO GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30]) 2025/01/30 18:17:15 INFO GroupBy.scala:738 - Group By ranges to compute: [2023-11-01...2023-11-30] 2025/01/30 18:17:15 INFO GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1] 2025/01/30 18:17:15 INFO GroupBy.scala:492 - ----[Processing GroupBy: quickstart.purchases.v1_test]---- 2025/01/30 18:17:20 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:17:20 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-11-01...2023-11-30] query window: None source table: data.purchases source data range: [2023-11-01...2023-11-30] source start/end: null/null source data model: Events queryable data range: [null...2023-11-30] intersected range: [2023-11-01...2023-11-30] 2025/01/30 18:17:20 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:17:20 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-11-30]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:17:20 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-11-30' 2025/01/30 18:17:20 INFO HopsAggregator.scala:147 - Left bounds: 1d->unbounded minQueryTs = 2023-11-01 00:00:00 2025/01/30 18:17:20 INFO FastHashing.scala:52 - Generating key builder over keys: bigint : user_id 2025/01/30 18:17:22 INFO TableUtils.scala:459 - Repartitioning before writing... 
2025/01/30 18:17:25 INFO TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test 2025/01/30 18:17:25 INFO TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition 2025/01/30 18:17:25 INFO TableUtils.scala:536 - Sorting within partitions with cols: List(ds) 2025/01/30 18:17:33 INFO TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test 2025/01/30 18:17:33 INFO TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33 2025/01/30 18:17:33 INFO GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30] 2025/01/30 18:17:33 INFO GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30] Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully. done: true driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/ driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b placement: clusterName: zipline-canary-cluster clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0 reference: jobId: 945d836f-20d8-4768-97fb-0889c00ed87b projectId: canary-443022 sparkJob: args: - group-by-backfill - --conf-path=purchases.v1_test - --end-date=2025-01-30 - --conf-type=group_bys - --additional-conf-path=additional-confs.yaml - --is-gcp - --gcp-project-id=canary-443022 - --gcp-bigtable-instance-id=zipline-canary-instance fileUris: - gs://zipline-warehouse-canary/metadata/purchases.v1_test - gs://zipline-artifacts-canary/confs/additional-confs.yaml jarFileUris: - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar mainClass: ai.chronon.spark.Driver status: state: DONE stateStartTime: '2025-01-30T18:17:38.722934Z' statusHistory: - state: PENDING stateStartTime: '2025-01-30T18:16:43.326557Z' - state: SETUP_DONE stateStartTime: '2025-01-30T18:16:43.353624Z' - details: Agent reported job success state: RUNNING stateStartTime: '2025-01-30T18:16:43.597231Z' yarnApplications: - name: groupBy_quickstart.purchases.v1_test_backfill progress: 1.0 state: FINISHED trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/ + echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m' <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>> ++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened ++ grep status.state: + JOB_STATE='status.state: DONE' + echo status.state: DONE status.state: DONE + '[' -z 'status.state: DONE' ']' + echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m' <<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>> + touch tmp_gbu.out + zipline run --mode upload --conf 
production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc + tee /dev/tty tmp_gbu.out Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. 
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f Setting env variables: From <default_env> setting EXECUTOR_CORES=1 From <default_env> setting EXECUTOR_MEMORY=8G From <default_env> setting PARALLELISM=1000 From <default_env> setting MAX_EXECUTORS=1000 From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <default_env> setting EXECUTOR_CORES=1 From <default_env> setting EXECUTOR_MEMORY=8G From <default_env> setting PARALLELISM=1000 From <default_env> setting MAX_EXECUTORS=1000 From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test From <cli_args> setting 
CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_gbu.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f + check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f + JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f + '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']' + gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 Waiting for job output... 25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse 25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml 25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker 25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster 25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator 25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics 25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032 25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200 25/01/30 18:17:52 INFO Configuration: resource-types.xml not found 25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'. 25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012 25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030 25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state. 25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ] 2025/01/30 18:17:56 INFO SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration 2025/01/30 18:17:57 INFO GroupByUpload.scala:229 - GroupBy upload for: quickstart.quickstart.purchases.v1_test Accuracy: SNAPSHOT Data Model: Events 2025/01/30 18:17:57 INFO GroupBy.scala:492 - ----[Processing GroupBy: quickstart.purchases.v1_test]---- 2025/01/30 18:18:14 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:18:14 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-12-01...2023-12-01] query window: None source table: data.purchases source data range: [2023-11-01...2023-12-01] source start/end: null/null source data model: Events queryable data range: [null...2023-12-01] intersected range: [2023-11-01...2023-12-01] 2025/01/30 18:18:14 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:18:14 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-12-01]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:18:14 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01' 2025/01/30 18:18:14 INFO HopsAggregator.scala:147 - Left bounds: 1d->unbounded minQueryTs = 2023-12-01 00:00:00 2025/01/30 18:18:14 INFO FastHashing.scala:52 - Generating key builder over keys: bigint : user_id 2025/01/30 18:18:15 INFO KvRdd.scala:102 - key schema: { "type" : "record", "name" : "Key", "namespace" : "ai.chronon.data", "doc" : "", "fields" : [ { "name" : "user_id", "type" : [ "null", "long" ], "doc" : "" } ] } value schema: { "type" : "record", "name" : "Value", "namespace" : "ai.chronon.data", "doc" : "", "fields" : [ { "name" : "purchase_price_sum_3d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_sum_14d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_sum_30d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_3d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_14d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_30d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_average_3d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_average_14d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_average_30d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_last10", "type" : [ "null", { "type" : "array", "items" : "long" } ], "doc" : "" } ] } 2025/01/30 18:18:15 INFO GroupBy.scala:492 - ----[Processing GroupBy: 
quickstart.purchases.v1_test]---- 2025/01/30 18:18:19 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:18:19 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-12-01...2023-12-01] query window: None source table: data.purchases source data range: [2023-11-01...2023-12-01] source start/end: null/null source data model: Events queryable data range: [null...2023-12-01] intersected range: [2023-11-01...2023-12-01] 2025/01/30 18:18:19 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:18:19 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-12-01]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:18:20 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01' 2025/01/30 18:18:20 INFO GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined. 2025/01/30 18:18:20 INFO GroupByUpload.scala:188 - Built GroupByServingInfo for quickstart.purchases.v1_test: table: data.purchases / data-model: Events keySchema: Success(struct<user_id:bigint>) valueSchema: Success(struct<purchase_price:bigint>) mutationSchema: Failure(java.lang.NullPointerException) inputSchema: Failure(java.lang.NullPointerException) selectedSchema: Success(struct<purchase_price:bigint>) streamSchema: Failure(java.lang.NullPointerException) 2025/01/30 18:18:20 INFO TableUtils.scala:459 - Repartitioning before writing... 2025/01/30 18:18:24 INFO TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload 2025/01/30 18:18:24 INFO TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition 2025/01/30 18:18:24 INFO TableUtils.scala:536 - Sorting within partitions with cols: List(ds) 2025/01/30 18:18:30 INFO TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload 2025/01/30 18:18:30 INFO TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30 Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully. 
done: true driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/ driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f placement: clusterName: zipline-canary-cluster clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0 reference: jobId: c672008e-7380-4a82-a121-4bb0cb46503f projectId: canary-443022 sparkJob: args: - group-by-upload - --conf-path=purchases.v1_test - --end-date=2023-12-01 - --conf-type=group_bys - --additional-conf-path=additional-confs.yaml - --is-gcp - --gcp-project-id=canary-443022 - --gcp-bigtable-instance-id=zipline-canary-instance fileUris: - gs://zipline-warehouse-canary/metadata/purchases.v1_test - gs://zipline-artifacts-canary/confs/additional-confs.yaml jarFileUris: - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar mainClass: ai.chronon.spark.Driver status: state: DONE stateStartTime: '2025-01-30T18:18:33.742458Z' statusHistory: - state: PENDING stateStartTime: '2025-01-30T18:17:44.197477Z' - state: SETUP_DONE stateStartTime: '2025-01-30T18:17:44.223246Z' - details: Agent reported job success state: RUNNING stateStartTime: '2025-01-30T18:17:44.438240Z' yarnApplications: - name: group-by-upload progress: 1.0 state: FINISHED trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/ + echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m' <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>> ++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened ++ grep status.state: + JOB_STATE='status.state: DONE' + echo status.state: DONE status.state: DONE + '[' -z 'status.state: DONE' ']' + echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m' <<<<<.....................................UPLOAD-TO-KV.....................................>>>>> + touch tmp_upload_to_kv.out + zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc + tee /dev/tty tmp_upload_to_kv.out Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 
'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. 
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl --conf-type=group_bys --partition-string=2023-12-01 --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl --conf-type=group_bys --partition-string=2023-12-01 --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_upload_to_kv.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe + check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe + JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe + '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']' + gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1 Waiting for job output... 25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload 25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query: EXPORT DATA OPTIONS ( format='CLOUD_BIGTABLE', overwrite=true, uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH", bigtable_options='''{ "columnFamilies" : [ { "familyId": "cf", "encoding": "BINARY", "columns": [ {"qualifierString": "value", "fieldName": ""} ] } ] }''' ) AS SELECT CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey, value_bytes as cf, TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP FROM canary-443022.data.quickstart_purchases_v1_test_upload WHERE ds = '2023-12-01' 25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1 25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
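
For readers skimming the trace, here is a condensed sketch of the command sequence the script drives, reconstructed from the `set -x` output above. `check_dataproc_job_state` is the helper name visible in the trace; the small `job_id_from_log` helper is added here only for readability (the actual script inlines the `grep`/`cut` pipeline), and the conf path, partition, and region values are simply the ones used in this run.

```
#!/usr/bin/env bash
# Condensed view of the quickstart flow, reconstructed from the trace above.
set -euxo pipefail

CONF=production/group_bys/quickstart/purchases.v1_test

# Wait for a Dataproc job and then print its terminal state.
check_dataproc_job_state() {
  local job_id=$1
  [[ -n "$job_id" ]] || exit 1
  gcloud dataproc jobs wait "$job_id" --region=us-central1
  gcloud dataproc jobs describe "$job_id" --region=us-central1 --format=flattened | grep "status.state:"
}

# Pull the "<Dataproc submitter job id: ...>" value out of a captured log file.
job_id_from_log() { grep 'Dataproc submitter job id' "$1" | cut -d ' ' -f5; }

# 1. Compile the GroupBy python definition into production configs.
zipline compile --conf=group_bys/quickstart/purchases.py

# 2. Backfill the GroupBy output table on Dataproc.
zipline run --conf "$CONF" --dataproc | tee tmp_backfill.out
check_dataproc_job_state "$(job_id_from_log tmp_backfill.out)"

# 3. Build the batch upload table for one partition.
zipline run --mode upload --conf "$CONF" --ds 2023-12-01 --dataproc | tee tmp_gbu.out
check_dataproc_job_state "$(job_id_from_log tmp_gbu.out)"

# 4. Bulk-load that partition into the BigTable KV store.
zipline run --mode upload-to-kv --conf "$CONF" --partition-string=2023-12-01 --dataproc | tee tmp_upload_to_kv.out
check_dataproc_job_state "$(job_id_from_log tmp_upload_to_kv.out)"
```
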
## Summary ``` python distribution/run_zipline_quickstart.py ``` This runs the full zipline suite of commands against a test quickstart groupby. Example: ``` davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + GREEN='\033[0;32m' + RED='\033[0;31m' + WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl + bq rm -f -t canary-443022:data.quickstart_purchases_v1_test + bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload + '[' -z '' ']' + wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz --2025-01-30 10:16:21-- https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132 Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 400879762 (382M) [application/x-gzip] Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’ spark-3.5.4-bin-hadoop3.tgz 100%[==========================================================================================================================================>] 382.31M 50.2MB/s in 8.4s 2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762] + tar -xzf spark-3.5.4-bin-hadoop3.tgz ++ pwd + export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3 + SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3 + git clone [email protected]:zipline-ai/cananry-confs.git Cloning into 'cananry-confs'... remote: Enumerating objects: 148, done. remote: Counting objects: 100% (148/148), done. remote: Compressing objects: 100% (77/77), done. remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0) Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done. Resolving deltas: 100% (63/63), done. + cd cananry-confs + git fetch origin davidhan/canary From github.com:zipline-ai/cananry-confs * branch davidhan/canary -> FETCH_HEAD + git checkout davidhan/canary branch 'davidhan/canary' set up to track 'origin/davidhan/canary'. Switched to a new branch 'davidhan/canary' + python3 -m venv tmp_chronon + source tmp_chronon/bin/activate ++ deactivate nondestructive ++ '[' -n '' ']' ++ '[' -n '' ']' ++ hash -r ++ '[' -n '' ']' ++ unset VIRTUAL_ENV ++ unset VIRTUAL_ENV_PROMPT ++ '[' '!' 
nondestructive = nondestructive ']' ++ case "$(uname)" in +++ uname ++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon ++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon ++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin ++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin ++ export PATH ++ VIRTUAL_ENV_PROMPT=tmp_chronon ++ export VIRTUAL_ENV_PROMPT ++ '[' -n '' ']' ++ '[' -z '' ']' ++ _OLD_VIRTUAL_PS1= ++ PS1='(tmp_chronon) ' ++ export PS1 ++ hash -r + gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl . Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl Completed files 1/1 | 371.1kiB/371.1kiB + pip uninstall zipline-ai WARNING: Skipping zipline-ai as it is not installed. 
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl Collecting click (from zipline-ai==0.1.0.dev0) Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB) Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0) Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0) Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB) Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB) Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB) Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB) Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB) Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB) Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_crc32c-1.6.0-py3-none-any.whl Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0) Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB) Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB) Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes) Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB) Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB) Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB) Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB) Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB) Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached idna-3.10-py3-none-any.whl.metadata (10 kB) Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB) Collecting certifi>=2017.4.17 (from 
requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB) Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB) Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB) Using cached click-8.1.8-py3-none-any.whl (98 kB) Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB) Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB) Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB) Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB) Using cached requests-2.32.3-py3-none-any.whl (64 kB) Using cached six-1.17.0-py2.py3-none-any.whl (11 kB) Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB) Using cached certifi-2024.12.14-py3-none-any.whl (164 kB) Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB) Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB) Using cached idna-3.10-py3-none-any.whl (70 kB) Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB) Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB) Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB) Using cached rsa-4.9-py3-none-any.whl (34 kB) Using cached urllib3-2.3.0-py3-none-any.whl (128 kB) Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB) Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0 [notice] A new release of pip is available: 24.2 -> 25.0 [notice] To update, run: pip install --upgrade pip ++ pwd + export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs + PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs + DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id' + echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m' <<<<<.....................................COMPILE.....................................>>>>> + zipline compile --conf=group_bys/quickstart/purchases.py Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py GroupBy Team - quickstart GroupBy Name - purchases.v1 Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1 GroupBy Team - quickstart GroupBy Name - purchases.v1_test Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test 
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production + echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m' <<<<<.....................................BACKFILL.....................................>>>>> + touch tmp_backfill.out + zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc + tee /dev/tty tmp_backfill.out Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. 
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. 
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_backfill.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b + check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b + JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b + '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']' + gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 Waiting for job output... 25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse 25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml 25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker 25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster 25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator 25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics 25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032 25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200 25/01/30 18:16:51 INFO Configuration: resource-types.xml not found 25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'. 25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011 25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030 25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state. 
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ] 2025/01/30 18:16:55 INFO SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration 2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions. 2025/01/30 18:17:15 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:17:15 INFO TableUtils.scala:622 - Unfilled range computation: Output table: canary-443022.data.quickstart_purchases_v1_test Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-
27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30] Input tables: data.purchases Missing input partitions: 
[2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2
024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30] Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30] Unfilled ranges: [2023-11-01...2023-11-30] 2025/01/30 18:17:15 INFO GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30]) 2025/01/30 18:17:15 INFO GroupBy.scala:738 - Group By ranges to compute: [2023-11-01...2023-11-30] 2025/01/30 18:17:15 INFO GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1] 2025/01/30 18:17:15 INFO GroupBy.scala:492 - ----[Processing GroupBy: quickstart.purchases.v1_test]---- 2025/01/30 18:17:20 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:17:20 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-11-01...2023-11-30] query window: None source table: data.purchases source data range: [2023-11-01...2023-11-30] source start/end: null/null source data model: Events queryable data range: [null...2023-11-30] intersected range: [2023-11-01...2023-11-30] 2025/01/30 18:17:20 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:17:20 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-11-30]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:17:20 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-11-30' 2025/01/30 18:17:20 INFO HopsAggregator.scala:147 - Left bounds: 1d->unbounded minQueryTs = 2023-11-01 00:00:00 2025/01/30 18:17:20 INFO FastHashing.scala:52 - Generating key builder over keys: bigint : user_id 2025/01/30 18:17:22 INFO TableUtils.scala:459 - Repartitioning before writing... 
2025/01/30 18:17:25 INFO TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test 2025/01/30 18:17:25 INFO TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition 2025/01/30 18:17:25 INFO TableUtils.scala:536 - Sorting within partitions with cols: List(ds) 2025/01/30 18:17:33 INFO TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test 2025/01/30 18:17:33 INFO TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33 2025/01/30 18:17:33 INFO GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30] 2025/01/30 18:17:33 INFO GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30] Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully. done: true driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/ driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b placement: clusterName: zipline-canary-cluster clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0 reference: jobId: 945d836f-20d8-4768-97fb-0889c00ed87b projectId: canary-443022 sparkJob: args: - group-by-backfill - --conf-path=purchases.v1_test - --end-date=2025-01-30 - --conf-type=group_bys - --additional-conf-path=additional-confs.yaml - --is-gcp - --gcp-project-id=canary-443022 - --gcp-bigtable-instance-id=zipline-canary-instance fileUris: - gs://zipline-warehouse-canary/metadata/purchases.v1_test - gs://zipline-artifacts-canary/confs/additional-confs.yaml jarFileUris: - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar mainClass: ai.chronon.spark.Driver status: state: DONE stateStartTime: '2025-01-30T18:17:38.722934Z' statusHistory: - state: PENDING stateStartTime: '2025-01-30T18:16:43.326557Z' - state: SETUP_DONE stateStartTime: '2025-01-30T18:16:43.353624Z' - details: Agent reported job success state: RUNNING stateStartTime: '2025-01-30T18:16:43.597231Z' yarnApplications: - name: groupBy_quickstart.purchases.v1_test_backfill progress: 1.0 state: FINISHED trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/ + echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m' <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>> ++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened ++ grep status.state: + JOB_STATE='status.state: DONE' + echo status.state: DONE status.state: DONE + '[' -z 'status.state: DONE' ']' + echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m' <<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>> + touch tmp_gbu.out + zipline run --mode upload --conf 
production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc + tee /dev/tty tmp_gbu.out Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. 
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f Setting env variables: From <default_env> setting EXECUTOR_CORES=1 From <default_env> setting EXECUTOR_MEMORY=8G From <default_env> setting PARALLELISM=1000 From <default_env> setting MAX_EXECUTORS=1000 From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <default_env> setting EXECUTOR_CORES=1 From <default_env> setting EXECUTOR_MEMORY=8G From <default_env> setting PARALLELISM=1000 From <default_env> setting MAX_EXECUTORS=1000 From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test From <cli_args> setting 
CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_gbu.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f + check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f + JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f + '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']' + gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 Waiting for job output... 25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse 25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml 25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker 25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster 25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator 25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics 25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032 25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200 25/01/30 18:17:52 INFO Configuration: resource-types.xml not found 25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'. 25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012 25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030 25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state. 25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ] 2025/01/30 18:17:56 INFO SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration 2025/01/30 18:17:57 INFO GroupByUpload.scala:229 - GroupBy upload for: quickstart.quickstart.purchases.v1_test Accuracy: SNAPSHOT Data Model: Events 2025/01/30 18:17:57 INFO GroupBy.scala:492 - ----[Processing GroupBy: quickstart.purchases.v1_test]---- 2025/01/30 18:18:14 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:18:14 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-12-01...2023-12-01] query window: None source table: data.purchases source data range: [2023-11-01...2023-12-01] source start/end: null/null source data model: Events queryable data range: [null...2023-12-01] intersected range: [2023-11-01...2023-12-01] 2025/01/30 18:18:14 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:18:14 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-12-01]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:18:14 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01' 2025/01/30 18:18:14 INFO HopsAggregator.scala:147 - Left bounds: 1d->unbounded minQueryTs = 2023-12-01 00:00:00 2025/01/30 18:18:14 INFO FastHashing.scala:52 - Generating key builder over keys: bigint : user_id 2025/01/30 18:18:15 INFO KvRdd.scala:102 - key schema: { "type" : "record", "name" : "Key", "namespace" : "ai.chronon.data", "doc" : "", "fields" : [ { "name" : "user_id", "type" : [ "null", "long" ], "doc" : "" } ] } value schema: { "type" : "record", "name" : "Value", "namespace" : "ai.chronon.data", "doc" : "", "fields" : [ { "name" : "purchase_price_sum_3d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_sum_14d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_sum_30d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_3d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_14d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_30d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_average_3d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_average_14d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_average_30d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_last10", "type" : [ "null", { "type" : "array", "items" : "long" } ], "doc" : "" } ] } 2025/01/30 18:18:15 INFO GroupBy.scala:492 - ----[Processing GroupBy: 
quickstart.purchases.v1_test]---- 2025/01/30 18:18:19 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:18:19 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-12-01...2023-12-01] query window: None source table: data.purchases source data range: [2023-11-01...2023-12-01] source start/end: null/null source data model: Events queryable data range: [null...2023-12-01] intersected range: [2023-11-01...2023-12-01] 2025/01/30 18:18:19 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:18:19 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-12-01]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:18:20 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01' 2025/01/30 18:18:20 INFO GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined. 2025/01/30 18:18:20 INFO GroupByUpload.scala:188 - Built GroupByServingInfo for quickstart.purchases.v1_test: table: data.purchases / data-model: Events keySchema: Success(struct<user_id:bigint>) valueSchema: Success(struct<purchase_price:bigint>) mutationSchema: Failure(java.lang.NullPointerException) inputSchema: Failure(java.lang.NullPointerException) selectedSchema: Success(struct<purchase_price:bigint>) streamSchema: Failure(java.lang.NullPointerException) 2025/01/30 18:18:20 INFO TableUtils.scala:459 - Repartitioning before writing... 2025/01/30 18:18:24 INFO TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload 2025/01/30 18:18:24 INFO TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition 2025/01/30 18:18:24 INFO TableUtils.scala:536 - Sorting within partitions with cols: List(ds) 2025/01/30 18:18:30 INFO TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload 2025/01/30 18:18:30 INFO TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30 Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully. 
done: true driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/ driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f placement: clusterName: zipline-canary-cluster clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0 reference: jobId: c672008e-7380-4a82-a121-4bb0cb46503f projectId: canary-443022 sparkJob: args: - group-by-upload - --conf-path=purchases.v1_test - --end-date=2023-12-01 - --conf-type=group_bys - --additional-conf-path=additional-confs.yaml - --is-gcp - --gcp-project-id=canary-443022 - --gcp-bigtable-instance-id=zipline-canary-instance fileUris: - gs://zipline-warehouse-canary/metadata/purchases.v1_test - gs://zipline-artifacts-canary/confs/additional-confs.yaml jarFileUris: - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar mainClass: ai.chronon.spark.Driver status: state: DONE stateStartTime: '2025-01-30T18:18:33.742458Z' statusHistory: - state: PENDING stateStartTime: '2025-01-30T18:17:44.197477Z' - state: SETUP_DONE stateStartTime: '2025-01-30T18:17:44.223246Z' - details: Agent reported job success state: RUNNING stateStartTime: '2025-01-30T18:17:44.438240Z' yarnApplications: - name: group-by-upload progress: 1.0 state: FINISHED trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/ + echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m' <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>> ++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened ++ grep status.state: + JOB_STATE='status.state: DONE' + echo status.state: DONE status.state: DONE + '[' -z 'status.state: DONE' ']' + echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m' <<<<<.....................................UPLOAD-TO-KV.....................................>>>>> + touch tmp_upload_to_kv.out + zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc + tee /dev/tty tmp_upload_to_kv.out Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 
'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. 
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl --conf-type=group_bys --partition-string=2023-12-01 --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl --conf-type=group_bys --partition-string=2023-12-01 --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_upload_to_kv.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe + check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe + JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe + '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']' + gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1 Waiting for job output... 25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload 25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query: EXPORT DATA OPTIONS ( format='CLOUD_BIGTABLE', overwrite=true, uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH", bigtable_options='''{ "columnFamilies" : [ { "familyId": "cf", "encoding": "BINARY", "columns": [ {"qualifierString": "value", "fieldName": ""} ] } ] }''' ) AS SELECT CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey, value_bytes as cf, TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP FROM canary-443022.data.quickstart_purchases_v1_test_upload WHERE ds = '2023-12-01' 25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1 25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
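For reference, the sequence of `zipline` commands the suite drives end to end, condensed from the trace above (the compile and backfill invocations appear earlier in the full output):

```bash
# Compile the quickstart GroupBy definitions into production confs.
zipline compile --conf=group_bys/quickstart/purchases.py

# Backfill the test GroupBy on Dataproc.
zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc

# Compute the upload table for a single partition.
zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc

# Bulk-load that partition from BigQuery into Bigtable.
zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
```

Each `--dataproc` invocation returns once the job is submitted, so the script tees the CLI output to a temp file, greps out the submitter job id, and then blocks on it with `gcloud`. Below is a minimal sketch of that pattern, reconstructed from the `set -x` trace above; the actual helper in `run_zipline_quickstart.sh` may differ, and the exit-on-failure handling is an assumption since the trace only shows the success path.

```bash
DATAPROC_SUBMITTER_ID_STR="Dataproc submitter job id"

check_dataproc_job_state() {
  JOB_ID=$1
  # Fail fast if no job id was parsed out of the submitter output (assumed behavior).
  if [ -z "$JOB_ID" ]; then
    echo "No Dataproc job id found" && exit 1
  fi
  # Block until the driver finishes, then confirm the terminal state.
  gcloud dataproc jobs wait "$JOB_ID" --region=us-central1
  JOB_STATE=$(gcloud dataproc jobs describe "$JOB_ID" --region=us-central1 --format=flattened | grep "status.state:")
  echo "$JOB_STATE"
  if [ -z "$JOB_STATE" ]; then
    exit 1
  fi
}

# Usage, as in the trace: capture the submission output, pull out the job id, wait on it.
touch tmp_backfill.out
zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc | tee /dev/tty tmp_backfill.out
BACKFILL_JOB_ID=$(grep "$DATAPROC_SUBMITTER_ID_STR" tmp_backfill.out | cut -d ' ' -f5)
check_dataproc_job_state "$BACKFILL_JOB_ID"
```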
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f Setting env variables: From <default_env> setting EXECUTOR_CORES=1 From <default_env> setting EXECUTOR_MEMORY=8G From <default_env> setting PARALLELISM=1000 From <default_env> setting MAX_EXECUTORS=1000 From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <default_env> setting EXECUTOR_CORES=1 From <default_env> setting EXECUTOR_MEMORY=8G From <default_env> setting PARALLELISM=1000 From <default_env> setting MAX_EXECUTORS=1000 From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test From <cli_args> setting 
CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_gbu.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f + check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f + JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f + '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']' + gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 Waiting for job output... 25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse 25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml 25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker 25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster 25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator 25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics 25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032 25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200 25/01/30 18:17:52 INFO Configuration: resource-types.xml not found 25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'. 25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012 25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030 25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state. 25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ] 2025/01/30 18:17:56 INFO SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration 2025/01/30 18:17:57 INFO GroupByUpload.scala:229 - GroupBy upload for: quickstart.quickstart.purchases.v1_test Accuracy: SNAPSHOT Data Model: Events 2025/01/30 18:17:57 INFO GroupBy.scala:492 - ----[Processing GroupBy: quickstart.purchases.v1_test]---- 2025/01/30 18:18:14 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:18:14 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-12-01...2023-12-01] query window: None source table: data.purchases source data range: [2023-11-01...2023-12-01] source start/end: null/null source data model: Events queryable data range: [null...2023-12-01] intersected range: [2023-11-01...2023-12-01] 2025/01/30 18:18:14 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:18:14 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-12-01]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:18:14 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01' 2025/01/30 18:18:14 INFO HopsAggregator.scala:147 - Left bounds: 1d->unbounded minQueryTs = 2023-12-01 00:00:00 2025/01/30 18:18:14 INFO FastHashing.scala:52 - Generating key builder over keys: bigint : user_id 2025/01/30 18:18:15 INFO KvRdd.scala:102 - key schema: { "type" : "record", "name" : "Key", "namespace" : "ai.chronon.data", "doc" : "", "fields" : [ { "name" : "user_id", "type" : [ "null", "long" ], "doc" : "" } ] } value schema: { "type" : "record", "name" : "Value", "namespace" : "ai.chronon.data", "doc" : "", "fields" : [ { "name" : "purchase_price_sum_3d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_sum_14d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_sum_30d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_3d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_14d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_30d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_average_3d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_average_14d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_average_30d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_last10", "type" : [ "null", { "type" : "array", "items" : "long" } ], "doc" : "" } ] } 2025/01/30 18:18:15 INFO GroupBy.scala:492 - ----[Processing GroupBy: 
quickstart.purchases.v1_test]---- 2025/01/30 18:18:19 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:18:19 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-12-01...2023-12-01] query window: None source table: data.purchases source data range: [2023-11-01...2023-12-01] source start/end: null/null source data model: Events queryable data range: [null...2023-12-01] intersected range: [2023-11-01...2023-12-01] 2025/01/30 18:18:19 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:18:19 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-12-01]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:18:20 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01' 2025/01/30 18:18:20 INFO GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined. 2025/01/30 18:18:20 INFO GroupByUpload.scala:188 - Built GroupByServingInfo for quickstart.purchases.v1_test: table: data.purchases / data-model: Events keySchema: Success(struct<user_id:bigint>) valueSchema: Success(struct<purchase_price:bigint>) mutationSchema: Failure(java.lang.NullPointerException) inputSchema: Failure(java.lang.NullPointerException) selectedSchema: Success(struct<purchase_price:bigint>) streamSchema: Failure(java.lang.NullPointerException) 2025/01/30 18:18:20 INFO TableUtils.scala:459 - Repartitioning before writing... 2025/01/30 18:18:24 INFO TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload 2025/01/30 18:18:24 INFO TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition 2025/01/30 18:18:24 INFO TableUtils.scala:536 - Sorting within partitions with cols: List(ds) 2025/01/30 18:18:30 INFO TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload 2025/01/30 18:18:30 INFO TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30 Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully. 
done: true driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/ driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f placement: clusterName: zipline-canary-cluster clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0 reference: jobId: c672008e-7380-4a82-a121-4bb0cb46503f projectId: canary-443022 sparkJob: args: - group-by-upload - --conf-path=purchases.v1_test - --end-date=2023-12-01 - --conf-type=group_bys - --additional-conf-path=additional-confs.yaml - --is-gcp - --gcp-project-id=canary-443022 - --gcp-bigtable-instance-id=zipline-canary-instance fileUris: - gs://zipline-warehouse-canary/metadata/purchases.v1_test - gs://zipline-artifacts-canary/confs/additional-confs.yaml jarFileUris: - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar mainClass: ai.chronon.spark.Driver status: state: DONE stateStartTime: '2025-01-30T18:18:33.742458Z' statusHistory: - state: PENDING stateStartTime: '2025-01-30T18:17:44.197477Z' - state: SETUP_DONE stateStartTime: '2025-01-30T18:17:44.223246Z' - details: Agent reported job success state: RUNNING stateStartTime: '2025-01-30T18:17:44.438240Z' yarnApplications: - name: group-by-upload progress: 1.0 state: FINISHED trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/ + echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m' <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>> ++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened ++ grep status.state: + JOB_STATE='status.state: DONE' + echo status.state: DONE status.state: DONE + '[' -z 'status.state: DONE' ']' + echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m' <<<<<.....................................UPLOAD-TO-KV.....................................>>>>> + touch tmp_upload_to_kv.out + zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc + tee /dev/tty tmp_upload_to_kv.out Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 
'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. 
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl --conf-type=group_bys --partition-string=2023-12-01 --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl --conf-type=group_bys --partition-string=2023-12-01 --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_upload_to_kv.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe + check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe + JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe + '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']' + gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1 Waiting for job output... 25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload 25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query: EXPORT DATA OPTIONS ( format='CLOUD_BIGTABLE', overwrite=true, uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH", bigtable_options='''{ "columnFamilies" : [ { "familyId": "cf", "encoding": "BINARY", "columns": [ {"qualifierString": "value", "fieldName": ""} ] } ] }''' ) AS SELECT CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey, value_bytes as cf, TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP FROM canary-443022.data.quickstart_purchases_v1_test_upload WHERE ds = '2023-12-01' 25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1 25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
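The submit-and-wait pattern that repeats throughout the trace above is: capture the "Dataproc submitter job id" line from the tee'd output, block on that job with gcloud, then read back its terminal state. Below is a minimal sketch of that pattern, reconstructed from the trace for illustration only; the helper name check_dataproc_job_state and the individual commands appear in the log, but the exact contents of the quickstart shell script are assumed.

```
#!/usr/bin/env bash
# Rough sketch of the submit-and-wait pattern visible in the trace above.
# Assumes `zipline` and `gcloud` are on PATH and the region matches the log (us-central1).
set -euo pipefail

check_dataproc_job_state() {
  local job_id="$1"
  if [[ -z "$job_id" ]]; then
    echo "No Dataproc job id found in the captured output" >&2
    exit 1
  fi
  # Block until the Dataproc job reaches a terminal state.
  gcloud dataproc jobs wait "$job_id" --region=us-central1
  # Read back the final state, e.g. "status.state: DONE".
  local job_state
  job_state=$(gcloud dataproc jobs describe "$job_id" --region=us-central1 \
    --format=flattened | grep 'status.state:' || true)
  echo "$job_state"
  if [[ -z "$job_state" ]]; then
    echo "Could not determine final state for job $job_id" >&2
    exit 1
  fi
}

# Submit one step and tee its output so the job id can be scraped from the capture file.
touch tmp_backfill.out
zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc \
  | tee /dev/tty tmp_backfill.out
BACKFILL_JOB_ID=$(grep 'Dataproc submitter job id' tmp_backfill.out | cut -d ' ' -f5)
check_dataproc_job_state "$BACKFILL_JOB_ID"
```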
## Summary

```
python distribution/run_zipline_quickstart.py
```

This runs the full zipline suite of commands (including the compile, backfill, group-by upload, and upload-to-KV steps) against a test quickstart GroupBy.

Example:

```
davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + GREEN='\033[0;32m' + RED='\033[0;31m' + WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl + bq rm -f -t canary-443022:data.quickstart_purchases_v1_test + bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload + '[' -z '' ']' + wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz --2025-01-30 10:16:21-- https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132 Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 400879762 (382M) [application/x-gzip] Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’ spark-3.5.4-bin-hadoop3.tgz 100%[==========================================================================================================================================>] 382.31M 50.2MB/s in 8.4s 2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762] + tar -xzf spark-3.5.4-bin-hadoop3.tgz ++ pwd + export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3 + SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3 + git clone [email protected]:zipline-ai/cananry-confs.git Cloning into 'cananry-confs'... remote: Enumerating objects: 148, done. remote: Counting objects: 100% (148/148), done. remote: Compressing objects: 100% (77/77), done. remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0) Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done. Resolving deltas: 100% (63/63), done. + cd cananry-confs + git fetch origin davidhan/canary From github.com:zipline-ai/cananry-confs * branch davidhan/canary -> FETCH_HEAD + git checkout davidhan/canary branch 'davidhan/canary' set up to track 'origin/davidhan/canary'. Switched to a new branch 'davidhan/canary' + python3 -m venv tmp_chronon + source tmp_chronon/bin/activate ++ deactivate nondestructive ++ '[' -n '' ']' ++ '[' -n '' ']' ++ hash -r ++ '[' -n '' ']' ++ unset VIRTUAL_ENV ++ unset VIRTUAL_ENV_PROMPT ++ '[' '!'
nondestructive = nondestructive ']' ++ case "$(uname)" in +++ uname ++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon ++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon ++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin ++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin ++ export PATH ++ VIRTUAL_ENV_PROMPT=tmp_chronon ++ export VIRTUAL_ENV_PROMPT ++ '[' -n '' ']' ++ '[' -z '' ']' ++ _OLD_VIRTUAL_PS1= ++ PS1='(tmp_chronon) ' ++ export PS1 ++ hash -r + gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl . Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl Completed files 1/1 | 371.1kiB/371.1kiB + pip uninstall zipline-ai WARNING: Skipping zipline-ai as it is not installed. 
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl Collecting click (from zipline-ai==0.1.0.dev0) Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB) Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0) Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0) Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB) Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB) Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB) Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB) Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB) Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB) Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_crc32c-1.6.0-py3-none-any.whl Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0) Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB) Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB) Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes) Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB) Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB) Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB) Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB) Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB) Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached idna-3.10-py3-none-any.whl.metadata (10 kB) Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB) Collecting certifi>=2017.4.17 (from 
requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB) Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB) Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB) Using cached click-8.1.8-py3-none-any.whl (98 kB) Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB) Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB) Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB) Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB) Using cached requests-2.32.3-py3-none-any.whl (64 kB) Using cached six-1.17.0-py2.py3-none-any.whl (11 kB) Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB) Using cached certifi-2024.12.14-py3-none-any.whl (164 kB) Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB) Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB) Using cached idna-3.10-py3-none-any.whl (70 kB) Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB) Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB) Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB) Using cached rsa-4.9-py3-none-any.whl (34 kB) Using cached urllib3-2.3.0-py3-none-any.whl (128 kB) Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB) Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0 [notice] A new release of pip is available: 24.2 -> 25.0 [notice] To update, run: pip install --upgrade pip ++ pwd + export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs + PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs + DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id' + echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m' <<<<<.....................................COMPILE.....................................>>>>> + zipline compile --conf=group_bys/quickstart/purchases.py Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py GroupBy Team - quickstart GroupBy Name - purchases.v1 Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1 GroupBy Team - quickstart GroupBy Name - purchases.v1_test Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test 
Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production + echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m' <<<<<.....................................BACKFILL.....................................>>>>> + touch tmp_backfill.out + zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc + tee /dev/tty tmp_backfill.out Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. 
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. 
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_backfill.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b + check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b + JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b + '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']' + gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 Waiting for job output... 25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse 25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml 25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker 25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster 25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator 25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics 25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032 25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200 25/01/30 18:16:51 INFO Configuration: resource-types.xml not found 25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'. 25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011 25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030 25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state. 
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ] 2025/01/30 18:16:55 INFO SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration 2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions. 2025/01/30 18:17:15 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:17:15 INFO TableUtils.scala:622 - Unfilled range computation: Output table: canary-443022.data.quickstart_purchases_v1_test Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-
27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30] Input tables: data.purchases Missing input partitions: 
[2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2
024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30] Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30] Unfilled ranges: [2023-11-01...2023-11-30] 2025/01/30 18:17:15 INFO GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30]) 2025/01/30 18:17:15 INFO GroupBy.scala:738 - Group By ranges to compute: [2023-11-01...2023-11-30] 2025/01/30 18:17:15 INFO GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1] 2025/01/30 18:17:15 INFO GroupBy.scala:492 - ----[Processing GroupBy: quickstart.purchases.v1_test]---- 2025/01/30 18:17:20 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:17:20 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-11-01...2023-11-30] query window: None source table: data.purchases source data range: [2023-11-01...2023-11-30] source start/end: null/null source data model: Events queryable data range: [null...2023-11-30] intersected range: [2023-11-01...2023-11-30] 2025/01/30 18:17:20 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:17:20 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-11-30]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:17:20 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-11-30' 2025/01/30 18:17:20 INFO HopsAggregator.scala:147 - Left bounds: 1d->unbounded minQueryTs = 2023-11-01 00:00:00 2025/01/30 18:17:20 INFO FastHashing.scala:52 - Generating key builder over keys: bigint : user_id 2025/01/30 18:17:22 INFO TableUtils.scala:459 - Repartitioning before writing... 
2025/01/30 18:17:25 INFO TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test 2025/01/30 18:17:25 INFO TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition 2025/01/30 18:17:25 INFO TableUtils.scala:536 - Sorting within partitions with cols: List(ds) 2025/01/30 18:17:33 INFO TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test 2025/01/30 18:17:33 INFO TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33 2025/01/30 18:17:33 INFO GroupBy.scala:757 - Wrote to table canary-443022.data.quickstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30] 2025/01/30 18:17:33 INFO GroupBy.scala:759 - Wrote to table canary-443022.data.quickstart_purchases_v1_test for range: [2023-11-01...2023-11-30] Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully. done: true driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/ driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b placement: clusterName: zipline-canary-cluster clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0 reference: jobId: 945d836f-20d8-4768-97fb-0889c00ed87b projectId: canary-443022 sparkJob: args: - group-by-backfill - --conf-path=purchases.v1_test - --end-date=2025-01-30 - --conf-type=group_bys - --additional-conf-path=additional-confs.yaml - --is-gcp - --gcp-project-id=canary-443022 - --gcp-bigtable-instance-id=zipline-canary-instance fileUris: - gs://zipline-warehouse-canary/metadata/purchases.v1_test - gs://zipline-artifacts-canary/confs/additional-confs.yaml jarFileUris: - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar mainClass: ai.chronon.spark.Driver status: state: DONE stateStartTime: '2025-01-30T18:17:38.722934Z' statusHistory: - state: PENDING stateStartTime: '2025-01-30T18:16:43.326557Z' - state: SETUP_DONE stateStartTime: '2025-01-30T18:16:43.353624Z' - details: Agent reported job success state: RUNNING stateStartTime: '2025-01-30T18:16:43.597231Z' yarnApplications: - name: groupBy_quickstart.purchases.v1_test_backfill progress: 1.0 state: FINISHED trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/ + echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m' <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>> ++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened ++ grep status.state: + JOB_STATE='status.state: DONE' + echo status.state: DONE status.state: DONE + '[' -z 'status.state: DONE' ']' + echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m' <<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>> + touch tmp_gbu.out + zipline run --mode upload --conf 
production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc + tee /dev/tty tmp_gbu.out Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. 
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f Setting env variables: From <default_env> setting EXECUTOR_CORES=1 From <default_env> setting EXECUTOR_MEMORY=8G From <default_env> setting PARALLELISM=1000 From <default_env> setting MAX_EXECUTORS=1000 From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <default_env> setting EXECUTOR_CORES=1 From <default_env> setting EXECUTOR_MEMORY=8G From <default_env> setting PARALLELISM=1000 From <default_env> setting MAX_EXECUTORS=1000 From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test From <cli_args> setting 
CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_gbu.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f + check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f + JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f + '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']' + gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 Waiting for job output... 25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse 25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml 25/01/30 18:17:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker 25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster 25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator 25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics 25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032 25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200 25/01/30 18:17:52 INFO Configuration: resource-types.xml not found 25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'. 25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012 25/01/30 18:17:54 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030 25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state. 25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ] 2025/01/30 18:17:56 INFO SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration 2025/01/30 18:17:57 INFO GroupByUpload.scala:229 - GroupBy upload for: quickstart.quickstart.purchases.v1_test Accuracy: SNAPSHOT Data Model: Events 2025/01/30 18:17:57 INFO GroupBy.scala:492 - ----[Processing GroupBy: quickstart.purchases.v1_test]---- 2025/01/30 18:18:14 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:18:14 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-12-01...2023-12-01] query window: None source table: data.purchases source data range: [2023-11-01...2023-12-01] source start/end: null/null source data model: Events queryable data range: [null...2023-12-01] intersected range: [2023-11-01...2023-12-01] 2025/01/30 18:18:14 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:18:14 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-12-01]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:18:14 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01' 2025/01/30 18:18:14 INFO HopsAggregator.scala:147 - Left bounds: 1d->unbounded minQueryTs = 2023-12-01 00:00:00 2025/01/30 18:18:14 INFO FastHashing.scala:52 - Generating key builder over keys: bigint : user_id 2025/01/30 18:18:15 INFO KvRdd.scala:102 - key schema: { "type" : "record", "name" : "Key", "namespace" : "ai.chronon.data", "doc" : "", "fields" : [ { "name" : "user_id", "type" : [ "null", "long" ], "doc" : "" } ] } value schema: { "type" : "record", "name" : "Value", "namespace" : "ai.chronon.data", "doc" : "", "fields" : [ { "name" : "purchase_price_sum_3d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_sum_14d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_sum_30d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_3d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_14d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_count_30d", "type" : [ "null", "long" ], "doc" : "" }, { "name" : "purchase_price_average_3d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_average_14d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_average_30d", "type" : [ "null", "double" ], "doc" : "" }, { "name" : "purchase_price_last10", "type" : [ "null", { "type" : "array", "items" : "long" } ], "doc" : "" } ] } 2025/01/30 18:18:15 INFO GroupBy.scala:492 - ----[Processing GroupBy: 
quickstart.purchases.v1_test]---- 2025/01/30 18:18:19 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:18:19 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-12-01...2023-12-01] query window: None source table: data.purchases source data range: [2023-11-01...2023-12-01] source start/end: null/null source data model: Events queryable data range: [null...2023-12-01] intersected range: [2023-11-01...2023-12-01] 2025/01/30 18:18:19 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:18:19 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-12-01]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:18:20 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01' 2025/01/30 18:18:20 INFO GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined. 2025/01/30 18:18:20 INFO GroupByUpload.scala:188 - Built GroupByServingInfo for quickstart.purchases.v1_test: table: data.purchases / data-model: Events keySchema: Success(struct<user_id:bigint>) valueSchema: Success(struct<purchase_price:bigint>) mutationSchema: Failure(java.lang.NullPointerException) inputSchema: Failure(java.lang.NullPointerException) selectedSchema: Success(struct<purchase_price:bigint>) streamSchema: Failure(java.lang.NullPointerException) 2025/01/30 18:18:20 INFO TableUtils.scala:459 - Repartitioning before writing... 2025/01/30 18:18:24 INFO TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload 2025/01/30 18:18:24 INFO TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition 2025/01/30 18:18:24 INFO TableUtils.scala:536 - Sorting within partitions with cols: List(ds) 2025/01/30 18:18:30 INFO TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload 2025/01/30 18:18:30 INFO TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30 Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully. 
done: true driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/ driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f placement: clusterName: zipline-canary-cluster clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0 reference: jobId: c672008e-7380-4a82-a121-4bb0cb46503f projectId: canary-443022 sparkJob: args: - group-by-upload - --conf-path=purchases.v1_test - --end-date=2023-12-01 - --conf-type=group_bys - --additional-conf-path=additional-confs.yaml - --is-gcp - --gcp-project-id=canary-443022 - --gcp-bigtable-instance-id=zipline-canary-instance fileUris: - gs://zipline-warehouse-canary/metadata/purchases.v1_test - gs://zipline-artifacts-canary/confs/additional-confs.yaml jarFileUris: - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar mainClass: ai.chronon.spark.Driver status: state: DONE stateStartTime: '2025-01-30T18:18:33.742458Z' statusHistory: - state: PENDING stateStartTime: '2025-01-30T18:17:44.197477Z' - state: SETUP_DONE stateStartTime: '2025-01-30T18:17:44.223246Z' - details: Agent reported job success state: RUNNING stateStartTime: '2025-01-30T18:17:44.438240Z' yarnApplications: - name: group-by-upload progress: 1.0 state: FINISHED trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/ + echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m' <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>> ++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened ++ grep status.state: + JOB_STATE='status.state: DONE' + echo status.state: DONE status.state: DONE + '[' -z 'status.state: DONE' ']' + echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m' <<<<<.....................................UPLOAD-TO-KV.....................................>>>>> + touch tmp_upload_to_kv.out + zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc + tee /dev/tty tmp_upload_to_kv.out Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 
'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. 
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl --conf-type=group_bys --partition-string=2023-12-01 --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl --conf-type=group_bys --partition-string=2023-12-01 --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_upload_to_kv.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe + check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe + JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe + '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']' + gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1 Waiting for job output... 25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload 25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query: EXPORT DATA OPTIONS ( format='CLOUD_BIGTABLE', overwrite=true, uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH", bigtable_options='''{ "columnFamilies" : [ { "familyId": "cf", "encoding": "BINARY", "columns": [ {"qualifierString": "value", "fieldName": ""} ] } ] }''' ) AS SELECT CONCAT(CAST(CONCAT('QUICKSTART_PURCHASES_V1_TEST_BATCH', '#') AS BYTES), key_bytes) as rowkey, value_bytes as cf, TIMESTAMP_MILLIS(1701475200000) as _CHANGE_TIMESTAMP FROM canary-443022.data.quickstart_purchases_v1_test_upload WHERE ds = '2023-12-01' 25/01/30 18:18:48 INFO BigTableKVStoreImpl: Export job started with Id: JobId{project=canary-443022, job=export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353, location=null} and link: https://bigquery.googleapis.com/bigquery/v2/projects/canary-443022/jobs/export_canary_443022_data_quickstart_purchases_v1_test_upload_to_bigtable_2023-12-01_1738261127353?location=us-central1 25/01/30 18:18:48 INFO BigTableKVStoreImpl: …
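Each stage in the trace above (backfill, group-by-upload, upload-to-kv) is gated the same way: the `zipline run ... --dataproc` output is tee'd into a scratch file, the `Dataproc submitter job id` line is parsed out of it, and the run blocks on that Dataproc job before the next stage starts. A minimal sketch of that gating helper, reconstructed from the `set -x` output above (the exact error handling in `run_zipline_quickstart.sh` may differ):

```
# Sketch of the job gating visible in the trace: wait for the Dataproc job, then re-read its final state.
check_dataproc_job_state() {
  JOB_ID=$1
  if [ -z "$JOB_ID" ]; then
    echo "No Dataproc job id to check"
    exit 1
  fi
  # Block until the submitted job reaches a terminal state...
  gcloud dataproc jobs wait "$JOB_ID" --region=us-central1
  # ...then re-read the final state so a failed or missing job stops the suite.
  JOB_STATE=$(gcloud dataproc jobs describe "$JOB_ID" --region=us-central1 --format=flattened | grep "status.state:")
  echo "$JOB_STATE"
  if [ -z "$JOB_STATE" ]; then
    exit 1
  fi
}

# Usage, mirroring the GROUP-BY-UPLOAD step above:
# zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc | tee /dev/tty tmp_gbu.out
# GBU_JOB_ID=$(grep "Dataproc submitter job id" tmp_gbu.out | cut -d ' ' -f5)
# check_dataproc_job_state "$GBU_JOB_ID"
```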
## Summary

```
python distribution/run_zipline_quickstart.py
```

This runs the full zipline suite of commands against a test quickstart groupby.

Example:

```
davidhan@Davids-MacBook-Pro: ~/zipline/chronon (davidhan/do_fetch_test) $ python3 distribution/run_zipline_quickstart.py Created temporary directory: /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + WORKING_DIR=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + cd /var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l + GREEN='\033[0;32m' + RED='\033[0;31m' + WHEEL_FILE=zipline_ai-0.1.0.dev0-py3-none-any.whl + bq rm -f -t canary-443022:data.quickstart_purchases_v1_test + bq rm -f -t canary-443022:data.quickstart_purchases_v1_test_upload + '[' -z '' ']' + wget https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz --2025-01-30 10:16:21-- https://dlcdn.apache.org/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3.tgz Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132 Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 400879762 (382M) [application/x-gzip] Saving to: ‘spark-3.5.4-bin-hadoop3.tgz’ spark-3.5.4-bin-hadoop3.tgz 100%[==========================================================================================================================================>] 382.31M 50.2MB/s in 8.4s 2025-01-30 10:16:30 (45.5 MB/s) - ‘spark-3.5.4-bin-hadoop3.tgz’ saved [400879762/400879762] + tar -xzf spark-3.5.4-bin-hadoop3.tgz ++ pwd + export SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3 + SPARK_HOME=/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3 + git clone git@github.com:zipline-ai/cananry-confs.git Cloning into 'cananry-confs'... remote: Enumerating objects: 148, done. remote: Counting objects: 100% (148/148), done. remote: Compressing objects: 100% (77/77), done. remote: Total 148 (delta 63), reused 139 (delta 60), pack-reused 0 (from 0) Receiving objects: 100% (148/148), 93.28 KiB | 746.00 KiB/s, done. Resolving deltas: 100% (63/63), done. + cd cananry-confs + git fetch origin davidhan/canary From github.com:zipline-ai/cananry-confs * branch davidhan/canary -> FETCH_HEAD + git checkout davidhan/canary branch 'davidhan/canary' set up to track 'origin/davidhan/canary'. Switched to a new branch 'davidhan/canary' + python3 -m venv tmp_chronon + source tmp_chronon/bin/activate ++ deactivate nondestructive ++ '[' -n '' ']' ++ '[' -n '' ']' ++ hash -r ++ '[' -n '' ']' ++ unset VIRTUAL_ENV ++ unset VIRTUAL_ENV_PROMPT ++ '[' '!'
nondestructive = nondestructive ']' ++ case "$(uname)" in +++ uname ++ export VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon ++ VIRTUAL_ENV=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon ++ _OLD_VIRTUAL_PATH=/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin ++ PATH=/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/tmp_chronon/bin:/Users/davidhan/.asdf/plugins/python/shims:/Users/davidhan/.asdf/installs/python/3.13.0/bin:/Users/davidhan/Downloads/google-cloud-sdk/bin:/Users/davidhan/.cargo/bin:/Users/davidhan/.asdf/shims:/Users/davidhan/.asdf/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin ++ export PATH ++ VIRTUAL_ENV_PROMPT=tmp_chronon ++ export VIRTUAL_ENV_PROMPT ++ '[' -n '' ']' ++ '[' -z '' ']' ++ _OLD_VIRTUAL_PS1= ++ PS1='(tmp_chronon) ' ++ export PS1 ++ hash -r + gcloud storage cp gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl . Copying gs://zipline-artifacts-canary/jars/zipline_ai-0.1.0.dev0-py3-none-any.whl to file://./zipline_ai-0.1.0.dev0-py3-none-any.whl Completed files 1/1 | 371.1kiB/371.1kiB + pip uninstall zipline-ai WARNING: Skipping zipline-ai as it is not installed. 
+ pip install --force-reinstall zipline_ai-0.1.0.dev0-py3-none-any.whl Processing ./zipline_ai-0.1.0.dev0-py3-none-any.whl Collecting click (from zipline-ai==0.1.0.dev0) Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB) Collecting thrift==0.21.0 (from zipline-ai==0.1.0.dev0) Using cached thrift-0.21.0-cp313-cp313-macosx_15_0_arm64.whl Collecting google-cloud-storage==2.19.0 (from zipline-ai==0.1.0.dev0) Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl.metadata (9.1 kB) Collecting google-auth<3.0dev,>=2.26.1 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB) Collecting google-api-core<3.0.0dev,>=2.15.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_api_core-2.24.1-py3-none-any.whl.metadata (3.0 kB) Collecting google-cloud-core<3.0dev,>=2.3.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl.metadata (2.7 kB) Collecting google-resumable-media>=2.7.2 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl.metadata (2.2 kB) Collecting requests<3.0.0dev,>=2.18.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB) Collecting google-crc32c<2.0dev,>=1.0 (from google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached google_crc32c-1.6.0-py3-none-any.whl Collecting six>=1.7.2 (from thrift==0.21.0->zipline-ai==0.1.0.dev0) Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB) Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl.metadata (1.5 kB) Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes) Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached proto_plus-1.26.0-py3-none-any.whl.metadata (2.2 kB) Collecting cachetools<6.0,>=2.0.0 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB) Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB) Collecting rsa<5,>=3.1.4 (from google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB) Collecting charset-normalizer<4,>=2 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB) Collecting idna<4,>=2.5 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached idna-3.10-py3-none-any.whl.metadata (10 kB) Collecting urllib3<3,>=1.21.1 (from requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB) Collecting certifi>=2017.4.17 (from
requests<3.0.0dev,>=2.18.0->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB) Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage==2.19.0->zipline-ai==0.1.0.dev0) Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB) Using cached google_cloud_storage-2.19.0-py2.py3-none-any.whl (131 kB) Using cached click-8.1.8-py3-none-any.whl (98 kB) Using cached google_api_core-2.24.1-py3-none-any.whl (160 kB) Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB) Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB) Using cached google_resumable_media-2.7.2-py2.py3-none-any.whl (81 kB) Using cached requests-2.32.3-py3-none-any.whl (64 kB) Using cached six-1.17.0-py2.py3-none-any.whl (11 kB) Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB) Using cached certifi-2024.12.14-py3-none-any.whl (164 kB) Using cached charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB) Using cached googleapis_common_protos-1.66.0-py2.py3-none-any.whl (221 kB) Using cached idna-3.10-py3-none-any.whl (70 kB) Using cached proto_plus-1.26.0-py3-none-any.whl (50 kB) Using cached protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl (417 kB) Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB) Using cached rsa-4.9-py3-none-any.whl (34 kB) Using cached urllib3-2.3.0-py3-none-any.whl (128 kB) Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB) Installing collected packages: urllib3, six, pyasn1, protobuf, idna, google-crc32c, click, charset-normalizer, certifi, cachetools, thrift, rsa, requests, pyasn1-modules, proto-plus, googleapis-common-protos, google-resumable-media, google-auth, google-api-core, google-cloud-core, google-cloud-storage, zipline-ai Successfully installed cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 google-api-core-2.24.1 google-auth-2.38.0 google-cloud-core-2.4.1 google-cloud-storage-2.19.0 google-crc32c-1.6.0 google-resumable-media-2.7.2 googleapis-common-protos-1.66.0 idna-3.10 proto-plus-1.26.0 protobuf-5.29.3 pyasn1-0.6.1 pyasn1-modules-0.4.1 requests-2.32.3 rsa-4.9 six-1.17.0 thrift-0.21.0 urllib3-2.3.0 zipline-ai-0.1.0.dev0 [notice] A new release of pip is available: 24.2 -> 25.0 [notice] To update, run: pip install --upgrade pip ++ pwd + export PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs + PYTHONPATH=:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs + DATAPROC_SUBMITTER_ID_STR='Dataproc submitter job id' + echo -e '\033[0;32m<<<<<.....................................COMPILE.....................................>>>>>\033[0m' <<<<<.....................................COMPILE.....................................>>>>> + zipline compile --conf=group_bys/quickstart/purchases.py Using chronon root path - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs Input group_bys from - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/group_bys/quickstart/purchases.py GroupBy Team - quickstart GroupBy Name - purchases.v1 Writing GroupBy to - /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1 GroupBy Team - quickstart GroupBy Name - purchases.v1_test Writing GroupBy to -
/private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production/group_bys/quickstart/purchases.v1_test Successfully wrote 2 GroupBy objects to /private/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/cananry-confs/production + echo -e '\033[0;32m<<<<<.....................................BACKFILL.....................................>>>>>\033[0m' <<<<<.....................................BACKFILL.....................................>>>>> + touch tmp_backfill.out + zipline run --conf production/group_bys/quickstart/purchases.v1_test --dataproc + tee /dev/tty tmp_backfill.out Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Running with args: {'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'mode': None, 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None} Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)Array(group-by-backfill, --conf-path=purchases.v1_test, --end-date=2025-01-30, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance) WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87bDataproc submitter job id: 945d836f-20d8-4768-97fb-0889c00ed87b Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary. Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar Setting env variables: From <common_env> setting VERSION=latest From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit From <common_env> setting JOB_MODE=local[*] From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT> From <common_env> setting PARTITION_COLUMN=ds From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd From <common_env> setting CUSTOMER_ID=canary From <common_env> setting GCP_PROJECT_ID=canary-443022 From <common_env> setting GCP_REGION=us-central1 From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance From <cli_args> setting APP_NAME=chronon From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-backfill --conf-path=purchases.v1_test --end-date=2025-01-30 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar ++ cat tmp_backfill.out ++ grep 'Dataproc submitter job id' ++ cut -d ' ' -f5 + BACKFILL_JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b + check_dataproc_job_state 945d836f-20d8-4768-97fb-0889c00ed87b + JOB_ID=945d836f-20d8-4768-97fb-0889c00ed87b + '[' -z 945d836f-20d8-4768-97fb-0889c00ed87b ']' + gcloud dataproc jobs wait 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 Waiting for job output... 25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:47 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. Using warehouse dir: /tmp/945d836f-20d8-4768-97fb-0889c00ed87b/local_warehouse 25/01/30 18:16:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml 25/01/30 18:16:50 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:50 INFO SparkEnv: Registering MapOutputTracker 25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMaster 25/01/30 18:16:50 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 25/01/30 18:16:50 INFO SparkEnv: Registering OutputCommitCoordinator 25/01/30 18:16:51 INFO DataprocSparkPlugin: Registered 188 driver metrics 25/01/30 18:16:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032 25/01/30 18:16:51 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200 25/01/30 18:16:51 INFO Configuration: resource-types.xml not found 25/01/30 18:16:51 INFO ResourceUtils: Unable to find 'resource-types.xml'. 25/01/30 18:16:52 INFO YarnClientImpl: Submitted application application_1738197659103_0011 25/01/30 18:16:53 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead. 25/01/30 18:16:53 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030 25/01/30 18:16:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:16:55 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0011.inprogress [CONTEXT ratelimit_period="1 MINUTES" ] 2025/01/30 18:16:55 INFO SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration 2025/01/30 18:16:58 ERROR TableUtils.scala:188 - Table canary-443022.data.quickstart_purchases_v1_test is not reachable. Returning empty partitions. 2025/01/30 18:17:15 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:17:15 INFO TableUtils.scala:622 - Unfilled range computation: Output table: canary-443022.data.quickstart_purchases_v1_test Missing output partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30,2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2
024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30] Input tables: data.purchases Missing input partitions: 
[2023-12-01,2023-12-02,2023-12-03,2023-12-04,2023-12-05,2023-12-06,2023-12-07,2023-12-08,2023-12-09,2023-12-10,2023-12-11,2023-12-12,2023-12-13,2023-12-14,2023-12-15,2023-12-16,2023-12-17,2023-12-18,2023-12-19,2023-12-20,2023-12-21,2023-12-22,2023-12-23,2023-12-24,2023-12-25,2023-12-26,2023-12-27,2023-12-28,2023-12-29,2023-12-30,2023-12-31,2024-01-01,2024-01-02,2024-01-03,2024-01-04,2024-01-05,2024-01-06,2024-01-07,2024-01-08,2024-01-09,2024-01-10,2024-01-11,2024-01-12,2024-01-13,2024-01-14,2024-01-15,2024-01-16,2024-01-17,2024-01-18,2024-01-19,2024-01-20,2024-01-21,2024-01-22,2024-01-23,2024-01-24,2024-01-25,2024-01-26,2024-01-27,2024-01-28,2024-01-29,2024-01-30,2024-01-31,2024-02-01,2024-02-02,2024-02-03,2024-02-04,2024-02-05,2024-02-06,2024-02-07,2024-02-08,2024-02-09,2024-02-10,2024-02-11,2024-02-12,2024-02-13,2024-02-14,2024-02-15,2024-02-16,2024-02-17,2024-02-18,2024-02-19,2024-02-20,2024-02-21,2024-02-22,2024-02-23,2024-02-24,2024-02-25,2024-02-26,2024-02-27,2024-02-28,2024-02-29,2024-03-01,2024-03-02,2024-03-03,2024-03-04,2024-03-05,2024-03-06,2024-03-07,2024-03-08,2024-03-09,2024-03-10,2024-03-11,2024-03-12,2024-03-13,2024-03-14,2024-03-15,2024-03-16,2024-03-17,2024-03-18,2024-03-19,2024-03-20,2024-03-21,2024-03-22,2024-03-23,2024-03-24,2024-03-25,2024-03-26,2024-03-27,2024-03-28,2024-03-29,2024-03-30,2024-03-31,2024-04-01,2024-04-02,2024-04-03,2024-04-04,2024-04-05,2024-04-06,2024-04-07,2024-04-08,2024-04-09,2024-04-10,2024-04-11,2024-04-12,2024-04-13,2024-04-14,2024-04-15,2024-04-16,2024-04-17,2024-04-18,2024-04-19,2024-04-20,2024-04-21,2024-04-22,2024-04-23,2024-04-24,2024-04-25,2024-04-26,2024-04-27,2024-04-28,2024-04-29,2024-04-30,2024-05-01,2024-05-02,2024-05-03,2024-05-04,2024-05-05,2024-05-06,2024-05-07,2024-05-08,2024-05-09,2024-05-10,2024-05-11,2024-05-12,2024-05-13,2024-05-14,2024-05-15,2024-05-16,2024-05-17,2024-05-18,2024-05-19,2024-05-20,2024-05-21,2024-05-22,2024-05-23,2024-05-24,2024-05-25,2024-05-26,2024-05-27,2024-05-28,2024-05-29,2024-05-30,2024-05-31,2024-06-01,2024-06-02,2024-06-03,2024-06-04,2024-06-05,2024-06-06,2024-06-07,2024-06-08,2024-06-09,2024-06-10,2024-06-11,2024-06-12,2024-06-13,2024-06-14,2024-06-15,2024-06-16,2024-06-17,2024-06-18,2024-06-19,2024-06-20,2024-06-21,2024-06-22,2024-06-23,2024-06-24,2024-06-25,2024-06-26,2024-06-27,2024-06-28,2024-06-29,2024-06-30,2024-07-01,2024-07-02,2024-07-03,2024-07-04,2024-07-05,2024-07-06,2024-07-07,2024-07-08,2024-07-09,2024-07-10,2024-07-11,2024-07-12,2024-07-13,2024-07-14,2024-07-15,2024-07-16,2024-07-17,2024-07-18,2024-07-19,2024-07-20,2024-07-21,2024-07-22,2024-07-23,2024-07-24,2024-07-25,2024-07-26,2024-07-27,2024-07-28,2024-07-29,2024-07-30,2024-07-31,2024-08-01,2024-08-02,2024-08-03,2024-08-04,2024-08-05,2024-08-06,2024-08-07,2024-08-08,2024-08-09,2024-08-10,2024-08-11,2024-08-12,2024-08-13,2024-08-14,2024-08-15,2024-08-16,2024-08-17,2024-08-18,2024-08-19,2024-08-20,2024-08-21,2024-08-22,2024-08-23,2024-08-24,2024-08-25,2024-08-26,2024-08-27,2024-08-28,2024-08-29,2024-08-30,2024-08-31,2024-09-01,2024-09-02,2024-09-03,2024-09-04,2024-09-05,2024-09-06,2024-09-07,2024-09-08,2024-09-09,2024-09-10,2024-09-11,2024-09-12,2024-09-13,2024-09-14,2024-09-15,2024-09-16,2024-09-17,2024-09-18,2024-09-19,2024-09-20,2024-09-21,2024-09-22,2024-09-23,2024-09-24,2024-09-25,2024-09-26,2024-09-27,2024-09-28,2024-09-29,2024-09-30,2024-10-01,2024-10-02,2024-10-03,2024-10-04,2024-10-05,2024-10-06,2024-10-07,2024-10-08,2024-10-09,2024-10-10,2024-10-11,2024-10-12,2024-10-13,2024-10-14,2024-10-15,2024-10-16,2024-10-17,2024-10-18,2
024-10-19,2024-10-20,2024-10-21,2024-10-22,2024-10-23,2024-10-24,2024-10-25,2024-10-26,2024-10-27,2024-10-28,2024-10-29,2024-10-30,2024-10-31,2024-11-01,2024-11-02,2024-11-03,2024-11-04,2024-11-05,2024-11-06,2024-11-07,2024-11-08,2024-11-09,2024-11-10,2024-11-11,2024-11-12,2024-11-13,2024-11-14,2024-11-15,2024-11-16,2024-11-17,2024-11-18,2024-11-19,2024-11-20,2024-11-21,2024-11-22,2024-11-23,2024-11-24,2024-11-25,2024-11-26,2024-11-27,2024-11-28,2024-11-29,2024-11-30,2024-12-01,2024-12-02,2024-12-03,2024-12-04,2024-12-05,2024-12-06,2024-12-07,2024-12-08,2024-12-09,2024-12-10,2024-12-11,2024-12-12,2024-12-13,2024-12-14,2024-12-15,2024-12-16,2024-12-17,2024-12-18,2024-12-19,2024-12-20,2024-12-21,2024-12-22,2024-12-23,2024-12-24,2024-12-25,2024-12-26,2024-12-27,2024-12-28,2024-12-29,2024-12-30,2024-12-31,2025-01-01,2025-01-02,2025-01-03,2025-01-04,2025-01-05,2025-01-06,2025-01-07,2025-01-08,2025-01-09,2025-01-10,2025-01-11,2025-01-12,2025-01-13,2025-01-14,2025-01-15,2025-01-16,2025-01-17,2025-01-18,2025-01-19,2025-01-20,2025-01-21,2025-01-22,2025-01-23,2025-01-24,2025-01-25,2025-01-26,2025-01-27,2025-01-28,2025-01-29,2025-01-30] Unfilled Partitions: [2023-11-01,2023-11-02,2023-11-03,2023-11-04,2023-11-05,2023-11-06,2023-11-07,2023-11-08,2023-11-09,2023-11-10,2023-11-11,2023-11-12,2023-11-13,2023-11-14,2023-11-15,2023-11-16,2023-11-17,2023-11-18,2023-11-19,2023-11-20,2023-11-21,2023-11-22,2023-11-23,2023-11-24,2023-11-25,2023-11-26,2023-11-27,2023-11-28,2023-11-29,2023-11-30] Unfilled ranges: [2023-11-01...2023-11-30] 2025/01/30 18:17:15 INFO GroupBy.scala:733 - group by unfilled ranges: List([2023-11-01...2023-11-30]) 2025/01/30 18:17:15 INFO GroupBy.scala:738 - Group By ranges to compute: [2023-11-01...2023-11-30] 2025/01/30 18:17:15 INFO GroupBy.scala:743 - Computing group by for range: [2023-11-01...2023-11-30] [1/1] 2025/01/30 18:17:15 INFO GroupBy.scala:492 - ----[Processing GroupBy: quiour clientsstart.purchases.v1_test]---- 2025/01/30 18:17:20 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases 2025/01/30 18:17:20 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-11-01...2023-11-30] query window: None source table: data.purchases source data range: [2023-11-01...2023-11-30] source start/end: null/null source data model: Events queryable data range: [null...2023-11-30] intersected range: [2023-11-01...2023-11-30] 2025/01/30 18:17:20 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts)) 2025/01/30 18:17:20 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-11-30]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-11-30') metaColumns: Map(ds -> null, ts -> ts) 2025/01/30 18:17:20 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-11-30' 2025/01/30 18:17:20 INFO HopsAggregator.scala:147 - Left bounds: 1d->unbounded minQueryTs = 2023-11-01 00:00:00 2025/01/30 18:17:20 INFO FastHashing.scala:52 - Generating key builder over keys: bigint : user_id 2025/01/30 18:17:22 INFO TableUtils.scala:459 - Repartitioning before writing... 
2025/01/30 18:17:25 INFO TableUtils.scala:494 - 2416 rows requested to be written into table canary-443022.data.quiour clientsstart_purchases_v1_test 2025/01/30 18:17:25 INFO TableUtils.scala:531 - repartitioning data for table canary-443022.data.quiour clientsstart_purchases_v1_test by 300 spark tasks into 30 table partitions and 10 files per partition 2025/01/30 18:17:25 INFO TableUtils.scala:536 - Sorting within partitions with cols: List(ds) 2025/01/30 18:17:33 INFO TableUtils.scala:469 - Finished writing to canary-443022.data.quiour clientsstart_purchases_v1_test 2025/01/30 18:17:33 INFO TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quiour clientsstart_purchases_v1_test - start @ 2025-01-30 18:17:22 end @ 2025-01-30 18:17:33 2025/01/30 18:17:33 INFO GroupBy.scala:757 - Wrote to table canary-443022.data.quiour clientsstart_purchases_v1_test, into partitions: [2023-11-01...2023-11-30] 2025/01/30 18:17:33 INFO GroupBy.scala:759 - Wrote to table canary-443022.data.quiour clientsstart_purchases_v1_test for range: [2023-11-01...2023-11-30] Job [945d836f-20d8-4768-97fb-0889c00ed87b] finished successfully. done: true driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/ driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/945d836f-20d8-4768-97fb-0889c00ed87b/driveroutput jobUuid: 945d836f-20d8-4768-97fb-0889c00ed87b placement: clusterName: zipline-canary-cluster clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0 reference: jobId: 945d836f-20d8-4768-97fb-0889c00ed87b projectId: canary-443022 sparkJob: args: - group-by-baour clientsfill - --conf-path=purchases.v1_test - --end-date=2025-01-30 - --conf-type=group_bys - --additional-conf-path=additional-confs.yaml - --is-gcp - --gcp-project-id=canary-443022 - --gcp-bigtable-instance-id=zipline-canary-instance fileUris: - gs://zipline-warehouse-canary/metadata/purchases.v1_test - gs://zipline-artifacts-canary/confs/additional-confs.yaml jarFileUris: - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar mainClass: ai.chronon.spark.Driver status: state: DONE stateStartTime: '2025-01-30T18:17:38.722934Z' statusHistory: - state: PENDING stateStartTime: '2025-01-30T18:16:43.326557Z' - state: SETUP_DONE stateStartTime: '2025-01-30T18:16:43.353624Z' - details: Agent reported job success state: RUNNING stateStartTime: '2025-01-30T18:16:43.597231Z' yarnApplications: - name: groupBy_quiour clientsstart.purchases.v1_test_baour clientsfill progress: 1.0 state: FINISHED traour clientsingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0011/ + echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m' <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>> ++ gcloud dataproc jobs describe 945d836f-20d8-4768-97fb-0889c00ed87b --region=us-central1 --format=flattened ++ grep status.state: + JOB_STATE='status.state: DONE' + echo status.state: DONE status.state: DONE + '[' -z 'status.state: DONE' ']' + echo -e '\033[0;32m<<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>>\033[0m' <<<<<.....................................GROUP-BY-UPLOAD.....................................>>>>> + 
+ touch tmp_gbu.out
+ zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc
+ tee /dev/tty tmp_gbu.out
Running with args: {'mode': 'upload', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'ds': '2023-12-01', 'dataproc': True, 'env': 'dev', 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(group-by-upload, --conf-path=purchases.v1_test, --end-date=2023-12-01, --conf-type=group_bys, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c672008e-7380-4a82-a121-4bb0cb46503f
Setting env variables:
From <default_env> setting EXECUTOR_CORES=1
From <default_env> setting EXECUTOR_MEMORY=8G
From <default_env> setting PARALLELISM=1000
From <default_env> setting MAX_EXECUTORS=1000
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter group-by-upload --conf-path=purchases.v1_test --end-date=2023-12-01 --conf-type=group_bys --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_gbu.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ GBU_JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ check_dataproc_job_state c672008e-7380-4a82-a121-4bb0cb46503f
+ JOB_ID=c672008e-7380-4a82-a121-4bb0cb46503f
+ '[' -z c672008e-7380-4a82-a121-4bb0cb46503f ']'
+ gcloud dataproc jobs wait c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1
Waiting for job output...
25/01/30 18:17:48 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
Using warehouse dir: /tmp/c672008e-7380-4a82-a121-4bb0cb46503f/local_warehouse
25/01/30 18:17:50 INFO HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
25/01/30 18:17:51 INFO SparkEnv: Registering MapOutputTracker
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMaster
25/01/30 18:17:51 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/01/30 18:17:51 INFO SparkEnv: Registering OutputCommitCoordinator
25/01/30 18:17:51 INFO DataprocSparkPlugin: Registered 188 driver metrics
25/01/30 18:17:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8032
25/01/30 18:17:52 INFO AHSProxy: Connecting to Application History server at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:10200
25/01/30 18:17:52 INFO Configuration: resource-types.xml not found
25/01/30 18:17:52 INFO ResourceUtils: Unable to find 'resource-types.xml'.
25/01/30 18:17:53 INFO YarnClientImpl: Submitted application application_1738197659103_0012
25/01/30 18:17:54 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal./10.128.0.17:8030
25/01/30 18:17:55 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
25/01/30 18:17:56 INFO GoogleHadoopOutputStream: hflush(): No-op due to rate limit (RateLimiter[stableRate=0.2qps]): readers will *not* yet see flushed data for gs://dataproc-temp-us-central1-703996152583-pqtvfptb/5d9e94ed-7649-4828-8b64-e3d58632a5d0/spark-job-history/application_1738197659103_0012.inprogress [CONTEXT ratelimit_period="1 MINUTES" ]
2025/01/30 18:17:56 INFO SparkSessionBuilder.scala:76 - Chronon logging system initialized. Overrides spark's configuration
2025/01/30 18:17:57 INFO GroupByUpload.scala:229 - GroupBy upload for: quickstart.quickstart.purchases.v1_test Accuracy: SNAPSHOT Data Model: Events
2025/01/30 18:17:57 INFO GroupBy.scala:492 - ----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:14 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:14 INFO GroupBy.scala:618 - Computing intersected range as: query range: [2023-12-01...2023-12-01] query window: None source table: data.purchases source data range: [2023-11-01...2023-12-01] source start/end: null/null source data model: Events queryable data range: [null...2023-12-01] intersected range: [2023-11-01...2023-12-01]
2025/01/30 18:18:14 INFO GroupBy.scala:658 - Time Mapping: Some((ts,ts))
2025/01/30 18:18:14 INFO GroupBy.scala:668 - Rendering source query: intersected/effective scan range: Some([2023-11-01...2023-12-01]) partitionConditions: List(ds >= '2023-11-01', ds <= '2023-12-01') metaColumns: Map(ds -> null, ts -> ts)
2025/01/30 18:18:14 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01'
2025/01/30 18:18:14 INFO HopsAggregator.scala:147 - Left bounds: 1d->unbounded minQueryTs = 2023-12-01 00:00:00
2025/01/30 18:18:14 INFO FastHashing.scala:52 - Generating key builder over keys: bigint : user_id
2025/01/30 18:18:15 INFO KvRdd.scala:102 - key schema: {"type":"record","name":"Key","namespace":"ai.chronon.data","fields":[{"name":"user_id","type":["null","long"]}]} value schema: {"type":"record","name":"Value","namespace":"ai.chronon.data","fields":[{"name":"purchase_price_sum_3d","type":["null","long"]},{"name":"purchase_price_sum_14d","type":["null","long"]},{"name":"purchase_price_sum_30d","type":["null","long"]},{"name":"purchase_price_count_3d","type":["null","long"]},{"name":"purchase_price_count_14d","type":["null","long"]},{"name":"purchase_price_count_30d","type":["null","long"]},{"name":"purchase_price_average_3d","type":["null","double"]},{"name":"purchase_price_average_14d","type":["null","double"]},{"name":"purchase_price_average_30d","type":["null","double"]},{"name":"purchase_price_last10","type":["null",{"type":"array","items":"long"}]}]}
2025/01/30 18:18:15 INFO GroupBy.scala:492 - ----[Processing GroupBy: quickstart.purchases.v1_test]----
2025/01/30 18:18:19 INFO TableUtils.scala:200 - Found 30, between (2023-11-01, 2023-11-30) partitions for table: data.purchases
2025/01/30 18:18:20 INFO TableUtils.scala:759 - Scanning data: table: data.purchases options: Map() format: Some(bigquery) selects: `ds` `ts` `user_id` `purchase_price` wheres: partition filters: ds >= '2023-11-01', ds <= '2023-12-01'
2025/01/30 18:18:20 INFO GroupByUpload.scala:175 - Not setting InputAvroSchema to GroupByServingInfo as there is no streaming source defined.
2025/01/30 18:18:20 INFO GroupByUpload.scala:188 - Built GroupByServingInfo for quickstart.purchases.v1_test: table: data.purchases / data-model: Events keySchema: Success(struct<user_id:bigint>) valueSchema: Success(struct<purchase_price:bigint>) mutationSchema: Failure(java.lang.NullPointerException) inputSchema: Failure(java.lang.NullPointerException) selectedSchema: Success(struct<purchase_price:bigint>) streamSchema: Failure(java.lang.NullPointerException)
2025/01/30 18:18:20 INFO TableUtils.scala:459 - Repartitioning before writing...
2025/01/30 18:18:24 INFO TableUtils.scala:494 - 102 rows requested to be written into table canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:24 INFO TableUtils.scala:531 - repartitioning data for table canary-443022.data.quickstart_purchases_v1_test_upload by 200 spark tasks into 1 table partitions and 10 files per partition
2025/01/30 18:18:24 INFO TableUtils.scala:536 - Sorting within partitions with cols: List(ds)
2025/01/30 18:18:30 INFO TableUtils.scala:469 - Finished writing to canary-443022.data.quickstart_purchases_v1_test_upload
2025/01/30 18:18:30 INFO TableUtils.scala:440 - Cleared the dataframe cache after repartition & write to canary-443022.data.quickstart_purchases_v1_test_upload - start @ 2025-01-30 18:18:20 end @ 2025-01-30 18:18:30
Job [c672008e-7380-4a82-a121-4bb0cb46503f] finished successfully.
done: true
driverControlFilesUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/
driverOutputResourceUri: gs://dataproc-staging-us-central1-703996152583-lxespibx/google-cloud-dataproc-metainfo/5d9e94ed-7649-4828-8b64-e3d58632a5d0/jobs/c672008e-7380-4a82-a121-4bb0cb46503f/driveroutput
jobUuid: c672008e-7380-4a82-a121-4bb0cb46503f
placement:
  clusterName: zipline-canary-cluster
  clusterUuid: 5d9e94ed-7649-4828-8b64-e3d58632a5d0
reference:
  jobId: c672008e-7380-4a82-a121-4bb0cb46503f
  projectId: canary-443022
sparkJob:
  args:
  - group-by-upload
  - --conf-path=purchases.v1_test
  - --end-date=2023-12-01
  - --conf-type=group_bys
  - --additional-conf-path=additional-confs.yaml
  - --is-gcp
  - --gcp-project-id=canary-443022
  - --gcp-bigtable-instance-id=zipline-canary-instance
  fileUris:
  - gs://zipline-warehouse-canary/metadata/purchases.v1_test
  - gs://zipline-artifacts-canary/confs/additional-confs.yaml
  jarFileUris:
  - gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
  mainClass: ai.chronon.spark.Driver
status:
  state: DONE
  stateStartTime: '2025-01-30T18:18:33.742458Z'
statusHistory:
- state: PENDING
  stateStartTime: '2025-01-30T18:17:44.197477Z'
- state: SETUP_DONE
  stateStartTime: '2025-01-30T18:17:44.223246Z'
- details: Agent reported job success
  state: RUNNING
  stateStartTime: '2025-01-30T18:17:44.438240Z'
yarnApplications:
- name: group-by-upload
  progress: 1.0
  state: FINISHED
  trackingUrl: http://zipline-canary-cluster-m.us-central1-c.c.canary-443022.internal.:8088/proxy/application_1738197659103_0012/
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
<<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe c672008e-7380-4a82-a121-4bb0cb46503f --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state: DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state: DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>\033[0m'
<<<<<.....................................UPLOAD-TO-KV.....................................>>>>>
+ touch tmp_upload_to_kv.out
+ zipline run --mode upload-to-kv --conf production/group_bys/quickstart/purchases.v1_test --partition-string=2023-12-01 --dataproc
+ tee /dev/tty tmp_upload_to_kv.out
Running with args: {'mode': 'upload-to-kv', 'conf': 'production/group_bys/quickstart/purchases.v1_test', 'dataproc': True, 'env': 'dev', 'ds': None, 'app_name': None, 'start_ds': None, 'end_ds': None, 'parallelism': None, 'repo': '.', 'online_jar': 'cloud_gcp-assembly-0.1.0-SNAPSHOT.jar', 'online_class': 'ai.chronon.integrations.cloud_gcp.GcpApiImpl', 'version': None, 'spark_version': '2.4.0', 'spark_submit_path': None, 'spark_streaming_submit_path': None, 'online_jar_fetch': None, 'sub_help': False, 'conf_type': None, 'online_args': None, 'chronon_jar': None, 'release_tag': None, 'list_apps': None, 'render_info': None}
Array(groupby-upload-bulk-load, --conf-path=purchases.v1_test, --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar, --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl, --conf-type=group_bys, --partition-string=2023-12-01, --additional-conf-path=additional-confs.yaml, --is-gcp, --gcp-project-id=canary-443022, --gcp-bigtable-instance-id=zipline-canary-instance)
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Dataproc submitter job id: c29097e9-b845-4ad7-843a-c89b622c5cfe
Setting env variables:
From <common_env> setting VERSION=latest
From <common_env> setting SPARK_SUBMIT_PATH=[TODO]/path/to/spark-submit
From <common_env> setting JOB_MODE=local[*]
From <common_env> setting HADOOP_DIR=[STREAMING-TODO]/path/to/folder/containing
From <common_env> setting CHRONON_ONLINE_CLASS=[ONLINE-TODO]your.online.class
From <common_env> setting CHRONON_ONLINE_ARGS=[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>
From <common_env> setting PARTITION_COLUMN=ds
From <common_env> setting PARTITION_FORMAT=yyyy-MM-dd
From <common_env> setting CUSTOMER_ID=canary
From <common_env> setting GCP_PROJECT_ID=canary-443022
From <common_env> setting GCP_REGION=us-central1
From <common_env> setting GCP_DATAPROC_CLUSTER_NAME=zipline-canary-cluster
From <common_env> setting GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance
From <cli_args> setting APP_NAME=chronon_group_bys_upload-to-kv_dev_quickstart.purchases.v1_test
From <cli_args> setting CHRONON_CONF_PATH=./production/group_bys/quickstart/purchases.v1_test
From <cli_args> setting CHRONON_ONLINE_JAR=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
File production/group_bys/quickstart/purchases.v1_test uploaded to metadata/purchases.v1_test in bucket zipline-warehouse-canary.
Running command: java -cp /Users/davidhan/zipline/chronon/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar:/var/folders/2p/h5v8s0515xv20cgprdjngttr0000gn/T/tmpkirssr9l/spark-3.5.4-bin-hadoop3/jars/* ai.chronon.integrations.cloud_gcp.DataprocSubmitter groupby-upload-bulk-load --conf-path=purchases.v1_test --online-jar=cloud_gcp-assembly-0.1.0-SNAPSHOT.jar --online-class=ai.chronon.integrations.cloud_gcp.GcpApiImpl --conf-type=group_bys --partition-string=2023-12-01 --additional-conf-path=additional-confs.yaml --gcs_files=gs://zipline-warehouse-canary/metadata/purchases.v1_test,gs://zipline-artifacts-canary/confs/additional-confs.yaml --chronon_jar_uri=gs://zipline-artifacts-canary/jars/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar
++ cat tmp_upload_to_kv.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ UPLOAD_TO_KV_JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ check_dataproc_job_state c29097e9-b845-4ad7-843a-c89b622c5cfe
+ JOB_ID=c29097e9-b845-4ad7-843a-c89b622c5cfe
+ '[' -z c29097e9-b845-4ad7-843a-c89b622c5cfe ']'
+ gcloud dataproc jobs wait c29097e9-b845-4ad7-843a-c89b622c5cfe --region=us-central1
Waiting for job output...
25/01/30 18:18:42 WARN SparkConf: The configuration key 'spark.yarn.executor.failuresValidityInterval' has been deprecated as of Spark 3.5 and may be removed in the future. Please use the new key 'spark.executor.failuresValidityInterval' instead.
25/01/30 18:18:45 INFO Driver$GroupByUploadToKVBulkLoad$: Triggering bulk load for GroupBy: quickstart.purchases.v1_test for partition: 2023-12-01 from table: canary-443022.data.quickstart_purchases_v1_test_upload
25/01/30 18:18:47 INFO BigTableKVStoreImpl: Kicking off bulkLoad with query:
EXPORT DATA OPTIONS (
  format='CLOUD_BIGTABLE',
  overwrite=true,
  uri="https://bigtable.googleapis.com/projects/canary-443022/instances/zipline-canary-instance/appProfiles/GROUPBY_INGEST/tables/GROUPBY_BATCH",
  bigtable_options='''{
    "columnFamilies" : [ { "familyId": "cf", "encoding": "BINARY", "columns": [ …
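The trace repeats the same pattern for every step: submit with zipline run ... --dataproc, tee the output into a temp file, grep the Dataproc job id out of it, then wait on the job and read back its final state. Below is a minimal bash sketch of that pattern; the helper and variable names follow what is visible in the trace, but the real run_zipline_quickstart.sh may handle failures and output parsing differently.

#!/usr/bin/env bash
# Sketch only: mirrors the submit-and-verify pattern seen in the trace above.
set -euxo pipefail

GREEN='\033[0;32m'
NC='\033[0m'

check_dataproc_job_state() {
  JOB_ID=$1
  if [ -z "$JOB_ID" ]; then
    echo "No Dataproc job id found in the submitter output" >&2
    exit 1
  fi
  # Block until the job finishes; gcloud exits non-zero for a failed job,
  # which aborts the whole suite because of `set -e`.
  gcloud dataproc jobs wait "$JOB_ID" --region=us-central1
  echo -e "${GREEN}<<<<<----- JOB STATUS ----->>>>>${NC}"
  JOB_STATE=$(gcloud dataproc jobs describe "$JOB_ID" --region=us-central1 --format=flattened | grep "status.state:")
  echo "$JOB_STATE"
  if [ -z "$JOB_STATE" ]; then
    exit 1
  fi
}

# One step of the suite: submit the group-by upload and verify the Dataproc job.
# The job id is the fifth whitespace-separated field of the
# "Dataproc submitter job id: <id>" line printed by the submitter.
touch tmp_gbu.out
zipline run --mode upload --conf production/group_bys/quickstart/purchases.v1_test --ds 2023-12-01 --dataproc | tee /dev/tty tmp_gbu.out
GBU_JOB_ID=$(grep 'Dataproc submitter job id' tmp_gbu.out | head -n 1 | cut -d ' ' -f5)
check_dataproc_job_state "$GBU_JOB_ID"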
Summary
This runs the full zipline suite of commands, including group-by backfill, group-by upload, and upload-to-KV, against a test quickstart GroupBy, checking the resulting Dataproc job status after each step.
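A minimal invocation sketch, assuming the entry points take no arguments (paths match the files added in this PR; any flags would be assumptions):

# Hypothetical invocation of the quickstart suite wrapper; run from the repo root.
python3 distribution/run_zipline_quickstart.py

# Or run the shell script directly.
bash distribution/run_zipline_quickstart.sh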
Example: the full run output from the suite is shown above.
Checklist
Summary by CodeRabbit
New Features
Chores