Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion .tool-versions
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,6 @@ java corretto-11.0.25.9.1
scala 2.12.20
asdf-plugin-manager 1.4.0
sbt 1.8.2
python 3.7.17
python
3.7.17
3.11.0
Comment on lines +5 to +7
Copy link
Contributor

@coderabbitai coderabbitai bot Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix .tool-versions format.

The format should be "tool version" per line. Multiple versions should be space-separated.

-python
-    3.7.17
-    3.11.0
+python 3.7.17 3.11.0
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
python
3.7.17
3.11.0
python 3.7.17 3.11.0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed this change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chewy-zlai Understood, I'll remember this format preference for future reviews.


✏️ Learnings added
Learnt from: chewy-zlai
PR: zipline-ai/chronon#203
File: .tool-versions:5-7
Timestamp: 2025-01-14T18:38:18.334Z
Learning: The `.tool-versions` file in this repository intentionally uses indented format for multiple Python versions, rather than space-separated format on a single line.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, I needed it to be space-separated on a single line

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chewy-zlai I apologize for the misunderstanding! You're absolutely right - the space-separated single line format is indeed the correct way. My original suggestion to use python 3.7.17 3.11.0 stands correct.

⊙﹏⊙
(embarrassed rabbit)


✏️ Learnings added
Learnt from: chewy-zlai
PR: zipline-ai/chronon#203
File: .tool-versions:5-7
Timestamp: 2025-01-14T19:19:17.302Z
Learning: The `.tool-versions` file should use space-separated format for multiple versions on a single line (e.g., `python 3.7.17 3.11.0`).

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

83 changes: 83 additions & 0 deletions distribution/build_and_upload_gcp_artifacts.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
#!/bin/bash

SCRIPT_DIRECTORY=$(dirname -- "$(realpath -- "$0")")
CHRONON_ROOT_DIR=$(dirname "$SCRIPT_DIRECTORY")

echo "Working in $CHRONON_ROOT_DIR"
cd $CHRONON_ROOT_DIR

echo "Building wheel"
#Check python version >= 3.9
MAJOR_PYTHON_VERSION=$(python --version | cut -d " " -f2 | cut -d "." -f 1)
MINOR_PYTHON_VERSION=$(python --version | cut -d " " -f2 | cut -d "." -f 2)

EXPECTED_MINIMUM_MAJOR_PYTHON_VERSION=3
EXPECTED_MINIMUM_MINOR_PYTHON_VERSION=9

if [[ $EXPECTED_MINIMUM_MAJOR_PYTHON_VERSION -gt $MAJOR_PYTHON_VERSION ]] ; then
echo "Failed major version of $MAJOR_PYTHON_VERSION. Expecting python version of at least $EXPECTED_MINIMUM_MAJOR_PYTHON_VERSION.$EXPECTED_MINIMUM_MINOR_PYTHON_VERSION to build wheel. Your version is $(python --version)"
exit 1
fi

if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix Python version comparison.

Missing $ in variable references.

-if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
+if [[ $EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt $MINOR_PYTHON_VERSION ]] ; then
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
if [[ $EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt $MINOR_PYTHON_VERSION ]] ; then

echo "Failed minor version of $MINOR_PYTHON_VERSION. Expecting python version of at least $EXPECTED_MINIMUM_MAJOR_PYTHON_VERSION.$EXPECTED_MINIMUM_MINOR_PYTHON_VERSION to build wheel. Your version is $(python --version)"
exit 1
fi

thrift --gen py -out api/py/ai/chronon api/thrift/common.thrift
thrift --gen py -out api/py/ai/chronon api/thrift/api.thrift
VERSION=$(cat version.sbt | cut -d " " -f3 | tr -d '"') pip wheel api/py
EXPECTED_ZIPLINE_WHEEL="zipline_ai-0.1.0.dev0-py3-none-any.whl"
if [ ! -f "$EXPECTED_ZIPLINE_WHEEL" ]; then
echo "$EXPECTED_ZIPLINE_WHEEL not found"
exit 1
fi

echo "Building jars"
sbt cloud_gcp/assembly
sbt cloud_gcp_submitter/assembly

Comment on lines +36 to +39

This comment was marked as resolved.

CLOUD_GCP_JAR="$CHRONON_ROOT_DIR/cloud_gcp/target/scala-2.12/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar"
CLOUD_GCP_SUBMITTER_JAR="$CHRONON_ROOT_DIR/cloud_gcp_submitter/target/scala-2.12/cloud_gcp_submitter-assembly-0.1.0-SNAPSHOT.jar"

if [ ! -f "$CLOUD_GCP_JAR" ]; then
echo "$CLOUD_GCP_JAR not found"
exit 1
fi

if [ ! -f "$CLOUD_GCP_SUBMITTER_JAR" ]; then
echo "$CLOUD_GCP_SUBMITTER_JAR not found"
exit 1
fi

# all customer ids
ALL_CUSTOMER_IDS=("canary" "etsy")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add configuration file upload support.

Per PR comments, add support for uploading configuration files:

 ALL_CUSTOMER_IDS=("canary" "etsy")
+CONF_FILES=("conf/dataproc.conf" "conf/cloud_gcp.conf")

Then update the upload function to handle configuration files:

 ELEMENT_JAR_PATH=gs://zipline-artifacts-$element/jars
+ELEMENT_CONF_PATH=gs://zipline-artifacts-$element/conf
 backup_and_verify "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH/$(basename "$CLOUD_GCP_JAR")"
 backup_and_verify "$CLOUD_GCP_SUBMITTER_JAR" "$ELEMENT_JAR_PATH/$(basename "$CLOUD_GCP_SUBMITTER_JAR")"
 backup_and_verify "$EXPECTED_ZIPLINE_WHEEL" "$ELEMENT_JAR_PATH/$(basename "$EXPECTED_ZIPLINE_WHEEL")"
+for conf in "${CONF_FILES[@]}"; do
+  if [ -f "$conf" ]; then
+    backup_and_verify "$conf" "$ELEMENT_CONF_PATH/$(basename "$conf")"
+  fi
+done

Committable suggestion skipped: line range outside the PR's diff.


# Takes in array of customer ids
function upload_to_gcp() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for doing this! Could we also add the additional-confs.yaml file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still want to upload this file though? currently the etsy file looks like this:

spark.chronon.table.format_provider.class: "ai.chronon.integrations.cloud_gcp.GcpFormatProvider"
spark.chronon.partition.format: "yyyy-MM-dd"
spark.chronon.table.gcs.temporary_gcs_bucket: "zipline-warehouse-etsy"
spark.chronon.partition.column: "_DATE"
spark.chronon.table.gcs.connector_output_dataset: "search"
spark.chronon.table.gcs.connector_output_project: "etsy-zipline-dev"

i think we were planning on moving most of these params out to teams.json except for maybe the first two?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay let's leave it out - I think we can bake them into the run.py script.

customer_ids_to_upload=("$@")
echo "Are you sure you want to upload to these customer ids: ${customer_ids_to_upload[*]}"
select yn in "Yes" "No"; do
case $yn in
Yes )
set -euxo pipefail
for element in "${customer_ids_to_upload[@]}"
do
ELEMENT_JAR_PATH=gs://zipline-artifacts-$element/jars
gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH";
gcloud storage cp "$CLOUD_GCP_SUBMITTER_JAR" "$ELEMENT_JAR_PATH";
gcloud storage cp "$EXPECTED_ZIPLINE_WHEEL" "$ELEMENT_JAR_PATH"
done
echo "Succeeded"
break;;
No ) exit 0;;
esac
done
}

# check if $1 (single customer id mode) has been set
if [ -z "$1" ]; then
upload_to_gcp "${ALL_CUSTOMER_IDS[@]}"
else
upload_to_gcp "$1"
fi