Refactor run.py to not hardcode against zipline-artifacts or zipline-warehouse buckets but configurable #658
Conversation
Walkthrough

This update centralizes the management of cloud storage bucket names for both AWS and GCP runners by introducing new constants and enforcing validation on bucket name prefixes. The runners now store bucket names as instance variables, removing the need for dynamic construction using customer IDs. Method signatures for downloading JARs have been simplified to accept explicit bucket names. The CLI adds options for specifying artifact and warehouse buckets. Test environment configurations and Thrift struct definitions are also updated to reflect these changes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant Runner
    participant CloudRunner
    User->>CLI: Provide --zipline-artifacts-bucket and --zipline-warehouse-bucket
    CLI->>Runner: Pass bucket names via args
    Runner->>CloudRunner: Initialize with bucket names
    CloudRunner->>CloudRunner: Validate bucket name prefix
    CloudRunner->>CloudRunner: Set jar path
    CloudRunner->>CloudRunner: Use bucket names directly for JAR/file operations
```
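The flow above can be sketched as a minimal runner. The constant names mirror those the walkthrough mentions, but the class shape here is illustrative, not the PR's actual code:

```python
import os

# Constant names assumed from the walkthrough; values are illustrative.
S3_PREFIX = "s3://"
GCS_PREFIX = "gs://"
ZIPLINE_ARTIFACTS_BUCKET_ENV_KEY = "ZIPLINE_ARTIFACTS_BUCKET"
ZIPLINE_WAREHOUSE_BUCKET_ENV_KEY = "ZIPLINE_WAREHOUSE_BUCKET"


class CloudRunner:
    """Stores bucket names as instance variables instead of deriving them
    from a customer ID: CLI args take precedence, env vars are the fallback."""

    def __init__(self, args, expected_prefix):
        self.zipline_artifacts_bucket = (
            args.get("zipline_artifacts_bucket")
            or os.environ.get(ZIPLINE_ARTIFACTS_BUCKET_ENV_KEY)
        )
        self.zipline_warehouse_bucket = (
            args.get("zipline_warehouse_bucket")
            or os.environ.get(ZIPLINE_WAREHOUSE_BUCKET_ENV_KEY)
        )
        # Validate bucket name prefixes up front, as the walkthrough describes.
        for bucket in (self.zipline_artifacts_bucket, self.zipline_warehouse_bucket):
            if bucket is None or not bucket.startswith(expected_prefix):
                raise ValueError(
                    f"Bucket {bucket!r} must start with {expected_prefix!r}"
                )
```

With this shape, a GCP runner would be constructed with `expected_prefix=GCS_PREFIX` and an AWS runner with `expected_prefix=S3_PREFIX`.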
Actionable comments posted: 3
🔭 Outside diff range comments (2)
api/python/ai/chronon/repo/aws.py (1)
61-72: ⚠️ Potential issue — Prefix handling in S3 operations

`upload_file`, `download_file`, and `head_object` all need the plain bucket name. Quick fix: strip once at call-site.

```diff
- obj.upload_file(source_file_name, bucket_name, destination_blob_name)
+ stripped = bucket_name[len(S3_PREFIX):] if bucket_name.startswith(S3_PREFIX) else bucket_name
+ obj.upload_file(source_file_name, stripped, destination_blob_name)
```

Do the same in `download_zipline_aws_jar()` and `get_s3_file_hash()`.

api/python/ai/chronon/repo/gcp.py (1)
80-90: ⚠️ Potential issue — `download_gcs_blob` still passes prefixed bucket

Add the same `gs://` stripping logic used elsewhere:

```diff
- bucket = storage_client.bucket(bucket_name)
+ if bucket_name.startswith(GCS_PREFIX):
+     bucket_name = bucket_name[len(GCS_PREFIX):]
+ bucket = storage_client.bucket(bucket_name)
```
🧹 Nitpick comments (4)
api/thrift/agent.thrift (2)
36-37: Consider documenting the reason for commenting out. Rather than commenting out the field, add an explanation or use a proper deprecation mechanism.

```diff
- // 10: optional list<string> args
+ // 10: optional list<string> args - Deprecated: args now passed through environment variables and runner configuration
```
126-127: Consider documenting the reason for commenting out. Similar to above, add context for why this field is being disabled.

```diff
- // 2: optional string end
+ // 2: optional string end - Deprecated: end date handling refactored
```

api/python/ai/chronon/repo/run.py (2)
185-190: Add env-fallback & type guard for new bucket flags

CLI flags are added but not backed by `type=str`, defaults, or `set_defaults()`. Users omitting the flag but setting the env var will still see `None` in `ctx.params`, breaking downstream validation.
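One way to fail fast is to resolve the flag-or-env value through a small stdlib helper before anything downstream touches it. This is a sketch: the function name, error message, and env key used below are illustrative, not the PR's actual CLI wiring:

```python
import os


def resolve_bucket(flag_value, env_key, flag_name):
    """Resolve a bucket name from the CLI flag, falling back to the env var;
    raise a clear error instead of letting None flow into later validation."""
    value = flag_value or os.environ.get(env_key)
    if value is None:
        raise ValueError(f"{flag_name} not set: pass the flag or export {env_key}")
    return value
```

The same effect can be reached declaratively with click's `envvar=` option parameter plus `required=True`, keeping `ctx.params` free of `None`.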
243-245: Use the canonical setter

Directly mutating `default_runner.jar_path` bypasses any validation encapsulated in `Runner.set_jar_path()`.

```diff
- default_runner.jar_path = os.path.expanduser(chronon_jar)
+ default_runner.set_jar_path(os.path.expanduser(chronon_jar))
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (7)
- api/python/ai/chronon/repo/aws.py (5 hunks)
- api/python/ai/chronon/repo/constants.py (1 hunks)
- api/python/ai/chronon/repo/default_runner.py (2 hunks)
- api/python/ai/chronon/repo/gcp.py (7 hunks)
- api/python/ai/chronon/repo/run.py (4 hunks)
- api/python/test/canary/teams.py (3 hunks)
- api/thrift/agent.thrift (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
api/python/ai/chronon/repo/default_runner.py (1)
- api/python/ai/chronon/repo/constants.py (1): `RunMode` (4-30)

api/python/ai/chronon/repo/gcp.py (1)
- api/python/ai/chronon/repo/default_runner.py (1): `set_jar_path` (106-107)
🔇 Additional comments (6)
api/python/ai/chronon/repo/constants.py (1)
160-164: Good addition of configurable bucket constants. The addition of bucket environment keys and URI prefixes supports the PR objective of removing hardcoded bucket references.
api/python/test/canary/teams.py (2)
60-61: Proper bucket configuration for GCP team. GCS bucket URLs correctly use the `gs://` prefix.

107-108: Proper bucket configuration for AWS team. S3 bucket URLs correctly use the `s3://` prefix.
api/python/ai/chronon/repo/default_runner.py (3)
15-17: Appropriate constant imports. The new imports support the bucket name configuration.

22-24: Constructor refactored to initialize `jar_path` as `None`. This aligns with the new pattern of setting `jar_path` after initialization.

106-108: Good addition of setter method. The setter enables assigning `jar_path` after initialization, supporting the refactored construction pattern.
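A minimal sketch of that construction pattern, with `expanduser` folded into the setter as a hypothetical example of the validation a canonical setter could centralize (the later review comment on run.py suggests exactly this kind of consolidation):

```python
import os


class Runner:
    """Sketch of the refactored pattern: jar_path starts as None and is
    assigned later through a dedicated setter."""

    def __init__(self):
        self.jar_path = None

    def set_jar_path(self, jar_path):
        # Centralizing assignment here means path normalization (and any
        # future existence checks) live in one place instead of at call-sites.
        self.jar_path = os.path.expanduser(jar_path)
```

Call-sites then use `runner.set_jar_path(chronon_jar)` rather than mutating the attribute directly.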
🛠️ Refactor suggestion — Add bucket name validation

Validate that the bucket names have correct prefixes (`S3_PREFIX` or `GCS_PREFIX`):

```diff
 self.zipline_artifacts_bucket = (args.get("zipline_artifacts_bucket")
                                  or os.environ.get(ZIPLINE_ARTIFACTS_BUCKET_ENV_KEY))
 self.zipline_warehouse_bucket = (args.get("zipline_warehouse_bucket")
                                  or os.environ.get(ZIPLINE_WAREHOUSE_BUCKET_ENV_KEY))
+
+if self.zipline_artifacts_bucket and not (
+    self.zipline_artifacts_bucket.startswith(S3_PREFIX)
+    or self.zipline_artifacts_bucket.startswith(GCS_PREFIX)
+):
+    raise ValueError(f"Artifacts bucket must start with {S3_PREFIX} or {GCS_PREFIX}")
+
+if self.zipline_warehouse_bucket and not (
+    self.zipline_warehouse_bucket.startswith(S3_PREFIX)
+    or self.zipline_warehouse_bucket.startswith(GCS_PREFIX)
+):
+    raise ValueError(f"Warehouse bucket must start with {S3_PREFIX} or {GCS_PREFIX}")
```
Guard `None` and strip `s3://` before AWS SDK calls

`bucket` may be `None`, and boto3 expects bare bucket names (no `s3://`). `bucket.startswith()` on `None` raises, and passing the prefixed value to boto3 will fail.

```diff
 for bucket in [self.zipline_artifacts_bucket, self.zipline_warehouse_bucket]:
+    if bucket is None:
+        raise ValueError("Zipline bucket not provided.")
     if not bucket.startswith(S3_PREFIX):
         raise ValueError(
             f"Invalid bucket name: {bucket}. "
             f"Bucket names must start with '{S3_PREFIX}'."
         )
```
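The guarded loop that both cloud runners need can be sketched as a shared helper. Its name and standalone form are assumptions for illustration; the point is ordering the `None` check before the prefix check:

```python
def validate_buckets(buckets, prefix):
    """Check None before .startswith() so a missing bucket raises a clear
    ValueError instead of an AttributeError, then enforce the URI prefix."""
    for bucket in buckets:
        if bucket is None:
            raise ValueError("Zipline bucket not provided.")
        if not bucket.startswith(prefix):
            raise ValueError(
                f"Invalid bucket name: {bucket}. "
                f"Bucket names must start with '{prefix}'."
            )
```

The AWS runner would call it with `S3_PREFIX` and the GCP runner with `GCS_PREFIX`, removing the duplicated loop from both files.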
🛠️ Refactor suggestion — Same `None` / prefix issue as AWS

```python
# Validate bucket names start with "gs://"
for bucket in [self.zipline_artifacts_bucket, self.zipline_warehouse_bucket]:
    if not bucket.startswith(GCS_PREFIX):
        raise ValueError(
            f"Invalid bucket name: {bucket}. "
            f"Bucket names must start with '{GCS_PREFIX}'."
        )
```

Handle missing buckets before `.startswith()` to avoid an `AttributeError`.