Allow metadata upload to submit to dataproc #246
Walkthrough: The pull request introduces modifications across multiple files in the Chronon project, focusing on enhancing configuration handling and command execution flexibility. Key changes include adding a
case value if value.contains(s"$JoinKeyword/") || maybeConfType.contains(JoinKeyword) =>
  loadJsonToConf[api.Join](filePath)
case value if value.contains(s"$GroupByKeyword/") || maybeConfType.contains(GroupByKeyword) =>
  loadJsonToConf[api.GroupBy](filePath)
case value if value.contains(s"$StagingQueryKeyword/") || maybeConfType.contains(StagingQueryKeyword) =>
  loadJsonToConf[api.StagingQuery](filePath)
case value if value.contains(s"$ModelKeyword/") || maybeConfType.contains(ModelKeyword) =>
  loadJsonToConf[api.Model](filePath)
doing this because the filePath we're passing in from Dataproc isn't the full path. When we configure a job to be submitted to Dataproc, we add the GCS-uploaded config, but Dataproc places that file in the working directory (not under its full path):
https://cloud.google.com/dataproc/docs/reference/rest/v1/SparkJob
fileUris[] | string | Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
this is where we strip the filepath down to just the filename in run.py: https://github.com/zipline-ai/chronon/blob/main/api/py/ai/chronon/repo/run.py#L575-L580
has some docs
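To make the path issue concrete, here is a minimal sketch of reducing a staged file URI to the bare name it would have in the job's working directory. The helper name `dataproc_local_name` and the example URIs are hypothetical; this is not the code in run.py, only an illustration of the behavior described above.

```python
import posixpath
from urllib.parse import urlparse


def dataproc_local_name(file_uri: str) -> str:
    """Return the bare file name a staged file would have in the working directory.

    Hypothetical helper for illustration; the real logic lives in run.py.
    """
    # For "gs://bucket/a/b/c" urlparse yields path "/a/b/c"; plain paths pass through.
    path = urlparse(file_uri).path or file_uri
    return posixpath.basename(path.rstrip("/"))


# The directory segments (for example "group_bys/") are gone after staging,
# which is why the conf type can no longer be inferred from the path.
print(dataproc_local_name("gs://some-bucket/metadata/group_bys/quickstart/purchases.v1"))
```

Because only the last path segment survives, anything that used to be read from the directory structure has to be passed explicitly.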
// TODO: this is actually just an async task. it doesn't block and thus we don't actually
// know if it successfully created the dataset
@piyush-zlai - was thinking I could poll? the issue is this doesn't return a future
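One way the polling idea could look is a small helper that repeatedly calls an existence check until it succeeds or a deadline passes. This is a sketch under assumptions: `poll_until` and the `check` callable are hypothetical names, and how to ask the backing service whether the dataset exists is left to the caller.

```python
import time
from typing import Callable


def poll_until(
    check: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` until it returns True or `timeout_s` elapses.

    Returns True if the check succeeded, False on timeout.
    `sleep` is injectable so tests do not have to wait.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass a zero-argument function that looks up the dataset and returns whether it was found, then fail the job if `poll_until` returns False.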
val confType: ScallopOption[String] =
  opt[String](required = false, descr = "Type of the conf to run. ex: join, group-by, etc")
mentioned earlier, but this is needed because there's logic in scala that depends on the conf file path containing keywords like .../joins/... and extracts the conf type from the path. Since we don't have the full file path (see above), we have to use confType, which is set in run.py.
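The fallback described here can be sketched in Python as follows. The keyword strings are assumptions chosen for illustration (the actual values of `JoinKeyword`, `GroupByKeyword`, and the others are not shown in this PR excerpt), and `resolve_conf_type` is a hypothetical name. Note that in the Scala snippet `maybeConfType.contains(...)` is called on an `Option`, so it is an equality check against the wrapped value, which the sketch mirrors with `==`.

```python
from typing import Optional

# Assumed keyword values, for illustration only.
CONF_KEYWORDS = ("joins", "group_bys", "staging_queries", "models")


def resolve_conf_type(file_path: str, conf_type: Optional[str] = None) -> Optional[str]:
    """Pick the conf type from the path if possible, else from the explicit argument."""
    for keyword in CONF_KEYWORDS:
        # Path-based detection works only when the directory segment is present.
        if f"{keyword}/" in file_path or conf_type == keyword:
            return keyword
    return None


# Full path: the type is inferred from the ".../joins/..." segment.
print(resolve_conf_type("production/joins/quickstart/training_set.v1"))
# Bare file name (as on Dataproc): the explicit conf type is required.
print(resolve_conf_type("purchases.v1", conf_type="group_bys"))
```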
00fdf11 to 97450d8
  # fetch online jar if necessary
- if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar:
+ if (self.mode in ONLINE_MODES) and (not args["sub_help"] and not self.dataproc) and not valid_jar:
feels like we might want a flag that's not just for dataproc but whether this command is an offline batch run? Not critically blocking though.
+1
yeah, it's getting pretty crowded in the code here.
> command is an offline batch run
I think that's what the opposite of `if (self.mode in ONLINE_MODES)` is supposed to represent, but then there's also whether we submit to dataproc or not (spark-submit).
the sub help part is one we could probably catch much earlier, tbh.
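As a sketch of how the crowded condition could be untangled, the checks can be pulled into one named predicate with early exits. The function name `should_fetch_online_jar` and its parameters are hypothetical; the logic is meant to be equivalent to the new condition in the diff above, not a proposal from the PR itself.

```python
from typing import Collection


def should_fetch_online_jar(
    mode: str,
    online_modes: Collection[str],
    sub_help: bool,
    dataproc: bool,
    valid_jar: bool,
) -> bool:
    """Decide whether the online jar needs to be fetched.

    Equivalent to:
    (mode in online_modes) and (not sub_help and not dataproc) and not valid_jar
    """
    if sub_help:
        # Only printing help: nothing to run, so nothing to fetch.
        return False
    if dataproc:
        # The job is submitted to Dataproc rather than run via local spark-submit.
        return False
    return mode in online_modes and not valid_jar
```

Handling `sub_help` first matches the remark that it could be caught much earlier, and keeping `dataproc` as its own branch leaves room for a more general "offline batch run" flag later.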
val dirWalker = new MetadataDirWalker(args.confPath(), acceptedEndPoints, maybeConfType = args.confType.toOption)
val kvMap: Map[String, Map[String, List[String]]] = dirWalker.run

if (kvMap.isEmpty) {
is this actually going to break downstream, if the metadata is not found (even if empty)?
it'll still continue to attempt to create the metadatastore tables at the line below: https://github.com/zipline-ai/chronon/pull/246/files/97450d8244681ee4f85c39b98157cfaaea911b4e#diff-9ae276942b0ebb6dfd7a36bbcd82083484dce87632b9f91a859df47e36fd5ceaR763
but aren't you returning early?
ah sorry, yes. I'm returning early because it didn't feel right to continue any further if there's no kv stuff. Without this conditional, it still tried to create the tables.
if we want to maintain the current behavior don't we want to still create the tables even if there's no kv stuff?
i think that's where it doesn't feel right
chatted offline. not sure if something else downstream might still want the table even if it's empty
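The two behaviors debated in this thread can be sketched side by side with a flag. Everything here is hypothetical: `upload_metadata`, its parameters, and the `create_tables_when_empty` switch are illustration only and do not reflect the actual Driver code.

```python
from typing import Callable, Dict, Iterable, List

KvMap = Dict[str, Dict[str, List[str]]]


def upload_metadata(
    kv_map: KvMap,
    endpoints: Iterable[str],
    create_table: Callable[[str], None],
    put: Callable[[str, str, List[str]], None],
    create_tables_when_empty: bool,
) -> int:
    """Create endpoint tables and write all key/value entries; return entries written."""
    if not kv_map and not create_tables_when_empty:
        # Early return: no metadata found, so no tables are created either.
        return 0
    for endpoint in endpoints:
        create_table(endpoint)
    written = 0
    for endpoint, entries in kv_map.items():
        for key, values in entries.items():
            put(endpoint, key, values)
            written += 1
    return written
```

With `create_tables_when_empty=False` an empty map returns before any table is created (the early return in this PR); with `True` the tables are still created, which matches the previous behavior that something downstream might rely on.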
tchow-zlai left a comment
LGTM!
fetch to be called from zipline run outside of Driver.scala so that spark is not required #306
## Summary

^^^

- Did have to add `bigtable.tables.create` permission to the dataproc service account `[email protected]` since one of the tables `CHRONON_ENTITY_BY_TEAM` didn't exist at the time. See job: https://console.cloud.google.com/dataproc/jobs/a04f9ba8-583c-475e-8956-9b53d28f3ed6/monitoring?region=us-central1&inv=1&invt=AbnIoA&project=canary-443022

cc @chewy-zlai

## Checklist

- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

Successful test in Dataproc: https://console.cloud.google.com/dataproc/jobs/04ff550c-4df6-47ce-a36b-3238c1cb63a1/monitoring?region=us-central1&inv=1&invt=AbnIzw&project=canary-443022

See:

```
(dev_chronon) davidhan@Davids-MacBook-Pro: ~/zipline/chronon/api/py/test/sample (davidhan/metadata_upload) $ cbt -project=canary-443022 -instance=zipline-canary-instance read CHRONON_METADATA
2025/01/17 21:44:34 -creds flag unset, will use gcloud credential
----------------------------------------
CHRONON_METADATA#purchases.v1
  cf:value @ 2025/01/17-21:30:36.130000
    "{\"metaData\":{\"name\":\"quickstart.purchases.v1\",\"online\":1,\"customJson\":\"{\\\"lag\\\": 0, \\\"groupby_tags\\\": null, \\\"column_tags\\\": {}}\",\"dependencies\":[\"{\\\"name\\\": \\\"wait_for_data.purchases_external_ds\\\", \\\"spec\\\": \\\"data.purchases_external/ds={{ ds }}\\\", \\\"start\\\": null, \\\"end\\\": null}\"],\"outputNamespace\":\"data\",\"team\":\"quickstart\",\"offlineSchedule\":\"@daily\"},\"sources\":[{\"events\":{\"table\":\"data.purchases_external\",\"query\":{\"selects\":{\"user_id\":\"user_id\",\"purchase_price\":\"purchase_price\"},\"timeColumn\":\"ts\",\"setups\":[]}}}],\"keyColumns\":[\"user_id\"],\"aggregations\":[{\"inputColumn\":\"purchase_price\",\"operation\":7,\"argMap\":{},\"windows\":[{\"length\":3,\"timeUnit\":1},{\"length\":14,\"timeUnit\":1},{\"length\":30,\"timeUnit\":1}]},{\"inputColumn\":\"purchase_price\",\"operation\":6,\"argMap\":{},\"windows\":[{\"length\":3,\"timeUnit\":1},{\"length\":14,\"timeUnit\":1},{\"length\":30,\"timeUnit\":1}]},{\"inputColumn\":\"purchase_price\",\"operation\":8,\"argMap\":{},\"windows\":[{\"length\":3,\"timeUnit\":1},{\"length\":14,\"timeUnit\":1},{\"length\":30,\"timeUnit\":1}]},{\"inputColumn\":\"purchase_price\",\"operation\":13,\"argMap\":{\"k\":\"10\"}}],\"backfillStartDate\":\"2023-11-20\"}"
  cf:value @ 2025/01/17-21:28:45.503000
    "{\"metaData\":{\"name\":\"quickstart.purchases.v1\",\"online\":1,\"customJson\":\"{\\\"lag\\\": 0, \\\"groupby_tags\\\": null, \\\"column_tags\\\": {}}\",\"dependencies\":[\"{\\\"name\\\": \\\"wait_for_data.purchases_external_ds\\\", \\\"spec\\\": \\\"data.purchases_external/ds={{ ds }}\\\", \\\"start\\\": null, \\\"end\\\": null}\"],\"outputNamespace\":\"data\",\"team\":\"quickstart\",\"offlineSchedule\":\"@daily\"},\"sources\":[{\"events\":{\"table\":\"data.purchases_external\",\"query\":{\"selects\":{\"user_id\":\"user_id\",\"purchase_price\":\"purchase_price\"},\"timeColumn\":\"ts\",\"setups\":[]}}}],\"keyColumns\":[\"user_id\"],\"aggregations\":[{\"inputColumn\":\"purchase_price\",\"operation\":7,\"argMap\":{},\"windows\":[{\"length\":3,\"timeUnit\":1},{\"length\":14,\"timeUnit\":1},{\"length\":30,\"timeUnit\":1}]},{\"inputColumn\":\"purchase_price\",\"operation\":6,\"argMap\":{},\"windows\":[{\"length\":3,\"timeUnit\":1},{\"length\":14,\"timeUnit\":1},{\"length\":30,\"timeUnit\":1}]},{\"inputColumn\":\"purchase_price\",\"operation\":8,\"argMap\":{},\"windows\":[{\"length\":3,\"timeUnit\":1},{\"length\":14,\"timeUnit\":1},{\"length\":30,\"timeUnit\":1}]},{\"inputCol
```

```
(dev_chronon) davidhan@Davids-MacBook-Pro: ~/zipline/chronon/api/py/test/sample (davidhan/metadata_upload) $ cbt -project=canary-443022 -instance=zipline-canary-instance read CHRONON_ENTITY_BY_TEAM
2025/01/17 21:45:43 -creds flag unset, will use gcloud credential
----------------------------------------
CHRONON_ENTITY_BY_TEAM#group_bys/quickstart
  cf:value @ 2025/01/17-21:30:36.621000
    "cHVyY2hhc2VzLnYx"
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Added configuration type specification option for command-line interfaces.
  - Enhanced metadata processing with more flexible configuration handling.
- **Improvements**
  - Refined command execution logic for better control flow.
  - Updated metadata parsing to support additional configuration types.
- **Documentation**
  - Added clarifying comments about asynchronous task behaviors in dataset and table creation processes.

The release introduces more flexible configuration management and improved command-line argument handling across various components.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
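The `CHRONON_ENTITY_BY_TEAM` cell shown above looks like Base64. As a quick check (a sketch, assuming the value is plain Base64-encoded UTF-8, which the PR does not state), decoding it gives the group-by name that also appears in the `CHRONON_METADATA#purchases.v1` row key. The helper name `decode_cell` is hypothetical.

```python
import base64


def decode_cell(value: str) -> str:
    """Decode a Base64 cell value into text (assumes UTF-8 content)."""
    return base64.b64decode(value).decode("utf-8")


print(decode_cell("cHVyY2hhc2VzLnYx"))  # purchases.v1
```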