-
Notifications
You must be signed in to change notification settings - Fork 9
feat: add a join to the gcp integration tests #727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughA new Python test file was added to define and configure GCP-based training set joins, specifying sources and join logic for two environments. Additionally, a shell script was updated to introduce a new backfill step, executing a join operation as part of the data processing pipeline. The script selects the appropriate join configuration based on the environment and includes error checking after execution. Changes
Sequence Diagram(s)sequenceDiagram
participant Script
participant Zipline
participant Config
Script->>Zipline: Run backfill for view (BACKFILL-VIEW)
Script->>Config: Select join config (v1_test or v1_dev)
Script->>Zipline: Run backfill for join (BACKFILL-JOIN) with config
Zipline-->>Script: Return status
Script->>Script: Check for failure
Suggested reviewers
Poem
Warning Review ran into problems🔥 ProblemsGitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository. Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
api/python/test/canary/joins/gcp/training_set.py (3)
7-21: Convert comment to proper docstring.-""" -This is the "left side" of the join that will comprise our training set. It is responsible for providing the primary keys -and timestamps for which features will be computed. -""" +# This is the "left side" of the join that will comprise our training set. It is responsible for providing the primary keys +# and timestamps for which features will be computed.
23-28: Add brief comment explaining v1_test's purpose.Join definition lacks documentation.
30-35: Add brief comment explaining v1_dev's purpose.Join definition lacks documentation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
api/python/test/canary/joins/gcp/training_set.py(1 hunks)scripts/distribution/run_gcp_quickstart.sh(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
api/python/test/canary/joins/gcp/training_set.py (3)
api/python/ai/chronon/source.py (1)
EventSource(8-35)api/src/main/scala/ai/chronon/api/Builders.scala (1)
Source(106-140)api/python/ai/chronon/query.py (1)
selects(93-116)
⏰ Context from checks skipped due to timeout of 90000ms (15)
- GitHub Check: service_tests
- GitHub Check: service_commons_tests
- GitHub Check: service_tests
- GitHub Check: online_tests
- GitHub Check: online_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: flink_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: api_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: aggregator_tests
- GitHub Check: api_tests
- GitHub Check: aggregator_tests
- GitHub Check: flink_tests
🔇 Additional comments (1)
scripts/distribution/run_gcp_quickstart.sh (1)
146-153: New BACKFILL-JOIN step looks good.Properly implements join backfill similar to existing backfill steps.
2f5eb9a to
dacd493
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
scripts/distribution/run_gcp_quickstart.sh (3)
129-133: Parameterize date flags.
Extract--start-dsand--end-dsinto top-level variables (e.g.START_DS/END_DS) instead of hard-coding here.
137-142: DRY backfill-view block.
This mirrors the purchases backfill. Factor out common logic into a loop or function.
144-144: Unify fail handler usage.
fail_if_bash_faileduses$?internally; drop the explicit$?argument or update the function signature.Also applies to: 153-153
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
scripts/distribution/run_gcp_quickstart.sh(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: service_commons_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: service_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: service_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: online_tests
- GitHub Check: online_tests
- GitHub Check: api_tests
- GitHub Check: flink_tests
- GitHub Check: flink_tests
- GitHub Check: api_tests
- GitHub Check: aggregator_tests
- GitHub Check: aggregator_tests
- GitHub Check: python_tests
- GitHub Check: enforce_triggered_workflows
| echo -e "${GREEN}<<<<<.....................................BACKFILL-JOIN.....................................>>>>>\033[0m" | ||
| if [[ "$ENVIRONMENT" == "canary" ]]; then | ||
| zipline run --repo=$CHRONON_ROOT --version $VERSION --mode backfill --conf compiled/joins/gcp/training_set.v1_test --start-ds 2023-11-01 --end-ds 2023-12-01 | ||
| else | ||
| zipline run --repo=$CHRONON_ROOT --version $VERSION --mode backfill --conf compiled/joins/gcp/training_set.v1_dev --start-ds 2023-11-01 --end-ds 2023-12-01 | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Validate join config existence.
Add a check like [[ -f compiled/joins/gcp/training_set.v1_${ENVIRONMENT} ]] || { echo "Missing join config"; exit 1; } before running.
c039903 to
dacd493
Compare
## Summary
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced a new test for joining GCP-based training set data,
supporting different join configurations.
- Added a new backfill step for join operations in the data processing
pipeline, with environment-specific configuration handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
---------
Co-authored-by: thomaschow <[email protected]>
## Summary
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced a new test for joining GCP-based training set data,
supporting different join configurations.
- Added a new backfill step for join operations in the data processing
pipeline, with environment-specific configuration handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
---------
Co-authored-by: thomaschow <[email protected]>
## Summary
## Cheour clientslist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced a new test for joining GCP-based training set data,
supporting different join configurations.
- Added a new baour clientsfill step for join operations in the data processing
pipeline, with environment-specific configuration handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to traour clients
the status of staour clientss when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
---------
Co-authored-by: thomaschow <[email protected]>
Summary
Checklist
Summary by CodeRabbit