Adding step days of 1 to source job #578
Actionable comments posted: 0
🧹 Nitpick comments (3)
spark/src/main/scala/ai/chronon/spark/MergeJob.scala (3)
57-58: Potential performance overhead scanning full range for schema.
66-79: Consider refining exception handling and logging the problematic day.
80-81: Check if DataFrame is empty before saving to avoid surprises.
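The second nitpick (logging the problematic day) can be illustrated with a minimal sketch. It does not use the actual `MergeJob` code: `processDay`, `runAll`, and the string day format are hypothetical stand-ins chosen for illustration. The idea is only that each day's work is wrapped so a failure carries the day it occurred on.

```scala
import scala.util.{Failure, Success, Try}

object PerDayErrors {
  // Hypothetical stand-in for the per-day work; it fails on one specific day.
  def processDay(day: String): Int =
    if (day == "2024-01-02") throw new IllegalStateException("query returned no rows")
    else day.length

  // Wrap each day's work so that a failure message names the day it happened on.
  def runAll(days: Seq[String]): Seq[(String, Try[Int])] =
    days.map { day =>
      val result: Try[Int] = Try(processDay(day)).recoverWith { case e =>
        Failure(new RuntimeException(s"Failed processing day $day: ${e.getMessage}", e))
      }
      day -> result
    }
}
```

Keeping the original exception as the cause preserves the stack trace while the outer message identifies the day.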
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
spark/src/main/scala/ai/chronon/spark/MergeJob.scala(2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
spark/src/main/scala/ai/chronon/spark/MergeJob.scala (3)
api/src/main/scala/ai/chronon/api/DataRange.scala (4)
- PartitionRange (38-128)
- PartitionRange (130-174)
- steps (82-87)
- shift (99-105)
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)
- scanDf (593-613)
spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (3)
- BootstrapInfo (51-74)
- BootstrapInfo (76-346)
- from (80-345)
⏰ Context from checks skipped due to timeout of 90000ms (14)
- GitHub Check: streaming_tests
- GitHub Check: streaming_tests
- GitHub Check: spark_tests
- GitHub Check: join_tests
- GitHub Check: analyzer_tests
- GitHub Check: spark_tests
- GitHub Check: groupby_tests
- GitHub Check: groupby_tests
- GitHub Check: fetcher_tests
- GitHub Check: fetcher_tests
- GitHub Check: analyzer_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: join_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (5)
spark/src/main/scala/ai/chronon/spark/MergeJob.scala (5)
19-19: No issues with the new import.
62-65: Day-by-day iteration looks correct.
85-85: New parameter for daily range retrieval is consistent with the loop.
91-91: Watch for off-by-one errors when shifting day range by -1.
93-93: Leaving day range unshifted looks correct for non-snapshot accuracy.
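The day-by-day iteration and the -1 shift mentioned above can be sketched as follows. This is not Chronon's `PartitionRange` API, whose `steps` and `shift` signatures are not shown on this page; the sketch assumes inclusive date ranges and uses `java.time.LocalDate` with hypothetical helper names (`dailySteps`, `shift`) to show where an off-by-one could creep in.

```scala
import java.time.LocalDate

object DaySteps {
  // Split an inclusive [start, end] range into single-day ranges (start == end for each).
  // Returns an empty list when start is after end.
  def dailySteps(start: LocalDate, end: LocalDate): Seq[(LocalDate, LocalDate)] =
    Iterator
      .iterate(start)(_.plusDays(1))
      .takeWhile(d => !d.isAfter(end))
      .map(d => (d, d))
      .toList

  // Move both ends of a range by the same number of days (negative moves it earlier).
  def shift(range: (LocalDate, LocalDate), days: Int): (LocalDate, LocalDate) =
    (range._1.plusDays(days), range._2.plusDays(days))
}
```

Because both ends move together, shifting a single-day range by -1 still yields a single-day range, now covering the previous day, and it crosses month boundaries correctly.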
    val joinedDfTry =
      try {
        Success(
          rightPartsData
            .foldLeft(leftDf) { case (partialDf, (rightPart, rightDf)) =>
              joinWithLeft(partialDf, rightDf, rightPart)
            }
            // drop all processing metadata columns
            .drop(Constants.MatchedHashes, Constants.TimePartitionColumn))
      } catch {
        case e: Exception =>
          e.printStackTrace()
          Failure(e)
      }
    -    val joinedDfTry =
    -      try {
    -        Success(
    -          rightPartsData
    -            .foldLeft(leftDf) { case (partialDf, (rightPart, rightDf)) =>
    -              joinWithLeft(partialDf, rightDf, rightPart)
    -            }
    -            // drop all processing metadata columns
    -            .drop(Constants.MatchedHashes, Constants.TimePartitionColumn))
    -      } catch {
    -        case e: Exception =>
    -          e.printStackTrace()
    -          Failure(e)
    -      }
    +    val joinedDfTry =
    +      Try {
    +        rightPartsData
    +          .foldLeft(leftDf) { case (partialDf, (rightPart, rightDf)) =>
    +            joinWithLeft(partialDf, rightDf, rightPart)
    +          }
    +          // drop all processing metadata columns
    +          .drop(Constants.MatchedHashes, Constants.TimePartitionColumn)
    +      }
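The two forms in the suggestion above are close but not identical, which a small standalone sketch can make concrete. The names `explicit` and `concise` are hypothetical and the computation is reduced to an `Int`; nothing here is taken from `MergeJob`. In the Scala standard library, `Try.apply` catches every throwable matched by `scala.util.control.NonFatal`, whereas `case e: Exception` catches only `Exception` subclasses. The suggested form also drops the `e.printStackTrace()` call, so the failure would need to be logged wherever the `Try` is consumed.

```scala
import scala.util.{Failure, Success, Try}

object TryForms {
  // Explicit form: only subclasses of Exception become a Failure.
  def explicit(compute: => Int): Try[Int] =
    try Success(compute)
    catch { case e: Exception => Failure(e) }

  // Concise form: Try.apply turns any non-fatal throwable into a Failure.
  def concise(compute: => Int): Try[Int] = Try(compute)
}
```

For ordinary exceptions the two behave the same. They diverge on edge cases: `InterruptedException`, for example, is an `Exception` but is not matched by `NonFatal`, so `concise` would let it propagate while `explicit` would wrap it.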
## Summary

Adding step days of 1 to source job

## Checklist

- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- **New Features**
  - Data processing is now handled in daily segments, providing more precise and timely results.
- **Bug Fixes**
  - Error messages have been refined to clearly indicate the specific day when a query yields no results, improving clarity during troubleshooting.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: ezvz <[email protected]>