feat: deprecate partition_column for StagingQuery #1017
WalkthroughThe Changes
Sequence Diagram(s)
sequenceDiagram
participant Caller
participant StagingQuery
participant TableDependency
participant Thrift
Caller->>StagingQuery: Call without partition_column
StagingQuery->>TableDependency: Use TableDependency (with partition_format)
TableDependency->>Thrift: to_thrift() (sets partitionFormat, 1-day interval)
StagingQuery->>Thrift: Construct StagingQuery (no partitionColumn)
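The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the actual `ai.chronon` code: the `TableDependency` dataclass, the `staging_query` helper, the dict keys, and the `data.events` table are hypothetical stand-ins, and plain dicts take the place of the generated Thrift structs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TableDependency:
    # Hypothetical stand-in for the real class; field names are assumptions.
    table: str
    partition_column: Optional[str] = None
    partition_format: Optional[str] = None

    def to_thrift(self) -> Dict[str, object]:
        # A plain dict stands in for the generated Thrift struct.
        return {
            "table": self.table,
            "partitionColumn": self.partition_column,
            "partitionFormat": self.partition_format,
            # Fixed 1-day interval, as described in the diagram.
            "partitionIntervalDays": 1,
        }


def staging_query(query: str, dependencies: List[TableDependency]) -> Dict[str, object]:
    # No top-level partition_column: partition details live on each dependency.
    return {
        "query": query,
        "tableDependencies": [dep.to_thrift() for dep in dependencies],
    }


dep = TableDependency(
    table="data.events",
    partition_column="ds",
    partition_format="yyyy-MM-dd",
)
sq = staging_query("SELECT * FROM data.events", [dep])
```

The point of the sketch is only that the partition settings travel with each dependency, so the top-level object no longer needs its own partition column.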
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~7 minutes
Actionable comments posted: 1
🔭 Outside diff range comments (1)
api/python/ai/chronon/staging_query.py (1)
81-81: Remove orphaned documentation.
Documentation references removed `partition_column` parameter.
- :type partition_column: str
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
api/python/ai/chronon/staging_query.py (2 hunks)
api/thrift/api.thrift (1 hunk)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: tchow-zlai
PR: zipline-ai/chronon#192
File: spark/src/main/scala/ai/chronon/spark/GroupBy.scala:296-299
Timestamp: 2025-01-09T17:57:34.451Z
Learning: In Spark SQL date handling:
- date_format() converts dates to strings (used for partition columns which need string format)
- to_date() converts strings to DateType (used when date operations are needed)
These are opposites and should not be standardized to use the same function.
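The same inverse relationship can be shown with a plain-Python analogue. This sketch uses the standard library's `strftime` and `strptime` rather than Spark's `date_format()` and `to_date()`; the format string and helper names are assumptions chosen for illustration.

```python
from datetime import date, datetime

# Assumed partition format, for illustration only.
PARTITION_FORMAT = "%Y-%m-%d"


def to_partition_string(d: date) -> str:
    # Analogue of date_format(): date -> string, as needed for partition columns.
    return d.strftime(PARTITION_FORMAT)


def from_partition_string(s: str) -> date:
    # Analogue of to_date(): string -> date, as needed for date arithmetic.
    return datetime.strptime(s, PARTITION_FORMAT).date()


ds = to_partition_string(date(2025, 1, 9))
parsed = from_partition_string(ds)
```

Because the two helpers run in opposite directions, replacing one with the other would change the column type rather than standardize it.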
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: cloud_aws_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: service_commons_tests
- GitHub Check: service_tests
- GitHub Check: api_tests
- GitHub Check: groupby_tests
- GitHub Check: online_tests
- GitHub Check: aggregator_tests
- GitHub Check: python_tests
- GitHub Check: flink_tests
- GitHub Check: analyzer_tests
- GitHub Check: streaming_tests
- GitHub Check: fetcher_tests
- GitHub Check: join_tests
- GitHub Check: spark_tests
- GitHub Check: batch_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
api/python/ai/chronon/staging_query.py (3)
22-22: LGTM on partition_format addition.
New field properly supports the evolution away from partition_column.
34-35: LGTM on to_thrift updates.
Correctly includes new partition_format field and sets fixed 1-day interval.
139-144: Remove partition_column suggestion: TableDependency owns and uses partition_column correctly.
The StagingQuery builder never accepted a top-level `partition_column`; it's provided per-dependency via `TableDependency`, which is passed into both `create_airflow_dependency` and `to_thrift()`. No change needed.
Likely an incorrect or invalid review comment.
  5: optional EngineType engineType
- 7: optional list<common.TableDependency> tableDependencies
+ 6: optional list<common.TableDependency> tableDependencies
Field removal breaks backward compatibility.
Removing partitionColumn field and renumbering subsequent fields will break existing Thrift clients. Consider deprecating the field first before removal.
🤖 Prompt for AI Agents
In api/thrift/api.thrift around lines 87 to 89, the removal of the
partitionColumn field and renumbering of subsequent fields breaks backward
compatibility with existing Thrift clients. To fix this, do not remove the
partitionColumn field immediately; instead, mark it as deprecated in the Thrift
IDL to maintain compatibility. Keep the original field numbering intact and
avoid renumbering other fields to prevent breaking existing clients.
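To see why renumbering matters, here is a small Python simulation of ID-keyed decoding. It is not the Thrift library: the `encode`/`decode` helpers, the wire-type tags, and the sample values are hypothetical, and the model assumes only that fields are matched by numeric ID and that a reader skips fields whose ID or wire type it does not expect.

```python
from typing import Any, Dict, Tuple

# Field ID -> (field name, wire type). Names follow the diff; types are assumed.
Schema = Dict[int, Tuple[str, str]]

OLD: Schema = {
    5: ("engineType", "i32"),
    6: ("partitionColumn", "string"),
    7: ("tableDependencies", "list"),
}
# Field 6 removed and the next field renumbered, as in the diff.
RENUMBERED: Schema = {5: ("engineType", "i32"), 6: ("tableDependencies", "list")}
# Field 6 retired but the remaining IDs left untouched.
KEPT_IDS: Schema = {5: ("engineType", "i32"), 7: ("tableDependencies", "list")}


def encode(schema: Schema, record: Dict[str, Any]) -> Dict[int, Tuple[str, Any]]:
    # The wire carries IDs and type tags, never field names.
    return {
        fid: (wtype, record[name])
        for fid, (name, wtype) in schema.items()
        if name in record
    }


def decode(schema: Schema, wire: Dict[int, Tuple[str, Any]]) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    for fid, (wtype, value) in wire.items():
        expected = schema.get(fid)
        if expected is None or expected[1] != wtype:
            continue  # unknown ID or mismatched type: skip the field
        out[expected[0]] = value
    return out


old_message = encode(
    OLD,
    {"engineType": 1, "partitionColumn": "ds", "tableDependencies": ["data.events"]},
)
broken = decode(RENUMBERED, old_message)  # tableDependencies is silently dropped
safe = decode(KEPT_IDS, old_message)      # tableDependencies survives
```

In this model a reader built from the renumbered schema loses `tableDependencies` from messages written with the old schema, while a reader that keeps the original IDs still recovers it.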