feat: add planner tests #864
Conversation
Walkthrough
A new Scala test suite was added to verify join planner configurations parse and plan without exceptions. Test resources were added and linked in the Bazel build. A resource file was introduced in test resources. An assertion was replaced by a precondition check in JSON deserialization.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant TestRunner as PlannerTest
    participant Env as Environment
    participant FileSystem as File System
    participant LocalRunner
    participant MonolithJoinPlanner
    TestRunner->>Env: Read RUNFILES_DIR
    TestRunner->>FileSystem: Locate compiled join configs
    TestRunner->>LocalRunner: Parse configs into Join objects
    loop For each Join object
        TestRunner->>MonolithJoinPlanner: Create planner
        TestRunner->>MonolithJoinPlanner: buildPlan()
        MonolithJoinPlanner-->>TestRunner: No exception thrown
    end
```
Actionable comments posted: 3
🧹 Nitpick comments (1)
api/src/test/scala/ai/chronon/api/test/planner/PlannerTest.scala (1)
12-12: Mark implicit explicitly. Add the type for clarity:

```diff
-private implicit val testPartitionSpec = PartitionSpec.daily
+private implicit val testPartitionSpec: PartitionSpec = PartitionSpec.daily
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`spark/src/test/resources/local_data_csv/test_table_1_data.csv` is excluded by `!**/*.csv`
📒 Files selected for processing (9)
- api/BUILD.bazel (1 hunks)
- api/src/test/scala/ai/chronon/api/test/planner/PlannerTest.scala (1 hunks)
- spark/src/test/resources/canary/compiled (1 hunks)
- spark/src/test/resources/group_bys/team/example_group_by.v1 (0 hunks)
- spark/src/test/resources/group_bys/team/purchases.v1 (0 hunks)
- spark/src/test/resources/group_bys/team/purchases_only_conf_common.v1 (0 hunks)
- spark/src/test/resources/joins/team/example_join.v1 (0 hunks)
- spark/src/test/resources/joins/team/example_join_failure.v1 (0 hunks)
- spark/src/test/resources/teams_metadata/default_team_metadata (0 hunks)
💤 Files with no reviewable changes (6)
- spark/src/test/resources/teams_metadata/default_team_metadata
- spark/src/test/resources/group_bys/team/example_group_by.v1
- spark/src/test/resources/joins/team/example_join.v1
- spark/src/test/resources/group_bys/team/purchases.v1
- spark/src/test/resources/group_bys/team/purchases_only_conf_common.v1
- spark/src/test/resources/joins/team/example_join_failure.v1
⏰ Context from checks skipped due to timeout of 90000ms (16)
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: streaming_tests
- GitHub Check: analyzer_tests
- GitHub Check: join_tests
- GitHub Check: batch_tests
- GitHub Check: spark_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: fetcher_tests
- GitHub Check: aggregator_tests
- GitHub Check: service_commons_tests
- GitHub Check: service_tests
- GitHub Check: api_tests
- GitHub Check: online_tests
- GitHub Check: flink_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (1)
api/BUILD.bazel (1)
64-66: Ensure the `test-resources` target exists & is propagated. If `//spark/src/test/resources:test-resources` isn't defined with `visibility = ["//visibility:public"]`, the test will break at runtime. Double-check the BUILD under `spark/src/test/resources` (or add one) so Bazel can copy the files.
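A minimal Scala sketch of a fail-fast guard for that runtime dependency, assuming the Bazel runfiles layout the walkthrough describes; the helper name and resource path are illustrative, not the repo's actual code:

```scala
import java.nio.file.{Files, Path, Paths}

object TestResources {
  // Hypothetical helper: resolve the compiled-config directory from Bazel's
  // RUNFILES_DIR and fail fast if the resources were not propagated.
  def compiledConfigDir(): Path = {
    val runfiles = sys.env.getOrElse("RUNFILES_DIR", sys.error("RUNFILES_DIR not set"))
    val dir = Paths.get(runfiles, "spark", "src", "test", "resources", "canary", "compiled")
    require(Files.isDirectory(dir), s"test resources not propagated: $dir")
    dir
  }
}
```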
```scala
joinPlanners
  .foreach { planner =>
    noException should be thrownBy {
      planner.buildPlan
    }
  }
```
🛠️ Refactor suggestion
Assert on result, not just “no exception”.
A test that only checks for absence of exceptions can silently pass on wrong output. Capture the plan and at least assert size > 0.
```diff
- joinPlanners
-   .foreach { planner =>
-     noException should be thrownBy {
-       planner.buildPlan
-     }
-   }
+ joinPlanners.foreach { planner =>
+   val plan = planner.buildPlan
+   plan.nodes.size should be > 0 // or a more specific assertion
+ }
```
🤖 Prompt for AI Agents
In api/src/test/scala/ai/chronon/api/test/planner/PlannerTest.scala around lines
24 to 29, the test currently only asserts that no exception is thrown when
calling planner.buildPlan. Modify the test to capture the result of buildPlan
and add an assertion that the resulting plan's size is greater than zero to
ensure the output is valid and not empty.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
api/src/main/scala/ai/chronon/api/ThriftJsonCodec.scala (1)
108-113: 🛠️ Refactor suggestion: prefer `assert`/`require` semantics that match the failure type.
Switching to `require` works, but note the semantic shift: `require` signals bad caller input (throws `IllegalArgumentException`), whereas this check validates an internal invariant (round-trip equality). For internal sanity checks, `assert` (or `Preconditions.checkState`) is more idiomatic. If you keep `require`, rename the `check` param to something like `validateRoundTrip` to avoid confusion.
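A standalone sketch of the semantic difference (not the project's code):

```scala
object FailureSemantics extends App {
  // require -> IllegalArgumentException: bad caller input.
  try require(1 == 2, "round-trip mismatch")
  catch { case e: IllegalArgumentException => println(s"require threw: $e") }

  // assert -> AssertionError: broken internal invariant; scalac can also
  // elide these entirely with -Xdisable-assertions.
  try assert(1 == 2, "round-trip mismatch")
  catch { case e: AssertionError => println(s"assert threw: $e") }
}
```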
🧹 Nitpick comments (1)
api/src/main/scala/ai/chronon/api/ThriftJsonCodec.scala (1)
101-104: Reuse a single `ObjectMapper`.
Creating a `new ObjectMapper()` per call is fairly heavy, and a mapper is thread-safe once configured. Consider:

```diff
- val mapper = new ObjectMapper()
- mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
+ val mapper = ThriftJsonCodec.sharedMapper
```

and add somewhere in the object:

```scala
private val sharedMapper: ObjectMapper = new ObjectMapper()
  .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
```

Cuts GC churn in hot paths.
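For reference, a self-contained version of that suggestion (the object names are illustrative, not the codebase's):

```scala
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}

object JsonMappers {
  // Built once: ObjectMapper is costly to construct but safe to share
  // across threads after configuration.
  val shared: ObjectMapper = new ObjectMapper()
    .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
}

object Demo extends App {
  // Call sites reuse the shared instance instead of allocating per call.
  println(JsonMappers.shared.readTree("""{"k": 10}""").get("k").asInt())
}
```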
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- api/src/main/scala/ai/chronon/api/ThriftJsonCodec.scala (1 hunks)
- api/src/test/scala/ai/chronon/api/test/planner/PlannerTest.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- api/src/test/scala/ai/chronon/api/test/planner/PlannerTest.scala
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: online_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: service_commons_tests
- GitHub Check: aggregator_tests
- GitHub Check: flink_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: service_tests
- GitHub Check: api_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: streaming_tests
- GitHub Check: batch_tests
- GitHub Check: join_tests
- GitHub Check: spark_tests
- GitHub Check: groupby_tests
- GitHub Check: fetcher_tests
- GitHub Check: analyzer_tests
- GitHub Check: enforce_triggered_workflows
880a999 to 54345bd (force-push)
Actionable comments posted: 1
🧹 Nitpick comments (1)
api/python/ai/chronon/resources/gcp/compiled/joins/test/data.v1 (1)
65-69: Leftover angle-bracket placeholders.
Values like `<customer_id>` or `<partition-column-name>` remain. If this file is meant for real execution (not just test fixtures), replace them to avoid runtime NPEs during env-var interpolation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- api/python/ai/chronon/resources/gcp/compiled/group_bys/test/data.group_by_v1 (1 hunks)
- api/python/ai/chronon/resources/gcp/compiled/joins/test/data.v1 (1 hunks)
- api/python/ai/chronon/resources/gcp/compiled/teams_metadata/default/default_team_metadata (1 hunks)
- api/python/ai/chronon/resources/gcp/compiled/teams_metadata/team_conf/team_conf_team_metadata (1 hunks)
- api/python/ai/chronon/resources/gcp/compiled/teams_metadata/test/test_team_metadata (1 hunks)
✅ Files skipped from review due to trivial changes (4)
- api/python/ai/chronon/resources/gcp/compiled/teams_metadata/test/test_team_metadata
- api/python/ai/chronon/resources/gcp/compiled/teams_metadata/default/default_team_metadata
- api/python/ai/chronon/resources/gcp/compiled/teams_metadata/team_conf/team_conf_team_metadata
- api/python/ai/chronon/resources/gcp/compiled/group_bys/test/data.group_by_v1
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: streaming_tests
- GitHub Check: analyzer_tests
- GitHub Check: join_tests
- GitHub Check: fetcher_tests
- GitHub Check: batch_tests
- GitHub Check: spark_tests
- GitHub Check: service_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: service_commons_tests
- GitHub Check: online_tests
- GitHub Check: api_tests
- GitHub Check: flink_tests
- GitHub Check: aggregator_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: python_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (1)
api/python/ai/chronon/resources/gcp/compiled/joins/test/data.v1 (1)
6-8: Online-flag mismatch between parent join & child groupBy.
`metaData.online` is `0` for the join, yet the nested group-by advertises `online = 1`. Planners that propagate the top-level flag may silently drop the online artifact or double-publish. Please confirm the intended behaviour.
Also applies to: 94-99
| "operation": 13, | ||
| "argMap": { | ||
| "k": "10" | ||
| } | ||
| } |
k should be an int, not a quoted string
"k": "10" is parsed as a string. The aggregation codepath expects an integer and will fail JSON → protobuf validation.
- "k": "10"
+ "k": 10📝 Committable suggestion
🤖 Prompt for AI Agents
In api/python/ai/chronon/resources/gcp/compiled/joins/test/data.v1 around lines
178 to 182, the value of "k" is currently a quoted string "10", but it should be
an integer. Remove the quotes around 10 to make it a numeric value, ensuring the
JSON to protobuf validation succeeds and the aggregation codepath works
correctly.
82a0a42 to c79a001 (force-push)
```diff
@@ -0,0 +1 @@
+api/python/test/canary/compiled/
\ No newline at end of file
```
symlinking to canary test files.
Actionable comments posted: 14
♻️ Duplicate comments (3)
api/python/test/canary/compiled/joins/gcp/training_set.v1_test (3)
16-18: Placeholder values unresolved. Same TODO placeholders; fill before merging.
Also applies to: 34-36
6-7: `online` flag inconsistent. Root = 0, inner = 1; confirm correctness.
Also applies to: 141-141
225-227: Numeric literal for `k`:

```diff
- "k": "10"
+ "k": 10
```
🧹 Nitpick comments (12)
api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_test (1)
112-114: Inline Dataproc JSON is bulky. Consider moving the long JSON into a separate file or templated helper to keep this spec readable.
api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev (2)
72-75: Hard-coded tiny parallelism.
`coalesce.factor`, `default.parallelism`, and `shuffle.partitions` are fixed at 10; risky for larger volumes. Consider param-driven sizing.
Also applies to: 102-104
112-114: Huge Dataproc spec embedded as string. Inline one-line JSON is unreadable and un-diffable. Externalize to a separate file or multi-line format.
api/python/test/canary/compiled/joins/gcp/training_set.v1_test_notds (1)
227-229: `k` should be numeric. JSON number preferred over quoted string.

```diff
- "k": "10"
+ "k": 10
```

api/python/test/canary/compiled/joins/gcp/training_set.v1_dev_notds (3)
9-10: Nested JSON-as-string hurts readability.
`customJson` is double-encoded. Prefer embedding this structure directly or moving to a separate file to avoid escape-hell.
50-78: Heavy duplication across mode configs.
`env.common` and `modeEnvironments.upload` plus corresponding `spark.*` blocks are effectively copies. Consolidate in a shared template to cut drift.
Also applies to: 80-109
115-116: Inline Dataproc spec is unwieldy. Storing a full Dataproc cluster JSON inside a single field makes reviews and updates painful. Consider referencing an external YAML/JSON resource instead.
api/python/test/canary/compiled/joins/gcp/training_set.v1_dev (4)
124-131: Make partitioning explicit.
The left event omits `partitionColumn`; relying on defaults can bite later. Add `"partitionColumn": "ds"` for clarity.
9-10: Avoid JSON-in-JSON.
Same escape issue as the other file; consider externalising `airflowDependencies`.
50-78: Reduce config duplication.
Duplicate `spark.*` blocks across modes; template or inherit to ease maintenance.
Also applies to: 80-109
115-116: Inline Dataproc definition. Long JSON string again; external file or shared reference recommended.
api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_v1 (1)
25-26: `ENABLE_PUBSUB` set but source is Kafka.
Flag may be misleading; consider a Kafka-specific toggle.
Also applies to: 125-126
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- .gitignore (1 hunks)
- api/python/test/canary/compiled/group_bys/aws/purchases.v1_dev (1 hunks)
- api/python/test/canary/compiled/group_bys/aws/purchases.v1_test (1 hunks)
- api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_pubsub_v2 (1 hunks)
- api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_v1 (1 hunks)
- api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev (1 hunks)
- api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev_notds (1 hunks)
- api/python/test/canary/compiled/group_bys/gcp/purchases.v1_test (1 hunks)
- api/python/test/canary/compiled/group_bys/gcp/purchases.v1_test_notds (1 hunks)
- api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_dev (1 hunks)
- api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_test (1 hunks)
- api/python/test/canary/compiled/joins/gcp/training_set.v1_dev (1 hunks)
- api/python/test/canary/compiled/joins/gcp/training_set.v1_dev_notds (1 hunks)
- api/python/test/canary/compiled/joins/gcp/training_set.v1_test (1 hunks)
- api/python/test/canary/compiled/joins/gcp/training_set.v1_test_notds (1 hunks)
- api/python/test/canary/compiled/teams_metadata/aws/aws_team_metadata (1 hunks)
- api/python/test/canary/compiled/teams_metadata/default/default_team_metadata (1 hunks)
- api/python/test/canary/compiled/teams_metadata/gcp/gcp_team_metadata (1 hunks)
- api/python/test/canary/compiled/teams_metadata/test/test_team_metadata (1 hunks)
✅ Files skipped from review due to trivial changes (8)
- .gitignore
- api/python/test/canary/compiled/teams_metadata/aws/aws_team_metadata
- api/python/test/canary/compiled/teams_metadata/default/default_team_metadata
- api/python/test/canary/compiled/group_bys/aws/purchases.v1_dev
- api/python/test/canary/compiled/group_bys/aws/purchases.v1_test
- api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev_notds
- api/python/test/canary/compiled/teams_metadata/gcp/gcp_team_metadata
- api/python/test/canary/compiled/teams_metadata/test/test_team_metadata
⏰ Context from checks skipped due to timeout of 90000ms (11)
- GitHub Check: service_tests
- GitHub Check: online_tests
- GitHub Check: api_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: service_commons_tests
- GitHub Check: aggregator_tests
- GitHub Check: flink_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: python_tests
- GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (5)
api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev (2)
30-31: `JOB_MODE` set to `local[*]`. Local mode defeats cluster scaling; confirm this is intentional for canary.
140-156: Verify aggregation op codes. Ops 6/7/8 must exist in the planner enum; otherwise parsing will fail.
Also applies to: 160-176, 178-194
api/python/test/canary/compiled/joins/gcp/training_set.v1_test_notds (2)
16-18: Placeholder values unresolved.
`[STREAMING-TODO]`/`[ONLINE-TODO]` placeholders remain; the pipeline will error at runtime.
Also applies to: 34-36
6-7: `online` flag inconsistent.
Top-level job marked `online: 0`, but nested `groupBy` is `online: 1`. Verify intent.
Also applies to: 142-142
api/python/test/canary/compiled/group_bys/gcp/purchases.v1_test (1)
17-19: Unreferenced partition column `ds`.
Env/Spark expect `ds`, but the query lacks `partitionColumn`; ensure the table is actually partitioned or drop these settings.
Also applies to: 50-53, 124-131
| "GCP_PROJECT_ID": "canary-443022", | ||
| "GCP_REGION": "us-central1", | ||
| "GCP_DATAPROC_CLUSTER_NAME": "zipline-canary-cluster", | ||
| "GCP_BIGTABLE_INSTANCE_ID": "zipline-canary-instance", | ||
| "FLINK_STATE_URI": "gs://zipline-warehouse-canary/flink-state", | ||
| "CLOUD_PROVIDER": "gcp", | ||
| "ENABLE_PUBSUB": "true" |
🛠️ Refactor suggestion
Hard-coded project/cluster IDs
Values like canary-443022 & zipline-canary-cluster will break when reused in another environment. Externalize to vars.
Also applies to: 38-45
🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_test around
lines 20 to 26 and 38 to 45, the project and cluster IDs such as "canary-443022"
and "zipline-canary-cluster" are hard-coded, which will cause failures in
different environments. Replace these hard-coded strings with environment
variables or configuration parameters to externalize the values, allowing the
code to adapt to different deployment contexts without modification.
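One hedged illustration of that advice in Scala (the env keys mirror the fixture; the helper object is hypothetical): read deployment-specific IDs from the environment and fail loudly when they are absent, rather than baking them into the compiled config.

```scala
object GcpConfig {
  // Hypothetical helper: a hard failure beats silently reusing another
  // environment's project or cluster.
  private def required(key: String): String =
    sys.env.getOrElse(key, sys.error(s"missing env var: $key"))

  lazy val projectId: String       = required("GCP_PROJECT_ID")
  lazy val dataprocCluster: String = required("GCP_DATAPROC_CLUSTER_NAME")
}
```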
| "HADOOP_DIR": "[STREAMING-TODO]/path/to/folder/containing", | ||
| "CHRONON_ONLINE_CLASS": "[ONLINE-TODO]your.online.class", | ||
| "CHRONON_ONLINE_ARGS": "[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>", | ||
| "PARTITION_COLUMN": "ds", | ||
| "PARTITION_FORMAT": "yyyy-MM-dd", |
TODO placeholders still present
HADOOP_DIR, CHRONON_ONLINE_CLASS, CHRONON_ONLINE_ARGS carry [STREAMING-TODO]/[ONLINE-TODO]. Replace or purge before shipping.
Also applies to: 32-37
🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_test around
lines 14 to 18 and 32 to 37, the configuration values for HADOOP_DIR,
CHRONON_ONLINE_CLASS, and CHRONON_ONLINE_ARGS still contain placeholder tags
like [STREAMING-TODO] and [ONLINE-TODO]. Replace these placeholders with actual
valid paths, class names, and argument strings relevant to your environment or
remove these entries if they are not needed before finalizing the code.
| "HADOOP_DIR": "[STREAMING-TODO]/path/to/folder/containing", | ||
| "CHRONON_ONLINE_CLASS": "[ONLINE-TODO]your.online.class", | ||
| "CHRONON_ONLINE_ARGS": "[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>", |
Remove TODO placeholders before prod
HADOOP_DIR, CHRONON_ONLINE_CLASS, and CHRONON_ONLINE_ARGS still have TODO placeholders. Leaving these will break runtime discovery.
Also applies to: 32-34
🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev at lines 14-16
and 32-34, the environment variables HADOOP_DIR, CHRONON_ONLINE_CLASS, and
CHRONON_ONLINE_ARGS contain placeholder TODO values. Replace these placeholders
with the actual production values required for runtime discovery to function
correctly, ensuring no placeholder text remains.
| "operation": 13, | ||
| "argMap": { | ||
| "k": "10" | ||
| } |
k value should be int
"k": "10" is a string; parsers may expect number.
🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev around lines
198 to 201, the value of "k" is currently a string "10" but it should be an
integer. Change the value of "k" from the string "10" to the integer 10 to
ensure parsers expecting a number handle it correctly.
| "PARTITION_COLUMN": "ds", | ||
| "PARTITION_FORMAT": "yyyy-MM-dd", | ||
| "CUSTOMER_ID": "dev", |
Partition column mismatch (ds vs notds).
PARTITION_COLUMN + Spark conf use ds, but the source and Airflow spec use notds. This will break partition pruning & scheduling.
- "PARTITION_COLUMN": "ds",
+ "PARTITION_COLUMN": "notds",
@@
- "spark.chronon.partition.column": "ds",
+ "spark.chronon.partition.column": "notds",
@@
- "partitionColumn": "notds"
+ "partitionColumn": "notds" # keep, now consistentAlso applies to: 50-53, 123-132
🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_test_notds around
lines 17 to 19 (and also lines 50-53, 123-132), the PARTITION_COLUMN is set to
"ds" while the source and Airflow spec use "notds", causing a partition column
mismatch. Update the PARTITION_COLUMN and any related Spark configuration to use
"notds" consistently to ensure partition pruning and scheduling work correctly.
| "HADOOP_DIR": "[STREAMING-TODO]/path/to/folder/containing", | ||
| "CHRONON_ONLINE_CLASS": "[ONLINE-TODO]your.online.class", | ||
| "CHRONON_ONLINE_ARGS": "[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>", | ||
| "PARTITION_COLUMN": "ds", |
🛠️ Refactor suggestion
PARTITION_COLUMN mismatch
Env sets ds, Spark conf expects _DATE; pick one to avoid empty partitions.
Also applies to: 50-51
🤖 Prompt for AI Agents
In
api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_pubsub_v2
at line 17 and also lines 50-51, the PARTITION_COLUMN is set to "ds" in the
environment but Spark configuration expects "_DATE". To fix this, ensure that
the PARTITION_COLUMN value matches the expected partition column name in Spark
configuration by changing "ds" to "_DATE" or vice versa, so that partitioning
works correctly and does not result in empty partitions.
| "HADOOP_DIR": "[STREAMING-TODO]/path/to/folder/containing", | ||
| "CHRONON_ONLINE_CLASS": "[ONLINE-TODO]your.online.class", | ||
| "CHRONON_ONLINE_ARGS": "[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>", |
Same TODO placeholders as above
Replace before CI triggers planner.
Also applies to: 32-34
🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_v1 at
lines 14-16 and 32-34, replace the placeholder TODO strings in the environment
variable values with actual valid paths, class names, and arguments required for
the Hadoop directory and Chronon online class configuration before running CI.
Ensure these values are correctly set to avoid CI failures.
| "HADOOP_DIR": "[STREAMING-TODO]/path/to/folder/containing", | ||
| "CHRONON_ONLINE_CLASS": "[ONLINE-TODO]your.online.class", | ||
| "CHRONON_ONLINE_ARGS": "[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>", | ||
| "PARTITION_COLUMN": "ds", |
🛠️ Refactor suggestion
Partition column disparity
ds vs _DATE—align to prevent write/read skew.
Also applies to: 50-51
🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_v1 at
line 17 and also lines 50-51, the partition column is set to "ds" which is
inconsistent with the expected "_DATE". To fix this, update the
"PARTITION_COLUMN" value from "ds" to "_DATE" in all these locations to ensure
consistent partitioning and prevent write/read skew.
| "HADOOP_DIR": "[STREAMING-TODO]/path/to/folder/containing", | ||
| "CHRONON_ONLINE_CLASS": "[ONLINE-TODO]your.online.class", | ||
| "CHRONON_ONLINE_ARGS": "[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>", |
TODO placeholders must be resolved before execution
HADOOP_DIR, CHRONON_ONLINE_CLASS, and CHRONON_ONLINE_ARGS still contain placeholder text ([STREAMING-TODO], [ONLINE-TODO]). If this config ever leaves the test fixture path, the job will blow up at runtime. Replace or override these values, or explicitly document why they are safe to remain.
Also applies to: 32-34
🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_dev around
lines 14 to 16 and 32 to 34, the environment variables HADOOP_DIR,
CHRONON_ONLINE_CLASS, and CHRONON_ONLINE_ARGS contain unresolved placeholder
text like [STREAMING-TODO] and [ONLINE-TODO]. Replace these placeholders with
actual valid paths, class names, and argument strings appropriate for the
environment, or override them with correct values before execution to prevent
runtime failures. If these placeholders must remain, add explicit comments
explaining why they are safe in this context.
| "operation": 13, | ||
| "argMap": { | ||
| "k": "10" | ||
| } |
💡 Verification agent
❓ Verification inconclusive
k should be numeric, not a quoted string
Down-stream aggregation code usually deserialises argMap values as numbers. Keep it an int to avoid parsing hiccups.
- "k": "10"
+ "k": 10Convert argMap “k” to an integer
ArgMap values are deserialized as numbers—quoting “10” may break downstream parsing.
• File: api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_dev
Lines: 198–201
- "k": "10"
+ "k": 10📝 Committable suggestion
🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_dev around
lines 198 to 201, the value of "k" in argMap is currently a string "10". Change
this value to an integer 10 by removing the quotes to ensure downstream
aggregation code correctly interprets it as a number and avoids parsing issues.
9de75a3 to 23a3d4b (force-push)
da7c3ab to 84d7004 (force-push)
```scala
val terminalNodeNames = Option(finalNode)
  .map(_.metaData)
  .map(_.name)
  .map((name) =>
    Map(
      planner.Mode.BACKFILL -> name
    ))
  .getOrElse(Map.empty)
  .asJava
```
for comprehension?
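A sketch of that rewrite, reusing the identifiers from the quoted snippet; note the extra `Option` wraps guard `metaData` and `name` against null, which the original chain assumed non-null (an assumption, not verified against the codebase):

```scala
val terminalNodeNames =
  (for {
    node <- Option(finalNode)
    meta <- Option(node.metaData)
    name <- Option(meta.name)
  } yield Map(planner.Mode.BACKFILL -> name))
    .getOrElse(Map.empty)
    .asJava
```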
81f928c to fe36748 (force-push)
Summary
Checklist
Summary by CodeRabbit
New Features
Bug Fixes