Conversation

@tchow-zlai (Collaborator) commented Jun 13, 2025

Summary

  • Symlink the Python compiled directory and check it in.

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features

    • Added a new test suite to verify join planner functionality.
    • Added multiple JSON configuration files defining data processing jobs, aggregations, and training sets for AWS and GCP environments.
    • Introduced new team metadata configurations for AWS, GCP, default, and test environments.
    • Added resource files supporting test data and environment configurations.
  • Bug Fixes

    • Improved error handling for JSON parsing by replacing assertions with precondition checks.

coderabbitai bot commented Jun 13, 2025

Walkthrough

A new Scala test suite verifies that join planner configurations parse and plan without exceptions. Test resources, including a new resource file under the Spark test tree, were added and wired into the Bazel build. An assertion was replaced by a precondition check in JSON deserialization, and the MonolithJoinPlanner class was converted to a case class. Numerous JSON configuration files for data pipelines, joins, and team metadata were added under api/python/test/canary/compiled/. The .gitignore was updated to drop stale entries and add a new ignore for GCP compiled resources.

Changes

File(s) Change Summary
api/BUILD.bazel Added data attribute with test resource dependency to scala_test_suite named tests.
api/src/test/scala/ai/chronon/api/test/planner/PlannerTest.scala Added PlannerTest verifying join planner builds plans without exceptions.
spark/src/test/resources/canary/compiled Added file referencing compiled resource path for tests.
api/src/main/scala/ai/chronon/api/ThriftJsonCodec.scala Replaced assert with require in fromJsonStr method for JSON equivalence check.
api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala Converted MonolithJoinPlanner from class to case class; safely handle optional finalNode name.
api/python/test/canary/compiled/group_bys/aws/purchases.v1_dev Added JSON config for AWS purchases data aggregation job.
api/python/test/canary/compiled/group_bys/aws/purchases.v1_test Added JSON test config for AWS purchases group-by job.
api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_pubsub_v2 Added JSON config for GCP item event canary Pub/Sub aggregation job.
api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_v1 Added JSON config for GCP item event canary Kafka aggregation job.
api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev Added JSON config for GCP purchases data aggregation job.
api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev_notds Added JSON config for GCP purchases aggregation job on non-TDS source.
api/python/test/canary/compiled/group_bys/gcp/purchases.v1_test Added JSON test config for GCP purchases data pipeline.
api/python/test/canary/compiled/group_bys/gcp/purchases.v1_test_notds Added JSON config for GCP purchases test job on non-TDS source.
api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_dev Added JSON config for GCP purchases native view aggregation job.
api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_test Added JSON test config for GCP purchases native view aggregation job.
api/python/test/canary/compiled/joins/gcp/training_set.v1_dev Added JSON config for GCP training set join pipeline.
api/python/test/canary/compiled/joins/gcp/training_set.v1_dev_notds Added JSON config for GCP training set join pipeline on non-TDS source.
api/python/test/canary/compiled/joins/gcp/training_set.v1_test Added JSON test config for GCP training set join pipeline.
api/python/test/canary/compiled/joins/gcp/training_set.v1_test_notds Added JSON config for GCP training set test join pipeline on non-TDS source.
api/python/test/canary/compiled/teams_metadata/aws/aws_team_metadata Added JSON config defining AWS team execution environment and config metadata.
api/python/test/canary/compiled/teams_metadata/default/default_team_metadata Added JSON config defining default team execution environment and config metadata.
api/python/test/canary/compiled/teams_metadata/gcp/gcp_team_metadata Added JSON config defining GCP team execution environment, Spark, and cluster configs.
api/python/test/canary/compiled/teams_metadata/test/test_team_metadata Added JSON config defining test team execution environment and config settings.
.gitignore Removed old ignored paths; added new ignore for api/python/ai/chronon/resources/gcp/compiled/.

Sequence Diagram(s)

sequenceDiagram
  participant TestRunner as PlannerTest
  participant Env as Environment
  participant FileSystem as File System
  participant LocalRunner
  participant MonolithJoinPlanner

  TestRunner->>Env: Read RUNFILES_DIR
  TestRunner->>FileSystem: Locate compiled join configs
  TestRunner->>LocalRunner: Parse configs into Join objects
  loop For each Join object
    TestRunner->>MonolithJoinPlanner: Create planner
    TestRunner->>MonolithJoinPlanner: buildPlan()
    MonolithJoinPlanner-->>TestRunner: No exception thrown
  end
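
Reading the diagram alongside the test snippet quoted later in this review, the suite plausibly looks like the sketch below. The RUNFILES_DIR sub-path, the LocalRunner.parseConfs helper, and the imports are assumptions inferred from the walkthrough, not code confirmed by the diff:

import ai.chronon.api.{Join, PartitionSpec}        // package paths assumed from the file layout
import ai.chronon.api.planner.MonolithJoinPlanner
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class PlannerTest extends AnyFlatSpec with Matchers {

  private implicit val testPartitionSpec: PartitionSpec = PartitionSpec.daily

  "MonolithJoinPlanner" should "build plans for every compiled join config" in {
    // RUNFILES_DIR is provided by Bazel at test time; the sub-path is an assumption.
    val runfiles = sys.env("RUNFILES_DIR")
    val joinDir  = s"$runfiles/api/python/test/canary/compiled/joins"

    // parseConfs is a hypothetical LocalRunner helper; the diff does not show its signature.
    val joins: Seq[Join] = LocalRunner.parseConfs[Join](joinDir)

    // The case-class conversion lets planners be constructed without `new`.
    val joinPlanners = joins.map(MonolithJoinPlanner(_))

    joinPlanners.foreach { planner =>
      noException should be thrownBy planner.buildPlan
    }
  }
}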

Suggested reviewers

  • nikhil-zlai

Poem

In test we trust, with joins to parse,
A planner’s path, both near and far.
Resources linked, the configs found,
No errors rise, no faults abound.
With Bazel’s aid and Scala’s might,
Our plans are built, the future bright!
🛠️✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81f928c and 4f71651.

📒 Files selected for processing (1)
  • api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala
⏰ Context from checks skipped due to timeout of 90000ms (18)
  • GitHub Check: streaming_tests
  • GitHub Check: join_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: groupby_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: batch_tests
  • GitHub Check: spark_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: service_tests
  • GitHub Check: flink_tests
  • GitHub Check: api_tests
  • GitHub Check: online_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: python_tests
  • GitHub Check: enforce_triggered_workflows


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
api/src/test/scala/ai/chronon/api/test/planner/PlannerTest.scala (1)

12-12: Mark implicit explicitly.

Add the type for clarity:

-private implicit val testPartitionSpec = PartitionSpec.daily
+private implicit val testPartitionSpec: PartitionSpec = PartitionSpec.daily
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44b0958 and 18095da.

⛔ Files ignored due to path filters (1)
  • spark/src/test/resources/local_data_csv/test_table_1_data.csv is excluded by !**/*.csv
📒 Files selected for processing (9)
  • api/BUILD.bazel (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/planner/PlannerTest.scala (1 hunks)
  • spark/src/test/resources/canary/compiled (1 hunks)
  • spark/src/test/resources/group_bys/team/example_group_by.v1 (0 hunks)
  • spark/src/test/resources/group_bys/team/purchases.v1 (0 hunks)
  • spark/src/test/resources/group_bys/team/purchases_only_conf_common.v1 (0 hunks)
  • spark/src/test/resources/joins/team/example_join.v1 (0 hunks)
  • spark/src/test/resources/joins/team/example_join_failure.v1 (0 hunks)
  • spark/src/test/resources/teams_metadata/default_team_metadata (0 hunks)
💤 Files with no reviewable changes (6)
  • spark/src/test/resources/teams_metadata/default_team_metadata
  • spark/src/test/resources/group_bys/team/example_group_by.v1
  • spark/src/test/resources/joins/team/example_join.v1
  • spark/src/test/resources/group_bys/team/purchases.v1
  • spark/src/test/resources/group_bys/team/purchases_only_conf_common.v1
  • spark/src/test/resources/joins/team/example_join_failure.v1
⏰ Context from checks skipped due to timeout of 90000ms (16)
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: streaming_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: batch_tests
  • GitHub Check: spark_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: service_tests
  • GitHub Check: api_tests
  • GitHub Check: online_tests
  • GitHub Check: flink_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (1)
api/BUILD.bazel (1)

64-66: Ensure the test-resources target exists & is propagated.

If //spark/src/test/resources:test-resources isn’t defined with visibility = ["//visibility:public"] the test will break at runtime. Double-check the BUILD under spark/src/test/resources (or add one) so Bazel can copy the files.

Comment on lines 24 to 29
joinPlanners
  .foreach { planner =>
    noException should be thrownBy {
      planner.buildPlan
    }
  }

🛠️ Refactor suggestion

Assert on result, not just “no exception”.

A test that only checks for absence of exceptions can silently pass on wrong output. Capture the plan and at least assert size > 0.

-    joinPlanners
-      .foreach { planner =>
-        noException should be thrownBy {
-          planner.buildPlan
-        }
-      }
+    joinPlanners.foreach { planner =>
+      val plan = planner.buildPlan
+      plan.nodes.size should be > 0 // or a more specific assertion
+    }
🤖 Prompt for AI Agents
In api/src/test/scala/ai/chronon/api/test/planner/PlannerTest.scala around lines
24 to 29, the test currently only asserts that no exception is thrown when
calling planner.buildPlan. Modify the test to capture the result of buildPlan
and add an assertion that the resulting plan's size is greater than zero to
ensure the output is valid and not empty.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
api/src/main/scala/ai/chronon/api/ThriftJsonCodec.scala (1)

108-113: 🛠️ Refactor suggestion

Make the assert/require choice match the failure type

Switching to require works, but note the semantic shift:
require signals bad caller input (throws IllegalArgumentException), whereas this check validates an internal invariant (round-trip equality). For internal sanity checks, assert (or Preconditions.checkState) is more idiomatic. If you keep require, rename the check parameter to something like validateRoundTrip to avoid confusion.
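
For reference, a minimal sketch of the three failure types being contrasted (illustrative names, not the codec's actual code):

import com.google.common.base.Preconditions

object FailureSemantics {
  def demo(parsed: String, roundTripped: String): Unit = {
    // require: signals bad caller input -> IllegalArgumentException
    require(parsed.nonEmpty, "input JSON must be non-empty")
    // assert: internal invariant -> AssertionError, and elided entirely under -Xdisable-assertions
    assert(parsed == roundTripped, "round-trip mismatch")
    // Preconditions.checkState: internal state check -> IllegalStateException
    Preconditions.checkState(parsed == roundTripped, "round-trip mismatch for: %s", parsed)
  }
}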

🧹 Nitpick comments (1)
api/src/main/scala/ai/chronon/api/ThriftJsonCodec.scala (1)

101-104: Reuse a single ObjectMapper

Creating a new ObjectMapper() per call is fairly heavy, and the mapper is thread-safe once configured, so it can be shared. Consider:

-    val mapper = new ObjectMapper()
-    mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
+    val mapper = ThriftJsonCodec.sharedMapper

and add somewhere in the object:

private val sharedMapper: ObjectMapper = new ObjectMapper()
  .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)

Cuts GC churn in hot paths.
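
Put together, one self-contained shape for this could be the following sketch; jsonEquivalent is an illustrative name, not ThriftJsonCodec's real API:

import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}

object JsonCompare {
  // Created once; ObjectMapper is thread-safe after configuration.
  private val sharedMapper: ObjectMapper = new ObjectMapper()
    .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)

  // Round-trip equivalence in the spirit of fromJsonStr's check:
  // readTree yields JsonNode trees whose equality is deep and field-order-insensitive.
  def jsonEquivalent(original: String, reSerialized: String): Boolean =
    sharedMapper.readTree(original) == sharedMapper.readTree(reSerialized)
}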

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18095da and cb0aef6.

📒 Files selected for processing (2)
  • api/src/main/scala/ai/chronon/api/ThriftJsonCodec.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/planner/PlannerTest.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/src/test/scala/ai/chronon/api/test/planner/PlannerTest.scala
⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: online_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: flink_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: service_tests
  • GitHub Check: api_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: streaming_tests
  • GitHub Check: batch_tests
  • GitHub Check: join_tests
  • GitHub Check: spark_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: enforce_triggered_workflows

@tchow-zlai force-pushed the tchow/align-confs branch 2 times, most recently from 880a999 to 54345bd, June 13, 2025 20:55
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
api/python/ai/chronon/resources/gcp/compiled/joins/test/data.v1 (1)

65-69: Leftover angle-bracket placeholders
Values like <customer_id> or <partition-column-name> remain. If this file is meant for real execution (not just test fixtures), replace them to avoid runtime NPEs during env-var interpolation.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54345bd and 82a0a42.

📒 Files selected for processing (5)
  • api/python/ai/chronon/resources/gcp/compiled/group_bys/test/data.group_by_v1 (1 hunks)
  • api/python/ai/chronon/resources/gcp/compiled/joins/test/data.v1 (1 hunks)
  • api/python/ai/chronon/resources/gcp/compiled/teams_metadata/default/default_team_metadata (1 hunks)
  • api/python/ai/chronon/resources/gcp/compiled/teams_metadata/team_conf/team_conf_team_metadata (1 hunks)
  • api/python/ai/chronon/resources/gcp/compiled/teams_metadata/test/test_team_metadata (1 hunks)
✅ Files skipped from review due to trivial changes (4)
  • api/python/ai/chronon/resources/gcp/compiled/teams_metadata/test/test_team_metadata
  • api/python/ai/chronon/resources/gcp/compiled/teams_metadata/default/default_team_metadata
  • api/python/ai/chronon/resources/gcp/compiled/teams_metadata/team_conf/team_conf_team_metadata
  • api/python/ai/chronon/resources/gcp/compiled/group_bys/test/data.group_by_v1
⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: streaming_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: batch_tests
  • GitHub Check: spark_tests
  • GitHub Check: service_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: online_tests
  • GitHub Check: api_tests
  • GitHub Check: flink_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: python_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (1)
api/python/ai/chronon/resources/gcp/compiled/joins/test/data.v1 (1)

6-8: Online-flag mismatch between parent join & child groupBy
metaData.online is 0 for the join, yet the nested group-by advertises online = 1. Planners that propagate the top-level flag may silently drop the online artifact or double-publish. Please confirm the intended behaviour.

Also applies to: 94-99

Comment on lines 178 to 182
"operation": 13,
"argMap": {
"k": "10"
}
}

⚠️ Potential issue

k should be an int, not a quoted string
"k": "10" is parsed as a string. The aggregation codepath expects an integer and will fail JSON → protobuf validation.

-              "k": "10"
+              "k": 10
🤖 Prompt for AI Agents
In api/python/ai/chronon/resources/gcp/compiled/joins/test/data.v1 around lines
178 to 182, the value of "k" is currently a quoted string "10", but it should be
an integer. Remove the quotes around 10 to make it a numeric value, ensuring the
JSON to protobuf validation succeeds and the aggregation codepath works
correctly.

@tchow-zlai changed the title from "add planner tests" to "feat: add planner tests" Jun 13, 2025
@@ -0,0 +1 @@
+api/python/test/canary/compiled/
\ No newline at end of file
@tchow-zlai (Collaborator, Author) commented:

symlinking to canary test files.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

♻️ Duplicate comments (3)
api/python/test/canary/compiled/joins/gcp/training_set.v1_test (3)

16-18: Placeholder values unresolved

Same TODO placeholders; fill before merging.

Also applies to: 34-36


6-7: online flag inconsistent

Root = 0, inner = 1; confirm correctness.

Also applies to: 141-141


225-227: Numeric literal for k

-              "k": "10"
+              "k": 10
🧹 Nitpick comments (12)
api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_test (1)

112-114: Inline Dataproc JSON is bulky

Consider moving the long JSON into a separate file or templated helper to keep this spec readable.

api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev (2)

72-75: Hard-coded tiny parallelism

coalesce.factor, default.parallelism, and shuffle.partitions are fixed at 10; risky for larger volumes. Consider param-driven sizing.

Also applies to: 102-104


112-114: Huge Dataproc spec embedded as string

Inline 1-line JSON is unreadable and un-diffable. Externalize to a separate file or multi-line format.

api/python/test/canary/compiled/joins/gcp/training_set.v1_test_notds (1)

227-229: k should be numeric

JSON number preferred over quoted string.

-              "k": "10"
+              "k": 10
api/python/test/canary/compiled/joins/gcp/training_set.v1_dev_notds (3)

9-10: Nested JSON-as-string hurts readability

customJson is double-encoded. Prefer embedding this structure directly or moving to a separate file to avoid escape-hell.


50-78: Heavy duplication across mode configs

env.common and modeEnvironments.upload plus corresponding spark.* blocks are effectively copies. Consolidate in a shared template to cut drift.

Also applies to: 80-109


115-116: Inline Dataproc spec is unwieldy

Storing a full Dataproc cluster JSON inside a single field makes reviews and updates painful. Consider referencing an external YAML/JSON resource instead.

api/python/test/canary/compiled/joins/gcp/training_set.v1_dev (4)

124-131: Make partitioning explicit

The left event omits partitionColumn; relying on defaults can bite later. Add "partitionColumn": "ds" for clarity.


9-10: Avoid JSON-in-JSON

Same escape issue as other file – consider externalising airflowDependencies.


50-78: Reduce config duplication

Duplicate spark.* blocks across modes; template or inherit to ease maintenance.

Also applies to: 80-109


115-116: Inline Dataproc definition

Long JSON string again; external file or shared reference recommended.

api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_v1 (1)

25-26: ENABLE_PUBSUB set but source is Kafka
Flag may be misleading; consider a Kafka-specific toggle.

Also applies to: 125-126

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a520b5 and 7b4d6b0.

📒 Files selected for processing (19)
  • .gitignore (1 hunks)
  • api/python/test/canary/compiled/group_bys/aws/purchases.v1_dev (1 hunks)
  • api/python/test/canary/compiled/group_bys/aws/purchases.v1_test (1 hunks)
  • api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_pubsub_v2 (1 hunks)
  • api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_v1 (1 hunks)
  • api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev (1 hunks)
  • api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev_notds (1 hunks)
  • api/python/test/canary/compiled/group_bys/gcp/purchases.v1_test (1 hunks)
  • api/python/test/canary/compiled/group_bys/gcp/purchases.v1_test_notds (1 hunks)
  • api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_dev (1 hunks)
  • api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_test (1 hunks)
  • api/python/test/canary/compiled/joins/gcp/training_set.v1_dev (1 hunks)
  • api/python/test/canary/compiled/joins/gcp/training_set.v1_dev_notds (1 hunks)
  • api/python/test/canary/compiled/joins/gcp/training_set.v1_test (1 hunks)
  • api/python/test/canary/compiled/joins/gcp/training_set.v1_test_notds (1 hunks)
  • api/python/test/canary/compiled/teams_metadata/aws/aws_team_metadata (1 hunks)
  • api/python/test/canary/compiled/teams_metadata/default/default_team_metadata (1 hunks)
  • api/python/test/canary/compiled/teams_metadata/gcp/gcp_team_metadata (1 hunks)
  • api/python/test/canary/compiled/teams_metadata/test/test_team_metadata (1 hunks)
✅ Files skipped from review due to trivial changes (8)
  • .gitignore
  • api/python/test/canary/compiled/teams_metadata/aws/aws_team_metadata
  • api/python/test/canary/compiled/teams_metadata/default/default_team_metadata
  • api/python/test/canary/compiled/group_bys/aws/purchases.v1_dev
  • api/python/test/canary/compiled/group_bys/aws/purchases.v1_test
  • api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev_notds
  • api/python/test/canary/compiled/teams_metadata/gcp/gcp_team_metadata
  • api/python/test/canary/compiled/teams_metadata/test/test_team_metadata
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: service_tests
  • GitHub Check: online_tests
  • GitHub Check: api_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: flink_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: python_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (5)
api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev (2)

30-31: JOB_MODE set to local[*]

Local mode defeats cluster scaling; confirm this is intentional for canary.


140-156: Verify aggregation op codes

Ops 6/7/8 must exist in planner enum; otherwise parsing will fail.

Also applies to: 160-176, 178-194

api/python/test/canary/compiled/joins/gcp/training_set.v1_test_notds (2)

16-18: Placeholder values unresolved

[STREAMING-TODO] / [ONLINE-TODO] placeholders remain; pipeline will error at runtime.

Also applies to: 34-36


6-7: online flag inconsistent

Top-level job marked online: 0, but nested groupBy is online: 1. Verify intent.

Also applies to: 142-142

api/python/test/canary/compiled/group_bys/gcp/purchases.v1_test (1)

17-19: Unreferenced partition column ds.
Env/Spark expect ds, but the query lacks partitionColumn; ensure table is actually partitioned or drop these settings.

Also applies to: 50-53, 124-131

Comment on lines +20 to +26
"GCP_PROJECT_ID": "canary-443022",
"GCP_REGION": "us-central1",
"GCP_DATAPROC_CLUSTER_NAME": "zipline-canary-cluster",
"GCP_BIGTABLE_INSTANCE_ID": "zipline-canary-instance",
"FLINK_STATE_URI": "gs://zipline-warehouse-canary/flink-state",
"CLOUD_PROVIDER": "gcp",
"ENABLE_PUBSUB": "true"

🛠️ Refactor suggestion

Hard-coded project/cluster IDs

Values like canary-443022 & zipline-canary-cluster will break when reused in another environment. Externalize to vars.

Also applies to: 38-45

🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_test around
lines 20 to 26 and 38 to 45, the project and cluster IDs such as "canary-443022"
and "zipline-canary-cluster" are hard-coded, which will cause failures in
different environments. Replace these hard-coded strings with environment
variables or configuration parameters to externalize the values, allowing the
code to adapt to different deployment contexts without modification.

Comment on lines +14 to +18
"HADOOP_DIR": "[STREAMING-TODO]/path/to/folder/containing",
"CHRONON_ONLINE_CLASS": "[ONLINE-TODO]your.online.class",
"CHRONON_ONLINE_ARGS": "[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>",
"PARTITION_COLUMN": "ds",
"PARTITION_FORMAT": "yyyy-MM-dd",

⚠️ Potential issue

TODO placeholders still present

HADOOP_DIR, CHRONON_ONLINE_CLASS, CHRONON_ONLINE_ARGS carry [STREAMING-TODO]/[ONLINE-TODO]. Replace or purge before shipping.

Also applies to: 32-37

🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_test around
lines 14 to 18 and 32 to 37, the configuration values for HADOOP_DIR,
CHRONON_ONLINE_CLASS, and CHRONON_ONLINE_ARGS still contain placeholder tags
like [STREAMING-TODO] and [ONLINE-TODO]. Replace these placeholders with actual
valid paths, class names, and argument strings relevant to your environment or
remove these entries if they are not needed before finalizing the code.

Comment on lines +14 to +16
"HADOOP_DIR": "[STREAMING-TODO]/path/to/folder/containing",
"CHRONON_ONLINE_CLASS": "[ONLINE-TODO]your.online.class",
"CHRONON_ONLINE_ARGS": "[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>",

⚠️ Potential issue

Remove TODO placeholders before prod

HADOOP_DIR, CHRONON_ONLINE_CLASS, and CHRONON_ONLINE_ARGS still have TODO placeholders. Leaving these will break runtime discovery.

Also applies to: 32-34

🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev at lines 14-16
and 32-34, the environment variables HADOOP_DIR, CHRONON_ONLINE_CLASS, and
CHRONON_ONLINE_ARGS contain placeholder TODO values. Replace these placeholders
with the actual production values required for runtime discovery to function
correctly, ensuring no placeholder text remains.

Comment on lines +198 to +201
"operation": 13,
"argMap": {
"k": "10"
}

⚠️ Potential issue

k value should be int

"k": "10" is a string; parsers may expect number.

🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_dev around lines
198 to 201, the value of "k" is currently a string "10" but it should be an
integer. Change the value of "k" from the string "10" to the integer 10 to
ensure parsers expecting a number handle it correctly.

Comment on lines +17 to +19
"PARTITION_COLUMN": "ds",
"PARTITION_FORMAT": "yyyy-MM-dd",
"CUSTOMER_ID": "dev",

⚠️ Potential issue

Partition column mismatch (ds vs notds).
PARTITION_COLUMN + Spark conf use ds, but the source and Airflow spec use notds. This will break partition pruning & scheduling.

-          "PARTITION_COLUMN": "ds",
+          "PARTITION_COLUMN": "notds",
@@
-          "spark.chronon.partition.column": "ds",
+          "spark.chronon.partition.column": "notds",
@@
-          "partitionColumn": "notds"
+          "partitionColumn": "notds"   # keep, now consistent

Also applies to: 50-53, 123-132

🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_test_notds around
lines 17 to 19 (and also lines 50-53, 123-132), the PARTITION_COLUMN is set to
"ds" while the source and Airflow spec use "notds", causing a partition column
mismatch. Update the PARTITION_COLUMN and any related Spark configuration to use
"notds" consistently to ensure partition pruning and scheduling work correctly.

"HADOOP_DIR": "[STREAMING-TODO]/path/to/folder/containing",
"CHRONON_ONLINE_CLASS": "[ONLINE-TODO]your.online.class",
"CHRONON_ONLINE_ARGS": "[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>",
"PARTITION_COLUMN": "ds",

🛠️ Refactor suggestion

PARTITION_COLUMN mismatch
Env sets ds, Spark conf expects _DATE; pick one to avoid empty partitions.

Also applies to: 50-51

🤖 Prompt for AI Agents
In
api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_pubsub_v2
at line 17 and also lines 50-51, the PARTITION_COLUMN is set to "ds" in the
environment but Spark configuration expects "_DATE". To fix this, ensure that
the PARTITION_COLUMN value matches the expected partition column name in Spark
configuration by changing "ds" to "_DATE" or vice versa, so that partitioning
works correctly and does not result in empty partitions.

Comment on lines +14 to +16
"HADOOP_DIR": "[STREAMING-TODO]/path/to/folder/containing",
"CHRONON_ONLINE_CLASS": "[ONLINE-TODO]your.online.class",
"CHRONON_ONLINE_ARGS": "[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>",

⚠️ Potential issue

Same TODO placeholders as above
Replace before CI triggers planner.

Also applies to: 32-34

🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_v1 at
lines 14-16 and 32-34, replace the placeholder TODO strings in the environment
variable values with actual valid paths, class names, and arguments required for
the Hadoop directory and Chronon online class configuration before running CI.
Ensure these values are correctly set to avoid CI failures.

"HADOOP_DIR": "[STREAMING-TODO]/path/to/folder/containing",
"CHRONON_ONLINE_CLASS": "[ONLINE-TODO]your.online.class",
"CHRONON_ONLINE_ARGS": "[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>",
"PARTITION_COLUMN": "ds",

🛠️ Refactor suggestion

Partition column disparity
ds vs _DATE—align to prevent write/read skew.

Also applies to: 50-51

🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/item_event_canary.actions_v1 at
line 17 and also lines 50-51, the partition column is set to "ds" which is
inconsistent with the expected "_DATE". To fix this, update the
"PARTITION_COLUMN" value from "ds" to "_DATE" in all these locations to ensure
consistent partitioning and prevent write/read skew.

Comment on lines +14 to +16
"HADOOP_DIR": "[STREAMING-TODO]/path/to/folder/containing",
"CHRONON_ONLINE_CLASS": "[ONLINE-TODO]your.online.class",
"CHRONON_ONLINE_ARGS": "[ONLINE-TODO]args prefixed with -Z become constructor map for your implementation of ai.chronon.online.Api, -Zkv-host=<YOUR_HOST> -Zkv-port=<YOUR_PORT>",

⚠️ Potential issue

TODO placeholders must be resolved before execution

HADOOP_DIR, CHRONON_ONLINE_CLASS, and CHRONON_ONLINE_ARGS still contain placeholder text ([STREAMING-TODO], [ONLINE-TODO]). If this config ever leaves the test fixture path, the job will blow up at runtime. Replace or override these values, or explicitly document why they are safe to remain.

Also applies to: 32-34

🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_dev around
lines 14 to 16 and 32 to 34, the environment variables HADOOP_DIR,
CHRONON_ONLINE_CLASS, and CHRONON_ONLINE_ARGS contain unresolved placeholder
text like [STREAMING-TODO] and [ONLINE-TODO]. Replace these placeholders with
actual valid paths, class names, and argument strings appropriate for the
environment, or override them with correct values before execution to prevent
runtime failures. If these placeholders must remain, add explicit comments
explaining why they are safe in this context.

Comment on lines +198 to +201
"operation": 13,
"argMap": {
"k": "10"
}

💡 Verification agent

❓ Verification inconclusive

k should be numeric, not a quoted string

Down-stream aggregation code usually deserialises argMap values as numbers. Keep it an int to avoid parsing hiccups.

-        "k": "10"
+        "k": 10

• File: api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_dev
  Lines: 198–201
🤖 Prompt for AI Agents
In api/python/test/canary/compiled/group_bys/gcp/purchases.v1_view_dev around
lines 198 to 201, the value of "k" in argMap is currently a string "10". Change
this value to an integer 10 by removing the quotes to ensure downstream
aggregation code correctly interprets it as a number and avoids parsing issues.

@tchow-zlai force-pushed the tchow/align-confs branch 2 times, most recently from 9de75a3 to 23a3d4b, June 13, 2025 22:57
Comment on lines 29 to 37
val terminalNodeNames = Option(finalNode)
  .map(_.metaData)
  .map(_.name)
  .map((name) =>
    Map(
      planner.Mode.BACKFILL -> name
    ))
  .getOrElse(Map.empty)
  .asJava

for comprehension?
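
A sketch of what that might look like (note it also Option-wraps the intermediate steps, which the .map chain above would NPE on if metaData came back null):

val terminalNodeNames = (for {
  node <- Option(finalNode)
  meta <- Option(node.metaData)
  name <- Option(meta.name)
} yield Map(planner.Mode.BACKFILL -> name))
  .getOrElse(Map.empty)
  .asJava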

@tchow-zlai merged commit e72566e into main Jun 14, 2025
48 of 58 checks passed
@tchow-zlai deleted the tchow/align-confs branch June 14, 2025 02:25