feat: complex stagingquery canary #1003

tchow-zlai · 2025-07-28T17:54:38Z

Summary

Checklist

Added Unit Tests
Covered by existing CI
Integration tested
Documentation update

Summary by CodeRabbit

New Features
- Added multiple new staging query configurations for GCP, supporting advanced scheduling, environment, and cluster setup options.
- Introduced new terminal staging query with dependencies on outputs from multiple staging queries.
Bug Fixes
- Updated output namespace in an existing staging query configuration for improved consistency.
Refactor
- Streamlined the creation of staging queries for easier maintenance and scalability.

coderabbitai · 2025-07-28T17:54:47Z

Walkthrough

Multiple new JSON configuration files for GCP staging queries (v2__0 through v6__0 and terminal_v1__0) were added, each specifying metadata, environment, Spark, cluster, scheduling, and dependency details. The Python staging query code was refactored to create staging queries through a factory function, versions v2–v6 were added, and the terminal_v1 dependencies were updated.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| New GCP Staging Query Configurations `.../gcp/sample_staging_query.v2__0`, `.../gcp/sample_staging_query.v3__0`, `.../gcp/sample_staging_query.v4__0`, `.../gcp/sample_staging_query.v5__0`, `.../gcp/sample_staging_query.v6__0` | Added new JSON configs for staging queries v2–v6 with detailed metadata, environment, Spark, cluster, scheduling, and dependency info. |
| Terminal Staging Query Configuration `.../gcp/sample_staging_query.terminal_v1__0` | Added new JSON config for terminal staging query, defining metadata, dependencies on v1–v6 outputs, environment, Spark, cluster, and scheduling. |
| Existing Staging Query Config Update `.../gcp/sample_staging_query.v1__0` | Updated `outputNamespace` in metadata from `data` to `sample_namespace`. |
| Python Staging Query Refactor `.../gcp/sample_staging_query.py` | Refactored to use a factory function for staging queries, added instances for v2–v6, and redefined `terminal_v1` to depend on v1–v6 outputs. |

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler
    participant StagingQuery_v1
    participant StagingQuery_v2
    participant StagingQuery_v3
    participant StagingQuery_v4
    participant StagingQuery_v5
    participant StagingQuery_v6
    participant Terminal_v1

    Scheduler->>StagingQuery_v1: Trigger (daily)
    Scheduler->>StagingQuery_v2: Trigger (daily)
    Scheduler->>StagingQuery_v3: Trigger (daily)
    Scheduler->>StagingQuery_v4: Trigger (daily)
    Scheduler->>StagingQuery_v5: Trigger (daily)
    Scheduler->>StagingQuery_v6: Trigger (daily)
    StagingQuery_v1-->>Terminal_v1: Output table
    StagingQuery_v2-->>Terminal_v1: Output table
    StagingQuery_v3-->>Terminal_v1: Output table
    StagingQuery_v4-->>Terminal_v1: Output table
    StagingQuery_v5-->>Terminal_v1: Output table
    StagingQuery_v6-->>Terminal_v1: Output table
    Scheduler->>Terminal_v1: Trigger after dependencies
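
The ordering in the diagram (v1 through v6 first, then the terminal query) can be sketched with Python's standard-library `graphlib`. This only illustrates the dependency order; it is not how the project's scheduler is implemented, and the node names are shorthand chosen for the example.

```python
from graphlib import TopologicalSorter

# Map each node to the set of nodes it depends on.
# The names are illustrative shorthand, not identifiers from the repository.
upstream = {f"v{i}" for i in range(1, 7)}
graph = {"terminal_v1": upstream}

# static_order() yields every node after all of its dependencies.
run_order = list(TopologicalSorter(graph).static_order())

print(run_order)
```

The six upstream queries have no edges between them, so their relative order is unspecified; only the position of `terminal_v1` at the end is guaranteed.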

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

nikhil-zlai
varant-zlai
piyush-zlai
david-zlai

Poem

Staging queries line up, v1 through v6,
Each with its config, all ready to mix.
Terminal waits for their tables to land,
A daily parade, orchestrated and planned.
Refactored in Python, dependencies anew—
GCP’s pipelines march smartly on through! 🚦


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13f3d59 and 34d36e8.

📒 Files selected for processing (8)

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.terminal_v1__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v2__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v3__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v4__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v5__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v6__0 (1 hunks)
api/python/test/canary/staging_queries/gcp/sample_staging_query.py (2 hunks)

✅ Files skipped from review due to trivial changes (3)

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1__0
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v4__0
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v3__0

🚧 Files skipped from review as they are similar to previous changes (5)

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v5__0
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.terminal_v1__0
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v6__0
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v2__0
api/python/test/canary/staging_queries/gcp/sample_staging_query.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)

GitHub Check: service_commons_tests
GitHub Check: service_tests
GitHub Check: cloud_gcp_tests
GitHub Check: cloud_aws_tests
GitHub Check: join_tests
GitHub Check: analyzer_tests
GitHub Check: groupby_tests
GitHub Check: streaming_tests
GitHub Check: api_tests
GitHub Check: online_tests
GitHub Check: fetcher_tests
GitHub Check: aggregator_tests
GitHub Check: flink_tests
GitHub Check: python_tests
GitHub Check: spark_tests
GitHub Check: batch_tests
GitHub Check: enforce_triggered_workflows


coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (3)

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1_hub__0 (1)

17-18: Same TODO placeholders as v4

These environment placeholders are duplicated across versions and need resolution.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v6__0 (1)

17-18: TODO placeholders replicated across all versions

Address these placeholders systematically across all staging query versions.

api/python/test/canary/compiled/joins/gcp/training_set.v1_hub__0 (1)

31-33: TODO placeholders in training set config

Same environment placeholder issues exist in this training set configuration.

🧹 Nitpick comments (5)

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v4__0 (2)

17-18: TODO placeholders need resolution

Environment variables contain placeholder values that will likely cause runtime failures.

54-81: Extensive configuration duplication

Common and backfill configs are nearly identical. Consider extracting shared configuration.

Also applies to: 83-111

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v2__0 (3)

59-63: Deduplicate bucket path config

temporary_gcs_bucket, connector_output_dataset, etc. re-appear verbatim in every mode. Move to one shared place to avoid drift.

82-111: Backfill config duplicates “common”

The block repeats ~30 keys unchanged. If identical, omit and inherit from common.

118-118: Inline Dataproc JSON hurts readability

A >600-char escaped string is painful to diff. Store cluster spec in its own file and reference it.
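
The "inherit from common" idea in these nitpicks could look roughly like the sketch below, where a mode only lists the keys it changes and everything else falls through to a shared block. The layout (`common`, `backfill`), the helper name `resolve_mode_conf`, and the memory values are assumptions made for illustration; only the bucket setting comes from the learnings quoted in this review. If the compiled files are generated, a change like this would belong in whatever produces them rather than in the JSON itself.

```python
def resolve_mode_conf(conf: dict, mode: str) -> dict:
    """Overlay a mode's overrides on the shared block; the mode wins on conflicts."""
    return {**conf.get("common", {}), **conf.get(mode, {})}


# Hypothetical config layout: shared keys live once under "common".
conf = {
    "common": {
        "spark.chronon.table.gcs.temporary_gcs_bucket": "zl-warehouse",
        "spark.executor.memory": "4g",  # placeholder value
    },
    # The mode block carries only what differs from "common".
    "backfill": {"spark.executor.memory": "8g"},  # placeholder value
}

backfill_conf = resolve_mode_conf(conf, "backfill")
upload_conf = resolve_mode_conf(conf, "upload")  # no overrides defined for this mode

print(backfill_conf)
```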

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c140aa and 13f3d59.

📒 Files selected for processing (11)

api/python/test/canary/compiled/joins/gcp/training_set.v1_hub__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.terminal_v1__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1_hub__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v2__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v3__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v4__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v5__0 (1 hunks)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v6__0 (1 hunks)
api/python/test/canary/compiled/teams_metadata/test/test_team_metadata (3 hunks)
api/python/test/canary/staging_queries/gcp/sample_staging_query.py (2 hunks)

🧰 Additional context used

🧠 Learnings (10)

📓 Common learnings

Learnt from: chewy-zlai
PR: zipline-ai/chronon#30
File: api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_user:60-61
Timestamp: 2024-10-08T16:18:45.669Z
Learning: The JSON files in 'api/py/test/sample/production/' are automatically generated; avoid suggesting manual changes to them.

Learnt from: chewy-zlai
PR: zipline-ai/chronon#789
File: api/python/ai/chronon/repo/cluster.py:16-16
Timestamp: 2025-05-23T22:52:27.605Z
Learning: The project uses terraform to provision the required "dataproc@${project_id}.iam.gserviceaccount.com" service account, making hardcoded service account patterns in Dataproc configurations reliable and acceptable.

api/python/test/canary/compiled/teams_metadata/test/test_team_metadata (3)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:4-5
Timestamp: 2025-01-15T21:00:45.383Z
Learning: Security suggestions about using environment variables for project IDs and datasets in additional-confs.yaml were deemed non-critical by the team, as these values may be temporarily hardcoded for development purposes.

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

Learnt from: piyush-zlai
PR: #135
File: quickstart/README.md:11-13
Timestamp: 2024-12-18T16:08:20.688Z
Learning: In the “quickstart” Docker environment for Bigtable, GCP credentials aren’t required because the system is using a Bigtable emulator.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.terminal_v1__0 (1)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v5__0 (2)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

Learnt from: chewy-zlai
PR: #30
File: api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_user:60-61
Timestamp: 2024-10-08T16:18:45.669Z
Learning: The JSON files in 'api/py/test/sample/production/' are automatically generated; avoid suggesting manual changes to them.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v4__0 (2)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

Learnt from: chewy-zlai
PR: #30
File: api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_user:60-61
Timestamp: 2024-10-08T16:18:45.669Z
Learning: The JSON files in 'api/py/test/sample/production/' are automatically generated; avoid suggesting manual changes to them.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v6__0 (1)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

api/python/test/canary/compiled/joins/gcp/training_set.v1_hub__0 (1)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v3__0 (2)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

Learnt from: chewy-zlai
PR: #30
File: api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_user:60-61
Timestamp: 2024-10-08T16:18:45.669Z
Learning: The JSON files in 'api/py/test/sample/production/' are automatically generated; avoid suggesting manual changes to them.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1_hub__0 (1)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v2__0 (2)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

Learnt from: chewy-zlai
PR: #30
File: api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_user:60-61
Timestamp: 2024-10-08T16:18:45.669Z
Learning: The JSON files in 'api/py/test/sample/production/' are automatically generated; avoid suggesting manual changes to them.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)

GitHub Check: cloud_gcp_tests
GitHub Check: cloud_aws_tests
GitHub Check: streaming_tests
GitHub Check: analyzer_tests
GitHub Check: service_commons_tests
GitHub Check: groupby_tests
GitHub Check: service_tests
GitHub Check: api_tests
GitHub Check: aggregator_tests
GitHub Check: online_tests
GitHub Check: batch_tests
GitHub Check: flink_tests
GitHub Check: join_tests
GitHub Check: fetcher_tests
GitHub Check: spark_tests
GitHub Check: python_tests
GitHub Check: enforce_triggered_workflows

🔇 Additional comments (10)

api/python/test/canary/compiled/teams_metadata/test/test_team_metadata (1)

20-56: LGTM - Resource allocation swap makes sense.

Environment mode swap aligns backfill with higher resource requirements and upload with lighter configuration.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1__0 (1)

6-6: LGTM - Namespace update maintains consistency.

Output namespace aligned with other staging query versions.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v3__0 (1)

1-143: LGTM - Well-structured staging query configuration.

Comprehensive GCP configuration follows established patterns for the v3 staging query.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v5__0 (1)

1-143: LGTM - Consistent v5 staging query configuration.

Configuration maintains pattern consistency across staging query versions.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.terminal_v1__0 (2)

11-11: LGTM - Comprehensive dependency orchestration.

Airflow dependencies properly reference all 6 staging query versions.

127-212: LGTM - Well-structured dependency chain.

Table dependencies correctly reference all staging query outputs with proper offsets.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v4__0 (1)

118-118: Service account pattern is acceptable

The hardcoded service account follows the expected terraform-provisioned pattern for this project.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1_hub__0 (1)

6-6: Output namespace differs from other versions

This uses "data" while v4 uses "sample_namespace". Verify this is intentional.

api/python/test/canary/compiled/joins/gcp/training_set.v1_hub__0 (2)

198-261: Well-structured aggregation pipeline

The windowed aggregations (sum, count, average) over multiple time periods and last-k operation are properly configured.

7-19: Ignore identical column hashes in canary output

The hashing algorithm only fingerprints the aggregation logic and input column, not the window parameter—so sum/count/average over 3d/14d/30d all collapse to the same hash by design. No change needed.

Likely an incorrect or invalid review comment.

coderabbitai · 2025-07-28T17:58:08Z

api/python/test/canary/staging_queries/gcp/sample_staging_query.py

+def get_staging_query():
+    return StagingQuery(
+        query=query,
+        start_partition="2020-03-01",
+        name="sample_staging_query",
+        output_namespace="sample_namespace",
+        table_properties={"sample_config_json": """{"sample_key": "sample value"}"""},
+        dependencies=[
+            TableDependency(table=get_join_output_table_name(training_set.v1_test, True), partition_column="ds", offset=1)
+        ],
+        version=0,
+    )


🛠️ Refactor suggestion

Factory function creates identical queries

All staging queries are identical. Consider parameterizing the factory or clarify the purpose of multiple identical queries.

🤖 Prompt for AI Agents

In api/python/test/canary/staging_queries/gcp/sample_staging_query.py around lines 13 to 24, the factory function get_staging_query creates identical staging queries without variation. To fix this, modify the function to accept parameters that customize key attributes such as query, start_partition, name, output_namespace, or dependencies, allowing creation of distinct queries. Alternatively, add comments clarifying why multiple identical queries are needed if parameterization is not intended.
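
A minimal sketch of what a parameterized factory could look like. `StagingQuerySketch` is a hypothetical stand-in for the `StagingQuery` class used in the PR, so that the example runs on its own; the query strings, table name, and column name are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StagingQuerySketch:
    """Hypothetical stand-in for the StagingQuery class used in the PR."""
    query: str
    name: str
    start_partition: str
    output_namespace: str
    dependencies: list = field(default_factory=list)
    version: int = 0


def get_staging_query(
    name: str,
    query: str,
    start_partition: str = "2020-03-01",
    output_namespace: str = "sample_namespace",
) -> StagingQuerySketch:
    """Build one staging query; callers vary the name, SQL, and start partition."""
    return StagingQuerySketch(
        query=query,
        name=name,
        start_partition=start_partition,
        output_namespace=output_namespace,
    )


# Two queries that genuinely differ (invented SQL, for illustration only).
v1 = get_staging_query(
    "sample_staging_query_v1",
    "SELECT * FROM source_table WHERE purchase_price > 0",
)
v2 = get_staging_query(
    "sample_staging_query_v2",
    "SELECT * FROM source_table WHERE purchase_price > 100",
    start_partition="2021-01-01",
)
```

Keeping the defaults equal to the values in the PR means a call with only `name` and `query` reproduces the current behavior.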

coderabbitai · 2025-07-28T17:58:08Z

api/python/test/canary/staging_queries/gcp/sample_staging_query.py

+v1 = get_staging_query()
+v2 = get_staging_query()
+v3 = get_staging_query()
+v4 = get_staging_query()
+v5 = get_staging_query()
+v6 = get_staging_query()


💡 Verification agent

🧩 Analysis chain

Six identical staging query instances

Creating multiple identical instances may not provide the intended canary testing benefits.

🏁 Script executed:

#!/bin/bash
# Verify if there are differences in the compiled JSON configs for v1-v6
for i in {1..6}; do
  echo "=== v${i} differences ==="
  if [ $i -gt 1 ]; then
    diff api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1__0 \
         api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v${i}__0 || echo "Files differ"
  fi
done

Length of output: 2065

Differentiate staging query definitions

Shell verification shows each get_staging_query() call yields identical JSON except for the "name" attribute. To leverage canary testing effectively:

Refactor get_staging_query() to accept parameters (filters, partitions, payloads, etc.) so that each query truly differs.

Otherwise, if only unique names are required, consolidate to a single call.

Pinpointed file:
• api/python/test/canary/staging_queries/gcp/sample_staging_query.py (lines 26–31)

🤖 Prompt for AI Agents

In api/python/test/canary/staging_queries/gcp/sample_staging_query.py around lines 26 to 31, multiple calls to get_staging_query() produce nearly identical JSON except for the "name" attribute, which limits the effectiveness of canary testing. Refactor get_staging_query() to accept parameters such as filters, partitions, or payloads to generate truly distinct queries for each call. Alternatively, if only unique names are necessary, replace the multiple calls with a single call and adjust the code accordingly.
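
The same check can be made more specific than a line-based `diff` by comparing parsed configs key by key. The sketch below uses two small in-memory dictionaries in place of the compiled files; the field names (`metaData`, `startPartition`) and values are assumptions for illustration, and `diff_paths` is a helper written for this example.

```python
def diff_paths(a: dict, b: dict, prefix: str = "") -> list:
    """Return the dotted key paths at which two nested dicts differ."""
    paths = []
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}.{key}" if prefix else key
        if key not in a or key not in b:
            paths.append(path)  # key present on one side only
        elif isinstance(a[key], dict) and isinstance(b[key], dict):
            paths.extend(diff_paths(a[key], b[key], path))
        elif a[key] != b[key]:
            paths.append(path)
    return paths


# Stand-ins for two compiled configs (hypothetical shape and values).
v1_conf = {
    "metaData": {"name": "sample_staging_query.v1__0", "outputNamespace": "sample_namespace"},
    "startPartition": "2020-03-01",
}
v2_conf = {
    "metaData": {"name": "sample_staging_query.v2__0", "outputNamespace": "sample_namespace"},
    "startPartition": "2020-03-01",
}

differing = diff_paths(v1_conf, v2_conf)
print(differing)
```

With real files, each dictionary would come from `json.load` on the corresponding compiled config.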

coderabbitai · 2025-07-28T17:58:08Z

api/python/test/canary/staging_queries/gcp/sample_staging_query.py

+        TableDependency(table=get_staging_query_output_table_name(v1, True), partition_column="ds", offset=1),
+        TableDependency(table=get_staging_query_output_table_name(v2, True), partition_column="ds", offset=1),
+        TableDependency(table=get_staging_query_output_table_name(v3, True), partition_column="ds", offset=1),
+        TableDependency(table=get_staging_query_output_table_name(v4, True), partition_column="ds", offset=1),
+        TableDependency(table=get_staging_query_output_table_name(v5, True), partition_column="ds", offset=1),
+        TableDependency(table=get_staging_query_output_table_name(v6, True), partition_column="ds", offset=1),


🛠️ Refactor suggestion

Terminal query dependency explosion

Changed from 1 dependency to 6 identical dependencies. This increases complexity without clear benefit.

🤖 Prompt for AI Agents

In api/python/test/canary/staging_queries/gcp/sample_staging_query.py around lines 40 to 45, the code adds six identical TableDependency entries, which unnecessarily increases complexity. Refactor by consolidating these dependencies into a single TableDependency instance or reduce the number of dependencies to only those that are distinct and necessary, avoiding duplication.
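
The six entries in the diff differ only in which upstream query they reference, so one way to cut the repetition without dropping any dependency is to build the list in a loop. In this sketch `TableDependencySketch` and `output_table_name` are hypothetical stand-ins for the `TableDependency` class and the `get_staging_query_output_table_name` helper from the PR, and the table naming scheme is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableDependencySketch:
    """Hypothetical stand-in for the TableDependency class used in the PR."""
    table: str
    partition_column: str
    offset: int


def output_table_name(namespace: str, query_name: str) -> str:
    """Hypothetical naming scheme; the real helper may format names differently."""
    return f"{namespace}.{query_name}"


upstream_names = [f"sample_staging_query_v{i}" for i in range(1, 7)]

# One dependency per upstream query, all sharing the same partition settings.
dependencies = [
    TableDependencySketch(
        table=output_table_name("sample_namespace", name),
        partition_column="ds",
        offset=1,
    )
    for name in upstream_names
]
```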

sean-zlai

Thanks @thomaschow. Here is a thread after running 11/1 - 11/30 and seeing some anomalies. @david-zlai and @kumar-zlai are investigating (not likely config related but more likely workflow/API).

tchow-zlai requested review from chewy-zlai, david-zlai and sean-zlai July 28, 2025 17:54

coderabbitai bot reviewed Jul 28, 2025


david-zlai approved these changes Jul 28, 2025


sean-zlai approved these changes Jul 28, 2025


feat: complex stagingquery canary

34d36e8

tchow-zlai force-pushed the tchow/canary-sg-complex branch from 13f3d59 to 34d36e8 Compare July 28, 2025 21:18

tchow-zlai merged commit 13e0d4e into main Jul 28, 2025
25 of 27 checks passed

tchow-zlai deleted the tchow/canary-sg-complex branch July 28, 2025 21:46

This was referenced Jul 31, 2025

fix: rename sample stagingqueries to something else #1023

Merged

feat: modify staging queries for demo purposes #1026

Merged
