Skip to content

Conversation

@tchow-zlai
Copy link
Collaborator

@tchow-zlai tchow-zlai commented Jul 28, 2025

Summary

Screenshot 2025-07-28 at 10 43 40 AM

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features
    • Added multiple new staging query configurations for GCP, supporting advanced scheduling, environment, and cluster setup options.
    • Introduced new terminal staging query with dependencies on outputs from multiple staging queries.
  • Bug Fixes
    • Updated output namespace in an existing staging query configuration for improved consistency.
  • Refactor
    • Streamlined the creation of staging queries for easier maintenance and scalability.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 28, 2025

Walkthrough

Multiple new JSON configuration files for GCP staging queries (v2__0 through v6__0 and terminal_v1__0) were added, each specifying metadata, environment, Spark, cluster, scheduling, and dependency details. The Python staging query code was refactored to use a factory function for staging query creation, added versions v2v6, and updated terminal_v1 dependencies.

Changes

Cohort / File(s) Change Summary
New GCP Staging Query Configurations
.../gcp/sample_staging_query.v2__0,
.../gcp/sample_staging_query.v3__0,
.../gcp/sample_staging_query.v4__0,
.../gcp/sample_staging_query.v5__0,
.../gcp/sample_staging_query.v6__0
Added new JSON configs for staging queries v2–v6 with detailed metadata, environment, Spark, cluster, scheduling, and dependency info.
Terminal Staging Query Configuration
.../gcp/sample_staging_query.terminal_v1__0
Added new JSON config for terminal staging query, defining metadata, dependencies on v1–v6 outputs, environment, Spark, cluster, and scheduling.
Existing Staging Query Config Update
.../gcp/sample_staging_query.v1__0
Updated outputNamespace in metadata from data to sample_namespace.
Python Staging Query Refactor
.../gcp/sample_staging_query.py
Refactored to use a factory function for staging queries, added instances for v2–v6, and redefined terminal_v1 to depend on v1–v6 outputs.

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler
    participant StagingQuery_v1
    participant StagingQuery_v2
    participant StagingQuery_v3
    participant StagingQuery_v4
    participant StagingQuery_v5
    participant StagingQuery_v6
    participant Terminal_v1

    Scheduler->>StagingQuery_v1: Trigger (daily)
    Scheduler->>StagingQuery_v2: Trigger (daily)
    Scheduler->>StagingQuery_v3: Trigger (daily)
    Scheduler->>StagingQuery_v4: Trigger (daily)
    Scheduler->>StagingQuery_v5: Trigger (daily)
    Scheduler->>StagingQuery_v6: Trigger (daily)
    StagingQuery_v1-->>Terminal_v1: Output table
    StagingQuery_v2-->>Terminal_v1: Output table
    StagingQuery_v3-->>Terminal_v1: Output table
    StagingQuery_v4-->>Terminal_v1: Output table
    StagingQuery_v5-->>Terminal_v1: Output table
    StagingQuery_v6-->>Terminal_v1: Output table
    Scheduler->>Terminal_v1: Trigger after dependencies
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • nikhil-zlai
  • varant-zlai
  • piyush-zlai
  • david-zlai

Poem

Staging queries line up, v1 through v6,
Each with its config, all ready to mix.
Terminal waits for their tables to land,
A daily parade, orchestrated and planned.
Refactored in Python, dependencies anew—
GCP’s pipelines march smartly on through! 🚦

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13f3d59 and 34d36e8.

📒 Files selected for processing (8)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.terminal_v1__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v2__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v3__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v4__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v5__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v6__0 (1 hunks)
  • api/python/test/canary/staging_queries/gcp/sample_staging_query.py (2 hunks)
✅ Files skipped from review due to trivial changes (3)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1__0
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v4__0
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v3__0
🚧 Files skipped from review as they are similar to previous changes (5)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v5__0
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.terminal_v1__0
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v6__0
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v2__0
  • api/python/test/canary/staging_queries/gcp/sample_staging_query.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: service_commons_tests
  • GitHub Check: service_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: join_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: groupby_tests
  • GitHub Check: streaming_tests
  • GitHub Check: api_tests
  • GitHub Check: online_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: flink_tests
  • GitHub Check: python_tests
  • GitHub Check: spark_tests
  • GitHub Check: batch_tests
  • GitHub Check: enforce_triggered_workflows
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch tchow/canary-sg-complex

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (3)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1_hub__0 (1)

17-18: Same TODO placeholders as v4

These environment placeholders are duplicated across versions and need resolution.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v6__0 (1)

17-18: TODO placeholders replicated across all versions

Address these placeholders systematically across all staging query versions.

api/python/test/canary/compiled/joins/gcp/training_set.v1_hub__0 (1)

31-33: TODO placeholders in training set config

Same environment placeholder issues exist in this training set configuration.

🧹 Nitpick comments (5)
api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v4__0 (2)

17-18: TODO placeholders need resolution

Environment variables contain placeholder values that will likely cause runtime failures.


54-81: Extensive configuration duplication

Common and backfill configs are nearly identical. Consider extracting shared configuration.

Also applies to: 83-111

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v2__0 (3)

59-63: Deduplicate bucket path config

temporary_gcs_bucket, connector_output_dataset, etc. re-appear verbatim in every mode. Move to one shared place to avoid drift.


82-111: Backfill config duplicates “common”

The block repeats ~30 keys unchanged. If identical, omit and inherit from common.


118-118: Inline Dataproc JSON hurts readability

A >600-char escaped string is painful to diff. Store cluster spec in its own file and reference it.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c140aa and 13f3d59.

📒 Files selected for processing (11)
  • api/python/test/canary/compiled/joins/gcp/training_set.v1_hub__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.terminal_v1__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1_hub__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v2__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v3__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v4__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v5__0 (1 hunks)
  • api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v6__0 (1 hunks)
  • api/python/test/canary/compiled/teams_metadata/test/test_team_metadata (3 hunks)
  • api/python/test/canary/staging_queries/gcp/sample_staging_query.py (2 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: chewy-zlai
PR: zipline-ai/chronon#30
File: api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_user:60-61
Timestamp: 2024-10-08T16:18:45.669Z
Learning: The JSON files in 'api/py/test/sample/production/' are automatically generated; avoid suggesting manual changes to them.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#789
File: api/python/ai/chronon/repo/cluster.py:16-16
Timestamp: 2025-05-23T22:52:27.605Z
Learning: The project uses terraform to provision the required "dataproc@${project_id}.iam.gserviceaccount.com" service account, making hardcoded service account patterns in Dataproc configurations reliable and acceptable.
api/python/test/canary/compiled/teams_metadata/test/test_team_metadata (3)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:4-5
Timestamp: 2025-01-15T21:00:45.383Z
Learning: Security suggestions about using environment variables for project IDs and datasets in additional-confs.yaml were deemed non-critical by the team, as these values may be temporarily hardcoded for development purposes.

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

Learnt from: piyush-zlai
PR: #135
File: quickstart/README.md:11-13
Timestamp: 2024-12-18T16:08:20.688Z
Learning: In the “quickstart” Docker environment for Bigtable, GCP credentials aren’t required because the system is using a Bigtable emulator.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.terminal_v1__0 (1)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v5__0 (2)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

Learnt from: chewy-zlai
PR: #30
File: api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_user:60-61
Timestamp: 2024-10-08T16:18:45.669Z
Learning: The JSON files in 'api/py/test/sample/production/' are automatically generated; avoid suggesting manual changes to them.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v4__0 (2)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

Learnt from: chewy-zlai
PR: #30
File: api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_user:60-61
Timestamp: 2024-10-08T16:18:45.669Z
Learning: The JSON files in 'api/py/test/sample/production/' are automatically generated; avoid suggesting manual changes to them.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v6__0 (1)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

api/python/test/canary/compiled/joins/gcp/training_set.v1_hub__0 (1)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v3__0 (2)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

Learnt from: chewy-zlai
PR: #30
File: api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_user:60-61
Timestamp: 2024-10-08T16:18:45.669Z
Learning: The JSON files in 'api/py/test/sample/production/' are automatically generated; avoid suggesting manual changes to them.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1_hub__0 (1)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v2__0 (2)

Learnt from: david-zlai
PR: #222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse" should remain in the main additional-confs.yaml file, not in dev-specific configs.

Learnt from: chewy-zlai
PR: #30
File: api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_user:60-61
Timestamp: 2024-10-08T16:18:45.669Z
Learning: The JSON files in 'api/py/test/sample/production/' are automatically generated; avoid suggesting manual changes to them.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: streaming_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: groupby_tests
  • GitHub Check: service_tests
  • GitHub Check: api_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: online_tests
  • GitHub Check: batch_tests
  • GitHub Check: flink_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: python_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (10)
api/python/test/canary/compiled/teams_metadata/test/test_team_metadata (1)

20-56: LGTM - Resource allocation swap makes sense.

Environment mode swap aligns backfill with higher resource requirements and upload with lighter configuration.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1__0 (1)

6-6: LGTM - Namespace update maintains consistency.

Output namespace aligned with other staging query versions.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v3__0 (1)

1-143: LGTM - Well-structured staging query configuration.

Comprehensive GCP configuration follows established patterns for the v3 staging query.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v5__0 (1)

1-143: LGTM - Consistent v5 staging query configuration.

Configuration maintains pattern consistency across staging query versions.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.terminal_v1__0 (2)

11-11: LGTM - Comprehensive dependency orchestration.

Airflow dependencies properly reference all 6 staging query versions.


127-212: LGTM - Well-structured dependency chain.

Table dependencies correctly reference all staging query outputs with proper offsets.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v4__0 (1)

118-118: Service account pattern is acceptable

The hardcoded service account follows the expected terraform-provisioned pattern for this project.

api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1_hub__0 (1)

6-6: Output namespace differs from other versions

This uses "data" while v4 uses "sample_namespace". Verify this is intentional.

api/python/test/canary/compiled/joins/gcp/training_set.v1_hub__0 (2)

198-261: Well-structured aggregation pipeline

The windowed aggregations (sum, count, average) over multiple time periods and last-k operation are properly configured.


7-19: Ignore identical column hashes in canary output

The hashing algorithm only fingerprints the aggregation logic and input column, not the window parameter—so sum/count/average over 3d/14d/30d all collapse to the same hash by design. No change needed.

Likely an incorrect or invalid review comment.

Comment on lines +13 to +24
def get_staging_query():
return StagingQuery(
query=query,
start_partition="2020-03-01",
name="sample_staging_query",
output_namespace="sample_namespace",
table_properties={"sample_config_json": """{"sample_key": "sample value"}"""},
dependencies=[
TableDependency(table=get_join_output_table_name(training_set.v1_test, True), partition_column="ds", offset=1)
],
version=0,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Factory function creates identical queries

All staging queries are identical. Consider parameterizing the factory or clarify the purpose of multiple identical queries.

🤖 Prompt for AI Agents
In api/python/test/canary/staging_queries/gcp/sample_staging_query.py around
lines 13 to 24, the factory function get_staging_query creates identical staging
queries without variation. To fix this, modify the function to accept parameters
that customize key attributes such as query, start_partition, name,
output_namespace, or dependencies, allowing creation of distinct queries.
Alternatively, add comments clarifying why multiple identical queries are needed
if parameterization is not intended.

Comment on lines +26 to +31
v1 = get_staging_query()
v2 = get_staging_query()
v3 = get_staging_query()
v4 = get_staging_query()
v5 = get_staging_query()
v6 = get_staging_query()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Six identical staging query instances

Creating multiple identical instances may not provide the intended canary testing benefits.


🏁 Script executed:

#!/bin/bash
# Verify if there are differences in the compiled JSON configs for v1-v6
for i in {1..6}; do
  echo "=== v${i} differences ==="
  if [ $i -gt 1 ]; then
    diff api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v1__0 \
         api/python/test/canary/compiled/staging_queries/gcp/sample_staging_query.v${i}__0 || echo "Files differ"
  fi
done

Length of output: 2065


Differentiate staging query definitions

Shell verification shows each get_staging_query() call yields identical JSON except for the "name" attribute. To leverage canary testing effectively:

  • Refactor get_staging_query() to accept parameters (filters, partitions, payloads, etc.) so that each query truly differs.
  • Otherwise, if only unique names are required, consolidate to a single call.

Pinpointed file:
• api/python/test/canary/staging_queries/gcp/sample_staging_query.py (lines 26–31)

🤖 Prompt for AI Agents
In api/python/test/canary/staging_queries/gcp/sample_staging_query.py around
lines 26 to 31, multiple calls to get_staging_query() produce nearly identical
JSON except for the "name" attribute, which limits the effectiveness of canary
testing. Refactor get_staging_query() to accept parameters such as filters,
partitions, or payloads to generate truly distinct queries for each call.
Alternatively, if only unique names are necessary, replace the multiple calls
with a single call and adjust the code accordingly.

Comment on lines +40 to +45
TableDependency(table=get_staging_query_output_table_name(v1, True), partition_column="ds", offset=1),
TableDependency(table=get_staging_query_output_table_name(v2, True), partition_column="ds", offset=1),
TableDependency(table=get_staging_query_output_table_name(v3, True), partition_column="ds", offset=1),
TableDependency(table=get_staging_query_output_table_name(v4, True), partition_column="ds", offset=1),
TableDependency(table=get_staging_query_output_table_name(v5, True), partition_column="ds", offset=1),
TableDependency(table=get_staging_query_output_table_name(v6, True), partition_column="ds", offset=1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Terminal query dependency explosion

Changed from 1 dependency to 6 identical dependencies. This increases complexity without clear benefit.

🤖 Prompt for AI Agents
In api/python/test/canary/staging_queries/gcp/sample_staging_query.py around
lines 40 to 45, the code adds six identical TableDependency entries, which
unnecessarily increases complexity. Refactor by consolidating these dependencies
into a single TableDependency instance or reduce the number of dependencies to
only those that are distinct and necessary, avoiding duplication.

Copy link
Contributor

@sean-zlai sean-zlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @thomaschow. Here is a thread after running 11/1 - 11/30 and seeing some anomalies. @david-zlai and @kumar-zlai are investigating (not likely config related but more likely workflow/API).

image

@tchow-zlai tchow-zlai force-pushed the tchow/canary-sg-complex branch from 13f3d59 to 34d36e8 Compare July 28, 2025 21:18
@tchow-zlai tchow-zlai merged commit 13e0d4e into main Jul 28, 2025
25 of 27 checks passed
@tchow-zlai tchow-zlai deleted the tchow/canary-sg-complex branch July 28, 2025 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants