Skip to content

Conversation

@nikhil-zlai
Copy link
Contributor

@nikhil-zlai nikhil-zlai commented Mar 4, 2025

Summary

we want to remove api flags that are not used & not usable beyond oss. if we don't remove these now, we will end up having to migrate these when the orchestrator is ready.

added direct support for conf and step days, removed dependencies, tableProps, old style env and customJson

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features

    • Introduced a consolidated execution configuration that simplifies scheduling, environment, and backfill settings.
  • Refactor

    • Streamlined metadata and dependency management in data processing and join/group-by workflows.
    • Enhanced error handling for file operations to better manage missing directories and improve feedback.
  • Frontend

    • Updated the metadata display to now show scheduling details, providing clearer overview information.
  • Tests

    • Revised test cases to reflect the new naming conventions and metadata structure.
    • Updated assertions to simplify the access of metadata tags.
    • Removed obsolete tests related to dependency management.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 4, 2025

Walkthrough

This PR updates several API functions by removing deprecated parameters and introducing new ones to encapsulate execution details via an ExecutionInfo object. Internal metadata handling is streamlined across Python modules, Thrift definitions, and production configuration JSONs. Dependency-related functions are eliminated, and method signatures are refactored in both the Python and Scala codebases. Test files and frontend components are updated accordingly, while orchestration modules see further cleanup of unused methods and consolidated imports.

Changes

File(s) Change Summary
api/py/ai/chronon/{group_by.py, join.py, parse_teams.py, types.py, utils.py, cli/compile/compiler.py, repo/compile.py, repo/run.py} Updated function signatures; removed parameters (e.g., online, production, dependencies, custom_json) and added new ones (derivations, conf, env_vars, step_days, disable_historical_backfill); integrated ExecutionInfo into metadata; improved error handling in compiler.
api/thrift/{api.thrift, common.thrift, orchestration.thrift} Renumbered and extended MetaData (added sourceFile and executionInfo); introduced new structs (EnvironmentVariables, ConfigProperties, TableDependency, ExecutionInfo) and enum (KvScanStrategy); removed legacy dependency structures.
api/py/test/sample/production/** Simplified JSON configurations by removing customJson, dependencies, and offlineSchedule; added structured executionInfo (with scheduleCron and historicalBackfill) and consistencyCheck; updated naming conventions.
api/py/test/{sample/group_bys/*, sample/joins/*, sample/staging_queries/*, test/*} Adjusted GroupBy/Join instantiations by removing extra parameters; streamlined test assertions and table naming; eliminated tests for dependency propagation.
api/src/main/scala/**, orchestration/src/main/scala/**, spark/src/main/scala/ai/chronon/spark/GroupBy.scala Refactored historical backfill logic to use ExecutionInfo; consolidated or removed redundant imports and commented out unused dependency/output methods.
frontend/src/routes/[conf]/[name]/overview/EntityProperties.svelte Replaced offlineSchedule with executionInfo in the metadata property display.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant Compiler as Compiler
    User->>Compiler: Trigger compile process
    Compiler->>OS: Check for staging directory existence
    alt Staging exists
        Compiler->>OS: If output dir exists, remove it
        Compiler->>OS: Move staging directory to output directory
    else Staging missing
        Compiler->>User: Log warning about missing staging directory
    end
Loading

Possibly related PRs

  • refactor: split fetcher logic into multiple files #425: The changes in the main PR, which involve significant modifications to the GroupBy function and its parameters, are related to the changes in the retrieved PR, where the GroupByFetcher class is introduced to manage fetching GroupBy data, indicating a direct connection in functionality and parameter handling.
  • Join and Group Bys for POC #30: The changes in the main PR, which involve significant modifications to the GroupBy function and its parameters, are related to the retrieved PR as both involve the GroupBy functionality, specifically in how grouping operations are defined and utilized in different contexts. Both PRs modify the handling of parameters and the structure of GroupBy instances.

Suggested reviewers

  • piyush-zlai
  • varant-zlai

Poem

In our code a change unfolds,
Old parameters now gently old,
ExecutionInfo lights the night,
Metadata now clean and bright,
With each refactor, a fresh delight!
Code and soul both take flight!

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
api/py/ai/chronon/join.py (1)

532-536: Remove unused code.

custom_json is created but never used.

-    custom_json = {"check_consistency": check_consistency, "lag": lag}
🧰 Tools
🪛 Ruff (0.8.2)

532-532: Local variable custom_json is assigned to but never used

Remove assignment to unused variable custom_json

(F841)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 74165a4 and 837d202.

📒 Files selected for processing (4)
  • api/py/ai/chronon/group_by.py (3 hunks)
  • api/py/ai/chronon/join.py (4 hunks)
  • api/py/ai/chronon/utils.py (0 hunks)
  • api/thrift/api.thrift (1 hunks)
💤 Files with no reviewable changes (1)
  • api/py/ai/chronon/utils.py
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
🔇 Additional comments (6)
api/thrift/api.thrift (1)

279-279: Clean addition.

The optional stepDays field is properly defined.

api/py/ai/chronon/group_by.py (2)

444-446: Good API cleanup.

Replaced legacy parameters with more structured configuration approach.


655-657: Config properly applied.

New parameters correctly passed to metadata.

api/py/ai/chronon/join.py (3)

204-205: Clean removal.

Simplified ExternalSource signature by removing custom_json.


414-416: Good API cleanup.

Consistent with GroupBy changes.


572-574: Config properly applied.

New parameters correctly passed to metadata.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (9)
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1)

7-7: Avoid wildcard imports.

Replace with explicit imports for better code clarity.

-from ai.chronon.types import *
+from ai.chronon.types import Join, JoinPart
🧰 Tools
🪛 Ruff (0.8.2)

7-7: from ai.chronon.types import * used; unable to detect undefined names

(F403)

orchestration/src/main/scala/ai/chronon/orchestration/physical/ComputeNodeRunner.scala (1)

8-9: Methods commented out rather than removed.

Consider removing these methods completely rather than commenting them out since they're no longer needed according to the PR objectives.

api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1)

1-2: Avoid wildcard imports.

Replace wildcard imports with explicit imports for better code clarity and maintenance.

-from ai.chronon.types import *
+from ai.chronon.types import GroupBy, JoinSource, Query, selects, Aggregation, Operation, Accuracy
🧰 Tools
🪛 Ruff (0.8.2)

1-1: from ai.chronon.types import * used; unable to detect undefined names

(F403)

api/py/test/sample/group_bys/sample_team/label_part_group_by.py (1)

1-2: Avoid wildcard imports.

Replace wildcard imports with explicit imports for better code clarity and maintenance.

-from ai.chronon.types import *
+from ai.chronon.types import GroupBy, Aggregation, Operation
🧰 Tools
🪛 Ruff (0.8.2)

2-2: from ai.chronon.types import * used; unable to detect undefined names

(F403)

api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1 (1)

5-9: Consider using booleans for online/production flags.
Might simplify checks if these fields are actually boolean.

api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0 (1)

9-10: Consider booleans for online/production.
If these are meant to be true/false, booleans may be clearer.

api/src/main/scala/ai/chronon/api/Extensions.scala (1)

647-647: Extract duplicated logic.
The same TODO repeats; consider a shared helper.

api/thrift/common.thrift (1)

87-106: Well-designed ExecutionInfo structure

The ExecutionInfo structure elegantly combines the various execution parameters that were previously scattered across multiple fields.

Consider adding documentation comments to the stepDays field to clarify its purpose, especially as it's a new addition mentioned in the PR objectives.

api/thrift/api.thrift (1)

285-286: Comment style inconsistency.

Hash symbol used instead of double slash for comment in Thrift file.

-    # information that needs to be present on every physical node
+    // information that needs to be present on every physical node
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 837d202 and 628de7b.

📒 Files selected for processing (69)
  • api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_teams.py (3 hunks)
  • api/py/ai/chronon/group_by.py (3 hunks)
  • api/py/ai/chronon/join.py (6 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/ai/chronon/repo/run.py (25 hunks)
  • api/py/ai/chronon/types.py (2 hunks)
  • api/py/ai/chronon/utils.py (2 hunks)
  • api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1 hunks)
  • api/py/test/sample/group_bys/sample_team/event_sample_group_by.py (1 hunks)
  • api/py/test/sample/group_bys/sample_team/group_by_with_kwargs.py (0 hunks)
  • api/py/test/sample/group_bys/sample_team/label_part_group_by.py (1 hunks)
  • api/py/test/sample/group_bys/sample_team/sample_group_by.py (3 hunks)
  • api/py/test/sample/joins/sample_team/sample_chaining_join.py (1 hunks)
  • api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1 hunks)
  • api/py/test/sample/joins/sample_team/sample_join.py (3 hunks)
  • api/py/test/sample/joins/sample_team/sample_join_from_module.py (1 hunks)
  • api/py/test/sample/joins/sample_team/sample_label_join.py (2 hunks)
  • api/py/test/sample/joins/sample_team/sample_label_join_with_agg.py (1 hunks)
  • api/py/test/sample/joins/sample_team/sample_online_join.py (0 hunks)
  • api/py/test/sample/production/group_bys/sample_team/chaining_group_by.chaining_group_by_v1 (3 hunks)
  • api/py/test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/group_by_with_kwargs.v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by.require_backfill (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by.v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.require_backfill (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.v1 (1 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0 (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_chaining_join.v1 (4 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_chaining_join_parent.parent_join (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.consistency_check (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.group_by_of_group_by (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.never (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.no_log_flattener (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.v1 (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1 (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2 (4 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_derivation.v1 (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_external_parts.v1 (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_from_module.v1 (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_with_derivations_on_external_parts.v1 (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_label_join.v1 (5 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1 (5 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_online_join.v1 (4 hunks)
  • api/py/test/sample/staging_queries/sample_team/sample_staging_query.py (0 hunks)
  • api/py/test/test_group_by.py (3 hunks)
  • api/py/test/test_join.py (0 hunks)
  • api/py/test/test_run.py (3 hunks)
  • api/py/test/test_utils.py (2 hunks)
  • api/src/main/scala/ai/chronon/api/Builders.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala (0 hunks)
  • api/thrift/api.thrift (3 hunks)
  • api/thrift/common.thrift (1 hunks)
  • api/thrift/orchestration.thrift (2 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/ComputeNodeRunner.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByBackfill.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByPrepareUpload.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/JoinBackfill.scala (2 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/LabelJoin.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala (3 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/utils/DependencyResolver.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (1 hunks)
💤 Files with no reviewable changes (5)
  • api/py/test/sample/group_bys/sample_team/group_by_with_kwargs.py
  • api/py/test/sample/staging_queries/sample_team/sample_staging_query.py
  • api/py/test/sample/joins/sample_team/sample_online_join.py
  • api/py/test/test_join.py
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala
✅ Files skipped from review due to trivial changes (6)
  • api/py/ai/chronon/repo/compile.py
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByPrepareUpload.scala
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.require_backfill
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByBackfill.scala
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/LabelJoin.scala
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py

7-7: from ai.chronon.types import * used; unable to detect undefined names

(F403)


9-9: Join may be undefined, or defined from star imports

(F405)


12-12: JoinPart may be undefined, or defined from star imports

(F405)


16-16: JoinPart may be undefined, or defined from star imports

(F405)

api/py/test/sample/group_bys/sample_team/chaining_group_by.py

1-1: from ai.chronon.types import * used; unable to detect undefined names

(F403)


4-4: GroupBy may be undefined, or defined from star imports

(F405)


5-5: JoinSource may be undefined, or defined from star imports

(F405)


7-7: Query may be undefined, or defined from star imports

(F405)


8-8: selects may be undefined, or defined from star imports

(F405)


18-18: Aggregation may be undefined, or defined from star imports

(F405)


18-18: Operation may be undefined, or defined from star imports

(F405)


20-20: Accuracy may be undefined, or defined from star imports

(F405)

api/py/test/sample/group_bys/sample_team/label_part_group_by.py

2-2: from ai.chronon.types import * used; unable to detect undefined names

(F403)


4-4: GroupBy may be undefined, or defined from star imports

(F405)


8-8: Aggregation may be undefined, or defined from star imports

(F405)


10-10: Operation may be undefined, or defined from star imports

(F405)


17-17: GroupBy may be undefined, or defined from star imports

(F405)

api/py/ai/chronon/repo/run.py

894-894: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (143)
api/py/test/test_run.py (3)

48-48: Formatting improvement.

Added blank line after context function.


220-222: Reformatted assertion for readability.

Multiline format improves readability.


232-234: Reformatted assertion for readability.

Consistent with previous formatting change.

spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (1)

99-101: Improved code indentation.

Better aligned indentation for the serializer chain.

orchestration/src/main/scala/ai/chronon/orchestration/physical/JoinBackfill.scala (2)

3-3: Consolidated imports.

Combined imports from same package.


38-38: Simplified code by removing unnecessary Table object creation.

Direct method call is cleaner.

api/py/test/sample/production/group_bys/sample_team/sample_group_by.require_backfill (1)

1-59: New configuration file aligns with PR objectives.

Clean implementation of the new configuration approach with direct executionInfo support.

Key points:

  • Proper metadata structure
  • Historical backfill properly configured
  • Clear aggregation operations
  • Includes derivations with wildcard support
api/py/ai/chronon/repo/run.py (2)

69-74: Style improvement to dictionary formatting.

Improved readability with multiline structure.


344-355: Updated environment configuration to use new executionInfo structure.

Replaces deprecated modeToEnvMap with executionInfo.env approach, maintaining backward compatibility through fallback.

api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1)

9-24: New parent join definition looks good.

Clean implementation of the parent join with appropriate parameters.

🧰 Tools
🪛 Ruff (0.8.2)

9-9: Join may be undefined, or defined from star imports

(F405)


12-12: JoinPart may be undefined, or defined from star imports

(F405)


16-16: JoinPart may be undefined, or defined from star imports

(F405)

api/py/ai/chronon/types.py (2)

10-10: Good addition of common types import.

Now exposes common.ttypes.


52-56: Appropriate exposure of new configuration types.

These new types support the PR goal of direct configuration support.

api/py/test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1 (1)

10-13: Streamlined configuration with executionInfo.

Removes deprecated fields in favor of structured execution information.

api/py/test/sample/production/group_bys/sample_team/sample_group_by.v1 (1)

1-50: New configuration file looks good.

Clean JSON structure with appropriate metadata, sources, key columns, and aggregations.

api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1)

4-28: GroupBy configuration is well-structured.

Contains appropriate sources, keys, aggregations, and metadata configuration.

🧰 Tools
🪛 Ruff (0.8.2)

4-4: GroupBy may be undefined, or defined from star imports

(F405)


5-5: JoinSource may be undefined, or defined from star imports

(F405)


7-7: Query may be undefined, or defined from star imports

(F405)


8-8: selects may be undefined, or defined from star imports

(F405)


18-18: Aggregation may be undefined, or defined from star imports

(F405)


18-18: Operation may be undefined, or defined from star imports

(F405)


20-20: Accuracy may be undefined, or defined from star imports

(F405)

api/py/test/sample/group_bys/sample_team/label_part_group_by.py (2)

4-15: GroupBy configuration is well-structured.

First group by correctly aggregates with a 7-day window sum.

🧰 Tools
🪛 Ruff (0.8.2)

4-4: GroupBy may be undefined, or defined from star imports

(F405)


8-8: Aggregation may be undefined, or defined from star imports

(F405)


10-10: Operation may be undefined, or defined from star imports

(F405)


17-22: GroupBy with null aggregation.

Confirm that setting aggregations=None is intentional and supported by the GroupBy implementation.

🧰 Tools
🪛 Ruff (0.8.2)

17-17: GroupBy may be undefined, or defined from star imports

(F405)

api/py/test/sample/group_bys/sample_team/sample_group_by.py (4)

16-16: Import condensed to single line.

Import statements now appear on a single line for cleaner code.


29-30: Added missing comma in table_properties dictionary.

Fixed formatting by adding a trailing comma after the description value.


32-32: Added online=True parameter.

Added parameter to specify online serving capability.


46-48: Derivations list reformatted.

Each Derivation now appears on its own line for improved readability.

api/py/test/sample/group_bys/sample_team/event_sample_group_by.py (1)

23-23: Simplified Aggregation configuration.

Removed redundant tags parameter, keeping only essential windows parameter.

api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.v1 (1)

1-47: New configuration file added.

New JSON configuration for sample_group_by_group_by with executionInfo structure replacing legacy fields, aligning with PR objectives.

api/py/test/sample/joins/sample_team/sample_label_join_with_agg.py (1)

26-26: Looks good - imported variable replaces locally defined.

The change aligns with the PR's goal to remove unused API flags, importing label_part_group_by instead of defining it locally.

api/py/test/sample/production/joins/sample_team/sample_join_from_module.v1 (3)

12-14: Clean metadata structure with executionInfo object.

Removed customJson/dependencies in favor of executionInfo containing scheduleCron.


37-40: Consolidated execution parameters into executionInfo.

Replaced offlineSchedule with executionInfo object containing scheduleCron and historicalBackfill.


108-112: Consistent executionInfo pattern applied here too.

Same pattern as previous groupBy sections, maintaining consistency throughout the file.

api/src/main/scala/ai/chronon/api/Builders.scala (1)

290-292: Encapsulated execution settings into ExecutionInfo object.

Removed dependencies handling and moved historicalBackfill into ExecutionInfo object, aligning with PR cleanup goals.

orchestration/src/main/scala/ai/chronon/orchestration/utils/DependencyResolver.scala (2)

5-5: Simplified imports with consolidated statement.

Cleaner import syntax while maintaining the same functionality.


63-63: Removed redundant Table object creation.

Directly sets table variable in result.setTable(), eliminating unnecessary object instantiation.

orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala (1)

3-3:

✅ Verification successful

New import added but usage not visible

The TableDependency import was added but no implementation is visible in this snippet.


🏁 Script executed:

#!/bin/bash
# Check if TableDependency is used elsewhere in the file
cat orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala | grep -n "TableDependency"

Length of output: 200


Usage Verified: The imported TableDependency is now used in the tableDependencies method (line 9). No further action is required.

api/py/test/test_group_by.py (3)

254-254: Direct tag access

Accessing tag directly instead of through customJson.


272-272: Boolean changed to string

Same change as line 252 - boolean to string conversion.


281-281: Direct tag access

Same pattern as line 254 - directly accessing tag.

orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala (1)

23-23:

❓ Verification inconclusive

Method implementation removed

Replaced with ??? placeholder which throws NotImplementedError.


🏁 Script executed:

#!/bin/bash
# Check if this method is called anywhere
rg "StagingQueryNodeImpl.*parents" --type scala

Length of output: 49


Review Alert: Verify parents Usage

The initial script produced no results. To be sure no call sites exist, please run the following script to search for invocations of the .parents method across Scala files:

#!/bin/bash
# Find invocations of '.parents' in Scala files.
rg "\.parents\s*\(" --type scala
api/py/test/sample/production/group_bys/sample_team/group_by_with_kwargs.v1 (1)

10-13: Replace offline schedule with executionInfo

Added structured executionInfo to replace removed offline scheduling.

api/py/test/sample/joins/sample_team/sample_join.py (6)

15-15: New import for environment variables structure.

Added import for EnvironmentVariables to support new environment configuration approach.


26-27: Removed tags parameter and added table properties.

Experimental tag removed as part of API cleanup. Table properties now directly configured.


29-32: Environment configuration structure updated.

Replaced env with structured env_vars using EnvironmentVariables class. Added online=True parameter.


37-39: Removed tags and updated schedule format.

Experimental tag removed and schedule string now uses double quotes for consistency.


50-50: Removed experimental tag.

Experimental tag removed as part of API cleanup.


57-57: Removed experimental tag.

Experimental tag removed as part of API cleanup.

api/py/test/sample/joins/sample_team/sample_label_join.py (2)

29-29: Import external group by instead of local definition.

Now importing label_part_group_by_2 from dedicated module, removing duplicate definition.


46-46: Updated to use imported group by.

Using imported label_part_group_by_2 instead of locally defined version.

api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (2)

9-14: Restructured execution configuration.

Removed offlineSchedule in favor of new executionInfo structure with scheduleCron. Reorganized metadata fields.


36-46: Consistent execution configuration in nested group by.

Applied same executionInfo structure to nested group by metadata, replacing custom fields and dependencies.

api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1 (1)

9-13: Standardized execution configuration.

Added structured executionInfo object with schedule and backfill settings, replacing custom fields and dependencies.

api/py/test/test_utils.py (2)

292-293: Updated table naming convention.

Table name updated to new format that includes version in namespace.


310-310: Updated join table name format.

Join table name now properly references chaining_group_by_v1.

orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala (2)

15-15: Removed dependency handling.

Dependencies replaced with empty sequence, aligned with PR objective to eliminate unused dependencies.


27-27: Simplified table name setting.

Direct assignment is cleaner than creating unnecessary Table object.

api/py/ai/chronon/cli/compile/parse_teams.py (3)

4-9: Updated imports for new ExecutionInfo structure.

Added ExecutionInfo import to support the new configuration approach.


72-85: Moved configuration to ExecutionInfo structure.

Environment and configuration are now properly encapsulated in ExecutionInfo.


113-127: Improved mode map merging logic.

More robust handling when result is None, with proper deep copying.

api/py/test/sample/joins/sample_team/sample_join_from_module.py (2)

28-28: Removed unnecessary spaces.

Consistent formatting for parameter assignment.


32-37: Updated to double quotes and removed unused parameters.

Consistent string quoting and removal of additional_args/additional_env parameters.

api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1 (2)

12-15: Confirm zero usage for historicalBackfill.
Zero can be confusing if a boolean is expected.


84-108: Validate operation codes.
Ensure the numeric values (7, 12, etc.) align with your aggregator logic.

api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0 (3)

4-5: Check consistent usage of team/outputNamespace.
Looks aligned with other configurations.


12-14: Verify nothing else relies on the removed offlineSchedule.
New executionInfo block looks fine.


37-40: Check groupBy executionInfo.
Again, confirm zero for historicalBackfill if intended as a boolean.

api/src/main/scala/ai/chronon/api/Extensions.scala (1)

854-855: Guard executionInfo usage.
Handle null executionInfo gracefully when reading historicalBackfill.

api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1 (5)

9-14: Clean metadata structure.

Replaced various metadata flags with a consistent executionInfo object containing scheduleCron setting.


41-45: Good consistency in joinParts metadata.

Standardized execution configuration with scheduleCron and historicalBackfill.


106-110: Consistent structure maintained.

Same pattern applied to all groupBy metadata sections.


183-189: Label part metadata updated properly.

Updated name from "sample_label_group_by" to fully qualified "sample_team.label_part_group_by.label_part_group_by" and standardized execution info.


231-233: Simplified labelParts metadata.

Removed dependencies and moved scheduling to executionInfo structure.

api/py/test/sample/production/joins/sample_team/sample_join.group_by_of_group_by (2)

9-14: Clean metadata restructuring.

Replaced custom fields with standardized executionInfo object.


42-46: Consistent joinParts metadata pattern.

Properly configured executionInfo with both scheduleCron and historicalBackfill.

api/py/test/sample/joins/sample_team/sample_chaining_join.py (2)

20-20: Simplified import structure.

Now directly imports chaining_group_by_v1 instead of defining it locally.


24-34: Join definition simplified.

Removed additional_args and additional_env parameters, aligning with PR objective to clean up unused API flags.

api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1 (3)

9-14: Consistent metadata structure.

Same clean pattern of replacing custom fields with executionInfo.


41-45: Properly standardized joinParts.

First groupBy correctly updated with executionInfo structure.


106-110: Consistent metadata in second groupBy.

Maintains pattern consistency throughout the configuration.

api/py/test/sample/production/joins/sample_team/sample_join.consistency_check (2)

10-16: Configuration structure improved.

Streamlined metadata with executionInfo replacing customJson and dependencies.


44-49: ExecutionInfo consistently applied.

Matches parent structure pattern.

api/py/ai/chronon/utils.py (2)

374-374: Updated execution schedule reference.

Now reads from the new executionInfo structure.


405-420: Join modes logic simplified.

Clean removal of dependencies references.

api/py/test/sample/production/joins/sample_team/sample_join.no_log_flattener (2)

9-14: Configuration structure improved.

Streamlined with executionInfo pattern.


42-47: Consistent structure in joinParts.

Matches parent pattern and properly configures scheduling.

api/py/test/sample/production/joins/sample_team/sample_join.never (2)

9-14: Special case handling with @Never.

Properly configured for non-execution.


42-47: Nested configuration properly structured.

Child groupBy maintains consistent pattern.

api/py/test/sample/production/joins/sample_team/sample_join.v1 (2)

12-19: Improved execution configuration structure.

The introduction of a structured executionInfo object improves configuration management with explicit environment variable handling for different execution modes.


47-52: Consistent metadata structure in group by section.

Same execution info pattern applied consistently to the group by section, with explicit historical backfill configuration.

api/py/test/sample/production/joins/sample_team/sample_chaining_join_parent.parent_join (3)

11-11: Added consistency check capability.

Added consistencyCheck flag which wasn't mentioned in other files or PR description.

Is this intended to be a common feature across all join configurations?


14-17: Standardized execution configuration.

Execution configuration now follows consistent pattern with dedicated structure.


45-48: Consistent execution info across all group by sections.

Configuration pattern is applied consistently throughout all metadata sections.

Also applies to: 110-113

api/py/test/sample/production/joins/sample_team/sample_join_external_parts.v1 (2)

9-9: Different online settings between metadata levels.

Main metadata has online: 0 while group by metadata has online: 1. This difference may be intentional but could lead to confusion.

Is this difference in online status between levels intentional?

Also applies to: 42-42


12-14: Simplified execution info without environment variables.

This configuration uses minimal execution info with just scheduleCron, omitting environment variables that are present in other files.

api/py/ai/chronon/group_by.py (4)

434-447: Updated GroupBy function signature to remove dependencies.

Removed legacy parameters (dependencies, env, lag) and added new parameters for direct configuration support (conf, env_vars, step_days, disable_historical_backfill).


558-576: Well-documented new parameters.

Documentation clearly explains the configuration layering priority for both conf and env_vars parameters.


637-643: Replaced dependencies list with ExecutionInfo object.

New ExecutionInfo structure consolidates execution configuration in a single object.


658-661: Updated metadata structure.

Metadata now uses the ExecutionInfo object and properly handles tags.

api/thrift/orchestration.thrift (3)

113-113: Switch to MERGE
Cleaner enum.


166-166: No functional change
Just a blank line.


174-174: Renamed lineageHash to version
Clearer naming.

api/py/test/sample/production/joins/sample_team/sample_chaining_join.v1 (8)

4-5: Team & namespace updated
Looks consistent.


9-11: online/production/consistencyCheck
Confirm if this combo is intentional.


13-13: New consistencySamplePercent
Seems fine.


15-16: Daily schedule
Works for regular updates.


37-38: Renamed name/team
More explicit naming.


44-49: Execution info
Good placement.


58-59: ConsistencyCheck & daily schedule
Validate if 1 is correct.

Also applies to: 61-64


86-92: GroupBy metadata
All good.

api/py/test/sample/production/joins/sample_team/sample_label_join.v1 (7)

4-5: Team & namespace
Consistent update.


9-10: online/production
Double-check environment usage.


12-14: Added executionInfo
Nice injection of scheduling.


36-37: Meta updates
Removes old fields, good.

Also applies to: 41-45


101-102: Revised outputNamespace & executionInfo
Matches new style.

Also applies to: 106-110


183-183: Name, online, and schedule
Clear separation.

Also applies to: 185-189


216-218: Weekly schedule
Check if it conflicts with daily.

api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2 (4)

42-45: Verify consistent historicalBackfill configuration

The executionInfo structure now includes historicalBackfill flag set to 0.


107-110: Consistent configuration in nested structures

ExecutionInfo structure consistently applied to nested GroupBy configurations.


166-169: Ensure all JobParts follow the same pattern

The consistent pattern of removing dependencies and adding executionInfo is maintained across all JobParts.


12-14:

✅ Verification successful

Confirm the replacement of dependencies with executionInfo is properly documented

The removed dependencies and customJson fields have been replaced with executionInfo structure that now handles scheduling via scheduleCron.


🏁 Script executed:

#!/bin/bash
# Check if there are any remaining references to dependencies field in other JSON files
rg -g "*.v[0-9]" -g "*.json" --json "dependencies" | jq 'select(.type == "match") | .data.path.text'

Length of output: 7453


Replacement Verified:
The production file (api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2) correctly uses the new executionInfo structure. Occurrences of "dependencies" in test files and configuration files (e.g., package locks) are expected and not relevant to this change. The removal of dependencies and customJson in favor of executionInfo is properly documented.

api/py/test/sample/production/joins/sample_team/sample_join_with_derivations_on_external_parts.v1 (3)

12-14: Consistent metadata structure transformation

ExecutionInfo replaces former dependencies and customJson fields appropriately.


42-45: Check historicalBackfill default settings

The historical backfill is consistently disabled across configurations.


107-110: Consistent application of executionInfo pattern

The pattern is consistently applied to all nested GroupBy configurations.

api/thrift/common.thrift (3)

28-33: Well-structured environment variables grouping

Good organization of environment variables into logical categories.


35-40: Consistent structure between ConfigProperties and EnvironmentVariables

ConfigProperties mirrors EnvironmentVariables structure, maintaining consistency.


42-69: Comprehensive TableDependency structure

The TableDependency structure provides a complete set of parameters for defining table dependencies, with good documentation.

api/py/test/sample/production/joins/sample_team/sample_join_derivation.v1 (3)

12-14: Consistent pattern of removing unused fields

The removal of dependencies and customJson is consistent with other files.


42-45: Check scheduleCron format consistency

The @daily cron format is consistently used across configurations.


107-110: Consistent configuration in final nested structure

The pattern is consistently applied to all GroupBy configurations in this file too.

api/py/test/sample/production/joins/sample_team/sample_online_join.v1 (5)

4-5: Team and namespace now explicitly defined.

The team and namespace are now explicitly defined at the top level, ensuring consistency.


9-16: Replaced dependency management with executionInfo structure.

Dependencies and customJson fields removed, replaced with more structured executionInfo object containing scheduleCron.


38-47: Consistent metadata structure across all group by configurations.

Standardized team, namespace, and executionInfo structure replacing loose dependency configuration.


103-112: Unified execution configuration style.

Consistent use of executionInfo pattern for scheduling configuration.


162-171: Standardized configuration pattern.

All groupBy configurations now follow the same pattern with executionInfo structure.

api/thrift/api.thrift (3)

238-240: Updated metadata documentation.

Documentation clarifies that these are configuration parameters that don't affect output content.


262-284: Field renumbering for better organization.

Fields are logically grouped with numbering gaps to allow for future additions.


485-486: Simplified environment and config references.

Team now references common environment and config property structures.

api/py/test/sample/production/group_bys/sample_team/chaining_group_by.chaining_group_by_v1 (5)

3-5: Updated naming scheme.

Name updated to use more structured versioning format, with consistent team and namespace.


12-15: Standardized execution configuration.

Clear scheduleCron and historicalBackfill settings in executionInfo structure.


22-35: Consistent metadata structure.

Join metadata follows the same pattern with team, namespace, and executionInfo.


58-63: Unified groupBy configuration.

All groupBy elements now use the same configuration pattern.


120-124: Simplified configuration in nested components.

Last groupBy also updated to use executionInfo structure.

api/py/ai/chronon/join.py (6)

202-203: Simplified ExternalSource metadata.

Removed customJson parameter, streamlining the metadata structure.


292-296: Updated LabelParts to use ExecutionInfo.

Now uses common ExecutionInfo structure for scheduling configuration.


397-419: Streamlined Join parameters.

Removed dependencies, additional_args, additional_env in favor of structured conf, env_vars, and step_days parameters.


491-509: Improved parameter documentation.

Clear documentation of new parameters explaining layering priority for configuration and environment variables.


564-570: Structured execution configuration.

Creates an ExecutionInfo object with all necessary execution parameters.


578-581: Simplified metadata construction.

Uses executionInfo instead of scattered configuration parameters.

def generate_dataproc_submitter_args(
user_args: str,
job_type: DataprocJobType = DataprocJobType.SPARK,
local_files_to_upload_to_gcs: List[str] = [],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Fix mutable default argument.

Empty list as default argument could cause unexpected behavior.

-def generate_dataproc_submitter_args(
-    user_args: str,
-    job_type: DataprocJobType = DataprocJobType.SPARK,
-    local_files_to_upload_to_gcs: List[str] = [],
-):
+def generate_dataproc_submitter_args(
+    user_args: str,
+    job_type: DataprocJobType = DataprocJobType.SPARK,
+    local_files_to_upload_to_gcs: List[str] = None,
+):
+    if local_files_to_upload_to_gcs is None:
+        local_files_to_upload_to_gcs = []
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
local_files_to_upload_to_gcs: List[str] = [],
def generate_dataproc_submitter_args(
user_args: str,
job_type: DataprocJobType = DataprocJobType.SPARK,
local_files_to_upload_to_gcs: List[str] = None,
):
if local_files_to_upload_to_gcs is None:
local_files_to_upload_to_gcs = []
# Rest of the function implementation...
🧰 Tools
🪛 Ruff (0.8.2)

894-894: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Comment on lines +46 to +58
# check if staging_output_dir exists
staging_dir = self.compile_context.staging_output_dir()
output_dir = self.compile_context.output_dir()
shutil.rmtree(output_dir)
shutil.move(staging_dir, output_dir)
if os.path.exists(staging_dir):
# replace staging_output_dir to output_dir
output_dir = self.compile_context.output_dir()
if os.path.exists(output_dir):
shutil.rmtree(output_dir)
shutil.move(staging_dir, output_dir)
else:
print(
f"Staging directory {staging_dir} does not exist. "
"Happens when every chronon config fails to compile or when no chronon configs exist."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Added safety check for staging directory existence.

Improves error handling by checking if staging directory exists before attempting operations, with informative error message when missing.

)
],
tags={"to_deprecate": True},
tags={"to_deprecate": "true"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Boolean changed to string

Changed from boolean True to string "true".


🌐 Web query:

Does the Python Thrift library handle boolean values differently than string values when serializing to JSON?

💡 Result:

The Python Thrift library does handle boolean values differently than string values when serializing to JSON, primarily due to the type-specific encoding enforced by Thrift's TJSONProtocol. This protocol ensures that each data type is distinctly represented in the JSON output:

  1. Boolean Values
    Thrift serializes boolean values (true/false) as JSON boolean literals. For example:

    {"is_valid": true}
  2. String Values
    Strings are serialized as JSON strings (with quotes):

    {"name": "Alice"}

Key Considerations:

  • Type Fidelity: Thrift’s TJSONProtocol maintains strict type separation between primitives like booleans and strings, avoiding ambiguity during deserialization[1][5].
  • Compatibility Issues: Some Thrift implementations (e.g., older versions of Thriftpy) may deviate from Apache Thrift’s official JSON encoding, causing mismatches. For instance, a boolean might be incorrectly serialized as a string if protocol implementations are incompatible[4][7].
  • Custom Handling: Tools like thrift-converter explicitly manage type conversions (e.g., enums as strings, sets as specially formatted objects), highlighting Thrift’s need for precise type mapping in JSON[7].

For reliable interoperability, ensure consistent protocol usage (e.g., TJSONProtocol for Apache Thrift) across all services[4][9].

Citations:


Fix Type Mismatch: Use Boolean, Not String

  • In api/py/test/test_group_by.py at line 252, the tag value should be a boolean, not the string "true".
  • Thrift’s TJSONProtocol distinguishes boolean values from strings. Using "true" breaks type fidelity.
  • Action: Change
        tags={"to_deprecate": "true"},
    to
        tags={"to_deprecate": True},

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between ae7c4bc and 46ebf48.

📒 Files selected for processing (2)
  • frontend/src/routes/[conf]/[name]/overview/+page.svelte (1 hunks)
  • frontend/src/routes/[conf]/[name]/overview/EntityProperties.svelte (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (15)
  • GitHub Check: streaming_tests
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
  • GitHub Check: spark_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: groupby_tests
  • GitHub Check: join_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (2)
frontend/src/routes/[conf]/[name]/overview/EntityProperties.svelte (1)

40-40: Updated property path to match new metadata structure

The change from 'offlineSchedule' to 'executionInfo.scheduleCron' reflects the PR's metadata restructuring.

frontend/src/routes/[conf]/[name]/overview/+page.svelte (1)

371-391: Enhanced tooltip metadata display

The implementation now supports both direct and nested properties for better metadata representation.

Comment on lines 380 to 388
? data.value.conf.metaData?.[prop] !== undefined
: getNestedProperty(data.value.conf.metaData, prop.path) !== undefined}
<Tooltip.Item
{...tooltipProps.item}
label={typeof prop === 'string' ? prop : prop.display}
value={typeof prop === 'string'
? data.value.conf.metaData?.[prop]
: getNestedProperty(data.value.conf.metaData, prop.path)}
valueAlign="left"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

❓ Verification inconclusive

Verify getNestedProperty availability

Function is used but not defined in visible code.


🏁 Script executed:

#!/bin/bash
# Check if getNestedProperty function is defined in the file or imported
rg "function getNestedProperty|import.*getNestedProperty" frontend/src/routes/\[conf\]/\[name\]/overview/+page.svelte

Length of output: 117


Action Required: Verify getNestedProperty Availability

The function getNestedProperty is used in frontend/src/routes/[conf]/[name]/overview/+page.svelte (lines 380–388) but isn’t defined or imported visibly in that file. Please confirm whether it’s defined or imported elsewhere in the repository (e.g., in a shared utilities module).


🏁 Script executed:

#!/bin/bash
# Search entire repository for function definition or import of getNestedProperty.
rg "function getNestedProperty" .
rg "import.*getNestedProperty" .

Length of output: 192


Action: Missing getNestedProperty Definition

  • In frontend/src/routes/[conf]/[name]/overview/+page.svelte (lines 380–388), getNestedProperty is used but isn’t defined or imported.
  • Please add the function definition or import it from the proper module.

@nikhil-zlai nikhil-zlai force-pushed the api_cleanup branch 2 times, most recently from 5101b01 to 5863d26 Compare March 7, 2025 01:04
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (17)
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1)

7-7: Avoid wildcard imports

Use explicit imports instead of import * for better code clarity and maintainability.

-from ai.chronon.types import *
+from ai.chronon.types import Join, JoinPart
🧰 Tools
🪛 Ruff (0.8.2)

7-7: from ai.chronon.types import * used; unable to detect undefined names

(F403)

api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1)

1-1: Avoid wildcard imports

Use explicit imports for better code clarity and maintainability.

-from ai.chronon.types import *
+from ai.chronon.types import GroupBy, JoinSource, Query, selects, Aggregation, Operation, Accuracy
🧰 Tools
🪛 Ruff (0.8.2)

1-1: from ai.chronon.types import * used; unable to detect undefined names

(F403)

api/py/test/sample/group_bys/sample_team/label_part_group_by.py (1)

1-3: Consider using explicit imports instead of star imports.

Replace from ai.chronon.types import * with explicit imports of the needed types.

from sources import test_sources
-from ai.chronon.types import *
+from ai.chronon.types import GroupBy, Aggregation, Operation
🧰 Tools
🪛 Ruff (0.8.2)

2-2: from ai.chronon.types import * used; unable to detect undefined names

(F403)

orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala (1)

4-4: Unused import.

TableDependency is imported but not used explicitly.

api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1 (3)

4-5: Use booleans for clarity
Numeric flags for online and production might be less readable than booleans.

Also applies to: 9-10, 12-14


101-102: Check naming consistency
Confirm that newly updated team and outputNamespace follow naming standards.

Also applies to: 106-110


183-183: Revisit repeated naming
sample_team.label_part_group_by.label_part_group_by might be simplified.

Also applies to: 185-185, 186-189

api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1 (1)

9-10: Production off
If no production usage, consider removing.

api/py/ai/chronon/group_by.py (1)

434-448: Signature extended
Update docstring references.

api/thrift/common.thrift (4)

28-33: Consider defaulting empty maps.
Optional maps can be null; defaulting them avoids extra null checks.


35-40: Clarify difference from EnvironmentVariables.
This struct seems quite similar. Consider merging or explaining usage.


71-74: Enhance doc for ALL vs LATEST.
More clarity here helps avoid incorrect partition scans.


87-106: Check poll intervals.
Ensure nonzero defaults for dependencyPollIntervalMillis/healthCheckIntervalMillis.

api/py/test/sample/production/joins/sample_team/sample_online_join.v1 (2)

37-47: Review daily scheduling.
Make sure scheduleCron and historicalBackfill are valid for expected data volumes.


162-171: Keep naming consistent.
Check that this group_by_with_kwargs meta matches prior references and key usage.

api/thrift/api.thrift (1)

327-329: Comments on tags.
Short doc is fine, but confirm no leftover references to removed tag usage.

api/py/ai/chronon/join.py (1)

409-409: online param addition

Please update docstring accordingly.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 5101b01 and 5863d26.

📒 Files selected for processing (69)
  • api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_teams.py (3 hunks)
  • api/py/ai/chronon/group_by.py (3 hunks)
  • api/py/ai/chronon/join.py (6 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/ai/chronon/repo/run.py (3 hunks)
  • api/py/ai/chronon/types.py (2 hunks)
  • api/py/ai/chronon/utils.py (2 hunks)
  • api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1 hunks)
  • api/py/test/sample/group_bys/sample_team/event_sample_group_by.py (1 hunks)
  • api/py/test/sample/group_bys/sample_team/group_by_with_kwargs.py (0 hunks)
  • api/py/test/sample/group_bys/sample_team/label_part_group_by.py (1 hunks)
  • api/py/test/sample/group_bys/sample_team/sample_group_by.py (3 hunks)
  • api/py/test/sample/joins/sample_team/sample_chaining_join.py (1 hunks)
  • api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1 hunks)
  • api/py/test/sample/joins/sample_team/sample_join.py (3 hunks)
  • api/py/test/sample/joins/sample_team/sample_join_from_module.py (1 hunks)
  • api/py/test/sample/joins/sample_team/sample_label_join.py (2 hunks)
  • api/py/test/sample/joins/sample_team/sample_label_join_with_agg.py (1 hunks)
  • api/py/test/sample/joins/sample_team/sample_online_join.py (0 hunks)
  • api/py/test/sample/production/group_bys/sample_team/chaining_group_by.chaining_group_by_v1 (3 hunks)
  • api/py/test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/group_by_with_kwargs.v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by.require_backfill (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by.v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.require_backfill (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.v1 (1 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0 (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_chaining_join.v1 (4 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_chaining_join_parent.parent_join (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.consistency_check (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.group_by_of_group_by (0 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.never (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.no_log_flattener (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.v1 (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1 (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2 (4 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_derivation.v1 (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_external_parts.v1 (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_from_module.v1 (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_with_derivations_on_external_parts.v1 (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_label_join.v1 (5 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1 (5 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_online_join.v1 (4 hunks)
  • api/py/test/sample/staging_queries/sample_team/sample_staging_query.py (0 hunks)
  • api/py/test/test_group_by.py (3 hunks)
  • api/py/test/test_join.py (0 hunks)
  • api/py/test/test_run.py (3 hunks)
  • api/py/test/test_utils.py (2 hunks)
  • api/src/main/scala/ai/chronon/api/Builders.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala (0 hunks)
  • api/thrift/api.thrift (3 hunks)
  • api/thrift/common.thrift (1 hunks)
  • api/thrift/orchestration.thrift (2 hunks)
  • frontend/src/routes/[conf]/[name]/overview/EntityProperties.svelte (2 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/ComputeNodeRunner.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByBackfill.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByPrepareUpload.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/JoinBackfill.scala (2 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/LabelJoin.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala (3 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/utils/DependencyResolver.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (2 hunks)
💤 Files with no reviewable changes (6)
  • api/py/test/sample/group_bys/sample_team/group_by_with_kwargs.py
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala
  • api/py/test/sample/joins/sample_team/sample_online_join.py
  • api/py/test/sample/staging_queries/sample_team/sample_staging_query.py
  • api/py/test/test_join.py
  • api/py/test/sample/production/joins/sample_team/sample_join.group_by_of_group_by
🚧 Files skipped from review as they are similar to previous changes (41)
  • api/py/test/sample/group_bys/sample_team/event_sample_group_by.py
  • api/py/ai/chronon/types.py
  • api/py/test/sample/group_bys/sample_team/sample_group_by.py
  • frontend/src/routes/[conf]/[name]/overview/EntityProperties.svelte
  • api/py/test/test_run.py
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByPrepareUpload.scala
  • orchestration/src/main/scala/ai/chronon/orchestration/utils/DependencyResolver.scala
  • api/py/ai/chronon/repo/compile.py
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/ComputeNodeRunner.scala
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.require_backfill
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/JoinBackfill.scala
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by.v1
  • orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala
  • api/py/ai/chronon/cli/compile/compiler.py
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByBackfill.scala
  • api/py/test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1
  • api/py/test/test_group_by.py
  • api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1
  • api/src/main/scala/ai/chronon/api/Extensions.scala
  • api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by.require_backfill
  • api/py/test/sample/production/joins/sample_team/sample_join_external_parts.v1
  • api/py/ai/chronon/cli/compile/parse_teams.py
  • api/py/test/sample/production/joins/sample_team/sample_chaining_join_parent.parent_join
  • api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2
  • api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1
  • api/py/test/sample/production/joins/sample_team/sample_join.no_log_flattener
  • api/py/test/test_utils.py
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.v1
  • api/py/test/sample/production/joins/sample_team/sample_label_join.v1
  • api/py/test/sample/joins/sample_team/sample_join_from_module.py
  • api/py/test/sample/production/joins/sample_team/sample_join.v1
  • api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/LabelJoin.scala
  • api/py/test/sample/production/group_bys/sample_team/chaining_group_by.chaining_group_by_v1
  • api/py/test/sample/production/group_bys/sample_team/group_by_with_kwargs.v1
  • api/py/test/sample/production/joins/sample_team/sample_join_derivation.v1
  • api/py/test/sample/joins/sample_team/sample_label_join.py
  • api/py/test/sample/production/joins/sample_team/sample_join.consistency_check
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py

1015-1015: SyntaxError: Expected a statement


1015-1015: SyntaxError: Expected a statement


1015-1016: SyntaxError: Expected a statement


1016-1016: SyntaxError: Unexpected indentation

api/py/test/sample/group_bys/sample_team/chaining_group_by.py

1-1: from ai.chronon.types import * used; unable to detect undefined names

(F403)


4-4: GroupBy may be undefined, or defined from star imports

(F405)


5-5: JoinSource may be undefined, or defined from star imports

(F405)


7-7: Query may be undefined, or defined from star imports

(F405)


8-8: selects may be undefined, or defined from star imports

(F405)


18-18: Aggregation may be undefined, or defined from star imports

(F405)


18-18: Operation may be undefined, or defined from star imports

(F405)


20-20: Accuracy may be undefined, or defined from star imports

(F405)

api/py/test/sample/group_bys/sample_team/label_part_group_by.py

2-2: from ai.chronon.types import * used; unable to detect undefined names

(F403)


4-4: GroupBy may be undefined, or defined from star imports

(F405)


8-8: Aggregation may be undefined, or defined from star imports

(F405)


10-10: Operation may be undefined, or defined from star imports

(F405)


17-17: GroupBy may be undefined, or defined from star imports

(F405)

api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py

7-7: from ai.chronon.types import * used; unable to detect undefined names

(F403)


9-9: Join may be undefined, or defined from star imports

(F405)


12-12: JoinPart may be undefined, or defined from star imports

(F405)


16-16: JoinPart may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms (16)
  • GitHub Check: streaming_tests
  • GitHub Check: streaming_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: spark_tests
  • GitHub Check: spark_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: join_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: frontend_tests
🔇 Additional comments (86)
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1)

9-24: Join structure looks good

Clean implementation of parent join with appropriate configuration parameters.

🧰 Tools
🪛 Ruff (0.8.2)

9-9: Join may be undefined, or defined from star imports

(F405)


12-12: JoinPart may be undefined, or defined from star imports

(F405)


16-16: JoinPart may be undefined, or defined from star imports

(F405)

api/py/test/sample/joins/sample_team/sample_join.py (5)

15-15: Good explicit import

Clear import of EnvironmentVariables type.


26-33: Improved configuration structure

Removed experimental tags and replaced environment parameters with structured EnvironmentVariables class.


37-40: Consistent update

Removed tags and standardized string quote format.


50-51: Removed experimental tag

Clean removal of unnecessary experimental tag.


57-58: Removed experimental tag

Clean removal of unnecessary experimental tag.

api/py/test/sample/joins/sample_team/sample_label_join_with_agg.py (1)

26-26: Imported instead of local definition

Good refactoring to import the label_part_group_by instead of defining it locally.

api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1)

4-28: Well-structured GroupBy implementation

Clear definition with appropriate configuration for sources, keys, aggregations and properties.

🧰 Tools
🪛 Ruff (0.8.2)

4-4: GroupBy may be undefined, or defined from star imports

(F405)


5-5: JoinSource may be undefined, or defined from star imports

(F405)


7-7: Query may be undefined, or defined from star imports

(F405)


8-8: selects may be undefined, or defined from star imports

(F405)


18-18: Aggregation may be undefined, or defined from star imports

(F405)


18-18: Operation may be undefined, or defined from star imports

(F405)


20-20: Accuracy may be undefined, or defined from star imports

(F405)

api/py/test/sample/group_bys/sample_team/label_part_group_by.py (2)

4-15: LGTM - GroupBy definition is correct.

Well-structured GroupBy definition with clear keys and aggregations setup.

🧰 Tools
🪛 Ruff (0.8.2)

4-4: GroupBy may be undefined, or defined from star imports

(F405)


8-8: Aggregation may be undefined, or defined from star imports

(F405)


10-10: Operation may be undefined, or defined from star imports

(F405)


17-22: LGTM - Second GroupBy configuration is valid.

Valid configuration with no aggregations specified.

🧰 Tools
🪛 Ruff (0.8.2)

17-17: GroupBy may be undefined, or defined from star imports

(F405)

api/src/main/scala/ai/chronon/api/Builders.scala (1)

290-292: Refactored historical backfill management.

Successfully replaced direct property setting with the new ExecutionInfo object pattern.

api/py/test/sample/production/joins/sample_team/sample_join_from_module.v1 (3)

4-14: New metadata structure looks good.

Successfully migrated to the new structure with executionInfo object.


37-40: GroupBy metadata properly updated.

Correctly implemented new format with executionInfo containing scheduleCron and historicalBackfill.


103-112: EntityGroupBy metadata correctly updated.

Successfully migrated to the new metadata structure with executionInfo.

orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala (2)

15-15: Temporary solution for dependencies.

Replacing actual dependencies with empty sequence. TODO comment explains intent.

Is a future PR planned to address this TODO comment?


27-27: Simplified table name assignment.

More direct approach to setting the table name.

api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1 (2)

36-37: Confirm usage of historicalBackfill
Ensure this field is actually used downstream; remove if unnecessary.

Also applies to: 41-45


231-233: Validate weekly schedule
Confirm @weekly suits the intended frequency.

api/py/test/sample/joins/sample_team/sample_chaining_join.py (1)

20-20: Confirm new import
Validate that chaining_group_by_v1 correctly replaces any previous join references.

api/py/ai/chronon/utils.py (2)

374-374: Graceful fallback
Defaulting to @daily is fine; ensure no unexpected scheduling needs are missed.


402-411: Check null references
Make sure metaData.executionInfo is always set to avoid runtime errors.

api/py/test/sample/production/joins/sample_team/sample_join.never (5)

4-5: No issues


9-10: Double-check usage


12-14: Cron check


36-37: No issues


42-47: Scheduling mismatch?
Check if daily schedule aligns with earlier "@Never" usage.

api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1 (6)

4-5: Ok


12-14: Daily schedule
All good.


36-37: Ok


41-45: Might conflict
Ensure daily schedule aligns with top-level.


100-102: Ok


106-110: Execution details
Matches prior pattern.

api/py/ai/chronon/group_by.py (2)

637-644: Exec info
No issues.


658-660: Metadata usage
Looks consistent.

api/py/test/sample/production/joins/sample_team/sample_chaining_join.v1 (5)

4-16: Great removal of unused fields and introduction of scheduling.


37-49: Consistent scheduling metadata added.


58-64: Execution info looks uniform.


87-92: Online config simplified properly.


101-110: Dependency cleanup is well done.

api/thrift/orchestration.thrift (3)

113-113: Renamed enum value to MERGE.


166-166: Removal of extraneous attribute.


174-174: Changed lineageHash to version for clarity.

api/py/test/sample/production/joins/sample_team/sample_join_with_derivations_on_external_parts.v1 (3)

4-14: Good consolidation into executionInfo.


36-46: Removed duplicates, added schedule.


100-110: Implementation aligns with streamlined scheduling.

api/thrift/common.thrift (2)

42-69: Document forceCompute overhead.
If forceCompute is expensive, clarify usage to prevent unintended overhead.


76-85: Mention keyBase64 format.
Clarify how keys are encoded/decoded to ensure consistency.

api/py/test/sample/production/joins/sample_team/sample_online_join.v1 (2)

4-16: Validate new meta fields.
Check that online, production, and consistency settings align with actual usage.


100-112: Confirm consistency with left event logic.
Ensure these groupBy parts won’t conflict with prior partial data or leftover fields.

api/thrift/api.thrift (1)

485-486: Check duplication.
env and conf might conflict with any local fields. Keep them consistent.

api/py/ai/chronon/join.py (9)

202-202: Streamlined ExternalSource metadata

Dropping custom_json is consistent with the cleanup objective.


292-295: Initialized label schedule metadata

Straightforward code for scheduling.


399-405: Expanded Join signature with new params

Validate usage in tests to ensure no regressions.


411-411: Optional consistency check

Confirm None default logic is tested.


414-414: offline_schedule default

Looks fine.


416-418: Added conf, env_vars, step_days

Helps modularize environment and scheduling.


564-569: exec_info for offline scheduling

Implementation is clear.


578-578: consistencyCheck assignment

All good.


580-580: executionInfo assignment

No issues noted.

api/py/ai/chronon/repo/run.py (27)

22-26: New imports for modules

All fine.


29-39: GCP-specific imports

Appropriate for Dataproc integration.


40-44: Utils import block

No concerns.


48-50: ONLINE_ARGS, OFFLINE_ARGS, ONLINE_WRITE_ARGS definitions

Clear constants.


52-52: ONLINE_OFFLINE_WRITE_ARGS

Simple combination of existing constants.


53-59: ONLINE_MODES array

Straightforward.


60-76: SPARK_MODES list

Proper enumeration of modes.


77-77: MODES_USING_EMBEDDED

No issues.


79-79: Comment for spark versions

No problem.


80-80: SUPPORTED_SPARK array

Covers major versions.


81-81: SCALA_VERSION_FOR_SPARK mapping

Makes sense.


87-108: MODE_ARGS dictionary

Well-structured argument templates.


110-142: ROUTES dictionary

Mapping looks consistent.


144-144: UNIVERSAL_ROUTES

Contains generic modes.


150-155: DATAPROC_ENTRY and jar constants

Helpful for GCP deployments.

Also applies to: 157-157


160-163: DataprocJobType enum

Straightforward enum.


165-187: retry_decorator

Simple retry pattern.


189-194: custom_json function

Concise implementation.


196-199: check_call function

Looks good.


201-204: check_output function

No issues.


206-233: download_only_once function

Nicely handles caching.


235-285: download_jar function

Checks Spark and Scala versions.


287-289: get_teams_json_file_path

Trivial utility.


291-437: set_runtime_env function

Provides layered environment resolution nicely.


1003-1013: Dataproc check block

Branching logic is sound so far.


1014-1014: Setting jar_path to chronon_jar

No immediate issues.


1015-1022: ⚠️ Potential issue

Syntax error with second else:

This causes a parsing error. Consider removing or rewriting:

   else:
-     jar_path = chronon_jar
- else:
+ elif some_other_condition:
     service_jar_path = download_zipline_jar(

Likely an incorrect or invalid review comment.

🧰 Tools
🪛 Ruff (0.8.2)

1015-1015: SyntaxError: Expected a statement


1015-1015: SyntaxError: Expected a statement


1015-1016: SyntaxError: Expected a statement


1016-1016: SyntaxError: Unexpected indentation

Comment on lines +238 to +286
/**
* contains configs params that don't change the contents of the output.
**/
struct MetaData {
1: optional string name


2: optional string team

// will be set by the compiler based on changes to column lineage - do not manually set
3: optional string version

4: optional string outputNamespace

5: optional map<string, string> tableProperties

// tag_key -> tag_value - tags allow for repository wide querying, deprecations etc
// this is object level tag - applies to all columns produced by the object - GroupBy, Join, Model etc
6: optional map<string, string> tags
// column -> tag_key -> tag_value
7: optional map<string, map<string, string>> columnTags

// marking this as true means that the conf can be served online
// once marked online, a conf cannot be changed - compiling the conf won't be allowed
2: optional bool online
100: optional bool online

// marking this as true means that the conf automatically generates a staging copy
// this flag is also meant to help a monitoring system re-direct alerts appropriately
3: optional bool production
4: optional string customJson
5: optional list<string> dependencies
6: optional map<string, string> tableProperties
// todo: add sanity check in materialize script
7: optional string outputNamespace
// team name for the job
8: optional string team
// modes - backfill, upload, streaming
// join streaming makes sense & join upload probably also makes sense
// (These just aren't implemented yet)
// The inner map should contain environment variables

// DEPRECATED
9: optional map<string, map<string, string>> modeToEnvMap
101: optional bool production

102: optional string sourceFile

// users can put anything they want in here, but the compiler shouldn't
103: optional string customJson

// enable job to compute consistency metrics
10: optional bool consistencyCheck
200: optional bool consistencyCheck

// percentage of online serving requests to log to warehouse
11: optional double samplePercent
// cron expression for airflow DAG schedule
12: optional string offlineSchedule
201: optional double samplePercent

// percentage of online serving requests used to compute consistency metrics
13: optional double consistencySamplePercent
// Flag to indicate whether join backfill should backfill previous holes.
// Setting to false will only backfill latest single partition
14: optional bool historicalBackfill
202: optional double consistencySamplePercent

// specify how to compute drift
15: optional observability.DriftSpec driftSpec

16: optional EnvironmentVariables env
17: optional ConfigProperties conf
203: optional observability.DriftSpec driftSpec

18: optional string sourceFile
# information that needs to be present on every physical node
204: optional common.ExecutionInfo executionInfo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use optional fields cautiously.
Excess optional fields can lead to confusion; consider marking critical fields as required.

AwsRunner(ctx.params).run()
else:
raise ValueError(f"Unsupported cloud provider: {cloud_provider}")
Runner(ctx.params, os.path.expanduser(jar_path)).run()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Runner call with undefined cloud_provider

cloud_provider is used at line 1024 but never defined.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
api/py/ai/chronon/join.py (1)

396-581: 🛠️ Refactor suggestion

Remove outdated doc references
Docstring still mentions removed params (e.g. additional_args). Keep it aligned with the new signature.

🧰 Tools
🪛 Ruff (0.8.2)

514-514: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)

🧹 Nitpick comments (8)
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1)

7-7: Avoid star imports.

Replace with explicit imports for better code traceability.

-from ai.chronon.types import *
+from ai.chronon.types import Join, JoinPart
🧰 Tools
🪛 Ruff (0.8.2)

7-7: from ai.chronon.types import * used; unable to detect undefined names

(F403)

api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1)

1-1: Avoid star imports.

Replace with explicit imports for better code traceability.

-from ai.chronon.types import *
+from ai.chronon.types import GroupBy, JoinSource, Query, selects, Aggregation, Operation, Accuracy
🧰 Tools
🪛 Ruff (0.8.2)

1-1: from ai.chronon.types import * used; unable to detect undefined names

(F403)

api/py/test/sample/group_bys/sample_team/label_part_group_by.py (1)

2-2: Avoid star imports.

Replace with explicit imports for better code traceability.

-from ai.chronon.types import *
+from ai.chronon.types import GroupBy, Aggregation, Operation
🧰 Tools
🪛 Ruff (0.8.2)

2-2: from ai.chronon.types import * used; unable to detect undefined names

(F403)

spark/src/test/resources/joins/team/example_join_failure.v1 (1)

5-8: Consider using boolean values.
If online and production are meant to be booleans, using false instead of 0 may improve clarity.

5     "team": "relevance",
7-    "online": 0,
8-    "production": 0
+    "online": false,
+    "production": false
api/py/ai/chronon/repo/utils.py (1)

231-244: New execution info handling

Added priority-based environment selection with new executionInfo structure in config JSON.

Consider adding a comment explaining the preference of new_env over old_env for future maintainers.

online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (1)

64-71: Improved null handling in response map creation

Enhanced safety with null checks when creating field name/value mappings.

Consider extracting this logic to a helper method to improve readability.

api/thrift/common.thrift (2)

28-40: Consider mandatory fields
All properties are optional. Some may need to be mandatory for consistent usage.


42-106: Test for partial usage
These new dependencies and execution info are broad. Please add unit tests covering partial and full usage.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 2bbac28 and b979f47.

📒 Files selected for processing (78)
  • api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_teams.py (3 hunks)
  • api/py/ai/chronon/group_by.py (3 hunks)
  • api/py/ai/chronon/join.py (6 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/ai/chronon/repo/run.py (1 hunks)
  • api/py/ai/chronon/repo/utils.py (8 hunks)
  • api/py/ai/chronon/types.py (2 hunks)
  • api/py/ai/chronon/utils.py (2 hunks)
  • api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1 hunks)
  • api/py/test/sample/group_bys/sample_team/event_sample_group_by.py (1 hunks)
  • api/py/test/sample/group_bys/sample_team/group_by_with_kwargs.py (0 hunks)
  • api/py/test/sample/group_bys/sample_team/label_part_group_by.py (1 hunks)
  • api/py/test/sample/group_bys/sample_team/sample_group_by.py (3 hunks)
  • api/py/test/sample/joins/sample_team/sample_chaining_join.py (1 hunks)
  • api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1 hunks)
  • api/py/test/sample/joins/sample_team/sample_join.py (3 hunks)
  • api/py/test/sample/joins/sample_team/sample_join_from_module.py (1 hunks)
  • api/py/test/sample/joins/sample_team/sample_label_join.py (2 hunks)
  • api/py/test/sample/joins/sample_team/sample_label_join_with_agg.py (1 hunks)
  • api/py/test/sample/joins/sample_team/sample_online_join.py (0 hunks)
  • api/py/test/sample/production/group_bys/sample_team/chaining_group_by.chaining_group_by_v1 (3 hunks)
  • api/py/test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/group_by_with_kwargs.v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by.require_backfill (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by.v1 (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.require_backfill (1 hunks)
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.v1 (1 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0 (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_chaining_join.v1 (4 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_chaining_join_parent.parent_join (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.consistency_check (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.group_by_of_group_by (0 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.never (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.no_log_flattener (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join.v1 (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1 (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2 (4 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_derivation.v1 (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_external_parts.v1 (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (2 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_from_module.v1 (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_with_derivations_on_external_parts.v1 (3 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_label_join.v1 (5 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1 (5 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_online_join.v1 (4 hunks)
  • api/py/test/sample/staging_queries/sample_team/sample_staging_query.py (0 hunks)
  • api/py/test/sample/teams.py (1 hunks)
  • api/py/test/test_group_by.py (3 hunks)
  • api/py/test/test_join.py (0 hunks)
  • api/py/test/test_run.py (3 hunks)
  • api/py/test/test_utils.py (2 hunks)
  • api/src/main/scala/ai/chronon/api/Builders.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala (0 hunks)
  • api/thrift/api.thrift (3 hunks)
  • api/thrift/common.thrift (1 hunks)
  • api/thrift/orchestration.thrift (2 hunks)
  • frontend/src/routes/[conf]/[name]/overview/EntityProperties.svelte (2 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (0 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (3 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/ComputeNodeRunner.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByBackfill.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByPrepareUpload.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/JoinBackfill.scala (2 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/LabelJoin.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala (3 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala (1 hunks)
  • orchestration/src/main/scala/ai/chronon/orchestration/utils/DependencyResolver.scala (2 hunks)
  • scripts/distribution/run_zipline_quickstart.sh (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (2 hunks)
  • spark/src/test/resources/group_bys/team/example_group_by.v1 (1 hunks)
  • spark/src/test/resources/joins/team/example_join.v1 (2 hunks)
  • spark/src/test/resources/joins/team/example_join_failure.v1 (1 hunks)
💤 Files with no reviewable changes (7)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala
  • api/py/test/sample/staging_queries/sample_team/sample_staging_query.py
  • api/py/test/sample/production/joins/sample_team/sample_join.group_by_of_group_by
  • api/py/test/sample/joins/sample_team/sample_online_join.py
  • api/py/test/test_join.py
  • api/py/test/sample/group_bys/sample_team/group_by_with_kwargs.py
✅ Files skipped from review due to trivial changes (1)
  • spark/src/test/resources/group_bys/team/example_group_by.v1
🚧 Files skipped from review as they are similar to previous changes (46)
  • api/py/test/sample/group_bys/sample_team/event_sample_group_by.py
  • api/py/test/sample/teams.py
  • api/py/test/sample/joins/sample_team/sample_label_join_with_agg.py
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByPrepareUpload.scala
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by.v1
  • orchestration/src/main/scala/ai/chronon/orchestration/utils/DependencyResolver.scala
  • api/py/ai/chronon/cli/compile/compiler.py
  • api/py/ai/chronon/repo/compile.py
  • api/py/ai/chronon/types.py
  • api/py/test/sample/production/joins/sample_team/sample_join_from_module.v1
  • frontend/src/routes/[conf]/[name]/overview/EntityProperties.svelte
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/JoinBackfill.scala
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.v1
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • api/src/main/scala/ai/chronon/api/Builders.scala
  • api/py/test/sample/joins/sample_team/sample_label_join.py
  • api/py/ai/chronon/repo/run.py
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.require_backfill
  • api/py/test/sample/group_bys/sample_team/sample_group_by.py
  • api/py/test/test_run.py
  • api/py/test/sample/joins/sample_team/sample_join.py
  • api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByBackfill.scala
  • orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala
  • api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1
  • api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1
  • api/py/test/test_utils.py
  • api/py/test/sample/joins/sample_team/sample_join_from_module.py
  • api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1
  • api/py/test/test_group_by.py
  • api/py/test/sample/production/joins/sample_team/sample_join.consistency_check
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/ComputeNodeRunner.scala
  • api/src/main/scala/ai/chronon/api/Extensions.scala
  • api/py/test/sample/production/group_bys/sample_team/sample_group_by.require_backfill
  • orchestration/src/main/scala/ai/chronon/orchestration/physical/LabelJoin.scala
  • api/py/test/sample/production/joins/sample_team/sample_label_join.v1
  • api/py/test/sample/production/group_bys/sample_team/group_by_with_kwargs.v1
  • api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0
  • api/py/test/sample/production/joins/sample_team/sample_join_derivation.v1
  • api/py/test/sample/production/group_bys/sample_team/chaining_group_by.chaining_group_by_v1
  • api/py/test/sample/production/joins/sample_team/sample_online_join.v1
  • api/py/test/sample/production/joins/sample_team/sample_chaining_join_parent.parent_join
  • api/py/test/sample/production/joins/sample_team/sample_join_with_derivations_on_external_parts.v1
  • api/thrift/api.thrift
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/test/sample/group_bys/sample_team/chaining_group_by.py

1-1: from ai.chronon.types import * used; unable to detect undefined names

(F403)


4-4: GroupBy may be undefined, or defined from star imports

(F405)


5-5: JoinSource may be undefined, or defined from star imports

(F405)


7-7: Query may be undefined, or defined from star imports

(F405)


8-8: selects may be undefined, or defined from star imports

(F405)


18-18: Aggregation may be undefined, or defined from star imports

(F405)


18-18: Operation may be undefined, or defined from star imports

(F405)


20-20: Accuracy may be undefined, or defined from star imports

(F405)

api/py/test/sample/group_bys/sample_team/label_part_group_by.py

2-2: from ai.chronon.types import * used; unable to detect undefined names

(F403)


4-4: GroupBy may be undefined, or defined from star imports

(F405)


8-8: Aggregation may be undefined, or defined from star imports

(F405)


10-10: Operation may be undefined, or defined from star imports

(F405)


17-17: GroupBy may be undefined, or defined from star imports

(F405)

api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py

7-7: from ai.chronon.types import * used; unable to detect undefined names

(F403)


9-9: Join may be undefined, or defined from star imports

(F405)


12-12: JoinPart may be undefined, or defined from star imports

(F405)


16-16: JoinPart may be undefined, or defined from star imports

(F405)

🪛 Shellcheck (0.10.0)
scripts/distribution/run_zipline_quickstart.sh

[warning] 63-63: CHRONON_ROOT appears unused. Verify use (or export if used externally).

(SC2034)

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: spark_tests
  • GitHub Check: groupby_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (64)
api/py/test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1 (1)

10-13: New executionInfo object replaces deprecated fields.

Added structured configuration for scheduling via executionInfo object.

api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2 (4)

4-14: Metadata restructuring looks good

Repositioned team/outputNamespace fields and added executionInfo object with scheduleCron to replace removed customJson/dependencies fields.


36-46: Consistent metadata pattern applied

Clean restructuring with team/outputNamespace repositioning and executionInfo replacing legacy fields.


101-111: Same pattern maintained

Metadata structure consistent with other sections.


160-170: Structure consistency confirmed

Final groupBy section follows same cleanup pattern.

api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1)

9-24: Join configuration looks good.

The join configuration properly follows the new structure without using deprecated parameters.

🧰 Tools
🪛 Ruff (0.8.2)

9-9: Join may be undefined, or defined from star imports

(F405)


12-12: JoinPart may be undefined, or defined from star imports

(F405)


16-16: JoinPart may be undefined, or defined from star imports

(F405)

api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1)

4-28: GroupBy implementation looks good.

The implementation correctly uses the simplified API structure without deprecated parameters.

🧰 Tools
🪛 Ruff (0.8.2)

4-4: GroupBy may be undefined, or defined from star imports

(F405)


5-5: JoinSource may be undefined, or defined from star imports

(F405)


7-7: Query may be undefined, or defined from star imports

(F405)


8-8: selects may be undefined, or defined from star imports

(F405)


18-18: Aggregation may be undefined, or defined from star imports

(F405)


18-18: Operation may be undefined, or defined from star imports

(F405)


20-20: Accuracy may be undefined, or defined from star imports

(F405)

api/py/test/sample/group_bys/sample_team/label_part_group_by.py (2)

4-15: GroupBy implementation looks good.

Implementation correctly uses the simplified API structure.

🧰 Tools
🪛 Ruff (0.8.2)

4-4: GroupBy may be undefined, or defined from star imports

(F405)


8-8: Aggregation may be undefined, or defined from star imports

(F405)


10-10: Operation may be undefined, or defined from star imports

(F405)


17-22: GroupBy implementation with null aggregations looks good.

Second GroupBy correctly implements the pattern with null aggregations.

🧰 Tools
🪛 Ruff (0.8.2)

17-17: GroupBy may be undefined, or defined from star imports

(F405)

scripts/distribution/run_zipline_quickstart.sh (1)

23-23: SSH host alias usage looks correct.

The modified GitHub URL format uses SSH host alias for authentication.

api/py/ai/chronon/cli/compile/parse_teams.py (3)

1-9: New imports look fine.
They correctly bring in deepcopy and the needed Thrift types.


72-84: Good approach to ensure ExecutionInfo is initialized.
Creating metadata.executionInfo when absent improves consistency.


113-130: Refined _merge_mode_maps logic.
Using deepcopy and clearing common ensures safe merges without accidental mutations.

online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala (3)

75-77: Intermediate fileList variable.
This adds clarity without changing core behavior.


80-82: Path validation is good.
Ensures files are loaded strictly under the intended directory.


86-86: Relative path usage.
Double-check that older references are unaffected by the shift from absolute to relative paths.

api/py/test/sample/joins/sample_team/sample_chaining_join.py (1)

20-20: Importing chaining_group_by_v1.
This direct reference is valid and consistent.

api/py/test/sample/production/joins/sample_team/sample_join.never (2)

11-14: Configuration structure modernized with executionInfo object

The executionInfo object replaces the previous dependencies and customJson fields, providing a cleaner way to specify execution parameters.


42-47:

❓ Verification inconclusive

Online status changed to 1 and execution info added

The online value changed from 0 to 1, and execution settings are now consolidated into the executionInfo object.


🏁 Script executed:

#!/bin/bash
# Check if this online status change impacts other components
rg "sample_team.sample_group_by.v1.*online" --type json

Length of output: 57


Manual Verification Required: Confirm "online" Key Impact

The automated search returned no output. Please manually verify that no other components depend on the previous online value (0) and that the new executionInfo settings are correctly integrated.

  • Check any components referencing "online": across the repository.
  • Confirm that no pipeline logic uses the former online status.
api/py/ai/chronon/utils.py (2)

374-374: Updated to reference the new executionInfo structure

Function now correctly references the new executionInfo.scheduleCron field instead of a direct property.


402-420: Simplified code flow with removal of dependency management

The code now has cleaner control flow after the removal of dependency-related functions and fields.

api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1 (3)

11-14: Replaced scheduling configuration with executionInfo

The new executionInfo object cleanly encapsulates scheduling information that was previously scattered across different fields.


41-45: Standardized execution configuration for group_by

Consistent with other changes, replaced dependencies with structured executionInfo object.


106-110: Standardized execution configuration for entity_sample_group_by

Similar pattern applied across all components for consistency.

api/py/ai/chronon/group_by.py (4)

434-447: API signature improved with better parameter organization

Removed unused parameters and added direct support for configuration and step days, aligning with PR objectives.


558-576: Added documentation for new parameters

Clear documentation added for the new configuration parameters, explaining the priority order for configuration and environment variables.


637-643: Replaced dependency management with ExecutionInfo

Consolidated execution parameters into a single ExecutionInfo object, improving code organization.


658-661: Updated metadata structure in GroupBy

Removed dependency fields and replaced with executionInfo, consistent with JSON structure changes.

api/py/ai/chronon/repo/utils.py (8)

63-65: Good switch to logging from print

Replaced print with logging.info for better output management.


68-70: Good logging practice

Consistent use of logging instead of print statements.


81-83: Improved logging

Standardized on logging.info for better observability.


91-97: Better logging format

Improved log message structure using logging.info.


102-104: More descriptive logging

Enhanced log message with better context.


120-123: Improved assertion formatting

Better structured error message for unsupported spark versions.


130-131: Better log message

More informative log for jar download source.


305-313: Switched to logging from print

Standardized on logging.info for environment variable settings.

api/py/test/sample/production/joins/sample_team/sample_join.no_log_flattener (2)

12-14: Added executionInfo with scheduleCron

Replaced removed dependencies and customJson with structured executionInfo object.


44-47: Added executionInfo with scheduleCron and historical backfill

Structured metadata with executionInfo containing scheduling parameters.

online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (2)

115-124: Simplified control flow

Removed unnecessary output variable assignment and directly returns conditional results.


182-182: Added explicit return type

Improved type safety with explicit Array[Any] return type.

spark/src/test/resources/joins/team/example_join.v1 (2)

4-7: Simplified metadata structure

Removed unused fields (dependencies, customJson, tableProperties) and reordered properties.


24-25: Removed dependencies field

Streamlined groupBy metadata by removing unused fields.

api/py/test/sample/production/joins/sample_team/sample_join.v1 (5)

4-5: Looks good.
Consistent team and namespace fields.


9-10: No issue.
Switching online to 1 and production to 0 is clear.


12-19: Execution info is well-structured.
ScheduleCron usage is concise.


41-42: Matches earlier pattern.
Naming consistent with team/namespace.


47-52: GroupBy config looks good.
Online and production align with approach; historicalBackfill is optional.

api/py/test/sample/production/joins/sample_team/sample_join_external_parts.v1 (5)

4-5: All good.
Team and outputNamespace updated.


9-10: Clear.
Online remains 0, production is 0.


12-14: Execution info block is fine.
ScheduleCron set.


36-37: Team/outputNamespace alignment.
Same pattern as above.


42-47: Consistent.
Online is 1, new executionInfo with historicalBackfill.

api/py/test/sample/production/joins/sample_team/sample_chaining_join.v1 (8)

4-5: Team/namespace.
Good consistency with approach.


9-11: Online, production, consistencyCheck.
Nicely structured.


13-16: executionInfo addition is short and clear.
ScheduleCron daily is standard.


37-39: GroupBy name re-labeled.
No conflicts.


44-49: Online, production, executionInfo.
Change is aligned with others.


58-64: Consistent usage.
Reaffirms daily schedule, historicalBackfill is 0.


87-92: Output namespace, online, executionInfo.
Minimal changes, no conflict.


149-153: Same pattern for entity groupBy.
Matches the rest.

api/thrift/orchestration.thrift (2)

113-113: Check backward compatibility
Changing enum values might break older references. Verify usage in dependent modules.


174-174: Rename references across code
Renaming lineageHash to version is good. Please ensure all references are updated.

api/py/ai/chronon/join.py (2)

202-202: Good usage of MetaData
No functional concerns.


292-295: Check offline schedule usage
Ensure label_offline_schedule is tested if job intervals are critical.

DATAPROC_SUBMITTER_ID_STR="Dataproc submitter job id"

echo -e "${GREEN}<<<<<.....................................COMPILE.....................................>>>>>\033[0m"
CHRONON_ROOT=`pwd`/api/py/test/sample
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

CHRONON_ROOT variable appears unused.

The variable is defined but not referenced elsewhere in the script.


🏁 Script executed:

#!/bin/bash
# Check if CHRONON_ROOT is used elsewhere in the script
grep -n "CHRONON_ROOT" scripts/distribution/run_zipline_quickstart.sh

Length of output: 110


Unused Variable Removal: The grep output confirms that CHRONON_ROOT is only defined at line 63 with no further use. Please remove it or add a comment if it's intended for future use.

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 63-63: CHRONON_ROOT appears unused. Verify use (or export if used externally).

(SC2034)

@nikhil-zlai nikhil-zlai merged commit 888bb12 into main Mar 7, 2025
22 checks passed
@nikhil-zlai nikhil-zlai deleted the api_cleanup branch March 7, 2025 04:45
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
## Summary
we want to remove api flags that are not used & not usable beyond oss.
if we don't remove these now, we will end up having to migrate these
when the orchestrator is ready.

added direct support for conf and step days, removed dependencies,
tableProps, old style env and customJson

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**  
- Introduced a consolidated execution configuration that simplifies
scheduling, environment, and backfill settings.

- **Refactor**  
- Streamlined metadata and dependency management in data processing and
join/group-by workflows.
- Enhanced error handling for file operations to better manage missing
directories and improve feedback.

- **Frontend**  
- Updated the metadata display to now show scheduling details, providing
clearer overview information.

- **Tests**  
- Revised test cases to reflect the new naming conventions and metadata
structure.
  - Updated assertions to simplify the access of metadata tags.  
  - Removed obsolete tests related to dependency management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
## Summary
we want to remove api flags that are not used & not usable beyond oss.
if we don't remove these now, we will end up having to migrate these
when the orchestrator is ready.

added direct support for conf and step days, removed dependencies,
tableProps, old style env and customJson

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**  
- Introduced a consolidated execution configuration that simplifies
scheduling, environment, and backfill settings.

- **Refactor**  
- Streamlined metadata and dependency management in data processing and
join/group-by workflows.
- Enhanced error handling for file operations to better manage missing
directories and improve feedback.

- **Frontend**  
- Updated the metadata display to now show scheduling details, providing
clearer overview information.

- **Tests**  
- Revised test cases to reflect the new naming conventions and metadata
structure.
  - Updated assertions to simplify the access of metadata tags.  
  - Removed obsolete tests related to dependency management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
## Summary
we want to remove api flags that are not used & not usable beyond oss.
if we don't remove these now, we will end up having to migrate these
when the orchestrator is ready.

added direct support for conf and step days, removed dependencies,
tableProps, old style env and customJson

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**  
- Introduced a consolidated execution configuration that simplifies
scheduling, environment, and backfill settings.

- **Refactor**  
- Streamlined metadata and dependency management in data processing and
join/group-by workflows.
- Enhanced error handling for file operations to better manage missing
directories and improve feedback.

- **Frontend**  
- Updated the metadata display to now show scheduling details, providing
clearer overview information.

- **Tests**  
- Revised test cases to reflect the new naming conventions and metadata
structure.
  - Updated assertions to simplify the access of metadata tags.  
  - Removed obsolete tests related to dependency management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
## Summary
we want to remove api flags that are not used & not usable beyond oss.
if we don't remove these now, we will end up having to migrate these
when the orchestrator is ready.

added direct support for conf and step days, removed dependencies,
tableProps, old style env and customJson

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**  
- Introduced a consolidated execution configuration that simplifies
scheduling, environment, and backfill settings.

- **Refactor**  
- Streamlined metadata and dependency management in data processing and
join/group-by workflows.
- Enhanced error handling for file operations to better manage missing
directories and improve feedback.

- **Frontend**  
- Updated the metadata display to now show scheduling details, providing
clearer overview information.

- **Tests**  
- Revised test cases to reflect the new naming conventions and metadata
structure.
  - Updated assertions to simplify the access of metadata tags.  
  - Removed obsolete tests related to dependency management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
chewy-zlai pushed a commit that referenced this pull request May 16, 2025
## Summary
we want to remove api flags that are not used & not usable beyond oss.
if we don't remove these now, we will end up having to migrate these
when the orchestrator is ready.

added direct support for conf and step days, removed dependencies,
tableProps, old style env and customJson

## Cheour clientslist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**  
- Introduced a consolidated execution configuration that simplifies
scheduling, environment, and baour clientsfill settings.

- **Refactor**  
- Streamlined metadata and dependency management in data processing and
join/group-by workflows.
- Enhanced error handling for file operations to better manage missing
directories and improve feedbaour clients.

- **Frontend**  
- Updated the metadata display to now show scheduling details, providing
clearer overview information.

- **Tests**  
- Revised test cases to reflect the new naming conventions and metadata
structure.
  - Updated assertions to simplify the access of metadata tags.  
  - Removed obsolete tests related to dependency management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@coderabbitai coderabbitai bot mentioned this pull request Aug 7, 2025
4 tasks
@coderabbitai coderabbitai bot mentioned this pull request Aug 20, 2025
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants