-
Notifications
You must be signed in to change notification settings - Fork 8
cleanup: remove unused api flags #463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis PR updates several API functions by removing deprecated parameters and introducing new ones to encapsulate execution details via an Changes
Sequence Diagram(s)sequenceDiagram
participant User as User
participant Compiler as Compiler
User->>Compiler: Trigger compile process
Compiler->>OS: Check for staging directory existence
alt Staging exists
Compiler->>OS: If output dir exists, remove it
Compiler->>OS: Move staging directory to output directory
else Staging missing
Compiler->>User: Log warning about missing staging directory
end
Possibly related PRs
Suggested reviewers
Poem
Warning Review ran into problems🔥 ProblemsGitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository. Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
api/py/ai/chronon/join.py (1)
532-536: Remove unused code.custom_json is created but never used.
- custom_json = {"check_consistency": check_consistency, "lag": lag}🧰 Tools
🪛 Ruff (0.8.2)
532-532: Local variable
custom_jsonis assigned to but never usedRemove assignment to unused variable
custom_json(F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (4)
api/py/ai/chronon/group_by.py(3 hunks)api/py/ai/chronon/join.py(4 hunks)api/py/ai/chronon/utils.py(0 hunks)api/thrift/api.thrift(1 hunks)
💤 Files with no reviewable changes (1)
- api/py/ai/chronon/utils.py
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: non_spark_tests
- GitHub Check: non_spark_tests
🔇 Additional comments (6)
api/thrift/api.thrift (1)
279-279: Clean addition.The optional stepDays field is properly defined.
api/py/ai/chronon/group_by.py (2)
444-446: Good API cleanup.Replaced legacy parameters with more structured configuration approach.
655-657: Config properly applied.New parameters correctly passed to metadata.
api/py/ai/chronon/join.py (3)
204-205: Clean removal.Simplified ExternalSource signature by removing custom_json.
414-416: Good API cleanup.Consistent with GroupBy changes.
572-574: Config properly applied.New parameters correctly passed to metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (9)
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1)
7-7: Avoid wildcard imports.Replace with explicit imports for better code clarity.
-from ai.chronon.types import * +from ai.chronon.types import Join, JoinPart🧰 Tools
🪛 Ruff (0.8.2)
7-7:
from ai.chronon.types import *used; unable to detect undefined names(F403)
orchestration/src/main/scala/ai/chronon/orchestration/physical/ComputeNodeRunner.scala (1)
8-9: Methods commented out rather than removed.Consider removing these methods completely rather than commenting them out since they're no longer needed according to the PR objectives.
api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1)
1-2: Avoid wildcard imports.Replace wildcard imports with explicit imports for better code clarity and maintenance.
-from ai.chronon.types import * +from ai.chronon.types import GroupBy, JoinSource, Query, selects, Aggregation, Operation, Accuracy🧰 Tools
🪛 Ruff (0.8.2)
1-1:
from ai.chronon.types import *used; unable to detect undefined names(F403)
api/py/test/sample/group_bys/sample_team/label_part_group_by.py (1)
1-2: Avoid wildcard imports.Replace wildcard imports with explicit imports for better code clarity and maintenance.
-from ai.chronon.types import * +from ai.chronon.types import GroupBy, Aggregation, Operation🧰 Tools
🪛 Ruff (0.8.2)
2-2:
from ai.chronon.types import *used; unable to detect undefined names(F403)
api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1 (1)
5-9: Consider using booleans for online/production flags.
Might simplify checks if these fields are actually boolean.api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0 (1)
9-10: Consider booleans for online/production.
If these are meant to be true/false, booleans may be clearer.api/src/main/scala/ai/chronon/api/Extensions.scala (1)
647-647: Extract duplicated logic.
The same TODO repeats; consider a shared helper.api/thrift/common.thrift (1)
87-106: Well-designed ExecutionInfo structureThe ExecutionInfo structure elegantly combines the various execution parameters that were previously scattered across multiple fields.
Consider adding documentation comments to the stepDays field to clarify its purpose, especially as it's a new addition mentioned in the PR objectives.
api/thrift/api.thrift (1)
285-286: Comment style inconsistency.Hash symbol used instead of double slash for comment in Thrift file.
- # information that needs to be present on every physical node + // information that needs to be present on every physical node
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (69)
api/py/ai/chronon/cli/compile/compiler.py(1 hunks)api/py/ai/chronon/cli/compile/parse_teams.py(3 hunks)api/py/ai/chronon/group_by.py(3 hunks)api/py/ai/chronon/join.py(6 hunks)api/py/ai/chronon/repo/compile.py(1 hunks)api/py/ai/chronon/repo/run.py(25 hunks)api/py/ai/chronon/types.py(2 hunks)api/py/ai/chronon/utils.py(2 hunks)api/py/test/sample/group_bys/sample_team/chaining_group_by.py(1 hunks)api/py/test/sample/group_bys/sample_team/event_sample_group_by.py(1 hunks)api/py/test/sample/group_bys/sample_team/group_by_with_kwargs.py(0 hunks)api/py/test/sample/group_bys/sample_team/label_part_group_by.py(1 hunks)api/py/test/sample/group_bys/sample_team/sample_group_by.py(3 hunks)api/py/test/sample/joins/sample_team/sample_chaining_join.py(1 hunks)api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py(1 hunks)api/py/test/sample/joins/sample_team/sample_join.py(3 hunks)api/py/test/sample/joins/sample_team/sample_join_from_module.py(1 hunks)api/py/test/sample/joins/sample_team/sample_label_join.py(2 hunks)api/py/test/sample/joins/sample_team/sample_label_join_with_agg.py(1 hunks)api/py/test/sample/joins/sample_team/sample_online_join.py(0 hunks)api/py/test/sample/production/group_bys/sample_team/chaining_group_by.chaining_group_by_v1(3 hunks)api/py/test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/group_by_with_kwargs.v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_group_by.require_backfill(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_group_by.v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.require_backfill(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.v1(1 hunks)api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0(2 hunks)api/py/test/sample/production/joins/sample_team/sample_chaining_join.v1(4 hunks)api/py/test/sample/production/joins/sample_team/sample_chaining_join_parent.parent_join(3 hunks)api/py/test/sample/production/joins/sample_team/sample_join.consistency_check(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join.group_by_of_group_by(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join.never(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join.no_log_flattener(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join.v1(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1(3 hunks)api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2(4 hunks)api/py/test/sample/production/joins/sample_team/sample_join_derivation.v1(3 hunks)api/py/test/sample/production/joins/sample_team/sample_join_external_parts.v1(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join_from_module.v1(3 hunks)api/py/test/sample/production/joins/sample_team/sample_join_with_derivations_on_external_parts.v1(3 hunks)api/py/test/sample/production/joins/sample_team/sample_label_join.v1(5 hunks)api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1(5 hunks)api/py/test/sample/production/joins/sample_team/sample_online_join.v1(4 hunks)api/py/test/sample/staging_queries/sample_team/sample_staging_query.py(0 hunks)api/py/test/test_group_by.py(3 hunks)api/py/test/test_join.py(0 hunks)api/py/test/test_run.py(3 hunks)api/py/test/test_utils.py(2 hunks)api/src/main/scala/ai/chronon/api/Builders.scala(1 hunks)api/src/main/scala/ai/chronon/api/Extensions.scala(2 hunks)api/src/main/scala/ai/chronon/api/TilingUtils.scala(0 hunks)api/thrift/api.thrift(3 hunks)api/thrift/common.thrift(1 hunks)api/thrift/orchestration.thrift(2 hunks)orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/ComputeNodeRunner.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByBackfill.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByPrepareUpload.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/JoinBackfill.scala(2 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/LabelJoin.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala(3 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/utils/DependencyResolver.scala(2 hunks)spark/src/main/scala/ai/chronon/spark/GroupBy.scala(3 hunks)spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala(1 hunks)
💤 Files with no reviewable changes (5)
- api/py/test/sample/group_bys/sample_team/group_by_with_kwargs.py
- api/py/test/sample/staging_queries/sample_team/sample_staging_query.py
- api/py/test/sample/joins/sample_team/sample_online_join.py
- api/py/test/test_join.py
- api/src/main/scala/ai/chronon/api/TilingUtils.scala
✅ Files skipped from review due to trivial changes (6)
- api/py/ai/chronon/repo/compile.py
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala
- orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByPrepareUpload.scala
- api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.require_backfill
- orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByBackfill.scala
- orchestration/src/main/scala/ai/chronon/orchestration/physical/LabelJoin.scala
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py
7-7: from ai.chronon.types import * used; unable to detect undefined names
(F403)
9-9: Join may be undefined, or defined from star imports
(F405)
12-12: JoinPart may be undefined, or defined from star imports
(F405)
16-16: JoinPart may be undefined, or defined from star imports
(F405)
api/py/test/sample/group_bys/sample_team/chaining_group_by.py
1-1: from ai.chronon.types import * used; unable to detect undefined names
(F403)
4-4: GroupBy may be undefined, or defined from star imports
(F405)
5-5: JoinSource may be undefined, or defined from star imports
(F405)
7-7: Query may be undefined, or defined from star imports
(F405)
8-8: selects may be undefined, or defined from star imports
(F405)
18-18: Aggregation may be undefined, or defined from star imports
(F405)
18-18: Operation may be undefined, or defined from star imports
(F405)
20-20: Accuracy may be undefined, or defined from star imports
(F405)
api/py/test/sample/group_bys/sample_team/label_part_group_by.py
2-2: from ai.chronon.types import * used; unable to detect undefined names
(F403)
4-4: GroupBy may be undefined, or defined from star imports
(F405)
8-8: Aggregation may be undefined, or defined from star imports
(F405)
10-10: Operation may be undefined, or defined from star imports
(F405)
17-17: GroupBy may be undefined, or defined from star imports
(F405)
api/py/ai/chronon/repo/run.py
894-894: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
🔇 Additional comments (143)
api/py/test/test_run.py (3)
48-48: Formatting improvement.Added blank line after context function.
220-222: Reformatted assertion for readability.Multiline format improves readability.
232-234: Reformatted assertion for readability.Consistent with previous formatting change.
spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (1)
99-101: Improved code indentation.Better aligned indentation for the serializer chain.
orchestration/src/main/scala/ai/chronon/orchestration/physical/JoinBackfill.scala (2)
3-3: Consolidated imports.Combined imports from same package.
38-38: Simplified code by removing unnecessary Table object creation.Direct method call is cleaner.
api/py/test/sample/production/group_bys/sample_team/sample_group_by.require_backfill (1)
1-59: New configuration file aligns with PR objectives.Clean implementation of the new configuration approach with direct executionInfo support.
Key points:
- Proper metadata structure
- Historical backfill properly configured
- Clear aggregation operations
- Includes derivations with wildcard support
api/py/ai/chronon/repo/run.py (2)
69-74: Style improvement to dictionary formatting.Improved readability with multiline structure.
344-355: Updated environment configuration to use new executionInfo structure.Replaces deprecated modeToEnvMap with executionInfo.env approach, maintaining backward compatibility through fallback.
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1)
9-24: New parent join definition looks good.Clean implementation of the parent join with appropriate parameters.
🧰 Tools
🪛 Ruff (0.8.2)
9-9:
Joinmay be undefined, or defined from star imports(F405)
12-12:
JoinPartmay be undefined, or defined from star imports(F405)
16-16:
JoinPartmay be undefined, or defined from star imports(F405)
api/py/ai/chronon/types.py (2)
10-10: Good addition of common types import.Now exposes common.ttypes.
52-56: Appropriate exposure of new configuration types.These new types support the PR goal of direct configuration support.
api/py/test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1 (1)
10-13: Streamlined configuration with executionInfo.Removes deprecated fields in favor of structured execution information.
api/py/test/sample/production/group_bys/sample_team/sample_group_by.v1 (1)
1-50: New configuration file looks good.Clean JSON structure with appropriate metadata, sources, key columns, and aggregations.
api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1)
4-28: GroupBy configuration is well-structured.Contains appropriate sources, keys, aggregations, and metadata configuration.
🧰 Tools
🪛 Ruff (0.8.2)
4-4:
GroupBymay be undefined, or defined from star imports(F405)
5-5:
JoinSourcemay be undefined, or defined from star imports(F405)
7-7:
Querymay be undefined, or defined from star imports(F405)
8-8:
selectsmay be undefined, or defined from star imports(F405)
18-18:
Aggregationmay be undefined, or defined from star imports(F405)
18-18:
Operationmay be undefined, or defined from star imports(F405)
20-20:
Accuracymay be undefined, or defined from star imports(F405)
api/py/test/sample/group_bys/sample_team/label_part_group_by.py (2)
4-15: GroupBy configuration is well-structured.First group by correctly aggregates with a 7-day window sum.
🧰 Tools
🪛 Ruff (0.8.2)
4-4:
GroupBymay be undefined, or defined from star imports(F405)
8-8:
Aggregationmay be undefined, or defined from star imports(F405)
10-10:
Operationmay be undefined, or defined from star imports(F405)
17-22: GroupBy with null aggregation.Confirm that setting aggregations=None is intentional and supported by the GroupBy implementation.
🧰 Tools
🪛 Ruff (0.8.2)
17-17:
GroupBymay be undefined, or defined from star imports(F405)
api/py/test/sample/group_bys/sample_team/sample_group_by.py (4)
16-16: Import condensed to single line.Import statements now appear on a single line for cleaner code.
29-30: Added missing comma in table_properties dictionary.Fixed formatting by adding a trailing comma after the description value.
32-32: Added online=True parameter.Added parameter to specify online serving capability.
46-48: Derivations list reformatted.Each Derivation now appears on its own line for improved readability.
api/py/test/sample/group_bys/sample_team/event_sample_group_by.py (1)
23-23: Simplified Aggregation configuration.Removed redundant tags parameter, keeping only essential windows parameter.
api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.v1 (1)
1-47: New configuration file added.New JSON configuration for sample_group_by_group_by with executionInfo structure replacing legacy fields, aligning with PR objectives.
api/py/test/sample/joins/sample_team/sample_label_join_with_agg.py (1)
26-26: Looks good - imported variable replaces locally defined.The change aligns with the PR's goal to remove unused API flags, importing label_part_group_by instead of defining it locally.
api/py/test/sample/production/joins/sample_team/sample_join_from_module.v1 (3)
12-14: Clean metadata structure with executionInfo object.Removed customJson/dependencies in favor of executionInfo containing scheduleCron.
37-40: Consolidated execution parameters into executionInfo.Replaced offlineSchedule with executionInfo object containing scheduleCron and historicalBackfill.
108-112: Consistent executionInfo pattern applied here too.Same pattern as previous groupBy sections, maintaining consistency throughout the file.
api/src/main/scala/ai/chronon/api/Builders.scala (1)
290-292: Encapsulated execution settings into ExecutionInfo object.Removed dependencies handling and moved historicalBackfill into ExecutionInfo object, aligning with PR cleanup goals.
orchestration/src/main/scala/ai/chronon/orchestration/utils/DependencyResolver.scala (2)
5-5: Simplified imports with consolidated statement.Cleaner import syntax while maintaining the same functionality.
63-63: Removed redundant Table object creation.Directly sets table variable in result.setTable(), eliminating unnecessary object instantiation.
orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala (1)
3-3:✅ Verification successful
New import added but usage not visible
The TableDependency import was added but no implementation is visible in this snippet.
🏁 Script executed:
#!/bin/bash # Check if TableDependency is used elsewhere in the file cat orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala | grep -n "TableDependency"Length of output: 200
Usage Verified: The imported
TableDependencyis now used in thetableDependenciesmethod (line 9). No further action is required.api/py/test/test_group_by.py (3)
254-254: Direct tag accessAccessing tag directly instead of through customJson.
272-272: Boolean changed to stringSame change as line 252 - boolean to string conversion.
281-281: Direct tag accessSame pattern as line 254 - directly accessing tag.
orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala (1)
23-23:❓ Verification inconclusive
Method implementation removed
Replaced with
???placeholder which throws NotImplementedError.
🏁 Script executed:
#!/bin/bash # Check if this method is called anywhere rg "StagingQueryNodeImpl.*parents" --type scalaLength of output: 49
Review Alert: Verify
parentsUsageThe initial script produced no results. To be sure no call sites exist, please run the following script to search for invocations of the
.parentsmethod across Scala files:#!/bin/bash # Find invocations of '.parents' in Scala files. rg "\.parents\s*\(" --type scalaapi/py/test/sample/production/group_bys/sample_team/group_by_with_kwargs.v1 (1)
10-13: Replace offline schedule with executionInfoAdded structured executionInfo to replace removed offline scheduling.
api/py/test/sample/joins/sample_team/sample_join.py (6)
15-15: New import for environment variables structure.Added import for
EnvironmentVariablesto support new environment configuration approach.
26-27: Removed tags parameter and added table properties.Experimental tag removed as part of API cleanup. Table properties now directly configured.
29-32: Environment configuration structure updated.Replaced
envwith structuredenv_varsusingEnvironmentVariablesclass. Addedonline=Trueparameter.
37-39: Removed tags and updated schedule format.Experimental tag removed and schedule string now uses double quotes for consistency.
50-50: Removed experimental tag.Experimental tag removed as part of API cleanup.
57-57: Removed experimental tag.Experimental tag removed as part of API cleanup.
api/py/test/sample/joins/sample_team/sample_label_join.py (2)
29-29: Import external group by instead of local definition.Now importing
label_part_group_by_2from dedicated module, removing duplicate definition.
46-46: Updated to use imported group by.Using imported
label_part_group_by_2instead of locally defined version.api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (2)
9-14: Restructured execution configuration.Removed
offlineSchedulein favor of newexecutionInfostructure withscheduleCron. Reorganized metadata fields.
36-46: Consistent execution configuration in nested group by.Applied same
executionInfostructure to nested group by metadata, replacing custom fields and dependencies.api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1 (1)
9-13: Standardized execution configuration.Added structured
executionInfoobject with schedule and backfill settings, replacing custom fields and dependencies.api/py/test/test_utils.py (2)
292-293: Updated table naming convention.Table name updated to new format that includes version in namespace.
310-310: Updated join table name format.Join table name now properly references chaining_group_by_v1.
orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala (2)
15-15: Removed dependency handling.Dependencies replaced with empty sequence, aligned with PR objective to eliminate unused dependencies.
27-27: Simplified table name setting.Direct assignment is cleaner than creating unnecessary Table object.
api/py/ai/chronon/cli/compile/parse_teams.py (3)
4-9: Updated imports for new ExecutionInfo structure.Added ExecutionInfo import to support the new configuration approach.
72-85: Moved configuration to ExecutionInfo structure.Environment and configuration are now properly encapsulated in ExecutionInfo.
113-127: Improved mode map merging logic.More robust handling when result is None, with proper deep copying.
api/py/test/sample/joins/sample_team/sample_join_from_module.py (2)
28-28: Removed unnecessary spaces.Consistent formatting for parameter assignment.
32-37: Updated to double quotes and removed unused parameters.Consistent string quoting and removal of additional_args/additional_env parameters.
api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1 (2)
12-15: Confirm zero usage for historicalBackfill.
Zero can be confusing if a boolean is expected.
84-108: Validate operation codes.
Ensure the numeric values (7, 12, etc.) align with your aggregator logic.api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0 (3)
4-5: Check consistent usage of team/outputNamespace.
Looks aligned with other configurations.
12-14: Verify nothing else relies on the removed offlineSchedule.
New executionInfo block looks fine.
37-40: Check groupBy executionInfo.
Again, confirm zero for historicalBackfill if intended as a boolean.api/src/main/scala/ai/chronon/api/Extensions.scala (1)
854-855: Guard executionInfo usage.
Handle null executionInfo gracefully when reading historicalBackfill.api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1 (5)
9-14: Clean metadata structure.Replaced various metadata flags with a consistent executionInfo object containing scheduleCron setting.
41-45: Good consistency in joinParts metadata.Standardized execution configuration with scheduleCron and historicalBackfill.
106-110: Consistent structure maintained.Same pattern applied to all groupBy metadata sections.
183-189: Label part metadata updated properly.Updated name from "sample_label_group_by" to fully qualified "sample_team.label_part_group_by.label_part_group_by" and standardized execution info.
231-233: Simplified labelParts metadata.Removed dependencies and moved scheduling to executionInfo structure.
api/py/test/sample/production/joins/sample_team/sample_join.group_by_of_group_by (2)
9-14: Clean metadata restructuring.Replaced custom fields with standardized executionInfo object.
42-46: Consistent joinParts metadata pattern.Properly configured executionInfo with both scheduleCron and historicalBackfill.
api/py/test/sample/joins/sample_team/sample_chaining_join.py (2)
20-20: Simplified import structure.Now directly imports chaining_group_by_v1 instead of defining it locally.
24-34: Join definition simplified.Removed additional_args and additional_env parameters, aligning with PR objective to clean up unused API flags.
api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1 (3)
9-14: Consistent metadata structure.Same clean pattern of replacing custom fields with executionInfo.
41-45: Properly standardized joinParts.First groupBy correctly updated with executionInfo structure.
106-110: Consistent metadata in second groupBy.Maintains pattern consistency throughout the configuration.
api/py/test/sample/production/joins/sample_team/sample_join.consistency_check (2)
10-16: Configuration structure improved.Streamlined metadata with executionInfo replacing customJson and dependencies.
44-49: ExecutionInfo consistently applied.Matches parent structure pattern.
api/py/ai/chronon/utils.py (2)
374-374: Updated execution schedule reference.Now reads from the new executionInfo structure.
405-420: Join modes logic simplified.Clean removal of dependencies references.
api/py/test/sample/production/joins/sample_team/sample_join.no_log_flattener (2)
9-14: Configuration structure improved.Streamlined with executionInfo pattern.
42-47: Consistent structure in joinParts.Matches parent pattern and properly configures scheduling.
api/py/test/sample/production/joins/sample_team/sample_join.never (2)
9-14: Special case handling with @Never.Properly configured for non-execution.
42-47: Nested configuration properly structured.Child groupBy maintains consistent pattern.
api/py/test/sample/production/joins/sample_team/sample_join.v1 (2)
12-19: Improved execution configuration structure.The introduction of a structured
executionInfoobject improves configuration management with explicit environment variable handling for different execution modes.
47-52: Consistent metadata structure in group by section.Same execution info pattern applied consistently to the group by section, with explicit historical backfill configuration.
api/py/test/sample/production/joins/sample_team/sample_chaining_join_parent.parent_join (3)
11-11: Added consistency check capability.Added
consistencyCheckflag which wasn't mentioned in other files or PR description.Is this intended to be a common feature across all join configurations?
14-17: Standardized execution configuration.Execution configuration now follows consistent pattern with dedicated structure.
45-48: Consistent execution info across all group by sections.Configuration pattern is applied consistently throughout all metadata sections.
Also applies to: 110-113
api/py/test/sample/production/joins/sample_team/sample_join_external_parts.v1 (2)
9-9: Different online settings between metadata levels.Main metadata has
online: 0while group by metadata hasonline: 1. This difference may be intentional but could lead to confusion.Is this difference in online status between levels intentional?
Also applies to: 42-42
12-14: Simplified execution info without environment variables.This configuration uses minimal execution info with just scheduleCron, omitting environment variables that are present in other files.
api/py/ai/chronon/group_by.py (4)
434-447: Updated GroupBy function signature to remove dependencies.Removed legacy parameters (
dependencies,env,lag) and added new parameters for direct configuration support (conf,env_vars,step_days,disable_historical_backfill).
558-576: Well-documented new parameters.Documentation clearly explains the configuration layering priority for both conf and env_vars parameters.
637-643: Replaced dependencies list with ExecutionInfo object.New ExecutionInfo structure consolidates execution configuration in a single object.
658-661: Updated metadata structure.Metadata now uses the ExecutionInfo object and properly handles tags.
api/thrift/orchestration.thrift (3)
113-113: Switch to MERGE
Cleaner enum.
166-166: No functional change
Just a blank line.
174-174: Renamed lineageHash to version
Clearer naming.api/py/test/sample/production/joins/sample_team/sample_chaining_join.v1 (8)
4-5: Team & namespace updated
Looks consistent.
9-11: online/production/consistencyCheck
Confirm if this combo is intentional.
13-13: New consistencySamplePercent
Seems fine.
15-16: Daily schedule
Works for regular updates.
37-38: Renamed name/team
More explicit naming.
44-49: Execution info
Good placement.
58-59: ConsistencyCheck & daily schedule
Validate if 1 is correct.Also applies to: 61-64
86-92: GroupBy metadata
All good.api/py/test/sample/production/joins/sample_team/sample_label_join.v1 (7)
4-5: Team & namespace
Consistent update.
9-10: online/production
Double-check environment usage.
12-14: Added executionInfo
Nice injection of scheduling.
36-37: Meta updates
Removes old fields, good.Also applies to: 41-45
101-102: Revised outputNamespace & executionInfo
Matches new style.Also applies to: 106-110
183-183: Name, online, and schedule
Clear separation.Also applies to: 185-189
216-218: Weekly schedule
Check if it conflicts with daily.api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2 (4)
42-45: Verify consistent historicalBackfill configurationThe executionInfo structure now includes historicalBackfill flag set to 0.
107-110: Consistent configuration in nested structuresExecutionInfo structure consistently applied to nested GroupBy configurations.
166-169: Ensure all JobParts follow the same patternThe consistent pattern of removing dependencies and adding executionInfo is maintained across all JobParts.
12-14:✅ Verification successful
Confirm the replacement of dependencies with executionInfo is properly documented
The removed dependencies and customJson fields have been replaced with executionInfo structure that now handles scheduling via scheduleCron.
🏁 Script executed:
#!/bin/bash # Check if there are any remaining references to dependencies field in other JSON files rg -g "*.v[0-9]" -g "*.json" --json "dependencies" | jq 'select(.type == "match") | .data.path.text'Length of output: 7453
Replacement Verified:
The production file (api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2) correctly uses the newexecutionInfostructure. Occurrences of"dependencies"in test files and configuration files (e.g., package locks) are expected and not relevant to this change. The removal ofdependenciesandcustomJsonin favor ofexecutionInfois properly documented.api/py/test/sample/production/joins/sample_team/sample_join_with_derivations_on_external_parts.v1 (3)
12-14: Consistent metadata structure transformationExecutionInfo replaces former dependencies and customJson fields appropriately.
42-45: Check historicalBackfill default settingsThe historical backfill is consistently disabled across configurations.
107-110: Consistent application of executionInfo patternThe pattern is consistently applied to all nested GroupBy configurations.
api/thrift/common.thrift (3)
28-33: Well-structured environment variables groupingGood organization of environment variables into logical categories.
35-40: Consistent structure between ConfigProperties and EnvironmentVariablesConfigProperties mirrors EnvironmentVariables structure, maintaining consistency.
42-69: Comprehensive TableDependency structureThe TableDependency structure provides a complete set of parameters for defining table dependencies, with good documentation.
api/py/test/sample/production/joins/sample_team/sample_join_derivation.v1 (3)
12-14: Consistent pattern of removing unused fieldsThe removal of dependencies and customJson is consistent with other files.
42-45: Check scheduleCron format consistencyThe @daily cron format is consistently used across configurations.
107-110: Consistent configuration in final nested structureThe pattern is consistently applied to all GroupBy configurations in this file too.
api/py/test/sample/production/joins/sample_team/sample_online_join.v1 (5)
4-5: Team and namespace now explicitly defined.The team and namespace are now explicitly defined at the top level, ensuring consistency.
9-16: Replaced dependency management with executionInfo structure.Dependencies and customJson fields removed, replaced with more structured executionInfo object containing scheduleCron.
38-47: Consistent metadata structure across all group by configurations.Standardized team, namespace, and executionInfo structure replacing loose dependency configuration.
103-112: Unified execution configuration style.Consistent use of executionInfo pattern for scheduling configuration.
162-171: Standardized configuration pattern.All groupBy configurations now follow the same pattern with executionInfo structure.
api/thrift/api.thrift (3)
238-240: Updated metadata documentation.Documentation clarifies that these are configuration parameters that don't affect output content.
262-284: Field renumbering for better organization.Fields are logically grouped with numbering gaps to allow for future additions.
485-486: Simplified environment and config references.Team now references common environment and config property structures.
api/py/test/sample/production/group_bys/sample_team/chaining_group_by.chaining_group_by_v1 (5)
3-5: Updated naming scheme.Name updated to use more structured versioning format, with consistent team and namespace.
12-15: Standardized execution configuration.Clear scheduleCron and historicalBackfill settings in executionInfo structure.
22-35: Consistent metadata structure.Join metadata follows the same pattern with team, namespace, and executionInfo.
58-63: Unified groupBy configuration.All groupBy elements now use the same configuration pattern.
120-124: Simplified configuration in nested components.Last groupBy also updated to use executionInfo structure.
api/py/ai/chronon/join.py (6)
202-203: Simplified ExternalSource metadata.Removed customJson parameter, streamlining the metadata structure.
292-296: Updated LabelParts to use ExecutionInfo.Now uses common ExecutionInfo structure for scheduling configuration.
397-419: Streamlined Join parameters.Removed dependencies, additional_args, additional_env in favor of structured conf, env_vars, and step_days parameters.
491-509: Improved parameter documentation.Clear documentation of new parameters explaining layering priority for configuration and environment variables.
564-570: Structured execution configuration.Creates an ExecutionInfo object with all necessary execution parameters.
578-581: Simplified metadata construction.Uses executionInfo instead of scattered configuration parameters.
api/py/ai/chronon/repo/run.py
Outdated
| def generate_dataproc_submitter_args( | ||
| user_args: str, | ||
| job_type: DataprocJobType = DataprocJobType.SPARK, | ||
| local_files_to_upload_to_gcs: List[str] = [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Fix mutable default argument.
Empty list as default argument could cause unexpected behavior.
-def generate_dataproc_submitter_args(
- user_args: str,
- job_type: DataprocJobType = DataprocJobType.SPARK,
- local_files_to_upload_to_gcs: List[str] = [],
-):
+def generate_dataproc_submitter_args(
+ user_args: str,
+ job_type: DataprocJobType = DataprocJobType.SPARK,
+ local_files_to_upload_to_gcs: List[str] = None,
+):
+ if local_files_to_upload_to_gcs is None:
+ local_files_to_upload_to_gcs = []📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| local_files_to_upload_to_gcs: List[str] = [], | |
| def generate_dataproc_submitter_args( | |
| user_args: str, | |
| job_type: DataprocJobType = DataprocJobType.SPARK, | |
| local_files_to_upload_to_gcs: List[str] = None, | |
| ): | |
| if local_files_to_upload_to_gcs is None: | |
| local_files_to_upload_to_gcs = [] | |
| # Rest of the function implementation... |
🧰 Tools
🪛 Ruff (0.8.2)
894-894: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
| # check if staging_output_dir exists | ||
| staging_dir = self.compile_context.staging_output_dir() | ||
| output_dir = self.compile_context.output_dir() | ||
| shutil.rmtree(output_dir) | ||
| shutil.move(staging_dir, output_dir) | ||
| if os.path.exists(staging_dir): | ||
| # replace staging_output_dir to output_dir | ||
| output_dir = self.compile_context.output_dir() | ||
| if os.path.exists(output_dir): | ||
| shutil.rmtree(output_dir) | ||
| shutil.move(staging_dir, output_dir) | ||
| else: | ||
| print( | ||
| f"Staging directory {staging_dir} does not exist. " | ||
| "Happens when every chronon config fails to compile or when no chronon configs exist." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Added safety check for staging directory existence.
Improves error handling by checking if staging directory exists before attempting operations, with informative error message when missing.
| ) | ||
| ], | ||
| tags={"to_deprecate": True}, | ||
| tags={"to_deprecate": "true"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Boolean changed to string
Changed from boolean True to string "true".
🌐 Web query:
Does the Python Thrift library handle boolean values differently than string values when serializing to JSON?
💡 Result:
The Python Thrift library does handle boolean values differently than string values when serializing to JSON, primarily due to the type-specific encoding enforced by Thrift's TJSONProtocol. This protocol ensures that each data type is distinctly represented in the JSON output:
-
Boolean Values
Thrift serializes boolean values (true/false) as JSON boolean literals. For example:{"is_valid": true} -
String Values
Strings are serialized as JSON strings (with quotes):{"name": "Alice"}
Key Considerations:
- Type Fidelity: Thrift’s TJSONProtocol maintains strict type separation between primitives like booleans and strings, avoiding ambiguity during deserialization[1][5].
- Compatibility Issues: Some Thrift implementations (e.g., older versions of Thriftpy) may deviate from Apache Thrift’s official JSON encoding, causing mismatches. For instance, a boolean might be incorrectly serialized as a string if protocol implementations are incompatible[4][7].
- Custom Handling: Tools like
thrift-converterexplicitly manage type conversions (e.g., enums as strings, sets as specially formatted objects), highlighting Thrift’s need for precise type mapping in JSON[7].
For reliable interoperability, ensure consistent protocol usage (e.g., TJSONProtocol for Apache Thrift) across all services[4][9].
Citations:
- 1: https://www.tutorialspoint.com/apache-thrift/apache-thrift-serialization.htm
- 2: https://diwakergupta.github.io/thrift-missing-guide/
- 3: https://dev.realworldocaml.org/json.html
- 4: Question about JSON protocol Thriftpy/thriftpy2#13
- 5: https://livebook.manning.com/book/programmers-guide-to-apache-thrift/chapter-5/
- 6: https://www.mongodb.com/resources/basics/json-and-bson
- 7: https://github.com/shiyuan/thrift-converter
- 8: https://thrift-tutorial.readthedocs.io/_/downloads/en/latest/pdf/
- 9: http://thrift-tutorial.readthedocs.io/en/latest/usage-example.html
- 10: https://diwakergupta.github.io/thrift-missing-guide/thrift.pdf
Fix Type Mismatch: Use Boolean, Not String
- In
api/py/test/test_group_by.pyat line 252, the tag value should be a boolean, not the string"true". - Thrift’s TJSONProtocol distinguishes boolean values from strings. Using
"true"breaks type fidelity. - Action: Change
to
tags={"to_deprecate": "true"},
tags={"to_deprecate": True},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
frontend/src/routes/[conf]/[name]/overview/+page.svelte(1 hunks)frontend/src/routes/[conf]/[name]/overview/EntityProperties.svelte(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (15)
- GitHub Check: streaming_tests
- GitHub Check: streaming_tests
- GitHub Check: spark_tests
- GitHub Check: spark_tests
- GitHub Check: analyzer_tests
- GitHub Check: fetcher_tests
- GitHub Check: join_tests
- GitHub Check: groupby_tests
- GitHub Check: fetcher_tests
- GitHub Check: analyzer_tests
- GitHub Check: groupby_tests
- GitHub Check: join_tests
- GitHub Check: non_spark_tests
- GitHub Check: non_spark_tests
- GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (2)
frontend/src/routes/[conf]/[name]/overview/EntityProperties.svelte (1)
40-40: Updated property path to match new metadata structureThe change from 'offlineSchedule' to 'executionInfo.scheduleCron' reflects the PR's metadata restructuring.
frontend/src/routes/[conf]/[name]/overview/+page.svelte (1)
371-391: Enhanced tooltip metadata displayThe implementation now supports both direct and nested properties for better metadata representation.
| ? data.value.conf.metaData?.[prop] !== undefined | ||
| : getNestedProperty(data.value.conf.metaData, prop.path) !== undefined} | ||
| <Tooltip.Item | ||
| {...tooltipProps.item} | ||
| label={typeof prop === 'string' ? prop : prop.display} | ||
| value={typeof prop === 'string' | ||
| ? data.value.conf.metaData?.[prop] | ||
| : getNestedProperty(data.value.conf.metaData, prop.path)} | ||
| valueAlign="left" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
❓ Verification inconclusive
Verify getNestedProperty availability
Function is used but not defined in visible code.
🏁 Script executed:
#!/bin/bash
# Check if getNestedProperty function is defined in the file or imported
rg "function getNestedProperty|import.*getNestedProperty" frontend/src/routes/\[conf\]/\[name\]/overview/+page.svelteLength of output: 117
Action Required: Verify getNestedProperty Availability
The function getNestedProperty is used in frontend/src/routes/[conf]/[name]/overview/+page.svelte (lines 380–388) but isn’t defined or imported visibly in that file. Please confirm whether it’s defined or imported elsewhere in the repository (e.g., in a shared utilities module).
🏁 Script executed:
#!/bin/bash
# Search entire repository for function definition or import of getNestedProperty.
rg "function getNestedProperty" .
rg "import.*getNestedProperty" .Length of output: 192
Action: Missing getNestedProperty Definition
- In
frontend/src/routes/[conf]/[name]/overview/+page.svelte(lines 380–388),getNestedPropertyis used but isn’t defined or imported. - Please add the function definition or import it from the proper module.
5101b01 to
5863d26
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (17)
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1)
7-7: Avoid wildcard importsUse explicit imports instead of
import *for better code clarity and maintainability.-from ai.chronon.types import * +from ai.chronon.types import Join, JoinPart🧰 Tools
🪛 Ruff (0.8.2)
7-7:
from ai.chronon.types import *used; unable to detect undefined names(F403)
api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1)
1-1: Avoid wildcard importsUse explicit imports for better code clarity and maintainability.
-from ai.chronon.types import * +from ai.chronon.types import GroupBy, JoinSource, Query, selects, Aggregation, Operation, Accuracy🧰 Tools
🪛 Ruff (0.8.2)
1-1:
from ai.chronon.types import *used; unable to detect undefined names(F403)
api/py/test/sample/group_bys/sample_team/label_part_group_by.py (1)
1-3: Consider using explicit imports instead of star imports.Replace
from ai.chronon.types import *with explicit imports of the needed types.from sources import test_sources -from ai.chronon.types import * +from ai.chronon.types import GroupBy, Aggregation, Operation🧰 Tools
🪛 Ruff (0.8.2)
2-2:
from ai.chronon.types import *used; unable to detect undefined names(F403)
orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala (1)
4-4: Unused import.
TableDependencyis imported but not used explicitly.api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1 (3)
4-5: Use booleans for clarity
Numeric flags foronlineandproductionmight be less readable than booleans.Also applies to: 9-10, 12-14
101-102: Check naming consistency
Confirm that newly updatedteamandoutputNamespacefollow naming standards.Also applies to: 106-110
183-183: Revisit repeated naming
sample_team.label_part_group_by.label_part_group_bymight be simplified.Also applies to: 185-185, 186-189
api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1 (1)
9-10: Production off
If no production usage, consider removing.api/py/ai/chronon/group_by.py (1)
434-448: Signature extended
Update docstring references.api/thrift/common.thrift (4)
28-33: Consider defaulting empty maps.
Optional maps can be null; defaulting them avoids extra null checks.
35-40: Clarify difference from EnvironmentVariables.
This struct seems quite similar. Consider merging or explaining usage.
71-74: Enhance doc for ALL vs LATEST.
More clarity here helps avoid incorrect partition scans.
87-106: Check poll intervals.
Ensure nonzero defaults for dependencyPollIntervalMillis/healthCheckIntervalMillis.api/py/test/sample/production/joins/sample_team/sample_online_join.v1 (2)
37-47: Review daily scheduling.
Make sure scheduleCron and historicalBackfill are valid for expected data volumes.
162-171: Keep naming consistent.
Check that this group_by_with_kwargs meta matches prior references and key usage.api/thrift/api.thrift (1)
327-329: Comments on tags.
Short doc is fine, but confirm no leftover references to removed tag usage.api/py/ai/chronon/join.py (1)
409-409:onlineparam additionPlease update docstring accordingly.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (69)
api/py/ai/chronon/cli/compile/compiler.py(1 hunks)api/py/ai/chronon/cli/compile/parse_teams.py(3 hunks)api/py/ai/chronon/group_by.py(3 hunks)api/py/ai/chronon/join.py(6 hunks)api/py/ai/chronon/repo/compile.py(1 hunks)api/py/ai/chronon/repo/run.py(3 hunks)api/py/ai/chronon/types.py(2 hunks)api/py/ai/chronon/utils.py(2 hunks)api/py/test/sample/group_bys/sample_team/chaining_group_by.py(1 hunks)api/py/test/sample/group_bys/sample_team/event_sample_group_by.py(1 hunks)api/py/test/sample/group_bys/sample_team/group_by_with_kwargs.py(0 hunks)api/py/test/sample/group_bys/sample_team/label_part_group_by.py(1 hunks)api/py/test/sample/group_bys/sample_team/sample_group_by.py(3 hunks)api/py/test/sample/joins/sample_team/sample_chaining_join.py(1 hunks)api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py(1 hunks)api/py/test/sample/joins/sample_team/sample_join.py(3 hunks)api/py/test/sample/joins/sample_team/sample_join_from_module.py(1 hunks)api/py/test/sample/joins/sample_team/sample_label_join.py(2 hunks)api/py/test/sample/joins/sample_team/sample_label_join_with_agg.py(1 hunks)api/py/test/sample/joins/sample_team/sample_online_join.py(0 hunks)api/py/test/sample/production/group_bys/sample_team/chaining_group_by.chaining_group_by_v1(3 hunks)api/py/test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/group_by_with_kwargs.v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_group_by.require_backfill(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_group_by.v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.require_backfill(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.v1(1 hunks)api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0(2 hunks)api/py/test/sample/production/joins/sample_team/sample_chaining_join.v1(4 hunks)api/py/test/sample/production/joins/sample_team/sample_chaining_join_parent.parent_join(3 hunks)api/py/test/sample/production/joins/sample_team/sample_join.consistency_check(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join.group_by_of_group_by(0 hunks)api/py/test/sample/production/joins/sample_team/sample_join.never(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join.no_log_flattener(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join.v1(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1(3 hunks)api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2(4 hunks)api/py/test/sample/production/joins/sample_team/sample_join_derivation.v1(3 hunks)api/py/test/sample/production/joins/sample_team/sample_join_external_parts.v1(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join_from_module.v1(3 hunks)api/py/test/sample/production/joins/sample_team/sample_join_with_derivations_on_external_parts.v1(3 hunks)api/py/test/sample/production/joins/sample_team/sample_label_join.v1(5 hunks)api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1(5 hunks)api/py/test/sample/production/joins/sample_team/sample_online_join.v1(4 hunks)api/py/test/sample/staging_queries/sample_team/sample_staging_query.py(0 hunks)api/py/test/test_group_by.py(3 hunks)api/py/test/test_join.py(0 hunks)api/py/test/test_run.py(3 hunks)api/py/test/test_utils.py(2 hunks)api/src/main/scala/ai/chronon/api/Builders.scala(1 hunks)api/src/main/scala/ai/chronon/api/Extensions.scala(2 hunks)api/src/main/scala/ai/chronon/api/TilingUtils.scala(0 hunks)api/thrift/api.thrift(3 hunks)api/thrift/common.thrift(1 hunks)api/thrift/orchestration.thrift(2 hunks)frontend/src/routes/[conf]/[name]/overview/EntityProperties.svelte(2 hunks)orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/ComputeNodeRunner.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByBackfill.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByPrepareUpload.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/JoinBackfill.scala(2 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/LabelJoin.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala(3 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/utils/DependencyResolver.scala(2 hunks)spark/src/main/scala/ai/chronon/spark/GroupBy.scala(2 hunks)
💤 Files with no reviewable changes (6)
- api/py/test/sample/group_bys/sample_team/group_by_with_kwargs.py
- api/src/main/scala/ai/chronon/api/TilingUtils.scala
- api/py/test/sample/joins/sample_team/sample_online_join.py
- api/py/test/sample/staging_queries/sample_team/sample_staging_query.py
- api/py/test/test_join.py
- api/py/test/sample/production/joins/sample_team/sample_join.group_by_of_group_by
🚧 Files skipped from review as they are similar to previous changes (41)
- api/py/test/sample/group_bys/sample_team/event_sample_group_by.py
- api/py/ai/chronon/types.py
- api/py/test/sample/group_bys/sample_team/sample_group_by.py
- frontend/src/routes/[conf]/[name]/overview/EntityProperties.svelte
- api/py/test/test_run.py
- orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByPrepareUpload.scala
- orchestration/src/main/scala/ai/chronon/orchestration/utils/DependencyResolver.scala
- api/py/ai/chronon/repo/compile.py
- orchestration/src/main/scala/ai/chronon/orchestration/physical/ComputeNodeRunner.scala
- api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.require_backfill
- orchestration/src/main/scala/ai/chronon/orchestration/physical/JoinBackfill.scala
- api/py/test/sample/production/group_bys/sample_team/sample_group_by.v1
- orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala
- api/py/ai/chronon/cli/compile/compiler.py
- orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala
- orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByBackfill.scala
- api/py/test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1
- api/py/test/test_group_by.py
- api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1
- api/src/main/scala/ai/chronon/api/Extensions.scala
- api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1
- api/py/test/sample/production/group_bys/sample_team/sample_group_by.require_backfill
- api/py/test/sample/production/joins/sample_team/sample_join_external_parts.v1
- api/py/ai/chronon/cli/compile/parse_teams.py
- api/py/test/sample/production/joins/sample_team/sample_chaining_join_parent.parent_join
- api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2
- api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1
- api/py/test/sample/production/joins/sample_team/sample_join.no_log_flattener
- api/py/test/test_utils.py
- api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.v1
- api/py/test/sample/production/joins/sample_team/sample_label_join.v1
- api/py/test/sample/joins/sample_team/sample_join_from_module.py
- api/py/test/sample/production/joins/sample_team/sample_join.v1
- api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0
- orchestration/src/main/scala/ai/chronon/orchestration/physical/LabelJoin.scala
- api/py/test/sample/production/group_bys/sample_team/chaining_group_by.chaining_group_by_v1
- api/py/test/sample/production/group_bys/sample_team/group_by_with_kwargs.v1
- api/py/test/sample/production/joins/sample_team/sample_join_derivation.v1
- api/py/test/sample/joins/sample_team/sample_label_join.py
- api/py/test/sample/production/joins/sample_team/sample_join.consistency_check
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py
1015-1015: SyntaxError: Expected a statement
1015-1015: SyntaxError: Expected a statement
1015-1016: SyntaxError: Expected a statement
1016-1016: SyntaxError: Unexpected indentation
api/py/test/sample/group_bys/sample_team/chaining_group_by.py
1-1: from ai.chronon.types import * used; unable to detect undefined names
(F403)
4-4: GroupBy may be undefined, or defined from star imports
(F405)
5-5: JoinSource may be undefined, or defined from star imports
(F405)
7-7: Query may be undefined, or defined from star imports
(F405)
8-8: selects may be undefined, or defined from star imports
(F405)
18-18: Aggregation may be undefined, or defined from star imports
(F405)
18-18: Operation may be undefined, or defined from star imports
(F405)
20-20: Accuracy may be undefined, or defined from star imports
(F405)
api/py/test/sample/group_bys/sample_team/label_part_group_by.py
2-2: from ai.chronon.types import * used; unable to detect undefined names
(F403)
4-4: GroupBy may be undefined, or defined from star imports
(F405)
8-8: Aggregation may be undefined, or defined from star imports
(F405)
10-10: Operation may be undefined, or defined from star imports
(F405)
17-17: GroupBy may be undefined, or defined from star imports
(F405)
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py
7-7: from ai.chronon.types import * used; unable to detect undefined names
(F403)
9-9: Join may be undefined, or defined from star imports
(F405)
12-12: JoinPart may be undefined, or defined from star imports
(F405)
16-16: JoinPart may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms (16)
- GitHub Check: streaming_tests
- GitHub Check: streaming_tests
- GitHub Check: groupby_tests
- GitHub Check: analyzer_tests
- GitHub Check: groupby_tests
- GitHub Check: analyzer_tests
- GitHub Check: join_tests
- GitHub Check: spark_tests
- GitHub Check: spark_tests
- GitHub Check: fetcher_tests
- GitHub Check: join_tests
- GitHub Check: non_spark_tests
- GitHub Check: fetcher_tests
- GitHub Check: non_spark_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: frontend_tests
🔇 Additional comments (86)
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1)
9-24: Join structure looks goodClean implementation of parent join with appropriate configuration parameters.
🧰 Tools
🪛 Ruff (0.8.2)
9-9:
Joinmay be undefined, or defined from star imports(F405)
12-12:
JoinPartmay be undefined, or defined from star imports(F405)
16-16:
JoinPartmay be undefined, or defined from star imports(F405)
api/py/test/sample/joins/sample_team/sample_join.py (5)
15-15: Good explicit importClear import of EnvironmentVariables type.
26-33: Improved configuration structureRemoved experimental tags and replaced environment parameters with structured EnvironmentVariables class.
37-40: Consistent updateRemoved tags and standardized string quote format.
50-51: Removed experimental tagClean removal of unnecessary experimental tag.
57-58: Removed experimental tagClean removal of unnecessary experimental tag.
api/py/test/sample/joins/sample_team/sample_label_join_with_agg.py (1)
26-26: Imported instead of local definitionGood refactoring to import the label_part_group_by instead of defining it locally.
api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1)
4-28: Well-structured GroupBy implementationClear definition with appropriate configuration for sources, keys, aggregations and properties.
🧰 Tools
🪛 Ruff (0.8.2)
4-4:
GroupBymay be undefined, or defined from star imports(F405)
5-5:
JoinSourcemay be undefined, or defined from star imports(F405)
7-7:
Querymay be undefined, or defined from star imports(F405)
8-8:
selectsmay be undefined, or defined from star imports(F405)
18-18:
Aggregationmay be undefined, or defined from star imports(F405)
18-18:
Operationmay be undefined, or defined from star imports(F405)
20-20:
Accuracymay be undefined, or defined from star imports(F405)
api/py/test/sample/group_bys/sample_team/label_part_group_by.py (2)
4-15: LGTM - GroupBy definition is correct.Well-structured
GroupBydefinition with clear keys and aggregations setup.🧰 Tools
🪛 Ruff (0.8.2)
4-4:
GroupBymay be undefined, or defined from star imports(F405)
8-8:
Aggregationmay be undefined, or defined from star imports(F405)
10-10:
Operationmay be undefined, or defined from star imports(F405)
17-22: LGTM - Second GroupBy configuration is valid.Valid configuration with no aggregations specified.
🧰 Tools
🪛 Ruff (0.8.2)
17-17:
GroupBymay be undefined, or defined from star imports(F405)
api/src/main/scala/ai/chronon/api/Builders.scala (1)
290-292: Refactored historical backfill management.Successfully replaced direct property setting with the new
ExecutionInfoobject pattern.api/py/test/sample/production/joins/sample_team/sample_join_from_module.v1 (3)
4-14: New metadata structure looks good.Successfully migrated to the new structure with
executionInfoobject.
37-40: GroupBy metadata properly updated.Correctly implemented new format with
executionInfocontainingscheduleCronandhistoricalBackfill.
103-112: EntityGroupBy metadata correctly updated.Successfully migrated to the new metadata structure with
executionInfo.orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala (2)
15-15: Temporary solution for dependencies.Replacing actual dependencies with empty sequence. TODO comment explains intent.
Is a future PR planned to address this TODO comment?
27-27: Simplified table name assignment.More direct approach to setting the table name.
api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1 (2)
36-37: Confirm usage ofhistoricalBackfill
Ensure this field is actually used downstream; remove if unnecessary.Also applies to: 41-45
231-233: Validate weekly schedule
Confirm@weeklysuits the intended frequency.api/py/test/sample/joins/sample_team/sample_chaining_join.py (1)
20-20: Confirm new import
Validate thatchaining_group_by_v1correctly replaces any previous join references.api/py/ai/chronon/utils.py (2)
374-374: Graceful fallback
Defaulting to@dailyis fine; ensure no unexpected scheduling needs are missed.
402-411: Check null references
Make suremetaData.executionInfois always set to avoid runtime errors.api/py/test/sample/production/joins/sample_team/sample_join.never (5)
4-5: No issues
9-10: Double-check usage
12-14: Cron check
36-37: No issues
42-47: Scheduling mismatch?
Check if daily schedule aligns with earlier "@Never" usage.api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1 (6)
4-5: Ok
12-14: Daily schedule
All good.
36-37: Ok
41-45: Might conflict
Ensure daily schedule aligns with top-level.
100-102: Ok
106-110: Execution details
Matches prior pattern.api/py/ai/chronon/group_by.py (2)
637-644: Exec info
No issues.
658-660: Metadata usage
Looks consistent.api/py/test/sample/production/joins/sample_team/sample_chaining_join.v1 (5)
4-16: Great removal of unused fields and introduction of scheduling.
37-49: Consistent scheduling metadata added.
58-64: Execution info looks uniform.
87-92: Online config simplified properly.
101-110: Dependency cleanup is well done.api/thrift/orchestration.thrift (3)
113-113: Renamed enum value to MERGE.
166-166: Removal of extraneous attribute.
174-174: Changed lineageHash to version for clarity.api/py/test/sample/production/joins/sample_team/sample_join_with_derivations_on_external_parts.v1 (3)
4-14: Good consolidation into executionInfo.
36-46: Removed duplicates, added schedule.
100-110: Implementation aligns with streamlined scheduling.api/thrift/common.thrift (2)
42-69: Document forceCompute overhead.
If forceCompute is expensive, clarify usage to prevent unintended overhead.
76-85: Mention keyBase64 format.
Clarify how keys are encoded/decoded to ensure consistency.api/py/test/sample/production/joins/sample_team/sample_online_join.v1 (2)
4-16: Validate new meta fields.
Check that online, production, and consistency settings align with actual usage.
100-112: Confirm consistency with left event logic.
Ensure these groupBy parts won’t conflict with prior partial data or leftover fields.api/thrift/api.thrift (1)
485-486: Check duplication.
env and conf might conflict with any local fields. Keep them consistent.api/py/ai/chronon/join.py (9)
202-202: StreamlinedExternalSourcemetadataDropping
custom_jsonis consistent with the cleanup objective.
292-295: Initialized label schedule metadataStraightforward code for scheduling.
399-405: Expanded Join signature with new paramsValidate usage in tests to ensure no regressions.
411-411: Optional consistency checkConfirm
Nonedefault logic is tested.
414-414:offline_scheduledefaultLooks fine.
416-418: Added conf, env_vars, step_daysHelps modularize environment and scheduling.
564-569: exec_info for offline schedulingImplementation is clear.
578-578: consistencyCheck assignmentAll good.
580-580: executionInfo assignmentNo issues noted.
api/py/ai/chronon/repo/run.py (27)
22-26: New imports for modulesAll fine.
29-39: GCP-specific importsAppropriate for Dataproc integration.
40-44: Utils import blockNo concerns.
48-50: ONLINE_ARGS, OFFLINE_ARGS, ONLINE_WRITE_ARGS definitionsClear constants.
52-52: ONLINE_OFFLINE_WRITE_ARGSSimple combination of existing constants.
53-59: ONLINE_MODES arrayStraightforward.
60-76: SPARK_MODES listProper enumeration of modes.
77-77: MODES_USING_EMBEDDEDNo issues.
79-79: Comment for spark versionsNo problem.
80-80: SUPPORTED_SPARK arrayCovers major versions.
81-81: SCALA_VERSION_FOR_SPARK mappingMakes sense.
87-108: MODE_ARGS dictionaryWell-structured argument templates.
110-142: ROUTES dictionaryMapping looks consistent.
144-144: UNIVERSAL_ROUTESContains generic modes.
150-155: DATAPROC_ENTRY and jar constantsHelpful for GCP deployments.
Also applies to: 157-157
160-163: DataprocJobType enumStraightforward enum.
165-187: retry_decoratorSimple retry pattern.
189-194: custom_json functionConcise implementation.
196-199: check_call functionLooks good.
201-204: check_output functionNo issues.
206-233: download_only_once functionNicely handles caching.
235-285: download_jar functionChecks Spark and Scala versions.
287-289: get_teams_json_file_pathTrivial utility.
291-437: set_runtime_env functionProvides layered environment resolution nicely.
1003-1013: Dataproc check blockBranching logic is sound so far.
1014-1014: Setting jar_path tochronon_jarNo immediate issues.
1015-1022:⚠️ Potential issueSyntax error with second
else:This causes a parsing error. Consider removing or rewriting:
else: - jar_path = chronon_jar - else: + elif some_other_condition: service_jar_path = download_zipline_jar(Likely an incorrect or invalid review comment.
🧰 Tools
🪛 Ruff (0.8.2)
1015-1015: SyntaxError: Expected a statement
1015-1015: SyntaxError: Expected a statement
1015-1016: SyntaxError: Expected a statement
1016-1016: SyntaxError: Unexpected indentation
| /** | ||
| * contains configs params that don't change the contents of the output. | ||
| **/ | ||
| struct MetaData { | ||
| 1: optional string name | ||
|
|
||
|
|
||
| 2: optional string team | ||
|
|
||
| // will be set by the compiler based on changes to column lineage - do not manually set | ||
| 3: optional string version | ||
|
|
||
| 4: optional string outputNamespace | ||
|
|
||
| 5: optional map<string, string> tableProperties | ||
|
|
||
| // tag_key -> tag_value - tags allow for repository wide querying, deprecations etc | ||
| // this is object level tag - applies to all columns produced by the object - GroupBy, Join, Model etc | ||
| 6: optional map<string, string> tags | ||
| // column -> tag_key -> tag_value | ||
| 7: optional map<string, map<string, string>> columnTags | ||
|
|
||
| // marking this as true means that the conf can be served online | ||
| // once marked online, a conf cannot be changed - compiling the conf won't be allowed | ||
| 2: optional bool online | ||
| 100: optional bool online | ||
|
|
||
| // marking this as true means that the conf automatically generates a staging copy | ||
| // this flag is also meant to help a monitoring system re-direct alerts appropriately | ||
| 3: optional bool production | ||
| 4: optional string customJson | ||
| 5: optional list<string> dependencies | ||
| 6: optional map<string, string> tableProperties | ||
| // todo: add sanity check in materialize script | ||
| 7: optional string outputNamespace | ||
| // team name for the job | ||
| 8: optional string team | ||
| // modes - backfill, upload, streaming | ||
| // join streaming makes sense & join upload probably also makes sense | ||
| // (These just aren't implemented yet) | ||
| // The inner map should contain environment variables | ||
|
|
||
| // DEPRECATED | ||
| 9: optional map<string, map<string, string>> modeToEnvMap | ||
| 101: optional bool production | ||
|
|
||
| 102: optional string sourceFile | ||
|
|
||
| // users can put anything they want in here, but the compiler shouldn't | ||
| 103: optional string customJson | ||
|
|
||
| // enable job to compute consistency metrics | ||
| 10: optional bool consistencyCheck | ||
| 200: optional bool consistencyCheck | ||
|
|
||
| // percentage of online serving requests to log to warehouse | ||
| 11: optional double samplePercent | ||
| // cron expression for airflow DAG schedule | ||
| 12: optional string offlineSchedule | ||
| 201: optional double samplePercent | ||
|
|
||
| // percentage of online serving requests used to compute consistency metrics | ||
| 13: optional double consistencySamplePercent | ||
| // Flag to indicate whether join backfill should backfill previous holes. | ||
| // Setting to false will only backfill latest single partition | ||
| 14: optional bool historicalBackfill | ||
| 202: optional double consistencySamplePercent | ||
|
|
||
| // specify how to compute drift | ||
| 15: optional observability.DriftSpec driftSpec | ||
|
|
||
| 16: optional EnvironmentVariables env | ||
| 17: optional ConfigProperties conf | ||
| 203: optional observability.DriftSpec driftSpec | ||
|
|
||
| 18: optional string sourceFile | ||
| # information that needs to be present on every physical node | ||
| 204: optional common.ExecutionInfo executionInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use optional fields cautiously.
Excess optional fields can lead to confusion; consider marking critical fields as required.
api/py/ai/chronon/repo/run.py
Outdated
| AwsRunner(ctx.params).run() | ||
| else: | ||
| raise ValueError(f"Unsupported cloud provider: {cloud_provider}") | ||
| Runner(ctx.params, os.path.expanduser(jar_path)).run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runner call with undefined cloud_provider
cloud_provider is used at line 1024 but never defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
api/py/ai/chronon/join.py (1)
396-581: 🛠️ Refactor suggestionRemove outdated doc references
Docstring still mentions removed params (e.g. additional_args). Keep it aligned with the new signature.🧰 Tools
🪛 Ruff (0.8.2)
514-514: Use
key not in dictinstead ofkey not in dict.keys()Remove
.keys()(SIM118)
🧹 Nitpick comments (8)
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1)
7-7: Avoid star imports.Replace with explicit imports for better code traceability.
-from ai.chronon.types import * +from ai.chronon.types import Join, JoinPart🧰 Tools
🪛 Ruff (0.8.2)
7-7:
from ai.chronon.types import *used; unable to detect undefined names(F403)
api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1)
1-1: Avoid star imports.Replace with explicit imports for better code traceability.
-from ai.chronon.types import * +from ai.chronon.types import GroupBy, JoinSource, Query, selects, Aggregation, Operation, Accuracy🧰 Tools
🪛 Ruff (0.8.2)
1-1:
from ai.chronon.types import *used; unable to detect undefined names(F403)
api/py/test/sample/group_bys/sample_team/label_part_group_by.py (1)
2-2: Avoid star imports.Replace with explicit imports for better code traceability.
-from ai.chronon.types import * +from ai.chronon.types import GroupBy, Aggregation, Operation🧰 Tools
🪛 Ruff (0.8.2)
2-2:
from ai.chronon.types import *used; unable to detect undefined names(F403)
spark/src/test/resources/joins/team/example_join_failure.v1 (1)
5-8: Consider using boolean values.
Ifonlineandproductionare meant to be booleans, usingfalseinstead of0may improve clarity.5 "team": "relevance", 7- "online": 0, 8- "production": 0 + "online": false, + "production": falseapi/py/ai/chronon/repo/utils.py (1)
231-244: New execution info handlingAdded priority-based environment selection with new executionInfo structure in config JSON.
Consider adding a comment explaining the preference of new_env over old_env for future maintainers.
online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (1)
64-71: Improved null handling in response map creationEnhanced safety with null checks when creating field name/value mappings.
Consider extracting this logic to a helper method to improve readability.
api/thrift/common.thrift (2)
28-40: Consider mandatory fields
All properties are optional. Some may need to be mandatory for consistent usage.
42-106: Test for partial usage
These new dependencies and execution info are broad. Please add unit tests covering partial and full usage.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (78)
api/py/ai/chronon/cli/compile/compiler.py(1 hunks)api/py/ai/chronon/cli/compile/parse_teams.py(3 hunks)api/py/ai/chronon/group_by.py(3 hunks)api/py/ai/chronon/join.py(6 hunks)api/py/ai/chronon/repo/compile.py(1 hunks)api/py/ai/chronon/repo/run.py(1 hunks)api/py/ai/chronon/repo/utils.py(8 hunks)api/py/ai/chronon/types.py(2 hunks)api/py/ai/chronon/utils.py(2 hunks)api/py/test/sample/group_bys/sample_team/chaining_group_by.py(1 hunks)api/py/test/sample/group_bys/sample_team/event_sample_group_by.py(1 hunks)api/py/test/sample/group_bys/sample_team/group_by_with_kwargs.py(0 hunks)api/py/test/sample/group_bys/sample_team/label_part_group_by.py(1 hunks)api/py/test/sample/group_bys/sample_team/sample_group_by.py(3 hunks)api/py/test/sample/joins/sample_team/sample_chaining_join.py(1 hunks)api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py(1 hunks)api/py/test/sample/joins/sample_team/sample_join.py(3 hunks)api/py/test/sample/joins/sample_team/sample_join_from_module.py(1 hunks)api/py/test/sample/joins/sample_team/sample_label_join.py(2 hunks)api/py/test/sample/joins/sample_team/sample_label_join_with_agg.py(1 hunks)api/py/test/sample/joins/sample_team/sample_online_join.py(0 hunks)api/py/test/sample/production/group_bys/sample_team/chaining_group_by.chaining_group_by_v1(3 hunks)api/py/test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/group_by_with_kwargs.v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_group_by.require_backfill(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_group_by.v1(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.require_backfill(1 hunks)api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.v1(1 hunks)api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0(2 hunks)api/py/test/sample/production/joins/sample_team/sample_chaining_join.v1(4 hunks)api/py/test/sample/production/joins/sample_team/sample_chaining_join_parent.parent_join(3 hunks)api/py/test/sample/production/joins/sample_team/sample_join.consistency_check(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join.group_by_of_group_by(0 hunks)api/py/test/sample/production/joins/sample_team/sample_join.never(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join.no_log_flattener(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join.v1(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1(3 hunks)api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2(4 hunks)api/py/test/sample/production/joins/sample_team/sample_join_derivation.v1(3 hunks)api/py/test/sample/production/joins/sample_team/sample_join_external_parts.v1(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1(2 hunks)api/py/test/sample/production/joins/sample_team/sample_join_from_module.v1(3 hunks)api/py/test/sample/production/joins/sample_team/sample_join_with_derivations_on_external_parts.v1(3 hunks)api/py/test/sample/production/joins/sample_team/sample_label_join.v1(5 hunks)api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1(5 hunks)api/py/test/sample/production/joins/sample_team/sample_online_join.v1(4 hunks)api/py/test/sample/staging_queries/sample_team/sample_staging_query.py(0 hunks)api/py/test/sample/teams.py(1 hunks)api/py/test/test_group_by.py(3 hunks)api/py/test/test_join.py(0 hunks)api/py/test/test_run.py(3 hunks)api/py/test/test_utils.py(2 hunks)api/src/main/scala/ai/chronon/api/Builders.scala(1 hunks)api/src/main/scala/ai/chronon/api/Extensions.scala(2 hunks)api/src/main/scala/ai/chronon/api/TilingUtils.scala(0 hunks)api/thrift/api.thrift(3 hunks)api/thrift/common.thrift(1 hunks)api/thrift/orchestration.thrift(2 hunks)frontend/src/routes/[conf]/[name]/overview/EntityProperties.svelte(2 hunks)online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala(1 hunks)online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala(0 hunks)online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala(3 hunks)orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/ComputeNodeRunner.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByBackfill.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByPrepareUpload.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/JoinBackfill.scala(2 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/LabelJoin.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala(3 hunks)orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala(1 hunks)orchestration/src/main/scala/ai/chronon/orchestration/utils/DependencyResolver.scala(2 hunks)scripts/distribution/run_zipline_quickstart.sh(2 hunks)spark/src/main/scala/ai/chronon/spark/GroupBy.scala(2 hunks)spark/src/test/resources/group_bys/team/example_group_by.v1(1 hunks)spark/src/test/resources/joins/team/example_join.v1(2 hunks)spark/src/test/resources/joins/team/example_join_failure.v1(1 hunks)
💤 Files with no reviewable changes (7)
- online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
- api/src/main/scala/ai/chronon/api/TilingUtils.scala
- api/py/test/sample/staging_queries/sample_team/sample_staging_query.py
- api/py/test/sample/production/joins/sample_team/sample_join.group_by_of_group_by
- api/py/test/sample/joins/sample_team/sample_online_join.py
- api/py/test/test_join.py
- api/py/test/sample/group_bys/sample_team/group_by_with_kwargs.py
✅ Files skipped from review due to trivial changes (1)
- spark/src/test/resources/group_bys/team/example_group_by.v1
🚧 Files skipped from review as they are similar to previous changes (46)
- api/py/test/sample/group_bys/sample_team/event_sample_group_by.py
- api/py/test/sample/teams.py
- api/py/test/sample/joins/sample_team/sample_label_join_with_agg.py
- orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByPrepareUpload.scala
- api/py/test/sample/production/group_bys/sample_team/sample_group_by.v1
- orchestration/src/main/scala/ai/chronon/orchestration/utils/DependencyResolver.scala
- api/py/ai/chronon/cli/compile/compiler.py
- api/py/ai/chronon/repo/compile.py
- api/py/ai/chronon/types.py
- api/py/test/sample/production/joins/sample_team/sample_join_from_module.v1
- frontend/src/routes/[conf]/[name]/overview/EntityProperties.svelte
- orchestration/src/main/scala/ai/chronon/orchestration/physical/JoinBackfill.scala
- orchestration/src/main/scala/ai/chronon/orchestration/physical/TabularNode.scala
- api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.v1
- spark/src/main/scala/ai/chronon/spark/GroupBy.scala
- api/src/main/scala/ai/chronon/api/Builders.scala
- api/py/test/sample/joins/sample_team/sample_label_join.py
- api/py/ai/chronon/repo/run.py
- api/py/test/sample/production/group_bys/sample_team/sample_group_by_group_by.require_backfill
- api/py/test/sample/group_bys/sample_team/sample_group_by.py
- api/py/test/test_run.py
- api/py/test/sample/joins/sample_team/sample_join.py
- api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1
- orchestration/src/main/scala/ai/chronon/orchestration/physical/StagingQueryNode.scala
- orchestration/src/main/scala/ai/chronon/orchestration/physical/GroupByBackfill.scala
- orchestration/src/main/scala/ai/chronon/orchestration/logical/StagingQueryNodeImpl.scala
- api/py/test/sample/production/joins/sample_team/sample_label_join_with_agg.v1
- api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1
- api/py/test/test_utils.py
- api/py/test/sample/joins/sample_team/sample_join_from_module.py
- api/py/test/sample/production/group_bys/sample_team/sample_chaining_group_by.chaining_group_by_v1
- api/py/test/test_group_by.py
- api/py/test/sample/production/joins/sample_team/sample_join.consistency_check
- orchestration/src/main/scala/ai/chronon/orchestration/physical/ComputeNodeRunner.scala
- api/src/main/scala/ai/chronon/api/Extensions.scala
- api/py/test/sample/production/group_bys/sample_team/sample_group_by.require_backfill
- orchestration/src/main/scala/ai/chronon/orchestration/physical/LabelJoin.scala
- api/py/test/sample/production/joins/sample_team/sample_label_join.v1
- api/py/test/sample/production/group_bys/sample_team/group_by_with_kwargs.v1
- api/py/test/sample/production/joins/sample_team/sample_backfill_mutation_join.v0
- api/py/test/sample/production/joins/sample_team/sample_join_derivation.v1
- api/py/test/sample/production/group_bys/sample_team/chaining_group_by.chaining_group_by_v1
- api/py/test/sample/production/joins/sample_team/sample_online_join.v1
- api/py/test/sample/production/joins/sample_team/sample_chaining_join_parent.parent_join
- api/py/test/sample/production/joins/sample_team/sample_join_with_derivations_on_external_parts.v1
- api/thrift/api.thrift
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/test/sample/group_bys/sample_team/chaining_group_by.py
1-1: from ai.chronon.types import * used; unable to detect undefined names
(F403)
4-4: GroupBy may be undefined, or defined from star imports
(F405)
5-5: JoinSource may be undefined, or defined from star imports
(F405)
7-7: Query may be undefined, or defined from star imports
(F405)
8-8: selects may be undefined, or defined from star imports
(F405)
18-18: Aggregation may be undefined, or defined from star imports
(F405)
18-18: Operation may be undefined, or defined from star imports
(F405)
20-20: Accuracy may be undefined, or defined from star imports
(F405)
api/py/test/sample/group_bys/sample_team/label_part_group_by.py
2-2: from ai.chronon.types import * used; unable to detect undefined names
(F403)
4-4: GroupBy may be undefined, or defined from star imports
(F405)
8-8: Aggregation may be undefined, or defined from star imports
(F405)
10-10: Operation may be undefined, or defined from star imports
(F405)
17-17: GroupBy may be undefined, or defined from star imports
(F405)
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py
7-7: from ai.chronon.types import * used; unable to detect undefined names
(F403)
9-9: Join may be undefined, or defined from star imports
(F405)
12-12: JoinPart may be undefined, or defined from star imports
(F405)
16-16: JoinPart may be undefined, or defined from star imports
(F405)
🪛 Shellcheck (0.10.0)
scripts/distribution/run_zipline_quickstart.sh
[warning] 63-63: CHRONON_ROOT appears unused. Verify use (or export if used externally).
(SC2034)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: spark_tests
- GitHub Check: join_tests
- GitHub Check: spark_tests
- GitHub Check: groupby_tests
- GitHub Check: join_tests
- GitHub Check: fetcher_tests
- GitHub Check: non_spark_tests
- GitHub Check: non_spark_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (64)
api/py/test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1 (1)
10-13: New executionInfo object replaces deprecated fields.Added structured configuration for scheduling via executionInfo object.
api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v2 (4)
4-14: Metadata restructuring looks goodRepositioned team/outputNamespace fields and added executionInfo object with scheduleCron to replace removed customJson/dependencies fields.
36-46: Consistent metadata pattern appliedClean restructuring with team/outputNamespace repositioning and executionInfo replacing legacy fields.
101-111: Same pattern maintainedMetadata structure consistent with other sections.
160-170: Structure consistency confirmedFinal groupBy section follows same cleanup pattern.
api/py/test/sample/joins/sample_team/sample_chaining_join_parent.py (1)
9-24: Join configuration looks good.The join configuration properly follows the new structure without using deprecated parameters.
🧰 Tools
🪛 Ruff (0.8.2)
9-9:
Joinmay be undefined, or defined from star imports(F405)
12-12:
JoinPartmay be undefined, or defined from star imports(F405)
16-16:
JoinPartmay be undefined, or defined from star imports(F405)
api/py/test/sample/group_bys/sample_team/chaining_group_by.py (1)
4-28: GroupBy implementation looks good.The implementation correctly uses the simplified API structure without deprecated parameters.
🧰 Tools
🪛 Ruff (0.8.2)
4-4:
GroupBymay be undefined, or defined from star imports(F405)
5-5:
JoinSourcemay be undefined, or defined from star imports(F405)
7-7:
Querymay be undefined, or defined from star imports(F405)
8-8:
selectsmay be undefined, or defined from star imports(F405)
18-18:
Aggregationmay be undefined, or defined from star imports(F405)
18-18:
Operationmay be undefined, or defined from star imports(F405)
20-20:
Accuracymay be undefined, or defined from star imports(F405)
api/py/test/sample/group_bys/sample_team/label_part_group_by.py (2)
4-15: GroupBy implementation looks good.Implementation correctly uses the simplified API structure.
🧰 Tools
🪛 Ruff (0.8.2)
4-4:
GroupBymay be undefined, or defined from star imports(F405)
8-8:
Aggregationmay be undefined, or defined from star imports(F405)
10-10:
Operationmay be undefined, or defined from star imports(F405)
17-22: GroupBy implementation with null aggregations looks good.Second GroupBy correctly implements the pattern with null aggregations.
🧰 Tools
🪛 Ruff (0.8.2)
17-17:
GroupBymay be undefined, or defined from star imports(F405)
scripts/distribution/run_zipline_quickstart.sh (1)
23-23: SSH host alias usage looks correct.The modified GitHub URL format uses SSH host alias for authentication.
api/py/ai/chronon/cli/compile/parse_teams.py (3)
1-9: New imports look fine.
They correctly bring indeepcopyand the needed Thrift types.
72-84: Good approach to ensureExecutionInfois initialized.
Creatingmetadata.executionInfowhen absent improves consistency.
113-130: Refined_merge_mode_mapslogic.
Usingdeepcopyand clearingcommonensures safe merges without accidental mutations.online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala (3)
75-77: IntermediatefileListvariable.
This adds clarity without changing core behavior.
80-82: Path validation is good.
Ensures files are loaded strictly under the intended directory.
86-86: Relative path usage.
Double-check that older references are unaffected by the shift from absolute to relative paths.api/py/test/sample/joins/sample_team/sample_chaining_join.py (1)
20-20: Importingchaining_group_by_v1.
This direct reference is valid and consistent.api/py/test/sample/production/joins/sample_team/sample_join.never (2)
11-14: Configuration structure modernized with executionInfo objectThe
executionInfoobject replaces the previousdependenciesandcustomJsonfields, providing a cleaner way to specify execution parameters.
42-47:❓ Verification inconclusive
Online status changed to 1 and execution info added
The online value changed from 0 to 1, and execution settings are now consolidated into the executionInfo object.
🏁 Script executed:
#!/bin/bash # Check if this online status change impacts other components rg "sample_team.sample_group_by.v1.*online" --type jsonLength of output: 57
Manual Verification Required: Confirm "online" Key Impact
The automated search returned no output. Please manually verify that no other components depend on the previous online value (0) and that the new
executionInfosettings are correctly integrated.
- Check any components referencing
"online":across the repository.- Confirm that no pipeline logic uses the former online status.
api/py/ai/chronon/utils.py (2)
374-374: Updated to reference the new executionInfo structureFunction now correctly references the new executionInfo.scheduleCron field instead of a direct property.
402-420: Simplified code flow with removal of dependency managementThe code now has cleaner control flow after the removal of dependency-related functions and fields.
api/py/test/sample/production/joins/sample_team/sample_join_bootstrap.v1 (3)
11-14: Replaced scheduling configuration with executionInfoThe new executionInfo object cleanly encapsulates scheduling information that was previously scattered across different fields.
41-45: Standardized execution configuration for group_byConsistent with other changes, replaced dependencies with structured executionInfo object.
106-110: Standardized execution configuration for entity_sample_group_bySimilar pattern applied across all components for consistency.
api/py/ai/chronon/group_by.py (4)
434-447: API signature improved with better parameter organizationRemoved unused parameters and added direct support for configuration and step days, aligning with PR objectives.
558-576: Added documentation for new parametersClear documentation added for the new configuration parameters, explaining the priority order for configuration and environment variables.
637-643: Replaced dependency management with ExecutionInfoConsolidated execution parameters into a single ExecutionInfo object, improving code organization.
658-661: Updated metadata structure in GroupByRemoved dependency fields and replaced with executionInfo, consistent with JSON structure changes.
api/py/ai/chronon/repo/utils.py (8)
63-65: Good switch to logging from printReplaced print with logging.info for better output management.
68-70: Good logging practiceConsistent use of logging instead of print statements.
81-83: Improved loggingStandardized on logging.info for better observability.
91-97: Better logging formatImproved log message structure using logging.info.
102-104: More descriptive loggingEnhanced log message with better context.
120-123: Improved assertion formattingBetter structured error message for unsupported spark versions.
130-131: Better log messageMore informative log for jar download source.
305-313: Switched to logging from printStandardized on logging.info for environment variable settings.
api/py/test/sample/production/joins/sample_team/sample_join.no_log_flattener (2)
12-14: Added executionInfo with scheduleCronReplaced removed dependencies and customJson with structured executionInfo object.
44-47: Added executionInfo with scheduleCron and historical backfillStructured metadata with executionInfo containing scheduling parameters.
online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (2)
115-124: Simplified control flowRemoved unnecessary output variable assignment and directly returns conditional results.
182-182: Added explicit return typeImproved type safety with explicit Array[Any] return type.
spark/src/test/resources/joins/team/example_join.v1 (2)
4-7: Simplified metadata structureRemoved unused fields (dependencies, customJson, tableProperties) and reordered properties.
24-25: Removed dependencies fieldStreamlined groupBy metadata by removing unused fields.
api/py/test/sample/production/joins/sample_team/sample_join.v1 (5)
4-5: Looks good.
Consistent team and namespace fields.
9-10: No issue.
Switching online to 1 and production to 0 is clear.
12-19: Execution info is well-structured.
ScheduleCron usage is concise.
41-42: Matches earlier pattern.
Naming consistent with team/namespace.
47-52: GroupBy config looks good.
Online and production align with approach; historicalBackfill is optional.api/py/test/sample/production/joins/sample_team/sample_join_external_parts.v1 (5)
4-5: All good.
Team and outputNamespace updated.
9-10: Clear.
Online remains 0, production is 0.
12-14: Execution info block is fine.
ScheduleCron set.
36-37: Team/outputNamespace alignment.
Same pattern as above.
42-47: Consistent.
Online is 1, new executionInfo with historicalBackfill.api/py/test/sample/production/joins/sample_team/sample_chaining_join.v1 (8)
4-5: Team/namespace.
Good consistency with approach.
9-11: Online, production, consistencyCheck.
Nicely structured.
13-16: executionInfo addition is short and clear.
ScheduleCron daily is standard.
37-39: GroupBy name re-labeled.
No conflicts.
44-49: Online, production, executionInfo.
Change is aligned with others.
58-64: Consistent usage.
Reaffirms daily schedule, historicalBackfill is 0.
87-92: Output namespace, online, executionInfo.
Minimal changes, no conflict.
149-153: Same pattern for entity groupBy.
Matches the rest.api/thrift/orchestration.thrift (2)
113-113: Check backward compatibility
Changing enum values might break older references. Verify usage in dependent modules.
174-174: Rename references across code
Renaming lineageHash to version is good. Please ensure all references are updated.api/py/ai/chronon/join.py (2)
202-202: Good usage of MetaData
No functional concerns.
292-295: Check offline schedule usage
Ensure label_offline_schedule is tested if job intervals are critical.
| DATAPROC_SUBMITTER_ID_STR="Dataproc submitter job id" | ||
|
|
||
| echo -e "${GREEN}<<<<<.....................................COMPILE.....................................>>>>>\033[0m" | ||
| CHRONON_ROOT=`pwd`/api/py/test/sample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
CHRONON_ROOT variable appears unused.
The variable is defined but not referenced elsewhere in the script.
🏁 Script executed:
#!/bin/bash
# Check if CHRONON_ROOT is used elsewhere in the script
grep -n "CHRONON_ROOT" scripts/distribution/run_zipline_quickstart.shLength of output: 110
Unused Variable Removal: The grep output confirms that CHRONON_ROOT is only defined at line 63 with no further use. Please remove it or add a comment if it's intended for future use.
🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 63-63: CHRONON_ROOT appears unused. Verify use (or export if used externally).
(SC2034)
## Summary we want to remove api flags that are not used & not usable beyond oss. if we don't remove these now, we will end up having to migrate these when the orchestrator is ready. added direct support for conf and step days, removed dependencies, tableProps, old style env and customJson ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a consolidated execution configuration that simplifies scheduling, environment, and backfill settings. - **Refactor** - Streamlined metadata and dependency management in data processing and join/group-by workflows. - Enhanced error handling for file operations to better manage missing directories and improve feedback. - **Frontend** - Updated the metadata display to now show scheduling details, providing clearer overview information. - **Tests** - Revised test cases to reflect the new naming conventions and metadata structure. - Updated assertions to simplify the access of metadata tags. - Removed obsolete tests related to dependency management. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary we want to remove api flags that are not used & not usable beyond oss. if we don't remove these now, we will end up having to migrate these when the orchestrator is ready. added direct support for conf and step days, removed dependencies, tableProps, old style env and customJson ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a consolidated execution configuration that simplifies scheduling, environment, and backfill settings. - **Refactor** - Streamlined metadata and dependency management in data processing and join/group-by workflows. - Enhanced error handling for file operations to better manage missing directories and improve feedback. - **Frontend** - Updated the metadata display to now show scheduling details, providing clearer overview information. - **Tests** - Revised test cases to reflect the new naming conventions and metadata structure. - Updated assertions to simplify the access of metadata tags. - Removed obsolete tests related to dependency management. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary we want to remove api flags that are not used & not usable beyond oss. if we don't remove these now, we will end up having to migrate these when the orchestrator is ready. added direct support for conf and step days, removed dependencies, tableProps, old style env and customJson ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a consolidated execution configuration that simplifies scheduling, environment, and backfill settings. - **Refactor** - Streamlined metadata and dependency management in data processing and join/group-by workflows. - Enhanced error handling for file operations to better manage missing directories and improve feedback. - **Frontend** - Updated the metadata display to now show scheduling details, providing clearer overview information. - **Tests** - Revised test cases to reflect the new naming conventions and metadata structure. - Updated assertions to simplify the access of metadata tags. - Removed obsolete tests related to dependency management. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary we want to remove api flags that are not used & not usable beyond oss. if we don't remove these now, we will end up having to migrate these when the orchestrator is ready. added direct support for conf and step days, removed dependencies, tableProps, old style env and customJson ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a consolidated execution configuration that simplifies scheduling, environment, and backfill settings. - **Refactor** - Streamlined metadata and dependency management in data processing and join/group-by workflows. - Enhanced error handling for file operations to better manage missing directories and improve feedback. - **Frontend** - Updated the metadata display to now show scheduling details, providing clearer overview information. - **Tests** - Revised test cases to reflect the new naming conventions and metadata structure. - Updated assertions to simplify the access of metadata tags. - Removed obsolete tests related to dependency management. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary we want to remove api flags that are not used & not usable beyond oss. if we don't remove these now, we will end up having to migrate these when the orchestrator is ready. added direct support for conf and step days, removed dependencies, tableProps, old style env and customJson ## Cheour clientslist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a consolidated execution configuration that simplifies scheduling, environment, and baour clientsfill settings. - **Refactor** - Streamlined metadata and dependency management in data processing and join/group-by workflows. - Enhanced error handling for file operations to better manage missing directories and improve feedbaour clients. - **Frontend** - Updated the metadata display to now show scheduling details, providing clearer overview information. - **Tests** - Revised test cases to reflect the new naming conventions and metadata structure. - Updated assertions to simplify the access of metadata tags. - Removed obsolete tests related to dependency management. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
we want to remove api flags that are not used & not usable beyond oss. if we don't remove these now, we will end up having to migrate these when the orchestrator is ready.
added direct support for conf and step days, removed dependencies, tableProps, old style env and customJson
Checklist
Summary by CodeRabbit
New Features
Refactor
Frontend
Tests