
Conversation

nikhil-zlai (Contributor) commented Jan 27, 2025

Summary

A newer version with better ergonomics and a more maintainable codebase - smaller files, simpler logic, less indirection - along with user-facing simplifications:

  • always compile the whole repo
  • teams thrift and teams python
  • compile context as its own object
  • integration test on sample
  • progress bar for compile
  • sync to remote w/ progress bar
  • display workflow
  • display workflow progress

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New CLI Commands

    • Introduced streamlined commands for synchronizing, backfilling, and deploying operations for easier management.
  • Enhanced Logging

    • Improved colored, structured log outputs for clearer real-time monitoring and debugging.
  • Configuration & Validation Upgrades

    • Strengthened configuration management and validation processes to ensure reliable operations.
    • Added a comprehensive validation framework for Chronon API thrift objects.
  • Build & Infrastructure Improvements

    • Transitioned to a new container base and modernized testing/build systems for better performance and stability.
  • Team & Utility Enhancements

    • Expanded team configuration options and refined utility processes to streamline overall workflows.
    • Introduced new data classes and methods for improved configuration and compilation context management.
    • Enhanced templating functionality for dynamic replacements based on object properties.
    • Improved handling of Git operations and enhanced error logging for better traceability.

coderabbitai bot (Contributor) commented Jan 27, 2025

Walkthrough

This update enhances the system’s configuration and compilation pipeline. New data classes and methods have been added for managing compile contexts, templating, validation, and serialization. The CLI now features commands for sync, backfill, and deploy, while Git utilities have been expanded for branch and change management. Additional improvements include colored logging, upgraded Thrift structures and enums, updated Docker and build configurations, and various test and formatting adjustments.
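
As a rough illustration of the CLI surface described above, a click-based entry point could be shaped like the sketch below. This is a sketch only: the sync/backfill/deploy command names come from this PR, but the group name, options, and command bodies are assumptions.

import click


@click.group()
def cli():
    """Entry point grouping the sync, backfill, and deploy commands."""


@cli.command()
def sync():
    # Compile the whole repo and push artifacts to the remote, with a progress bar.
    click.echo("syncing compiled confs to remote...")


@cli.command()
@click.argument("conf")
@click.option("--start-date")
@click.option("--end-date")
def backfill(conf, start_date, end_date):
    # Run an offline backfill for the given conf over the date range.
    click.echo(f"backfilling {conf} from {start_date} to {end_date}")


@cli.command()
@click.argument("conf")
def deploy(conf):
    # Deploy the compiled conf to the serving environment.
    click.echo(f"deploying {conf}")


if __name__ == "__main__":
    cli()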

Changes

File(s) Change Summary
api/py/ai/chronon/cli/compile/compile_context.py, compiler.py, fill_templates.py, parse_configs.py, parse_teams.py, serializer.py, conf_validator.py Added new data classes, methods, and functions to manage configuration details, compilation context, templating, parsing, serialization, and validation logic.
api/py/ai/chronon/cli/entrypoint.py, git_utils.py Introduced CLI commands (sync, backfill, deploy) and Git utility functions for branch retrieval and change detection.
api/py/ai/chronon/group_by.py, join.py Removed method attachments (backfill, deploy, info/upload) from the group_by and join objects.
api/py/ai/chronon/cli/logger.py, logging.py Added color-coded logging classes and utility functions for enhanced, visually distinct log output.
api/thrift/api.thrift, api/thrift/common.thrift Added new Thrift structures and enum (ConfigType), and updated MetaData with an environment variables field.
spark/.../SparkSessionBuilder.scala, various Spark and Hub test files Updated log formatting in Spark builder; commented out or reformatted tests to improve clarity and align with Bazel/SBT transitions.
.github/image/Dockerfile, .github/workflows/*, .bazelrc, tools/build_rules/thrift/thrift.bzl Transitioned Docker base from Alpine to Ubuntu, switched build/test commands from SBT to Bazel, and adjusted build rule arguments and workflow configurations.
api/py/test/sample/teams.py, api/py/test/sample/production/joins/... Added new team configurations and sample JSON configuration for join operations.
api/py/test/test_git_utils.py, api/py/test/test_utils.py, api/py/test/test_validator.py Introduced new test utilities and fixtures while reformatting existing tests for better readability and consistent file-path handling.

Sequence Diagram(s)

sequenceDiagram
    participant U as User
    participant E as CLI Entrypoint
    participant G as GitUtils
    participant C as CompileContext
    participant P as Compiler
    participant L as Logger

    U->>E: Run CLI command (sync/backfill/deploy)
    E->>G: Get current branch
    G-->>E: Return branch name
    E->>C: Initialize compile context
    E->>P: Call compiler.main(...)
    P-->>E: Return compile results
    E->>L: Log results
    E-->>U: Output final result


Suggested reviewers

  • varant-zlai
  • piyush-zlai

Poem

In code we trust with vibrant hue,
Configs and logs now shine anew.
Commands and contexts take the stage,
Compiling dreams on a digital page.
Git and Thrift join this merry tune,
A merge of progress under a bright full moon.
Cheers to change in every line!


Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 845949f and 3858d3d.

📒 Files selected for processing (1)
  • .github/workflows/test_scala_fmt.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/test_scala_fmt.yaml
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 17

🧹 Nitpick comments (15)
api/py/ai/chronon/cli/git_utils.py (1)

1-4: Remove unused import.
Set is never used.

-from typing import List, Optional, Set
+from typing import List, Optional
🧰 Tools
🪛 Ruff (0.8.2)

4-4: typing.Set imported but unused

Remove unused import: typing.Set

(F401)

api/py/ai/chronon/cli/logging.py (1)

17-30: Add error handling for timestamp creation.

 def format(self, record):
-    time_str = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
+    try:
+        time_str = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
+    except (ValueError, TypeError, OSError) as e:
+        time_str = "??:??:??"
+        logger = logging.getLogger(__name__)
+        logger.error(f"Failed to format timestamp: {e}")
api/py/ai/chronon/cli/compile/compiler.py (1)

48-66: Add progress bar for better UX.

+        with tqdm(total=len(obj_dict), desc=f"Processing {config_info.cls}") as pbar:
             for name, obj in obj_dict.items():
                 errors = _write_config_or_return_errors(
                     name, obj, config_info, compile_context
                 )
                 if errors:
                     compile_result.error_dict[name] = errors
                 else:
                     compile_result.obj_dict[name] = obj
+                pbar.update(1)
api/py/ai/chronon/cli/compile/parse_teams.py (1)

11-28: Add error handling for module import failures.

Consider wrapping the module import in a try-except block to provide more informative error messages.

🧰 Tools
🪛 Ruff (0.8.2)

17-17: Undefined name importlib

(F821)

api/py/test/test_git_utils.py (1)

1-11: Remove unused import.

 import pytest
 import os
 import shutil
 import subprocess
-from pathlib import Path
 from ai.chronon.cli.git_utils import (
     get_current_branch,
     get_changes_since_commit,
     get_changes_since_fork,
 )
🧰 Tools
🪛 Ruff (0.8.2)

5-5: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

api/py/ai/chronon/cli/compile/serializer.py (5)

41-86: Consider adding type hints and docstrings.

The _convert method handles type conversion comprehensively but lacks documentation.

Add type hints and docstrings:

-    def _convert(self, val, ttype, ttype_info):
+    def _convert(self, val: Any, ttype: int, ttype_info: Tuple) -> Any:
+        """Convert JSON value to Thrift type.
+        
+        Args:
+            val: Value to convert
+            ttype: Thrift type constant
+            ttype_info: Type information tuple
+            
+        Returns:
+            Converted value
+            
+        Raises:
+            TypeError: If type is not recognized
+        """
🧰 Tools
🪛 Ruff (0.8.2)

78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)


93-100: Add error handling for Thrift serialization.

The function should catch potential Thrift serialization errors.

 def json2binary(json_str, thrift_class):
-    thrift = json2thrift(json_str, thrift_class)
-    transport = TMemoryBuffer()
-    protocol = TBinaryProtocolAccelerated(transport)
-    thrift.write(protocol)
-    return transport.getvalue()
+    try:
+        thrift = json2thrift(json_str, thrift_class)
+        transport = TMemoryBuffer()
+        protocol = TBinaryProtocolAccelerated(transport)
+        thrift.write(protocol)
+        return transport.getvalue()
+    except Exception as e:
+        raise Exception(f"Failed to serialize {thrift_class.__name__} to binary: {str(e)}") from e

113-118: Add error handling for JSON serialization.

The function should catch potential serialization errors.

 def thrift_simple_json(obj):
-    simple = TSerialization.serialize(obj, protocol_factory=TSimpleJSONProtocolFactory())
-    parsed = json.loads(simple)
-    return json.dumps(parsed, indent=2)
+    try:
+        simple = TSerialization.serialize(obj, protocol_factory=TSimpleJSONProtocolFactory())
+        parsed = json.loads(simple)
+        return json.dumps(parsed, indent=2)
+    except Exception as e:
+        raise Exception(f"Failed to serialize {type(obj).__name__} to JSON: {str(e)}") from e

17-20: Remove unused import.

TJSONProtocolFactory is not used in the code.

 from thrift.protocol.TJSONProtocol import (
     TSimpleJSONProtocolFactory,
-    TJSONProtocolFactory,
 )
🧰 Tools
🪛 Ruff (0.8.2)

19-19: thrift.protocol.TJSONProtocol.TJSONProtocolFactory imported but unused

Remove unused import: thrift.protocol.TJSONProtocol.TJSONProtocolFactory

(F401)


78-82: Combine if branches for integer types.

Simplify the code by combining the integer type checks.

-        elif ttype == TType.I64:
-            ret = int(val)
-        elif ttype == TType.I32 or ttype == TType.I16 or ttype == TType.BYTE:
-            ret = int(val)
+        elif ttype in (TType.I64, TType.I32, TType.I16, TType.BYTE):
+            ret = int(val)
🧰 Tools
🪛 Ruff (0.8.2)

78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)

api/py/ai/chronon/cli/compile/validator.py (5)

35-36: Remove unused imports.

-from ai.chronon.repo import JOIN_FOLDER_NAME, GROUP_BY_FOLDER_NAME
-from ai.chronon.repo.serializer import thrift_simple_json, file2thrift
+from ai.chronon.repo.serializer import thrift_simple_json
🧰 Tools
🪛 Ruff (0.8.2)

35-35: ai.chronon.repo.JOIN_FOLDER_NAME imported but unused

Remove unused import

(F401)


35-35: ai.chronon.repo.GROUP_BY_FOLDER_NAME imported but unused

Remove unused import

(F401)


36-36: ai.chronon.repo.serializer.file2thrift imported but unused

Remove unused import: ai.chronon.repo.serializer.file2thrift

(F401)


184-191: Add docstring to helper function.

The function's purpose and return value should be documented.

 def _source_has_topic(source: Source) -> bool:
+    """
+    Check if the source has a topic defined.
+    
+    Args:
+        source: The source object to check
+    Returns:
+        bool: True if the source has a topic, False otherwise
+    """
     if source.events:

198-209: Consider early return for better readability.

The function can be simplified.

 def _group_by_has_hourly_windows(groupBy: GroupBy) -> bool:
     aggs: List[Aggregation] = groupBy.aggregations
 
     if not aggs:
         return False
 
     for agg in aggs:
         for window in agg.windows:
             if window.timeUnit == common.TimeUnit.HOUR:
                 return True
-
-    return False
+    return False

410-416: Simplify nested if statements.

Combine the conditions for better readability.

-        if join.metaData.online:
-            if offline_included_group_bys:
-                errors.append(
-                    "join {} is online but includes the following offline group_bys: {}".format(
-                        join.metaData.name, ", ".join(offline_included_group_bys)
-                    )
-                )
+        if join.metaData.online and offline_included_group_bys:
+            errors.append(
+                "join {} is online but includes the following offline group_bys: {}".format(
+                    join.metaData.name, ", ".join(offline_included_group_bys)
+                )
+            )
🧰 Tools
🪛 Ruff (0.8.2)

410-411: Use a single if statement instead of nested if statements

(SIM102)


455-459: Improve error message clarity.

The error message could be more specific about the incompatibility.

-                f"group_by {group_by.metaData.name} is defined to be daily refreshed but contains hourly windows. "
+                f"group_by {group_by.metaData.name} is defined as a batch (daily) feature but contains hourly windows, which is incompatible."
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between c0f1645 and 2c28746.

📒 Files selected for processing (16)
  • api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
  • api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
  • api/py/ai/chronon/cli/compile/fill_templates.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_configs.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_teams.py (1 hunks)
  • api/py/ai/chronon/cli/compile/serializer.py (1 hunks)
  • api/py/ai/chronon/cli/compile/validator.py (12 hunks)
  • api/py/ai/chronon/cli/entrypoint.py (1 hunks)
  • api/py/ai/chronon/cli/git_utils.py (1 hunks)
  • api/py/ai/chronon/cli/logging.py (1 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/test/sample/teams.py (1 hunks)
  • api/py/test/test_git_utils.py (1 hunks)
  • api/thrift/api.thrift (4 hunks)
  • api/thrift/common.thrift (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/compile/parse_configs.py

3-3: logging imported but unused

Remove unused import: logging

(F401)


8-8: ai.chronon.cli.compile.constants.ConfigInfo imported but unused

Remove unused import: ai.chronon.cli.compile.constants.ConfigInfo

(F401)


12-12: Undefined name log_level

(F821)

api/py/ai/chronon/cli/compile/compiler.py

8-8: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)


9-9: ai.chronon.cli.compile.constants imported but unused

Remove unused import: ai.chronon.cli.compile.constants

(F401)


13-13: ai.chronon.api.ttypes.Team imported but unused

Remove unused import: ai.chronon.api.ttypes.Team

(F401)


15-15: ai.chronon.repo.validator.ChrononRepoValidator imported but unused

Remove unused import: ai.chronon.repo.validator.ChrononRepoValidator

(F401)


16-16: ai.chronon.api.ttypes.GroupBy imported but unused

Remove unused import

(F401)


16-16: ai.chronon.api.ttypes.Join imported but unused

Remove unused import

(F401)


16-16: ai.chronon.api.ttypes.StagingQuery imported but unused

Remove unused import

(F401)


16-16: ai.chronon.api.ttypes.Model imported but unused

Remove unused import

(F401)


22-22: Undefined name dataclass

(F821)


24-24: Undefined name ConfigInfo

(F821)


70-70: Undefined name ConfigInfo

(F821)

api/py/test/sample/teams.py

8-8: Undefined name EnvironmentVariables

(F821)


45-45: Undefined name EnvironmentVariables

(F821)


66-66: Undefined name EnvironmentVariables

(F821)

api/py/ai/chronon/cli/compile/serializer.py

19-19: thrift.protocol.TJSONProtocol.TJSONProtocolFactory imported but unused

Remove unused import: thrift.protocol.TJSONProtocol.TJSONProtocolFactory

(F401)


78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)

api/py/test/test_git_utils.py

5-5: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

api/py/ai/chronon/cli/compile/compile_context.py

19-19: Undefined name Team

(F821)


19-19: Undefined name teams

(F821)


21-21: Undefined name ChrononRepoValidator

(F821)


21-21: Undefined name ChrononRepoValidator

(F821)


24-24: Undefined name Join

(F821)


26-26: Undefined name GroupBy

(F821)


30-30: Undefined name StagingQuery

(F821)


34-34: Undefined name Model

(F821)


87-87: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)


94-94: Undefined name file2thrift

(F821)

api/py/ai/chronon/cli/compile/fill_templates.py

8-8: Undefined name utils

(F821)


16-16: Undefined name teams_path

(F821)


18-18: Undefined name Join

(F821)


28-28: Undefined name Join

(F821)


33-33: Undefined name utils

(F821)

api/py/ai/chronon/cli/compile/parse_teams.py

2-2: ai.chronon.api.ttypes.MetaData imported but unused

Remove unused import: ai.chronon.api.ttypes.MetaData

(F401)


4-4: ai.chronon.cli.compile.parse_configs.from_file imported but unused

Remove unused import: ai.chronon.cli.compile.parse_configs.from_file

(F401)


17-17: Undefined name importlib

(F821)


30-30: Undefined name Any

(F821)


32-32: Undefined name metadata

(F821)


47-47: Undefined name team_conf_file

(F821)


49-49: Undefined name metadata

(F821)


51-51: Undefined name metadata

(F821)


54-54: Undefined name metadata

(F821)

api/py/ai/chronon/cli/git_utils.py

4-4: typing.Set imported but unused

Remove unused import: typing.Set

(F401)


31-31: Do not use bare except

(E722)


71-71: Do not use bare except

(E722)

api/py/ai/chronon/cli/entrypoint.py

2-2: sys imported but unused

Remove unused import: sys

(F401)


5-5: enum.Enum imported but unused

Remove unused import: enum.Enum

(F401)


7-7: ai.chronon.cli.compile.write_files imported but unused

Remove unused import: ai.chronon.cli.compile.write_files

(F401)


16-16: Do not use bare except

(E722)

api/py/ai/chronon/cli/compile/validator.py

35-35: ai.chronon.repo.JOIN_FOLDER_NAME imported but unused

Remove unused import

(F401)


35-35: ai.chronon.repo.GROUP_BY_FOLDER_NAME imported but unused

Remove unused import

(F401)


36-36: ai.chronon.repo.serializer.file2thrift imported but unused

Remove unused import: ai.chronon.repo.serializer.file2thrift

(F401)


213-213: Undefined name CompileContext

(F821)


410-411: Use a single if statement instead of nested if statements

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: fetcher_spark_tests
🔇 Additional comments (28)
api/py/ai/chronon/cli/git_utils.py (4)

42-57: LGTM.
Logic for fork point retrieval looks adequate.


59-67: LGTM.
Fetching file content at commit is straightforward.


75-125: Ensure correct error handling.
The fallback for untracked files is good, but confirm you handle partial failures properly.


128-154: Overall logic looks good.
Filtering real_changes is a neat approach.

api/thrift/common.thrift (1)

16-20: Addition of ConfigType looks good.
No issues detected.

api/py/ai/chronon/cli/logging.py (1)

1-14: LGTM! Color codes are well-organized and follow standard ANSI escape sequences.

api/py/ai/chronon/cli/entrypoint.py (4)

20-24: LGTM! Well-documented CLI group.


26-31: Implement sync functionality.

The sync command only echoes the branch name without actual implementation.

Would you like me to help implement the sync functionality?


33-56: Implement backfill functionality.

The backfill command only echoes parameters without actual implementation.

Would you like me to help implement the backfill functionality?


58-67: Implement deploy functionality.

The deploy command only echoes parameters without actual implementation.

Would you like me to help implement the deploy functionality?

api/py/ai/chronon/cli/compile/parse_configs.py (1)

15-41: LGTM! Well-structured function with proper error handling.

api/py/ai/chronon/cli/compile/compiler.py (1)

29-47: LGTM! Clear initialization and logging.

api/py/ai/chronon/cli/compile/parse_teams.py (1)

58-94: LGTM! Well-documented helper functions.

api/py/test/sample/teams.py (2)

6-20: Replace placeholder values in TODOs.

Several configuration values need to be replaced with actual values:

  • Infra team email
  • Serde class and args
  • Customer ID
  • GCP Project ID and resources
🧰 Tools
🪛 Ruff (0.8.2)

8-8: Undefined name EnvironmentVariables

(F821)


84-90: LGTM! Simple team configurations.

api/py/test/test_git_utils.py (1)

13-85: LGTM! Well-structured test with proper cleanup.

The test provides comprehensive coverage of Git utilities with proper setup and teardown.

api/py/ai/chronon/cli/compile/compile_context.py (1)

40-80: LGTM! Well-documented methods with clear examples.

api/py/ai/chronon/cli/compile/serializer.py (3)

27-40: LGTM! Clean implementation of JSON to Thrift decoder.

The class properly handles both dict and JSON string inputs.


89-90: LGTM! Clean wrapper function.


102-110: LGTM! Good error handling.

The error message is descriptive and helpful.

spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1)

53-53: LGTM! Simplified log formatting.

Removed unnecessary ITALIC style while keeping the GREEN color for file:line in logs.

api/py/ai/chronon/repo/compile.py (4)

241-244: LGTM! Good logging of object details.

Clear logging of team and object name.


249-256: LGTM! Proper skip handling.

Good warning messages for skipped objects.


258-264: LGTM! Validation error handling.

Clear error messages for validation failures.


267-275: LGTM! Safe overwrite checks.

Proper handling of force overwrite and online config protection.

api/thrift/api.thrift (3)

Line range hint 236-254: Good deprecation of old env vars structure.

Properly marked old modeToEnvMap as deprecated in favor of new env field.


441-446: LGTM! Clean environment variables structure.

Well-organized structure for different deployment contexts.


448-456: LGTM! Comprehensive team configuration.

Good organization of team metadata and configuration.

Comment on lines +68 to +75
def get_current_file_content(file_path: str) -> Optional[str]:
    try:
        return Path(file_path).read_text()
    except:
        return None

coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Avoid the bare except: at line 71.
Use a more precise exception type.

-except:
+except OSError:
    return None
🧰 Tools
🪛 Ruff (0.8.2)

71-71: Do not use bare except

(E722)

Comment on lines 11 to 41
def get_current_branch() -> str:

    try:
        subprocess.check_output(["git", "rev-parse", "HEAD"], stderr=subprocess.DEVNULL)

        return (
            subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"])
            .decode("utf-8")
            .strip()
        )

    except subprocess.CalledProcessError as e:

        try:
            head_file = Path(".git/HEAD").resolve()
            if head_file.exists():
                content = head_file.read_text().strip()
                if content.startswith("ref: refs/heads/"):
                    return content.split("/")[-1]

        except:
            pass

        print(
            f"Error: {e.stderr.decode('utf-8') if e.stderr else 'Not a git repository or no commits'}",
            file=sys.stderr,
        )

        raise
coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Avoid the bare except: at line 31.
Specify a more precise exception (e.g. except OSError:).

-except:
+except OSError:
    pass
🧰 Tools
🪛 Ruff (0.8.2)

31-31: Do not use bare except

(E722)

Comment on lines 1 to 40
import ai.chronon.cli.compile.parse_teams as teams


def _fill_template(table, obj, namespace):

    if table:
        table = table.replace(
            "{{ logged_table }}", utils.log_table_name(obj, full_name=True)
        )
        table = table.replace("{{ db }}", namespace)

    return table


def set_templated_values(obj, cls, team_name):
    namespace = teams.get_team_conf(teams_path, team_name, "namespace")

    if cls == Join and obj.bootstrapParts:

        for bootstrap in obj.bootstrapParts:
            bootstrap.table = _fill_template(bootstrap.table, obj, namespace)

    if obj.metaData.dependencies:
        obj.metaData.dependencies = [
            _fill_template(dep, obj, namespace) for dep in obj.metaData.dependencies
        ]

    if cls == Join and obj.labelParts:

        obj.labelParts.metaData.dependencies = [
            label_dep.replace(
                "{{ join_backfill_table }}",
                utils.output_table_name(obj, full_name=True),
            )
            for label_dep in obj.labelParts.metaData.dependencies
        ]
coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Undefined references for utils, teams_path, Join.
Declare or import them to avoid runtime errors.

+import ai.chronon.cli.compile.utils as utils
+from ai.chronon.cli.compile.join import Join
+teams_path = "path/to/teams"  # or retrieve from config

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.2)

8-8: Undefined name utils

(F821)


16-16: Undefined name teams_path

(F821)


18-18: Undefined name Join

(F821)


28-28: Undefined name Join

(F821)


33-33: Undefined name utils

(F821)

Comment on lines +32 to +41
def get_logger(log_level=logging.INFO):

    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(ColorFormatter())

    logger = logging.getLogger(__name__)
    logger.addHandler(handler)
    logger.setLevel(log_level)

    return logger
coderabbitai bot (Contributor) commented:

🛠️ Refactor suggestion

Prevent duplicate handlers.

 def get_logger(log_level=logging.INFO):
     handler = logging.StreamHandler(sys.stdout)
     handler.setFormatter(ColorFormatter())
 
     logger = logging.getLogger(__name__)
+    logger.handlers.clear()  # Prevent duplicate handlers
     logger.addHandler(handler)
     logger.setLevel(log_level)

Comment on lines 1 to 7
import click
import sys
from datetime import datetime, timedelta
import subprocess
from enum import Enum

import ai.chronon.cli.compile.write_files as write_files

coderabbitai bot (Contributor) commented:

🛠️ Refactor suggestion

Remove unused imports.

 import click
-import sys
 from datetime import datetime, timedelta
 import subprocess
-from enum import Enum
-
-import ai.chronon.cli.compile.write_files as write_files
🧰 Tools
🪛 Ruff (0.8.2)

2-2: sys imported but unused

Remove unused import: sys

(F401)


5-5: enum.Enum imported but unused

Remove unused import: enum.Enum

(F401)


7-7: ai.chronon.cli.compile.write_files imported but unused

Remove unused import: ai.chronon.cli.compile.write_files

(F401)

Comment on lines 30 to 71
def update_metadata(obj: Any, team_dict: Dict[str, Team]):

    assert metadata is not None, "Cannot update empty metadata"

    name = obj.metadata.name
    team = obj.metadata.team

    assert (
        team is not None
    ), f"Team name is required in metadata for {name}. This usually set by compiler - so this is an internal error."

    assert (
        team in team_dict
    ), f"Team '{team}' not found in teams.py. Please add an entry."

    assert (
        DEFAULT_CONF_TEAM in team_dict
    ), f"Default team '{DEFAULT_CONF_TEAM}' not found in {team_conf_file}. Please add an entry."

    metadata.outputNamespace = team_dict[team].outputNamespace

    metadata.env = _merge_environment_variables(
        team_dict[DEFAULT_CONF_TEAM].env,
        team_dict[team].env,
        metadata.env,
    )

coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Fix undefined variables and add type hints.

+from typing import Any
+
 def update_metadata(obj: Any, team_dict: Dict[str, Team]):
-    assert metadata is not None, "Cannot update empty metadata"
+    assert obj.metadata is not None, "Cannot update empty metadata"
     # ...
-    ), f"Default team '{DEFAULT_CONF_TEAM}' not found in {team_conf_file}. Please add an entry."
+    ), f"Default team '{DEFAULT_CONF_TEAM}' not found in teams.py. Please add an entry."
🧰 Tools
🪛 Ruff (0.8.2)

30-30: Undefined name Any

(F821)


32-32: Undefined name metadata

(F821)


47-47: Undefined name team_conf_file

(F821)


49-49: Undefined name metadata

(F821)


51-51: Undefined name metadata

(F821)


54-54: Undefined name metadata

(F821)

@@ -0,0 +1,90 @@
from ai.chronon.api.ttypes import Team
coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Add missing import.

-from ai.chronon.api.ttypes import Team
+from ai.chronon.api.ttypes import Team, EnvironmentVariables

Comment on lines 1 to 16
from dataclasses import dataclass
import os
from typing import Any, Dict, List

from ai.chronon.api.common.ttypes import ConfigType

coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Add missing imports.

 from dataclasses import dataclass
 import os
 from typing import Any, Dict, List
 from ai.chronon.api.common.ttypes import ConfigType
+from ai.chronon.api.ttypes import Team
+from ai.chronon.cli.compile import teams
+from ai.chronon.cli.validation import ChrononRepoValidator
+from ai.chronon.api.models import Join, GroupBy, StagingQuery, Model
+from ai.chronon.cli.utils import file2thrift

Comment on lines 81 to 125
def existing_confs(self, obj_class: type) -> Dict[str, object]:

    result = {}

    output_dir = self.output_dir(obj_class)

    for sub_root, sub_dirs, sub_files in os.walk(output_dir):

        for f in sub_files:

            if not f.startswith("."):  # ignore hidden files - such as .DS_Store
                continue

            obj = file2thrift(os.path.join(sub_root, f), obj_class)

            if obj and hasattr(obj, "metaData"):
                result[obj.metaData.name] = obj

    return result
coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Fix hidden files check and rename unused variable.

-    def existing_confs(self, obj_class: type) -> Dict[str, object]:
+    def existing_confs(self, obj_class: type) -> Dict[str, object]:
         result = {}
         output_dir = self.output_dir(obj_class)
-        for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+        for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
             for f in sub_files:
-                if not f.startswith("."):  # ignore hidden files
+                if f.startswith("."):  # ignore hidden files
                     continue
🧰 Tools
🪛 Ruff (0.8.2)

87-87: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)


94-94: Undefined name file2thrift

(F821)


 class ChrononRepoValidator(object):
-    def __init__(self, chronon_root_path: str, output_root: str, log_level=logging.INFO):
+    def __init__(self, compile_context: CompileContext, log_level=logging.INFO):
coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Add missing import for CompileContext.

+from ai.chronon.cli.compile.compile_context import CompileContext

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.2)

213-213: Undefined name CompileContext

(F821)



@cli.command()
@click.argument("conf")
Collaborator commented:

Is the idea that this points to the python file or the compiled directory? I was thinking python, with an object selector in case you specify a file with multiple objects within it.

Let's keep these arguments consistent with what's in runner.py (def backfill(self, start_date, end_date, force_recompute=False, plan=False)).

And in fact, we can even complete the implementation in this file to call those runner functions, which are attached to the base python objects (like here: https://github.com/zipline-ai/chronon/blob/main/api/py/ai/chronon/group_by.py#L610-L612).

So the logic here would be:

  1. Extract object
  2. Call object.backfill(args)
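
A rough sketch of that two-step flow (hypothetical: the module-loading helper and CLI wiring are assumptions, and the backfill signature is taken from the runner.py reference above):

import importlib.util


def _load_conf_module(py_path: str):
    # Load the user's config file as a module so we can pull objects off of it.
    spec = importlib.util.spec_from_file_location("user_conf", py_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def backfill_conf(py_path: str, object_name: str, start_date: str, end_date: str):
    # 1. Extract the object (e.g. a GroupBy or Join) from the python file.
    obj = getattr(_load_conf_module(py_path), object_name)

    # 2. Call the runner method attached to the base python object.
    return obj.backfill(start_date, end_date, force_recompute=False, plan=False)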

Collaborator commented:

By the way, we can add scope to runner.py, though I wonder if scope is the right word? While it's more verbose, run_upstream=t/f and run_downstream=t/f might be more explicit and clearer.
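
A minimal sketch of what those flags could look like in click (which entrypoint.py already uses); the flag names, defaults, and command body here are assumptions:

import click


@click.command()
@click.argument("conf")
@click.option("--run-upstream/--no-run-upstream", default=False,
              help="Also run the jobs this conf depends on.")
@click.option("--run-downstream/--no-run-downstream", default=False,
              help="Also run the jobs that depend on this conf.")
def backfill(conf, run_upstream, run_downstream):
    # Placeholder body: echo the resolved scope for the given conf.
    click.echo(f"backfilling {conf} (upstream={run_upstream}, downstream={run_downstream})")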

varant-zlai (Collaborator) left a comment

There was a lot of compile logic in my previous PR that I want to make sure we're still keeping, since we compile the whole repo when we try to run a given object (it ensures the target file compiles, errors if not, warns if other things are failing, etc.).

"GCP_DATAPROC_CLUSTER_NAME": "canary-2", # TODO: GCP Dataproc Cluster Name
"GCP_BIGTABLE_INSTANCE_ID": "zipline-canary-instance", # TODO: GCP Bigtable Instance ID
},
backfill={ # Backfills tend to be larger than other jobs
Collaborator commented:

The main thing here is that I think we'd like to just use spark config props - that removes an extra layer of indirection and makes it clear to the user what this is for. WDYT?

nikhil-zlai (Contributor, Author) commented:

I am down for that actually

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 18

🔭 Outside diff range comments (3)
online/src/main/scala/ai/chronon/online/TopicChecker.scala (1)

43-80: Close admin client in topic check.
Likewise, call close() after checking topics to release resources.

 def topicShouldExist(topic: String, bootstrap: String, additionalProps: Map[String, String] = Map.empty): Unit = {
   val props = mapToJavaProperties(additionalProps ++ Map(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG -> bootstrap))
   try {
     val adminClient = AdminClient.create(props)
     ...
+    adminClient.close()
   } catch {
     case ex: Exception => throw new RuntimeException(s"Failed to check for topic $topic in $bootstrap", ex)
   }
 }
frontend/src/routes/thrift/+page.svelte (1)

1-136: Move API testing code to dedicated test file.

Testing code should not be in production routes. Consider:

  1. Moving to a test file
  2. Using environment variables for IDs
  3. Using relative timestamps
distribution/build_and_upload_gcp_artifacts.sh (1)

24-27: Fix Variable Expansion
Prefix EXPECTED_MINIMUM_MINOR_PYTHON_VERSION and MINOR_PYTHON_VERSION with '$' for proper evaluation.

- if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
+ if [[ $EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt $MINOR_PYTHON_VERSION ]] ; then
🧹 Nitpick comments (47)
online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (1)

565-611: Consider adding null/empty case test.

Test case looks good but could be more comprehensive.

Add a test case for empty/null listing IDs:

  it should "handle explode invocations in select clauses" in {
    // ... existing test code ...
    
+   val emptyRow: Map[String, Any] = Map(
+     "event_name" -> "backend_cart_payment",
+     "properties" -> makeHashMap("sold_listing_ids" -> ""),
+   )
+   
+   val empty_res = cu.performSql(emptyRow)
+   empty_res.size shouldBe 0
  }
spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala (1)

52-53: Add explicit return type for UDF.

Improve type safety by declaring return type.

-  private def edit_distance: UserDefinedFunction =
-    functions.udf((left: Object, right: Object) => EditDistance.between(left, right))
+  private def edit_distance: UserDefinedFunction =
+    functions.udf((left: Object, right: Object) => EditDistance.between(left, right): Int)
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (1)

55-66: Enhance date conversion test structure.

While the test covers both date types, consider:

  1. Extracting duplicate conversion logic
  2. Asserting the actual BigQuery date values
 it should "bigquery connector converts spark dates regardless of date setting" in {
   val input = spark.createDataFrame(Seq((1, "2021-01-01"))).toDF("id", "ds")
+  def testDateConversion(useJava8Api: Boolean) = {
+    spark.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, useJava8Api)
+    val date = input.select(col("id"), to_date(col("ds"))).collect.take(1).head.get(1)
+    val expectedType = if (useJava8Api) classOf[java.time.LocalDate] else classOf[java.sql.Date]
+    assert(date.getClass == expectedType)
+    val bqDate = SparkBigQueryUtil.sparkDateToBigQuery(date)
+    assert(bqDate != null) // Add specific value assertion
+  }
-  spark.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, true)
-  val java8Date = input.select(col("id"), to_date(col("ds"))).collect.take(1).head.get(1)
-  assert(java8Date.isInstanceOf[java.time.LocalDate])
-  SparkBigQueryUtil.sparkDateToBigQuery(java8Date)
-
-  spark.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, false)
-  val nonJava8Date = input.select(col("id"), to_date(col("ds"))).collect.take(1).head.get(1)
-  assert(nonJava8Date.isInstanceOf[java.sql.Date])
-  SparkBigQueryUtil.sparkDateToBigQuery(nonJava8Date)
+  testDateConversion(true)
+  testDateConversion(false)
 }
online/src/main/scala/ai/chronon/online/CatalystUtil.scala (2)

147-150: Use typed container.
Returning Seq[Array[Any]] may undermine type safety.


167-193: Method size.
Consider smaller subroutines for readability.

hub/src/main/java/ai/chronon/hub/HubVerticle.java (1)

82-85: Consider separate request classes for join and column drift.

Using JoinDriftRequest for both join and column drift endpoints could cause confusion. Create distinct request classes for better type safety.

-router.get("/api/v1/join/:name/column/:columnName/drift").handler(RouteHandlerWrapper.createHandler(driftHandler::getColumnDrift, JoinDriftRequest.class));
+router.get("/api/v1/join/:name/column/:columnName/drift").handler(RouteHandlerWrapper.createHandler(driftHandler::getColumnDrift, ColumnDriftRequest.class));
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (2)

6-6: Document rationale for wildcard import.

Consider adding a comment explaining why wildcard import is preferred over specific imports.


79-80: Consider adding descriptive error message for Hive fallback.

Good fallback logic, but users might benefit from knowing why Hive was chosen.

-        scala.Option(btTableIdentifier.getProject).map(BigQueryFormat(_, bigQueryClient, Map.empty)).getOrElse(Hive))
+        scala.Option(btTableIdentifier.getProject)
+          .map(BigQueryFormat(_, bigQueryClient, Map.empty))
+          .getOrElse {
+            logger.info(s"No BigQuery project found for $tableName, falling back to Hive")
+            Hive
+          })
flink/src/main/scala/ai/chronon/flink/SchemaRegistrySchemaProvider.scala (1)

14-16: Make cache capacity configurable.
Separate the cache size from static code so it can be customized via config if needed.

- private val CacheCapacity: Int = 10
+ private val CacheCapacity: Int = conf.get("registry_cache_capacity").map(_.toInt).getOrElse(10)
flink/src/main/scala/ai/chronon/flink/KafkaFlinkSource.scala (1)

20-28: Improve bootstrap fallback.
You assume host and port if the config is missing yet throw an error only for the port. Consider a clearer fallback or using defaults for both.

flink/src/main/scala/org/apache/spark/sql/avro/AvroDeserializationSupport.scala (2)

18-19: Prefer final variables.
Mark constructor params as val or private val if they are not re-assigned.


49-55: Blocking call caution.
Await.result(..., 30.seconds) can stall event loops. Consider async handling.

hub/src/main/scala/ai/chronon/hub/handlers/DriftHandler.scala (1)

49-55: Blocking call caution.
Await.result(..., 30.seconds) can freeze threads. Async approach may be safer.

api/py/ai/chronon/cli/compile/compile_context.py (2)

2-2: Remove unused import.
from logging import error is unused. Drop it.

-from logging import error
🧰 Tools
🪛 Ruff (0.8.2)

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Rename unused variable.
Rename sub_dirs to _sub_dirs to clarify it's unused.

-for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
🧰 Tools
🪛 Ruff (0.8.2)

107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

hub/src/main/scala/ai/chronon/hub/handlers/ConfHandler.scala (2)

20-39: Handle unsupported types carefully
Throwing RuntimeException is okay, but consider a domain-specific exception.


44-48: Casting with reflection
Consider safer casts or typed definitions.

frontend/src/lib/api/api.ts (1)

40-46: Generic getConf
Clear approach, but return type might need stricter checks.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (1)

21-24: Consider consistent exception usage.
Elsewhere, you use UnsupportedOperationException; here you use NotImplementedError.

-throw new NotImplementedError("alterTableProperties not yet supported for BigQuery")
+throw new UnsupportedOperationException("alterTableProperties not yet supported for BigQuery")
frontend/src/lib/components/NavigationBar.svelte (1)

251-261: Hard-coded condition may cause confusion.

Consider removing or externalizing 'risk.user_transactions.txn_join' for clarity.

api/thrift/fetcher.thrift (1)

6-11: Consider adding field constraints and improving docs.

Add:

  • Valid ranges for timestamp fields
  • Max length for dataset string
  • Expected format for keyBytes
frontend/src/routes/joins/+page.svelte (1)

8-12: Simplify sort function.

The sort function can be simplified using string comparison.

-	const sortedItems = data.items.sort((a: IJoin, b: IJoin) => {
-		if (a.metaData?.name === 'risk.user_transactions.txn_join') return -1;
-		if (b.metaData?.name === 'risk.user_transactions.txn_join') return 1;
-		return 0;
-	});
+	const sortedItems = data.items.sort((a: IJoin, b: IJoin) => 
+		(b.metaData?.name === 'risk.user_transactions.txn_join') as any - 
+		(a.metaData?.name === 'risk.user_transactions.txn_join') as any
+	);
distribution/run_zipline_quickstart.py (1)

10-11: Fix line continuation style.

Use parentheses for line continuation instead of backslash.

-        quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__))
-                                     , "run_zipline_quickstart.sh")
+        quickstart_sh = os.path.join(
+            os.path.dirname(os.path.realpath(__file__)),
+            "run_zipline_quickstart.sh"
+        )
frontend/src/routes/joins/[slug]/observability/ModelTable.svelte (1)

15-17: Simplify ModelType indexing.

The conditional check for undefined can be simplified using optional chaining.

-			value={model.modelType === undefined ? undefined : ModelType[model.modelType]}
+			value={ModelType[model.modelType ?? -1]}
api/src/test/scala/ai/chronon/api/test/TilingUtilSpec.scala (1)

18-18: Consider using relative timestamps.

Hardcoded timestamp might become outdated.

-    key.setTileStartTimestampMillis(1738195200000L) // 2025-01-29T00:00:00Z
+    key.setTileStartTimestampMillis(System.currentTimeMillis())
api/src/main/scala/ai/chronon/api/TilingUtils.scala (1)

15-15: Document the compact protocol bug.

Add details about the bug for future reference.

frontend/src/lib/server/conf-loader.ts (1)

39-42: Enhance error handling specificity.

Consider catching specific error types and providing more detailed error messages.

 } catch (error) {
-    console.error(`Failed to load ${entityMatch.label.toLowerCase()}:`, error);
-    throw error;
+    const errorMessage = error instanceof Error ? error.message : 'Unknown error';
+    console.error(`Failed to load ${entityMatch.label.toLowerCase()}: ${errorMessage}`);
+    throw new Error(`Configuration loading failed: ${errorMessage}`);
 }
api/py/ai/chronon/cli/compile/fill_templates.py (2)

4-4: Remove unused import.

The parse_teams import is not used in the code.

-import ai.chronon.cli.compile.parse_teams as teams
🧰 Tools
🪛 Ruff (0.8.2)

4-4: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)


7-15: Add input validation.

Add validation for input parameters to prevent potential errors.

 def _fill_template(table, obj, namespace):
+    if not isinstance(table, str):
+        return table
 
     if table:
         table = table.replace(
api/py/ai/chronon/cli/logger.py (1)

5-14: Consider using an enum for color codes.

Organize color codes using an enum for better maintainability.

from enum import Enum

class Colors(Enum):
    TIME = "\033[36m"  # Cyan
    DEBUG = "\033[36m"  # Cyan
    INFO = "\033[32m"  # Green
    WARNING = "\033[33m"  # Yellow
    ERROR = "\033[31m"  # Red
    CRITICAL = "\033[41m"  # White on Red
    FILE = "\033[35m"  # Purple
    RESET = "\033[0m"
spark/src/main/scala/ai/chronon/spark/streaming/TopicCheckerApp.scala (1)

39-41: Consider graceful shutdown.

Using System.exit(0) might not allow proper cleanup. Consider using a more graceful shutdown mechanism.
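
One shape such a shutdown could take (a sketch; the helper names are hypothetical, not the app's actual members):

def main(args: Array[String]): Unit = {
  val adminClient = buildAdminClient() // hypothetical resource factory
  sys.addShutdownHook(adminClient.close()) // also runs on SIGTERM
  try runChecks(adminClient) // hypothetical app body
  finally adminClient.close() // eager cleanup on the normal path
  // main returns and the JVM exits with status 0; no System.exit needed
}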

api/py/ai/chronon/cli/compile/parse_configs.py (1)

4-4: Remove unused imports.

-import logging
-from typing import Any, Dict, List, Tuple
+from typing import Any, List
-import ai.chronon.cli.compile.parse_teams as teams

Also applies to: 6-6, 9-9

🧰 Tools
🪛 Ruff (0.8.2)

4-4: logging imported but unused

Remove unused import: logging

(F401)

api/py/ai/chronon/cli/entrypoint.py (1)

2-2: Remove unused imports.

-import sys
-from enum import Enum

Also applies to: 5-5

🧰 Tools
🪛 Ruff (0.8.2)

2-2: sys imported but unused

Remove unused import: sys

(F401)

api/py/ai/chronon/cli/compile/compiler.py (1)

48-49: Clean up spinner implementation.

Status variable is unused and spinner code is duplicated.

-    with console.status("[bold green]Loading...", spinner="dots") as status:
+    with console.status("[bold green]Loading...", spinner="dots"):
🧰 Tools
🪛 Ruff (0.8.2)

48-48: Local variable status is assigned to but never used

Remove assignment to unused variable status

(F841)

api/src/main/scala/ai/chronon/api/Constants.scala (1)

80-82: Enhance documentation for magic null constants.

While the comment explains language compatibility, it should also document:

  • Why these specific values were chosen
  • When to use these constants vs. actual null values
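
A sketch of what such a doc comment could look like (the wording and guidance are illustrative, not the actual Constants.scala text):

/** Sentinel standing in for null where thrift-generated clients in other
  * languages cannot express a missing primitive. Values should sit far
  * outside any realistic data range; translate back to null/None at the
  * serving boundary instead of letting sentinels leak into computations.
  */
val magicNullDouble: java.lang.Double = ??? // the real value lives in Constants.scala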
api/thrift/hub.thrift (1)

119-124: Add documentation for each enum value.

Each ConfType value should be documented to explain its purpose and usage.

online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (1)

37-52: Enhance parse method robustness.

Consider these improvements:

  • Validate params format before splitting
  • Handle malformed key-value pairs gracefully
 def parse(topic: String): TopicInfo = {
   assert(topic.nonEmpty, s"invalid topic: $topic")
   val (messageBus, rest) = if (topic.contains("://")) {
     val tokens = topic.split("://", 2)
     tokens.head -> tokens.last
   } else {
     "kafka" -> topic
   }
   assert(rest.nonEmpty, s"invalid topic: $topic")
   val fields = rest.split("/")
   val topicName = fields.head
-  val params = fields.tail.map { f =>
-    val kv = f.split("=", 2); kv.head -> kv.last
-  }.toMap
+  val params = fields.tail.flatMap { f =>
+    try {
+      val kv = f.split("=", 2)
+      if (kv.length == 2) Some(kv.head -> kv.last) else None
+    } catch {
+      case _: Exception => None
+    }
+  }.toMap
   TopicInfo(topicName, messageBus, params)
 }
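
As a quick sanity check on the parsing rules above (inputs are made up):

parse("events/env=prod")
// => TopicInfo("events", "kafka", Map("env" -> "prod")) (messageBus defaults to kafka)
parse("pubsub://events/region=us/team=risk")
// => TopicInfo("events", "pubsub", Map("region" -> "us", "team" -> "risk"))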
api/py/ai/chronon/cli/git_utils.py (1)

4-4: Remove unused import.

-from typing import List, Optional, Set
+from typing import List, Optional
🧰 Tools
🪛 Ruff (0.8.2)

4-4: typing.Set imported but unused

Remove unused import: typing.Set

(F401)

hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala (1)

23-24: Add test cases for concurrent requests and error scenarios.

Consider adding tests for:

  • Concurrent drift requests
  • DriftStore failures
  • Empty result sets

Also applies to: 64-88

flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (1)

8-9: Restore removed documentation for TimestampedIR and TimestampedTile.

Add back the documentation to maintain code clarity.

api/py/ai/chronon/cli/compile/validator.py (1)

458-470: Consider combining temporal and topic checks.

The temporal and topic checks could be combined to reduce code complexity.

-    non_temporal = (
-        group_by.accuracy is None or group_by.accuracy == Accuracy.SNAPSHOT
-    )
-
-    no_topic = not _group_by_has_topic(group_by)
-    has_hourly_windows = _group_by_has_hourly_windows(group_by)
-
-    # batch features cannot contain hourly windows
-    if (no_topic and non_temporal) and has_hourly_windows:
+    if (not _group_by_has_topic(group_by) and 
+        (group_by.accuracy is None or group_by.accuracy == Accuracy.SNAPSHOT) and 
+        _group_by_has_hourly_windows(group_by)):
distribution/build_and_upload_gcp_artifacts.sh (2)

33-33: Split VERSION Assignment
Separate the VERSION extraction from the 'pip wheel' command to avoid masking return values.


9-9: Enhance cd Robustness
Consider appending "|| exit" to the cd command for added safety (e.g., cd $CHRONON_ROOT_DIR || exit).

distribution/run_zipline_quickstart.sh (3)

7-8: Robust Working Directory Change
Use "cd $WORKING_DIR || exit" to ensure the script halts if the directory change fails.

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


10-11: Remove Unused RED Variable
The RED variable is declared but never used; remove it if it’s unnecessary.

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


15-18: Address Bigtable TODO
The TODO for deleting bigtable rows remains unresolved—update this per the requirements.

service_commons/src/main/java/ai/chronon/service/RouteHandlerWrapper.java (1)

90-103: Enhance error messages.

Add more context to error messages.

-                        LOGGER.error("Failed to serialize response", e);
+                        LOGGER.error("Failed to serialize response to JSON using TSimpleJSONProtocol", e);
-                        LOGGER.error("Unexpected error during serialization", e);
+                        LOGGER.error("Unexpected error during JSON serialization of " + output.getClass().getSimpleName(), e);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 2c28746 and 30aed3c.

📒 Files selected for processing (131)
  • .github/workflows/require_triggered_status_checks.yaml (1 hunks)
  • .github/workflows/test_frontend.yaml (1 hunks)
  • .plugin-versions (1 hunks)
  • .tool-versions (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/stats/EditDistance.scala (1 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/EditDistanceTest.scala (1 hunks)
  • api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
  • api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
  • api/py/ai/chronon/cli/compile/fill_templates.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_configs.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_teams.py (1 hunks)
  • api/py/ai/chronon/cli/compile/validator.py (12 hunks)
  • api/py/ai/chronon/cli/entrypoint.py (1 hunks)
  • api/py/ai/chronon/cli/git_utils.py (1 hunks)
  • api/py/ai/chronon/cli/logger.py (1 hunks)
  • api/py/ai/chronon/group_by.py (0 hunks)
  • api/py/ai/chronon/join.py (0 hunks)
  • api/py/ai/chronon/repo/hub_uploader.py (2 hunks)
  • api/py/ai/chronon/repo/run.py (13 hunks)
  • api/py/test/sample/teams.json (2 hunks)
  • api/src/main/scala/ai/chronon/api/Constants.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/DataType.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/TilingUtilSpec.scala (1 hunks)
  • api/thrift/api.thrift (4 hunks)
  • api/thrift/fetcher.thrift (1 hunks)
  • api/thrift/hub.thrift (1 hunks)
  • api/thrift/observability.thrift (1 hunks)
  • build.sbt (8 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (2 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala (7 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (5 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (2 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (5 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProviderTest.scala (1 hunks)
  • distribution/build_and_upload_gcp_artifacts.sh (3 hunks)
  • distribution/run_zipline_quickstart.py (1 hunks)
  • distribution/run_zipline_quickstart.sh (1 hunks)
  • flink/BUILD.bazel (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (5 hunks)
  • flink/src/main/scala/ai/chronon/flink/AvroCodecFn.scala (8 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (11 hunks)
  • flink/src/main/scala/ai/chronon/flink/KafkaFlinkSource.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/MetricsSink.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/SchemaProvider.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/SchemaRegistrySchemaProvider.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/SourceProvider.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/TestFlinkJob.scala (4 hunks)
  • flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (5 hunks)
  • flink/src/main/scala/org/apache/spark/sql/avro/AvroDeserializationSupport.scala (1 hunks)
  • flink/src/test/resources/user.avsc (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala (4 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (6 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/SchemaRegistrySchemaProviderSpec.scala (1 hunks)
  • flink/src/test/scala/org/apache/spark/sql/avro/AvroDeSerializationSupportSpec.scala (1 hunks)
  • frontend/src/lib/api/api.test.ts (3 hunks)
  • frontend/src/lib/api/api.ts (3 hunks)
  • frontend/src/lib/components/LogicalNodeTable.svelte (1 hunks)
  • frontend/src/lib/components/NavigationBar.svelte (3 hunks)
  • frontend/src/lib/server/conf-loader.ts (1 hunks)
  • frontend/src/lib/types/Entity/Entity.ts (1 hunks)
  • frontend/src/lib/types/Model/Model.test.ts (0 hunks)
  • frontend/src/lib/types/Model/Model.ts (0 hunks)
  • frontend/src/routes/groupbys/+page.server.ts (1 hunks)
  • frontend/src/routes/groupbys/+page.svelte (1 hunks)
  • frontend/src/routes/joins/+page.server.ts (1 hunks)
  • frontend/src/routes/joins/+page.svelte (1 hunks)
  • frontend/src/routes/joins/[slug]/observability/ModelTable.svelte (1 hunks)
  • frontend/src/routes/joins/[slug]/services/joins.service.ts (2 hunks)
  • frontend/src/routes/models/+page.server.ts (1 hunks)
  • frontend/src/routes/models/+page.svelte (1 hunks)
  • frontend/src/routes/stagingqueries/+page.server.ts (1 hunks)
  • frontend/src/routes/stagingqueries/+page.svelte (1 hunks)
  • frontend/src/routes/thrift/+page.svelte (1 hunks)
  • hub/src/main/java/ai/chronon/hub/HubVerticle.java (3 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/ConfHandler.scala (1 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/DriftHandler.scala (1 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/JoinsHandler.scala (0 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/ModelsHandler.scala (0 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/Paginate.scala (0 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/SearchHandler.scala (0 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/TimeSeriesHandler.scala (1 hunks)
  • hub/src/main/scala/ai/chronon/hub/model/Model.scala (0 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala (1 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala (1 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/JoinHandlerTest.scala (0 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/ModelHandlerTest.scala (0 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/SearchHandlerTest.scala (0 hunks)
  • online/src/main/java/ai/chronon/online/JavaFetcher.java (2 hunks)
  • online/src/main/scala/ai/chronon/online/AvroConversions.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala (5 hunks)
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/Fetcher.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/FetcherBase.scala (5 hunks)
  • online/src/main/scala/ai/chronon/online/FetcherMain.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataStore.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/TopicChecker.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (4 hunks)
  • online/src/test/scala/ai/chronon/online/test/CatalystUtilHiveUDFTest.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (24 hunks)
  • online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (5 hunks)
  • service_commons/src/main/java/ai/chronon/service/RouteHandlerWrapper.java (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/JobSubmitter.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (8 hunks)
  • spark/src/main/scala/ai/chronon/spark/format/Format.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/KafkaStreamBuilder.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/TopicCheckerApp.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (1 hunks)
  • tools/build_rules/dependencies/maven_repository.bzl (2 hunks)
  • tools/build_rules/dependencies/spark_repository.bzl (1 hunks)
  • tools/build_rules/flink/BUILD (1 hunks)
  • tools/build_rules/prelude_bazel (1 hunks)
  • tools/build_rules/spark/BUILD (1 hunks)
💤 Files with no reviewable changes (13)
  • api/py/ai/chronon/join.py
  • hub/src/test/scala/ai/chronon/hub/handlers/ModelHandlerTest.scala
  • hub/src/main/scala/ai/chronon/hub/handlers/Paginate.scala
  • frontend/src/lib/types/Model/Model.ts
  • hub/src/test/scala/ai/chronon/hub/handlers/JoinHandlerTest.scala
  • hub/src/test/scala/ai/chronon/hub/handlers/SearchHandlerTest.scala
  • hub/src/main/scala/ai/chronon/hub/handlers/JoinsHandler.scala
  • hub/src/main/scala/ai/chronon/hub/handlers/SearchHandler.scala
  • hub/src/main/scala/ai/chronon/hub/handlers/ModelsHandler.scala
  • api/py/ai/chronon/group_by.py
  • flink/src/main/scala/ai/chronon/flink/SourceProvider.scala
  • hub/src/main/scala/ai/chronon/hub/model/Model.scala
  • frontend/src/lib/types/Model/Model.test.ts
✅ Files skipped from review due to trivial changes (9)
  • .tool-versions
  • spark/src/main/scala/ai/chronon/spark/JobSubmitter.scala
  • tools/build_rules/spark/BUILD
  • api/src/main/scala/ai/chronon/api/DataType.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/stats/EditDistance.scala
  • api/py/ai/chronon/repo/hub_uploader.py
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProviderTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/EditDistanceTest.scala
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
🧰 Additional context used
📓 Learnings (4)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#51
File: spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala:192-200
Timestamp: 2024-11-26T19:47:53.900Z
Learning: Only suggest registering Delta Lake action classes for serialization if they are actually used in the codebase.
.github/workflows/test_frontend.yaml (1)
Learnt from: ken-zlai
PR: zipline-ai/chronon#185
File: .github/workflows/test_frontend.yaml:41-42
Timestamp: 2025-01-08T18:29:51.176Z
Learning: In GitHub Actions, the `version` parameter in `awalsh128/cache-apt-pkgs-action` is a cache control number used to invalidate and regenerate the cache. It does not specify the package version - the actual package version is determined by the Ubuntu package repositories.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (2)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:29-60
Timestamp: 2025-01-24T23:55:30.256Z
Learning: In BigQuery integration, table existence check is performed outside the BigQueryFormat.createTable method, at a higher level in TableUtils.createTable.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (2)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:29-60
Timestamp: 2025-01-24T23:55:30.256Z
Learning: In BigQuery integration, table existence check is performed outside the BigQueryFormat.createTable method, at a higher level in TableUtils.createTable.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart.sh

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 39-39: Declare and assign separately to avoid masking return values.

(SC2155)

🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/compile/fill_templates.py

4-4: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)

api/py/ai/chronon/cli/compile/parse_configs.py

4-4: logging imported but unused

Remove unused import: logging

(F401)


6-6: typing.Dict imported but unused

Remove unused import

(F401)


6-6: typing.Tuple imported but unused

Remove unused import

(F401)


9-9: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)

api/py/ai/chronon/cli/compile/parse_teams.py

4-4: ai.chronon.cli.compile.parse_configs.from_file imported but unused

Remove unused import: ai.chronon.cli.compile.parse_configs.from_file

(F401)

api/py/ai/chronon/cli/compile/compile_context.py

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/repo/run.py

817-817: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

api/py/ai/chronon/cli/compile/validator.py

421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/compiler.py

20-20: Undefined name console

(F821)


25-25: Undefined name console

(F821)


28-28: Undefined name console

(F821)


28-28: f-string without any placeholders

Remove extraneous f prefix

(F541)


48-48: Local variable status is assigned to but never used

Remove assignment to unused variable status

(F841)

api/py/ai/chronon/cli/entrypoint.py

2-2: sys imported but unused

Remove unused import: sys

(F401)


5-5: enum.Enum imported but unused

Remove unused import: enum.Enum

(F401)


17-17: Do not use bare except

(E722)

api/py/ai/chronon/cli/git_utils.py

4-4: typing.Set imported but unused

Remove unused import: typing.Set

(F401)


33-33: Do not use bare except

(E722)


73-73: Do not use bare except

(E722)

🔇 Additional comments (213)
.github/workflows/require_triggered_status_checks.yaml (2)

1-8: Workflow Config OK.
YAML is concise and correctly sets up branch protection on pull requests.


9-14: Steps Config Validated.
The action is properly configured with token input and a 10s delay.

.plugin-versions (1)

2-2: LGTM!

The bazelisk plugin entry is correctly formatted and uses a specific commit hash.

build.sbt (3)

218-219: LGTM!

The Confluent dependencies are properly configured with consistent versions.

Also applies to: 225-225, 249-249


293-293: LGTM!

Minor version bump for spark-bigquery-with-dependencies.


300-301: LGTM!

Proper exclusion rule to prevent dependency conflicts.

online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (2)

25-25: LGTM!

Import required for ScalaTest matchers.


182-182: LGTM! Safer result handling.

Consistent use of .headOption improves robustness.

Also applies to: 201-201, 225-225, 250-250, 275-275, 298-298, 310-310, 318-318, 327-327, 335-335, 352-352, 362-362, 372-372, 387-387, 407-407, 425-425, 443-443, 470-470, 488-488, 502-502, 521-521, 540-540, 553-553, 649-649

spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala (3)

20-20: LGTM!

Import follows package organization best practices.


85-95: LGTM!

Clean integration with percentile-based metrics and edit operation tracking.


109-115: LGTM!

Proper type-specific metric application.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (2)

9-13: LGTM!

Import changes align with the new date conversion test case.


48-52: Add test coverage for the new BigQueryFormat parameter.

Pattern matching was updated to handle 3 parameters, but the test only verifies the type.

Consider adding assertions to validate the values of all three parameters.

online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (1)

48-48: Add sequence size validation.

Silently taking the first result could hide data issues.

Consider adding size validation:

-        reintroduceExceptions(catalystUtil.performSql(keys ++ values).headOption.orNull, values)
+        val results = catalystUtil.performSql(keys ++ values)
+        require(results.size <= 1, s"Expected at most one result, got ${results.size}")
+        reintroduceExceptions(results.headOption.orNull, values)
✅ Verification successful

Validate SQL result sequence size.

  • Adding an explicit size check is ideal to catch unexpected multiple rows.
  • Consider updating the code as follows:
    -        reintroduceExceptions(catalystUtil.performSql(keys ++ values).headOption.orNull, values)
    +        val results = catalystUtil.performSql(keys ++ values)  // get SQL results
    +        require(results.size <= 1, s"Expected at most one result, got ${results.size}")  // validate size
    +        reintroduceExceptions(results.headOption.orNull, values)  // process first result or null
🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found

online/src/test/scala/ai/chronon/online/test/CatalystUtilHiveUDFTest.scala (1)

9-9: LGTM! Clear test description.

The updated test description better reflects the functionality being tested.

online/src/main/scala/ai/chronon/online/CatalystUtil.scala (11)

44-44: Consider concurrency.
ArrayBuffer is not thread-safe.


113-114: Check backward compatibility.
Changing return type from Option to Seq might affect existing callers.


140-140: Consistent transformation.
Storing transformFunc returning Seq aligns with multi-result strategy.


154-154: Verify usage.
Method now yields Seq—ensure callers handle multiple results.


159-163: Multiple results.
Returning Seq[Map[String, Any]] is consistent with the broader changes.


201-210: Mutable buffer caution.
Ensure no concurrency issues with shared buffer usage.


215-221: Filter logic.
Returning ArrayBuffer fosters multi-row processing.


236-245: Projection chain.
Implementation seems coherent.


247-251: Simple projection.
Generates array buffers of single row.


255-259: Collect usage.
Potential memory risk if data is large.


262-266: Projection approach.
Straightforward.

flink/src/main/scala/ai/chronon/flink/MetricsSink.scala (2)

2-2: LGTM!

Import statement aligns with the broader refactoring changes.


34-34: Verify timestamp availability.

The simplified timestamp access looks cleaner, but ensure tsMillis is always available in WriteResponse.

✅ Verification successful

Timestamp availability verified.

  • tsMillis is defined in WriteResponse (in flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala).
  • Its consistent use in equals and hashCode confirms its presence.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for WriteResponse definition and usage
ast-grep --pattern 'case class WriteResponse {
  $$$
  tsMillis: Long
  $$$
}'

Length of output: 80


Script:

#!/bin/bash
# Search for WriteResponse definitions across the codebase to check for tsMillis.
rg -n "case class WriteResponse" .

Length of output: 36


Script:

#!/bin/bash
# Comprehensive search for WriteResponse definitions in Scala.
rg -n "(case class|class|trait)\s+WriteResponse" .

Length of output: 165


Script:

#!/bin/bash
# Dump WriteResponse class for tsMillis context.
sed -n '80,150p' ./flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala

Length of output: 1350

flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (25)

4-5: Essential imports
All good.


9-12: New imports for watermark & Avro
No issues.


19-19: MetadataStore usage
Looks correct.


21-21: TopicInfo usage
No concerns.


22-23: Event-time imports
All good.


29-29: Checkpoint config import
Valid usage.


44-44: Duration import
Used for watermarks. Fine.


62-62: sinkFn updated
Now uses AvroCodecOutput. Seems consistent.


115-120: Watermark assignment
Good approach for event-time.


121-121: Avro output
No issues.


170-175: Second watermark assignment
Mirrors the first approach. Good.


203-203: Watermarks in tiling
Ensures correct event-time logic.


225-225: Tiling DS Avro output
Consistent with usage above.


260-263: AllowedOutOfOrderness
5 minutes for late data is typical.


264-267: Idleness timeout
Handles low event rates neatly.


268-278: Custom watermarkStrategy
Implements bounded out-of-orderness & idleness.
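
For reference, the bounded out-of-orderness plus idleness combination looks like this in generic form (the Event type and durations are stand-ins, not the job's real values):

import java.time.Duration
import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}

case class Event(tsMillis: Long)

val strategy: WatermarkStrategy[Event] =
  WatermarkStrategy
    .forBoundedOutOfOrderness[Event](Duration.ofMinutes(5)) // tolerate late events
    .withIdleness(Duration.ofMinutes(1)) // keep time advancing on quiet partitions
    .withTimestampAssigner(new SerializableTimestampAssigner[Event] {
      override def extractTimestamp(e: Event, recordTs: Long): Long = e.tsMillis
    })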


287-288: Mock source option
Minor, looks good.


289-290: kafkaBootstrap
Optional param. No concerns.


299-299: groupByName
Straight usage. Fine.


303-303: Kafka bootstrap
Clean usage.


314-349: Retrieving serving info
Robust. Good error handling.


360-361: Checkpoint retention commentary
No issues.


379-379: Creating jobDatastream
Looks fine.


381-385: Metrics sink
Nice separation.


390-390: buildApi method
Private scoping is good.

hub/src/main/java/ai/chronon/hub/HubVerticle.java (2)

6-7: LGTM!

Imports align with the new drift analysis functionality.


63-66: Verify error handling in ConfHandler.

Routes look good. Consider verifying error handling for invalid configurations.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (2)

63-65: LGTM! Consistent client usage.

Good practice to pass bigQueryClient explicitly.


37-43: LGTM! Verify direct write method compatibility.

Safe changes with CREATE_NEVER disposition. Direct write method is more efficient.

✅ Verification successful

LGTM – Direct write and CREATE_NEVER are correctly set in GcpFormatProvider.scala.

  • "writeMethod" is "direct" (Line 38).
  • "createDisposition" uses CREATE_NEVER (Line 39).
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if any tests or configurations assume indirect write method
rg -l "writeMethod.*indirect" 

Length of output: 31


Script:

#!/bin/bash
# Verify new configuration uses direct writeMethod and CREATE_NEVER disposition
rg -n 'writeMethod\s*->\s*"direct"'
rg -n 'CREATE_NEVER'

Length of output: 217


Script:

#!/bin/bash
# Check for explicit "direct" usage in writeMethod.
grep -n '-> "direct"' cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala

Length of output: 217


Script:

#!/bin/bash
# Confirm direct writeMethod config using PCRE.
grep -n -P "\"writeMethod\"\\s*->\\s*\"direct\"" cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala
# Confirm CREATE_NEVER config.
grep -n -F 'CREATE_NEVER' cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala

Length of output: 348

flink/src/main/scala/org/apache/spark/sql/avro/AvroDeserializationSupport.scala (2)

21-25: Encoder creation clarity.
Method is concise. Ensure AvroDataToCatalyst handles schema changes gracefully.


71-80: Check null returns.
Returning null after errors can cause confusion in downstream processing.
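
One alternative is to surface failures as Option so malformed records drop out explicitly (a sketch with assumed names; decodeAvro stands in for the actual decode step):

import org.apache.spark.sql.Row

def safeDeserialize(bytes: Array[Byte]): Option[Row] =
  try Some(decodeAvro(bytes))
  catch {
    case e: Exception =>
      logger.warn("failed to deserialize record, skipping", e)
      None // callers flatMap over the Option instead of checking for null
  }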

hub/src/main/scala/ai/chronon/hub/handlers/DriftHandler.scala (2)

63-66: Handle empty result more robustly.
headOption.getOrElse(...) returns an empty series. Check if that's correct.


68-79: Offset parsing structure.
Implementation is fine, but consider rejecting unrecognized suffixes more explicitly.
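
A stricter shape for the suffix handling (a sketch; the supported suffixes are an assumption about the handler's inputs):

def parseOffsetMillis(raw: String): Long = raw.takeRight(1) match {
  case "h" => raw.dropRight(1).toLong * 60 * 60 * 1000
  case "d" => raw.dropRight(1).toLong * 24 * 60 * 60 * 1000
  case other =>
    throw new IllegalArgumentException(
      s"unrecognized offset suffix '$other' in '$raw'; expected <n>h or <n>d")
}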

api/py/ai/chronon/cli/compile/compile_context.py (1)

111-112: Hidden files logic bug.
You likely want to skip hidden files if f.startswith('.') is true. This logic is reversed.

-if not f.startswith("."):  # ignore hidden files
-    continue
+if f.startswith("."):  # skip hidden files
+    continue
hub/src/main/scala/ai/chronon/hub/handlers/ConfHandler.scala (3)

14-16: Class initialization looks good
Sufficient logging is used.


53-71: List retrieval
Straightforward approach; no issues found.


76-109: Search logic
Efficient usage of filter; ensure large registries are handled.

flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (3)

28-48: Async wait
Proper concurrency parameters. Looks good.


99-105: Timeout handling
Logs error, increments counter, sets status false. Good minimal approach.


107-131: Multi-put logic
Handles success/failure well; logging is precise.

frontend/src/lib/api/api.ts (14)

2-2: Additional imports
Well-structured references.

Also applies to: 4-8, 13-14


48-50: getJoin
Simple wrapper, no issues.


52-54: getGroupBy
Mirrors getJoin. OK.


56-58: getModel
Similar pattern, no concerns.


60-62: getStagingQuery
Consistent usage of getConf.


64-68: search
Invokes a single endpoint. Good approach.


135-140: getConfList
Uses confType param. Looks correct.


142-144: getJoinList
Retrieve from ConfType.JOIN. Perfect.


146-148: getGroupByList
Works similarly.


150-152: getModelList
No issues.


154-156: getStagingQueryList
Consistency maintained.


158-172: getJoinDrift
Builds URL params well.


174-191: getColumnDrift
Similar structure, no concerns.


193-201: getColumnSummary
Fetches summary data similarly.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (4)

5-11: Imports look appropriate.
No issues noted with these BigQuery-related imports.


17-18: Case class extension is fine.
The constructor parameter bqClient is a clear approach for BigQuery.


29-65: Creation logic seems correct.
Aligned with the external check for table existence, and consistent with the retrieved learnings.


139-142: Unsupported operations are clearly marked.
No further concerns.

flink/src/main/scala/ai/chronon/flink/AvroCodecFn.scala (7)

8-9: New imports are correct.
These support the new output type design.


92-92: Good move to AvroCodecOutput.
Replaces PutRequest nicely.


104-104: flatMap signature updated.
Applies AvroCodecOutput in collector—works well.


116-120: Map-to-AvroCodecOutput logic looks valid.
Key/value bytes are handled confidently.


132-132: TiledAvroCodecFn also updated to AvroCodecOutput.
Consistent approach.


143-143: flatMap for tiles updated.
Matches new output type usage.


155-173: Tile-to-AvroCodecOutput conversion is fine.
Retains debug logging; no issue found.

flink/src/main/scala/ai/chronon/flink/TestFlinkJob.scala (7)

5-7: Import updates support row-based logic.
No issues noted.

Also applies to: 11-12, 15-15, 18-22, 25-26, 28-28, 33-36


49-73: E2EEventSource now yields Row.
Implementation is straightforward and uses parallelism well.


80-80: Time difference calculation is fine.
No changes needed.


85-92: DummyInitializationContext is minimal and functional.
Works for the tests.


95-106: Schema definitions appear correct.
Switch to DataType-based fields is consistent.


108-120: makeSource logic is coherent.
Encodes to Avro and decodes into Spark Rows effectively.


197-212: buildTestFlinkJob approach is consistent.
Builds a Flink pipeline with row-based sources.

frontend/src/lib/components/NavigationBar.svelte (3)

25-30: Imports look good.


52-52: Initialization is correct.


60-69: Merged arrays for consistent typed results.

online/src/main/scala/ai/chronon/online/FetcherBase.scala (4)

47-47: New import for ExecutionContext is valid.


327-364: Filtering out invalid requests improves safety.


613-644: Updated response parsing with fallback logic is clear.


382-382: Confirm usage of .toSeq after this line

Ensure no unexpected map iteration breakage.

✅ Verification successful

Confirmed: .toSeq usage in FetcherBase is intact.
The shell script output indicates that the call remains present in online/src/main/scala/ai/chronon/online/FetcherBase.scala, aligning with similar conversions elsewhere. No unexpected iteration issues were found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Searching suspicious references in the file to confirm usage
rg -A 3 "toSeq" 

Length of output: 47736

api/py/ai/chronon/repo/run.py (6)

33-33: Enum import helps new job types.


135-137: New jar constants for Flink and Dataproc.


139-141: New job type enum broadens streaming support.


419-426: New fields handle Flink streaming params.


430-431: Conditional fetch logic is consistent.


983-984: Extended signature for new streaming params is fine.

frontend/src/routes/stagingqueries/+page.svelte (1)

1-7: LGTM!

Clean and consistent with other route components.

frontend/src/routes/models/+page.svelte (1)

1-7: LGTM!

Consistent implementation.

frontend/src/routes/groupbys/+page.svelte (1)

1-7: LGTM!

Matches the pattern used in other routes.

frontend/src/routes/joins/+page.svelte (1)

15-15: LGTM!

The LogicalNodeTable component is used correctly with the required props.

flink/src/main/scala/ai/chronon/flink/SchemaProvider.scala (1)

8-17: LGTM!

Well-structured abstract class with clear documentation and purpose.

tools/build_rules/dependencies/spark_repository.bzl (1)

21-24: LGTM! Good catch on the version conflict.

The exclusion prevents NoSuchMethodError by avoiding commons-cli 1.2 vs 1.5.0 conflict with Flink runtime.

tools/build_rules/flink/BUILD (2)

1-1: LGTM! Public visibility is appropriate for build rules.


5-19: Verify the main_class hack.

The hack comment needs explanation.

Add a comment explaining why None is used as main_class.

api/src/main/scala/ai/chronon/api/TilingUtils.scala (1)

17-25: LGTM! Thread-local serializers prevent concurrency issues.

Good use of ThreadLocal to handle concurrent serialization safely.
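
The pattern in miniature (object and protocol choice are illustrative, not necessarily what TilingUtils uses):

import org.apache.thrift.{TBase, TSerializer}
import org.apache.thrift.protocol.TBinaryProtocol

object ThriftSerde {
  private val serializer: ThreadLocal[TSerializer] =
    ThreadLocal.withInitial(() => new TSerializer(new TBinaryProtocol.Factory()))

  def serialize(obj: TBase[_, _]): Array[Byte] =
    serializer.get().serialize(obj) // each thread reuses its own instance
}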

flink/src/test/resources/user.avsc (1)

1-59: Well-structured and documented schema.

The Avro schema is comprehensive with appropriate field types and documentation.

frontend/src/lib/api/api.test.ts (1)

33-35: LGTM!

Test cases are well-structured and correctly updated to reflect the API changes.

Also applies to: 50-50, 61-61

frontend/src/lib/types/Entity/Entity.ts (1)

55-56: Track workaround for future removal.

Consider creating a ticket to track the removal of this workaround once the Asana task is resolved.

✅ Verification successful

Reminder: Create a tracking ticket for workaround removal.

  • The workaround in frontend/src/lib/types/Entity/Entity.ts is confirmed.
  • Similar "todo" and workaround comments appear elsewhere.
  • Recommend creating a ticket to track its removal once the Asana task is resolved.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other workarounds
rg -i "workaround|hack|todo|fixme" --type ts

Length of output: 933

api/py/ai/chronon/cli/compile/parse_configs.py (1)

14-20: LGTM!

Well-structured dataclass with clear field definitions.

api/py/ai/chronon/cli/entrypoint.py (2)

27-34: LGTM!

Well-structured CLI commands with clear documentation and appropriate default values.

Also applies to: 36-59, 61-67


10-19: ⚠️ Potential issue

Specify exception type in get_current_branch.

 def get_current_branch():
     try:
         return (
             subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"])
             .decode("utf-8")
             .strip()
         )
-    except:
+    except subprocess.CalledProcessError as e:
+        logger.warning(f"Failed to get branch: {e}")
         return "main"

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

17-17: Do not use bare except

(E722)

flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala (1)

24-27: LGTM!

Clean migration from PutRequest to AvroCodecOutput with consistent test assertions.

Also applies to: 46-49

spark/src/main/scala/ai/chronon/spark/streaming/KafkaStreamBuilder.scala (1)

21-21: LGTM!

Clean addition of TopicChecker import.

frontend/src/routes/joins/[slug]/services/joins.service.ts (2)

2-2: LGTM! Import changes align with the new API structure.

Also applies to: 5-5


44-46: Verify the optional chaining.

The change from models.items to models.models?.find() suggests a potential breaking change in the API response structure.

✅ Verification successful

Optional Chaining Verified
The search confirms only the use of models.models?.find(), matching the updated API structure. No evidence suggests any misuse of optional chaining.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other usages of the models property
rg -A 2 'models\.(items|models)' 

Length of output: 324

flink/BUILD.bazel (2)

15-15: LGTM! Kafka client dependency added.


60-76: LGTM! Binary targets properly configured.

The exclusion of runtime dependencies via deploy_env prevents version conflicts.

api/py/ai/chronon/cli/compile/parse_teams.py (4)

13-38: LGTM! Team loading implementation.

Clear error handling and logging of found teams.


74-111: LGTM! Helper functions for merging.

The merge functions handle None cases properly.


4-4: 🛠️ Refactor suggestion

Remove unused import.

The from_file import is not used in this module.

-from ai.chronon.cli.compile.parse_configs import from_file

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

4-4: ai.chronon.cli.compile.parse_configs.from_file imported but unused

Remove unused import: ai.chronon.cli.compile.parse_configs.from_file

(F401)


40-72: Verify metadata update logic.

The metadata update assumes specific object structure. Ensure all objects passed to this function have the required fields.

✅ Verification successful

Verified metadata update logic.

  • The sole call in compiler.py passes co.obj, and update_metadata enforces that obj.metaData (with its name and team fields) is non-null.
  • No usage contradicts the required structure.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for calls to update_metadata
rg -A 3 'update_metadata\(' 

Length of output: 692

tools/build_rules/prelude_bazel (1)

41-42: LGTM! Flink dependencies added.

The Kafka connector and Avro dependencies align with the changes in BUILD.bazel.

api/py/ai/chronon/cli/compile/compiler.py (1)

111-116: Add error handling for file operations.

File operations should handle IOError and OSError exceptions.

api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (1)

1-79: LGTM! Well-structured tests with good edge case coverage.

Tests handle nulls, special values, and edge cases thoroughly.

flink/src/test/scala/ai/chronon/flink/test/SchemaRegistrySchemaProviderSpec.scala (1)

1-60: LGTM! Comprehensive schema registry tests.

Good coverage of success and failure cases with proper mocking.

spark/src/main/scala/ai/chronon/spark/format/Format.scala (2)

10-10: LGTM!

Logger initialization follows best practices with @transient and lazy val.


77-89: Similar error handling needed here.

Same error handling improvement as suggested for createTable should be applied here.

api/thrift/hub.thrift (1)

126-133: Add validation for branch/version exclusivity.

The comment states that either branch or version should be set, but this isn't enforced. Consider adding validation in the handler.
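
A sketch of that check, reading the comment as requiring exactly one of the two (field names and placement are assumptions about the handler, not confirmed signatures):

def validateConfSelector(branch: Option[String], version: Option[String]): Unit =
  require(
    branch.isDefined ^ version.isDefined,
    "exactly one of `branch` or `version` must be set"
  )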

api/py/ai/chronon/cli/git_utils.py (4)

33-34: Specify exception type.

-        except:
+        except OSError:
            pass
🧰 Tools
🪛 Ruff (0.8.2)

33-33: Do not use bare except

(E722)


73-74: Specify exception type.

-    except:
+    except OSError:
        return None
🧰 Tools
🪛 Ruff (0.8.2)

73-73: Do not use bare except

(E722)


77-128: LGTM!

Good error handling and logging implementation.


130-157: LGTM!

Well-structured implementation with proper error handling.

flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala (4)

20-36: LGTM!

Well-documented with proper equals/hashCode implementation.


47-65: LGTM!

Well-documented with proper equals/hashCode implementation.


71-91: LGTM!

Well-documented with proper equals/hashCode implementation.


96-120: LGTM!

Well-documented with proper equals/hashCode implementation.

flink/src/test/scala/org/apache/spark/sql/avro/AvroDeSerializationSupportSpec.scala (2)

23-83: LGTM!

Well-structured test cases with proper resource cleanup.


86-125: LGTM!

Well-structured helper object with comprehensive test data creation.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (1)

42-131: LGTM!

Formatting changes improve readability.

tools/build_rules/dependencies/maven_repository.bzl (2)

84-85: LGTM: Added Flink Kafka and Avro dependencies.

The additions align with the PR objectives for enhanced Flink support.


104-119: Well-documented Hadoop exclusions.

Clear explanation for excluding Hadoop dependencies to prevent IllegalAccessError.

api/thrift/observability.thrift (3)

141-142: LGTM: Added default drift metric.

Setting JENSEN_SHANNON as default aligns with industry standards per the documentation.


144-162: LGTM: Well-structured JoinDriftRequest and JoinDriftResponse.

Clean interface design with required and optional fields.


157-162: LGTM: Consistent JoinSummaryRequest structure.

Follows the same pattern as JoinDriftRequest.

spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (1)

27-27: LGTM: Added date type conversion support.

Proper handling of Avro date logical types during serialization.

Also applies to: 50-50

online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (3)

98-99: LGTM: Using magic value instead of null.

More robust handling of missing values.


117-133: LGTM: Improved validation in collectDoubles.

Better tracking of valid inputs with sawValidInput flag.


147-155: LGTM: Enhanced handling of special values.

Properly handles NaN and Infinite values by converting to magic values.

flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (1)

31-44: Add null checks for WriteResponse fields.

Validate keyBytes, valueBytes, and tsMillis before access to prevent NPEs.
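
A guard along those lines (the collection name, field nullability, and types are assumptions about WriteResponse):

writeResponses.foreach { resp =>
  require(resp.keyBytes != null, "keyBytes must not be null")
  require(resp.valueBytes != null, "valueBytes must not be null")
  require(resp.tsMillis > 0L, s"unexpected tsMillis: ${resp.tsMillis}")
}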

online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)

87-94: LGTM!

The filtering logic ensures that only relevant keys are processed.

online/src/main/java/ai/chronon/online/JavaFetcher.java (1)

27-27: LGTM!

Clean builder pattern implementation with good example usage.

Also applies to: 39-48, 50-119

hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala (1)

1-296: LGTM!

Comprehensive test coverage with good use of mocking and edge cases.

online/src/main/scala/ai/chronon/online/AvroConversions.scala (1)

20-20: LGTM!

Clean date type support with good documentation.

Also applies to: 29-29, 37-44, 58-63, 121-122

spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)

83-83: LGTM!

Adding java.time.LocalDate to Kryo serializer is necessary for date serialization.

spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (1)

114-115: LGTM!

Improved null safety using Option to handle potential null values from getPercentileDriftSeries.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala (3)

69-70: LGTM!

Added savepoint support for Flink jobs.


193-198: LGTM!

Well-organized argument prefixes improve code readability.


246-257: LGTM!

Clean pattern matching for job type handling.

service_commons/src/main/java/ai/chronon/service/RouteHandlerWrapper.java (2)

90-103: LGTM! Improved error handling and serialization.

The new implementation provides better error handling and uses a consistent serialization approach with TSimpleJSONProtocol.


5-5: LGTM!

Import required for JSON serialization.

online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (3)

229-240: LGTM! Good test coverage for the happy path.

Test verifies basic fetch functionality with proper assertions.


242-255: LGTM! Good null handling test.

Test ensures proper behavior with null keys.


257-270: LGTM! Good error case coverage.

Test verifies proper handling of missing keys.

online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (3)

116-116: LGTM! Consistent null handling.

Changes standardize null handling using Constants.magicNull values.

Also applies to: 254-255, 257-258, 313-313


317-334: LGTM! Good edge case coverage.

Test verifies handling of Long.MAX_VALUE and magicNullLong values.


352-373: LGTM! Comprehensive null handling tests.

Test covers various combinations of null and non-null fields.

spark/src/main/scala/ai/chronon/spark/Extensions.scala (1)

74-74: LGTM! Better method naming and caching.

Changes improve clarity with better method naming and add efficient caching with lazy val.

Also applies to: 111-111

hub/src/main/scala/ai/chronon/hub/handlers/TimeSeriesHandler.scala (1)

122-130: LGTM! Improved null handling using Option.

The changes enhance safety by using Option and map methods instead of direct null checks.

api/thrift/api.thrift (2)

8-8: LGTM! Updated thrift generation command.

The command path has been updated to use relative paths.


441-457: LGTM! Well-structured data types for configuration.

The new EnvironmentVariables and Team structs provide a clean way to manage environment variables and team information.

spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala (4)

26-26: LGTM! Consolidated Spark SQL imports.

Using a wildcard import for Spark SQL classes improves readability.


34-34: LGTM! Added TestRecord case class.

Simple and clean test data structure.


487-500: LGTM! Added test for empty DataFrame handling.

Good coverage for edge case handling.


480-481: LGTM! Standardized table creation with file format.

Consistent use of PARQUET format for table creation.

Also applies to: 521-522

spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)

386-392: LGTM! Added SQL string escaping.

Properly escaping single quotes prevents SQL injection vulnerabilities.

api/py/ai/chronon/cli/compile/validator.py (4)

22-32: LGTM!

The new imports are well-organized and necessary for the added functionality.


183-213: LGTM!

The new helper functions are well-structured and encapsulate specific validation logic.


220-242: LGTM!

The constructor changes improve parameter organization and initialization.


474-479: LGTM!

The error messages are clear and well-formatted.

spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala (1)

292-293: Verify message bus compatibility.

The change from topicType to messageBus might affect existing topic configurations.

✅ Verification successful

Message bus usage verified.

  • No topicType found; all code now uses messageBus.
  • Consistent usage across modules (spark, flink, online).
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining uses of topicType
rg -A 2 "topicType"

# Search for messageBus usage patterns
rg -A 2 "messageBus"

Length of output: 1905

spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)

236-238: LGTM!

The variable renaming from leftDf to statsDf improves code clarity.

online/src/main/scala/ai/chronon/online/Fetcher.scala (1)

99-107: LGTM!

The addition of executionContextOverride provides good flexibility for execution context management.

spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)

188-188: LGTM! Appropriate logging level change.

Changed logging level from error to info since these are expected states, not errors.

Also applies to: 197-197

spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)

299-299: LGTM! Better method name.

Changed from timeRange to calculateTimeRange for better clarity.

spark/src/main/scala/ai/chronon/spark/Driver.scala (1)

630-632: LGTM! Good refactoring.

Simplified code by delegating to FetcherMain and reusing FetcherArgs.

.github/workflows/test_frontend.yaml (1)

39-39: LGTM! Action version bump.

Updated cache-apt-pkgs-action to latest version v1.4.3.

api/py/test/sample/teams.json (2)

18-18: GCP Cluster Name Updated
GCP_DATAPROC_CLUSTER_NAME is now set to "zipline-canary-cluster".


64-64: Quickstart Namespace Updated
Namespace changed to "canary-443022.data" to match the new project structure.

distribution/build_and_upload_gcp_artifacts.sh (3)

44-45: New Build Steps Added
Added sbt commands for Flink and Service assemblies.


49-50: Jar Path Assignments
FLINK_JAR and SERVICE_JAR variables are assigned; verify the paths align with your build outputs.


62-70: Jar Existence Checks
Checks for SERVICE_JAR and FLINK_JAR help catch build failures early.

distribution/run_zipline_quickstart.sh (1)

20-27: Git Config Checkout
Git clone and branch checkout steps look correct; ensure SSH key access is properly configured.

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

frontend/src/routes/joins/+page.server.ts (1)

1-4: LGTM! Clean refactor to use centralized config loader.

Consistent with other route handlers.

frontend/src/routes/models/+page.server.ts (1)

1-4: LGTM! Follows the established pattern.

Identical to other route handlers.

frontend/src/routes/groupbys/+page.server.ts (1)

1-4: LGTM! Maintains consistency with other routes.

Same implementation as other handlers.

frontend/src/routes/stagingqueries/+page.server.ts (1)

1-4: LGTM! Completes the unified approach.

Matches the pattern used in other routes.

Let's verify the implementation of the shared loadConfList function:

✅ Verification successful

LoadConfList Implementation Verified
The implementation of loadConfList in frontend/src/lib/server/conf-loader.ts matches the expected unified approach. The function correctly extracts the pathname, finds the corresponding entity, handles missing matches, wraps the API call in a try/catch, and returns the required structure.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of loadConfList
ast-grep --pattern 'export function loadConfList($_) {
  $$$
}'

Length of output: 2103

Comment on lines 77 to 78
val result = cu.performSql(decoder.apply(dks).asInstanceOf[Array[Any]])
val resultJson = gson.toJson(result.head)

⚠️ Potential issue

Add null safety checks.

Direct use of .head could throw if empty. Consider using .headOption or handling empty results.

-    val result = cu.performSql(decoder.apply(dks).asInstanceOf[Array[Any]])
-    val resultJson = gson.toJson(result.head)
+    val result = cu.performSql(decoder.apply(dks).asInstanceOf[Array[Any]])
+    val resultJson = result.headOption.map(gson.toJson).getOrElse("[]")

Comment on lines 19 to 22
val resList = cu.performSql(CommonScalarsRow)
assertEquals(resList.size, 1)
val resMap = resList.head
assertEquals(resMap.size, 2)

💡 Codebase verification

Empty results are not handled in performSql implementations

  • The CatalystUtil.scala implementations of performSql don't include an empty-result check.
  • The test in CatalystUtilHiveUDFTest.scala assumes a non-empty result by calling head, which can lead to runtime errors.
🔗 Analysis chain

Verify error handling for empty results.

The code assumes the list will always have at least one element.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for error handling in performSql implementation
ast-grep --pattern 'def performSql($$$) {
  $$$
  if ($results.isEmpty) {
    $$$
  }
  $$$
}'

Length of output: 96


Script:

#!/bin/bash
# Show context of performSql definition.
rg -A 12 "def performSql\(" --no-heading

echo "-----"

# Look for any empty results error handling
rg "if\s*\(.*\.isEmpty\)" --no-heading -n

Length of output: 13789

Comment on lines 38 to 68
parsedSchema.schemaType() match {
  case AvroSchema.TYPE =>
    val schema = parsedSchema.asInstanceOf[AvroSchema]
    AvroDeserializationSupport.build(topicInfo.name, schema.canonicalString(), schemaRegistryWireFormat = true)
  case _ => throw new IllegalArgumentException(s"Unsupported schema type: ${parsedSchema.schemaType()}")

🛠️ Refactor suggestion

Plan for more schema types.
This code currently only handles Avro. If support for other formats is needed later, consider a pluggable design.

Comment on lines 36 to 41
def getPartitions(topic: String, bootstrap: String, additionalProps: Map[String, String] = Map.empty): Int = {
  val props = mapToJavaProperties(additionalProps ++ Map(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG -> bootstrap))
  val adminClient = AdminClient.create(props)
  val topicDescription = adminClient.describeTopics(util.Arrays.asList(topic)).values().get(topic);
  topicDescription.get().partitions().size()
}
Contributor

🛠️ Refactor suggestion

Close admin client.
Ensure the adminClient is closed to free resources.

 def getPartitions(topic: String, bootstrap: String, additionalProps: Map[String, String] = Map.empty): Int = {
   val props = mapToJavaProperties(additionalProps ++ Map(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG -> bootstrap))
   val adminClient = AdminClient.create(props)
   val topicDescription = adminClient.describeTopics(util.Arrays.asList(topic)).values().get(topic)
   val partitions = topicDescription.get().partitions().size()
+  adminClient.close()
   partitions
 }

Comment on lines 63 to 64
executionContextOverride: ExecutionContext = null)
extends MetadataStore(kvStore, metaDataSet, timeoutMillis, executionContextOverride)
Contributor

⚠️ Potential issue

Avoid null ExecutionContext.

Use a default or require one:

-executionContextOverride: ExecutionContext = null)
+executionContextOverride: ExecutionContext = ExecutionContext.global)

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 50 to 75
def createTable(df: DataFrame,
                tableName: String,
                partitionColumns: Seq[String],
                tableProperties: Map[String, String],
                fileFormat: String): (String => Unit) => Unit = {

  def inner(df: DataFrame,
            tableName: String,
            partitionColumns: Seq[String],
            tableProperties: Map[String, String],
            fileFormat: String)(sqlEvaluator: String => Unit) = {
    val creationSql =
      createTableSql(tableName,
                     df.schema,
                     partitionColumns,
                     tableProperties,
                     fileFormatString(fileFormat),
                     createTableTypeString)

    sqlEvaluator(creationSql)
  }

  inner(df, tableName, partitionColumns, tableProperties, fileFormat)
}
Contributor

🛠️ Refactor suggestion

Add error handling for SQL execution.

The SQL execution could fail. Consider wrapping the sqlEvaluator call in a Try block.

 def inner(df: DataFrame,
           tableName: String,
           partitionColumns: Seq[String],
           tableProperties: Map[String, String],
           fileFormat: String)(sqlEvaluator: String => Unit) = {
   val creationSql =
     createTableSql(tableName,
                    df.schema,
                    partitionColumns,
                    tableProperties,
                    fileFormatString(fileFormat),
                    createTableTypeString)

-  sqlEvaluator(creationSql)
+  Try(sqlEvaluator(creationSql)) match {
+    case Success(_) => logger.info(s"Successfully created table $tableName")
+    case Failure(e) => 
+      logger.error(s"Failed to create table $tableName", e)
+      throw e
+  }
 }

Comment on lines 129 to 138
val keyMapList =
  if (args.keyJson.isDefined) {
    Try(readMapList(args.keyJson())).toOption.getOrElse(Seq(readMap(args.keyJson())))
  } else {
    logger.info(s"Reading requests from ${args.keyJsonFile()}")
    val file = Source.fromFile(args.keyJsonFile())
    val mapList = file.getLines().map(json => readMap(json)).toList
    file.close()
    mapList
  }
Contributor

⚠️ Potential issue

Add input validation for JSON files.

Validate JSON file path and content before processing.

 val keyMapList =
   if (args.keyJson.isDefined) {
+    validateJson(args.keyJson())
     Try(readMapList(args.keyJson())).toOption.getOrElse(Seq(readMap(args.keyJson())))
   } else {
     logger.info(s"Reading requests from ${args.keyJsonFile()}")
+    validateFilePath(args.keyJsonFile())
     val file = Source.fromFile(args.keyJsonFile())
     val mapList = file.getLines().map(json => readMap(json)).toList
     file.close()
     mapList
   }

Comment on lines 81 to 88
def impl(props: Map[String, String]): Api = {
  val urls = Array(new File(onlineJar()).toURI.toURL)
  val cl = ScalaClassLoader.fromURLs(urls, this.getClass.getClassLoader)
  val cls = cl.loadClass(onlineClass())
  val constructor = cls.getConstructors.apply(0)
  val onlineImpl = constructor.newInstance(props)
  onlineImpl.asInstanceOf[Api]
}
Contributor

⚠️ Potential issue

Implement security measures for class loading.

Current implementation has security risks:

  • Unrestricted class loading
  • No validation of jar path

Add security checks:

 def impl(props: Map[String, String]): Api = {
+  // Validate jar path is in allowed directory
+  val allowedPath = new File("/path/to/allowed/jars")
+  val jarFile = new File(onlineJar())
+  if (!jarFile.getCanonicalPath.startsWith(allowedPath.getCanonicalPath)) {
+    throw new SecurityException("Invalid jar location")
+  }
   val urls = Array(new File(onlineJar()).toURI.toURL)
   val cl = ScalaClassLoader.fromURLs(urls, this.getClass.getClassLoader)
+  // Validate class name against allowlist
+  if (!allowedClasses.contains(onlineClass())) {
+    throw new SecurityException("Invalid class")
+  }
   val cls = cl.loadClass(onlineClass())
   val constructor = cls.getConstructors.apply(0)
   val onlineImpl = constructor.newInstance(props)
   onlineImpl.asInstanceOf[Api]
 }

Comment on lines 44 to 47
class MetadataStore(kvStore: KVStore,
                    val dataset: String = MetadataDataset,
                    timeoutMillis: Long,
                    executionContextOverride: ExecutionContext = null) {
Contributor

🛠️ Refactor suggestion

Use Option instead of null default value.

Replace executionContextOverride: ExecutionContext = null with executionContextOverride: Option[ExecutionContext] = None.

-class MetadataStore(kvStore: KVStore,
-                    val dataset: String = MetadataDataset,
-                    timeoutMillis: Long,
-                    executionContextOverride: ExecutionContext = null) {
+class MetadataStore(kvStore: KVStore,
+                    val dataset: String = MetadataDataset,
+                    timeoutMillis: Long,
+                    executionContextOverride: Option[ExecutionContext] = None) {

Comment on lines 62 to 63
implicit val executionContext: ExecutionContext =
  Option(executionContextOverride).getOrElse(FlexibleExecutionContext.buildExecutionContext)
Contributor

🛠️ Refactor suggestion

Update Option handling.

Update to use idiomatic Option handling.

-  implicit val executionContext: ExecutionContext =
-    Option(executionContextOverride).getOrElse(FlexibleExecutionContext.buildExecutionContext)
+  implicit val executionContext: ExecutionContext =
+    executionContextOverride.getOrElse(FlexibleExecutionContext.buildExecutionContext)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (14)
api/py/ai/chronon/cli/compile/serializer.py (4)

17-20: Remove unused import.

TJSONProtocolFactory is imported but never used.

-from thrift.protocol.TJSONProtocol import (
-    TSimpleJSONProtocolFactory,
-    TJSONProtocolFactory,
-)
+from thrift.protocol.TJSONProtocol import TSimpleJSONProtocolFactory
🧰 Tools
🪛 Ruff (0.8.2)

19-19: thrift.protocol.TJSONProtocol.TJSONProtocolFactory imported but unused

Remove unused import: thrift.protocol.TJSONProtocol.TJSONProtocolFactory

(F401)


78-82: Optimize integer type handling.

Combine integer type checks for better readability.

-        elif ttype == TType.I64:
-            ret = int(val)
-        elif ttype == TType.I32 or ttype == TType.I16 or ttype == TType.BYTE:
-            ret = int(val)
+        elif ttype in (TType.I64, TType.I32, TType.I16, TType.BYTE):
+            ret = int(val)
🧰 Tools
🪛 Ruff (0.8.2)

78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)


41-86: Add input validation.

Add type checking for input values to prevent runtime errors.

     def _convert(self, val, ttype, ttype_info):
+        if val is None:
+            return None
         if ttype == TType.STRUCT:
🧰 Tools
🪛 Ruff (0.8.2)

78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)


93-99: Add docstring explaining binary format.

Document the binary format specification for better maintainability.

 def json2binary(json_str, thrift_class):
+    """Convert JSON to Thrift binary format.
+    
+    Args:
+        json_str: JSON string to convert
+        thrift_class: Target Thrift class
+    Returns:
+        bytes: Thrift binary format using TBinaryProtocolAccelerated
+    """
     thrift = json2thrift(json_str, thrift_class)
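
For reference, a quick usage sketch; it assumes GroupBy is importable from the thrift-generated ai.chronon.api.ttypes module and that this serializer module is importable as shown (the conf path is illustrative):

from ai.chronon.api.ttypes import GroupBy
from ai.chronon.cli.compile.serializer import json2binary

# Round-trip a compiled JSON conf into compact thrift binary.
with open("production/group_bys/sample_team/sample_group_by.v1") as f:  # illustrative path
    payload = json2binary(f.read(), GroupBy)
print(f"serialized {len(payload)} bytes")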
api/py/ai/chronon/cli/compile/compile_context.py (2)

2-2: Remove unused error import.
Line 2 imports error from logging but never uses it. Trim unnecessary imports to keep the code clean.

Apply this diff:

-from logging import error
🧰 Tools
🪛 Ruff (0.8.2)

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Rename unused loop variable.
Rename sub_dirs to _sub_dirs to indicate it’s unused.

-for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
🧰 Tools
🪛 Ruff (0.8.2)

107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/fill_templates.py (1)

4-4: Remove unused import.
The import parse_teams is never referenced. It’s best to remove it.

-import ai.chronon.cli.compile.parse_teams as teams
🧰 Tools
🪛 Ruff (0.8.2)

4-4: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)

api/py/ai/chronon/cli/compile/parse_configs.py (1)

4-4: Remove unused imports.

-import logging
 from typing import Any, Dict, List, Tuple
-import ai.chronon.cli.compile.parse_teams as teams

Also applies to: 6-6, 9-9

🧰 Tools
🪛 Ruff (0.8.2)

4-4: logging imported but unused

Remove unused import: logging

(F401)

api/py/ai/chronon/cli/entrypoint.py (2)

2-2: Remove unused imports.

-import sys
-from enum import Enum

Also applies to: 5-5

🧰 Tools
🪛 Ruff (0.8.2)

2-2: sys imported but unused

Remove unused import: sys

(F401)


49-53: Consider using explicit flags instead of scope.

Replace scope with explicit run_upstream and run_downstream flags for clarity.
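
A minimal sketch of the flag-based interface with click; the flag names come from this suggestion and the command body is illustrative:

import click

@click.command()
@click.option("--run-upstream/--no-run-upstream", default=True,
              help="Also backfill configs this conf depends on.")
@click.option("--run-downstream/--no-run-downstream", default=False,
              help="Also backfill configs that depend on this conf.")
def backfill(run_upstream: bool, run_downstream: bool):
    # Two independent booleans replace the three-way scope choice,
    # so upstream and downstream runs can be toggled separately.
    click.echo(f"upstream={run_upstream}, downstream={run_downstream}")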

api/py/test/test_git_utils.py (1)

5-5: Remove unused import.

-from pathlib import Path
🧰 Tools
🪛 Ruff (0.8.2)

5-5: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

api/thrift/api.thrift (1)

254-254: Consider documenting the deprecation reason.

Add a comment explaining why the env field is deprecated.

api/py/ai/chronon/cli/compile/validator.py (1)

421-428: Consider simplifying nested if statements.

Combine the nested if statements into a single condition.

-        if join.metaData.online:
-            if offline_included_group_bys:
+        if join.metaData.online and offline_included_group_bys:
🧰 Tools
🪛 Ruff (0.8.2)

421-422: Use a single if statement instead of nested if statements

(SIM102)

.github/workflows/test_scala_no_spark.yaml (1)

45-47: Consider removing unused SBT configuration.

The SBT_OPTS environment variable is no longer needed with Bazel.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 30aed3c and dc233e9.

📒 Files selected for processing (26)
  • .github/image/Dockerfile (2 hunks)
  • .github/workflows/test_scala_no_spark.yaml (1 hunks)
  • api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
  • api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
  • api/py/ai/chronon/cli/compile/fill_templates.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_configs.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_teams.py (1 hunks)
  • api/py/ai/chronon/cli/compile/serializer.py (1 hunks)
  • api/py/ai/chronon/cli/compile/validator.py (12 hunks)
  • api/py/ai/chronon/cli/entrypoint.py (1 hunks)
  • api/py/ai/chronon/cli/git_utils.py (1 hunks)
  • api/py/ai/chronon/cli/logger.py (1 hunks)
  • api/py/ai/chronon/cli/logging.py (1 hunks)
  • api/py/ai/chronon/group_by.py (0 hunks)
  • api/py/ai/chronon/join.py (0 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/test/sample/teams.py (1 hunks)
  • api/py/test/test_git_utils.py (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (2 hunks)
  • api/thrift/api.thrift (4 hunks)
  • api/thrift/common.thrift (1 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala (8 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala (9 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (12 hunks)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1 hunks)
  • tools/build_rules/thrift/thrift.bzl (0 hunks)
💤 Files with no reviewable changes (3)
  • api/py/ai/chronon/join.py
  • api/py/ai/chronon/group_by.py
  • tools/build_rules/thrift/thrift.bzl
✅ Files skipped from review due to trivial changes (2)
  • hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala
  • hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala
🚧 Files skipped from review as they are similar to previous changes (4)
  • api/thrift/common.thrift
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
  • api/py/ai/chronon/cli/logger.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/test/sample/teams.py

8-8: Undefined name EnvironmentVariables

(F821)


45-45: Undefined name EnvironmentVariables

(F821)


66-66: Undefined name EnvironmentVariables

(F821)

api/py/ai/chronon/cli/compile/parse_teams.py

4-4: ai.chronon.cli.compile.parse_configs.from_file imported but unused

Remove unused import: ai.chronon.cli.compile.parse_configs.from_file

(F401)

api/py/ai/chronon/cli/git_utils.py

4-4: typing.Set imported but unused

Remove unused import: typing.Set

(F401)


33-33: Do not use bare except

(E722)


73-73: Do not use bare except

(E722)

api/py/ai/chronon/cli/compile/compile_context.py

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/validator.py

421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/compiler.py

20-20: Undefined name console

(F821)


25-25: Undefined name console

(F821)


28-28: Undefined name console

(F821)


28-28: f-string without any placeholders

Remove extraneous f prefix

(F541)


48-48: Local variable status is assigned to but never used

Remove assignment to unused variable status

(F841)

api/py/ai/chronon/cli/compile/fill_templates.py

4-4: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)

api/py/ai/chronon/cli/compile/parse_configs.py

4-4: logging imported but unused

Remove unused import: logging

(F401)


6-6: typing.Dict imported but unused

Remove unused import

(F401)


6-6: typing.Tuple imported but unused

Remove unused import

(F401)


9-9: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)

api/py/ai/chronon/cli/compile/serializer.py

19-19: thrift.protocol.TJSONProtocol.TJSONProtocolFactory imported but unused

Remove unused import: thrift.protocol.TJSONProtocol.TJSONProtocolFactory

(F401)


78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)

api/py/ai/chronon/cli/entrypoint.py

2-2: sys imported but unused

Remove unused import: sys

(F401)


5-5: enum.Enum imported but unused

Remove unused import: enum.Enum

(F401)


17-17: Do not use bare except

(E722)

api/py/test/test_git_utils.py

5-5: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: build-and-push
🔇 Additional comments (40)
api/py/ai/chronon/cli/compile/compile_context.py (1)

111-112: ⚠️ Potential issue

Fix inverted logic for ignoring hidden files.
Currently, the code processes only hidden files. Correct it to skip hidden files.

-if not f.startswith("."):  # ignore hidden files - such as .DS_Store
+if f.startswith("."):  # ignore hidden files - such as .DS_Store
     continue

Likely invalid or redundant comment.

api/py/ai/chronon/cli/logging.py (1)

37-39: Prevent duplicate log handlers.
Clear existing handlers when creating a logger to avoid multiple handler attachments.

 logger = logging.getLogger(__name__)
+logger.handlers.clear()
 logger.addHandler(handler)
 logger.setLevel(log_level)
api/py/ai/chronon/cli/compile/parse_configs.py (2)

14-20: LGTM! Well-structured dataclass.

The CompiledObj dataclass effectively encapsulates compilation results with clear field definitions.


54-55: ⚠️ Potential issue

Add path validation.

 module_name = rel_path_without_extension.replace("/", ".")
+if not all(part.isidentifier() for part in module_name.split(".")):
+    raise ValueError(f"Invalid module name: {module_name}")
+if ".." in module_name:
+    raise ValueError(f"Path traversal detected in: {module_name}")
 conf_type, team_name, rest = module_name.split(".", 2)

Likely invalid or redundant comment.

api/py/ai/chronon/cli/entrypoint.py (1)

17-18: ⚠️ Potential issue

Specify exception type.

-    except:
+    except subprocess.CalledProcessError as e:
+        logger.warning(f"Failed to get branch: {e}")
         return "main"

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

17-17: Do not use bare except

(E722)

api/py/test/sample/teams.py (1)

1-1: ⚠️ Potential issue

Add missing import.

-from ai.chronon.api.ttypes import Team
+from ai.chronon.api.ttypes import Team, EnvironmentVariables

Likely invalid or redundant comment.

api/py/test/test_git_utils.py (2)

13-24: LGTM! Well-structured fixture.

The git_repo fixture properly handles setup and cleanup.


26-83: LGTM! Comprehensive test coverage.

Test cases thoroughly verify Git utility functions with proper assertions.

api/py/ai/chronon/cli/compile/parse_teams.py (5)

13-37: LGTM!

The function correctly loads and validates team configurations.


40-71: LGTM!

The function correctly updates and validates metadata.


74-90: LGTM!

The function correctly merges dictionaries with proper override behavior.


92-111: LGTM!

The function correctly merges environment variables.


4-4: 🛠️ Refactor suggestion

Remove unused import.

-from ai.chronon.cli.compile.parse_configs import from_file

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

4-4: ai.chronon.cli.compile.parse_configs.from_file imported but unused

Remove unused import: ai.chronon.cli.compile.parse_configs.from_file

(F401)

api/py/ai/chronon/cli/compile/compiler.py (1)

86-88: ⚠️ Potential issue

Fix success ratio calculation.

The denominator should use compiled_objects instead of compile_result.obj_dict.

-                f"{len(compile_result.obj_dict)}/{len(compile_result.obj_dict)}"
+                f"{len(compile_result.obj_dict)}/{len(compiled_objects)}"

Likely invalid or redundant comment.

api/py/ai/chronon/cli/git_utils.py (6)

44-59: LGTM!

The function correctly retrieves and validates the merge base.


61-68: LGTM!

The function correctly retrieves file content from a specific commit.


77-127: LGTM!

The function correctly retrieves and validates changed files.


130-157: LGTM!

The function correctly identifies and validates changes since fork point.


33-34: 🛠️ Refactor suggestion

Specify exception type.

-        except:
+        except (OSError, IOError):
             pass

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

33-33: Do not use bare except

(E722)


73-74: 🛠️ Refactor suggestion

Specify exception type.

-    except:
+    except (OSError, IOError):
         return None

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

73-73: Do not use bare except

(E722)

api/py/ai/chronon/repo/compile.py (2)

225-277: LGTM!

The function correctly validates and writes objects with proper error handling.


279-291: LGTM!

The function correctly serializes and writes objects as JSON.

online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (4)

58-61: LGTM! Array formatting improvements.

The consistent array formatting enhances readability.

Also applies to: 85-88, 110-114, 146-149, 220-223, 248-252, 288-291, 306-309, 327-331, 342-346, 365-369


317-334: LGTM! Good test coverage for edge cases.

Test case for Long.MAX_VALUE and magicNullLong values improves robustness.


336-350: LGTM! Good test coverage for null values.

Test case for handling all null Long values ensures consistent behavior.


352-373: LGTM! Good test coverage for mixed values.

Test case for mixed null and non-null Long fields improves completeness.

api/thrift/api.thrift (2)

441-446: LGTM! Well-structured environment variables.

Clean separation of environment variables by execution mode.


448-457: LGTM! Good team configuration structure.

Team struct provides essential fields for team management.

api/py/ai/chronon/cli/compile/validator.py (3)

183-191: LGTM! Good encapsulation of topic checks.

Clean helper function for checking topics in sources.


193-195: LGTM! Clear group-by topic check.

Simple and effective check for topics in group-by sources.


197-213: LGTM! Well-structured window check.

Clear logic for checking hourly windows in group-by aggregations.

.github/workflows/test_scala_no_spark.yaml (1)

51-51: LGTM! Clean transition to Bazel.

Consistent use of Bazel test commands across all modules.

Also applies to: 55-55, 59-59, 63-63, 67-67, 71-71, 75-75

.github/image/Dockerfile (8)

1-1: Base image update.
Switched from Alpine to Ubuntu.


6-6: Updated install command.
Using apt to install tools on Ubuntu.


22-34: Thrift dependencies.
Updated package names (e.g., build-essential, libboost-dev).


35-42: Thrift build.
Build commands and cleanup for Thrift look sound.


44-57: Cleanup commands.
Uses apt purge and apt autoremove appropriately.


60-60: Python packages install.
Installs testing/linting tools with the system break flag.


68-70: ARM Bazelisk setup.
Downloads Bazelisk for ARM and sets proper permissions.


71-73: JVM symlink.
Installs build-essential and creates JVM symlink for ARM.

Comment on lines 19 to 28
# Create a spinner with custom text
with console.status("[bold green]Loading...", spinner="dots") as status:

    # Simulate some work
    for i in range(3):
        time.sleep(1)
        console.log(f"{i + 1}. Completed task")

    time.sleep(1)
    console.log(f"Done!")
Contributor

⚠️ Potential issue

Move spinner implementation after console initialization.

The console variable is used before it's imported and initialized. Move this code block after line 46.
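
A minimal sketch of the intended ordering, assuming the console comes from the rich library (which the status/spinner/markup API suggests):

import time
from rich.console import Console

console = Console()  # initialize before any console.status / console.log call

with console.status("[bold green]Loading...", spinner="dots"):
    for i in range(3):
        time.sleep(1)
        console.log(f"{i + 1}. Completed task")
console.log("Done!")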

🧰 Tools
🪛 Ruff (0.8.2)

20-20: Undefined name console

(F821)


25-25: Undefined name console

(F821)


28-28: Undefined name console

(F821)


28-28: f-string without any placeholders

Remove extraneous f prefix

(F541)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
.github/image/Dockerfile (1)

71-73: Redundant Environment Setup
Reinstalling build-essential and re-linking Java may be redundant; double-check that both steps are still needed.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between dc233e9 and 8c8c93c.

📒 Files selected for processing (1)
  • .github/image/Dockerfile (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (6)
.github/image/Dockerfile (6)

1-1: Base Image Update
Using Ubuntu as the base is clear.


6-6: APT Packages Installed
APT command is correct for Ubuntu.


22-34: Thrift Dependencies Setup
APT installs required build deps; verify the package list meets build needs.


35-42: Thrift Build Process
Thrift is built from source with minimal flags. Confirm these options suit requirements.


44-57: Cleanup Commands
APT purge, autoremove, and list removal are efficient. Ensure no runtime libs are removed.


60-60: Python Package Installation
The pip install command with --break-system-packages is acceptable.

Comment on lines 17 to 18
RUN curl -fsSL "https://github.com/bazelbuild/bazelisk/releases/download/v1.18.0/bazelisk-linux-aarch64" -o /usr/local/bin/bazel
ENV PATH="/usr/local/bin/bazel:${PATH}"
Contributor

🛠️ Refactor suggestion

Bazel Installation Conflict
Two installations (v1.18.0 via curl and v1.19.0 via ADD) may conflict. Consolidate to one.

Comment on lines 68 to 70
RUN chmod 755 /usr/local/bin/bazel

RUN apt update && apt install -y build-essential
Contributor

🛠️ Refactor suggestion

Bazelisk Redundancy
Using ADD for Bazelisk v1.19.0 after initial curl download causes redundancy.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/image/Dockerfile (1)

1-1: Pin Ubuntu Version.
Using ubuntu:latest risks unpredictable builds. Pin to a specific version (e.g., ubuntu:22.04).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 8c8c93c and 136dfbc.

📒 Files selected for processing (1)
  • .github/image/Dockerfile (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: build-and-push
🔇 Additional comments (7)
.github/image/Dockerfile (7)

6-6: Clear package installation.
Command is concise; verify all packages are needed.


18-18: Bazel PATH update.
Ensure Bazel binary is correctly prioritized. Already flagged in past reviews.


22-34: Thrift dependencies setup.
Apt install block is clear and complete.


35-42: Thrift build & cleanup.
Tar extraction and removal of temp files are well-handled.


44-57: Post-build cleanup.
Proper use of apt purge, autoremove, and list cleanup.


60-60: Python tools installation.
Pip installs essential testing and linting tools correctly.


68-70: Reinstall build-essential & JAVA symlink.
Reinstalling build-essential after cleanup and creating the JAVA symlink for ARM are sound.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
.github/image/Dockerfile (3)

1-1: Base Image Update.
Consider pinning a specific Ubuntu version instead of latest for reproducibility.


6-6: APT Installation.
APT update/install looks fine; consider cleaning the cache to reduce image size.


18-18: PATH Update.
Multiple ENV PATH modifications—verify the ordering is as intended.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 136dfbc and a4aac43.

📒 Files selected for processing (2)
  • .github/image/Dockerfile (2 hunks)
  • .github/workflows/test_scala_spark.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build-and-push
🔇 Additional comments (6)
.github/workflows/test_scala_spark.yaml (1)

40-40: Bazel test command is concise.
The Bazel command looks correct if the target exists.

.github/image/Dockerfile (5)

22-34: Thrift Dependencies.
APT dependency block is clear; consider non-interactive options if needed.


35-43: Thrift Build & Cleanup.
Build and cleanup commands look solid. Confirm the configure flags meet requirements.


44-57: APT Purge & Cleanup.
Properly removes build dependencies. Reinstalling build-essential later appears intentional.


60-60: Pip Installation.
Using --break-system-packages is acceptable; ensure compatibility with your environment.


68-71: Post-Cleanup Setup.
Reinstalling build-essential and setting up the Java symlink are clear—verify their necessity post-purge.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
.github/workflows/test_scala_spark.yaml (1)

25-25: Invalid runner label.
Change ubuntu-8_cores-32_gb to a supported label (e.g. ubuntu-latest-8-cores).

🧰 Tools
🪛 actionlint (1.7.4)

25-25: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🧹 Nitpick comments (4)
.github/workflows/test_scala_spark.yaml (1)

46-46: Remove trailing space.
Trim extra whitespace at the end of the --google_credentials line.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 46-46: trailing spaces

(trailing-spaces)

.github/image/Dockerfile (2)

59-59: Consider version pinning.
Pin versions for pytest, tox, and flake8 for better reproducibility.


68-69: Verify JAVA symlink.
Ensure the symlink for JAVA_HOME works reliably across environments.

.github/workflows/test_scala_no_spark.yaml (1)

53-53: Remove trailing spaces.
Trim all trailing whitespace in the Bazel command lines.

Also applies to: 60-60, 67-67, 74-74, 81-81, 88-88, 95-95

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 53-53: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between a4aac43 and dbf7cc1.

📒 Files selected for processing (9)
  • .github/image/Dockerfile (3 hunks)
  • .github/workflows/build_and_push_docker.yaml (1 hunks)
  • .github/workflows/test_scala_no_spark.yaml (1 hunks)
  • .github/workflows/test_scala_spark.yaml (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala (1 hunks)
  • tools/build_rules/thrift/thrift.bzl (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala
🚧 Files skipped from review as they are similar to previous changes (2)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
  • tools/build_rules/thrift/thrift.bzl
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/build_and_push_docker.yaml

15-15: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)
.github/workflows/test_scala_spark.yaml

[error] 46-46: trailing spaces

(trailing-spaces)

.github/workflows/test_scala_no_spark.yaml

[error] 53-53: trailing spaces

(trailing-spaces)


[error] 60-60: trailing spaces

(trailing-spaces)


[error] 67-67: trailing spaces

(trailing-spaces)


[error] 74-74: trailing spaces

(trailing-spaces)


[error] 81-81: trailing spaces

(trailing-spaces)


[error] 88-88: trailing spaces

(trailing-spaces)


[error] 95-95: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: build-and-push

jobs:
  build-and-push:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-8_cores-32_gb
Contributor

⚠️ Potential issue

Invalid runner label.
Label ubuntu-8_cores-32_gb is not recognized. Use a valid label (e.g. ubuntu-latest-8-cores).

🧰 Tools
🪛 actionlint (1.7.4)

15-15: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/test_python.yaml (1)

37-38: Remove trailing whitespace.
Trailing whitespace detected on line 38.

-        run: |          
+        run: |
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 38-38: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 36d3a8b and 4979dbc.

📒 Files selected for processing (1)
  • .github/workflows/test_python.yaml (2 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/test_python.yaml

[error] 38-38: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: other_spark_tests
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (2)
.github/workflows/test_python.yaml (2)

43-43: Explicit shell for lint step.
Using shell: bash here provides clarity.


52-52: Explicit shell for tests.
Consistent shell declaration improves readability.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (11)
api/py/ai/chronon/cli/compile/parse_teams.py (2)

73-89: Simplify map merging using dictionary comprehension.

Current implementation can be more concise.

-    result = {}
-    for m in maps:
-        if m is None:
-            continue
-        for key, value in m.items():
-            result[key] = value
-    return result
+    return {k: v for m in maps if m is not None for k, v in m.items()}

91-91: Add return type hint for clarity.

Function return type should be explicit.

-def _merge_environment_variables(*env_vars: List[EnvironmentVariables]):
+def _merge_environment_variables(*env_vars: List[EnvironmentVariables]) -> EnvironmentVariables:
api/py/ai/chronon/cli/compile/compile_context.py (2)

2-2: Remove unused import

- from logging import error
🧰 Tools
🪛 Ruff (0.8.2)

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Rename unused variable
Keep it as _sub_dirs to reflect it's unused.

- for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+ for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
🧰 Tools
🪛 Ruff (0.8.2)

107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/conf_validator.py (4)

20-20: Remove unused import

- import os
🧰 Tools
🪛 Ruff (0.8.2)

20-20: os imported but unused

Remove unused import: os

(F401)


54-57: Inline condition

- if group_by.metaData.online or group_by.backfillStartDate:
-     return True
- else:
-     return False
+ return group_by.metaData.online or group_by.backfillStartDate
🧰 Tools
🪛 Ruff (0.8.2)

54-57: Return the condition directly

Inline condition

(SIM103)


356-359: Use single if

- if wild_card_derivation_included:
-     if derivation.expression in derived_columns:
-         derived_columns.remove(derivation.expression)
+ if wild_card_derivation_included and derivation.expression in derived_columns:
+     derived_columns.remove(derivation.expression)
🧰 Tools
🪛 Ruff (0.8.2)

356-357: Use a single if statement instead of nested if statements

(SIM102)


421-427: Combine nested if

- if join.metaData.online:
-     if offline_included_group_bys:
-         errors.append(...)
+ if join.metaData.online and offline_included_group_bys:
+     errors.append(...)
🧰 Tools
🪛 Ruff (0.8.2)

421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/parse_configs.py (1)

5-5: Remove unused imports

- from typing import Any, Dict, List, Tuple
+ from typing import Any, List
🧰 Tools
🪛 Ruff (0.8.2)

5-5: typing.Dict imported but unused

Remove unused import

(F401)


5-5: typing.Tuple imported but unused

Remove unused import

(F401)

api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (1)

78-78: Document aggregation operation value.

The raw aggregation operation value 3 should either be documented or replaced with a named constant for clarity.
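
A hedged illustration of resolving the name, assuming the enum is the thrift-generated Operation in ai.chronon.api.ttypes (thrift's Python codegen emits a _VALUES_TO_NAMES table on every enum):

from ai.chronon.api.ttypes import Operation

# Look up the symbolic name behind the raw value 3 in the compiled conf.
print(Operation._VALUES_TO_NAMES.get(3, "UNKNOWN"))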

.github/workflows/test_python.yaml (1)

38-38: Remove trailing spaces.

-        run: |          
+        run: |
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 38-38: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4979dbc and 82b73bd.

📒 Files selected for processing (12)
  • .github/image/Dockerfile (3 hunks)
  • .github/workflows/test_python.yaml (1 hunks)
  • api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
  • api/py/ai/chronon/cli/compile/conf_validator.py (1 hunks)
  • api/py/ai/chronon/cli/compile/fill_templates.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_configs.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_teams.py (1 hunks)
  • api/py/ai/chronon/cli/entrypoint.py (1 hunks)
  • api/py/ai/chronon/group_by.py (0 hunks)
  • api/py/ai/chronon/join.py (0 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (1 hunks)
💤 Files with no reviewable changes (2)
  • api/py/ai/chronon/join.py
  • api/py/ai/chronon/group_by.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • api/py/ai/chronon/repo/compile.py
  • api/py/ai/chronon/cli/compile/fill_templates.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/compile/compile_context.py

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/conf_validator.py

20-20: os imported but unused

Remove unused import: os

(F401)


54-57: Return the condition directly

Inline condition

(SIM103)


356-357: Use a single if statement instead of nested if statements

(SIM102)


421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/parse_configs.py

5-5: typing.Dict imported but unused

Remove unused import

(F401)


5-5: typing.Tuple imported but unused

Remove unused import

(F401)

api/py/ai/chronon/cli/entrypoint.py

15-15: Do not use bare except

(E722)

🪛 YAMLlint (1.35.1)
.github/workflows/test_python.yaml

[error] 38-38: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
  • GitHub Check: build-and-push
🔇 Additional comments (7)
api/py/ai/chronon/cli/compile/parse_teams.py (2)

39-71: LGTM!

Good validation chain and proper environment variable merging.


31-34: Validate required Team attributes before adding to dictionary.

Add validation for required Team fields like outputNamespace.

     if isinstance(obj, Team):
+        require(obj.outputNamespace is not None, f"Team {name} missing required outputNamespace")
         obj.name = name
         logger.info(f"Found team {name}")
         team_dict[name] = obj
api/py/ai/chronon/cli/compile/parse_configs.py (1)

53-53: Check list size before splitting
Could raise ValueError if there aren’t enough parts.

 conf_parts = module_name.split(".", 2)
 if len(conf_parts) < 3:
     raise ValueError(f"Invalid module name: {module_name}")
api/py/ai/chronon/cli/entrypoint.py (4)

8-16: Catch specific exception in get_current_branch.

 def get_current_branch():
     try:
         return (
             subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"])
             .decode("utf-8")
             .strip()
         )
-    except:
+    except subprocess.CalledProcessError as e:
+        logger.warning(f"Failed to get branch: {e}")
         return "main"
🧰 Tools
🪛 Ruff (0.8.2)

15-15: Do not use bare except

(E722)


25-31: LGTM! Clean implementation of sync command.


59-64: LGTM! Clean implementation of deploy command.


46-51: 🛠️ Refactor suggestion

Replace scope with explicit run flags.

Per discussion, replace scope with explicit run_upstream and run_downstream flags for clarity.

-@click.option(
-    "--scope",
-    type=click.Choice(["upstream", "self", "downstream"]),
-    default="upstream",
-    help="Scope of configs to backfill",
-)
+@click.option("--run-upstream", is_flag=True, default=True, help="Run upstream configs")
+@click.option("--run-downstream", is_flag=True, default=False, help="Run downstream configs")

Likely invalid or redundant comment.

f"Team config file: {teams_file} not found. Please add one.",
)

team_module = importlib.import_module("teams")
Contributor

⚠️ Potential issue

Use absolute import path for teams module.

Current import assumes teams.py is in PYTHONPATH. Use absolute path to avoid import errors.

-    team_module = importlib.import_module("teams")
+    team_module = importlib.import_module(f"{os.path.basename(conf_root)}.teams")

Comment on lines +111 to +112
if not f.startswith("."): # ignore hidden files - such as .DS_Store
continue
Contributor

⚠️ Potential issue

Fix reversed file-filter condition
Skipping all non-dot files contradicts the comment.

- if not f.startswith("."):  # ignore hidden files
-     continue
+ if f.startswith("."):  # ignore hidden files
+     continue

Comment on lines 67 to 68
RUN apt update && apt install -y build-essential git
RUN mkdir -p /usr/lib/jvm && ln -s /usr/lib/jvm/java-17-openjdk-amd64/ /usr/lib/jvm/default-jvm
Contributor

🛠️ Refactor suggestion

Move Java symlink creation before verification.

Java verification may fail if symlink doesn't exist. Move these lines before the verification step.

@@ -1,15 +1,9 @@
-FROM alpine:latest
+FROM ubuntu:latest

Contributor

🛠️ Refactor suggestion

Pin Ubuntu version for reproducibility.

-FROM ubuntu:latest
+FROM ubuntu:22.04

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
api/py/test/test_utils.py (1)

175-189: Avoid redefining test function.

test_get_related_table_names_for_group_bys is already defined at line 164.

-def test_get_related_table_names_for_group_bys():
+def test_get_related_table_names_for_group_bys_different_name():
🧰 Tools
🪛 Ruff (0.8.2)

175-175: Redefinition of unused test_get_related_table_names_for_group_bys from line 164

(F811)

🧹 Nitpick comments (8)
api/py/ai/chronon/cli/compile/conf_validator.py (4)

20-20: Remove unused import.

os is not used anywhere. Consider removing it.

-import os
🧰 Tools
🪛 Ruff (0.8.2)

20-20: os imported but unused

Remove unused import: os

(F401)


54-57: Inline the condition.

Directly return the boolean expression instead of using if-else.

-def _is_batch_upload_needed(group_by: GroupBy) -> bool:
-    if group_by.metaData.online or group_by.backfillStartDate:
-        return True
-    else:
-        return False
+def _is_batch_upload_needed(group_by: GroupBy) -> bool:
+    return group_by.metaData.online or group_by.backfillStartDate
🧰 Tools
🪛 Ruff (0.8.2)

54-57: Return the condition directly

Inline condition

(SIM103)


356-357: Merge nested if statements.

Combine to simplify and avoid redundancy.

-if wild_card_derivation_included:
-    if derivation.expression in derived_columns:
+if wild_card_derivation_included and derivation.expression in derived_columns:
     derived_columns.remove(derivation.expression)
🧰 Tools
🪛 Ruff (0.8.2)

356-357: Use a single if statement instead of nested if statements

(SIM102)


421-422: Combine nested checks.

Shorten by joining conditions into a single if.

-if join.metaData.online:
-    if offline_included_group_bys:
+if join.metaData.online and offline_included_group_bys:
     errors.append(...)
🧰 Tools
🪛 Ruff (0.8.2)

421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/compile_context.py (2)

2-2: Remove unused import.

logging.error is never used.

-from logging import error
🧰 Tools
🪛 Ruff (0.8.2)

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Rename unused variable.

sub_dirs is not used in the loop. Rename it to _sub_dirs.

-for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
🧰 Tools
🪛 Ruff (0.8.2)

107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/compiler.py (2)

14-14: Remove unused import.

time is not used.

-import time
🧰 Tools
🪛 Ruff (0.8.2)

14-14: time imported but unused

Remove unused import: time

(F401)


36-36: Remove unused variable.

status isn't used inside the context.

-with console.status("[bold green]Loading...", spinner="dots") as status:
+with console.status("[bold green]Loading...", spinner="dots"):
🧰 Tools
🪛 Ruff (0.8.2)

36-36: Local variable status is assigned to but never used

Remove assignment to unused variable status

(F841)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 82b73bd and 19659a1.

📒 Files selected for processing (4)
  • api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
  • api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
  • api/py/ai/chronon/cli/compile/conf_validator.py (1 hunks)
  • api/py/test/test_utils.py (5 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/compile/conf_validator.py

20-20: os imported but unused

Remove unused import: os

(F401)


54-57: Return the condition directly

Inline condition

(SIM103)


356-357: Use a single if statement instead of nested if statements

(SIM102)


421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/compile_context.py

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/compiler.py

14-14: time imported but unused

Remove unused import: time

(F401)


36-36: Local variable status is assigned to but never used

Remove assignment to unused variable status

(F841)

api/py/test/test_utils.py

175-175: Redefinition of unused test_get_related_table_names_for_group_bys from line 164

(F811)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: other_spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
api/py/ai/chronon/cli/compile/compile_context.py (1)

111-112: ⚠️ Potential issue

Fix reversed file-filter condition.

To skip hidden files, invert the check:

-if not f.startswith("."):  # ignore hidden files - such as .DS_Store
+if f.startswith("."):  # ignore hidden files - such as .DS_Store
    continue

Likely invalid or redundant comment.

api/py/ai/chronon/cli/compile/compiler.py (1)

74-75: ⚠️ Potential issue

Fix success ratio calculation.

It's showing X/X instead of X/Total.

-    f"{len(compile_result.obj_dict)}/{len(compile_result.obj_dict)}"
+    f"{len(compile_result.obj_dict)}/{len(compiled_objects)}"

Likely invalid or redundant comment.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
api/py/test/test_utils.py (2)

171-174: Remove debug print and add error handling.

 def dopen(path):
     full_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), path)
-    print(full_path)
-    return open(full_path)
+    try:
+        return open(full_path)
+    except FileNotFoundError as e:
+        raise FileNotFoundError(f"Could not open {full_path}: {e}")

270-270: Remove debug print statement.

-    print("Current working directory:", os.getcwd())
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 19659a1 and 6a6921e.

📒 Files selected for processing (4)
  • api/py/test/test_utils.py (10 hunks)
  • api/py/test/test_validator.py (3 hunks)
  • api/py/tox.ini (1 hunks)
  • tools/build_rules/thrift/thrift.bzl (0 hunks)
💤 Files with no reviewable changes (1)
  • tools/build_rules/thrift/thrift.bzl
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/test/test_utils.py

189-189: Redefinition of unused test_get_related_table_names_for_group_bys from line 177

(F811)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: other_spark_tests
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (7)
api/py/tox.ini (2)

8-8: Update allowlist_externals: Added mkdir and cp to support new commands.


13-20: New compile commands: Two compile.py calls with distinct --input_path values and subsequent directory creation and copy. Ensure the execution order and output paths align with expectations.

api/py/test/test_validator.py (4)

26-30: LGTM: Improved path handling

Dynamic path generation makes tests more portable.


36-39: LGTM: Enhanced readability

Clean formatting of list comprehensions.

Also applies to: 44-51, 56-64


74-76: LGTM: Better function signature formatting

Improved readability by splitting long lines.

Also applies to: 83-85, 93-95, 102-104


138-140: LGTM: Clean import formatting

Better readability with properly split import statements.

Also applies to: 161-163

api/py/test/test_utils.py (1)

99-337: Test coverage looks comprehensive!

Good job on covering various scenarios for group_bys, joins, and table name utilities.

🧰 Tools
🪛 Ruff (0.8.2)

189-189: Redefinition of unused test_get_related_table_names_for_group_bys from line 177

(F811)

Comment on lines 177 to 188
def test_get_related_table_names_for_group_bys(
-    group_by_requiring_backfill,
-    online_group_by_requiring_streaming
+    group_by_requiring_backfill, online_group_by_requiring_streaming
):
-    with open('test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1') as conf_file:
+    with dopen(
+        "sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1"
+    ) as conf_file:
        json = conf_file.read()
        group_by = json2thrift(json, api.GroupBy)
        tables = utils.get_related_table_names(group_by)
        assert any(table.endswith("_upload") for table in tables)


Contributor

⚠️ Potential issue

Fix duplicate test function name.

Two functions with the same name will cause the first one to be ignored.

-def test_get_related_table_names_for_group_bys(
+def test_get_related_table_names_for_group_bys_with_streaming(
     group_by_requiring_backfill, online_group_by_requiring_streaming
 ):

Also applies to: 189-205
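
For context on the F811 behavior flagged here: in Python, the second def simply rebinds the module-level name, so pytest never sees the first function. A minimal sketch (illustrative names, not from the PR):

def test_example():
    assert 1 + 1 == 2

def test_example():  # rebinds the name; the first test is never collected
    assert 2 + 2 == 4

Renaming one of the functions, as suggested above, makes both tests visible to the collector.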

nikhil-zlai and others added 13 commits February 4, 2025 10:56
[Ticket](https://app.asana.com/0/0/1209184028451140)

This PR includes both frontend and backend updates. Since @nikhil-zlai
and I reviewed the high-level changes today, it might be helpful for
@sean-zlai and me to go over them during our sync tomorrow. There are
some key points I'd like to highlight, but I wanted to get this up early
for review.

**Reminder:** @sean-zlai will handle merging the new drift/summary
functionality into the observability frontend.

- Added new endpoints for Conf, Confs, search, drift, summary, etc.,
currently called on the `/thrift` route for demonstration. The old
endpoints can be removed once we wire in the new ones.
- Created `LogicalNodeTable.svelte` to display all `ConfTypes`,
including empty states.
- Updated search functionality to work for all `ConfTypes`, though only
`risk.user_transactions.txn_join` is enabled for now.

- Implemented new endpoints using `RouteHandlerWrapper`.
- Removed old endpoints (except `/timeseries`, which is still needed for
charts).
- Separated drift and summary into two distinct endpoints.
- Introduced:
  - `DriftHandler` (handles drift and summaries).
  - `ConfHandler` (handles Conf, Confs, and Conf search).
- Added filtering by column to `DriftStore`.
- Improved and added various tests.
- Developed a new serializer that integrates well with
`RouteHandlerWrapper`.

Let me know if you have any questions!

- [x] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

- **New Features**
- Added configuration management capabilities across joins, models,
group bys, and staging queries.
	- Introduced drift analysis and summary tracking for configurations.
	- Enhanced API endpoints for retrieving and searching configurations.
	- New `LogicalNodeTable` component for displaying configuration data.
- Added new structures and enumerations for handling configuration
requests in the API.

- **Improvements**
	- Refactored API response handling to use JSON serialization.
- Improved null value handling in data processing with predefined
constants.
	- Updated type definitions for better type safety.
	- Streamlined configuration retrieval and search mechanisms.
	- Enhanced handling of search results by categorizing entity types.

- **Bug Fixes**
- Resolved issues with null value representation in long and double
iterators.
	- Improved error handling in API request processing.

- **Breaking Changes**
	- Removed previous model and join metadata endpoints.
	- Updated API method signatures for configuration retrieval.
	- Eliminated deprecated response types related to joins and models.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (4)
api/py/ai/chronon/cli/compile/compiler.py (1)

74-76: ⚠️ Potential issue

Fix success ratio calculation.

The success ratio uses the same value for numerator and denominator.

Apply this diff to fix the calculation:

-                f"{len(compile_result.obj_dict)}/{len(compile_result.obj_dict)}"
+                f"{len(compile_result.obj_dict)}/{len(compiled_objects)}"
api/py/ai/chronon/cli/git_utils.py (2)

33-34: 🛠️ Refactor suggestion

Use specific exception type.

The bare except clause could hide important errors.

Apply this diff to use a more specific exception type:

-        except:
+        except OSError:
            pass
🧰 Tools
🪛 Ruff (0.8.2)

33-33: Do not use bare except

(E722)


73-74: 🛠️ Refactor suggestion

Use specific exception type.

The bare except clause could hide important errors.

Apply this diff to use a more specific exception type:

-    except:
+    except OSError:
        return None
🧰 Tools
🪛 Ruff (0.8.2)

73-73: Do not use bare except

(E722)

.github/image/Dockerfile (1)

1-1: 🛠️ Refactor suggestion

Pin Base Image Version:
Use a fixed tag (e.g., ubuntu:22.04) instead of latest for reproducibility.

🧹 Nitpick comments (14)
spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1)

41-68: Consider removing commented code.

Since the functionality is moved to interactive.py, consider removing the commented test class instead of keeping it as reference.

api/py/ai/chronon/cli/compile/compile_context.py (2)

2-2: Remove unused import.

The error import is never used.

-from logging import error
🧰 Tools
🪛 Ruff (0.8.2)

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Rename unused variable.

sub_dirs is unused. Use _sub_dirs to clarify intent.

-for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
🧰 Tools
🪛 Ruff (0.8.2)

107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/parse_configs.py (1)

5-5: Remove unused imports.

Dict and Tuple are unused.

-from typing import Any, Dict, List, Tuple
+from typing import Any, List
🧰 Tools
🪛 Ruff (0.8.2)

5-5: typing.Dict imported but unused

Remove unused import

(F401)


5-5: typing.Tuple imported but unused

Remove unused import

(F401)

api/py/ai/chronon/cli/entrypoint.py (1)

46-51: Consider more explicit scope options.

Replace scope with run_upstream and run_downstream flags for clarity.

api/py/test/test_git_utils.py (1)

5-5: Remove unused import.

-from pathlib import Path
🧰 Tools
🪛 Ruff (0.8.2)

5-5: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

api/py/ai/chronon/cli/compile/compiler.py (1)

14-14: Remove unused import.

The time module is imported but not used.

Apply this diff to remove the unused import:

-import time
🧰 Tools
🪛 Ruff (0.8.2)

14-14: time imported but unused

Remove unused import: time

(F401)

api/py/ai/chronon/cli/compile/serializer.py (1)

78-81: Combine if branches for integer types.

The branches for I64, I32, I16, and BYTE can be combined using logical OR operator.

Apply this diff to combine the branches:

-        elif ttype == TType.I64:
-            ret = int(val)
-        elif ttype == TType.I32 or ttype == TType.I16 or ttype == TType.BYTE:
-            ret = int(val)
+        elif ttype in (TType.I64, TType.I32, TType.I16, TType.BYTE):
+            ret = int(val)
🧰 Tools
🪛 Ruff (0.8.2)

78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)

api/thrift/api.thrift (1)

237-238: Consider removing the deprecated field.

Since modeToEnvMap is deprecated and env is its replacement, consider removing the old field.

Also applies to: 254-254
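
If the deprecated field has to stay through a migration window, a small hedged helper could translate it (this assumes modeToEnvMap maps a mode name to a dict of env vars, and uses the common/backfill/upload fields of EnvironmentVariables shown in the sample teams.py; the helper name is illustrative):

from ai.chronon.api.ttypes import EnvironmentVariables

def env_from_mode_map(mode_to_env: dict) -> EnvironmentVariables:
    # Best-effort translation of the deprecated modeToEnvMap into the new env field.
    return EnvironmentVariables(
        common=mode_to_env.get("common", {}),
        backfill=mode_to_env.get("backfill", {}),
        upload=mode_to_env.get("upload", {}),
    )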

api/py/ai/chronon/cli/compile/conf_validator.py (3)

18-20: Remove unused import.

The os module is imported but never used.

-import os
🧰 Tools
🪛 Ruff (0.8.2)

20-20: os imported but unused

Remove unused import: os

(F401)


53-58: Simplify boolean return.

Return the condition directly.

-def _is_batch_upload_needed(group_by: GroupBy) -> bool:
-    if group_by.metaData.online or group_by.backfillStartDate:
-        return True
-    else:
-        return False
+def _is_batch_upload_needed(group_by: GroupBy) -> bool:
+    return group_by.metaData.online or group_by.backfillStartDate
🧰 Tools
🪛 Ruff (0.8.2)

54-57: Return the condition directly

Inline condition

(SIM103)


356-358: Simplify nested if statements.

Combine nested if statements into a single condition.

-            if wild_card_derivation_included:
-                if derivation.expression in derived_columns:
+            if wild_card_derivation_included and derivation.expression in derived_columns:
-        if join.metaData.online:
-            if offline_included_group_bys:
+        if join.metaData.online and offline_included_group_bys:

Also applies to: 421-423

🧰 Tools
🪛 Ruff (0.8.2)

356-357: Use a single if statement instead of nested if statements

(SIM102)

.github/workflows/test_python.yaml (1)

37-38: Remove trailing spaces.

Line 38 contains trailing spaces.

-        run: |          
+        run: |
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 38-38: trailing spaces

(trailing-spaces)

.github/image/Dockerfile (1)

59-59: Pip Package Management:
Consider pinning package versions to ensure stability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4dbe3ed and 3e0c5f0.

📒 Files selected for processing (36)
  • .bazelrc (1 hunks)
  • .github/image/Dockerfile (3 hunks)
  • .github/workflows/test_python.yaml (1 hunks)
  • .github/workflows/test_scala_no_spark.yaml (1 hunks)
  • .github/workflows/test_scala_spark.yaml (1 hunks)
  • api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
  • api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
  • api/py/ai/chronon/cli/compile/conf_validator.py (1 hunks)
  • api/py/ai/chronon/cli/compile/fill_templates.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_configs.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_teams.py (1 hunks)
  • api/py/ai/chronon/cli/compile/serializer.py (1 hunks)
  • api/py/ai/chronon/cli/entrypoint.py (1 hunks)
  • api/py/ai/chronon/cli/git_utils.py (1 hunks)
  • api/py/ai/chronon/cli/logger.py (1 hunks)
  • api/py/ai/chronon/cli/logging.py (1 hunks)
  • api/py/ai/chronon/group_by.py (0 hunks)
  • api/py/ai/chronon/join.py (0 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (1 hunks)
  • api/py/test/sample/teams.py (1 hunks)
  • api/py/test/test_git_utils.py (1 hunks)
  • api/py/test/test_utils.py (10 hunks)
  • api/py/test/test_validator.py (3 hunks)
  • api/py/tox.ini (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (2 hunks)
  • api/thrift/api.thrift (4 hunks)
  • api/thrift/common.thrift (1 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala (8 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala (9 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (12 hunks)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala (1 hunks)
  • tools/build_rules/thrift/thrift.bzl (0 hunks)
💤 Files with no reviewable changes (3)
  • tools/build_rules/thrift/thrift.bzl
  • api/py/ai/chronon/join.py
  • api/py/ai/chronon/group_by.py
🚧 Files skipped from review as they are similar to previous changes (16)
  • spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala
  • api/thrift/common.thrift
  • hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala
  • .bazelrc
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala
  • .github/workflows/test_scala_no_spark.yaml
  • api/py/tox.ini
  • spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala
  • hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala
  • api/py/test/test_utils.py
  • api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
  • api/py/ai/chronon/cli/logging.py
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala
  • api/py/ai/chronon/cli/compile/fill_templates.py
  • api/py/test/test_validator.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/test/sample/teams.py

8-8: Undefined name EnvironmentVariables

(F821)


45-45: Undefined name EnvironmentVariables

(F821)


66-66: Undefined name EnvironmentVariables

(F821)

api/py/ai/chronon/cli/compile/conf_validator.py

20-20: os imported but unused

Remove unused import: os

(F401)


54-57: Return the condition directly

Inline condition

(SIM103)


356-357: Use a single if statement instead of nested if statements

(SIM102)


421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/compile_context.py

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/compiler.py

14-14: time imported but unused

Remove unused import: time

(F401)


36-36: Local variable status is assigned to but never used

Remove assignment to unused variable status

(F841)

api/py/ai/chronon/cli/compile/parse_configs.py

5-5: typing.Dict imported but unused

Remove unused import

(F401)


5-5: typing.Tuple imported but unused

Remove unused import

(F401)

api/py/ai/chronon/cli/compile/serializer.py

19-19: thrift.protocol.TJSONProtocol.TJSONProtocolFactory imported but unused

Remove unused import: thrift.protocol.TJSONProtocol.TJSONProtocolFactory

(F401)


78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)

api/py/ai/chronon/cli/entrypoint.py

15-15: Do not use bare except

(E722)

api/py/ai/chronon/cli/git_utils.py

4-4: typing.Set imported but unused

Remove unused import: typing.Set

(F401)


33-33: Do not use bare except

(E722)


73-73: Do not use bare except

(E722)

api/py/test/test_git_utils.py

5-5: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

🪛 YAMLlint (1.35.1)
.github/workflows/test_python.yaml

[error] 38-38: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: other_spark_tests
🔇 Additional comments (22)
spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1)

26-37: Verify if LocalDataLoaderTest object is still needed.

The object contains cleanup code but might be unused now that tests are moved.

✅ Verification successful

LocalDataLoaderTest appears to be isolated to its own test file.

  • Only self-references exist (including a commented-out test class).
  • No other Scala files reference it.

It seems safe to remove it if its cleanup code is no longer required.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for references to LocalDataLoaderTest object
rg "LocalDataLoaderTest\." --type scala

Length of output: 41


Script:

#!/bin/bash
# Search all Scala files for any mention of LocalDataLoaderTest.
rg "LocalDataLoaderTest" -g "*.scala"

Length of output: 437

api/py/ai/chronon/cli/compile/compile_context.py (1)

111-112: Fix reversed condition.

This logic skips non-hidden files and processes hidden ones, contradicting the comment.

-if not f.startswith("."):  # ignore hidden files - such as .DS_Store
-    continue
+if f.startswith("."):
+    continue
api/py/ai/chronon/cli/logger.py (1)

58-58: Specify exception type.

Raise statement should specify an exception type as previously noted.

-raise
+raise ValueError(message)
api/py/ai/chronon/cli/entrypoint.py (1)

8-16: ⚠️ Potential issue

Specify exception type.

 def get_current_branch():
     try:
         return (
             subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"])
             .decode("utf-8")
             .strip()
         )
-    except:
+    except subprocess.CalledProcessError as e:
+        logger.warning(f"Failed to get branch: {e}")
         return "main"

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

15-15: Do not use bare except

(E722)

api/py/test/sample/teams.py (1)

1-1: ⚠️ Potential issue

Add missing import.

-from ai.chronon.api.ttypes import Team
+from ai.chronon.api.ttypes import Team, EnvironmentVariables

Likely invalid or redundant comment.

api/py/ai/chronon/cli/compile/parse_teams.py (1)

21-21: ⚠️ Potential issue

Use absolute import path.

-    team_module = importlib.import_module("teams")
+    team_module = importlib.import_module(f"{os.path.basename(conf_root)}.teams")

Likely invalid or redundant comment.

api/py/ai/chronon/repo/compile.py (1)

241-275: LGTM!

The enhanced validation and error handling logic is well-implemented with proper error messages and warnings.

api/thrift/api.thrift (2)

441-446: LGTM: Well-structured environment variables.

The EnvironmentVariables struct provides a clear organization of environment variables across different contexts.


448-457: LGTM: Team metadata structure.

The Team struct effectively encapsulates team-related information.

api/py/ai/chronon/cli/compile/conf_validator.py (3)

334-378: LGTM: Well-structured derivation validation.

The derivation validation logic is comprehensive and handles all edge cases.

🧰 Tools
🪛 Ruff (0.8.2)

356-357: Use a single if statement instead of nested if statements

(SIM102)


379-440: LGTM: Thorough join validation.

The join validation covers all necessary checks including production status and online/offline compatibility.

🧰 Tools
🪛 Ruff (0.8.2)

421-422: Use a single if statement instead of nested if statements

(SIM102)
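
A toy sketch of the online/offline rule being validated here, using the condition from the diff above (function and variable names are assumed for illustration):

def validate_join_online_offline(join, offline_included_group_bys) -> list:
    errors = []
    if join.metaData.online and offline_included_group_bys:
        errors.append(
            "online join includes offline group_bys: "
            + ", ".join(offline_included_group_bys)
        )
    return errors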


441-515: LGTM: Comprehensive group by validation.

The group by validation effectively handles all validation scenarios.

.github/workflows/test_scala_spark.yaml (2)

38-41: LGTM: Secure credential handling.

Properly decodes and stores Bazel cache credentials.


42-47: LGTM: Efficient test configuration.

Uses Bazel's remote caching for improved test performance.

.github/workflows/test_python.yaml (2)

49-49: LGTM: Updated linting configuration.

Appropriate exclusion of cli directory from flake8 checks.


37-37: LGTM: Explicit shell specification.

Clear specification of bash shell for command execution.

Also applies to: 43-43, 52-52

.github/image/Dockerfile (6)

6-6: Dependencies Check:
Ensure python3.12-venv and openjdk-17-jdk are required.


17-19: Bazel Setup:
Bazelisk binary is set correctly; verify that version v1.18.0 meets current needs.


23-56: Thrift Installation & Cleanup:
The build, install, and purge flow is solid. Confirm purged packages aren’t needed later.


61-61: Tool Installation:
Installing build-essential and git is clear and straightforward.


62-62: Java Symlink:
Symlink creation is concise; confirm that the target directory exists.


65-71: Installation Verification:
All key tools (Java, Thrift, Python, Pip, Bazel, Git) are verified.

Comment on lines +39 to 40
// Not needed since we are doing this via interactive.py

Contributor

💡 Codebase verification

Interactive.py misses equivalent test coverage.

  • No test functions were found in api/py/ai/chronon/repo/interactive.py.
  • No references to LocalDataLoader appear there.

It looks like the transition from Scala tests to Python testing isn’t fully complete. Please add tests in interactive.py or elsewhere to cover this functionality.

🔗 Analysis chain

Verify test coverage in interactive.py.

Please confirm that equivalent test coverage exists in interactive.py.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for corresponding test coverage in interactive.py
rg -l "test_table_1_data" | grep "interactive.py"

Length of output: 47


Script:

#!/bin/bash
# List any interactive.py files in the repository.
echo "=== Listing interactive.py files ==="
fd interactive.py

# For each interactive.py file found, display any test functions and references to LocalDataLoader.
echo "=== Checking interactive.py for test coverage and LocalDataLoader references ==="
for file in $(fd interactive.py); do 
  echo "File: $file"
  echo "-- Test functions --"
  grep -H "def test_" "$file" || echo "No test functions found in $file"
  echo "-- LocalDataLoader references --"
  grep -H "LocalDataLoader" "$file" || echo "No LocalDataLoader references found in $file"
done

Length of output: 1016

Comment on lines +64 to +82
sample_team = Team(
    namespace="test",
    env=EnvironmentVariables(
        common={
            "GCP_BIGTABLE_INSTANCE_ID": "test-instance"  # example, custom bigtable instance
        },
        backfill={  # smaller jobs for testing
            "EXECUTOR_CORES": "2",
            "DRIVER_MEMORY": "15G",
            "EXECUTOR_MEMORY": "4G",
            "PARALLELISM": "4",
            "MAX_EXECUTORS": "4",
        },
        upload={
            "PARALLELISM": "2",
            "MAX_EXECUTORS": "4",
        },
    ),
)
Contributor

🛠️ Refactor suggestion

Remove duplicate team configuration.

sample_team is identical to test team. Remove the duplicate.

🧰 Tools
🪛 Ruff (0.8.2)

66-66: Undefined name EnvironmentVariables

(F821)

Comment on lines +43 to +44
metadata = obj.metaData

Contributor

🛠️ Refactor suggestion

Standardize metadata access.

-    metadata = obj.metaData
-
-    require(obj.metaData is not None, "Cannot update empty metadata")
+    require(obj.metaData is not None, "Cannot update empty metadata")
+    metadata = obj.metaData
📝 Committable suggestion


Suggested change
-    metadata = obj.metaData
+    require(obj.metaData is not None, "Cannot update empty metadata")
+    metadata = obj.metaData

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/test_python.yaml (1)

41-43: Remove trailing whitespace.
Extra whitespace in the run block header (line 43) should be removed.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 43-43: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 3e0c5f0 and 82704f3.

📒 Files selected for processing (1)
  • .github/workflows/test_python.yaml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/test_python.yaml

[error] 43-43: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
.github/workflows/test_python.yaml (3)

36-40: Git config step approved.
Git user configuration enhances commit traceability.


47-55: Lint step updated.
Using bash and excluding the 'cli' directory is intentional and consistent.


57-67: Tests step updated.
The bash shell setting ensures a consistent test environment.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
project/ThriftGen.scala (1)

32-37: Use UTF-8 for writing and add proper resource management.

-    val writer = new java.io.PrintWriter(file)
-    try {
-      writer.write(newContent)
-    } finally {
-      writer.close()
-    }
+    val writer = new java.io.PrintWriter(file, "UTF-8")
+    try {
+      writer.write(newContent)
+    } finally {
+      writer.close()
+    }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between b3a30f0 and 845949f.

📒 Files selected for processing (2)
  • .github/workflows/test_scala_fmt.yaml (1 hunks)
  • project/ThriftGen.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/test_scala_fmt.yaml
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: other_spark_tests

Comment on lines +28 to 30
val source = scala.io.Source.fromFile(file, "UTF-8")
val content = source.mkString
source.close()
Contributor

🛠️ Refactor suggestion

Good addition of UTF-8 encoding, but needs consistent application.

While specifying UTF-8 for reading is good, the writer should also use UTF-8.

-    val source = scala.io.Source.fromFile(file, "UTF-8")
-    val content = source.mkString
-    source.close()
+    val source = scala.io.Source.fromFile(file, "UTF-8")
+    val content =
+      try source.mkString
+      finally source.close()

Committable suggestion skipped: line range outside the PR's diff.

@kumar-zlai kumar-zlai merged commit 38f36f8 into main Feb 4, 2025
8 checks passed
@kumar-zlai kumar-zlai deleted the cli_v2 branch February 4, 2025 20:48
@coderabbitai coderabbitai bot mentioned this pull request Mar 3, 2025
4 tasks
kumar-zlai added a commit that referenced this pull request Apr 25, 2025
## Summary

Newer version with better ergonomics, and more maintainable codebase -
smaller files, simpler logic - with little indirection etc with user
facing simplification

- [x] always compile the whole repo
- [x] teams thrift and teams python
- [x] compile context as its own object
- [ ] integration test on sample
- [ ] progress bar for compile
- [ ] sync to remote w/ progress bar
- [ ] display workflow
- [ ] display workflow progress

## Checklist
- [x] Added Unit Tests
- [ ] Covered by existing CI
- [x] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New CLI Commands**
- Introduced streamlined commands for synchronizing, backfilling, and
deploying operations for easier management.

- **Enhanced Logging**
- Improved colored, structured log outputs for clearer real-time
monitoring and debugging.

- **Configuration & Validation Upgrades**
- Strengthened configuration management and validation processes to
ensure reliable operations.
- Added a comprehensive validation framework for Chronon API thrift
objects.

- **Build & Infrastructure Improvements**
- Transitioned to a new container base and modernized testing/build
systems for better performance and stability.

- **Team & Utility Enhancements**
- Expanded team configuration options and refined utility processes to
streamline overall workflows.
- Introduced new data classes and methods for improved configuration and
compilation context management.
- Enhanced templating functionality for dynamic replacements based on
object properties.
- Improved handling of Git operations and enhanced error logging for
better traceability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Ken Morton <[email protected]>
Co-authored-by: Kumar Teja Chippala <[email protected]>
Co-authored-by: Kumar Teja Chippala <[email protected]>
kumar-zlai added a commit that referenced this pull request Apr 29, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
@coderabbitai coderabbitai bot mentioned this pull request May 15, 2025
4 tasks
chewy-zlai pushed a commit that referenced this pull request May 16, 2025
@coderabbitai coderabbitai bot mentioned this pull request Jun 13, 2025
4 tasks