
Conversation

nikhil-zlai (Contributor) commented Jan 27, 2025

Summary

A newer version with better ergonomics and a more maintainable codebase - smaller files, simpler logic, less indirection - along with user-facing simplifications:

  • always compile the whole repo
  • teams thrift and teams python
  • compile context as its own object
  • integration test on sample
  • progress bar for compile
  • sync to remote w/ progress bar
  • display workflow
  • display workflow progress

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New CLI Commands

    • Introduced streamlined commands for synchronizing, backfilling, and deploying operations for easier management.
  • Enhanced Logging

    • Improved colored, structured log outputs for clearer real-time monitoring and debugging.
  • Configuration & Validation Upgrades

    • Strengthened configuration management and validation processes to ensure reliable operations.
    • Added a comprehensive validation framework for Chronon API thrift objects.
  • Build & Infrastructure Improvements

    • Transitioned to a new container base and modernized testing/build systems for better performance and stability.
  • Team & Utility Enhancements

    • Expanded team configuration options and refined utility processes to streamline overall workflows.
    • Introduced new data classes and methods for improved configuration and compilation context management.
    • Enhanced templating functionality for dynamic replacements based on object properties.
    • Improved handling of Git operations and enhanced error logging for better traceability.

coderabbitai bot (Contributor) commented Jan 27, 2025

Walkthrough

This update enhances the system’s configuration and compilation pipeline. New data classes and methods have been added for managing compile contexts, templating, validation, and serialization. The CLI now features commands for sync, backfill, and deploy, while Git utilities have been expanded for branch and change management. Additional improvements include colored logging, upgraded Thrift structures and enums, updated Docker and build configurations, and various test and formatting adjustments.
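
As a rough illustration of the CLI surface described above, a click-based entry point could be shaped like the sketch below. This is a sketch only: the sync/backfill/deploy command names come from this PR, but the group name, options, and command bodies are assumptions.

import click


@click.group()
def cli():
    """Entry point grouping the sync, backfill, and deploy commands."""


@cli.command()
def sync():
    # Compile the whole repo and push artifacts to the remote, with a progress bar.
    click.echo("syncing compiled confs to remote...")


@cli.command()
@click.argument("conf")
@click.option("--start-date")
@click.option("--end-date")
def backfill(conf, start_date, end_date):
    # Run an offline backfill for the given conf over the date range.
    click.echo(f"backfilling {conf} from {start_date} to {end_date}")


@cli.command()
@click.argument("conf")
def deploy(conf):
    # Deploy the compiled conf to the serving environment.
    click.echo(f"deploying {conf}")


if __name__ == "__main__":
    cli()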

Changes

File(s) Change Summary
api/py/ai/chronon/cli/compile/compile_context.py, compiler.py, fill_templates.py, parse_configs.py, parse_teams.py, serializer.py, conf_validator.py Added new data classes, methods, and functions to manage configuration details, compilation context, templating, parsing, serialization, and validation logic.
api/py/ai/chronon/cli/entrypoint.py, git_utils.py Introduced CLI commands (sync, backfill, deploy) and Git utility functions for branch retrieval and change detection.
api/py/ai/chronon/group_by.py, join.py Removed method attachments (backfill, deploy, info/upload) from the group_by and join objects.
api/py/ai/chronon/cli/logger.py, logging.py Added color-coded logging classes and utility functions for enhanced, visually distinct log output.
api/thrift/api.thrift, api/thrift/common.thrift Added new Thrift structures and enum (ConfigType), and updated MetaData with an environment variables field.
spark/.../SparkSessionBuilder.scala, various Spark and Hub test files Updated log formatting in Spark builder; commented out or reformatted tests to improve clarity and align with Bazel/SBT transitions.
.github/image/Dockerfile, .github/workflows/*, .bazelrc, tools/build_rules/thrift/thrift.bzl Transitioned Docker base from Alpine to Ubuntu, switched build/test commands from SBT to Bazel, and adjusted build rule arguments and workflow configurations.
api/py/test/sample/teams.py, api/py/test/sample/production/joins/... Added new team configurations and sample JSON configuration for join operations.
api/py/test/test_git_utils.py, api/py/test/test_utils.py, api/py/test/test_validator.py Introduced new test utilities and fixtures while reformatting existing tests for better readability and consistent file-path handling.

Sequence Diagram(s)

sequenceDiagram
    participant U as User
    participant E as CLI Entrypoint
    participant G as GitUtils
    participant C as CompileContext
    participant P as Compiler
    participant L as Logger

    U->>E: Run CLI command (sync/backfill/deploy)
    E->>G: Get current branch
    G-->>E: Return branch name
    E->>C: Initialize compile context
    E->>P: Call compiler.main(...)
    P-->>E: Return compile results
    E->>L: Log results
    E-->>U: Output final result


Suggested reviewers

  • varant-zlai
  • piyush-zlai

Poem

In code we trust with vibrant hue,
Configs and logs now shine anew.
Commands and contexts take the stage,
Compiling dreams on a digital page.
Git and Thrift join this merry tune,
A merge of progress under a bright full moon.
Cheers to change in every line!


Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 845949f and 3858d3d.

📒 Files selected for processing (1)
  • .github/workflows/test_scala_fmt.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/test_scala_fmt.yaml
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 17

🧹 Nitpick comments (15)
api/py/ai/chronon/cli/git_utils.py (1)

1-4: Remove unused import.
Set is never used.

-from typing import List, Optional, Set
+from typing import List, Optional
🧰 Tools
🪛 Ruff (0.8.2)

4-4: typing.Set imported but unused

Remove unused import: typing.Set

(F401)

api/py/ai/chronon/cli/logging.py (1)

17-30: Add error handling for timestamp creation.

 def format(self, record):
-    time_str = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
+    try:
+        time_str = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
+    except (ValueError, TypeError, OSError) as e:
+        time_str = "??:??:??"
+        logger = logging.getLogger(__name__)
+        logger.error(f"Failed to format timestamp: {e}")
api/py/ai/chronon/cli/compile/compiler.py (1)

48-66: Add progress bar for better UX.

+        with tqdm(total=len(obj_dict), desc=f"Processing {config_info.cls}") as pbar:
             for name, obj in obj_dict.items():
                 errors = _write_config_or_return_errors(
                     name, obj, config_info, compile_context
                 )
                 if errors:
                     compile_result.error_dict[name] = errors
                 else:
                     compile_result.obj_dict[name] = obj
+                pbar.update(1)
api/py/ai/chronon/cli/compile/parse_teams.py (1)

11-28: Add error handling for module import failures.

Consider wrapping the module import in a try-except block to provide more informative error messages.

🧰 Tools
🪛 Ruff (0.8.2)

17-17: Undefined name importlib

(F821)

api/py/test/test_git_utils.py (1)

1-11: Remove unused import.

 import pytest
 import os
 import shutil
 import subprocess
-from pathlib import Path
 from ai.chronon.cli.git_utils import (
     get_current_branch,
     get_changes_since_commit,
     get_changes_since_fork,
 )
🧰 Tools
🪛 Ruff (0.8.2)

5-5: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

api/py/ai/chronon/cli/compile/serializer.py (5)

41-86: Consider adding type hints and docstrings.

The _convert method handles type conversion comprehensively but lacks documentation.

Add type hints and docstrings:

-    def _convert(self, val, ttype, ttype_info):
+    def _convert(self, val: Any, ttype: int, ttype_info: Tuple) -> Any:
+        """Convert JSON value to Thrift type.
+        
+        Args:
+            val: Value to convert
+            ttype: Thrift type constant
+            ttype_info: Type information tuple
+            
+        Returns:
+            Converted value
+            
+        Raises:
+            TypeError: If type is not recognized
+        """
🧰 Tools
🪛 Ruff (0.8.2)

78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)


93-100: Add error handling for Thrift serialization.

The function should catch potential Thrift serialization errors.

 def json2binary(json_str, thrift_class):
-    thrift = json2thrift(json_str, thrift_class)
-    transport = TMemoryBuffer()
-    protocol = TBinaryProtocolAccelerated(transport)
-    thrift.write(protocol)
-    return transport.getvalue()
+    try:
+        thrift = json2thrift(json_str, thrift_class)
+        transport = TMemoryBuffer()
+        protocol = TBinaryProtocolAccelerated(transport)
+        thrift.write(protocol)
+        return transport.getvalue()
+    except Exception as e:
+        raise Exception(f"Failed to serialize {thrift_class.__name__} to binary: {str(e)}") from e

113-118: Add error handling for JSON serialization.

The function should catch potential serialization errors.

 def thrift_simple_json(obj):
-    simple = TSerialization.serialize(obj, protocol_factory=TSimpleJSONProtocolFactory())
-    parsed = json.loads(simple)
-    return json.dumps(parsed, indent=2)
+    try:
+        simple = TSerialization.serialize(obj, protocol_factory=TSimpleJSONProtocolFactory())
+        parsed = json.loads(simple)
+        return json.dumps(parsed, indent=2)
+    except Exception as e:
+        raise Exception(f"Failed to serialize {type(obj).__name__} to JSON: {str(e)}") from e

17-20: Remove unused import.

TJSONProtocolFactory is not used in the code.

 from thrift.protocol.TJSONProtocol import (
     TSimpleJSONProtocolFactory,
-    TJSONProtocolFactory,
 )
🧰 Tools
🪛 Ruff (0.8.2)

19-19: thrift.protocol.TJSONProtocol.TJSONProtocolFactory imported but unused

Remove unused import: thrift.protocol.TJSONProtocol.TJSONProtocolFactory

(F401)


78-82: Combine if branches for integer types.

Simplify the code by combining the integer type checks.

-        elif ttype == TType.I64:
-            ret = int(val)
-        elif ttype == TType.I32 or ttype == TType.I16 or ttype == TType.BYTE:
-            ret = int(val)
+        elif ttype in (TType.I64, TType.I32, TType.I16, TType.BYTE):
+            ret = int(val)
🧰 Tools
🪛 Ruff (0.8.2)

78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)

api/py/ai/chronon/cli/compile/validator.py (5)

35-36: Remove unused imports.

-from ai.chronon.repo import JOIN_FOLDER_NAME, GROUP_BY_FOLDER_NAME
-from ai.chronon.repo.serializer import thrift_simple_json, file2thrift
+from ai.chronon.repo.serializer import thrift_simple_json
🧰 Tools
🪛 Ruff (0.8.2)

35-35: ai.chronon.repo.JOIN_FOLDER_NAME imported but unused

Remove unused import

(F401)


35-35: ai.chronon.repo.GROUP_BY_FOLDER_NAME imported but unused

Remove unused import

(F401)


36-36: ai.chronon.repo.serializer.file2thrift imported but unused

Remove unused import: ai.chronon.repo.serializer.file2thrift

(F401)


184-191: Add docstring to helper function.

The function's purpose and return value should be documented.

 def _source_has_topic(source: Source) -> bool:
+    """
+    Check if the source has a topic defined.
+    
+    Args:
+        source: The source object to check
+    Returns:
+        bool: True if the source has a topic, False otherwise
+    """
     if source.events:

198-209: Consider early return for better readability.

The function can be simplified.

 def _group_by_has_hourly_windows(groupBy: GroupBy) -> bool:
     aggs: List[Aggregation] = groupBy.aggregations
 
     if not aggs:
         return False
 
     for agg in aggs:
         for window in agg.windows:
             if window.timeUnit == common.TimeUnit.HOUR:
                 return True
-
-    return False
+    return False

410-416: Simplify nested if statements.

Combine the conditions for better readability.

-        if join.metaData.online:
-            if offline_included_group_bys:
-                errors.append(
-                    "join {} is online but includes the following offline group_bys: {}".format(
-                        join.metaData.name, ", ".join(offline_included_group_bys)
-                    )
-                )
+        if join.metaData.online and offline_included_group_bys:
+            errors.append(
+                "join {} is online but includes the following offline group_bys: {}".format(
+                    join.metaData.name, ", ".join(offline_included_group_bys)
+                )
+            )
🧰 Tools
🪛 Ruff (0.8.2)

410-411: Use a single if statement instead of nested if statements

(SIM102)


455-459: Improve error message clarity.

The error message could be more specific about the incompatibility.

-                f"group_by {group_by.metaData.name} is defined to be daily refreshed but contains hourly windows. "
+                f"group_by {group_by.metaData.name} is defined as a batch (daily) feature but contains hourly windows, which is incompatible."
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between c0f1645 and 2c28746.

📒 Files selected for processing (16)
  • api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
  • api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
  • api/py/ai/chronon/cli/compile/fill_templates.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_configs.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_teams.py (1 hunks)
  • api/py/ai/chronon/cli/compile/serializer.py (1 hunks)
  • api/py/ai/chronon/cli/compile/validator.py (12 hunks)
  • api/py/ai/chronon/cli/entrypoint.py (1 hunks)
  • api/py/ai/chronon/cli/git_utils.py (1 hunks)
  • api/py/ai/chronon/cli/logging.py (1 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/test/sample/teams.py (1 hunks)
  • api/py/test/test_git_utils.py (1 hunks)
  • api/thrift/api.thrift (4 hunks)
  • api/thrift/common.thrift (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/compile/parse_configs.py

3-3: logging imported but unused

Remove unused import: logging

(F401)


8-8: ai.chronon.cli.compile.constants.ConfigInfo imported but unused

Remove unused import: ai.chronon.cli.compile.constants.ConfigInfo

(F401)


12-12: Undefined name log_level

(F821)

api/py/ai/chronon/cli/compile/compiler.py

8-8: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)


9-9: ai.chronon.cli.compile.constants imported but unused

Remove unused import: ai.chronon.cli.compile.constants

(F401)


13-13: ai.chronon.api.ttypes.Team imported but unused

Remove unused import: ai.chronon.api.ttypes.Team

(F401)


15-15: ai.chronon.repo.validator.ChrononRepoValidator imported but unused

Remove unused import: ai.chronon.repo.validator.ChrononRepoValidator

(F401)


16-16: ai.chronon.api.ttypes.GroupBy imported but unused

Remove unused import

(F401)


16-16: ai.chronon.api.ttypes.Join imported but unused

Remove unused import

(F401)


16-16: ai.chronon.api.ttypes.StagingQuery imported but unused

Remove unused import

(F401)


16-16: ai.chronon.api.ttypes.Model imported but unused

Remove unused import

(F401)


22-22: Undefined name dataclass

(F821)


24-24: Undefined name ConfigInfo

(F821)


70-70: Undefined name ConfigInfo

(F821)

api/py/test/sample/teams.py

8-8: Undefined name EnvironmentVariables

(F821)


45-45: Undefined name EnvironmentVariables

(F821)


66-66: Undefined name EnvironmentVariables

(F821)

api/py/ai/chronon/cli/compile/serializer.py

19-19: thrift.protocol.TJSONProtocol.TJSONProtocolFactory imported but unused

Remove unused import: thrift.protocol.TJSONProtocol.TJSONProtocolFactory

(F401)


78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)

api/py/test/test_git_utils.py

5-5: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

api/py/ai/chronon/cli/compile/compile_context.py

19-19: Undefined name Team

(F821)


19-19: Undefined name teams

(F821)


21-21: Undefined name ChrononRepoValidator

(F821)


21-21: Undefined name ChrononRepoValidator

(F821)


24-24: Undefined name Join

(F821)


26-26: Undefined name GroupBy

(F821)


30-30: Undefined name StagingQuery

(F821)


34-34: Undefined name Model

(F821)


87-87: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)


94-94: Undefined name file2thrift

(F821)

api/py/ai/chronon/cli/compile/fill_templates.py

8-8: Undefined name utils

(F821)


16-16: Undefined name teams_path

(F821)


18-18: Undefined name Join

(F821)


28-28: Undefined name Join

(F821)


33-33: Undefined name utils

(F821)

api/py/ai/chronon/cli/compile/parse_teams.py

2-2: ai.chronon.api.ttypes.MetaData imported but unused

Remove unused import: ai.chronon.api.ttypes.MetaData

(F401)


4-4: ai.chronon.cli.compile.parse_configs.from_file imported but unused

Remove unused import: ai.chronon.cli.compile.parse_configs.from_file

(F401)


17-17: Undefined name importlib

(F821)


30-30: Undefined name Any

(F821)


32-32: Undefined name metadata

(F821)


47-47: Undefined name team_conf_file

(F821)


49-49: Undefined name metadata

(F821)


51-51: Undefined name metadata

(F821)


54-54: Undefined name metadata

(F821)

api/py/ai/chronon/cli/git_utils.py

4-4: typing.Set imported but unused

Remove unused import: typing.Set

(F401)


31-31: Do not use bare except

(E722)


71-71: Do not use bare except

(E722)

api/py/ai/chronon/cli/entrypoint.py

2-2: sys imported but unused

Remove unused import: sys

(F401)


5-5: enum.Enum imported but unused

Remove unused import: enum.Enum

(F401)


7-7: ai.chronon.cli.compile.write_files imported but unused

Remove unused import: ai.chronon.cli.compile.write_files

(F401)


16-16: Do not use bare except

(E722)

api/py/ai/chronon/cli/compile/validator.py

35-35: ai.chronon.repo.JOIN_FOLDER_NAME imported but unused

Remove unused import

(F401)


35-35: ai.chronon.repo.GROUP_BY_FOLDER_NAME imported but unused

Remove unused import

(F401)


36-36: ai.chronon.repo.serializer.file2thrift imported but unused

Remove unused import: ai.chronon.repo.serializer.file2thrift

(F401)


213-213: Undefined name CompileContext

(F821)


410-411: Use a single if statement instead of nested if statements

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: fetcher_spark_tests
🔇 Additional comments (28)
api/py/ai/chronon/cli/git_utils.py (4)

42-57: LGTM.
Logic for fork point retrieval looks adequate.


59-67: LGTM.
Fetching file content at commit is straightforward.


75-125: Ensure correct error handling.
The fallback for untracked files is good, but confirm you handle partial failures properly.


128-154: Overall logic looks good.
Filtering real_changes is a neat approach.

api/thrift/common.thrift (1)

16-20: Addition of ConfigType looks good.
No issues detected.

api/py/ai/chronon/cli/logging.py (1)

1-14: LGTM! Color codes are well-organized and follow standard ANSI escape sequences.

api/py/ai/chronon/cli/entrypoint.py (4)

20-24: LGTM! Well-documented CLI group.


26-31: Implement sync functionality.

The sync command only echoes the branch name without actual implementation.

Would you like me to help implement the sync functionality?


33-56: Implement backfill functionality.

The backfill command only echoes parameters without actual implementation.

Would you like me to help implement the backfill functionality?


58-67: Implement deploy functionality.

The deploy command only echoes parameters without actual implementation.

Would you like me to help implement the deploy functionality?

api/py/ai/chronon/cli/compile/parse_configs.py (1)

15-41: LGTM! Well-structured function with proper error handling.

api/py/ai/chronon/cli/compile/compiler.py (1)

29-47: LGTM! Clear initialization and logging.

api/py/ai/chronon/cli/compile/parse_teams.py (1)

58-94: LGTM! Well-documented helper functions.

api/py/test/sample/teams.py (2)

6-20: Replace placeholder values in TODOs.

Several configuration values need to be replaced with actual values:

  • Infra team email
  • Serde class and args
  • Customer ID
  • GCP Project ID and resources
🧰 Tools
🪛 Ruff (0.8.2)

8-8: Undefined name EnvironmentVariables

(F821)


84-90: LGTM! Simple team configurations.

api/py/test/test_git_utils.py (1)

13-85: LGTM! Well-structured test with proper cleanup.

The test provides comprehensive coverage of Git utilities with proper setup and teardown.

api/py/ai/chronon/cli/compile/compile_context.py (1)

40-80: LGTM! Well-documented methods with clear examples.

api/py/ai/chronon/cli/compile/serializer.py (3)

27-40: LGTM! Clean implementation of JSON to Thrift decoder.

The class properly handles both dict and JSON string inputs.


89-90: LGTM! Clean wrapper function.


102-110: LGTM! Good error handling.

The error message is descriptive and helpful.

spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1)

53-53: LGTM! Simplified log formatting.

Removed unnecessary ITALIC style while keeping the GREEN color for file:line in logs.

api/py/ai/chronon/repo/compile.py (4)

241-244: LGTM! Good logging of object details.

Clear logging of team and object name.


249-256: LGTM! Proper skip handling.

Good warning messages for skipped objects.


258-264: LGTM! Validation error handling.

Clear error messages for validation failures.


267-275: LGTM! Safe overwrite checks.

Proper handling of force overwrite and online config protection.

api/thrift/api.thrift (3)

Line range hint 236-254: Good deprecation of old env vars structure.

Properly marked old modeToEnvMap as deprecated in favor of new env field.


441-446: LGTM! Clean environment variables structure.

Well-organized structure for different deployment contexts.


448-456: LGTM! Comprehensive team configuration.

Good organization of team metadata and configuration.

Comment on lines +68 to +75
def get_current_file_content(file_path: str) -> Optional[str]:
    try:
        return Path(file_path).read_text()
    except:
        return None

coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Avoid the bare except: at line 71.
Use a more precise exception type.

-except:
+except OSError:
    return None
🧰 Tools
🪛 Ruff (0.8.2)

71-71: Do not use bare except

(E722)

Comment on lines 11 to 41
def get_current_branch() -> str:

    try:
        subprocess.check_output(["git", "rev-parse", "HEAD"], stderr=subprocess.DEVNULL)

        return (
            subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"])
            .decode("utf-8")
            .strip()
        )

    except subprocess.CalledProcessError as e:

        try:
            head_file = Path(".git/HEAD").resolve()
            if head_file.exists():
                content = head_file.read_text().strip()
                if content.startswith("ref: refs/heads/"):
                    return content.split("/")[-1]

        except:
            pass

        print(
            f"Error: {e.stderr.decode('utf-8') if e.stderr else 'Not a git repository or no commits'}",
            file=sys.stderr,
        )

        raise
coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Avoid the bare except: at line 31.
Specify a more precise exception (e.g. except OSError:).

-except:
+except OSError:
    pass
🧰 Tools
🪛 Ruff (0.8.2)

31-31: Do not use bare except

(E722)

Comment on lines 1 to 40
import ai.chronon.cli.compile.parse_teams as teams


def _fill_template(table, obj, namespace):

    if table:
        table = table.replace(
            "{{ logged_table }}", utils.log_table_name(obj, full_name=True)
        )
        table = table.replace("{{ db }}", namespace)

    return table


def set_templated_values(obj, cls, team_name):
    namespace = teams.get_team_conf(teams_path, team_name, "namespace")

    if cls == Join and obj.bootstrapParts:

        for bootstrap in obj.bootstrapParts:
            bootstrap.table = _fill_template(bootstrap.table, obj, namespace)

    if obj.metaData.dependencies:
        obj.metaData.dependencies = [
            _fill_template(dep, obj, namespace) for dep in obj.metaData.dependencies
        ]

    if cls == Join and obj.labelParts:

        obj.labelParts.metaData.dependencies = [
            label_dep.replace(
                "{{ join_backfill_table }}",
                utils.output_table_name(obj, full_name=True),
            )
            for label_dep in obj.labelParts.metaData.dependencies
        ]
coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Undefined references for utils, teams_path, Join.
Declare or import them to avoid runtime errors.

+import ai.chronon.cli.compile.utils as utils
+from ai.chronon.cli.compile.join import Join
+teams_path = "path/to/teams"  # or retrieve from config

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.2)

8-8: Undefined name utils

(F821)


16-16: Undefined name teams_path

(F821)


18-18: Undefined name Join

(F821)


28-28: Undefined name Join

(F821)


33-33: Undefined name utils

(F821)

Comment on lines +32 to +41
def get_logger(log_level=logging.INFO):

    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(ColorFormatter())

    logger = logging.getLogger(__name__)
    logger.addHandler(handler)
    logger.setLevel(log_level)

    return logger
coderabbitai bot (Contributor) commented:

🛠️ Refactor suggestion

Prevent duplicate handlers.

 def get_logger(log_level=logging.INFO):
     handler = logging.StreamHandler(sys.stdout)
     handler.setFormatter(ColorFormatter())
 
     logger = logging.getLogger(__name__)
+    logger.handlers.clear()  # Prevent duplicate handlers
     logger.addHandler(handler)
     logger.setLevel(log_level)

Comment on lines 1 to 7
import click
import sys
from datetime import datetime, timedelta
import subprocess
from enum import Enum

import ai.chronon.cli.compile.write_files as write_files

coderabbitai bot (Contributor) commented:

🛠️ Refactor suggestion

Remove unused imports.

 import click
-import sys
 from datetime import datetime, timedelta
 import subprocess
-from enum import Enum
-
-import ai.chronon.cli.compile.write_files as write_files
🧰 Tools
🪛 Ruff (0.8.2)

2-2: sys imported but unused

Remove unused import: sys

(F401)


5-5: enum.Enum imported but unused

Remove unused import: enum.Enum

(F401)


7-7: ai.chronon.cli.compile.write_files imported but unused

Remove unused import: ai.chronon.cli.compile.write_files

(F401)

Comment on lines 30 to 71
def update_metadata(obj: Any, team_dict: Dict[str, Team]):

    assert metadata is not None, "Cannot update empty metadata"

    name = obj.metadata.name
    team = obj.metadata.team

    assert (
        team is not None
    ), f"Team name is required in metadata for {name}. This usually set by compiler - so this is an internal error."

    assert (
        team in team_dict
    ), f"Team '{team}' not found in teams.py. Please add an entry."

    assert (
        DEFAULT_CONF_TEAM in team_dict
    ), f"Default team '{DEFAULT_CONF_TEAM}' not found in {team_conf_file}. Please add an entry."

    metadata.outputNamespace = team_dict[team].outputNamespace

    metadata.env = _merge_environment_variables(
        team_dict[DEFAULT_CONF_TEAM].env,
        team_dict[team].env,
        metadata.env,
    )

coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Fix undefined variables and add type hints.

+from typing import Any
+
 def update_metadata(obj: Any, team_dict: Dict[str, Team]):
-    assert metadata is not None, "Cannot update empty metadata"
+    assert obj.metadata is not None, "Cannot update empty metadata"
     # ...
-    ), f"Default team '{DEFAULT_CONF_TEAM}' not found in {team_conf_file}. Please add an entry."
+    ), f"Default team '{DEFAULT_CONF_TEAM}' not found in teams.py. Please add an entry."
🧰 Tools
🪛 Ruff (0.8.2)

30-30: Undefined name Any

(F821)


32-32: Undefined name metadata

(F821)


47-47: Undefined name team_conf_file

(F821)


49-49: Undefined name metadata

(F821)


51-51: Undefined name metadata

(F821)


54-54: Undefined name metadata

(F821)

@@ -0,0 +1,90 @@
from ai.chronon.api.ttypes import Team
coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Add missing import.

-from ai.chronon.api.ttypes import Team
+from ai.chronon.api.ttypes import Team, EnvironmentVariables

Comment on lines 1 to 16
from dataclasses import dataclass
import os
from typing import Any, Dict, List

from ai.chronon.api.common.ttypes import ConfigType

coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Add missing imports.

 from dataclasses import dataclass
 import os
 from typing import Any, Dict, List
 from ai.chronon.api.common.ttypes import ConfigType
+from ai.chronon.api.ttypes import Team
+from ai.chronon.cli.compile import teams
+from ai.chronon.cli.validation import ChrononRepoValidator
+from ai.chronon.api.models import Join, GroupBy, StagingQuery, Model
+from ai.chronon.cli.utils import file2thrift

Comment on lines 81 to 125
def existing_confs(self, obj_class: type) -> Dict[str, object]:

    result = {}

    output_dir = self.output_dir(obj_class)

    for sub_root, sub_dirs, sub_files in os.walk(output_dir):

        for f in sub_files:

            if not f.startswith("."):  # ignore hidden files - such as .DS_Store
                continue

            obj = file2thrift(os.path.join(sub_root, f), obj_class)

            if obj and hasattr(obj, "metaData"):
                result[obj.metaData.name] = obj

    return result
coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Fix hidden files check and rename unused variable.

-    def existing_confs(self, obj_class: type) -> Dict[str, object]:
+    def existing_confs(self, obj_class: type) -> Dict[str, object]:
         result = {}
         output_dir = self.output_dir(obj_class)
-        for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+        for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
             for f in sub_files:
-                if not f.startswith("."):  # ignore hidden files
+                if f.startswith("."):  # ignore hidden files
                     continue
🧰 Tools
🪛 Ruff (0.8.2)

87-87: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)


94-94: Undefined name file2thrift

(F821)


 class ChrononRepoValidator(object):
-    def __init__(self, chronon_root_path: str, output_root: str, log_level=logging.INFO):
+    def __init__(self, compile_context: CompileContext, log_level=logging.INFO):
coderabbitai bot (Contributor) commented:

⚠️ Potential issue

Add missing import for CompileContext.

+from ai.chronon.cli.compile.compile_context import CompileContext

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.2)

213-213: Undefined name CompileContext

(F821)



@cli.command()
@click.argument("conf")
Collaborator commented:

Is the idea that this points to the python file or the compiled directory? I was thinking python, with an object selector in case you specify a file with multiple objects within it.

Let's keep these arguments consistent with what's in runner.py (def backfill(self, start_date, end_date, force_recompute=False, plan=False)).

And in fact, we can even complete the implementation in this file to call those runner functions, which are attached to the base python objects (like here: https://github.com/zipline-ai/chronon/blob/main/api/py/ai/chronon/group_by.py#L610-L612).

So the logic here would be:

  1. Extract object
  2. Call object.backfill(args)
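
A rough sketch of that two-step flow (hypothetical: the module-loading helper and CLI wiring are assumptions, and the backfill signature is taken from the runner.py reference above):

import importlib.util


def _load_conf_module(py_path: str):
    # Load the user's config file as a module so we can pull objects off of it.
    spec = importlib.util.spec_from_file_location("user_conf", py_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def backfill_conf(py_path: str, object_name: str, start_date: str, end_date: str):
    # 1. Extract the object (e.g. a GroupBy or Join) from the python file.
    obj = getattr(_load_conf_module(py_path), object_name)

    # 2. Call the runner method attached to the base python object.
    return obj.backfill(start_date, end_date, force_recompute=False, plan=False)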

Collaborator commented:

By the way, we can add scope to runner.py, though I wonder if scope is the right word? While it's more verbose, run_upstream=t/f and run_downstream=t/f might be more explicit and clearer.
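
A minimal sketch of what those flags could look like in click (which entrypoint.py already uses); the flag names, defaults, and command body here are assumptions:

import click


@click.command()
@click.argument("conf")
@click.option("--run-upstream/--no-run-upstream", default=False,
              help="Also run the jobs this conf depends on.")
@click.option("--run-downstream/--no-run-downstream", default=False,
              help="Also run the jobs that depend on this conf.")
def backfill(conf, run_upstream, run_downstream):
    # Placeholder body: echo the resolved scope for the given conf.
    click.echo(f"backfilling {conf} (upstream={run_upstream}, downstream={run_downstream})")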

varant-zlai (Collaborator) left a comment

There was a lot of compile logic in my previous PR that I want to make sure we're still keeping, since we compile the whole repo when we try to run a given object (it ensures the target file compiles, errors if not, warns if other things are failing, etc.).

"GCP_DATAPROC_CLUSTER_NAME": "canary-2", # TODO: GCP Dataproc Cluster Name
"GCP_BIGTABLE_INSTANCE_ID": "zipline-canary-instance", # TODO: GCP Bigtable Instance ID
},
backfill={ # Backfills tend to be larger than other jobs
Collaborator commented:

The main thing here is that I think we'd like to just use spark config props - that removes an extra layer of indirection and makes it clear to the user what this is for. WDYT?

nikhil-zlai (Contributor, Author) commented:

I am down for that actually

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 18

🔭 Outside diff range comments (3)
online/src/main/scala/ai/chronon/online/TopicChecker.scala (1)

43-80: Close admin client in topic check.
Likewise, call close() after checking topics to release resources.

 def topicShouldExist(topic: String, bootstrap: String, additionalProps: Map[String, String] = Map.empty): Unit = {
   val props = mapToJavaProperties(additionalProps ++ Map(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG -> bootstrap))
   try {
     val adminClient = AdminClient.create(props)
     ...
+    adminClient.close()
   } catch {
     case ex: Exception => throw new RuntimeException(s"Failed to check for topic $topic in $bootstrap", ex)
   }
 }
frontend/src/routes/thrift/+page.svelte (1)

1-136: Move API testing code to dedicated test file.

Testing code should not be in production routes. Consider:

  1. Moving to a test file
  2. Using environment variables for IDs
  3. Using relative timestamps
distribution/build_and_upload_gcp_artifacts.sh (1)

24-27: Fix Variable Expansion
Prefix EXPECTED_MINIMUM_MINOR_PYTHON_VERSION and MINOR_PYTHON_VERSION with '$' for proper evaluation.

- if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
+ if [[ $EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt $MINOR_PYTHON_VERSION ]] ; then
🧹 Nitpick comments (47)
online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (1)

565-611: Consider adding null/empty case test.

Test case looks good but could be more comprehensive.

Add a test case for empty/null listing IDs:

  it should "handle explode invocations in select clauses" in {
    // ... existing test code ...
    
+   val emptyRow: Map[String, Any] = Map(
+     "event_name" -> "backend_cart_payment",
+     "properties" -> makeHashMap("sold_listing_ids" -> ""),
+   )
+   
+   val empty_res = cu.performSql(emptyRow)
+   empty_res.size shouldBe 0
  }
spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala (1)

52-53: Add explicit return type for UDF.

Improve type safety by declaring return type.

-  private def edit_distance: UserDefinedFunction =
-    functions.udf((left: Object, right: Object) => EditDistance.between(left, right))
+  private def edit_distance: UserDefinedFunction =
+    functions.udf((left: Object, right: Object) => EditDistance.between(left, right): Int)
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (1)

55-66: Enhance date conversion test structure.

While the test covers both date types, consider:

  1. Extracting duplicate conversion logic
  2. Asserting the actual BigQuery date values
 it should "bigquery connector converts spark dates regardless of date setting" in {
   val input = spark.createDataFrame(Seq((1, "2021-01-01"))).toDF("id", "ds")
+  def testDateConversion(useJava8Api: Boolean) = {
+    spark.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, useJava8Api)
+    val date = input.select(col("id"), to_date(col("ds"))).collect.take(1).head.get(1)
+    val expectedType = if (useJava8Api) classOf[java.time.LocalDate] else classOf[java.sql.Date]
+    assert(date.getClass == expectedType)
+    val bqDate = SparkBigQueryUtil.sparkDateToBigQuery(date)
+    assert(bqDate != null) // Add specific value assertion
+  }
-  spark.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, true)
-  val java8Date = input.select(col("id"), to_date(col("ds"))).collect.take(1).head.get(1)
-  assert(java8Date.isInstanceOf[java.time.LocalDate])
-  SparkBigQueryUtil.sparkDateToBigQuery(java8Date)
-
-  spark.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, false)
-  val nonJava8Date = input.select(col("id"), to_date(col("ds"))).collect.take(1).head.get(1)
-  assert(nonJava8Date.isInstanceOf[java.sql.Date])
-  SparkBigQueryUtil.sparkDateToBigQuery(nonJava8Date)
+  testDateConversion(true)
+  testDateConversion(false)
 }
online/src/main/scala/ai/chronon/online/CatalystUtil.scala (2)

147-150: Use typed container.
Returning Seq[Array[Any]] may undermine type safety.


167-193: Method size.
Consider smaller subroutines for readability.

hub/src/main/java/ai/chronon/hub/HubVerticle.java (1)

82-85: Consider separate request classes for join and column drift.

Using JoinDriftRequest for both join and column drift endpoints could cause confusion. Create distinct request classes for better type safety.

-router.get("/api/v1/join/:name/column/:columnName/drift").handler(RouteHandlerWrapper.createHandler(driftHandler::getColumnDrift, JoinDriftRequest.class));
+router.get("/api/v1/join/:name/column/:columnName/drift").handler(RouteHandlerWrapper.createHandler(driftHandler::getColumnDrift, ColumnDriftRequest.class));
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (2)

6-6: Document rationale for wildcard import.

Consider adding a comment explaining why wildcard import is preferred over specific imports.


79-80: Consider adding descriptive error message for Hive fallback.

Good fallback logic, but users might benefit from knowing why Hive was chosen.

-        scala.Option(btTableIdentifier.getProject).map(BigQueryFormat(_, bigQueryClient, Map.empty)).getOrElse(Hive))
+        scala.Option(btTableIdentifier.getProject)
+          .map(BigQueryFormat(_, bigQueryClient, Map.empty))
+          .getOrElse {
+            logger.info(s"No BigQuery project found for $tableName, falling back to Hive")
+            Hive
+          })
flink/src/main/scala/ai/chronon/flink/SchemaRegistrySchemaProvider.scala (1)

14-16: Make cache capacity configurable.
Separate the cache size from static code so it can be customized via config if needed.

- private val CacheCapacity: Int = 10
+ private val CacheCapacity: Int = conf.get("registry_cache_capacity").map(_.toInt).getOrElse(10)
flink/src/main/scala/ai/chronon/flink/KafkaFlinkSource.scala (1)

20-28: Improve bootstrap fallback.
You assume host and port if the config is missing yet throw an error only for the port. Consider a clearer fallback or using defaults for both.

flink/src/main/scala/org/apache/spark/sql/avro/AvroDeserializationSupport.scala (2)

18-19: Prefer final variables.
Mark constructor params as val or private val if they are not re-assigned.


49-55: Blocking call caution.
Await.result(..., 30.seconds) can stall event loops. Consider async handling.

hub/src/main/scala/ai/chronon/hub/handlers/DriftHandler.scala (1)

49-55: Blocking call caution.
Await.result(..., 30.seconds) can freeze threads. Async approach may be safer.

api/py/ai/chronon/cli/compile/compile_context.py (2)

2-2: Remove unused import.
from logging import error is unused. Drop it.

-from logging import error
🧰 Tools
🪛 Ruff (0.8.2)

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Rename unused variable.
Rename sub_dirs to _sub_dirs to clarify it's unused.

-for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
🧰 Tools
🪛 Ruff (0.8.2)

107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

hub/src/main/scala/ai/chronon/hub/handlers/ConfHandler.scala (2)

20-39: Handle unsupported types carefully
Throwing RuntimeException is okay, but consider a domain-specific exception.


44-48: Casting with reflection
Consider safer casts or typed definitions.

frontend/src/lib/api/api.ts (1)

40-46: Generic getConf
Clear approach, but return type might need stricter checks.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (1)

21-24: Consider consistent exception usage.
Elsewhere, you use UnsupportedOperationException; here you use NotImplementedError.

-throw new NotImplementedError("alterTableProperties not yet supported for BigQuery")
+throw new UnsupportedOperationException("alterTableProperties not yet supported for BigQuery")
frontend/src/lib/components/NavigationBar.svelte (1)

251-261: Hard-coded condition may cause confusion.

Consider removing or externalizing 'risk.user_transactions.txn_join' for clarity.

api/thrift/fetcher.thrift (1)

6-11: Consider adding field constraints and improving docs.

Add:

  • Valid ranges for timestamp fields
  • Max length for dataset string
  • Expected format for keyBytes
frontend/src/routes/joins/+page.svelte (1)

8-12: Simplify sort function.

The sort function can be simplified using string comparison.

-	const sortedItems = data.items.sort((a: IJoin, b: IJoin) => {
-		if (a.metaData?.name === 'risk.user_transactions.txn_join') return -1;
-		if (b.metaData?.name === 'risk.user_transactions.txn_join') return 1;
-		return 0;
-	});
+	const sortedItems = data.items.sort((a: IJoin, b: IJoin) => 
+		(b.metaData?.name === 'risk.user_transactions.txn_join') as any - 
+		(a.metaData?.name === 'risk.user_transactions.txn_join') as any
+	);
distribution/run_zipline_quickstart.py (1)

10-11: Fix line continuation style.

Use parentheses for line continuation instead of backslash.

-        quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__))
-                                     , "run_zipline_quickstart.sh")
+        quickstart_sh = os.path.join(
+            os.path.dirname(os.path.realpath(__file__)),
+            "run_zipline_quickstart.sh"
+        )
frontend/src/routes/joins/[slug]/observability/ModelTable.svelte (1)

15-17: Simplify ModelType indexing.

The conditional check for undefined can be simplified using optional chaining.

-			value={model.modelType === undefined ? undefined : ModelType[model.modelType]}
+			value={ModelType[model.modelType ?? -1]}
api/src/test/scala/ai/chronon/api/test/TilingUtilSpec.scala (1)

18-18: Consider using relative timestamps.

Hardcoded timestamp might become outdated.

-    key.setTileStartTimestampMillis(1738195200000L) // 2025-01-29T00:00:00Z
+    key.setTileStartTimestampMillis(System.currentTimeMillis())
api/src/main/scala/ai/chronon/api/TilingUtils.scala (1)

15-15: Document the compact protocol bug.

Add details about the bug for future reference.

frontend/src/lib/server/conf-loader.ts (1)

39-42: Enhance error handling specificity.

Consider catching specific error types and providing more detailed error messages.

 } catch (error) {
-    console.error(`Failed to load ${entityMatch.label.toLowerCase()}:`, error);
-    throw error;
+    const errorMessage = error instanceof Error ? error.message : 'Unknown error';
+    console.error(`Failed to load ${entityMatch.label.toLowerCase()}: ${errorMessage}`);
+    throw new Error(`Configuration loading failed: ${errorMessage}`);
 }
api/py/ai/chronon/cli/compile/fill_templates.py (2)

4-4: Remove unused import.

The parse_teams import is not used in the code.

-import ai.chronon.cli.compile.parse_teams as teams
🧰 Tools
🪛 Ruff (0.8.2)

4-4: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)


7-15: Add input validation.

Add validation for input parameters to prevent potential errors.

 def _fill_template(table, obj, namespace):
+    if not isinstance(table, str):
+        return table
 
     if table:
         table = table.replace(
api/py/ai/chronon/cli/logger.py (1)

5-14: Consider using an enum for color codes.

Organize color codes using an enum for better maintainability.

from enum import Enum

class Colors(Enum):
    TIME = "\033[36m"  # Cyan
    DEBUG = "\033[36m"  # Cyan
    INFO = "\033[32m"  # Green
    WARNING = "\033[33m"  # Yellow
    ERROR = "\033[31m"  # Red
    CRITICAL = "\033[41m"  # White on Red
    FILE = "\033[35m"  # Purple
    RESET = "\033[0m"
spark/src/main/scala/ai/chronon/spark/streaming/TopicCheckerApp.scala (1)

39-41: Consider graceful shutdown.

Using System.exit(0) might not allow proper cleanup. Consider using a more graceful shutdown mechanism.
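
One shape such a shutdown could take (a sketch; the helper names are hypothetical, not the app's actual members):

def main(args: Array[String]): Unit = {
  val adminClient = buildAdminClient() // hypothetical resource factory
  sys.addShutdownHook(adminClient.close()) // also runs on SIGTERM
  try runChecks(adminClient) // hypothetical app body
  finally adminClient.close() // eager cleanup on the normal path
  // main returns and the JVM exits with status 0; no System.exit needed
}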

api/py/ai/chronon/cli/compile/parse_configs.py (1)

4-4: Remove unused imports.

-import logging
-from typing import Any, Dict, List, Tuple
+from typing import Any, List
-import ai.chronon.cli.compile.parse_teams as teams

Also applies to: 6-6, 9-9

🧰 Tools
🪛 Ruff (0.8.2)

4-4: logging imported but unused

Remove unused import: logging

(F401)

api/py/ai/chronon/cli/entrypoint.py (1)

2-2: Remove unused imports.

-import sys
-from enum import Enum

Also applies to: 5-5

🧰 Tools
🪛 Ruff (0.8.2)

2-2: sys imported but unused

Remove unused import: sys

(F401)

api/py/ai/chronon/cli/compile/compiler.py (1)

48-49: Clean up spinner implementation.

Status variable is unused and spinner code is duplicated.

-    with console.status("[bold green]Loading...", spinner="dots") as status:
+    with console.status("[bold green]Loading...", spinner="dots"):
🧰 Tools
🪛 Ruff (0.8.2)

48-48: Local variable status is assigned to but never used

Remove assignment to unused variable status

(F841)

api/src/main/scala/ai/chronon/api/Constants.scala (1)

80-82: Enhance documentation for magic null constants.

While the comment explains language compatibility, it should also document:

  • Why these specific values were chosen
  • When to use these constants vs. actual null values
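
A sketch of what such a doc comment could look like (the wording and guidance are illustrative, not the actual Constants.scala text):

/** Sentinel standing in for null where thrift-generated clients in other
  * languages cannot express a missing primitive. Values should sit far
  * outside any realistic data range; translate back to null/None at the
  * serving boundary instead of letting sentinels leak into computations.
  */
val magicNullDouble: java.lang.Double = ??? // the real value lives in Constants.scala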
api/thrift/hub.thrift (1)

119-124: Add documentation for each enum value.

Each ConfType value should be documented to explain its purpose and usage.

online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (1)

37-52: Enhance parse method robustness.

Consider these improvements:

  • Validate params format before splitting
  • Handle malformed key-value pairs gracefully
 def parse(topic: String): TopicInfo = {
   assert(topic.nonEmpty, s"invalid topic: $topic")
   val (messageBus, rest) = if (topic.contains("://")) {
     val tokens = topic.split("://", 2)
     tokens.head -> tokens.last
   } else {
     "kafka" -> topic
   }
   assert(rest.nonEmpty, s"invalid topic: $topic")
   val fields = rest.split("/")
   val topicName = fields.head
-  val params = fields.tail.map { f =>
-    val kv = f.split("=", 2); kv.head -> kv.last
-  }.toMap
+  val params = fields.tail.flatMap { f =>
+    try {
+      val kv = f.split("=", 2)
+      if (kv.length == 2) Some(kv.head -> kv.last) else None
+    } catch {
+      case _: Exception => None
+    }
+  }.toMap
   TopicInfo(topicName, messageBus, params)
 }
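
As a quick sanity check on the parsing rules above (inputs are made up):

parse("events/env=prod")
// => TopicInfo("events", "kafka", Map("env" -> "prod")) (messageBus defaults to kafka)
parse("pubsub://events/region=us/team=risk")
// => TopicInfo("events", "pubsub", Map("region" -> "us", "team" -> "risk"))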
api/py/ai/chronon/cli/git_utils.py (1)

4-4: Remove unused import.

-from typing import List, Optional, Set
+from typing import List, Optional
🧰 Tools
🪛 Ruff (0.8.2)

4-4: typing.Set imported but unused

Remove unused import: typing.Set

(F401)

hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala (1)

23-24: Add test cases for concurrent requests and error scenarios.

Consider adding tests for:

  • Concurrent drift requests
  • DriftStore failures
  • Empty result sets

Also applies to: 64-88

flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (1)

8-9: Restore removed documentation for TimestampedIR and TimestampedTile.

Add back the documentation to maintain code clarity.

api/py/ai/chronon/cli/compile/validator.py (1)

458-470: Consider combining temporal and topic checks.

The temporal and topic checks could be combined to reduce code complexity.

-    non_temporal = (
-        group_by.accuracy is None or group_by.accuracy == Accuracy.SNAPSHOT
-    )
-
-    no_topic = not _group_by_has_topic(group_by)
-    has_hourly_windows = _group_by_has_hourly_windows(group_by)
-
-    # batch features cannot contain hourly windows
-    if (no_topic and non_temporal) and has_hourly_windows:
+    if (not _group_by_has_topic(group_by) and 
+        (group_by.accuracy is None or group_by.accuracy == Accuracy.SNAPSHOT) and 
+        _group_by_has_hourly_windows(group_by)):
distribution/build_and_upload_gcp_artifacts.sh (2)

33-33: Split VERSION Assignment
Separate the VERSION extraction from the 'pip wheel' command to avoid masking return values.


9-9: Enhance cd Robustness
Consider appending "|| exit" to the cd command for added safety (e.g., cd $CHRONON_ROOT_DIR || exit).

distribution/run_zipline_quickstart.sh (3)

7-8: Robust Working Directory Change
Use "cd $WORKING_DIR || exit" to ensure the script halts if the directory change fails.

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


10-11: Remove Unused RED Variable
The RED variable is declared but never used; remove it if it’s unnecessary.

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


15-18: Address Bigtable TODO
The TODO for deleting bigtable rows remains unresolved—update this per the requirements.

service_commons/src/main/java/ai/chronon/service/RouteHandlerWrapper.java (1)

90-103: Enhance error messages.

Add more context to error messages.

-                        LOGGER.error("Failed to serialize response", e);
+                        LOGGER.error("Failed to serialize response to JSON using TSimpleJSONProtocol", e);
-                        LOGGER.error("Unexpected error during serialization", e);
+                        LOGGER.error("Unexpected error during JSON serialization of " + output.getClass().getSimpleName(), e);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 2c28746 and 30aed3c.

📒 Files selected for processing (131)
  • .github/workflows/require_triggered_status_checks.yaml (1 hunks)
  • .github/workflows/test_frontend.yaml (1 hunks)
  • .plugin-versions (1 hunks)
  • .tool-versions (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/stats/EditDistance.scala (1 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/EditDistanceTest.scala (1 hunks)
  • api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
  • api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
  • api/py/ai/chronon/cli/compile/fill_templates.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_configs.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_teams.py (1 hunks)
  • api/py/ai/chronon/cli/compile/validator.py (12 hunks)
  • api/py/ai/chronon/cli/entrypoint.py (1 hunks)
  • api/py/ai/chronon/cli/git_utils.py (1 hunks)
  • api/py/ai/chronon/cli/logger.py (1 hunks)
  • api/py/ai/chronon/group_by.py (0 hunks)
  • api/py/ai/chronon/join.py (0 hunks)
  • api/py/ai/chronon/repo/hub_uploader.py (2 hunks)
  • api/py/ai/chronon/repo/run.py (13 hunks)
  • api/py/test/sample/teams.json (2 hunks)
  • api/src/main/scala/ai/chronon/api/Constants.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/DataType.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/TilingUtilSpec.scala (1 hunks)
  • api/thrift/api.thrift (4 hunks)
  • api/thrift/fetcher.thrift (1 hunks)
  • api/thrift/hub.thrift (1 hunks)
  • api/thrift/observability.thrift (1 hunks)
  • build.sbt (8 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (2 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala (7 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (5 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (2 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (5 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProviderTest.scala (1 hunks)
  • distribution/build_and_upload_gcp_artifacts.sh (3 hunks)
  • distribution/run_zipline_quickstart.py (1 hunks)
  • distribution/run_zipline_quickstart.sh (1 hunks)
  • flink/BUILD.bazel (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (5 hunks)
  • flink/src/main/scala/ai/chronon/flink/AvroCodecFn.scala (8 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (11 hunks)
  • flink/src/main/scala/ai/chronon/flink/KafkaFlinkSource.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/MetricsSink.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/SchemaProvider.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/SchemaRegistrySchemaProvider.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/SourceProvider.scala (0 hunks)
  • flink/src/main/scala/ai/chronon/flink/TestFlinkJob.scala (4 hunks)
  • flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (5 hunks)
  • flink/src/main/scala/org/apache/spark/sql/avro/AvroDeserializationSupport.scala (1 hunks)
  • flink/src/test/resources/user.avsc (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala (4 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (6 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/SchemaRegistrySchemaProviderSpec.scala (1 hunks)
  • flink/src/test/scala/org/apache/spark/sql/avro/AvroDeSerializationSupportSpec.scala (1 hunks)
  • frontend/src/lib/api/api.test.ts (3 hunks)
  • frontend/src/lib/api/api.ts (3 hunks)
  • frontend/src/lib/components/LogicalNodeTable.svelte (1 hunks)
  • frontend/src/lib/components/NavigationBar.svelte (3 hunks)
  • frontend/src/lib/server/conf-loader.ts (1 hunks)
  • frontend/src/lib/types/Entity/Entity.ts (1 hunks)
  • frontend/src/lib/types/Model/Model.test.ts (0 hunks)
  • frontend/src/lib/types/Model/Model.ts (0 hunks)
  • frontend/src/routes/groupbys/+page.server.ts (1 hunks)
  • frontend/src/routes/groupbys/+page.svelte (1 hunks)
  • frontend/src/routes/joins/+page.server.ts (1 hunks)
  • frontend/src/routes/joins/+page.svelte (1 hunks)
  • frontend/src/routes/joins/[slug]/observability/ModelTable.svelte (1 hunks)
  • frontend/src/routes/joins/[slug]/services/joins.service.ts (2 hunks)
  • frontend/src/routes/models/+page.server.ts (1 hunks)
  • frontend/src/routes/models/+page.svelte (1 hunks)
  • frontend/src/routes/stagingqueries/+page.server.ts (1 hunks)
  • frontend/src/routes/stagingqueries/+page.svelte (1 hunks)
  • frontend/src/routes/thrift/+page.svelte (1 hunks)
  • hub/src/main/java/ai/chronon/hub/HubVerticle.java (3 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/ConfHandler.scala (1 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/DriftHandler.scala (1 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/JoinsHandler.scala (0 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/ModelsHandler.scala (0 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/Paginate.scala (0 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/SearchHandler.scala (0 hunks)
  • hub/src/main/scala/ai/chronon/hub/handlers/TimeSeriesHandler.scala (1 hunks)
  • hub/src/main/scala/ai/chronon/hub/model/Model.scala (0 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala (1 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala (1 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/JoinHandlerTest.scala (0 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/ModelHandlerTest.scala (0 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/SearchHandlerTest.scala (0 hunks)
  • online/src/main/java/ai/chronon/online/JavaFetcher.java (2 hunks)
  • online/src/main/scala/ai/chronon/online/AvroConversions.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala (5 hunks)
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/Fetcher.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/FetcherBase.scala (5 hunks)
  • online/src/main/scala/ai/chronon/online/FetcherMain.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataStore.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/TopicChecker.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (4 hunks)
  • online/src/test/scala/ai/chronon/online/test/CatalystUtilHiveUDFTest.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (24 hunks)
  • online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (5 hunks)
  • service_commons/src/main/java/ai/chronon/service/RouteHandlerWrapper.java (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/JobSubmitter.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (8 hunks)
  • spark/src/main/scala/ai/chronon/spark/format/Format.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/KafkaStreamBuilder.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/TopicCheckerApp.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (1 hunks)
  • tools/build_rules/dependencies/maven_repository.bzl (2 hunks)
  • tools/build_rules/dependencies/spark_repository.bzl (1 hunks)
  • tools/build_rules/flink/BUILD (1 hunks)
  • tools/build_rules/prelude_bazel (1 hunks)
  • tools/build_rules/spark/BUILD (1 hunks)
💤 Files with no reviewable changes (13)
  • api/py/ai/chronon/join.py
  • hub/src/test/scala/ai/chronon/hub/handlers/ModelHandlerTest.scala
  • hub/src/main/scala/ai/chronon/hub/handlers/Paginate.scala
  • frontend/src/lib/types/Model/Model.ts
  • hub/src/test/scala/ai/chronon/hub/handlers/JoinHandlerTest.scala
  • hub/src/test/scala/ai/chronon/hub/handlers/SearchHandlerTest.scala
  • hub/src/main/scala/ai/chronon/hub/handlers/JoinsHandler.scala
  • hub/src/main/scala/ai/chronon/hub/handlers/SearchHandler.scala
  • hub/src/main/scala/ai/chronon/hub/handlers/ModelsHandler.scala
  • api/py/ai/chronon/group_by.py
  • flink/src/main/scala/ai/chronon/flink/SourceProvider.scala
  • hub/src/main/scala/ai/chronon/hub/model/Model.scala
  • frontend/src/lib/types/Model/Model.test.ts
✅ Files skipped from review due to trivial changes (9)
  • .tool-versions
  • spark/src/main/scala/ai/chronon/spark/JobSubmitter.scala
  • tools/build_rules/spark/BUILD
  • api/src/main/scala/ai/chronon/api/DataType.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/stats/EditDistance.scala
  • api/py/ai/chronon/repo/hub_uploader.py
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProviderTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/EditDistanceTest.scala
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
🧰 Additional context used
📓 Learnings (4)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#51
File: spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala:192-200
Timestamp: 2024-11-26T19:47:53.900Z
Learning: Only suggest registering Delta Lake action classes for serialization if they are actually used in the codebase.
.github/workflows/test_frontend.yaml (1)
Learnt from: ken-zlai
PR: zipline-ai/chronon#185
File: .github/workflows/test_frontend.yaml:41-42
Timestamp: 2025-01-08T18:29:51.176Z
Learning: In GitHub Actions, the `version` parameter in `awalsh128/cache-apt-pkgs-action` is a cache control number used to invalidate and regenerate the cache. It does not specify the package version - the actual package version is determined by the Ubuntu package repositories.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (2)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:29-60
Timestamp: 2025-01-24T23:55:30.256Z
Learning: In BigQuery integration, table existence check is performed outside the BigQueryFormat.createTable method, at a higher level in TableUtils.createTable.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (2)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:29-60
Timestamp: 2025-01-24T23:55:30.256Z
Learning: In BigQuery integration, table existence check is performed outside the BigQueryFormat.createTable method, at a higher level in TableUtils.createTable.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart.sh

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 11-11: RED appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[warning] 39-39: Declare and assign separately to avoid masking return values.

(SC2155)

🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/compile/fill_templates.py

4-4: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)

api/py/ai/chronon/cli/compile/parse_configs.py

4-4: logging imported but unused

Remove unused import: logging

(F401)


6-6: typing.Dict imported but unused

Remove unused import

(F401)


6-6: typing.Tuple imported but unused

Remove unused import

(F401)


9-9: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)

api/py/ai/chronon/cli/compile/parse_teams.py

4-4: ai.chronon.cli.compile.parse_configs.from_file imported but unused

Remove unused import: ai.chronon.cli.compile.parse_configs.from_file

(F401)

api/py/ai/chronon/cli/compile/compile_context.py

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/repo/run.py

817-817: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

api/py/ai/chronon/cli/compile/validator.py

421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/compiler.py

20-20: Undefined name console

(F821)


25-25: Undefined name console

(F821)


28-28: Undefined name console

(F821)


28-28: f-string without any placeholders

Remove extraneous f prefix

(F541)


48-48: Local variable status is assigned to but never used

Remove assignment to unused variable status

(F841)

api/py/ai/chronon/cli/entrypoint.py

2-2: sys imported but unused

Remove unused import: sys

(F401)


5-5: enum.Enum imported but unused

Remove unused import: enum.Enum

(F401)


17-17: Do not use bare except

(E722)

api/py/ai/chronon/cli/git_utils.py

4-4: typing.Set imported but unused

Remove unused import: typing.Set

(F401)


33-33: Do not use bare except

(E722)


73-73: Do not use bare except

(E722)

🔇 Additional comments (213)
.github/workflows/require_triggered_status_checks.yaml (2)

1-8: Workflow Config OK.
YAML is concise and correctly sets up branch protection on pull requests.


9-14: Steps Config Validated.
The action is properly configured with token input and a 10s delay.

.plugin-versions (1)

2-2: LGTM!

The bazelisk plugin entry is correctly formatted and uses a specific commit hash.

build.sbt (3)

218-219: LGTM!

The Confluent dependencies are properly configured with consistent versions.

Also applies to: 225-225, 249-249


293-293: LGTM!

Minor version bump for spark-bigquery-with-dependencies.


300-301: LGTM!

Proper exclusion rule to prevent dependency conflicts.

online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (2)

25-25: LGTM!

Import required for ScalaTest matchers.


182-182: LGTM! Safer result handling.

Consistent use of .headOption improves robustness.

Also applies to: 201-201, 225-225, 250-250, 275-275, 298-298, 310-310, 318-318, 327-327, 335-335, 352-352, 362-362, 372-372, 387-387, 407-407, 425-425, 443-443, 470-470, 488-488, 502-502, 521-521, 540-540, 553-553, 649-649

spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala (3)

20-20: LGTM!

Import follows package organization best practices.


85-95: LGTM!

Clean integration with percentile-based metrics and edit operation tracking.


109-115: LGTM!

Proper type-specific metric application.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (2)

9-13: LGTM!

Import changes align with the new date conversion test case.


48-52: Add test coverage for the new BigQueryFormat parameter.

Pattern matching was updated to handle 3 parameters, but the test only verifies the type.

Consider adding assertions to validate the values of all three parameters.

online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (1)

48-48: Add sequence size validation.

Silently taking the first result could hide data issues.

Consider adding size validation:

-        reintroduceExceptions(catalystUtil.performSql(keys ++ values).headOption.orNull, values)
+        val results = catalystUtil.performSql(keys ++ values)
+        require(results.size <= 1, s"Expected at most one result, got ${results.size}")
+        reintroduceExceptions(results.headOption.orNull, values)
✅ Verification successful

Validate SQL result sequence size.

  • Adding an explicit size check is ideal to catch unexpected multiple rows.
  • Consider updating the code as follows:
    -        reintroduceExceptions(catalystUtil.performSql(keys ++ values).headOption.orNull, values)
    +        val results = catalystUtil.performSql(keys ++ values)  // get SQL results
    +        require(results.size <= 1, s"Expected at most one result, got ${results.size}")  // validate size
    +        reintroduceExceptions(results.headOption.orNull, values)  // process first result or null
🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found

online/src/test/scala/ai/chronon/online/test/CatalystUtilHiveUDFTest.scala (1)

9-9: LGTM! Clear test description.

The updated test description better reflects the functionality being tested.

online/src/main/scala/ai/chronon/online/CatalystUtil.scala (11)

44-44: Consider concurrency.
ArrayBuffer is not thread-safe.


113-114: Check backward compatibility.
Changing return type from Option to Seq might affect existing callers.


140-140: Consistent transformation.
Storing transformFunc returning Seq aligns with multi-result strategy.


154-154: Verify usage.
Method now yields Seq—ensure callers handle multiple results.


159-163: Multiple results.
Returning Seq[Map[String, Any]] is consistent with the broader changes.


201-210: Mutable buffer caution.
Ensure no concurrency issues with shared buffer usage.


215-221: Filter logic.
Returning ArrayBuffer fosters multi-row processing.


236-245: Projection chain.
Implementation seems coherent.


247-251: Simple projection.
Generates array buffers of single row.


255-259: Collect usage.
Potential memory risk if data is large.


262-266: Projection approach.
Straightforward.

flink/src/main/scala/ai/chronon/flink/MetricsSink.scala (2)

2-2: LGTM!

Import statement aligns with the broader refactoring changes.


34-34: Verify timestamp availability.

The simplified timestamp access looks cleaner, but ensure tsMillis is always available in WriteResponse.

✅ Verification successful

Timestamp availability verified.

  • tsMillis is defined in WriteResponse (in flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala).
  • Its consistent use in equals and hashCode confirms its presence.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for WriteResponse definition and usage
ast-grep --pattern 'case class WriteResponse {
  $$$
  tsMillis: Long
  $$$
}'

Length of output: 80


Script:

#!/bin/bash
# Search for WriteResponse definitions across the codebase to check for tsMillis.
rg -n "case class WriteResponse" .

Length of output: 36


Script:

#!/bin/bash
# Comprehensive search for WriteResponse definitions in Scala.
rg -n "(case class|class|trait)\s+WriteResponse" .

Length of output: 165


Script:

#!/bin/bash
# Dump WriteResponse class for tsMillis context.
sed -n '80,150p' ./flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala

Length of output: 1350

flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (25)

4-5: Essential imports
All good.


9-12: New imports for watermark & Avro
No issues.


19-19: MetadataStore usage
Looks correct.


21-21: TopicInfo usage
No concerns.


22-23: Event-time imports
All good.


29-29: Checkpoint config import
Valid usage.


44-44: Duration import
Used for watermarks. Fine.


62-62: sinkFn updated
Now uses AvroCodecOutput. Seems consistent.


115-120: Watermark assignment
Good approach for event-time.


121-121: Avro output
No issues.


170-175: Second watermark assignment
Mirrors the first approach. Good.


203-203: Watermarks in tiling
Ensures correct event-time logic.


225-225: Tiling DS Avro output
Consistent with usage above.


260-263: AllowedOutOfOrderness
5 minutes for late data is typical.


264-267: Idleness timeout
Handles low event rates neatly.


268-278: Custom watermarkStrategy
Implements bounded out-of-orderness & idleness.
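
For reference, the bounded out-of-orderness plus idleness combination looks like this in generic form (the Event type and durations are stand-ins, not the job's real values):

import java.time.Duration
import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}

case class Event(tsMillis: Long)

val strategy: WatermarkStrategy[Event] =
  WatermarkStrategy
    .forBoundedOutOfOrderness[Event](Duration.ofMinutes(5)) // tolerate late events
    .withIdleness(Duration.ofMinutes(1)) // keep time advancing on quiet partitions
    .withTimestampAssigner(new SerializableTimestampAssigner[Event] {
      override def extractTimestamp(e: Event, recordTs: Long): Long = e.tsMillis
    })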


287-288: Mock source option
Minor, looks good.


289-290: kafkaBootstrap
Optional param. No concerns.


299-299: groupByName
Straight usage. Fine.


303-303: Kafka bootstrap
Clean usage.


314-349: Retrieving serving info
Robust. Good error handling.


360-361: Checkpoint retention commentary
No issues.


379-379: Creating jobDatastream
Looks fine.


381-385: Metrics sink
Nice separation.


390-390: buildApi method
Private scoping is good.

hub/src/main/java/ai/chronon/hub/HubVerticle.java (2)

6-7: LGTM!

Imports align with the new drift analysis functionality.


63-66: Verify error handling in ConfHandler.

Routes look good. Consider verifying error handling for invalid configurations.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (2)

63-65: LGTM! Consistent client usage.

Good practice to pass bigQueryClient explicitly.


37-43: LGTM! Verify direct write method compatibility.

Safe changes with CREATE_NEVER disposition. Direct write method is more efficient.

✅ Verification successful

LGTM – Direct write and CREATE_NEVER are correctly set in GcpFormatProvider.scala.

  • "writeMethod" is "direct" (Line 38).
  • "createDisposition" uses CREATE_NEVER (Line 39).
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if any tests or configurations assume indirect write method
rg -l "writeMethod.*indirect" 

Length of output: 31


Script:

#!/bin/bash
# Verify new configuration uses direct writeMethod and CREATE_NEVER disposition
rg -n 'writeMethod\s*->\s*"direct"'
rg -n 'CREATE_NEVER'

Length of output: 217


Script:

#!/bin/bash
# Check for explicit "direct" usage in writeMethod.
grep -n '-> "direct"' cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala

Length of output: 217


Script:

#!/bin/bash
# Confirm direct writeMethod config using PCRE.
grep -n -P "\"writeMethod\"\\s*->\\s*\"direct\"" cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala
# Confirm CREATE_NEVER config.
grep -n -F 'CREATE_NEVER' cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala

Length of output: 348

flink/src/main/scala/org/apache/spark/sql/avro/AvroDeserializationSupport.scala (2)

21-25: Encoder creation clarity.
Method is concise. Ensure AvroDataToCatalyst handles schema changes gracefully.


71-80: Check null returns.
Returning null after errors can cause confusion in downstream processing.
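
One alternative is to surface failures as Option so malformed records drop out explicitly (a sketch with assumed names; decodeAvro stands in for the actual decode step):

import org.apache.spark.sql.Row

def safeDeserialize(bytes: Array[Byte]): Option[Row] =
  try Some(decodeAvro(bytes))
  catch {
    case e: Exception =>
      logger.warn("failed to deserialize record, skipping", e)
      None // callers flatMap over the Option instead of checking for null
  }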

hub/src/main/scala/ai/chronon/hub/handlers/DriftHandler.scala (2)

63-66: Handle empty result more robustly.
headOption.getOrElse(...) returns an empty series. Check if that's correct.


68-79: Offset parsing structure.
Implementation is fine, but consider rejecting unrecognized suffixes more explicitly.
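
A stricter shape for the suffix handling (a sketch; the supported suffixes are an assumption about the handler's inputs):

def parseOffsetMillis(raw: String): Long = raw.takeRight(1) match {
  case "h" => raw.dropRight(1).toLong * 60 * 60 * 1000
  case "d" => raw.dropRight(1).toLong * 24 * 60 * 60 * 1000
  case other =>
    throw new IllegalArgumentException(
      s"unrecognized offset suffix '$other' in '$raw'; expected <n>h or <n>d")
}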

api/py/ai/chronon/cli/compile/compile_context.py (1)

111-112: Hidden files logic bug.
You likely want to skip hidden files if f.startswith('.') is true. This logic is reversed.

-if not f.startswith("."):  # ignore hidden files
-    continue
+if f.startswith("."):  # skip hidden files
+    continue
hub/src/main/scala/ai/chronon/hub/handlers/ConfHandler.scala (3)

14-16: Class initialization looks good
Sufficient logging is used.


53-71: List retrieval
Straightforward approach; no issues found.


76-109: Search logic
Efficient usage of filter; ensure large registries are handled.

flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (3)

28-48: Async wait
Proper concurrency parameters. Looks good.


99-105: Timeout handling
Logs error, increments counter, sets status false. Good minimal approach.


107-131: Multi-put logic
Handles success/failure well; logging is precise.

frontend/src/lib/api/api.ts (14)

2-2: Additional imports
Well-structured references.

Also applies to: 4-8, 13-14


48-50: getJoin
Simple wrapper, no issues.


52-54: getGroupBy
Mirrors getJoin. OK.


56-58: getModel
Similar pattern, no concerns.


60-62: getStagingQuery
Consistent usage of getConf.


64-68: search
Invokes a single endpoint. Good approach.


135-140: getConfList
Uses confType param. Looks correct.


142-144: getJoinList
Retrieve from ConfType.JOIN. Perfect.


146-148: getGroupByList
Works similarly.


150-152: getModelList
No issues.


154-156: getStagingQueryList
Consistency maintained.


158-172: getJoinDrift
Builds URL params well.


174-191: getColumnDrift
Similar structure, no concerns.


193-201: getColumnSummary
Fetches summary data similarly.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (4)

5-11: Imports look appropriate.
No issues noted with these BigQuery-related imports.


17-18: Case class extension is fine.
The constructor parameter bqClient is a clear approach for BigQuery.


29-65: Creation logic seems correct.
Aligned with the external check for table existence, and consistent with the retrieved learnings.


139-142: Unsupported operations are clearly marked.
No further concerns.

flink/src/main/scala/ai/chronon/flink/AvroCodecFn.scala (7)

8-9: New imports are correct.
These support the new output type design.


92-92: Good move to AvroCodecOutput.
Replaces PutRequest nicely.


104-104: flatMap signature updated.
Applies AvroCodecOutput in collector—works well.


116-120: Map-to-AvroCodecOutput logic looks valid.
Key/value bytes are handled confidently.


132-132: TiledAvroCodecFn also updated to AvroCodecOutput.
Consistent approach.


143-143: flatMap for tiles updated.
Matches new output type usage.


155-173: Tile-to-AvroCodecOutput conversion is fine.
Retains debug logging; no issue found.

flink/src/main/scala/ai/chronon/flink/TestFlinkJob.scala (7)

5-7: Import updates support row-based logic.
No issues noted.

Also applies to: 11-12, 15-15, 18-22, 25-26, 28-28, 33-36


49-73: E2EEventSource now yields Row.
Implementation is straightforward and uses parallelism well.


80-80: Time difference calculation is fine.
No changes needed.


85-92: DummyInitializationContext is minimal and functional.
Works for the tests.


95-106: Schema definitions appear correct.
Switch to DataType-based fields is consistent.


108-120: makeSource logic is coherent.
Encodes to Avro and decodes into Spark Rows effectively.


197-212: buildTestFlinkJob approach is consistent.
Builds a Flink pipeline with row-based sources.

frontend/src/lib/components/NavigationBar.svelte (3)

25-30: Imports look good.


52-52: Initialization is correct.


60-69: Merged arrays for consistent typed results.

online/src/main/scala/ai/chronon/online/FetcherBase.scala (4)

47-47: New import for ExecutionContext is valid.


327-364: Filtering out invalid requests improves safety.


613-644: Updated response parsing with fallback logic is clear.


382-382: Confirm usage of .toSeq after this line

Ensure no unexpected map iteration breakage.

✅ Verification successful

Confirmed: .toSeq usage in FetcherBase is intact.
The shell script output indicates that the call remains present in online/src/main/scala/ai/chronon/online/FetcherBase.scala, aligning with similar conversions elsewhere. No unexpected iteration issues were found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Searching suspicious references in the file to confirm usage
rg -A 3 "toSeq" 

Length of output: 47736

api/py/ai/chronon/repo/run.py (6)

33-33: Enum import helps new job types.


135-137: New jar constants for Flink and Dataproc.


139-141: New job type enum broadens streaming support.


419-426: New fields handle Flink streaming params.


430-431: Conditional fetch logic is consistent.


983-984: Extended signature for new streaming params is fine.

frontend/src/routes/stagingqueries/+page.svelte (1)

1-7: LGTM!

Clean and consistent with other route components.

frontend/src/routes/models/+page.svelte (1)

1-7: LGTM!

Consistent implementation.

frontend/src/routes/groupbys/+page.svelte (1)

1-7: LGTM!

Matches the pattern used in other routes.

frontend/src/routes/joins/+page.svelte (1)

15-15: LGTM!

The LogicalNodeTable component is used correctly with the required props.

flink/src/main/scala/ai/chronon/flink/SchemaProvider.scala (1)

8-17: LGTM!

Well-structured abstract class with clear documentation and purpose.

tools/build_rules/dependencies/spark_repository.bzl (1)

21-24: LGTM! Good catch on the version conflict.

The exclusion prevents NoSuchMethodError by avoiding commons-cli 1.2 vs 1.5.0 conflict with Flink runtime.

tools/build_rules/flink/BUILD (2)

1-1: LGTM! Public visibility is appropriate for build rules.


5-19: Verify the main_class hack.

The hack comment needs explanation.

Add a comment explaining why None is used as main_class.

api/src/main/scala/ai/chronon/api/TilingUtils.scala (1)

17-25: LGTM! Thread-local serializers prevent concurrency issues.

Good use of ThreadLocal to handle concurrent serialization safely.
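
The pattern in miniature (object and protocol choice are illustrative, not necessarily what TilingUtils uses):

import org.apache.thrift.{TBase, TSerializer}
import org.apache.thrift.protocol.TBinaryProtocol

object ThriftSerde {
  private val serializer: ThreadLocal[TSerializer] =
    ThreadLocal.withInitial(() => new TSerializer(new TBinaryProtocol.Factory()))

  def serialize(obj: TBase[_, _]): Array[Byte] =
    serializer.get().serialize(obj) // each thread reuses its own instance
}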

flink/src/test/resources/user.avsc (1)

1-59: Well-structured and documented schema.

The Avro schema is comprehensive with appropriate field types and documentation.

frontend/src/lib/api/api.test.ts (1)

33-35: LGTM!

Test cases are well-structured and correctly updated to reflect the API changes.

Also applies to: 50-50, 61-61

frontend/src/lib/types/Entity/Entity.ts (1)

55-56: Track workaround for future removal.

Consider creating a ticket to track the removal of this workaround once the Asana task is resolved.

✅ Verification successful

Reminder: Create a tracking ticket for workaround removal.

  • The workaround in frontend/src/lib/types/Entity/Entity.ts is confirmed.
  • Similar "todo" and workaround comments appear elsewhere.
  • Recommend creating a ticket to track its removal once the Asana task is resolved.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other workarounds
rg -i "workaround|hack|todo|fixme" --type ts

Length of output: 933

api/py/ai/chronon/cli/compile/parse_configs.py (1)

14-20: LGTM!

Well-structured dataclass with clear field definitions.

api/py/ai/chronon/cli/entrypoint.py (2)

27-34: LGTM!

Well-structured CLI commands with clear documentation and appropriate default values.

Also applies to: 36-59, 61-67


10-19: ⚠️ Potential issue

Specify exception type in get_current_branch.

 def get_current_branch():
     try:
         return (
             subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"])
             .decode("utf-8")
             .strip()
         )
-    except:
+    except subprocess.CalledProcessError as e:
+        logger.warning(f"Failed to get branch: {e}")
         return "main"

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

17-17: Do not use bare except

(E722)

flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala (1)

24-27: LGTM!

Clean migration from PutRequest to AvroCodecOutput with consistent test assertions.

Also applies to: 46-49

spark/src/main/scala/ai/chronon/spark/streaming/KafkaStreamBuilder.scala (1)

21-21: LGTM!

Clean addition of TopicChecker import.

frontend/src/routes/joins/[slug]/services/joins.service.ts (2)

2-2: LGTM! Import changes align with the new API structure.

Also applies to: 5-5


44-46: Verify the optional chaining.

The change from models.items to models.models?.find() suggests a potential breaking change in the API response structure.

✅ Verification successful

Optional Chaining Verified
The search confirms only the use of models.models?.find(), matching the updated API structure. No evidence suggests any misuse of optional chaining.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other usages of the models property
rg -A 2 'models\.(items|models)' 

Length of output: 324

flink/BUILD.bazel (2)

15-15: LGTM! Kafka client dependency added.


60-76: LGTM! Binary targets properly configured.

The exclusion of runtime dependencies via deploy_env prevents version conflicts.

api/py/ai/chronon/cli/compile/parse_teams.py (4)

13-38: LGTM! Team loading implementation.

Clear error handling and logging of found teams.


74-111: LGTM! Helper functions for merging.

The merge functions handle None cases properly.


4-4: 🛠️ Refactor suggestion

Remove unused import.

The from_file import is not used in this module.

-from ai.chronon.cli.compile.parse_configs import from_file

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

4-4: ai.chronon.cli.compile.parse_configs.from_file imported but unused

Remove unused import: ai.chronon.cli.compile.parse_configs.from_file

(F401)


40-72: Verify metadata update logic.

The metadata update assumes specific object structure. Ensure all objects passed to this function have the required fields.

✅ Verification successful

Verified metadata update logic.

  • The sole call in compiler.py passes co.obj, and update_metadata enforces that obj.metaData (with its name and team fields) is non-null.
  • No usage contradicts the required structure.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for calls to update_metadata
rg -A 3 'update_metadata\(' 

Length of output: 692

tools/build_rules/prelude_bazel (1)

41-42: LGTM! Flink dependencies added.

The Kafka connector and Avro dependencies align with the changes in BUILD.bazel.

api/py/ai/chronon/cli/compile/compiler.py (1)

111-116: Add error handling for file operations.

File operations should handle IOError and OSError exceptions.

api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (1)

1-79: LGTM! Well-structured tests with good edge case coverage.

Tests handle nulls, special values, and edge cases thoroughly.

flink/src/test/scala/ai/chronon/flink/test/SchemaRegistrySchemaProviderSpec.scala (1)

1-60: LGTM! Comprehensive schema registry tests.

Good coverage of success and failure cases with proper mocking.

spark/src/main/scala/ai/chronon/spark/format/Format.scala (2)

10-10: LGTM!

Logger initialization follows best practices with @transient and lazy val.


77-89: Similar error handling needed here.

Same error handling improvement as suggested for createTable should be applied here.

api/thrift/hub.thrift (1)

126-133: Add validation for branch/version exclusivity.

The comment states that either branch or version should be set, but this isn't enforced. Consider adding validation in the handler.
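
A sketch of that check, reading the comment as requiring exactly one of the two (field names and placement are assumptions about the handler, not confirmed signatures):

def validateConfSelector(branch: Option[String], version: Option[String]): Unit =
  require(
    branch.isDefined ^ version.isDefined,
    "exactly one of `branch` or `version` must be set"
  )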

api/py/ai/chronon/cli/git_utils.py (4)

33-34: Specify exception type.

-        except:
+        except OSError:
            pass
🧰 Tools
🪛 Ruff (0.8.2)

33-33: Do not use bare except

(E722)


73-74: Specify exception type.

-    except:
+    except OSError:
        return None
🧰 Tools
🪛 Ruff (0.8.2)

73-73: Do not use bare except

(E722)


77-128: LGTM!

Good error handling and logging implementation.


130-157: LGTM!

Well-structured implementation with proper error handling.

flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala (4)

20-36: LGTM!

Well-documented with proper equals/hashCode implementation.


47-65: LGTM!

Well-documented with proper equals/hashCode implementation.


71-91: LGTM!

Well-documented with proper equals/hashCode implementation.


96-120: LGTM!

Well-documented with proper equals/hashCode implementation.

flink/src/test/scala/org/apache/spark/sql/avro/AvroDeSerializationSupportSpec.scala (2)

23-83: LGTM!

Well-structured test cases with proper resource cleanup.


86-125: LGTM!

Well-structured helper object with comprehensive test data creation.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (1)

42-131: LGTM!

Formatting changes improve readability.

tools/build_rules/dependencies/maven_repository.bzl (2)

84-85: LGTM: Added Flink Kafka and Avro dependencies.

The additions align with the PR objectives for enhanced Flink support.


104-119: Well-documented Hadoop exclusions.

Clear explanation for excluding Hadoop dependencies to prevent IllegalAccessError.

api/thrift/observability.thrift (3)

141-142: LGTM: Added default drift metric.

Setting JENSEN_SHANNON as default aligns with industry standards per the documentation.


144-162: LGTM: Well-structured JoinDriftRequest and JoinDriftResponse.

Clean interface design with required and optional fields.


157-162: LGTM: Consistent JoinSummaryRequest structure.

Follows the same pattern as JoinDriftRequest.

spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (1)

27-27: LGTM: Added date type conversion support.

Proper handling of Avro date logical types during serialization.

Also applies to: 50-50

online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (3)

98-99: LGTM: Using magic value instead of null.

More robust handling of missing values.


117-133: LGTM: Improved validation in collectDoubles.

Better tracking of valid inputs with sawValidInput flag.


147-155: LGTM: Enhanced handling of special values.

Properly handles NaN and Infinite values by converting to magic values.

flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (1)

31-44: Add null checks for WriteResponse fields.

Validate keyBytes, valueBytes, and tsMillis before access to prevent NPEs.
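
A guard along those lines (the collection name, field nullability, and types are assumptions about WriteResponse):

writeResponses.foreach { resp =>
  require(resp.keyBytes != null, "keyBytes must not be null")
  require(resp.valueBytes != null, "valueBytes must not be null")
  require(resp.tsMillis > 0L, s"unexpected tsMillis: ${resp.tsMillis}")
}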

online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)

87-94: LGTM!

The filtering logic ensures that only relevant keys are processed.

online/src/main/java/ai/chronon/online/JavaFetcher.java (1)

27-27: LGTM!

Clean builder pattern implementation with good example usage.

Also applies to: 39-48, 50-119

hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala (1)

1-296: LGTM!

Comprehensive test coverage with good use of mocking and edge cases.

online/src/main/scala/ai/chronon/online/AvroConversions.scala (1)

20-20: LGTM!

Clean date type support with good documentation.

Also applies to: 29-29, 37-44, 58-63, 121-122

spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)

83-83: LGTM!

Adding java.time.LocalDate to Kryo serializer is necessary for date serialization.

spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (1)

114-115: LGTM!

Improved null safety using Option to handle potential null values from getPercentileDriftSeries.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala (3)

69-70: LGTM!

Added savepoint support for Flink jobs.


193-198: LGTM!

Well-organized argument prefixes improve code readability.


246-257: LGTM!

Clean pattern matching for job type handling.

service_commons/src/main/java/ai/chronon/service/RouteHandlerWrapper.java (2)

90-103: LGTM! Improved error handling and serialization.

The new implementation provides better error handling and uses a consistent serialization approach with TSimpleJSONProtocol.


5-5: LGTM!

Import required for JSON serialization.

online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (3)

229-240: LGTM! Good test coverage for the happy path.

Test verifies basic fetch functionality with proper assertions.


242-255: LGTM! Good null handling test.

Test ensures proper behavior with null keys.


257-270: LGTM! Good error case coverage.

Test verifies proper handling of missing keys.

online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (3)

116-116: LGTM! Consistent null handling.

Changes standardize null handling using Constants.magicNull values.

Also applies to: 254-255, 257-258, 313-313


317-334: LGTM! Good edge case coverage.

Test verifies handling of Long.MAX_VALUE and magicNullLong values.


352-373: LGTM! Comprehensive null handling tests.

Test covers various combinations of null and non-null fields.

spark/src/main/scala/ai/chronon/spark/Extensions.scala (1)

74-74: LGTM! Better method naming and caching.

Changes improve clarity with better method naming and add efficient caching with lazy val.

Also applies to: 111-111

hub/src/main/scala/ai/chronon/hub/handlers/TimeSeriesHandler.scala (1)

122-130: LGTM! Improved null handling using Option.

The changes enhance safety by using Option and map methods instead of direct null checks.

api/thrift/api.thrift (2)

8-8: LGTM! Updated thrift generation command.

The command path has been updated to use relative paths.


441-457: LGTM! Well-structured data types for configuration.

The new EnvironmentVariables and Team structs provide a clean way to manage environment variables and team information.

spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala (4)

26-26: LGTM! Consolidated Spark SQL imports.

Using a wildcard import for Spark SQL classes improves readability.


34-34: LGTM! Added TestRecord case class.

Simple and clean test data structure.


487-500: LGTM! Added test for empty DataFrame handling.

Good coverage for edge case handling.


480-481: LGTM! Standardized table creation with file format.

Consistent use of PARQUET format for table creation.

Also applies to: 521-522

spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)

386-392: LGTM! Added SQL string escaping.

Properly escaping single quotes prevents SQL injection vulnerabilities.

api/py/ai/chronon/cli/compile/validator.py (4)

22-32: LGTM!

The new imports are well-organized and necessary for the added functionality.


183-213: LGTM!

The new helper functions are well-structured and encapsulate specific validation logic.


220-242: LGTM!

The constructor changes improve parameter organization and initialization.


474-479: LGTM!

The error messages are clear and well-formatted.

spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala (1)

292-293: Verify message bus compatibility.

The change from topicType to messageBus might affect existing topic configurations.

✅ Verification successful

Message bus usage verified.

  • No topicType found; all code now uses messageBus.
  • Consistent usage across modules (spark, flink, online).
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining uses of topicType
rg -A 2 "topicType"

# Search for messageBus usage patterns
rg -A 2 "messageBus"

Length of output: 1905

spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)

236-238: LGTM!

The variable renaming from leftDf to statsDf improves code clarity.

online/src/main/scala/ai/chronon/online/Fetcher.scala (1)

99-107: LGTM!

The addition of executionContextOverride provides good flexibility for execution context management.

spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1)

188-188: LGTM! Appropriate logging level change.

Changed logging level from error to info since these are expected states, not errors.

Also applies to: 197-197

spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)

299-299: LGTM! Better method name.

Changed from timeRange to calculateTimeRange for better clarity.

spark/src/main/scala/ai/chronon/spark/Driver.scala (1)

630-632: LGTM! Good refactoring.

Simplified code by delegating to FetcherMain and reusing FetcherArgs.

.github/workflows/test_frontend.yaml (1)

39-39: LGTM! Action version bump.

Updated cache-apt-pkgs-action to latest version v1.4.3.

api/py/test/sample/teams.json (2)

18-18: GCP Cluster Name Updated
GCP_DATAPROC_CLUSTER_NAME is now set to "zipline-canary-cluster".


64-64: Quickstart Namespace Updated
Namespace changed to "canary-443022.data" to match the new project structure.

distribution/build_and_upload_gcp_artifacts.sh (3)

44-45: New Build Steps Added
Added sbt commands for Flink and Service assemblies.


49-50: Jar Path Assignments
FLINK_JAR and SERVICE_JAR variables are assigned; verify the paths align with your build outputs.


62-70: Jar Existence Checks
Checks for SERVICE_JAR and FLINK_JAR help catch build failures early.

distribution/run_zipline_quickstart.sh (1)

20-27: Git Config Checkout
Git clone and branch checkout steps look correct; ensure SSH key access is properly configured.

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

frontend/src/routes/joins/+page.server.ts (1)

1-4: LGTM! Clean refactor to use centralized config loader.

Consistent with other route handlers.

frontend/src/routes/models/+page.server.ts (1)

1-4: LGTM! Follows the established pattern.

Identical to other route handlers.

frontend/src/routes/groupbys/+page.server.ts (1)

1-4: LGTM! Maintains consistency with other routes.

Same implementation as other handlers.

frontend/src/routes/stagingqueries/+page.server.ts (1)

1-4: LGTM! Completes the unified approach.

Matches the pattern used in other routes.

Let's verify the implementation of the shared loadConfList function:

✅ Verification successful

LoadConfList Implementation Verified
The implementation of loadConfList in frontend/src/lib/server/conf-loader.ts matches the expected unified approach. The function correctly extracts the pathname, finds the corresponding entity, handles missing matches, wraps the API call in a try/catch, and returns the required structure.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of loadConfList
ast-grep --pattern 'export function loadConfList($_) {
  $$$
}'

Length of output: 2103

Comment on lines 77 to 78
val result = cu.performSql(decoder.apply(dks).asInstanceOf[Array[Any]])
val resultJson = gson.toJson(result.head)

⚠️ Potential issue

Add null safety checks.

Direct use of .head could throw if empty. Consider using .headOption or handling empty results.

-    val result = cu.performSql(decoder.apply(dks).asInstanceOf[Array[Any]])
-    val resultJson = gson.toJson(result.head)
+    val result = cu.performSql(decoder.apply(dks).asInstanceOf[Array[Any]])
+    val resultJson = result.headOption.map(gson.toJson).getOrElse("[]")

Comment on lines 19 to 22
val resList = cu.performSql(CommonScalarsRow)
assertEquals(resList.size, 1)
val resMap = resList.head
assertEquals(resMap.size, 2)

💡 Codebase verification

Empty results are not handled in performSql implementations

  • The CatalystUtil.scala implementations of performSql don't include an empty-result check.
  • The test in CatalystUtilHiveUDFTest.scala assumes a non-empty result by calling head, which can lead to runtime errors.
🔗 Analysis chain

Verify error handling for empty results.

The code assumes the list will always have at least one element.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for error handling in performSql implementation
ast-grep --pattern 'def performSql($$$) {
  $$$
  if ($results.isEmpty) {
    $$$
  }
  $$$
}'

Length of output: 96


Script:

#!/bin/bash
# Show context of performSql definition.
rg -A 12 "def performSql\(" --no-heading

echo "-----"

# Look for any empty results error handling
rg "if\s*\(.*\.isEmpty\)" --no-heading -n

Length of output: 13789

Comment on lines 38 to 68
parsedSchema.schemaType() match {
  case AvroSchema.TYPE =>
    val schema = parsedSchema.asInstanceOf[AvroSchema]
    AvroDeserializationSupport.build(topicInfo.name, schema.canonicalString(), schemaRegistryWireFormat = true)
  case _ => throw new IllegalArgumentException(s"Unsupported schema type: ${parsedSchema.schemaType()}")

🛠️ Refactor suggestion

Plan for more schema types.
This code currently only handles Avro. If support for other formats is needed later, consider a pluggable design.

Comment on lines 36 to 41
def getPartitions(topic: String, bootstrap: String, additionalProps: Map[String, String] = Map.empty): Int = {
  val props = mapToJavaProperties(additionalProps ++ Map(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG -> bootstrap))
  val adminClient = AdminClient.create(props)
  val topicDescription = adminClient.describeTopics(util.Arrays.asList(topic)).values().get(topic);
  topicDescription.get().partitions().size()
}
Contributor

🛠️ Refactor suggestion

Close admin client.
Ensure the adminClient is closed to free resources.

 def getPartitions(topic: String, bootstrap: String, additionalProps: Map[String, String] = Map.empty): Int = {
   val props = mapToJavaProperties(additionalProps ++ Map(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG -> bootstrap))
   val adminClient = AdminClient.create(props)
   val topicDescription = adminClient.describeTopics(util.Arrays.asList(topic)).values().get(topic)
   val partitions = topicDescription.get().partitions().size()
+  adminClient.close()
   partitions
 }

Comment on lines 63 to 64
executionContextOverride: ExecutionContext = null)
extends MetadataStore(kvStore, metaDataSet, timeoutMillis, executionContextOverride)
Contributor

⚠️ Potential issue

Avoid null ExecutionContext.

Use a default or require one:

-executionContextOverride: ExecutionContext = null)
+executionContextOverride: ExecutionContext = ExecutionContext.global)

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 50 to 75
def createTable(df: DataFrame,
                tableName: String,
                partitionColumns: Seq[String],
                tableProperties: Map[String, String],
                fileFormat: String): (String => Unit) => Unit = {

  def inner(df: DataFrame,
            tableName: String,
            partitionColumns: Seq[String],
            tableProperties: Map[String, String],
            fileFormat: String)(sqlEvaluator: String => Unit) = {
    val creationSql =
      createTableSql(tableName,
                     df.schema,
                     partitionColumns,
                     tableProperties,
                     fileFormatString(fileFormat),
                     createTableTypeString)

    sqlEvaluator(creationSql)
  }

  inner(df, tableName, partitionColumns, tableProperties, fileFormat)
}
Contributor

🛠️ Refactor suggestion

Add error handling for SQL execution.

The SQL execution could fail. Consider wrapping the sqlEvaluator call in a Try block.

 def inner(df: DataFrame,
           tableName: String,
           partitionColumns: Seq[String],
           tableProperties: Map[String, String],
           fileFormat: String)(sqlEvaluator: String => Unit) = {
   val creationSql =
     createTableSql(tableName,
                    df.schema,
                    partitionColumns,
                    tableProperties,
                    fileFormatString(fileFormat),
                    createTableTypeString)

-  sqlEvaluator(creationSql)
+  Try(sqlEvaluator(creationSql)) match {
+    case Success(_) => logger.info(s"Successfully created table $tableName")
+    case Failure(e) => 
+      logger.error(s"Failed to create table $tableName", e)
+      throw e
+  }
 }

Comment on lines 129 to 138
val keyMapList =
  if (args.keyJson.isDefined) {
    Try(readMapList(args.keyJson())).toOption.getOrElse(Seq(readMap(args.keyJson())))
  } else {
    logger.info(s"Reading requests from ${args.keyJsonFile()}")
    val file = Source.fromFile(args.keyJsonFile())
    val mapList = file.getLines().map(json => readMap(json)).toList
    file.close()
    mapList
  }
Contributor

⚠️ Potential issue

Add input validation for JSON files.

Validate JSON file path and content before processing.

 val keyMapList =
   if (args.keyJson.isDefined) {
+    validateJson(args.keyJson())
     Try(readMapList(args.keyJson())).toOption.getOrElse(Seq(readMap(args.keyJson())))
   } else {
     logger.info(s"Reading requests from ${args.keyJsonFile()}")
+    validateFilePath(args.keyJsonFile())
     val file = Source.fromFile(args.keyJsonFile())
     val mapList = file.getLines().map(json => readMap(json)).toList
     file.close()
     mapList
   }

Comment on lines 81 to 88
def impl(props: Map[String, String]): Api = {
  val urls = Array(new File(onlineJar()).toURI.toURL)
  val cl = ScalaClassLoader.fromURLs(urls, this.getClass.getClassLoader)
  val cls = cl.loadClass(onlineClass())
  val constructor = cls.getConstructors.apply(0)
  val onlineImpl = constructor.newInstance(props)
  onlineImpl.asInstanceOf[Api]
}
Contributor

⚠️ Potential issue

Implement security measures for class loading.

Current implementation has security risks:

  • Unrestricted class loading
  • No validation of jar path

Add security checks:

 def impl(props: Map[String, String]): Api = {
+  // Validate jar path is in allowed directory
+  val allowedPath = new File("/path/to/allowed/jars")
+  val jarFile = new File(onlineJar())
+  if (!jarFile.getCanonicalPath.startsWith(allowedPath.getCanonicalPath)) {
+    throw new SecurityException("Invalid jar location")
+  }
   val urls = Array(new File(onlineJar()).toURI.toURL)
   val cl = ScalaClassLoader.fromURLs(urls, this.getClass.getClassLoader)
+  // Validate class name against allowlist
+  if (!allowedClasses.contains(onlineClass())) {
+    throw new SecurityException("Invalid class")
+  }
   val cls = cl.loadClass(onlineClass())
   val constructor = cls.getConstructors.apply(0)
   val onlineImpl = constructor.newInstance(props)
   onlineImpl.asInstanceOf[Api]
 }

Comment on lines 44 to 47
class MetadataStore(kvStore: KVStore,
                    val dataset: String = MetadataDataset,
                    timeoutMillis: Long,
                    executionContextOverride: ExecutionContext = null) {
Contributor

🛠️ Refactor suggestion

Use Option instead of null default value.

Replace executionContextOverride: ExecutionContext = null with executionContextOverride: Option[ExecutionContext] = None.

-class MetadataStore(kvStore: KVStore,
-                    val dataset: String = MetadataDataset,
-                    timeoutMillis: Long,
-                    executionContextOverride: ExecutionContext = null) {
+class MetadataStore(kvStore: KVStore,
+                    val dataset: String = MetadataDataset,
+                    timeoutMillis: Long,
+                    executionContextOverride: Option[ExecutionContext] = None) {

Comment on lines 62 to 63
implicit val executionContext: ExecutionContext =
  Option(executionContextOverride).getOrElse(FlexibleExecutionContext.buildExecutionContext)
Contributor

🛠️ Refactor suggestion

Update Option handling.

Update to use idiomatic Option handling.

-  implicit val executionContext: ExecutionContext =
-    Option(executionContextOverride).getOrElse(FlexibleExecutionContext.buildExecutionContext)
+  implicit val executionContext: ExecutionContext =
+    executionContextOverride.getOrElse(FlexibleExecutionContext.buildExecutionContext)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (14)
api/py/ai/chronon/cli/compile/serializer.py (4)

17-20: Remove unused import.

TJSONProtocolFactory is imported but never used.

-from thrift.protocol.TJSONProtocol import (
-    TSimpleJSONProtocolFactory,
-    TJSONProtocolFactory,
-)
+from thrift.protocol.TJSONProtocol import TSimpleJSONProtocolFactory
🧰 Tools
🪛 Ruff (0.8.2)

19-19: thrift.protocol.TJSONProtocol.TJSONProtocolFactory imported but unused

Remove unused import: thrift.protocol.TJSONProtocol.TJSONProtocolFactory

(F401)


78-82: Optimize integer type handling.

Combine integer type checks for better readability.

-        elif ttype == TType.I64:
-            ret = int(val)
-        elif ttype == TType.I32 or ttype == TType.I16 or ttype == TType.BYTE:
-            ret = int(val)
+        elif ttype in (TType.I64, TType.I32, TType.I16, TType.BYTE):
+            ret = int(val)
🧰 Tools
🪛 Ruff (0.8.2)

78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)


41-86: Add input validation.

Add type checking for input values to prevent runtime errors.

     def _convert(self, val, ttype, ttype_info):
+        if val is None:
+            return None
         if ttype == TType.STRUCT:
🧰 Tools
🪛 Ruff (0.8.2)

78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)


93-99: Add docstring explaining binary format.

Document the binary format specification for better maintainability.

 def json2binary(json_str, thrift_class):
+    """Convert JSON to Thrift binary format.
+    
+    Args:
+        json_str: JSON string to convert
+        thrift_class: Target Thrift class
+    Returns:
+        bytes: Thrift binary format using TBinaryProtocolAccelerated
+    """
     thrift = json2thrift(json_str, thrift_class)
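
For reference, a quick usage sketch; it assumes GroupBy is importable from the thrift-generated ai.chronon.api.ttypes module and that this serializer module is importable as shown (the conf path is illustrative):

from ai.chronon.api.ttypes import GroupBy
from ai.chronon.cli.compile.serializer import json2binary

# Round-trip a compiled JSON conf into compact thrift binary.
with open("production/group_bys/sample_team/sample_group_by.v1") as f:  # illustrative path
    payload = json2binary(f.read(), GroupBy)
print(f"serialized {len(payload)} bytes")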
api/py/ai/chronon/cli/compile/compile_context.py (2)

2-2: Remove unused error import.
Line 2 imports error from logging but never uses it. Trim unnecessary imports to keep the code clean.

Apply this diff:

-from logging import error
🧰 Tools
🪛 Ruff (0.8.2)

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Rename unused loop variable.
Rename sub_dirs to _sub_dirs to indicate it’s unused.

-for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
🧰 Tools
🪛 Ruff (0.8.2)

107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/fill_templates.py (1)

4-4: Remove unused import.
The import parse_teams is never referenced. It’s best to remove it.

-import ai.chronon.cli.compile.parse_teams as teams
🧰 Tools
🪛 Ruff (0.8.2)

4-4: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)

api/py/ai/chronon/cli/compile/parse_configs.py (1)

4-4: Remove unused imports.

-import logging
 from typing import Any, Dict, List, Tuple
-import ai.chronon.cli.compile.parse_teams as teams

Also applies to: 6-6, 9-9

🧰 Tools
🪛 Ruff (0.8.2)

4-4: logging imported but unused

Remove unused import: logging

(F401)

api/py/ai/chronon/cli/entrypoint.py (2)

2-2: Remove unused imports.

-import sys
-from enum import Enum

Also applies to: 5-5

🧰 Tools
🪛 Ruff (0.8.2)

2-2: sys imported but unused

Remove unused import: sys

(F401)


49-53: Consider using explicit flags instead of scope.

Replace scope with explicit run_upstream and run_downstream flags for clarity.
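
A minimal sketch of the flag-based interface with click; the flag names come from this suggestion and the command body is illustrative:

import click

@click.command()
@click.option("--run-upstream/--no-run-upstream", default=True,
              help="Also backfill configs this conf depends on.")
@click.option("--run-downstream/--no-run-downstream", default=False,
              help="Also backfill configs that depend on this conf.")
def backfill(run_upstream: bool, run_downstream: bool):
    # Two independent booleans replace the three-way scope choice,
    # so upstream and downstream runs can be toggled separately.
    click.echo(f"upstream={run_upstream}, downstream={run_downstream}")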

api/py/test/test_git_utils.py (1)

5-5: Remove unused import.

-from pathlib import Path
🧰 Tools
🪛 Ruff (0.8.2)

5-5: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

api/thrift/api.thrift (1)

254-254: Consider documenting the deprecation reason.

Add a comment explaining why the env field is deprecated.

api/py/ai/chronon/cli/compile/validator.py (1)

421-428: Consider simplifying nested if statements.

Combine the nested if statements into a single condition.

-        if join.metaData.online:
-            if offline_included_group_bys:
+        if join.metaData.online and offline_included_group_bys:
🧰 Tools
🪛 Ruff (0.8.2)

421-422: Use a single if statement instead of nested if statements

(SIM102)

.github/workflows/test_scala_no_spark.yaml (1)

45-47: Consider removing unused SBT configuration.

The SBT_OPTS environment variable is no longer needed with Bazel.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 30aed3c and dc233e9.

📒 Files selected for processing (26)
  • .github/image/Dockerfile (2 hunks)
  • .github/workflows/test_scala_no_spark.yaml (1 hunks)
  • api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
  • api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
  • api/py/ai/chronon/cli/compile/fill_templates.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_configs.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_teams.py (1 hunks)
  • api/py/ai/chronon/cli/compile/serializer.py (1 hunks)
  • api/py/ai/chronon/cli/compile/validator.py (12 hunks)
  • api/py/ai/chronon/cli/entrypoint.py (1 hunks)
  • api/py/ai/chronon/cli/git_utils.py (1 hunks)
  • api/py/ai/chronon/cli/logger.py (1 hunks)
  • api/py/ai/chronon/cli/logging.py (1 hunks)
  • api/py/ai/chronon/group_by.py (0 hunks)
  • api/py/ai/chronon/join.py (0 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/test/sample/teams.py (1 hunks)
  • api/py/test/test_git_utils.py (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (2 hunks)
  • api/thrift/api.thrift (4 hunks)
  • api/thrift/common.thrift (1 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala (8 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala (9 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (12 hunks)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1 hunks)
  • tools/build_rules/thrift/thrift.bzl (0 hunks)
💤 Files with no reviewable changes (3)
  • api/py/ai/chronon/join.py
  • api/py/ai/chronon/group_by.py
  • tools/build_rules/thrift/thrift.bzl
✅ Files skipped from review due to trivial changes (2)
  • hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala
  • hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala
🚧 Files skipped from review as they are similar to previous changes (4)
  • api/thrift/common.thrift
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
  • api/py/ai/chronon/cli/logger.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/test/sample/teams.py

8-8: Undefined name EnvironmentVariables

(F821)


45-45: Undefined name EnvironmentVariables

(F821)


66-66: Undefined name EnvironmentVariables

(F821)

api/py/ai/chronon/cli/compile/parse_teams.py

4-4: ai.chronon.cli.compile.parse_configs.from_file imported but unused

Remove unused import: ai.chronon.cli.compile.parse_configs.from_file

(F401)

api/py/ai/chronon/cli/git_utils.py

4-4: typing.Set imported but unused

Remove unused import: typing.Set

(F401)


33-33: Do not use bare except

(E722)


73-73: Do not use bare except

(E722)

api/py/ai/chronon/cli/compile/compile_context.py

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/validator.py

421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/compiler.py

20-20: Undefined name console

(F821)


25-25: Undefined name console

(F821)


28-28: Undefined name console

(F821)


28-28: f-string without any placeholders

Remove extraneous f prefix

(F541)


48-48: Local variable status is assigned to but never used

Remove assignment to unused variable status

(F841)

api/py/ai/chronon/cli/compile/fill_templates.py

4-4: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)

api/py/ai/chronon/cli/compile/parse_configs.py

4-4: logging imported but unused

Remove unused import: logging

(F401)


6-6: typing.Dict imported but unused

Remove unused import

(F401)


6-6: typing.Tuple imported but unused

Remove unused import

(F401)


9-9: ai.chronon.cli.compile.parse_teams imported but unused

Remove unused import: ai.chronon.cli.compile.parse_teams

(F401)

api/py/ai/chronon/cli/compile/serializer.py

19-19: thrift.protocol.TJSONProtocol.TJSONProtocolFactory imported but unused

Remove unused import: thrift.protocol.TJSONProtocol.TJSONProtocolFactory

(F401)


78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)

api/py/ai/chronon/cli/entrypoint.py

2-2: sys imported but unused

Remove unused import: sys

(F401)


5-5: enum.Enum imported but unused

Remove unused import: enum.Enum

(F401)


17-17: Do not use bare except

(E722)

api/py/test/test_git_utils.py

5-5: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: build-and-push
🔇 Additional comments (40)
api/py/ai/chronon/cli/compile/compile_context.py (1)

111-112: ⚠️ Potential issue

Fix inverted logic for ignoring hidden files.
Currently, the code processes only hidden files. Correct it to skip hidden files.

-if not f.startswith("."):  # ignore hidden files - such as .DS_Store
+if f.startswith("."):  # ignore hidden files - such as .DS_Store
     continue

Likely invalid or redundant comment.

api/py/ai/chronon/cli/logging.py (1)

37-39: Prevent duplicate log handlers.
Clear existing handlers when creating a logger to avoid multiple handler attachments.

 logger = logging.getLogger(__name__)
+logger.handlers.clear()
 logger.addHandler(handler)
 logger.setLevel(log_level)
api/py/ai/chronon/cli/compile/parse_configs.py (2)

14-20: LGTM! Well-structured dataclass.

The CompiledObj dataclass effectively encapsulates compilation results with clear field definitions.


54-55: ⚠️ Potential issue

Add path validation.

 module_name = rel_path_without_extension.replace("/", ".")
+if not all(part.isidentifier() for part in module_name.split(".")):
+    raise ValueError(f"Invalid module name: {module_name}")
+if ".." in module_name:
+    raise ValueError(f"Path traversal detected in: {module_name}")
 conf_type, team_name, rest = module_name.split(".", 2)

Likely invalid or redundant comment.

api/py/ai/chronon/cli/entrypoint.py (1)

17-18: ⚠️ Potential issue

Specify exception type.

-    except:
+    except subprocess.CalledProcessError as e:
+        logger.warning(f"Failed to get branch: {e}")
         return "main"

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

17-17: Do not use bare except

(E722)

api/py/test/sample/teams.py (1)

1-1: ⚠️ Potential issue

Add missing import.

-from ai.chronon.api.ttypes import Team
+from ai.chronon.api.ttypes import Team, EnvironmentVariables

Likely invalid or redundant comment.

api/py/test/test_git_utils.py (2)

13-24: LGTM! Well-structured fixture.

The git_repo fixture properly handles setup and cleanup.


26-83: LGTM! Comprehensive test coverage.

Test cases thoroughly verify Git utility functions with proper assertions.

api/py/ai/chronon/cli/compile/parse_teams.py (5)

13-37: LGTM!

The function correctly loads and validates team configurations.


40-71: LGTM!

The function correctly updates and validates metadata.


74-90: LGTM!

The function correctly merges dictionaries with proper override behavior.


92-111: LGTM!

The function correctly merges environment variables.


4-4: 🛠️ Refactor suggestion

Remove unused import.

-from ai.chronon.cli.compile.parse_configs import from_file

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

4-4: ai.chronon.cli.compile.parse_configs.from_file imported but unused

Remove unused import: ai.chronon.cli.compile.parse_configs.from_file

(F401)

api/py/ai/chronon/cli/compile/compiler.py (1)

86-88: ⚠️ Potential issue

Fix success ratio calculation.

The denominator should use compiled_objects instead of compile_result.obj_dict.

-                f"{len(compile_result.obj_dict)}/{len(compile_result.obj_dict)}"
+                f"{len(compile_result.obj_dict)}/{len(compiled_objects)}"

Likely invalid or redundant comment.

api/py/ai/chronon/cli/git_utils.py (6)

44-59: LGTM!

The function correctly retrieves and validates the merge base.


61-68: LGTM!

The function correctly retrieves file content from a specific commit.


77-127: LGTM!

The function correctly retrieves and validates changed files.


130-157: LGTM!

The function correctly identifies and validates changes since fork point.


33-34: 🛠️ Refactor suggestion

Specify exception type.

-        except:
+        except (OSError, IOError):
             pass

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

33-33: Do not use bare except

(E722)


73-74: 🛠️ Refactor suggestion

Specify exception type.

-    except:
+    except (OSError, IOError):
         return None

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

73-73: Do not use bare except

(E722)

api/py/ai/chronon/repo/compile.py (2)

225-277: LGTM!

The function correctly validates and writes objects with proper error handling.


279-291: LGTM!

The function correctly serializes and writes objects as JSON.

online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (4)

58-61: LGTM! Array formatting improvements.

The consistent array formatting enhances readability.

Also applies to: 85-88, 110-114, 146-149, 220-223, 248-252, 288-291, 306-309, 327-331, 342-346, 365-369


317-334: LGTM! Good test coverage for edge cases.

Test case for Long.MAX_VALUE and magicNullLong values improves robustness.


336-350: LGTM! Good test coverage for null values.

Test case for handling all null Long values ensures consistent behavior.


352-373: LGTM! Good test coverage for mixed values.

Test case for mixed null and non-null Long fields improves completeness.

api/thrift/api.thrift (2)

441-446: LGTM! Well-structured environment variables.

Clean separation of environment variables by execution mode.


448-457: LGTM! Good team configuration structure.

Team struct provides essential fields for team management.

api/py/ai/chronon/cli/compile/validator.py (3)

183-191: LGTM! Good encapsulation of topic checks.

Clean helper function for checking topics in sources.


193-195: LGTM! Clear group-by topic check.

Simple and effective check for topics in group-by sources.


197-213: LGTM! Well-structured window check.

Clear logic for checking hourly windows in group-by aggregations.

.github/workflows/test_scala_no_spark.yaml (1)

51-51: LGTM! Clean transition to Bazel.

Consistent use of Bazel test commands across all modules.

Also applies to: 55-55, 59-59, 63-63, 67-67, 71-71, 75-75

.github/image/Dockerfile (8)

1-1: Base image update.
Switched from Alpine to Ubuntu.


6-6: Updated install command.
Using apt to install tools on Ubuntu.


22-34: Thrift dependencies.
Updated package names (e.g., build-essential, libboost-dev).


35-42: Thrift build.
Build commands and cleanup for Thrift look sound.


44-57: Cleanup commands.
Uses apt purge and apt autoremove appropriately.


60-60: Python packages install.
Installs testing/linting tools with the system break flag.


68-70: ARM Bazelisk setup.
Downloads Bazelisk for ARM and sets proper permissions.


71-73: JVM symlink.
Installs build-essential and creates JVM symlink for ARM.

Comment on lines 19 to 28
# Create a spinner with custom text
with console.status("[bold green]Loading...", spinner="dots") as status:

    # Simulate some work
    for i in range(3):
        time.sleep(1)
        console.log(f"{i + 1}. Completed task")

    time.sleep(1)
    console.log(f"Done!")
Contributor

⚠️ Potential issue

Move spinner implementation after console initialization.

The console variable is used before it's imported and initialized. Move this code block after line 46.
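
A minimal sketch of the intended ordering, assuming the console comes from the rich library (which the status/spinner/markup API suggests):

import time
from rich.console import Console

console = Console()  # initialize before any console.status / console.log call

with console.status("[bold green]Loading...", spinner="dots"):
    for i in range(3):
        time.sleep(1)
        console.log(f"{i + 1}. Completed task")
console.log("Done!")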

🧰 Tools
🪛 Ruff (0.8.2)

20-20: Undefined name console

(F821)


25-25: Undefined name console

(F821)


28-28: Undefined name console

(F821)


28-28: f-string without any placeholders

Remove extraneous f prefix

(F541)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
.github/image/Dockerfile (1)

71-73: Redundant Environment Setup
Reinstalling build-essential and re-linking Java may be redundant; double-check that both steps are still needed.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between dc233e9 and 8c8c93c.

📒 Files selected for processing (1)
  • .github/image/Dockerfile (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (6)
.github/image/Dockerfile (6)

1-1: Base Image Update
Using Ubuntu as the base is clear.


6-6: APT Packages Installed
APT command is correct for Ubuntu.


22-34: Thrift Dependencies Setup
APT installs required build deps; verify the package list meets build needs.


35-42: Thrift Build Process
Thrift is built from source with minimal flags. Confirm these options suit requirements.


44-57: Cleanup Commands
APT purge, autoremove, and list removal are efficient. Ensure no runtime libs are removed.


60-60: Python Package Installation
The pip install command with --break-system-packages is acceptable.

Comment on lines 17 to 18
RUN curl -fsSL "https://github.com/bazelbuild/bazelisk/releases/download/v1.18.0/bazelisk-linux-aarch64" -o /usr/local/bin/bazel
ENV PATH="/usr/local/bin/bazel:${PATH}"
Contributor

🛠️ Refactor suggestion

Bazel Installation Conflict
Two installations (v1.18.0 via curl and v1.19.0 via ADD) may conflict. Consolidate to one.

Comment on lines 68 to 70
RUN chmod 755 /usr/local/bin/bazel

RUN apt update && apt install -y build-essential
Contributor

🛠️ Refactor suggestion

Bazelisk Redundancy
Using ADD for Bazelisk v1.19.0 after initial curl download causes redundancy.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/image/Dockerfile (1)

1-1: Pin Ubuntu Version.
Using ubuntu:latest risks unpredictable builds. Pin to a specific version (e.g., ubuntu:22.04).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 8c8c93c and 136dfbc.

📒 Files selected for processing (1)
  • .github/image/Dockerfile (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: table_utils_delta_format_spark_tests
  • GitHub Check: mutation_spark_tests
  • GitHub Check: join_spark_tests
  • GitHub Check: fetcher_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: build-and-push
🔇 Additional comments (7)
.github/image/Dockerfile (7)

6-6: Clear package installation.
Command is concise; verify all packages are needed.


18-18: Bazel PATH update.
Ensure Bazel binary is correctly prioritized. Already flagged in past reviews.


22-34: Thrift dependencies setup.
Apt install block is clear and complete.


35-42: Thrift build & cleanup.
Tar extraction and removal of temp files are well-handled.


44-57: Post-build cleanup.
Proper use of apt purge, autoremove, and list cleanup.


60-60: Python tools installation.
Pip installs essential testing and linting tools correctly.


68-70: Reinstall build-essential & JAVA symlink.
Reinstalling build-essential after cleanup and creating the JAVA symlink for ARM are sound.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
.github/image/Dockerfile (3)

1-1: Base Image Update.
Consider pinning a specific Ubuntu version instead of latest for reproducibility.


6-6: APT Installation.
APT update/install looks fine; consider cleaning the cache to reduce image size.


18-18: PATH Update.
Multiple ENV PATH modifications—verify the ordering is as intended.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 136dfbc and a4aac43.

📒 Files selected for processing (2)
  • .github/image/Dockerfile (2 hunks)
  • .github/workflows/test_scala_spark.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build-and-push
🔇 Additional comments (6)
.github/workflows/test_scala_spark.yaml (1)

40-40: Bazel test command is concise.
The Bazel command looks correct if the target exists.

.github/image/Dockerfile (5)

22-34: Thrift Dependencies.
APT dependency block is clear; consider non-interactive options if needed.


35-43: Thrift Build & Cleanup.
Build and cleanup commands look solid. Confirm the configure flags meet requirements.


44-57: APT Purge & Cleanup.
Properly removes build dependencies. Reinstalling build-essential later appears intentional.


60-60: Pip Installation.
Using --break-system-packages is acceptable; ensure compatibility with your environment.


68-71: Post-Cleanup Setup.
Reinstalling build-essential and setting up the Java symlink are clear—verify their necessity post-purge.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
.github/workflows/test_scala_spark.yaml (1)

25-25: Invalid runner label.
Change ubuntu-8_cores-32_gb to a supported label (e.g. ubuntu-latest-8-cores).

🧰 Tools
🪛 actionlint (1.7.4)

25-25: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🧹 Nitpick comments (4)
.github/workflows/test_scala_spark.yaml (1)

46-46: Remove trailing space.
Trim extra whitespace at the end of the --google_credentials line.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 46-46: trailing spaces

(trailing-spaces)

.github/image/Dockerfile (2)

59-59: Consider version pinning.
Pin versions for pytest, tox, and flake8 for better reproducibility.


68-69: Verify JAVA symlink.
Ensure the symlink for JAVA_HOME works reliably across environments.

.github/workflows/test_scala_no_spark.yaml (1)

53-53: Remove trailing spaces.
Trim all trailing whitespace in the Bazel command lines.

Also applies to: 60-60, 67-67, 74-74, 81-81, 88-88, 95-95

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 53-53: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between a4aac43 and dbf7cc1.

📒 Files selected for processing (9)
  • .github/image/Dockerfile (3 hunks)
  • .github/workflows/build_and_push_docker.yaml (1 hunks)
  • .github/workflows/test_scala_no_spark.yaml (1 hunks)
  • .github/workflows/test_scala_spark.yaml (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala (1 hunks)
  • tools/build_rules/thrift/thrift.bzl (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala
🚧 Files skipped from review as they are similar to previous changes (2)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
  • tools/build_rules/thrift/thrift.bzl
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/build_and_push_docker.yaml

15-15: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

🪛 YAMLlint (1.35.1)
.github/workflows/test_scala_spark.yaml

[error] 46-46: trailing spaces

(trailing-spaces)

.github/workflows/test_scala_no_spark.yaml

[error] 53-53: trailing spaces

(trailing-spaces)


[error] 60-60: trailing spaces

(trailing-spaces)


[error] 67-67: trailing spaces

(trailing-spaces)


[error] 74-74: trailing spaces

(trailing-spaces)


[error] 81-81: trailing spaces

(trailing-spaces)


[error] 88-88: trailing spaces

(trailing-spaces)


[error] 95-95: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: build-and-push

jobs:
  build-and-push:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-8_cores-32_gb
Contributor

⚠️ Potential issue

Invalid runner label.
Label ubuntu-8_cores-32_gb is not recognized. Use a valid label (e.g. ubuntu-latest-8-cores).

🧰 Tools
🪛 actionlint (1.7.4)

15-15: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/test_python.yaml (1)

37-38: Remove trailing whitespace.
Trailing whitespace detected on line 38.

-        run: |          
+        run: |
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 38-38: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 36d3a8b and 4979dbc.

📒 Files selected for processing (1)
  • .github/workflows/test_python.yaml (2 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/test_python.yaml

[error] 38-38: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: other_spark_tests
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (2)
.github/workflows/test_python.yaml (2)

43-43: Explicit shell for lint step.
Using shell: bash here provides clarity.


52-52: Explicit shell for tests.
Consistent shell declaration improves readability.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (11)
api/py/ai/chronon/cli/compile/parse_teams.py (2)

73-89: Simplify map merging using dictionary comprehension.

Current implementation can be more concise.

-    result = {}
-    for m in maps:
-        if m is None:
-            continue
-        for key, value in m.items():
-            result[key] = value
-    return result
+    return {k: v for m in maps if m is not None for k, v in m.items()}

91-91: Add return type hint for clarity.

Function return type should be explicit.

-def _merge_environment_variables(*env_vars: List[EnvironmentVariables]):
+def _merge_environment_variables(*env_vars: List[EnvironmentVariables]) -> EnvironmentVariables:
api/py/ai/chronon/cli/compile/compile_context.py (2)

2-2: Remove unused import

- from logging import error
🧰 Tools
🪛 Ruff (0.8.2)

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Rename unused variable
Keep it as _sub_dirs to reflect it's unused.

- for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+ for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
🧰 Tools
🪛 Ruff (0.8.2)

107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/conf_validator.py (4)

20-20: Remove unused import

- import os
🧰 Tools
🪛 Ruff (0.8.2)

20-20: os imported but unused

Remove unused import: os

(F401)


54-57: Inline condition

- if group_by.metaData.online or group_by.backfillStartDate:
-     return True
- else:
-     return False
+ return group_by.metaData.online or group_by.backfillStartDate
🧰 Tools
🪛 Ruff (0.8.2)

54-57: Return the condition directly

Inline condition

(SIM103)


356-359: Use single if

- if wild_card_derivation_included:
-     if derivation.expression in derived_columns:
-         derived_columns.remove(derivation.expression)
+ if wild_card_derivation_included and derivation.expression in derived_columns:
+     derived_columns.remove(derivation.expression)
🧰 Tools
🪛 Ruff (0.8.2)

356-357: Use a single if statement instead of nested if statements

(SIM102)


421-427: Combine nested if

- if join.metaData.online:
-     if offline_included_group_bys:
-         errors.append(...)
+ if join.metaData.online and offline_included_group_bys:
+     errors.append(...)
🧰 Tools
🪛 Ruff (0.8.2)

421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/parse_configs.py (1)

5-5: Remove unused imports

- from typing import Any, Dict, List, Tuple
+ from typing import Any, List
🧰 Tools
🪛 Ruff (0.8.2)

5-5: typing.Dict imported but unused

Remove unused import

(F401)


5-5: typing.Tuple imported but unused

Remove unused import

(F401)

api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (1)

78-78: Document aggregation operation value.

The raw aggregation operation value 3 should either be documented or replaced with a named constant for clarity.
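
A hedged illustration of resolving the name, assuming the enum is the thrift-generated Operation in ai.chronon.api.ttypes (thrift's Python codegen emits a _VALUES_TO_NAMES table on every enum):

from ai.chronon.api.ttypes import Operation

# Look up the symbolic name behind the raw value 3 in the compiled conf.
print(Operation._VALUES_TO_NAMES.get(3, "UNKNOWN"))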

.github/workflows/test_python.yaml (1)

38-38: Remove trailing spaces.

-        run: |          
+        run: |
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 38-38: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4979dbc and 82b73bd.

📒 Files selected for processing (12)
  • .github/image/Dockerfile (3 hunks)
  • .github/workflows/test_python.yaml (1 hunks)
  • api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
  • api/py/ai/chronon/cli/compile/conf_validator.py (1 hunks)
  • api/py/ai/chronon/cli/compile/fill_templates.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_configs.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_teams.py (1 hunks)
  • api/py/ai/chronon/cli/entrypoint.py (1 hunks)
  • api/py/ai/chronon/group_by.py (0 hunks)
  • api/py/ai/chronon/join.py (0 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (1 hunks)
💤 Files with no reviewable changes (2)
  • api/py/ai/chronon/join.py
  • api/py/ai/chronon/group_by.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • api/py/ai/chronon/repo/compile.py
  • api/py/ai/chronon/cli/compile/fill_templates.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/compile/compile_context.py

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/conf_validator.py

20-20: os imported but unused

Remove unused import: os

(F401)


54-57: Return the condition directly

Inline condition

(SIM103)


356-357: Use a single if statement instead of nested if statements

(SIM102)


421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/parse_configs.py

5-5: typing.Dict imported but unused

Remove unused import

(F401)


5-5: typing.Tuple imported but unused

Remove unused import

(F401)

api/py/ai/chronon/cli/entrypoint.py

15-15: Do not use bare except

(E722)

🪛 YAMLlint (1.35.1)
.github/workflows/test_python.yaml

[error] 38-38: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
  • GitHub Check: build-and-push
🔇 Additional comments (7)
api/py/ai/chronon/cli/compile/parse_teams.py (2)

39-71: LGTM!

Good validation chain and proper environment variable merging.


31-34: Validate required Team attributes before adding to dictionary.

Add validation for required Team fields like outputNamespace.

     if isinstance(obj, Team):
+        require(obj.outputNamespace is not None, f"Team {name} missing required outputNamespace")
         obj.name = name
         logger.info(f"Found team {name}")
         team_dict[name] = obj
api/py/ai/chronon/cli/compile/parse_configs.py (1)

53-53: Check list size before splitting
Could raise ValueError if there aren’t enough parts.

 conf_parts = module_name.split(".", 2)
 if len(conf_parts) < 3:
     raise ValueError(f"Invalid module name: {module_name}")
api/py/ai/chronon/cli/entrypoint.py (4)

8-16: Catch specific exception in get_current_branch.

 def get_current_branch():
     try:
         return (
             subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"])
             .decode("utf-8")
             .strip()
         )
-    except:
+    except subprocess.CalledProcessError as e:
+        logger.warning(f"Failed to get branch: {e}")
         return "main"
🧰 Tools
🪛 Ruff (0.8.2)

15-15: Do not use bare except

(E722)


25-31: LGTM! Clean implementation of sync command.


59-64: LGTM! Clean implementation of deploy command.


46-51: 🛠️ Refactor suggestion

Replace scope with explicit run flags.

Per discussion, replace scope with explicit run_upstream and run_downstream flags for clarity.

-@click.option(
-    "--scope",
-    type=click.Choice(["upstream", "self", "downstream"]),
-    default="upstream",
-    help="Scope of configs to backfill",
-)
+@click.option("--run-upstream", is_flag=True, default=True, help="Run upstream configs")
+@click.option("--run-downstream", is_flag=True, default=False, help="Run downstream configs")

Likely invalid or redundant comment.

f"Team config file: {teams_file} not found. Please add one.",
)

team_module = importlib.import_module("teams")
Contributor

⚠️ Potential issue

Use absolute import path for teams module.

Current import assumes teams.py is in PYTHONPATH. Use absolute path to avoid import errors.

-    team_module = importlib.import_module("teams")
+    team_module = importlib.import_module(f"{os.path.basename(conf_root)}.teams")

Comment on lines +111 to +112
if not f.startswith("."): # ignore hidden files - such as .DS_Store
continue
Contributor

⚠️ Potential issue

Fix reversed file-filter condition
Skipping all non-dot files contradicts the comment.

- if not f.startswith("."):  # ignore hidden files
-     continue
+ if f.startswith("."):  # ignore hidden files
+     continue

Comment on lines 67 to 68
RUN apt update && apt install -y build-essential git
RUN mkdir -p /usr/lib/jvm && ln -s /usr/lib/jvm/java-17-openjdk-amd64/ /usr/lib/jvm/default-jvm
Contributor

🛠️ Refactor suggestion

Move Java symlink creation before verification.

Java verification may fail if symlink doesn't exist. Move these lines before the verification step.

@@ -1,15 +1,9 @@
-FROM alpine:latest
+FROM ubuntu:latest

Contributor

🛠️ Refactor suggestion

Pin Ubuntu version for reproducibility.

-FROM ubuntu:latest
+FROM ubuntu:22.04

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
api/py/test/test_utils.py (1)

175-189: Avoid redefining test function.

test_get_related_table_names_for_group_bys is already defined at line 164.

-def test_get_related_table_names_for_group_bys():
+def test_get_related_table_names_for_group_bys_different_name():
🧰 Tools
🪛 Ruff (0.8.2)

175-175: Redefinition of unused test_get_related_table_names_for_group_bys from line 164

(F811)

🧹 Nitpick comments (8)
api/py/ai/chronon/cli/compile/conf_validator.py (4)

20-20: Remove unused import.

os is not used anywhere. Consider removing it.

-import os
🧰 Tools
🪛 Ruff (0.8.2)

20-20: os imported but unused

Remove unused import: os

(F401)


54-57: Inline the condition.

Directly return the boolean expression instead of using if-else.

-def _is_batch_upload_needed(group_by: GroupBy) -> bool:
-    if group_by.metaData.online or group_by.backfillStartDate:
-        return True
-    else:
-        return False
+def _is_batch_upload_needed(group_by: GroupBy) -> bool:
+    return group_by.metaData.online or group_by.backfillStartDate
🧰 Tools
🪛 Ruff (0.8.2)

54-57: Return the condition directly

Inline condition

(SIM103)


356-357: Merge nested if statements.

Combine to simplify and avoid redundancy.

-if wild_card_derivation_included:
-    if derivation.expression in derived_columns:
+if wild_card_derivation_included and derivation.expression in derived_columns:
     derived_columns.remove(derivation.expression)
🧰 Tools
🪛 Ruff (0.8.2)

356-357: Use a single if statement instead of nested if statements

(SIM102)


421-422: Combine nested checks.

Shorten by joining conditions into a single if.

-if join.metaData.online:
-    if offline_included_group_bys:
+if join.metaData.online and offline_included_group_bys:
     errors.append(...)
🧰 Tools
🪛 Ruff (0.8.2)

421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/compile_context.py (2)

2-2: Remove unused import.

logging.error is never used.

-from logging import error
🧰 Tools
🪛 Ruff (0.8.2)

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Rename unused variable.

sub_dirs is not used in the loop. Rename it to _sub_dirs.

-for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
🧰 Tools
🪛 Ruff (0.8.2)

107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/compiler.py (2)

14-14: Remove unused import.

time is not used.

-import time
🧰 Tools
🪛 Ruff (0.8.2)

14-14: time imported but unused

Remove unused import: time

(F401)


36-36: Remove unused variable.

status isn't used inside the context.

-with console.status("[bold green]Loading...", spinner="dots") as status:
+with console.status("[bold green]Loading...", spinner="dots"):
🧰 Tools
🪛 Ruff (0.8.2)

36-36: Local variable status is assigned to but never used

Remove assignment to unused variable status

(F841)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 82b73bd and 19659a1.

📒 Files selected for processing (4)
  • api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
  • api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
  • api/py/ai/chronon/cli/compile/conf_validator.py (1 hunks)
  • api/py/test/test_utils.py (5 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/compile/conf_validator.py

20-20: os imported but unused

Remove unused import: os

(F401)


54-57: Return the condition directly

Inline condition

(SIM103)


356-357: Use a single if statement instead of nested if statements

(SIM102)


421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/compile_context.py

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/compiler.py

14-14: time imported but unused

Remove unused import: time

(F401)


36-36: Local variable status is assigned to but never used

Remove assignment to unused variable status

(F841)

api/py/test/test_utils.py

175-175: Redefinition of unused test_get_related_table_names_for_group_bys from line 164

(F811)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: other_spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
api/py/ai/chronon/cli/compile/compile_context.py (1)

111-112: ⚠️ Potential issue

Fix reversed file-filter condition.

To skip hidden files, invert the check:

-if not f.startswith("."):  # ignore hidden files - such as .DS_Store
+if f.startswith("."):  # ignore hidden files - such as .DS_Store
    continue

Likely invalid or redundant comment.

api/py/ai/chronon/cli/compile/compiler.py (1)

74-75: ⚠️ Potential issue

Fix success ratio calculation.

It's showing X/X instead of X/Total.

-    f"{len(compile_result.obj_dict)}/{len(compile_result.obj_dict)}"
+    f"{len(compile_result.obj_dict)}/{len(compiled_objects)}"

Likely invalid or redundant comment.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
api/py/test/test_utils.py (2)

171-174: Remove debug print and add error handling.

 def dopen(path):
     full_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), path)
-    print(full_path)
-    return open(full_path)
+    try:
+        return open(full_path)
+    except FileNotFoundError as e:
+        raise FileNotFoundError(f"Could not open {full_path}: {e}")

270-270: Remove debug print statement.

-    print("Current working directory:", os.getcwd())
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 19659a1 and 6a6921e.

📒 Files selected for processing (4)
  • api/py/test/test_utils.py (10 hunks)
  • api/py/test/test_validator.py (3 hunks)
  • api/py/tox.ini (1 hunks)
  • tools/build_rules/thrift/thrift.bzl (0 hunks)
💤 Files with no reviewable changes (1)
  • tools/build_rules/thrift/thrift.bzl
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/test/test_utils.py

189-189: Redefinition of unused test_get_related_table_names_for_group_bys from line 177

(F811)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: other_spark_tests
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (7)
api/py/tox.ini (2)

8-8: Update allowlist_externals: Added mkdir and cp to support new commands.


13-20: New compile commands: Two compile.py calls with distinct --input_path values and subsequent directory creation and copy. Ensure the execution order and output paths align with expectations.

api/py/test/test_validator.py (4)

26-30: LGTM: Improved path handling

Dynamic path generation makes tests more portable.


36-39: LGTM: Enhanced readability

Clean formatting of list comprehensions.

Also applies to: 44-51, 56-64


74-76: LGTM: Better function signature formatting

Improved readability by splitting long lines.

Also applies to: 83-85, 93-95, 102-104


138-140: LGTM: Clean import formatting

Better readability with properly split import statements.

Also applies to: 161-163

api/py/test/test_utils.py (1)

99-337: Test coverage looks comprehensive!

Good job on covering various scenarios for group_bys, joins, and table name utilities.

🧰 Tools
🪛 Ruff (0.8.2)

189-189: Redefinition of unused test_get_related_table_names_for_group_bys from line 177

(F811)

Comment on lines 177 to 188
def test_get_related_table_names_for_group_bys(
-    group_by_requiring_backfill,
-    online_group_by_requiring_streaming
+    group_by_requiring_backfill, online_group_by_requiring_streaming
):
-    with open('test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1') as conf_file:
+    with dopen(
+        "sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1"
+    ) as conf_file:
        json = conf_file.read()
        group_by = json2thrift(json, api.GroupBy)
        tables = utils.get_related_table_names(group_by)
        assert any(table.endswith("_upload") for table in tables)


Contributor

⚠️ Potential issue

Fix duplicate test function name.

Two functions with the same name will cause the first one to be ignored.

-def test_get_related_table_names_for_group_bys(
+def test_get_related_table_names_for_group_bys_with_streaming(
     group_by_requiring_backfill, online_group_by_requiring_streaming
 ):

Also applies to: 189-205
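
For context on the F811 behavior flagged here: in Python, the second def simply rebinds the module-level name, so pytest never sees the first function. A minimal sketch (illustrative names, not from the PR):

def test_example():
    assert 1 + 1 == 2

def test_example():  # rebinds the name; the first test is never collected
    assert 2 + 2 == 4

Renaming one of the functions, as suggested above, makes both tests visible to the collector.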

nikhil-zlai and others added 13 commits February 4, 2025 10:56
[Ticket](https://app.asana.com/0/0/1209184028451140)

This PR includes both frontend and backend updates. Since @nikhil-zlai
and I reviewed the high-level changes today, it might be helpful for
@sean-zlai and me to go over them during our sync tomorrow. There are
some key points I'd like to highlight, but I wanted to get this up early
for review.

**Reminder:** @sean-zlai will handle merging the new drift/summary
functionality into the observability frontend.

- Added new endpoints for Conf, Confs, search, drift, summary, etc.,
currently called on the `/thrift` route for demonstration. The old
endpoints can be removed once we wire in the new ones.
- Created `LogicalNodeTable.svelte` to display all `ConfTypes`,
including empty states.
- Updated search functionality to work for all `ConfTypes`, though only
`risk.user_transactions.txn_join` is enabled for now.

- Implemented new endpoints using `RouteHandlerWrapper`.
- Removed old endpoints (except `/timeseries`, which is still needed for
charts).
- Separated drift and summary into two distinct endpoints.
- Introduced:
  - `DriftHandler` (handles drift and summaries).
  - `ConfHandler` (handles Conf, Confs, and Conf search).
- Added filtering by column to `DriftStore`.
- Improved and added various tests.
- Developed a new serializer that integrates well with
`RouteHandlerWrapper`.

Let me know if you have any questions!

- [x] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

- **New Features**
- Added configuration management capabilities across joins, models,
group bys, and staging queries.
	- Introduced drift analysis and summary tracking for configurations.
	- Enhanced API endpoints for retrieving and searching configurations.
	- New `LogicalNodeTable` component for displaying configuration data.
- Added new structures and enumerations for handling configuration
requests in the API.

- **Improvements**
	- Refactored API response handling to use JSON serialization.
- Improved null value handling in data processing with predefined
constants.
	- Updated type definitions for better type safety.
	- Streamlined configuration retrieval and search mechanisms.
	- Enhanced handling of search results by categorizing entity types.

- **Bug Fixes**
- Resolved issues with null value representation in long and double
iterators.
	- Improved error handling in API request processing.

- **Breaking Changes**
	- Removed previous model and join metadata endpoints.
	- Updated API method signatures for configuration retrieval.
	- Eliminated deprecated response types related to joins and models.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (4)
api/py/ai/chronon/cli/compile/compiler.py (1)

74-76: ⚠️ Potential issue

Fix success ratio calculation.

The success ratio uses the same value for numerator and denominator.

Apply this diff to fix the calculation:

-                f"{len(compile_result.obj_dict)}/{len(compile_result.obj_dict)}"
+                f"{len(compile_result.obj_dict)}/{len(compiled_objects)}"
api/py/ai/chronon/cli/git_utils.py (2)

33-34: 🛠️ Refactor suggestion

Use specific exception type.

The bare except clause could hide important errors.

Apply this diff to use a more specific exception type:

-        except:
+        except OSError:
            pass
🧰 Tools
🪛 Ruff (0.8.2)

33-33: Do not use bare except

(E722)


73-74: 🛠️ Refactor suggestion

Use specific exception type.

The bare except clause could hide important errors.

Apply this diff to use a more specific exception type:

-    except:
+    except OSError:
        return None
🧰 Tools
🪛 Ruff (0.8.2)

73-73: Do not use bare except

(E722)

.github/image/Dockerfile (1)

1-1: 🛠️ Refactor suggestion

Pin Base Image Version:
Use a fixed tag (e.g., ubuntu:22.04) instead of latest for reproducibility.

🧹 Nitpick comments (14)
spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1)

41-68: Consider removing commented code.

Since the functionality is moved to interactive.py, consider removing the commented test class instead of keeping it as reference.

api/py/ai/chronon/cli/compile/compile_context.py (2)

2-2: Remove unused import.

The error import is never used.

-from logging import error
🧰 Tools
🪛 Ruff (0.8.2)

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Rename unused variable.

sub_dirs is unused. Use _sub_dirs to clarify intent.

-for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
🧰 Tools
🪛 Ruff (0.8.2)

107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/parse_configs.py (1)

5-5: Remove unused imports.

Dict and Tuple are unused.

-from typing import Any, Dict, List, Tuple
+from typing import Any, List
🧰 Tools
🪛 Ruff (0.8.2)

5-5: typing.Dict imported but unused

Remove unused import

(F401)


5-5: typing.Tuple imported but unused

Remove unused import

(F401)

api/py/ai/chronon/cli/entrypoint.py (1)

46-51: Consider more explicit scope options.

Replace scope with run_upstream and run_downstream flags for clarity.

api/py/test/test_git_utils.py (1)

5-5: Remove unused import.

-from pathlib import Path
🧰 Tools
🪛 Ruff (0.8.2)

5-5: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

api/py/ai/chronon/cli/compile/compiler.py (1)

14-14: Remove unused import.

The time module is imported but not used.

Apply this diff to remove the unused import:

-import time
🧰 Tools
🪛 Ruff (0.8.2)

14-14: time imported but unused

Remove unused import: time

(F401)

api/py/ai/chronon/cli/compile/serializer.py (1)

78-81: Combine if branches for integer types.

The branches for I64, I32, I16, and BYTE can be combined using logical OR operator.

Apply this diff to combine the branches:

-        elif ttype == TType.I64:
-            ret = int(val)
-        elif ttype == TType.I32 or ttype == TType.I16 or ttype == TType.BYTE:
-            ret = int(val)
+        elif ttype in (TType.I64, TType.I32, TType.I16, TType.BYTE):
+            ret = int(val)
🧰 Tools
🪛 Ruff (0.8.2)

78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)

api/thrift/api.thrift (1)

237-238: Consider removing the deprecated field.

Since modeToEnvMap is deprecated and env is its replacement, consider removing the old field.

Also applies to: 254-254
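
If the deprecated field has to stay through a migration window, a small hedged helper could translate it (this assumes modeToEnvMap maps a mode name to a dict of env vars, and uses the common/backfill/upload fields of EnvironmentVariables shown in the sample teams.py; the helper name is illustrative):

from ai.chronon.api.ttypes import EnvironmentVariables

def env_from_mode_map(mode_to_env: dict) -> EnvironmentVariables:
    # Best-effort translation of the deprecated modeToEnvMap into the new env field.
    return EnvironmentVariables(
        common=mode_to_env.get("common", {}),
        backfill=mode_to_env.get("backfill", {}),
        upload=mode_to_env.get("upload", {}),
    )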

api/py/ai/chronon/cli/compile/conf_validator.py (3)

18-20: Remove unused import.

The os module is imported but never used.

-import os
🧰 Tools
🪛 Ruff (0.8.2)

20-20: os imported but unused

Remove unused import: os

(F401)


53-58: Simplify boolean return.

Return the condition directly.

-def _is_batch_upload_needed(group_by: GroupBy) -> bool:
-    if group_by.metaData.online or group_by.backfillStartDate:
-        return True
-    else:
-        return False
+def _is_batch_upload_needed(group_by: GroupBy) -> bool:
+    return group_by.metaData.online or group_by.backfillStartDate
🧰 Tools
🪛 Ruff (0.8.2)

54-57: Return the condition directly

Inline condition

(SIM103)


356-358: Simplify nested if statements.

Combine nested if statements into a single condition.

-            if wild_card_derivation_included:
-                if derivation.expression in derived_columns:
+            if wild_card_derivation_included and derivation.expression in derived_columns:
-        if join.metaData.online:
-            if offline_included_group_bys:
+        if join.metaData.online and offline_included_group_bys:

Also applies to: 421-423

🧰 Tools
🪛 Ruff (0.8.2)

356-357: Use a single if statement instead of nested if statements

(SIM102)

.github/workflows/test_python.yaml (1)

37-38: Remove trailing spaces.

Line 38 contains trailing spaces.

-        run: |          
+        run: |
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 38-38: trailing spaces

(trailing-spaces)

.github/image/Dockerfile (1)

59-59: Pip Package Management:
Consider pinning package versions to ensure stability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4dbe3ed and 3e0c5f0.

📒 Files selected for processing (36)
  • .bazelrc (1 hunks)
  • .github/image/Dockerfile (3 hunks)
  • .github/workflows/test_python.yaml (1 hunks)
  • .github/workflows/test_scala_no_spark.yaml (1 hunks)
  • .github/workflows/test_scala_spark.yaml (1 hunks)
  • api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
  • api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
  • api/py/ai/chronon/cli/compile/conf_validator.py (1 hunks)
  • api/py/ai/chronon/cli/compile/fill_templates.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_configs.py (1 hunks)
  • api/py/ai/chronon/cli/compile/parse_teams.py (1 hunks)
  • api/py/ai/chronon/cli/compile/serializer.py (1 hunks)
  • api/py/ai/chronon/cli/entrypoint.py (1 hunks)
  • api/py/ai/chronon/cli/git_utils.py (1 hunks)
  • api/py/ai/chronon/cli/logger.py (1 hunks)
  • api/py/ai/chronon/cli/logging.py (1 hunks)
  • api/py/ai/chronon/group_by.py (0 hunks)
  • api/py/ai/chronon/join.py (0 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (1 hunks)
  • api/py/test/sample/teams.py (1 hunks)
  • api/py/test/test_git_utils.py (1 hunks)
  • api/py/test/test_utils.py (10 hunks)
  • api/py/test/test_validator.py (3 hunks)
  • api/py/tox.ini (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (2 hunks)
  • api/thrift/api.thrift (4 hunks)
  • api/thrift/common.thrift (1 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala (8 hunks)
  • hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala (9 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (12 hunks)
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala (1 hunks)
  • tools/build_rules/thrift/thrift.bzl (0 hunks)
💤 Files with no reviewable changes (3)
  • tools/build_rules/thrift/thrift.bzl
  • api/py/ai/chronon/join.py
  • api/py/ai/chronon/group_by.py
🚧 Files skipped from review as they are similar to previous changes (16)
  • spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala
  • api/thrift/common.thrift
  • hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala
  • .bazelrc
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala
  • .github/workflows/test_scala_no_spark.yaml
  • api/py/tox.ini
  • spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala
  • hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala
  • api/py/test/test_utils.py
  • api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1
  • spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
  • api/py/ai/chronon/cli/logging.py
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala
  • api/py/ai/chronon/cli/compile/fill_templates.py
  • api/py/test/test_validator.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/test/sample/teams.py

8-8: Undefined name EnvironmentVariables

(F821)


45-45: Undefined name EnvironmentVariables

(F821)


66-66: Undefined name EnvironmentVariables

(F821)

api/py/ai/chronon/cli/compile/conf_validator.py

20-20: os imported but unused

Remove unused import: os

(F401)


54-57: Return the condition directly

Inline condition

(SIM103)


356-357: Use a single if statement instead of nested if statements

(SIM102)


421-422: Use a single if statement instead of nested if statements

(SIM102)

api/py/ai/chronon/cli/compile/compile_context.py

2-2: logging.error imported but unused

Remove unused import: logging.error

(F401)


107-107: Loop control variable sub_dirs not used within loop body

Rename unused sub_dirs to _sub_dirs

(B007)

api/py/ai/chronon/cli/compile/compiler.py

14-14: time imported but unused

Remove unused import: time

(F401)


36-36: Local variable status is assigned to but never used

Remove assignment to unused variable status

(F841)

api/py/ai/chronon/cli/compile/parse_configs.py

5-5: typing.Dict imported but unused

Remove unused import

(F401)


5-5: typing.Tuple imported but unused

Remove unused import

(F401)

api/py/ai/chronon/cli/compile/serializer.py

19-19: thrift.protocol.TJSONProtocol.TJSONProtocolFactory imported but unused

Remove unused import: thrift.protocol.TJSONProtocol.TJSONProtocolFactory

(F401)


78-81: Combine if branches using logical or operator

Combine if branches

(SIM114)

api/py/ai/chronon/cli/entrypoint.py

15-15: Do not use bare except

(E722)

api/py/ai/chronon/cli/git_utils.py

4-4: typing.Set imported but unused

Remove unused import: typing.Set

(F401)


33-33: Do not use bare except

(E722)


73-73: Do not use bare except

(E722)

api/py/test/test_git_utils.py

5-5: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)

🪛 YAMLlint (1.35.1)
.github/workflows/test_python.yaml

[error] 38-38: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: other_spark_tests
🔇 Additional comments (22)
spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1)

26-37: Verify if LocalDataLoaderTest object is still needed.

The object contains cleanup code but might be unused now that tests are moved.

✅ Verification successful

LocalDataLoaderTest appears to be isolated to its own test file.

  • Only self-references exist (including a commented-out test class).
  • No other Scala files reference it.

It seems safe to remove it if its cleanup code is no longer required.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for references to LocalDataLoaderTest object
rg "LocalDataLoaderTest\." --type scala

Length of output: 41


Script:

#!/bin/bash
# Search all Scala files for any mention of LocalDataLoaderTest.
rg "LocalDataLoaderTest" -g "*.scala"

Length of output: 437

api/py/ai/chronon/cli/compile/compile_context.py (1)

111-112: Fix reversed condition.

This logic skips non-hidden files and processes hidden ones, contradicting the comment.

-if not f.startswith("."):  # ignore hidden files - such as .DS_Store
-    continue
+if f.startswith("."):
+    continue
api/py/ai/chronon/cli/logger.py (1)

58-58: Specify exception type.

Raise statement should specify an exception type as previously noted.

-raise
+raise ValueError(message)
api/py/ai/chronon/cli/entrypoint.py (1)

8-16: ⚠️ Potential issue

Specify exception type.

 def get_current_branch():
     try:
         return (
             subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"])
             .decode("utf-8")
             .strip()
         )
-    except:
+    except subprocess.CalledProcessError as e:
+        logger.warning(f"Failed to get branch: {e}")
         return "main"

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff (0.8.2)

15-15: Do not use bare except

(E722)

api/py/test/sample/teams.py (1)

1-1: ⚠️ Potential issue

Add missing import.

-from ai.chronon.api.ttypes import Team
+from ai.chronon.api.ttypes import Team, EnvironmentVariables

Likely invalid or redundant comment.

api/py/ai/chronon/cli/compile/parse_teams.py (1)

21-21: ⚠️ Potential issue

Use absolute import path.

-    team_module = importlib.import_module("teams")
+    team_module = importlib.import_module(f"{os.path.basename(conf_root)}.teams")

Likely invalid or redundant comment.

api/py/ai/chronon/repo/compile.py (1)

241-275: LGTM!

The enhanced validation and error handling logic is well-implemented with proper error messages and warnings.

api/thrift/api.thrift (2)

441-446: LGTM: Well-structured environment variables.

The EnvironmentVariables struct provides a clear organization of environment variables across different contexts.


448-457: LGTM: Team metadata structure.

The Team struct effectively encapsulates team-related information.

api/py/ai/chronon/cli/compile/conf_validator.py (3)

334-378: LGTM: Well-structured derivation validation.

The derivation validation logic is comprehensive and handles all edge cases.

🧰 Tools
🪛 Ruff (0.8.2)

356-357: Use a single if statement instead of nested if statements

(SIM102)


379-440: LGTM: Thorough join validation.

The join validation covers all necessary checks including production status and online/offline compatibility.

🧰 Tools
🪛 Ruff (0.8.2)

421-422: Use a single if statement instead of nested if statements

(SIM102)
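
A toy sketch of the online/offline rule being validated here, using the condition from the diff above (function and variable names are assumed for illustration):

def validate_join_online_offline(join, offline_included_group_bys) -> list:
    errors = []
    if join.metaData.online and offline_included_group_bys:
        errors.append(
            "online join includes offline group_bys: "
            + ", ".join(offline_included_group_bys)
        )
    return errors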


441-515: LGTM: Comprehensive group by validation.

The group by validation effectively handles all validation scenarios.

.github/workflows/test_scala_spark.yaml (2)

38-41: LGTM: Secure credential handling.

Properly decodes and stores Bazel cache credentials.


42-47: LGTM: Efficient test configuration.

Uses Bazel's remote caching for improved test performance.

.github/workflows/test_python.yaml (2)

49-49: LGTM: Updated linting configuration.

Appropriate exclusion of cli directory from flake8 checks.


37-37: LGTM: Explicit shell specification.

Clear specification of bash shell for command execution.

Also applies to: 43-43, 52-52

.github/image/Dockerfile (6)

6-6: Dependencies Check:
Ensure python3.12-venv and openjdk-17-jdk are required.


17-19: Bazel Setup:
Bazelisk binary is set correctly; verify that version v1.18.0 meets current needs.


23-56: Thrift Installation & Cleanup:
The build, install, and purge flow is solid. Confirm purged packages aren’t needed later.


61-61: Tool Installation:
Installing build-essential and git is clear and straightforward.


62-62: Java Symlink:
Symlink creation is concise; confirm that the target directory exists.


65-71: Installation Verification:
All key tools (Java, Thrift, Python, Pip, Bazel, Git) are verified.

Comment on lines +39 to 40
// Not needed since we are doing this via interactive.py

Contributor

💡 Codebase verification

Interactive.py misses equivalent test coverage.

  • No test functions were found in api/py/ai/chronon/repo/interactive.py.
  • No references to LocalDataLoader appear there.

It looks like the transition from Scala tests to Python testing isn’t fully complete. Please add tests in interactive.py or elsewhere to cover this functionality.

🔗 Analysis chain

Verify test coverage in interactive.py.

Please confirm that equivalent test coverage exists in interactive.py.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for corresponding test coverage in interactive.py
rg -l "test_table_1_data" | grep "interactive.py"

Length of output: 47


Script:

#!/bin/bash
# List any interactive.py files in the repository.
echo "=== Listing interactive.py files ==="
fd interactive.py

# For each interactive.py file found, display any test functions and references to LocalDataLoader.
echo "=== Checking interactive.py for test coverage and LocalDataLoader references ==="
for file in $(fd interactive.py); do 
  echo "File: $file"
  echo "-- Test functions --"
  grep -H "def test_" "$file" || echo "No test functions found in $file"
  echo "-- LocalDataLoader references --"
  grep -H "LocalDataLoader" "$file" || echo "No LocalDataLoader references found in $file"
done

Length of output: 1016

Comment on lines +64 to +82
sample_team = Team(
    namespace="test",
    env=EnvironmentVariables(
        common={
            "GCP_BIGTABLE_INSTANCE_ID": "test-instance"  # example, custom bigtable instance
        },
        backfill={  # smaller jobs for testing
            "EXECUTOR_CORES": "2",
            "DRIVER_MEMORY": "15G",
            "EXECUTOR_MEMORY": "4G",
            "PARALLELISM": "4",
            "MAX_EXECUTORS": "4",
        },
        upload={
            "PARALLELISM": "2",
            "MAX_EXECUTORS": "4",
        },
    ),
)
Contributor

🛠️ Refactor suggestion

Remove duplicate team configuration.

sample_team is identical to test team. Remove the duplicate.

🧰 Tools
🪛 Ruff (0.8.2)

66-66: Undefined name EnvironmentVariables

(F821)

Comment on lines +43 to +44
metadata = obj.metaData

Contributor

🛠️ Refactor suggestion

Standardize metadata access.

-    metadata = obj.metaData
-
-    require(obj.metaData is not None, "Cannot update empty metadata")
+    require(obj.metaData is not None, "Cannot update empty metadata")
+    metadata = obj.metaData
📝 Committable suggestion


Suggested change
-    metadata = obj.metaData
+    require(obj.metaData is not None, "Cannot update empty metadata")
+    metadata = obj.metaData

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/test_python.yaml (1)

41-43: Remove trailing whitespace.
Extra whitespace in the run block header (line 43) should be removed.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 43-43: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 3e0c5f0 and 82704f3.

📒 Files selected for processing (1)
  • .github/workflows/test_python.yaml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/test_python.yaml

[error] 43-43: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: other_spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
.github/workflows/test_python.yaml (3)

36-40: Git config step approved.
Git user configuration enhances commit traceability.


47-55: Lint step updated.
Using bash and excluding the 'cli' directory is intentional and consistent.


57-67: Tests step updated.
The bash shell setting ensures a consistent test environment.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
project/ThriftGen.scala (1)

32-37: Use UTF-8 for writing and add proper resource management.

-    val writer = new java.io.PrintWriter(file)
-    try {
-      writer.write(newContent)
-    } finally {
-      writer.close()
-    }
+    val writer = new java.io.PrintWriter(file, "UTF-8")
+    try {
+      writer.write(newContent)
+    } finally {
+      writer.close()
+    }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between b3a30f0 and 845949f.

📒 Files selected for processing (2)
  • .github/workflows/test_scala_fmt.yaml (1 hunks)
  • project/ThriftGen.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/test_scala_fmt.yaml
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: no_spark_scala_tests
  • GitHub Check: other_spark_tests

Comment on lines +28 to 30
val source = scala.io.Source.fromFile(file, "UTF-8")
val content = source.mkString
source.close()
Contributor

🛠️ Refactor suggestion

Good addition of UTF-8 encoding, but needs consistent application.

While specifying UTF-8 for reading is good, the writer should also use UTF-8.

-    val source = scala.io.Source.fromFile(file, "UTF-8")
-    val content = source.mkString
-    source.close()
+    val source = scala.io.Source.fromFile(file, "UTF-8")
+    val content =
+      try source.mkString
+      finally source.close()

Committable suggestion skipped: line range outside the PR's diff.

@kumar-zlai kumar-zlai merged commit 38f36f8 into main Feb 4, 2025
8 checks passed
@kumar-zlai kumar-zlai deleted the cli_v2 branch February 4, 2025 20:48
@coderabbitai coderabbitai bot mentioned this pull request Mar 3, 2025
4 tasks
kumar-zlai added a commit that referenced this pull request Apr 25, 2025
## Summary

Newer version with better ergonomics, and more maintainable codebase -
smaller files, simpler logic - with little indirection etc with user
facing simplification

- [x] always compile the whole repo
- [x] teams thrift and teams python
- [x] compile context as its own object
- [ ] integration test on sample
- [ ] progress bar for compile
- [ ] sync to remote w/ progress bar
- [ ] display workflow
- [ ] display workflow progress

## Checklist
- [x] Added Unit Tests
- [ ] Covered by existing CI
- [x] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New CLI Commands**
- Introduced streamlined commands for synchronizing, backfilling, and
deploying operations for easier management.

- **Enhanced Logging**
- Improved colored, structured log outputs for clearer real-time
monitoring and debugging.

- **Configuration & Validation Upgrades**
- Strengthened configuration management and validation processes to
ensure reliable operations.
- Added a comprehensive validation framework for Chronon API thrift
objects.

- **Build & Infrastructure Improvements**
- Transitioned to a new container base and modernized testing/build
systems for better performance and stability.

- **Team & Utility Enhancements**
- Expanded team configuration options and refined utility processes to
streamline overall workflows.
- Introduced new data classes and methods for improved configuration and
compilation context management.
- Enhanced templating functionality for dynamic replacements based on
object properties.
- Improved handling of Git operations and enhanced error logging for
better traceability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Ken Morton <[email protected]>
Co-authored-by: Kumar Teja Chippala <[email protected]>
Co-authored-by: Kumar Teja Chippala <[email protected]>
kumar-zlai added a commit that referenced this pull request Apr 29, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
@coderabbitai coderabbitai bot mentioned this pull request May 15, 2025
4 tasks
chewy-zlai pushed a commit that referenced this pull request May 16, 2025
@coderabbitai coderabbitai bot mentioned this pull request Jun 13, 2025
4 tasks