cli v2 #277
Walkthrough

This update enhances the system's configuration and compilation pipeline. New data classes and methods have been added for managing compile contexts, templating, validation, and serialization. The CLI now features commands for sync, backfill, and deploy, while Git utilities have been expanded for branch and change management. Additional improvements include colored logging, upgraded Thrift structures and enums, updated Docker and build configurations, and various test and formatting adjustments.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant U as User
    participant E as CLI Entrypoint
    participant G as GitUtils
    participant C as CompileContext
    participant P as Compiler
    participant L as Logger
    U->>E: Run CLI command (sync/backfill/deploy)
    E->>G: Get current branch
    G-->>E: Return branch name
    E->>C: Initialize compile context
    E->>P: Call compiler.main(...)
    P-->>E: Return compile results
    E->>L: Log results
    E-->>U: Output final result
```
Actionable comments posted: 17
🧹 Nitpick comments (15)
api/py/ai/chronon/cli/git_utils.py (1)

1-4: Remove unused import.

`Set` is never used.

```diff
-from typing import List, Optional, Set
+from typing import List, Optional
```

🧰 Tools
🪛 Ruff (0.8.2)
4-4: `typing.Set` imported but unused. Remove unused import: `typing.Set` (F401)
api/py/ai/chronon/cli/logging.py (1)

17-30: Add error handling for timestamp creation.

```diff
 def format(self, record):
-    time_str = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
+    try:
+        time_str = datetime.fromtimestamp(record.created).strftime("%H:%M:%S")
+    except (ValueError, TypeError, OSError) as e:
+        time_str = "??:??:??"
+        logger = logging.getLogger(__name__)
+        logger.error(f"Failed to format timestamp: {e}")
```

api/py/ai/chronon/cli/compile/compiler.py (1)

48-66: Add progress bar for better UX.

```diff
+    with tqdm(total=len(obj_dict), desc=f"Processing {config_info.cls}") as pbar:
         for name, obj in obj_dict.items():
             errors = _write_config_or_return_errors(
                 name, obj, config_info, compile_context
             )
             if errors:
                 compile_result.error_dict[name] = errors
             else:
                 compile_result.obj_dict[name] = obj
+            pbar.update(1)
```

api/py/ai/chronon/cli/compile/parse_teams.py (1)

11-28: Add error handling for module import failures.

Consider wrapping the module import in a try-except block to provide more informative error messages.
🧰 Tools
🪛 Ruff (0.8.2)
17-17: Undefined name `importlib` (F821)
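A hedged sketch of the suggested try-except around the module import (the function name and error message are illustrative, not the PR's actual code):

```python
import importlib


def load_team_module(module_path: str):
    # Wrap the import so a missing or broken teams module produces a
    # clear, actionable error instead of a raw ImportError traceback.
    try:
        return importlib.import_module(module_path)
    except ImportError as e:
        raise ImportError(
            f"Could not import '{module_path}'. "
            f"Check that the file exists and is on PYTHONPATH: {e}"
        ) from e


# A real module imports fine; a missing one raises the enriched error.
mod = load_team_module("json")
```

`raise … from e` keeps the original traceback chained onto the friendlier message, so debugging information is not lost.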
api/py/test/test_git_utils.py (1)

1-11: Remove unused import.

```diff
 import pytest
 import os
 import shutil
 import subprocess
-from pathlib import Path
 from ai.chronon.cli.git_utils import (
     get_current_branch,
     get_changes_since_commit,
     get_changes_since_fork,
 )
```

🧰 Tools
🪛 Ruff (0.8.2)
5-5: `pathlib.Path` imported but unused. Remove unused import: `pathlib.Path` (F401)
api/py/ai/chronon/cli/compile/serializer.py (5)

41-86: Consider adding type hints and docstrings.

The `_convert` method handles type conversion comprehensively but lacks documentation. Add type hints and docstrings:

```diff
-    def _convert(self, val, ttype, ttype_info):
+    def _convert(self, val: Any, ttype: int, ttype_info: Tuple) -> Any:
+        """Convert JSON value to Thrift type.
+
+        Args:
+            val: Value to convert
+            ttype: Thrift type constant
+            ttype_info: Type information tuple
+
+        Returns:
+            Converted value
+
+        Raises:
+            TypeError: If type is not recognized
+        """
```

🧰 Tools
🪛 Ruff (0.8.2)
78-81: Combine `if` branches using logical `or` operator (SIM114)
93-100: Add error handling for Thrift serialization.

The function should catch potential Thrift serialization errors.

```diff
 def json2binary(json_str, thrift_class):
-    thrift = json2thrift(json_str, thrift_class)
-    transport = TMemoryBuffer()
-    protocol = TBinaryProtocolAccelerated(transport)
-    thrift.write(protocol)
-    return transport.getvalue()
+    try:
+        thrift = json2thrift(json_str, thrift_class)
+        transport = TMemoryBuffer()
+        protocol = TBinaryProtocolAccelerated(transport)
+        thrift.write(protocol)
+        return transport.getvalue()
+    except Exception as e:
+        raise Exception(f"Failed to serialize {thrift_class.__name__} to binary: {str(e)}") from e
```
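The `raise … from e` chaining used in the suggestion preserves the original traceback while adding context; a Thrift-free sketch of the same pattern (names here are illustrative):

```python
import json


def parse_payload(json_str: str, label: str) -> dict:
    # Chain the low-level error into a labeled, higher-level one so the
    # caller sees both what failed and which payload it was.
    try:
        return json.loads(json_str)
    except Exception as e:
        raise ValueError(f"Failed to deserialize {label}: {e}") from e


ok = parse_payload('{"x": 1}', "GroupBy")
```

The chained exception keeps `e` reachable as `__cause__`, so logs show both the high-level label and the underlying parser error.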
113-118: Add error handling for JSON serialization.

The function should catch potential serialization errors.

```diff
 def thrift_simple_json(obj):
-    simple = TSerialization.serialize(obj, protocol_factory=TSimpleJSONProtocolFactory())
-    parsed = json.loads(simple)
-    return json.dumps(parsed, indent=2)
+    try:
+        simple = TSerialization.serialize(obj, protocol_factory=TSimpleJSONProtocolFactory())
+        parsed = json.loads(simple)
+        return json.dumps(parsed, indent=2)
+    except Exception as e:
+        raise Exception(f"Failed to serialize {type(obj).__name__} to JSON: {str(e)}") from e
```
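The serialize-then-reindent round trip above can be illustrated without Thrift — the re-parse exists purely to normalize indentation:

```python
import json


def pretty_json(compact: str) -> str:
    # Re-parse and re-dump solely to get consistent 2-space indentation,
    # mirroring the loads/dumps round trip in thrift_simple_json.
    parsed = json.loads(compact)
    return json.dumps(parsed, indent=2)


out = pretty_json('{"name":"gb_1","online":true}')
```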
17-20: Remove unused import.

`TJSONProtocolFactory` is not used in the code.

```diff
 from thrift.protocol.TJSONProtocol import (
     TSimpleJSONProtocolFactory,
-    TJSONProtocolFactory,
 )
```

🧰 Tools
🪛 Ruff (0.8.2)
19-19: `thrift.protocol.TJSONProtocol.TJSONProtocolFactory` imported but unused. Remove unused import (F401)
78-82: Combine if branches for integer types.

Simplify the code by combining the integer type checks.

```diff
-        elif ttype == TType.I64:
-            ret = int(val)
-        elif ttype == TType.I32 or ttype == TType.I16 or ttype == TType.BYTE:
-            ret = int(val)
+        elif ttype in (TType.I64, TType.I32, TType.I16, TType.BYTE):
+            ret = int(val)
```

🧰 Tools
🪛 Ruff (0.8.2)
78-81: Combine `if` branches using logical `or` operator (SIM114)
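The `in (…)` membership form is behaviorally identical to the chained equality checks; a minimal illustration (the constants below are stand-ins, not Thrift's actual `TType` values):

```python
# Stand-in constants; the real code would use thrift's TType values.
I64, I32, I16, BYTE, STRING = 10, 8, 6, 3, 11


def is_integer_type(ttype: int) -> bool:
    # One tuple-membership test replaces four equality comparisons.
    return ttype in (I64, I32, I16, BYTE)
```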
api/py/ai/chronon/cli/compile/validator.py (5)

35-36: Remove unused imports.

```diff
-from ai.chronon.repo import JOIN_FOLDER_NAME, GROUP_BY_FOLDER_NAME
-from ai.chronon.repo.serializer import thrift_simple_json, file2thrift
+from ai.chronon.repo.serializer import thrift_simple_json
```

🧰 Tools
🪛 Ruff (0.8.2)
35-35: `ai.chronon.repo.JOIN_FOLDER_NAME` imported but unused (F401)
35-35: `ai.chronon.repo.GROUP_BY_FOLDER_NAME` imported but unused (F401)
36-36: `ai.chronon.repo.serializer.file2thrift` imported but unused. Remove unused import: `ai.chronon.repo.serializer.file2thrift` (F401)
184-191: Add docstring to helper function.

The function's purpose and return value should be documented.

```diff
 def _source_has_topic(source: Source) -> bool:
+    """
+    Check if the source has a topic defined.
+
+    Args:
+        source: The source object to check
+    Returns:
+        bool: True if the source has a topic, False otherwise
+    """
     if source.events:
```
198-209: Consider early return for better readability.

The function can be simplified.

```diff
 def _group_by_has_hourly_windows(groupBy: GroupBy) -> bool:
     aggs: List[Aggregation] = groupBy.aggregations
     if not aggs:
         return False
     for agg in aggs:
         for window in agg.windows:
             if window.timeUnit == common.TimeUnit.HOUR:
                 return True
-
-    return False
+    return False
```
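An even tighter variant of this check uses `any()` over a generator; a sketch with plain stand-in objects instead of the Thrift types (the `or []` guard on `windows` is an addition, not in the original):

```python
from types import SimpleNamespace

HOUR, DAY = "HOUR", "DAY"  # stand-ins for common.TimeUnit values


def has_hourly_windows(aggregations) -> bool:
    # True if any aggregation has any window with hourly granularity.
    if not aggregations:
        return False
    return any(
        window.timeUnit == HOUR
        for agg in aggregations
        for window in (agg.windows or [])
    )


aggs = [SimpleNamespace(windows=[SimpleNamespace(timeUnit=DAY),
                                 SimpleNamespace(timeUnit=HOUR)])]
```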
410-416: Simplify nested if statements.

Combine the conditions for better readability.

```diff
-    if join.metaData.online:
-        if offline_included_group_bys:
-            errors.append(
-                "join {} is online but includes the following offline group_bys: {}".format(
-                    join.metaData.name, ", ".join(offline_included_group_bys)
-                )
-            )
+    if join.metaData.online and offline_included_group_bys:
+        errors.append(
+            "join {} is online but includes the following offline group_bys: {}".format(
+                join.metaData.name, ", ".join(offline_included_group_bys)
+            )
+        )
```

🧰 Tools
🪛 Ruff (0.8.2)
410-411: Use a single `if` statement instead of nested `if` statements (SIM102)
455-459: Improve error message clarity.

The error message could be more specific about the incompatibility.

```diff
-    f"group_by {group_by.metaData.name} is defined to be daily refreshed but contains hourly windows. "
+    f"group_by {group_by.metaData.name} is defined as a batch (daily) feature but contains hourly windows, which is incompatible."
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (16)
- api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
- api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
- api/py/ai/chronon/cli/compile/fill_templates.py (1 hunks)
- api/py/ai/chronon/cli/compile/parse_configs.py (1 hunks)
- api/py/ai/chronon/cli/compile/parse_teams.py (1 hunks)
- api/py/ai/chronon/cli/compile/serializer.py (1 hunks)
- api/py/ai/chronon/cli/compile/validator.py (12 hunks)
- api/py/ai/chronon/cli/entrypoint.py (1 hunks)
- api/py/ai/chronon/cli/git_utils.py (1 hunks)
- api/py/ai/chronon/cli/logging.py (1 hunks)
- api/py/ai/chronon/repo/compile.py (1 hunks)
- api/py/test/sample/teams.py (1 hunks)
- api/py/test/test_git_utils.py (1 hunks)
- api/thrift/api.thrift (4 hunks)
- api/thrift/common.thrift (1 hunks)
- spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/compile/parse_configs.py
3-3: logging imported but unused
Remove unused import: logging
(F401)
8-8: ai.chronon.cli.compile.constants.ConfigInfo imported but unused
Remove unused import: ai.chronon.cli.compile.constants.ConfigInfo
(F401)
12-12: Undefined name log_level
(F821)
api/py/ai/chronon/cli/compile/compiler.py
8-8: ai.chronon.cli.compile.parse_teams imported but unused
Remove unused import: ai.chronon.cli.compile.parse_teams
(F401)
9-9: ai.chronon.cli.compile.constants imported but unused
Remove unused import: ai.chronon.cli.compile.constants
(F401)
13-13: ai.chronon.api.ttypes.Team imported but unused
Remove unused import: ai.chronon.api.ttypes.Team
(F401)
15-15: ai.chronon.repo.validator.ChrononRepoValidator imported but unused
Remove unused import: ai.chronon.repo.validator.ChrononRepoValidator
(F401)
16-16: ai.chronon.api.ttypes.GroupBy imported but unused
Remove unused import
(F401)
16-16: ai.chronon.api.ttypes.Join imported but unused
Remove unused import
(F401)
16-16: ai.chronon.api.ttypes.StagingQuery imported but unused
Remove unused import
(F401)
16-16: ai.chronon.api.ttypes.Model imported but unused
Remove unused import
(F401)
22-22: Undefined name dataclass
(F821)
24-24: Undefined name ConfigInfo
(F821)
70-70: Undefined name ConfigInfo
(F821)
api/py/test/sample/teams.py
8-8: Undefined name EnvironmentVariables
(F821)
45-45: Undefined name EnvironmentVariables
(F821)
66-66: Undefined name EnvironmentVariables
(F821)
api/py/ai/chronon/cli/compile/serializer.py
19-19: thrift.protocol.TJSONProtocol.TJSONProtocolFactory imported but unused
Remove unused import: thrift.protocol.TJSONProtocol.TJSONProtocolFactory
(F401)
78-81: Combine if branches using logical or operator
Combine if branches
(SIM114)
api/py/test/test_git_utils.py
5-5: pathlib.Path imported but unused
Remove unused import: pathlib.Path
(F401)
api/py/ai/chronon/cli/compile/compile_context.py
19-19: Undefined name Team
(F821)
19-19: Undefined name teams
(F821)
21-21: Undefined name ChrononRepoValidator
(F821)
21-21: Undefined name ChrononRepoValidator
(F821)
24-24: Undefined name Join
(F821)
26-26: Undefined name GroupBy
(F821)
30-30: Undefined name StagingQuery
(F821)
34-34: Undefined name Model
(F821)
87-87: Loop control variable sub_dirs not used within loop body
Rename unused sub_dirs to _sub_dirs
(B007)
94-94: Undefined name file2thrift
(F821)
api/py/ai/chronon/cli/compile/fill_templates.py
8-8: Undefined name utils
(F821)
16-16: Undefined name teams_path
(F821)
18-18: Undefined name Join
(F821)
28-28: Undefined name Join
(F821)
33-33: Undefined name utils
(F821)
api/py/ai/chronon/cli/compile/parse_teams.py
2-2: ai.chronon.api.ttypes.MetaData imported but unused
Remove unused import: ai.chronon.api.ttypes.MetaData
(F401)
4-4: ai.chronon.cli.compile.parse_configs.from_file imported but unused
Remove unused import: ai.chronon.cli.compile.parse_configs.from_file
(F401)
17-17: Undefined name importlib
(F821)
30-30: Undefined name Any
(F821)
32-32: Undefined name metadata
(F821)
47-47: Undefined name team_conf_file
(F821)
49-49: Undefined name metadata
(F821)
51-51: Undefined name metadata
(F821)
54-54: Undefined name metadata
(F821)
api/py/ai/chronon/cli/git_utils.py
4-4: typing.Set imported but unused
Remove unused import: typing.Set
(F401)
31-31: Do not use bare except
(E722)
71-71: Do not use bare except
(E722)
api/py/ai/chronon/cli/entrypoint.py
2-2: sys imported but unused
Remove unused import: sys
(F401)
5-5: enum.Enum imported but unused
Remove unused import: enum.Enum
(F401)
7-7: ai.chronon.cli.compile.write_files imported but unused
Remove unused import: ai.chronon.cli.compile.write_files
(F401)
16-16: Do not use bare except
(E722)
api/py/ai/chronon/cli/compile/validator.py
35-35: ai.chronon.repo.JOIN_FOLDER_NAME imported but unused
Remove unused import
(F401)
35-35: ai.chronon.repo.GROUP_BY_FOLDER_NAME imported but unused
Remove unused import
(F401)
36-36: ai.chronon.repo.serializer.file2thrift imported but unused
Remove unused import: ai.chronon.repo.serializer.file2thrift
(F401)
213-213: Undefined name CompileContext
(F821)
410-411: Use a single if statement instead of nested if statements
(SIM102)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: table_utils_delta_format_spark_tests
- GitHub Check: other_spark_tests
- GitHub Check: mutation_spark_tests
- GitHub Check: join_spark_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: no_spark_scala_tests
- GitHub Check: fetcher_spark_tests
🔇 Additional comments (28)
api/py/ai/chronon/cli/git_utils.py (4)
42-57: LGTM.
Logic for fork point retrieval looks adequate.
59-67: LGTM.
Fetching file content at commit is straightforward.
75-125: Ensure correct error handling.
The fallback for untracked files is good, but confirm you handle partial failures properly.
128-154: Overall logic looks good.

Filtering `real_changes` is a neat approach.

api/thrift/common.thrift (1)

16-20: Addition of `ConfigType` looks good.

No issues detected.

api/py/ai/chronon/cli/logging.py (1)
1-14: LGTM! Color codes are well-organized and follow standard ANSI escape sequences.

api/py/ai/chronon/cli/entrypoint.py (4)
20-24: LGTM! Well-documented CLI group.
26-31: Implement sync functionality.

The sync command only echoes the branch name without actual implementation. Would you like me to help implement the sync functionality?

33-56: Implement backfill functionality.

The backfill command only echoes parameters without actual implementation. Would you like me to help implement the backfill functionality?

58-67: Implement deploy functionality.

The deploy command only echoes parameters without actual implementation. Would you like me to help implement the deploy functionality?
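Structurally, the three stub commands amount to a subcommand dispatcher. A stdlib-only sketch with `argparse` (the PR itself uses `click`; all names and options here are assumptions, not the PR's actual signatures):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # One parser with three subcommands mirroring sync/backfill/deploy.
    parser = argparse.ArgumentParser(prog="chronon")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("sync", help="Push local changes (stub)")
    backfill = sub.add_parser("backfill", help="Backfill a date range (stub)")
    backfill.add_argument("--start-date", default=None)
    sub.add_parser("deploy", help="Deploy compiled configs (stub)")
    return parser


args = build_parser().parse_args(["backfill", "--start-date", "2024-01-01"])
```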
api/py/ai/chronon/cli/compile/parse_configs.py (1)
15-41: LGTM! Well-structured function with proper error handling.

api/py/ai/chronon/cli/compile/compiler.py (1)

29-47: LGTM! Clear initialization and logging.

api/py/ai/chronon/cli/compile/parse_teams.py (1)

58-94: LGTM! Well-documented helper functions.

api/py/test/sample/teams.py (2)
6-20: Replace placeholder values in TODOs.

Several configuration values need to be replaced with actual values:
- Infra team email
- Serde class and args
- Customer ID
- GCP Project ID and resources
🧰 Tools
🪛 Ruff (0.8.2)
8-8: Undefined name
EnvironmentVariables(F821)
84-90: LGTM! Simple team configurations.

api/py/test/test_git_utils.py (1)

13-85: LGTM! Well-structured test with proper cleanup.

The test provides comprehensive coverage of Git utilities with proper setup and teardown.
api/py/ai/chronon/cli/compile/compile_context.py (1)
40-80: LGTM! Well-documented methods with clear examples.

api/py/ai/chronon/cli/compile/serializer.py (3)
27-40: LGTM! Clean implementation of JSON to Thrift decoder.

The class properly handles both dict and JSON string inputs.

89-90: LGTM! Clean wrapper function.

102-110: LGTM! Good error handling.

The error message is descriptive and helpful.
spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1)
53-53: LGTM! Simplified log formatting.

Removed unnecessary ITALIC style while keeping the GREEN color for file:line in logs.
api/py/ai/chronon/repo/compile.py (4)
241-244: LGTM! Good logging of object details.

Clear logging of team and object name.

249-256: LGTM! Proper skip handling.

Good warning messages for skipped objects.

258-264: LGTM! Validation error handling.

Clear error messages for validation failures.

267-275: LGTM! Safe overwrite checks.

Proper handling of force overwrite and online config protection.
api/thrift/api.thrift (3)
Line range hint 236-254: Good deprecation of old env vars structure.

Properly marked old `modeToEnvMap` as deprecated in favor of new `env` field.
441-446: LGTM! Clean environment variables structure.

Well-organized structure for different deployment contexts.

448-456: LGTM! Comprehensive team configuration.

Good organization of team metadata and configuration.
```python
def get_current_file_content(file_path: str) -> Optional[str]:
    try:
        return Path(file_path).read_text()
    except:
        return None
```
Avoid the bare `except:` at line 71.

Use a more precise exception type.

```diff
-    except:
+    except OSError:
         return None
```
🧰 Tools
🪛 Ruff (0.8.2)
71-71: Do not use bare except
(E722)
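For context on why the E722 findings matter: a bare `except:` also catches `SystemExit` and `KeyboardInterrupt`, which `except OSError:` does not. A small self-contained demonstration (function names are illustrative):

```python
def read_or_none_bad(raiser):
    try:
        return raiser()
    except:  # noqa: E722 - deliberately bad: also swallows SystemExit
        return None


def read_or_none_good(raiser):
    try:
        return raiser()
    except OSError:  # only the errors we actually expect
        return None


def boom():
    raise SystemExit(1)


# The bare except hides even an interpreter-exit request.
swallowed = read_or_none_bad(boom)
```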
```python
def get_current_branch() -> str:
    try:
        subprocess.check_output(["git", "rev-parse", "HEAD"], stderr=subprocess.DEVNULL)
        return (
            subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"])
            .decode("utf-8")
            .strip()
        )
    except subprocess.CalledProcessError as e:
        try:
            head_file = Path(".git/HEAD").resolve()
            if head_file.exists():
                content = head_file.read_text().strip()
                if content.startswith("ref: refs/heads/"):
                    return content.split("/")[-1]
        except:
            pass
        print(
            f"Error: {e.stderr.decode('utf-8') if e.stderr else 'Not a git repository or no commits'}",
            file=sys.stderr,
        )
        raise
```
Avoid the bare `except:` at line 31.

Specify a more precise exception (e.g. `except OSError:`).

```diff
-        except:
+        except OSError:
             pass
```
🧰 Tools
🪛 Ruff (0.8.2)
31-31: Do not use bare except
(E722)
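The `.git/HEAD` fallback above relies on HEAD's two possible formats; the parsing can be isolated into a pure function. A sketch (not the PR's code — note that the snippet's `split("/")[-1]` would shorten slash-containing branch names like `feature/foo`, which this version avoids):

```python
from typing import Optional


def branch_from_head(content: str) -> Optional[str]:
    # On a branch, HEAD holds "ref: refs/heads/<name>"; when detached,
    # it holds a bare commit SHA, for which there is no branch name.
    content = content.strip()
    if content.startswith("ref: refs/heads/"):
        return content[len("ref: refs/heads/"):]
    return None
```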
```python
import ai.chronon.cli.compile.parse_teams as teams


def _fill_template(table, obj, namespace):
    if table:
        table = table.replace(
            "{{ logged_table }}", utils.log_table_name(obj, full_name=True)
        )
        table = table.replace("{{ db }}", namespace)
    return table


def set_templated_values(obj, cls, team_name):
    namespace = teams.get_team_conf(teams_path, team_name, "namespace")

    if cls == Join and obj.bootstrapParts:
        for bootstrap in obj.bootstrapParts:
            bootstrap.table = _fill_template(bootstrap.table, obj, namespace)

    if obj.metaData.dependencies:
        obj.metaData.dependencies = [
            _fill_template(dep, obj, namespace) for dep in obj.metaData.dependencies
        ]

    if cls == Join and obj.labelParts:
        obj.labelParts.metaData.dependencies = [
            label_dep.replace(
                "{{ join_backfill_table }}",
                utils.output_table_name(obj, full_name=True),
            )
            for label_dep in obj.labelParts.metaData.dependencies
        ]
```
Undefined references for `utils`, `teams_path`, `Join`.

Declare or import them to avoid runtime errors.

```diff
+import ai.chronon.cli.compile.utils as utils
+from ai.chronon.cli.compile.join import Join
+teams_path = "path/to/teams"  # or retrieve from config
```

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.8.2)
8-8: Undefined name utils
(F821)
16-16: Undefined name teams_path
(F821)
18-18: Undefined name Join
(F821)
28-28: Undefined name Join
(F821)
33-33: Undefined name utils
(F821)
```python
def get_logger(log_level=logging.INFO):
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(ColorFormatter())

    logger = logging.getLogger(__name__)
    logger.addHandler(handler)
    logger.setLevel(log_level)

    return logger
```
🛠️ Refactor suggestion

Prevent duplicate handlers.

```diff
 def get_logger(log_level=logging.INFO):
     handler = logging.StreamHandler(sys.stdout)
     handler.setFormatter(ColorFormatter())
     logger = logging.getLogger(__name__)
+    logger.handlers.clear()  # Prevent duplicate handlers
     logger.addHandler(handler)
     logger.setLevel(log_level)
```
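Why clearing handlers matters: `logging.getLogger` returns the same logger object each time, so every `get_logger()` call stacks another handler and each record is emitted once per handler. A self-contained demonstration (logger name and streams are illustrative):

```python
import io
import logging


def get_logger_naive(name, stream):
    # Mirrors the reviewed pattern: always adds a new handler.
    logger = logging.getLogger(name)
    logger.addHandler(logging.StreamHandler(stream))
    logger.setLevel(logging.INFO)
    return logger


buf = io.StringIO()
get_logger_naive("demo", buf)
logger = get_logger_naive("demo", buf)  # second call: second handler
logger.propagate = False  # keep root handlers out of the measurement
logger.info("hello")
# Each handler writes the record once, so "hello" appears twice.
duplicated = buf.getvalue().count("hello")

buf2 = io.StringIO()
logger.handlers.clear()  # the suggested fix
logger.addHandler(logging.StreamHandler(buf2))
logger.info("hello")
fixed = buf2.getvalue().count("hello")
```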
```python
import click
import sys
from datetime import datetime, timedelta
import subprocess
from enum import Enum

import ai.chronon.cli.compile.write_files as write_files
```
🛠️ Refactor suggestion

Remove unused imports.

```diff
 import click
-import sys
 from datetime import datetime, timedelta
 import subprocess
-from enum import Enum
-
-import ai.chronon.cli.compile.write_files as write_files
```
🧰 Tools
🪛 Ruff (0.8.2)
2-2: sys imported but unused
Remove unused import: sys
(F401)
5-5: enum.Enum imported but unused
Remove unused import: enum.Enum
(F401)
7-7: ai.chronon.cli.compile.write_files imported but unused
Remove unused import: ai.chronon.cli.compile.write_files
(F401)
```python
def update_metadata(obj: Any, team_dict: Dict[str, Team]):
    assert metadata is not None, "Cannot update empty metadata"

    name = obj.metadata.name
    team = obj.metadata.team

    assert (
        team is not None
    ), f"Team name is required in metadata for {name}. This usually set by compiler - so this is an internal error."

    assert (
        team in team_dict
    ), f"Team '{team}' not found in teams.py. Please add an entry."

    assert (
        DEFAULT_CONF_TEAM in team_dict
    ), f"Default team '{DEFAULT_CONF_TEAM}' not found in {team_conf_file}. Please add an entry."

    metadata.outputNamespace = team_dict[team].outputNamespace

    metadata.env = _merge_environment_variables(
        team_dict[DEFAULT_CONF_TEAM].env,
        team_dict[team].env,
        metadata.env,
    )
```
Fix undefined variables and add type hints.

```diff
+from typing import Any
+
 def update_metadata(obj: Any, team_dict: Dict[str, Team]):
-    assert metadata is not None, "Cannot update empty metadata"
+    assert obj.metadata is not None, "Cannot update empty metadata"
     # ...
-    ), f"Default team '{DEFAULT_CONF_TEAM}' not found in {team_conf_file}. Please add an entry."
+    ), f"Default team '{DEFAULT_CONF_TEAM}' not found in teams.py. Please add an entry."
```
+ ), f"Default team '{DEFAULT_CONF_TEAM}' not found in teams.py. Please add an entry."📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def update_metadata(obj: Any, team_dict: Dict[str, Team]): | |
| assert metadata is not None, "Cannot update empty metadata" | |
| name = obj.metadata.name | |
| team = obj.metadata.team | |
| assert ( | |
| team is not None | |
| ), f"Team name is required in metadata for {name}. This usually set by compiler - so this is an internal error." | |
| assert ( | |
| team in team_dict | |
| ), f"Team '{team}' not found in teams.py. Please add an entry." | |
| assert ( | |
| DEFAULT_CONF_TEAM in team_dict | |
| ), f"Default team '{DEFAULT_CONF_TEAM}' not found in {team_conf_file}. Please add an entry." | |
| metadata.outputNamespace = team_dict[team].outputNamespace | |
| metadata.env = _merge_environment_variables( | |
| team_dict[DEFAULT_CONF_TEAM].env, | |
| team_dict[team].env, | |
| metadata.env, | |
| ) | |
| from typing import Any | |
| def update_metadata(obj: Any, team_dict: Dict[str, Team]): | |
| assert obj.metadata is not None, "Cannot update empty metadata" | |
| name = obj.metadata.name | |
| team = obj.metadata.team | |
| assert ( | |
| team is not None | |
| ), f"Team name is required in metadata for {name}. This usually set by compiler - so this is an internal error." | |
| assert ( | |
| team in team_dict | |
| ), f"Team '{team}' not found in teams.py. Please add an entry." | |
| assert ( | |
| DEFAULT_CONF_TEAM in team_dict | |
| ), f"Default team '{DEFAULT_CONF_TEAM}' not found in teams.py. Please add an entry." | |
| metadata.outputNamespace = team_dict[team].outputNamespace | |
| metadata.env = _merge_environment_variables( | |
| team_dict[DEFAULT_CONF_TEAM].env, | |
| team_dict[team].env, | |
| metadata.env, | |
| ) |
🧰 Tools
🪛 Ruff (0.8.2)
30-30: Undefined name Any
(F821)
32-32: Undefined name metadata
(F821)
47-47: Undefined name team_conf_file
(F821)
49-49: Undefined name metadata
(F821)
51-51: Undefined name metadata
(F821)
54-54: Undefined name metadata
(F821)
```diff
@@ -0,0 +1,90 @@
+from ai.chronon.api.ttypes import Team
```
Add missing import.

```diff
-from ai.chronon.api.ttypes import Team
+from ai.chronon.api.ttypes import Team, EnvironmentVariables
```
```python
from dataclasses import dataclass
import os
from typing import Any, Dict, List

from ai.chronon.api.common.ttypes import ConfigType
```
Add missing imports.

```diff
 from dataclasses import dataclass
 import os
 from typing import Any, Dict, List
 from ai.chronon.api.common.ttypes import ConfigType
+from ai.chronon.api.ttypes import Team
+from ai.chronon.cli.compile import teams
+from ai.chronon.cli.validation import ChrononRepoValidator
+from ai.chronon.api.models import Join, GroupBy, StagingQuery, Model
+from ai.chronon.cli.utils import file2thrift
```
+from ai.chronon.cli.utils import file2thrift📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from dataclasses import dataclass | |
| import os | |
| from typing import Any, Dict, List | |
| from ai.chronon.api.common.ttypes import ConfigType | |
| from dataclasses import dataclass | |
| import os | |
| from typing import Any, Dict, List | |
| from ai.chronon.api.common.ttypes import ConfigType | |
| from ai.chronon.api.ttypes import Team | |
| from ai.chronon.cli.compile import teams | |
| from ai.chronon.cli.validation import ChrononRepoValidator | |
| from ai.chronon.api.models import Join, GroupBy, StagingQuery, Model | |
| from ai.chronon.cli.utils import file2thrift |
```python
def existing_confs(self, obj_class: type) -> Dict[str, object]:

    result = {}

    output_dir = self.output_dir(obj_class)

    for sub_root, sub_dirs, sub_files in os.walk(output_dir):

        for f in sub_files:

            if not f.startswith("."):  # ignore hidden files - such as .DS_Store
                continue

            obj = file2thrift(os.path.join(sub_root, f), obj_class)

            if obj and hasattr(obj, "metaData"):
                result[obj.metaData.name] = obj

    return result
```
Fix the hidden-files check and rename the unused loop variable.

```diff
 def existing_confs(self, obj_class: type) -> Dict[str, object]:
     result = {}
     output_dir = self.output_dir(obj_class)
-    for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+    for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
         for f in sub_files:
-            if not f.startswith("."):  # ignore hidden files - such as .DS_Store
+            if f.startswith("."):  # ignore hidden files - such as .DS_Store
                 continue
             obj = file2thrift(os.path.join(sub_root, f), obj_class)
             if obj and hasattr(obj, "metaData"):
                 result[obj.metaData.name] = obj
     return result
```
🧰 Tools
🪛 Ruff (0.8.2)
- 87-87: Loop control variable `sub_dirs` not used within loop body; rename unused `sub_dirs` to `_sub_dirs` (B007)
- 94-94: Undefined name `file2thrift` (F821)
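The inverted check discussed above is easy to exercise in isolation. A standalone sketch (file names are illustrative) of the corrected filter:

```python
def visible_files(file_names):
    """Keep only non-hidden files, skipping names like .DS_Store.

    The diff under review originally skipped everything *except* hidden
    files because the startswith(".") condition was negated.
    """
    kept = []
    for name in file_names:
        if name.startswith("."):  # hidden file: skip it
            continue
        kept.append(name)
    return kept


print(visible_files([".DS_Store", "join.json", ".hidden", "group_by.json"]))
```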
```diff
 class ChrononRepoValidator(object):
-    def __init__(self, chronon_root_path: str, output_root: str, log_level=logging.INFO):
+    def __init__(self, compile_context: CompileContext, log_level=logging.INFO):
```
Add the missing import for CompileContext.

```diff
+from ai.chronon.cli.compile.compile_context import CompileContext
```

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.8.2)
- 213-213: Undefined name `CompileContext` (F821)
```python
@cli.command()
@click.argument("conf")
```
Is the idea that this points to the python file or the compiled directory? I was thinking python, with an object selector in case you specify a file with multiple objects within it.
Let's keep these arguments consistent with what's in runner.py (chronon/api/py/ai/chronon/repo/runner.py, line 144 in c0f1645):

```python
def backfill(self, start_date, end_date, force_recompute=False, plan=False):
```
And in fact, we can even complete the implementation in this file to call those runner functions, which are attached to the base python objects (like here: https://github.com/zipline-ai/chronon/blob/main/api/py/ai/chronon/group_by.py#L610-L612).
So the logic here would be:
- Extract object
- Call object.backfill(args)
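A hedged sketch of that flow — the class and registry names below are hypothetical stand-ins; the real objects carry runner-backed backfill methods as linked above:

```python
# Hypothetical stand-in for a compiled Chronon object; the real classes
# attach a runner-backed backfill method with this signature.
class FakeGroupBy:
    def __init__(self, name):
        self.name = name

    def backfill(self, start_date, end_date, force_recompute=False, plan=False):
        action = "planning" if plan else "backfilling"
        return f"{action} {self.name} from {start_date} to {end_date}"


def extract_object(selector, registry):
    # Object selector: picks one named object out of a file that may
    # define several.
    return registry[selector]


registry = {"risk.user_transactions.txn_join": FakeGroupBy("txn_join")}

obj = extract_object("risk.user_transactions.txn_join", registry)
print(obj.backfill("2025-01-01", "2025-01-31"))
```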
By the way, we can add scope to runner.py, though I wonder if "scope" is the right word. Although it's more verbose, run_upstream=t/f and run_downstream=t/f might be more explicit and clearer.
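A minimal sketch of what those explicit flags could look like — the signature is a suggestion, not the current runner API:

```python
def backfill(start_date, end_date, run_upstream=False, run_downstream=False):
    """Run the target object, optionally expanding to upstream/downstream nodes."""
    stages = []
    if run_upstream:
        stages.append("upstream")
    stages.append("target")
    if run_downstream:
        stages.append("downstream")
    return stages


print(backfill("2025-01-01", "2025-01-02", run_upstream=True))
```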
varant-zlai left a comment
There was a lot of compile logic in my previous PR that I want to make sure we're still keeping, since we compile the whole repo when we try to run a given object (this ensures the target file compiles, errors if it doesn't, warns if other things are failing, etc.).
```python
"GCP_DATAPROC_CLUSTER_NAME": "canary-2",  # TODO: GCP Dataproc Cluster Name
"GCP_BIGTABLE_INSTANCE_ID": "zipline-canary-instance",  # TODO: GCP Bigtable Instance ID
},
backfill={  # Backfills tend to be larger than other jobs
```
main thing here is I think we'd like to just use spark config props - that removes an extra layer of indirection, and makes it clear to the user what this is for. WDYT?
I am down for that actually
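For illustration, raw Spark properties in the team config might look like the dict below — the keys are standard Spark conf names, but the dict shape and helper are hypothetical:

```python
# Hypothetical team config: raw Spark properties instead of a custom
# indirection layer, so users see exactly what each knob controls.
backfill_conf = {
    "spark.executor.memory": "8g",            # backfills tend to be larger
    "spark.executor.instances": "50",
    "spark.sql.shuffle.partitions": "4000",
}


def to_submit_args(props):
    # Render the properties as spark-submit --conf flags.
    return [f"--conf {k}={v}" for k, v in sorted(props.items())]


print(to_submit_args(backfill_conf))
```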
Actionable comments posted: 18
🔭 Outside diff range comments (3)
online/src/main/scala/ai/chronon/online/TopicChecker.scala (1)
43-80: Close admin client in topic check.
Likewise, call `close()` after checking topics to release resources.

```diff
 def topicShouldExist(topic: String, bootstrap: String, additionalProps: Map[String, String] = Map.empty): Unit = {
   val props = mapToJavaProperties(additionalProps ++ Map(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG -> bootstrap))
   try {
     val adminClient = AdminClient.create(props)
     ...
+    adminClient.close()
   } catch {
     case ex: Exception =>
       throw new RuntimeException(s"Failed to check for topic $topic in $bootstrap", ex)
   }
 }
```

frontend/src/routes/thrift/+page.svelte (1)
1-136: Move API testing code to a dedicated test file. Testing code should not live in production routes. Consider:
- Moving to a test file
- Using environment variables for IDs
- Using relative timestamps
distribution/build_and_upload_gcp_artifacts.sh (1)
24-27: Fix Variable Expansion
Prefix EXPECTED_MINIMUM_MINOR_PYTHON_VERSION and MINOR_PYTHON_VERSION with '$' for proper evaluation.

```diff
- if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
+ if [[ $EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt $MINOR_PYTHON_VERSION ]] ; then
```
🧹 Nitpick comments (47)
online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (1)
565-611: Consider adding a null/empty case test. The test case looks good but could be more comprehensive.
Add a test case for empty/null listing IDs:
```diff
 it should "handle explode invocations in select clauses" in {
   // ... existing test code ...
+  val emptyRow: Map[String, Any] = Map(
+    "event_name" -> "backend_cart_payment",
+    "properties" -> makeHashMap("sold_listing_ids" -> ""),
+  )
+
+  val empty_res = cu.performSql(emptyRow)
+  empty_res.size shouldBe 0
 }
```

spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala (1)
52-53: Add an explicit return type for the UDF. Improve type safety by declaring the return type.

```diff
-private def edit_distance: UserDefinedFunction =
-  functions.udf((left: Object, right: Object) => EditDistance.between(left, right))
+private def edit_distance: UserDefinedFunction =
+  functions.udf((left: Object, right: Object) => EditDistance.between(left, right): Int)
```

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (1)
55-66: Enhance the date conversion test structure. While the test covers both date types, consider:
- Extracting duplicate conversion logic
- Asserting the actual BigQuery date values
```diff
 it should "bigquery connector converts spark dates regardless of date setting" in {
   val input = spark.createDataFrame(Seq((1, "2021-01-01"))).toDF("id", "ds")
+  def testDateConversion(useJava8Api: Boolean) = {
+    spark.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, useJava8Api)
+    val date = input.select(col("id"), to_date(col("ds"))).collect.take(1).head.get(1)
+    val expectedType = if (useJava8Api) classOf[java.time.LocalDate] else classOf[java.sql.Date]
+    assert(date.getClass == expectedType)
+    val bqDate = SparkBigQueryUtil.sparkDateToBigQuery(date)
+    assert(bqDate != null) // Add specific value assertion
+  }
-  spark.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, true)
-  val java8Date = input.select(col("id"), to_date(col("ds"))).collect.take(1).head.get(1)
-  assert(java8Date.isInstanceOf[java.time.LocalDate])
-  SparkBigQueryUtil.sparkDateToBigQuery(java8Date)
-
-  spark.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, false)
-  val nonJava8Date = input.select(col("id"), to_date(col("ds"))).collect.take(1).head.get(1)
-  assert(nonJava8Date.isInstanceOf[java.sql.Date])
-  SparkBigQueryUtil.sparkDateToBigQuery(nonJava8Date)
+  testDateConversion(true)
+  testDateConversion(false)
 }
```

online/src/main/scala/ai/chronon/online/CatalystUtil.scala (2)
147-150: Use typed container.
Returning Seq[Array[Any]] may undermine type safety.
167-193: Method size.
Consider smaller subroutines for readability.
hub/src/main/java/ai/chronon/hub/HubVerticle.java (1)
82-85: Consider separate request classes for join and column drift. Using JoinDriftRequest for both join and column drift endpoints could cause confusion; create distinct request classes for better type safety.
```diff
-router.get("/api/v1/join/:name/column/:columnName/drift").handler(RouteHandlerWrapper.createHandler(driftHandler::getColumnDrift, JoinDriftRequest.class));
+router.get("/api/v1/join/:name/column/:columnName/drift").handler(RouteHandlerWrapper.createHandler(driftHandler::getColumnDrift, ColumnDriftRequest.class));
```

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (2)
6-6: Document the rationale for the wildcard import. Consider adding a comment explaining why a wildcard import is preferred over specific imports.

79-80: Consider adding a descriptive error message for the Hive fallback. The fallback logic is good, but users might benefit from knowing why Hive was chosen.
```diff
-  scala.Option(btTableIdentifier.getProject).map(BigQueryFormat(_, bigQueryClient, Map.empty)).getOrElse(Hive))
+  scala.Option(btTableIdentifier.getProject)
+    .map(BigQueryFormat(_, bigQueryClient, Map.empty))
+    .getOrElse {
+      logger.info(s"No BigQuery project found for $tableName, falling back to Hive")
+      Hive
+    })
```

flink/src/main/scala/ai/chronon/flink/SchemaRegistrySchemaProvider.scala (1)
14-16: Make cache capacity configurable.
Separate the cache size from static code so it can be customized via config if needed.

```diff
- private val CacheCapacity: Int = 10
+ private val CacheCapacity: Int = conf.get("registry_cache_capacity").map(_.toInt).getOrElse(10)
```

flink/src/main/scala/ai/chronon/flink/KafkaFlinkSource.scala (1)
20-28: Improve bootstrap fallback.
You assume host and port if the config is missing, yet throw an error only for the port. Consider a clearer fallback or using defaults for both.
flink/src/main/scala/org/apache/spark/sql/avro/AvroDeserializationSupport.scala (2)
18-19: Prefer final variables.
Mark constructor params as `val` or `private val` if they are not re-assigned.
49-55: Blocking call caution.
`Await.result(..., 30.seconds)` can stall event loops. Consider async handling.
hub/src/main/scala/ai/chronon/hub/handlers/DriftHandler.scala (1)
49-55: Blocking call caution.
`Await.result(..., 30.seconds)` can freeze threads. An async approach may be safer.
api/py/ai/chronon/cli/compile/compile_context.py (2)
2-2: Remove unused import.
`from logging import error` is unused. Drop it.

```diff
-from logging import error
```

🧰 Tools
🪛 Ruff (0.8.2)
- 2-2: `logging.error` imported but unused; remove unused import (F401)
107-107: Rename unused variable.
Rename `sub_dirs` to `_sub_dirs` to clarify it's unused.

```diff
-for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
```

🧰 Tools
🪛 Ruff (0.8.2)
- 107-107: Loop control variable `sub_dirs` not used within loop body; rename unused `sub_dirs` to `_sub_dirs` (B007)
hub/src/main/scala/ai/chronon/hub/handlers/ConfHandler.scala (2)
20-39: Handle unsupported types carefully
Throwing RuntimeException is okay, but consider a domain-specific exception.
44-48: Casting with reflection
Consider safer casts or typed definitions.frontend/src/lib/api/api.ts (1)
40-46: Generic getConf
Clear approach, but return type might need stricter checks.cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (1)
21-24: Consider consistent exception usage.
Elsewhere, you useUnsupportedOperationException; here you useNotImplementedError.-throw new NotImplementedError("alterTableProperties not yet supported for BigQuery") +throw new UnsupportedOperationException("alterTableProperties not yet supported for BigQuery")frontend/src/lib/components/NavigationBar.svelte (1)
251-261: Hard-coded condition may cause confusion.Consider removing or externalizing
'risk.user_transactions.txn_join'for clarity.api/thrift/fetcher.thrift (1)
6-11: Consider adding field constraints and improving docs.Add:
- Valid ranges for timestamp fields
- Max length for dataset string
- Expected format for keyBytes
frontend/src/routes/joins/+page.svelte (1)
8-12: Simplify sort function.The sort function can be simplified using string comparison.
- const sortedItems = data.items.sort((a: IJoin, b: IJoin) => { - if (a.metaData?.name === 'risk.user_transactions.txn_join') return -1; - if (b.metaData?.name === 'risk.user_transactions.txn_join') return 1; - return 0; - }); + const sortedItems = data.items.sort((a: IJoin, b: IJoin) => + (b.metaData?.name === 'risk.user_transactions.txn_join') as any - + (a.metaData?.name === 'risk.user_transactions.txn_join') as any + );distribution/run_zipline_quickstart.py (1)
10-11: Fix line continuation style.Use parentheses for line continuation instead of backslash.
- quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__)) - , "run_zipline_quickstart.sh") + quickstart_sh = os.path.join( + os.path.dirname(os.path.realpath(__file__)), + "run_zipline_quickstart.sh" + )frontend/src/routes/joins/[slug]/observability/ModelTable.svelte (1)
15-17: Simplify ModelType indexing.The conditional check for undefined can be simplified using optional chaining.
- value={model.modelType === undefined ? undefined : ModelType[model.modelType]} + value={ModelType[model.modelType ?? -1]}api/src/test/scala/ai/chronon/api/test/TilingUtilSpec.scala (1)
18-18: Consider using relative timestamps.Hardcoded timestamp might become outdated.
- key.setTileStartTimestampMillis(1738195200000L) // 2025-01-29T00:00:00Z + key.setTileStartTimestampMillis(System.currentTimeMillis())api/src/main/scala/ai/chronon/api/TilingUtils.scala (1)
15-15: Document the compact protocol bug.Add details about the bug for future reference.
frontend/src/lib/server/conf-loader.ts (1)
39-42: Enhance error handling specificity.Consider catching specific error types and providing more detailed error messages.
} catch (error) { - console.error(`Failed to load ${entityMatch.label.toLowerCase()}:`, error); - throw error; + const errorMessage = error instanceof Error ? error.message : 'Unknown error'; + console.error(`Failed to load ${entityMatch.label.toLowerCase()}: ${errorMessage}`); + throw new Error(`Configuration loading failed: ${errorMessage}`); }api/py/ai/chronon/cli/compile/fill_templates.py (2)
4-4: Remove the unused import. The `parse_teams` import is not used in the code.

```diff
-import ai.chronon.cli.compile.parse_teams as teams
```

🧰 Tools
🪛 Ruff (0.8.2)
- 4-4: `ai.chronon.cli.compile.parse_teams` imported but unused; remove unused import (F401)
7-15: Add input validation. Validate input parameters to prevent potential errors.

```diff
 def _fill_template(table, obj, namespace):
+    if not isinstance(table, str):
+        return table
     if table:
         table = table.replace(
```
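Since the diff above is truncated before the replace target, here is a self-contained sketch of the validated fill; the placeholder token and helper name are assumptions, not the actual compile-module API:

```python
# Hypothetical placeholder token; the real token lives in the compile module.
DB_TOKEN = "{{ db }}"


def fill_template(table, namespace):
    # Guard non-string inputs before substituting, as suggested above.
    if not isinstance(table, str):
        return table
    return table.replace(DB_TOKEN, namespace)


print(fill_template("{{ db }}.user_events", "prod_ns"))
```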
5-14: Consider using an enum for color codes.Organize color codes using an enum for better maintainability.
from enum import Enum class Colors(Enum): TIME = "\033[36m" # Cyan DEBUG = "\033[36m" # Cyan INFO = "\033[32m" # Green WARNING = "\033[33m" # Yellow ERROR = "\033[31m" # Red CRITICAL = "\033[41m" # White on Red FILE = "\033[35m" # Purple RESET = "\033[0m"spark/src/main/scala/ai/chronon/spark/streaming/TopicCheckerApp.scala (1)
39-41: Consider graceful shutdown.Using
System.exit(0)might not allow proper cleanup. Consider using a more graceful shutdown mechanism.api/py/ai/chronon/cli/compile/parse_configs.py (1)
4-4: Remove unused imports.-import logging -from typing import Any, Dict, List, Tuple +from typing import Any, List -import ai.chronon.cli.compile.parse_teams as teamsAlso applies to: 6-6, 9-9
🧰 Tools
🪛 Ruff (0.8.2)
4-4:
loggingimported but unusedRemove unused import:
logging(F401)
api/py/ai/chronon/cli/entrypoint.py (1)
2-2: Remove unused imports.-import sys -from enum import EnumAlso applies to: 5-5
🧰 Tools
🪛 Ruff (0.8.2)
2-2:
sysimported but unusedRemove unused import:
sys(F401)
api/py/ai/chronon/cli/compile/compiler.py (1)
48-49: Clean up the spinner implementation. The status variable is unused and the spinner code is duplicated.

```diff
- with console.status("[bold green]Loading...", spinner="dots") as status:
+ with console.status("[bold green]Loading...", spinner="dots"):
```

🧰 Tools
🪛 Ruff (0.8.2)
48-48: Local variable
statusis assigned to but never usedRemove assignment to unused variable
status(F841)
api/src/main/scala/ai/chronon/api/Constants.scala (1)
80-82: Enhance documentation for magic null constants.While the comment explains language compatibility, it should also document:
- Why these specific values were chosen
- When to use these constants vs. actual null values
api/thrift/hub.thrift (1)
119-124: Add documentation for each enum value.Each ConfType value should be documented to explain its purpose and usage.
online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (1)
37-52: Enhance parse method robustness.Consider these improvements:
- Validate params format before splitting
- Handle malformed key-value pairs gracefully
def parse(topic: String): TopicInfo = { assert(topic.nonEmpty, s"invalid topic: $topic") val (messageBus, rest) = if (topic.contains("://")) { val tokens = topic.split("://", 2) tokens.head -> tokens.last } else { "kafka" -> topic } assert(rest.nonEmpty, s"invalid topic: $topic") val fields = rest.split("/") val topicName = fields.head - val params = fields.tail.map { f => - val kv = f.split("=", 2); kv.head -> kv.last - }.toMap + val params = fields.tail.flatMap { f => + try { + val kv = f.split("=", 2) + if (kv.length == 2) Some(kv.head -> kv.last) else None + } catch { + case _: Exception => None + } + }.toMap TopicInfo(topicName, messageBus, params) }api/py/ai/chronon/cli/git_utils.py (1)
4-4: Remove the unused import.

```diff
-from typing import List, Optional, Set
+from typing import List, Optional
```

🧰 Tools
🪛 Ruff (0.8.2)
4-4:
typing.Setimported but unusedRemove unused import:
typing.Set(F401)
hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala (1)
23-24: Add test cases for concurrent requests and error scenarios.Consider adding tests for:
- Concurrent drift requests
- DriftStore failures
- Empty result sets
Also applies to: 64-88
flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (1)
8-9: Restore removed documentation for TimestampedIR and TimestampedTile.Add back the documentation to maintain code clarity.
api/py/ai/chronon/cli/compile/validator.py (1)
458-470: Consider combining temporal and topic checks. The temporal and topic checks could be combined to reduce code complexity.
- non_temporal = ( - group_by.accuracy is None or group_by.accuracy == Accuracy.SNAPSHOT - ) - - no_topic = not _group_by_has_topic(group_by) - has_hourly_windows = _group_by_has_hourly_windows(group_by) - - # batch features cannot contain hourly windows - if (no_topic and non_temporal) and has_hourly_windows: + if (not _group_by_has_topic(group_by) and + (group_by.accuracy is None or group_by.accuracy == Accuracy.SNAPSHOT) and + _group_by_has_hourly_windows(group_by)):distribution/build_and_upload_gcp_artifacts.sh (2)
33-33: Split VERSION Assignment
Separate the VERSION extraction from the 'pip wheel' command to avoid masking return values.
9-9: Enhance cd Robustness
Consider appending "|| exit" to the cd command for added safety (e.g.,cd $CHRONON_ROOT_DIR || exit).distribution/run_zipline_quickstart.sh (3)
7-8: Robust Working Directory Change
Use "cd $WORKING_DIR || exit" to ensure the script halts if the directory change fails.🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
10-11: Remove Unused RED Variable
The RED variable is declared but never used; remove it if it’s unnecessary.🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
15-18: Address Bigtable TODO
The TODO for deleting bigtable rows remains unresolved—update this per the requirements.service_commons/src/main/java/ai/chronon/service/RouteHandlerWrapper.java (1)
90-103: Enhance error messages.Add more context to error messages.
```diff
-LOGGER.error("Failed to serialize response", e);
+LOGGER.error("Failed to serialize response to JSON using TSimpleJSONProtocol", e);
-LOGGER.error("Unexpected error during serialization", e);
+LOGGER.error("Unexpected error during JSON serialization of " + output.getClass().getSimpleName(), e);
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (131)
.github/workflows/require_triggered_status_checks.yaml(1 hunks).github/workflows/test_frontend.yaml(1 hunks).plugin-versions(1 hunks).tool-versions(1 hunks)aggregator/src/main/scala/ai/chronon/aggregator/stats/EditDistance.scala(1 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/EditDistanceTest.scala(1 hunks)api/py/ai/chronon/cli/compile/compile_context.py(1 hunks)api/py/ai/chronon/cli/compile/compiler.py(1 hunks)api/py/ai/chronon/cli/compile/fill_templates.py(1 hunks)api/py/ai/chronon/cli/compile/parse_configs.py(1 hunks)api/py/ai/chronon/cli/compile/parse_teams.py(1 hunks)api/py/ai/chronon/cli/compile/validator.py(12 hunks)api/py/ai/chronon/cli/entrypoint.py(1 hunks)api/py/ai/chronon/cli/git_utils.py(1 hunks)api/py/ai/chronon/cli/logger.py(1 hunks)api/py/ai/chronon/group_by.py(0 hunks)api/py/ai/chronon/join.py(0 hunks)api/py/ai/chronon/repo/hub_uploader.py(2 hunks)api/py/ai/chronon/repo/run.py(13 hunks)api/py/test/sample/teams.json(2 hunks)api/src/main/scala/ai/chronon/api/Constants.scala(1 hunks)api/src/main/scala/ai/chronon/api/DataType.scala(1 hunks)api/src/main/scala/ai/chronon/api/TilingUtils.scala(1 hunks)api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala(1 hunks)api/src/test/scala/ai/chronon/api/test/TilingUtilSpec.scala(1 hunks)api/thrift/api.thrift(4 hunks)api/thrift/fetcher.thrift(1 hunks)api/thrift/hub.thrift(1 hunks)api/thrift/observability.thrift(1 hunks)build.sbt(8 hunks)cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala(2 hunks)cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala(7 hunks)cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala(5 hunks)cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala(2 hunks)cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala(5 hunks)cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProviderTest.scala(1 
hunks)distribution/build_and_upload_gcp_artifacts.sh(3 hunks)distribution/run_zipline_quickstart.py(1 hunks)distribution/run_zipline_quickstart.sh(1 hunks)flink/BUILD.bazel(2 hunks)flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala(5 hunks)flink/src/main/scala/ai/chronon/flink/AvroCodecFn.scala(8 hunks)flink/src/main/scala/ai/chronon/flink/FlinkJob.scala(11 hunks)flink/src/main/scala/ai/chronon/flink/KafkaFlinkSource.scala(1 hunks)flink/src/main/scala/ai/chronon/flink/MetricsSink.scala(2 hunks)flink/src/main/scala/ai/chronon/flink/SchemaProvider.scala(1 hunks)flink/src/main/scala/ai/chronon/flink/SchemaRegistrySchemaProvider.scala(1 hunks)flink/src/main/scala/ai/chronon/flink/SourceProvider.scala(0 hunks)flink/src/main/scala/ai/chronon/flink/TestFlinkJob.scala(4 hunks)flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala(1 hunks)flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala(5 hunks)flink/src/main/scala/org/apache/spark/sql/avro/AvroDeserializationSupport.scala(1 hunks)flink/src/test/resources/user.avsc(1 hunks)flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala(4 hunks)flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala(6 hunks)flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala(1 hunks)flink/src/test/scala/ai/chronon/flink/test/SchemaRegistrySchemaProviderSpec.scala(1 hunks)flink/src/test/scala/org/apache/spark/sql/avro/AvroDeSerializationSupportSpec.scala(1 hunks)frontend/src/lib/api/api.test.ts(3 hunks)frontend/src/lib/api/api.ts(3 hunks)frontend/src/lib/components/LogicalNodeTable.svelte(1 hunks)frontend/src/lib/components/NavigationBar.svelte(3 hunks)frontend/src/lib/server/conf-loader.ts(1 hunks)frontend/src/lib/types/Entity/Entity.ts(1 hunks)frontend/src/lib/types/Model/Model.test.ts(0 hunks)frontend/src/lib/types/Model/Model.ts(0 hunks)frontend/src/routes/groupbys/+page.server.ts(1 hunks)frontend/src/routes/groupbys/+page.svelte(1 
hunks)frontend/src/routes/joins/+page.server.ts(1 hunks)frontend/src/routes/joins/+page.svelte(1 hunks)frontend/src/routes/joins/[slug]/observability/ModelTable.svelte(1 hunks)frontend/src/routes/joins/[slug]/services/joins.service.ts(2 hunks)frontend/src/routes/models/+page.server.ts(1 hunks)frontend/src/routes/models/+page.svelte(1 hunks)frontend/src/routes/stagingqueries/+page.server.ts(1 hunks)frontend/src/routes/stagingqueries/+page.svelte(1 hunks)frontend/src/routes/thrift/+page.svelte(1 hunks)hub/src/main/java/ai/chronon/hub/HubVerticle.java(3 hunks)hub/src/main/scala/ai/chronon/hub/handlers/ConfHandler.scala(1 hunks)hub/src/main/scala/ai/chronon/hub/handlers/DriftHandler.scala(1 hunks)hub/src/main/scala/ai/chronon/hub/handlers/JoinsHandler.scala(0 hunks)hub/src/main/scala/ai/chronon/hub/handlers/ModelsHandler.scala(0 hunks)hub/src/main/scala/ai/chronon/hub/handlers/Paginate.scala(0 hunks)hub/src/main/scala/ai/chronon/hub/handlers/SearchHandler.scala(0 hunks)hub/src/main/scala/ai/chronon/hub/handlers/TimeSeriesHandler.scala(1 hunks)hub/src/main/scala/ai/chronon/hub/model/Model.scala(0 hunks)hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala(1 hunks)hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala(1 hunks)hub/src/test/scala/ai/chronon/hub/handlers/JoinHandlerTest.scala(0 hunks)hub/src/test/scala/ai/chronon/hub/handlers/ModelHandlerTest.scala(0 hunks)hub/src/test/scala/ai/chronon/hub/handlers/SearchHandlerTest.scala(0 hunks)online/src/main/java/ai/chronon/online/JavaFetcher.java(2 hunks)online/src/main/scala/ai/chronon/online/AvroConversions.scala(4 hunks)online/src/main/scala/ai/chronon/online/CatalystUtil.scala(5 hunks)online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala(2 hunks)online/src/main/scala/ai/chronon/online/Fetcher.scala(2 hunks)online/src/main/scala/ai/chronon/online/FetcherBase.scala(5 hunks)online/src/main/scala/ai/chronon/online/FetcherMain.scala(1 
hunks)online/src/main/scala/ai/chronon/online/MetadataStore.scala(2 hunks)online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala(1 hunks)online/src/main/scala/ai/chronon/online/TopicChecker.scala(3 hunks)online/src/main/scala/ai/chronon/online/stats/DriftStore.scala(1 hunks)online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala(4 hunks)online/src/test/scala/ai/chronon/online/test/CatalystUtilHiveUDFTest.scala(2 hunks)online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala(24 hunks)online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala(2 hunks)online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala(1 hunks)online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala(5 hunks)service_commons/src/main/java/ai/chronon/service/RouteHandlerWrapper.java(2 hunks)spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/Driver.scala(2 hunks)spark/src/main/scala/ai/chronon/spark/Extensions.scala(2 hunks)spark/src/main/scala/ai/chronon/spark/GroupBy.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/JobSubmitter.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/JoinBase.scala(3 hunks)spark/src/main/scala/ai/chronon/spark/JoinUtils.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala(3 hunks)spark/src/main/scala/ai/chronon/spark/TableUtils.scala(8 hunks)spark/src/main/scala/ai/chronon/spark/format/Format.scala(2 hunks)spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/streaming/KafkaStreamBuilder.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/streaming/TopicCheckerApp.scala(1 hunks)spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala(2 hunks)spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala(3 
hunks)spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala(1 hunks)tools/build_rules/dependencies/maven_repository.bzl(2 hunks)tools/build_rules/dependencies/spark_repository.bzl(1 hunks)tools/build_rules/flink/BUILD(1 hunks)tools/build_rules/prelude_bazel(1 hunks)tools/build_rules/spark/BUILD(1 hunks)
💤 Files with no reviewable changes (13)
- api/py/ai/chronon/join.py
- hub/src/test/scala/ai/chronon/hub/handlers/ModelHandlerTest.scala
- hub/src/main/scala/ai/chronon/hub/handlers/Paginate.scala
- frontend/src/lib/types/Model/Model.ts
- hub/src/test/scala/ai/chronon/hub/handlers/JoinHandlerTest.scala
- hub/src/test/scala/ai/chronon/hub/handlers/SearchHandlerTest.scala
- hub/src/main/scala/ai/chronon/hub/handlers/JoinsHandler.scala
- hub/src/main/scala/ai/chronon/hub/handlers/SearchHandler.scala
- hub/src/main/scala/ai/chronon/hub/handlers/ModelsHandler.scala
- api/py/ai/chronon/group_by.py
- flink/src/main/scala/ai/chronon/flink/SourceProvider.scala
- hub/src/main/scala/ai/chronon/hub/model/Model.scala
- frontend/src/lib/types/Model/Model.test.ts
✅ Files skipped from review due to trivial changes (9)
- .tool-versions
- spark/src/main/scala/ai/chronon/spark/JobSubmitter.scala
- tools/build_rules/spark/BUILD
- api/src/main/scala/ai/chronon/api/DataType.scala
- aggregator/src/main/scala/ai/chronon/aggregator/stats/EditDistance.scala
- api/py/ai/chronon/repo/hub_uploader.py
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProviderTest.scala
- flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
- aggregator/src/test/scala/ai/chronon/aggregator/test/EditDistanceTest.scala
🚧 Files skipped from review as they are similar to previous changes (1)
- spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
🧰 Additional context used
📓 Learnings (4)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#51
File: spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala:192-200
Timestamp: 2024-11-26T19:47:53.900Z
Learning: Only suggest registering Delta Lake action classes for serialization if they are actually used in the codebase.
.github/workflows/test_frontend.yaml (1)
Learnt from: ken-zlai
PR: zipline-ai/chronon#185
File: .github/workflows/test_frontend.yaml:41-42
Timestamp: 2025-01-08T18:29:51.176Z
Learning: In GitHub Actions, the `version` parameter in `awalsh128/cache-apt-pkgs-action` is a cache control number used to invalidate and regenerate the cache. It does not specify the package version - the actual package version is determined by the Ubuntu package repositories.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (2)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:29-60
Timestamp: 2025-01-24T23:55:30.256Z
Learning: In BigQuery integration, table existence check is performed outside the BigQueryFormat.createTable method, at a higher level in TableUtils.createTable.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (2)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:29-60
Timestamp: 2025-01-24T23:55:30.256Z
Learning: In BigQuery integration, table existence check is performed outside the BigQueryFormat.createTable method, at a higher level in TableUtils.createTable.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart.sh
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 39-39: Declare and assign separately to avoid masking return values.
(SC2155)
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/compile/fill_templates.py
4-4: ai.chronon.cli.compile.parse_teams imported but unused
Remove unused import: ai.chronon.cli.compile.parse_teams
(F401)
api/py/ai/chronon/cli/compile/parse_configs.py
4-4: logging imported but unused
Remove unused import: logging
(F401)
6-6: typing.Dict imported but unused
Remove unused import
(F401)
6-6: typing.Tuple imported but unused
Remove unused import
(F401)
9-9: ai.chronon.cli.compile.parse_teams imported but unused
Remove unused import: ai.chronon.cli.compile.parse_teams
(F401)
api/py/ai/chronon/cli/compile/parse_teams.py
4-4: ai.chronon.cli.compile.parse_configs.from_file imported but unused
Remove unused import: ai.chronon.cli.compile.parse_configs.from_file
(F401)
api/py/ai/chronon/cli/compile/compile_context.py
2-2: logging.error imported but unused
Remove unused import: logging.error
(F401)
107-107: Loop control variable sub_dirs not used within loop body
Rename unused sub_dirs to _sub_dirs
(B007)
api/py/ai/chronon/repo/run.py
817-817: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
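The B006 finding above flags a classic Python pitfall: a mutable default is evaluated once at function definition and then shared across every call. A minimal sketch of the failure mode and the `None` idiom the linter suggests (function names here are illustrative, not from run.py):

```python
def broken_append(item, bucket=[]):  # one shared list for every call
    bucket.append(item)
    return bucket


def safe_append(item, bucket=None):  # fresh list per call unless provided
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling `broken_append` twice accumulates into the same hidden list, while `safe_append` starts fresh on each call.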
api/py/ai/chronon/cli/compile/validator.py
421-422: Use a single if statement instead of nested if statements
(SIM102)
api/py/ai/chronon/cli/compile/compiler.py
20-20: Undefined name console
(F821)
25-25: Undefined name console
(F821)
28-28: Undefined name console
(F821)
28-28: f-string without any placeholders
Remove extraneous f prefix
(F541)
48-48: Local variable status is assigned to but never used
Remove assignment to unused variable status
(F841)
api/py/ai/chronon/cli/entrypoint.py
2-2: sys imported but unused
Remove unused import: sys
(F401)
5-5: enum.Enum imported but unused
Remove unused import: enum.Enum
(F401)
17-17: Do not use bare except
(E722)
api/py/ai/chronon/cli/git_utils.py
4-4: typing.Set imported but unused
Remove unused import: typing.Set
(F401)
33-33: Do not use bare except
(E722)
73-73: Do not use bare except
(E722)
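The E722 findings above matter because a bare `except` also swallows `KeyboardInterrupt` and `SystemExit`. A hedged sketch of the narrower pattern the linter is pushing toward (the helper name is hypothetical, not the actual git_utils code):

```python
import subprocess


def run_or_default(cmd, default="main"):
    """Run a command, returning `default` only on the failures we expect."""
    try:
        # CalledProcessError covers non-zero exits; OSError covers a
        # missing binary. A bare `except:` would hide far more than that.
        return subprocess.check_output(cmd).decode("utf-8").strip()
    except (subprocess.CalledProcessError, OSError):
        return default
```

Usage such as `run_or_default(["git", "rev-parse", "--abbrev-ref", "HEAD"])` mirrors the branch lookup in entrypoint.py.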
🔇 Additional comments (213)
.github/workflows/require_triggered_status_checks.yaml (2)
1-8: Workflow Config OK.
YAML is concise and correctly sets up branch protection on pull requests.
9-14: Steps Config Validated.
The action is properly configured with token input and a 10s delay.
.plugin-versions (1)
2-2: LGTM! The bazelisk plugin entry is correctly formatted and uses a specific commit hash.
build.sbt (3)
218-219: LGTM! The Confluent dependencies are properly configured with consistent versions.
Also applies to: 225-225, 249-249
293-293: LGTM! Minor version bump for spark-bigquery-with-dependencies.
300-301: LGTM! Proper exclusion rule to prevent dependency conflicts.
online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (2)
25-25: LGTM! Import required for ScalaTest matchers.
182-182: LGTM! Safer result handling. Consistent use of `.headOption` improves robustness.
Also applies to: 201-201, 225-225, 250-250, 275-275, 298-298, 310-310, 318-318, 327-327, 335-335, 352-352, 362-362, 372-372, 387-387, 407-407, 425-425, 443-443, 470-470, 488-488, 502-502, 521-521, 540-540, 553-553, 649-649
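For contributors mirroring this in the repo's Python tooling, Scala's `headOption` has a direct analog; a sketch (not actual repo code):

```python
def head_option(rows, default=None):
    # First element if present, else `default` - unlike rows[0], this
    # never raises IndexError on an empty sequence.
    return next(iter(rows), default)
```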
spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala (3)
20-20: LGTM! Import follows package organization best practices.
85-95: LGTM! Clean integration with percentile-based metrics and edit operation tracking.
109-115: LGTM! Proper type-specific metric application.
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (2)
9-13: LGTM! Import changes align with the new date conversion test case.
48-52: Add test coverage for the new BigQueryFormat parameter.Pattern matching was updated to handle 3 parameters, but the test only verifies the type.
Consider adding assertions to validate the values of all three parameters.
online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (1)
48-48: Add sequence size validation. Silently taking the first result could hide data issues.
Consider adding size validation:
- reintroduceExceptions(catalystUtil.performSql(keys ++ values).headOption.orNull, values)
+ val results = catalystUtil.performSql(keys ++ values)
+ require(results.size <= 1, s"Expected at most one result, got ${results.size}")
+ reintroduceExceptions(results.headOption.orNull, values)
Validate SQL result sequence size.
- Adding an explicit size check is ideal to catch unexpected multiple rows.
- Consider updating the code as follows:
- reintroduceExceptions(catalystUtil.performSql(keys ++ values).headOption.orNull, values)
+ val results = catalystUtil.performSql(keys ++ values) // get SQL results
+ require(results.size <= 1, s"Expected at most one result, got ${results.size}") // validate size
+ reintroduceExceptions(results.headOption.orNull, values) // process first result or null
The following scripts were executed for the analysis:
Error: No shell scripts found
online/src/test/scala/ai/chronon/online/test/CatalystUtilHiveUDFTest.scala (1)
9-9: LGTM! Clear test description. The updated test description better reflects the functionality being tested.
online/src/main/scala/ai/chronon/online/CatalystUtil.scala (11)
44-44: Consider concurrency.
ArrayBuffer is not thread-safe.
113-114: Check backward compatibility.
Changing return type from Option to Seq might affect existing callers.
140-140: Consistent transformation.
Storing transformFunc returning Seq aligns with multi-result strategy.
154-154: Verify usage.
Method now yields Seq—ensure callers handle multiple results.
159-163: Multiple results.
Returning Seq[Map[String, Any]] is consistent with the broader changes.
201-210: Mutable buffer caution.
Ensure no concurrency issues with shared buffer usage.
215-221: Filter logic.
Returning ArrayBuffer fosters multi-row processing.
236-245: Projection chain.
Implementation seems coherent.
247-251: Simple projection.
Generates array buffers of single row.
255-259: Collect usage.
Potential memory risk if data is large.
262-266: Projection approach.
Straightforward.
flink/src/main/scala/ai/chronon/flink/MetricsSink.scala (2)
2-2: LGTM! Import statement aligns with the broader refactoring changes.
34-34: Verify timestamp availability. The simplified timestamp access looks cleaner, but ensure `tsMillis` is always available in `WriteResponse`.
✅ Verification successful
Timestamp availability verified.
`tsMillis` is defined in `WriteResponse` (in flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala).
- Its consistent use in `equals` and `hashCode` confirms its presence.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for WriteResponse definition and usage
ast-grep --pattern 'case class WriteResponse { $$$ tsMillis: Long $$$ }'
Length of output: 80
Script:
#!/bin/bash
# Search for WriteResponse definitions across the codebase to check for tsMillis.
rg -n "case class WriteResponse" .
Length of output: 36
Script:
#!/bin/bash
# Comprehensive search for WriteResponse definitions in Scala.
rg -n "(case class|class|trait)\s+WriteResponse" .
Length of output: 165
Script:
#!/bin/bash
# Dump WriteResponse class for tsMillis context.
sed -n '80,150p' ./flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala
Length of output: 1350
flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (25)
4-5: Essential imports
All good.
9-12: New imports for watermark & Avro
No issues.
19-19: MetadataStore usage
Looks correct.
21-21: TopicInfo usage
No concerns.
22-23: Event-time imports
All good.
29-29: Checkpoint config import
Valid usage.
44-44: Duration import
Used for watermarks. Fine.
62-62: sinkFn updated
Now uses AvroCodecOutput. Seems consistent.
115-120: Watermark assignment
Good approach for event-time.
121-121: Avro output
No issues.
170-175: Second watermark assignment
Mirrors the first approach. Good.
203-203: Watermarks in tiling
Ensures correct event-time logic.
225-225: Tiling DS Avro output
Consistent with usage above.
260-263: AllowedOutOfOrderness
5 minutes for late data is typical.
264-267: Idleness timeout
Handles low event rates neatly.
268-278: Custom watermarkStrategy
Implements bounded out-of-orderness & idleness.
287-288: Mock source option
Minor, looks good.
289-290: kafkaBootstrap
Optional param. No concerns.
299-299: groupByName
Straight usage. Fine.
303-303: Kafka bootstrap
Clean usage.
314-349: Retrieving serving info
Robust. Good error handling.
360-361: Checkpoint retention commentary
No issues.
379-379: Creating jobDatastream
Looks fine.
381-385: Metrics sink
Nice separation.
390-390: buildApi method
Private scoping is good.
hub/src/main/java/ai/chronon/hub/HubVerticle.java (2)
6-7: LGTM! Imports align with the new drift analysis functionality.
63-66: Verify error handling in ConfHandler. Routes look good. Consider verifying error handling for invalid configurations.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (2)
63-65: LGTM! Consistent client usage. Good practice to pass bigQueryClient explicitly.
37-43: LGTM! Verify direct write method compatibility. Safe changes with CREATE_NEVER disposition. Direct write method is more efficient.
✅ Verification successful
LGTM – Direct write and CREATE_NEVER are correctly set in GcpFormatProvider.scala.
- "writeMethod" is "direct" (Line 38).
- "createDisposition" uses CREATE_NEVER (Line 39).
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if any tests or configurations assume indirect write method
rg -l "writeMethod.*indirect"
Length of output: 31
Script:
#!/bin/bash
# Verify new configuration uses direct writeMethod and CREATE_NEVER disposition
rg -n 'writeMethod\s*->\s*"direct"'
rg -n 'CREATE_NEVER'
Length of output: 217
Script:
#!/bin/bash
# Check for explicit "direct" usage in writeMethod.
grep -n '-> "direct"' cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala
Length of output: 217
Script:
#!/bin/bash
# Confirm direct writeMethod config using PCRE.
grep -n -P "\"writeMethod\"\\s*->\\s*\"direct\"" cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala
# Confirm CREATE_NEVER config.
grep -n -F 'CREATE_NEVER' cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala
Length of output: 348
flink/src/main/scala/org/apache/spark/sql/avro/AvroDeserializationSupport.scala (2)
21-25: Encoder creation clarity.
Method is concise. Ensure `AvroDataToCatalyst` handles schema changes gracefully.
71-80: Check null returns.
Returning `null` after errors can cause confusion in downstream processing.
hub/src/main/scala/ai/chronon/hub/handlers/DriftHandler.scala (2)
63-66: Handle empty result more robustly.
`headOption.getOrElse(...)` returns an empty series. Check if that's correct.
68-79: Offset parsing structure.
Implementation is fine, but consider rejecting unrecognized suffixes more explicitly.
api/py/ai/chronon/cli/compile/compile_context.py (1)
111-112: Hidden files logic bug.
You likely want to skip hidden files if `f.startswith('.')` is true. This logic is reversed.
- if not f.startswith("."):  # ignore hidden files
-     continue
+ if f.startswith("."):
+     continue
hub/src/main/scala/ai/chronon/hub/handlers/ConfHandler.scala (3)
14-16: Class initialization looks good
Sufficient logging is used.
53-71: List retrieval
Straightforward approach; no issues found.
76-109: Search logic
Efficient usage of filter; ensure large registries are handled.
flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (3)
28-48: Async wait
Proper concurrency parameters. Looks good.
99-105: Timeout handling
Logs error, increments counter, sets status false. Good minimal approach.
107-131: Multi-put logic
Handles success/failure well; logging is precise.
frontend/src/lib/api/api.ts (14)
2-2: Additional imports
Well-structured references. Also applies to: 4-8, 13-14
48-50: getJoin
Simple wrapper, no issues.
52-54: getGroupBy
Mirrors getJoin. OK.
56-58: getModel
Similar pattern, no concerns.
60-62: getStagingQuery
Consistent usage of getConf.
64-68: search
Invokes a single endpoint. Good approach.
135-140: getConfList
Uses confType param. Looks correct.
142-144: getJoinList
Retrieve from ConfType.JOIN. Perfect.
146-148: getGroupByList
Works similarly.
150-152: getModelList
No issues.
154-156: getStagingQueryList
Consistency maintained.
158-172: getJoinDrift
Builds URL params well.
174-191: getColumnDrift
Similar structure, no concerns.
193-201: getColumnSummary
Fetches summary data similarly.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala (4)
5-11: Imports look appropriate.
No issues noted with these BigQuery-related imports.
17-18: Case class extension is fine.
The constructor parameter `bqClient` is a clear approach for BigQuery.
29-65: Creation logic seems correct.
Aligned with the external check for table existence, and consistent with the retrieved learnings.
139-142: Unsupported operations are clearly marked.
No further concerns.
flink/src/main/scala/ai/chronon/flink/AvroCodecFn.scala (7)
8-9: New imports are correct.
These support the new output type design.
92-92: Good move to AvroCodecOutput.
Replaces PutRequest nicely.
104-104: flatMap signature updated.
Applies AvroCodecOutput in collector—works well.
116-120: Map-to-AvroCodecOutput logic looks valid.
Key/value bytes are handled confidently.
132-132: TiledAvroCodecFn also updated to AvroCodecOutput.
Consistent approach.
143-143: flatMap for tiles updated.
Matches new output type usage.
155-173: Tile-to-AvroCodecOutput conversion is fine.
Retains debug logging; no issue found.
flink/src/main/scala/ai/chronon/flink/TestFlinkJob.scala (7)
5-7: Import updates support row-based logic.
No issues noted. Also applies to: 11-12, 15-15, 18-22, 25-26, 28-28, 33-36
49-73: E2EEventSource now yields Row.
Implementation is straightforward and uses parallelism well.
80-80: Time difference calculation is fine.
No changes needed.
85-92: DummyInitializationContext is minimal and functional.
Works for the tests.
95-106: Schema definitions appear correct.
Switch to DataType-based fields is consistent.
108-120: makeSource logic is coherent.
Encodes to Avro and decodes into Spark Rows effectively.
197-212: buildTestFlinkJob approach is consistent.
Builds a Flink pipeline with row-based sources.
frontend/src/lib/components/NavigationBar.svelte (3)
25-30: Imports look good.
52-52: Initialization is correct.
60-69: Merged arrays for consistent typed results.
online/src/main/scala/ai/chronon/online/FetcherBase.scala (4)
47-47: New import for ExecutionContext is valid.
327-364: Filtering out invalid requests improves safety.
613-644: Updated response parsing with fallback logic is clear.
382-382: Confirm usage of `.toSeq` after this line. Ensure no unexpected map iteration breakage.
✅ Verification successful
Confirmed: `.toSeq` usage in FetcherBase is intact.
The shell script output indicates that the call remains present in online/src/main/scala/ai/chronon/online/FetcherBase.scala, aligning with similar conversions elsewhere. No unexpected iteration issues were found.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Searching suspicious references in the file to confirm usage
rg -A 3 "toSeq"
Length of output: 47736
api/py/ai/chronon/repo/run.py (6)
33-33: Enum import helps new job types.
135-137: New jar constants for Flink and Dataproc.
139-141: New job type enum broadens streaming support.
419-426: New fields handle Flink streaming params.
430-431: Conditional fetch logic is consistent.
983-984: Extended signature for new streaming params is fine.frontend/src/routes/stagingqueries/+page.svelte (1)
1-7: LGTM! Clean and consistent with other route components.
frontend/src/routes/models/+page.svelte (1)
1-7: LGTM! Consistent implementation.
frontend/src/routes/groupbys/+page.svelte (1)
1-7: LGTM! Matches the pattern used in other routes.
frontend/src/routes/joins/+page.svelte (1)
15-15: LGTM! The `LogicalNodeTable` component is used correctly with the required props.
8-17: LGTM! Well-structured abstract class with clear documentation and purpose.
tools/build_rules/dependencies/spark_repository.bzl (1)
21-24: LGTM! Good catch on the version conflict. The exclusion prevents NoSuchMethodError by avoiding the commons-cli 1.2 vs 1.5.0 conflict with the Flink runtime.
tools/build_rules/flink/BUILD (2)
1-1: LGTM! Public visibility is appropriate for build rules.
5-19: Verify the main_class hack. The hack comment needs explanation.
Add a comment explaining why None is used as main_class.
api/src/main/scala/ai/chronon/api/TilingUtils.scala (1)
17-25: LGTM! Thread-local serializers prevent concurrency issues. Good use of ThreadLocal to handle concurrent serialization safely.
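The ThreadLocal pattern praised here (each thread gets its own Thrift serializer) translates directly to Python's `threading.local`; a sketch with a stand-in serializer class, not the actual TilingUtils code:

```python
import threading


class FakeSerializer:
    """Stand-in for a non-thread-safe serializer (e.g. a Thrift TSerializer)."""

    def serialize(self, obj):
        return repr(obj).encode("utf-8")


_local = threading.local()


def get_serializer():
    # Each thread lazily builds its own instance; nothing is shared across
    # threads, which is the same guarantee ThreadLocal gives on the JVM.
    if not hasattr(_local, "serializer"):
        _local.serializer = FakeSerializer()
    return _local.serializer
```

Within one thread the same instance is reused; two threads never see each other's instance.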
flink/src/test/resources/user.avsc (1)
1-59: Well-structured and documented schema. The Avro schema is comprehensive with appropriate field types and documentation.
frontend/src/lib/api/api.test.ts (1)
33-35: LGTM! Test cases are well-structured and correctly updated to reflect the API changes.
Also applies to: 50-50, 61-61
frontend/src/lib/types/Entity/Entity.ts (1)
55-56: Track workaround for future removal. Consider creating a ticket to track the removal of this workaround once the Asana task is resolved.
✅ Verification successful
Reminder: Create a tracking ticket for workaround removal.
- The workaround in frontend/src/lib/types/Entity/Entity.ts is confirmed.
- Recommend creating a ticket to track its removal once the Asana task is resolved.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for other workarounds
rg -i "workaround|hack|todo|fixme" --type ts
Length of output: 933
api/py/ai/chronon/cli/compile/parse_configs.py (1)
14-20: LGTM! Well-structured dataclass with clear field definitions.
api/py/ai/chronon/cli/entrypoint.py (2)
27-34: LGTM! Well-structured CLI commands with clear documentation and appropriate default values.
Also applies to: 36-59, 61-67
10-19: ⚠️ Potential issue: Specify exception type in get_current_branch.
def get_current_branch():
    try:
        return (
            subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"])
            .decode("utf-8")
            .strip()
        )
-    except:
+    except subprocess.CalledProcessError as e:
+        logger.warning(f"Failed to get branch: {e}")
        return "main"
Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
17-17: Do not use bare `except` (E722)
flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala (1)
24-27: LGTM! Clean migration from PutRequest to AvroCodecOutput with consistent test assertions.
Also applies to: 46-49
spark/src/main/scala/ai/chronon/spark/streaming/KafkaStreamBuilder.scala (1)
21-21: LGTM! Clean addition of TopicChecker import.
frontend/src/routes/joins/[slug]/services/joins.service.ts (2)
2-2: LGTM! Import changes align with the new API structure. Also applies to: 5-5
44-46: Verify the optional chaining. The change from `models.items` to `models.models?.find()` suggests a potential breaking change in the API response structure.
✅ Verification successful
Optional Chaining Verified
The search confirms only the use of `models.models?.find()`, matching the updated API structure. No evidence suggests any misuse of optional chaining.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for other usages of the models property
rg -A 2 'models\.(items|models)'
Length of output: 324
flink/BUILD.bazel (2)
15-15: LGTM! Kafka client dependency added.
60-76: LGTM! Binary targets properly configured. The exclusion of runtime dependencies via deploy_env prevents version conflicts.
api/py/ai/chronon/cli/compile/parse_teams.py (4)
13-38: LGTM! Team loading implementation. Clear error handling and logging of found teams.
74-111: LGTM! Helper functions for merging. The merge functions handle None cases properly.
4-4: 🛠️ Refactor suggestion: Remove unused import.
The `from_file` import is not used in this module.
- from ai.chronon.cli.compile.parse_configs import from_file
Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
4-4: `ai.chronon.cli.compile.parse_configs.from_file` imported but unused. Remove unused import: `ai.chronon.cli.compile.parse_configs.from_file` (F401)
40-72: Verify metadata update logic. The metadata update assumes specific object structure. Ensure all objects passed to this function have the required fields.
✅ Verification successful
Verified metadata update logic.
- The sole call in compiler.py passes `co.obj`, and `update_metadata` enforces that `obj.metaData` (with its `name` and `team` fields) is non-null.
- No usage contradicts the required structure.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for calls to update_metadata
rg -A 3 'update_metadata\('
Length of output: 692
tools/build_rules/prelude_bazel (1)
41-42: LGTM! Flink dependencies added. The Kafka connector and Avro dependencies align with the changes in BUILD.bazel.
api/py/ai/chronon/cli/compile/compiler.py (1)
111-116: Add error handling for file operations. File operations should handle IOError and OSError exceptions.
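A sketch of what the suggested handling might look like; the function name and return convention are assumptions, not the actual compiler.py API:

```python
import os


def write_compiled_output(path, payload):
    # Create parent dirs and write the payload; surface filesystem
    # failures as a False return instead of a raw traceback.
    try:
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w") as f:
            f.write(payload)
        return True
    except OSError as e:  # IOError is an alias of OSError since Python 3.3
        print(f"failed to write {path}: {e}")
        return False
```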
api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (1)
1-79: LGTM! Well-structured tests with good edge case coverage. Tests handle nulls, special values, and edge cases thoroughly.
flink/src/test/scala/ai/chronon/flink/test/SchemaRegistrySchemaProviderSpec.scala (1)
1-60: LGTM! Comprehensive schema registry tests. Good coverage of success and failure cases with proper mocking.
spark/src/main/scala/ai/chronon/spark/format/Format.scala (2)
10-10: LGTM! Logger initialization follows best practices with @transient and lazy val.
77-89: Similar error handling needed here. The same error handling improvement suggested for createTable should be applied here.
api/thrift/hub.thrift (1)
126-133: Add validation for branch/version exclusivity. The comment states that either branch or version should be set, but this isn't enforced. Consider adding validation in the handler.
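One way a handler could enforce the exclusivity; a sketch with assumed validation logic, using the field names from the thrift comment:

```python
def validate_branch_version(branch=None, version=None):
    # At most one of branch/version may be set - never both.
    if branch is not None and version is not None:
        raise ValueError("Set either 'branch' or 'version', not both")
    return branch if branch is not None else version
```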
api/py/ai/chronon/cli/git_utils.py (4)
33-34: Specify exception type.
- except:
+ except OSError:
      pass
🪛 Ruff (0.8.2)
33-33: Do not use bare `except` (E722)
73-74: Specify exception type.
- except:
+ except OSError:
      return None
🪛 Ruff (0.8.2)
73-73: Do not use bare `except` (E722)
77-128: LGTM! Good error handling and logging implementation.
130-157: LGTM! Well-structured implementation with proper error handling.
flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala (4)
20-36: LGTM! Well-documented with proper equals/hashCode implementation.
47-65: LGTM! Well-documented with proper equals/hashCode implementation.
71-91: LGTM! Well-documented with proper equals/hashCode implementation.
96-120: LGTM! Well-documented with proper equals/hashCode implementation.
flink/src/test/scala/org/apache/spark/sql/avro/AvroDeSerializationSupportSpec.scala (2)
23-83: LGTM! Well-structured test cases with proper resource cleanup.
86-125: LGTM! Well-structured helper object with comprehensive test data creation.
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (1)
42-131: LGTM! Formatting changes improve readability.
tools/build_rules/dependencies/maven_repository.bzl (2)
84-85: LGTM: Added Flink Kafka and Avro dependencies. The additions align with the PR objectives for enhanced Flink support.
104-119: Well-documented Hadoop exclusions. Clear explanation for excluding Hadoop dependencies to prevent IllegalAccessError.
api/thrift/observability.thrift (3)
141-142: LGTM: Added default drift metric. Setting JENSEN_SHANNON as default aligns with industry standards per the documentation.
144-162: LGTM: Well-structured JoinDriftRequest and JoinDriftResponse. Clean interface design with required and optional fields.
157-162: LGTM: Consistent JoinSummaryRequest structure. Follows the same pattern as JoinDriftRequest.
spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (1)
27-27: LGTM: Added date type conversion support. Proper handling of Avro date logical types during serialization.
Also applies to: 50-50
online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (3)
98-99: LGTM: Using magic value instead of null. More robust handling of missing values.
117-133: LGTM: Improved validation in collectDoubles. Better tracking of valid inputs with sawValidInput flag.
147-155: LGTM: Enhanced handling of special values. Properly handles NaN and Infinite values by converting to magic values.
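The magic-value convention (replacing unencodable doubles with a sentinel so serialized series stay well-formed) can be illustrated in Python; the sentinel below is a stand-in, not Chronon's actual `Constants.magicNullDouble`:

```python
import math

MAGIC_NULL_DOUBLE = -1.234e16  # illustrative sentinel, not the real constant


def sanitize(values):
    # Map None, NaN and +/-Infinity to the sentinel so downstream
    # consumers see only finite, comparable doubles.
    out = []
    for v in values:
        if v is None or math.isnan(v) or math.isinf(v):
            out.append(MAGIC_NULL_DOUBLE)
        else:
            out.append(v)
    return out
```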
flink/src/test/scala/ai/chronon/flink/test/FlinkJobIntegrationTest.scala (1)
31-44: Add null checks for WriteResponse fields. Validate keyBytes, valueBytes, and tsMillis before access to prevent NPEs.
online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)
87-94: LGTM! The filtering logic ensures that only relevant keys are processed.
online/src/main/java/ai/chronon/online/JavaFetcher.java (1)
27-27: LGTM! Clean builder pattern implementation with good example usage.
Also applies to: 39-48, 50-119
hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala (1)
1-296: LGTM! Comprehensive test coverage with good use of mocking and edge cases.
online/src/main/scala/ai/chronon/online/AvroConversions.scala (1)
20-20: LGTM! Clean date type support with good documentation.
Also applies to: 29-29, 37-44, 58-63, 121-122
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (1)
83-83: LGTM! Adding `java.time.LocalDate` to the Kryo serializer is necessary for date serialization.
spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (1)
114-115: LGTM! Improved null safety using `Option` to handle potential null values from `getPercentileDriftSeries`.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala (3)
69-70: LGTM! Added savepoint support for Flink jobs.
193-198: LGTM! Well-organized argument prefixes improve code readability.
246-257: LGTM! Clean pattern matching for job type handling.
service_commons/src/main/java/ai/chronon/service/RouteHandlerWrapper.java (2)
90-103: LGTM! Improved error handling and serialization. The new implementation provides better error handling and uses a consistent serialization approach with TSimpleJSONProtocol.
5-5: LGTM! Import required for JSON serialization.
online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (3)
229-240: LGTM! Good test coverage for the happy path. Test verifies basic fetch functionality with proper assertions.
242-255: LGTM! Good null handling test. Test ensures proper behavior with null keys.
257-270: LGTM! Good error case coverage. Test verifies proper handling of missing keys.
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (3)
116-116: LGTM! Consistent null handling. Changes standardize null handling using Constants.magicNull values.
Also applies to: 254-255, 257-258, 313-313
317-334: LGTM! Good edge case coverage. Test verifies handling of Long.MAX_VALUE and magicNullLong values.
352-373: LGTM! Comprehensive null handling tests. Test covers various combinations of null and non-null fields.
spark/src/main/scala/ai/chronon/spark/Extensions.scala (1)
74-74: LGTM! Better method naming and caching. Changes improve clarity with better method naming and add efficient caching with lazy val.
Also applies to: 111-111
hub/src/main/scala/ai/chronon/hub/handlers/TimeSeriesHandler.scala (1)
122-130: LGTM! Improved null handling using Option. The changes enhance safety by using `Option` and `map` methods instead of direct null checks.
api/thrift/api.thrift (2)
8-8: LGTM! Updated thrift generation command. The command path has been updated to use relative paths.
441-457: LGTM! Well-structured data types for configuration. The new `EnvironmentVariables` and `Team` structs provide a clean way to manage environment variables and team information.
spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala (4)
26-26: LGTM! Consolidated Spark SQL imports. Using a wildcard import for Spark SQL classes improves readability.
34-34: LGTM! Added TestRecord case class. Simple and clean test data structure.
487-500: LGTM! Added test for empty DataFrame handling. Good coverage for edge case handling.
480-481: LGTM! Standardized table creation with file format. Consistent use of PARQUET format for table creation.
Also applies to: 521-522
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)
386-392: LGTM! Added SQL string escaping. Properly escaping single quotes prevents SQL injection vulnerabilities.
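Doubling embedded single quotes is the standard SQL literal escape this comment refers to; a minimal Python sketch (parameterized queries remain the stronger defense where the driver supports them):

```python
def escape_sql_literal(value):
    # Standard SQL escapes an embedded single quote by doubling it:
    # O'Brien -> 'O''Brien'
    return "'" + value.replace("'", "''") + "'"
```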
api/py/ai/chronon/cli/compile/validator.py (4)
22-32: LGTM! The new imports are well-organized and necessary for the added functionality.
183-213: LGTM! The new helper functions are well-structured and encapsulate specific validation logic.
220-242: LGTM! The constructor changes improve parameter organization and initialization.
474-479: LGTM! The error messages are clear and well-formatted.
spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala (1)
292-293: Verify message bus compatibility. The change from `topicType` to `messageBus` might affect existing topic configurations.
✅ Verification successful
Message bus usage verified.
- No `topicType` usages found; all code now uses `messageBus`.
- Consistent usage across modules (spark, flink, online).
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Search for any remaining uses of topicType
rg -A 2 "topicType"
# Search for messageBus usage patterns
rg -A 2 "messageBus"
```

Length of output: 1905
spark/src/main/scala/ai/chronon/spark/JoinBase.scala (1)
236-238: LGTM! The variable renaming from `leftDf` to `statsDf` improves code clarity.
online/src/main/scala/ai/chronon/online/Fetcher.scala (1)
99-107: LGTM! The addition of `executionContextOverride` provides good flexibility for execution context management.
188-188: LGTM! Appropriate logging level change. Changed logging level from error to info since these are expected states, not errors.
Also applies to: 197-197
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
299-299: LGTM! Better method name. Changed from `timeRange` to `calculateTimeRange` for better clarity.
spark/src/main/scala/ai/chronon/spark/Driver.scala (1)
630-632: LGTM! Good refactoring. Simplified code by delegating to FetcherMain and reusing FetcherArgs.
.github/workflows/test_frontend.yaml (1)
39-39: LGTM! Action version bump. Updated cache-apt-pkgs-action to latest version v1.4.3.
api/py/test/sample/teams.json (2)
18-18: GCP Cluster Name Updated
GCP_DATAPROC_CLUSTER_NAME is now set to "zipline-canary-cluster".
64-64: Quickstart Namespace Updated
Namespace changed to "canary-443022.data" to match the new project structure.distribution/build_and_upload_gcp_artifacts.sh (3)
44-45: New Build Steps Added
Added sbt commands for Flink and Service assemblies.
49-50: Jar Path Assignments
FLINK_JAR and SERVICE_JAR variables are assigned; verify the paths align with your build outputs.
62-70: Jar Existence Checks
Checks for SERVICE_JAR and FLINK_JAR help catch build failures early.distribution/run_zipline_quickstart.sh (1)
20-27: Git Config Checkout
Git clone and branch checkout steps look correct; ensure SSH key access is properly configured.
🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 22-22: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
frontend/src/routes/joins/+page.server.ts (1)
1-4: LGTM! Clean refactor to use centralized config loader. Consistent with other route handlers.
frontend/src/routes/models/+page.server.ts (1)
1-4: LGTM! Follows the established pattern. Identical to other route handlers.
frontend/src/routes/groupbys/+page.server.ts (1)
1-4: LGTM! Maintains consistency with other routes. Same implementation as other handlers.
frontend/src/routes/stagingqueries/+page.server.ts (1)
1-4: LGTM! Completes the unified approach. Matches the pattern used in other routes.
Let's verify the implementation of the shared `loadConfList` function:
✅ Verification successful
LoadConfList Implementation Verified
The implementation of `loadConfList` in `frontend/src/lib/server/conf-loader.ts` matches the expected unified approach. The function correctly extracts the pathname, finds the corresponding entity, handles missing matches, wraps the API call in a try/catch, and returns the required structure.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Verify the implementation of loadConfList
ast-grep --pattern 'export function loadConfList($_) { $$$ }'
```

Length of output: 2103
```scala
val result = cu.performSql(decoder.apply(dks).asInstanceOf[Array[Any]])
val resultJson = gson.toJson(result.head)
```
Add null safety checks.
Direct use of `.head` could throw if the result is empty. Consider using `.headOption` or handling empty results.
```diff
-    val result = cu.performSql(decoder.apply(dks).asInstanceOf[Array[Any]])
-    val resultJson = gson.toJson(result.head)
+    val result = cu.performSql(decoder.apply(dks).asInstanceOf[Array[Any]])
+    val resultJson = result.headOption.map(gson.toJson).getOrElse("[]")
```
```scala
val resList = cu.performSql(CommonScalarsRow)
assertEquals(resList.size, 1)
val resMap = resList.head
assertEquals(resMap.size, 2)
```
💡 Codebase verification
Empty results are not handled in `performSql` implementations
- The `CatalystUtil.scala` implementations of `performSql` don't include an empty-result check.
- The test in `CatalystUtilHiveUDFTest.scala` assumes a non-empty result by calling `head`, which can lead to runtime errors.
🔗 Analysis chain
Verify error handling for empty results.
The code assumes the list will always have at least one element.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Search for error handling in performSql implementation
ast-grep --pattern 'def performSql($$$) {
  $$$
  if ($results.isEmpty) {
    $$$
  }
  $$$
}'
```
Length of output: 96
Script:
```shell
#!/bin/bash
# Show context of performSql definition.
rg -A 12 "def performSql\(" --no-heading
echo "-----"
# Look for any empty results error handling
rg "if\s*\(.*\.isEmpty\)" --no-heading -n
```
Length of output: 13789
```scala
parsedSchema.schemaType() match {
  case AvroSchema.TYPE =>
    val schema = parsedSchema.asInstanceOf[AvroSchema]
    AvroDeserializationSupport.build(topicInfo.name, schema.canonicalString(), schemaRegistryWireFormat = true)
  case _ => throw new IllegalArgumentException(s"Unsupported schema type: ${parsedSchema.schemaType()}")
```
🛠️ Refactor suggestion
Plan for more schema types.
This code currently only handles Avro. If support for other formats is needed later, consider a pluggable design.
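One way to make the dispatch pluggable is a registry keyed by schema type, so new formats register a builder instead of extending the match expression. A hedged Python sketch of the pattern (all names here are hypothetical, not the actual Chronon or Flink API):

```python
from typing import Callable, Dict

# Registry mapping a schema-type string to a deserializer builder.
_BUILDERS: Dict[str, Callable[[str], str]] = {}


def register_schema_type(schema_type: str):
    # Decorator that installs a builder for the given schema type.
    def wrap(builder: Callable[[str], str]) -> Callable[[str], str]:
        _BUILDERS[schema_type] = builder
        return builder
    return wrap


@register_schema_type("AVRO")
def _build_avro(canonical_schema: str) -> str:
    # Stand-in for something like AvroDeserializationSupport.build(...)
    return f"avro-deserializer({canonical_schema})"


def build_deserializer(schema_type: str, canonical_schema: str) -> str:
    builder = _BUILDERS.get(schema_type)
    if builder is None:
        raise ValueError(f"Unsupported schema type: {schema_type}")
    return builder(canonical_schema)
```

Adding Protobuf or JSON Schema support then becomes a new registered builder rather than a change to the dispatch site.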
```scala
def getPartitions(topic: String, bootstrap: String, additionalProps: Map[String, String] = Map.empty): Int = {
  val props = mapToJavaProperties(additionalProps ++ Map(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG -> bootstrap))
  val adminClient = AdminClient.create(props)
  val topicDescription = adminClient.describeTopics(util.Arrays.asList(topic)).values().get(topic);
  topicDescription.get().partitions().size()
}
```
🛠️ Refactor suggestion
Close admin client.
Ensure the adminClient is closed to free resources.
```diff
 def getPartitions(topic: String, bootstrap: String, additionalProps: Map[String, String] = Map.empty): Int = {
   val props = mapToJavaProperties(additionalProps ++ Map(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG -> bootstrap))
   val adminClient = AdminClient.create(props)
   val topicDescription = adminClient.describeTopics(util.Arrays.asList(topic)).values().get(topic)
   val partitions = topicDescription.get().partitions().size()
+  adminClient.close()
   partitions
 }
```
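Note that a bare `close()` call still leaks the client if `describeTopics` throws. Putting the cleanup in a `finally` block (or a context manager) guarantees it runs on every path; a small Python sketch with a stand-in client (the `FakeAdminClient` class is illustrative, not Kafka's API):

```python
class FakeAdminClient:
    """Stand-in for an admin client; records whether close() ran."""

    last_instance = None

    def __init__(self):
        self.closed = False
        FakeAdminClient.last_instance = self

    def partition_count(self, topic: str) -> int:
        if topic == "missing":
            raise KeyError(topic)
        return 3

    def close(self) -> None:
        self.closed = True


def get_partitions(topic: str, client_factory=FakeAdminClient) -> int:
    client = client_factory()
    try:
        return client.partition_count(topic)
    finally:
        client.close()  # runs on both the success and the error path
```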
```scala
executionContextOverride: ExecutionContext = null)
  extends MetadataStore(kvStore, metaDataSet, timeoutMillis, executionContextOverride)
```
Avoid null ExecutionContext.
Use a default or require one:
```diff
-def executionContextOverride: ExecutionContext = null
+def executionContextOverride: ExecutionContext = ExecutionContext.global
```

Committable suggestion skipped: line range outside the PR's diff.
```scala
def createTable(df: DataFrame,
                tableName: String,
                partitionColumns: Seq[String],
                tableProperties: Map[String, String],
                fileFormat: String): (String => Unit) => Unit = {

  def inner(df: DataFrame,
            tableName: String,
            partitionColumns: Seq[String],
            tableProperties: Map[String, String],
            fileFormat: String)(sqlEvaluator: String => Unit) = {
    val creationSql =
      createTableSql(tableName,
                     df.schema,
                     partitionColumns,
                     tableProperties,
                     fileFormatString(fileFormat),
                     createTableTypeString)

    sqlEvaluator(creationSql)
  }

  inner(df, tableName, partitionColumns, tableProperties, fileFormat)
}
```
🛠️ Refactor suggestion
Add error handling for SQL execution.
The SQL execution could fail. Consider wrapping the sqlEvaluator call in a Try block.
```diff
 def inner(df: DataFrame,
           tableName: String,
           partitionColumns: Seq[String],
           tableProperties: Map[String, String],
           fileFormat: String)(sqlEvaluator: String => Unit) = {
   val creationSql =
     createTableSql(tableName,
                    df.schema,
                    partitionColumns,
                    tableProperties,
                    fileFormatString(fileFormat),
                    createTableTypeString)
-  sqlEvaluator(creationSql)
+  Try(sqlEvaluator(creationSql)) match {
+    case Success(_) => logger.info(s"Successfully created table $tableName")
+    case Failure(e) =>
+      logger.error(s"Failed to create table $tableName", e)
+      throw e
+  }
 }
```
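The `Try`/`Success`/`Failure` wrapper suggested above logs the outcome and rethrows on failure; the equivalent shape in Python is try/except with a re-raise. A sketch with hypothetical names (the real DDL would come from something like `createTableSql`):

```python
import logging

logger = logging.getLogger("create_table")


def create_table(table_name: str, sql_evaluator) -> None:
    # Placeholder DDL; a real implementation would generate the full schema.
    creation_sql = f"CREATE TABLE {table_name} (...)"
    try:
        sql_evaluator(creation_sql)
    except Exception:
        logger.exception("Failed to create table %s", table_name)
        raise  # surface the failure to the caller after logging
    logger.info("Successfully created table %s", table_name)
```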
```scala
val keyMapList =
  if (args.keyJson.isDefined) {
    Try(readMapList(args.keyJson())).toOption.getOrElse(Seq(readMap(args.keyJson())))
  } else {
    logger.info(s"Reading requests from ${args.keyJsonFile()}")
    val file = Source.fromFile(args.keyJsonFile())
    val mapList = file.getLines().map(json => readMap(json)).toList
    file.close()
    mapList
  }
```
Add input validation for JSON files.
Validate JSON file path and content before processing.
```diff
 val keyMapList =
   if (args.keyJson.isDefined) {
+    validateJson(args.keyJson())
     Try(readMapList(args.keyJson())).toOption.getOrElse(Seq(readMap(args.keyJson())))
   } else {
     logger.info(s"Reading requests from ${args.keyJsonFile()}")
+    validateFilePath(args.keyJsonFile())
     val file = Source.fromFile(args.keyJsonFile())
     val mapList = file.getLines().map(json => readMap(json)).toList
     file.close()
     mapList
   }
```
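The `validateJson` and `validateFilePath` helpers referenced in that diff are not defined anywhere in the PR; one possible Python shape for them, shown purely for illustration:

```python
import json
import os


def validate_json(text: str) -> None:
    """Raise ValueError early if the key payload is not valid JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON payload: {e}") from e


def validate_file_path(path: str) -> None:
    """Raise ValueError if the path does not point at an existing file."""
    if not os.path.isfile(path):
        raise ValueError(f"Key file not found: {path}")
```

Failing fast here turns a confusing mid-stream parse error into a clear message before any requests are issued.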
```scala
def impl(props: Map[String, String]): Api = {
  val urls = Array(new File(onlineJar()).toURI.toURL)
  val cl = ScalaClassLoader.fromURLs(urls, this.getClass.getClassLoader)
  val cls = cl.loadClass(onlineClass())
  val constructor = cls.getConstructors.apply(0)
  val onlineImpl = constructor.newInstance(props)
  onlineImpl.asInstanceOf[Api]
}
```
Implement security measures for class loading.
Current implementation has security risks:
- Unrestricted class loading
- No validation of jar path
Add security checks:
```diff
 def impl(props: Map[String, String]): Api = {
+  // Validate jar path is in allowed directory
+  val allowedPath = new File("/path/to/allowed/jars")
+  val jarFile = new File(onlineJar())
+  if (!jarFile.getCanonicalPath.startsWith(allowedPath.getCanonicalPath)) {
+    throw new SecurityException("Invalid jar location")
+  }
   val urls = Array(new File(onlineJar()).toURI.toURL)
   val cl = ScalaClassLoader.fromURLs(urls, this.getClass.getClassLoader)
+  // Validate class name against allowlist
+  if (!allowedClasses.contains(onlineClass())) {
+    throw new SecurityException("Invalid class")
+  }
   val cls = cl.loadClass(onlineClass())
   val constructor = cls.getConstructors.apply(0)
   val onlineImpl = constructor.newInstance(props)
   onlineImpl.asInstanceOf[Api]
 }
```
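One caveat with the canonical-path prefix check in that suggestion: a plain `startsWith` accepts sibling directories like `/opt/jars-evil` when the allowed root is `/opt/jars`. Comparing whole resolved path components avoids this; a Python sketch of the containment check (directory names are placeholders):

```python
import os


def assert_jar_allowed(allowed_dir: str, jar_path: str) -> None:
    """Reject jar paths that resolve outside the allowed directory."""
    allowed = os.path.realpath(allowed_dir)
    target = os.path.realpath(jar_path)
    # commonpath compares whole components, so /opt/jars-evil is rejected
    # and ../-style traversal is caught after symlink resolution.
    if os.path.commonpath([allowed, target]) != allowed:
        raise PermissionError(f"jar outside allowed dir: {jar_path}")
```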
```scala
class MetadataStore(kvStore: KVStore,
                    val dataset: String = MetadataDataset,
                    timeoutMillis: Long,
                    executionContextOverride: ExecutionContext = null) {
```
🛠️ Refactor suggestion
Use Option instead of null default value.
Replace `executionContextOverride: ExecutionContext = null` with `executionContextOverride: Option[ExecutionContext] = None`.
```diff
-class MetadataStore(kvStore: KVStore,
-                    val dataset: String = MetadataDataset,
-                    timeoutMillis: Long,
-                    executionContextOverride: ExecutionContext = null) {
+class MetadataStore(kvStore: KVStore,
+                    val dataset: String = MetadataDataset,
+                    timeoutMillis: Long,
+                    executionContextOverride: Option[ExecutionContext] = None) {
```
```scala
implicit val executionContext: ExecutionContext =
  Option(executionContextOverride).getOrElse(FlexibleExecutionContext.buildExecutionContext)
```
🛠️ Refactor suggestion
Update Option handling.
Update to use idiomatic Option handling.
```diff
-  implicit val executionContext: ExecutionContext =
-    Option(executionContextOverride).getOrElse(FlexibleExecutionContext.buildExecutionContext)
+  implicit val executionContext: ExecutionContext =
+    executionContextOverride.getOrElse(FlexibleExecutionContext.buildExecutionContext)
```
Actionable comments posted: 2
🧹 Nitpick comments (14)
api/py/ai/chronon/cli/compile/serializer.py (4)
17-20: Remove unused import.
`TJSONProtocolFactory` is imported but never used.

```diff
-from thrift.protocol.TJSONProtocol import (
-    TSimpleJSONProtocolFactory,
-    TJSONProtocolFactory,
-)
+from thrift.protocol.TJSONProtocol import TSimpleJSONProtocolFactory
```

🧰 Tools
🪛 Ruff (0.8.2)
19-19:
thrift.protocol.TJSONProtocol.TJSONProtocolFactoryimported but unusedRemove unused import:
thrift.protocol.TJSONProtocol.TJSONProtocolFactory(F401)
78-82: Optimize integer type handling. Combine integer type checks for better readability.

```diff
-        elif ttype == TType.I64:
-            ret = int(val)
-        elif ttype == TType.I32 or ttype == TType.I16 or ttype == TType.BYTE:
-            ret = int(val)
+        elif ttype in (TType.I64, TType.I32, TType.I16, TType.BYTE):
+            ret = int(val)
```

🧰 Tools
- elif ttype == TType.I64: - ret = int(val) - elif ttype == TType.I32 or ttype == TType.I16 or ttype == TType.BYTE: - ret = int(val) + elif ttype in (TType.I64, TType.I32, TType.I16, TType.BYTE): + ret = int(val)🧰 Tools
🪛 Ruff (0.8.2)
78-81: Combine
ifbranches using logicaloroperatorCombine
ifbranches(SIM114)
41-86: Add input validation. Add type checking for input values to prevent runtime errors.

```diff
 def _convert(self, val, ttype, ttype_info):
+    if val is None:
+        return None
     if ttype == TType.STRUCT:
```

🧰 Tools
def _convert(self, val, ttype, ttype_info): + if val is None: + return None if ttype == TType.STRUCT:🧰 Tools
🪛 Ruff (0.8.2)
78-81: Combine
ifbranches using logicaloroperatorCombine
ifbranches(SIM114)
93-99: Add docstring explaining binary format. Document the binary format specification for better maintainability.

```diff
 def json2binary(json_str, thrift_class):
+    """Convert JSON to Thrift binary format.
+
+    Args:
+        json_str: JSON string to convert
+        thrift_class: Target Thrift class
+    Returns:
+        bytes: Thrift binary format using TBinaryProtocolAccelerated
+    """
     thrift = json2thrift(json_str, thrift_class)
```

api/py/ai/chronon/cli/compile/compile_context.py (2)
2-2: Remove unusederrorimport.
Line 2 imports `error` from `logging` but never uses it. Trim unnecessary imports to keep the code clean. Apply this diff:

```diff
-from logging import error
```

🧰 Tools
🪛 Ruff (0.8.2)
2-2:
logging.errorimported but unusedRemove unused import:
logging.error(F401)
107-107: Rename unused loop variable.
Rename `sub_dirs` to `_sub_dirs` to indicate it's unused.

```diff
-for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
```

🧰 Tools
🪛 Ruff (0.8.2)
107-107: Loop control variable
sub_dirsnot used within loop bodyRename unused
sub_dirsto_sub_dirs(B007)
api/py/ai/chronon/cli/compile/fill_templates.py (1)
4-4: Remove unused import.
The import `parse_teams` is never referenced. It's best to remove it.

```diff
-import ai.chronon.cli.compile.parse_teams as teams
```

🧰 Tools
🪛 Ruff (0.8.2)
4-4:
ai.chronon.cli.compile.parse_teamsimported but unusedRemove unused import:
ai.chronon.cli.compile.parse_teams(F401)
api/py/ai/chronon/cli/compile/parse_configs.py (1)
4-4: Remove unused imports.

```diff
-import logging
 from typing import Any, Dict, List, Tuple
-import ai.chronon.cli.compile.parse_teams as teams
```

Also applies to: 6-6, 9-9
🧰 Tools
🪛 Ruff (0.8.2)
4-4:
loggingimported but unusedRemove unused import:
logging(F401)
api/py/ai/chronon/cli/entrypoint.py (2)
2-2: Remove unused imports.

```diff
-import sys
-from enum import Enum
```

Also applies to: 5-5
🧰 Tools
🪛 Ruff (0.8.2)
2-2:
sysimported but unusedRemove unused import:
sys(F401)
49-53: Consider using explicit flags instead of scope. Replace `scope` with explicit `run_upstream` and `run_downstream` flags for clarity.
api/py/test/test_git_utils.py (1)
5-5: Remove unused import.

```diff
-from pathlib import Path
```

🧰 Tools
🪛 Ruff (0.8.2)
5-5:
pathlib.Pathimported but unusedRemove unused import:
pathlib.Path(F401)
api/thrift/api.thrift (1)
254-254: Consider documenting the deprecation reason. Add a comment explaining why the `env` field is deprecated.
api/py/ai/chronon/cli/compile/validator.py (1)
421-428: Consider simplifying nested if statements. Combine the nested if statements into a single condition.
```diff
-        if join.metaData.online:
-            if offline_included_group_bys:
+        if join.metaData.online and offline_included_group_bys:
```

🧰 Tools
🪛 Ruff (0.8.2)
421-422: Use a single
ifstatement instead of nestedifstatements(SIM102)
.github/workflows/test_scala_no_spark.yaml (1)
45-47: Consider removing unused SBT configuration. The `SBT_OPTS` environment variable is no longer needed with Bazel.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (26)
- `.github/image/Dockerfile` (2 hunks)
- `.github/workflows/test_scala_no_spark.yaml` (1 hunks)
- `api/py/ai/chronon/cli/compile/compile_context.py` (1 hunks)
- `api/py/ai/chronon/cli/compile/compiler.py` (1 hunks)
- `api/py/ai/chronon/cli/compile/fill_templates.py` (1 hunks)
- `api/py/ai/chronon/cli/compile/parse_configs.py` (1 hunks)
- `api/py/ai/chronon/cli/compile/parse_teams.py` (1 hunks)
- `api/py/ai/chronon/cli/compile/serializer.py` (1 hunks)
- `api/py/ai/chronon/cli/compile/validator.py` (12 hunks)
- `api/py/ai/chronon/cli/entrypoint.py` (1 hunks)
- `api/py/ai/chronon/cli/git_utils.py` (1 hunks)
- `api/py/ai/chronon/cli/logger.py` (1 hunks)
- `api/py/ai/chronon/cli/logging.py` (1 hunks)
- `api/py/ai/chronon/group_by.py` (0 hunks)
- `api/py/ai/chronon/join.py` (0 hunks)
- `api/py/ai/chronon/repo/compile.py` (1 hunks)
- `api/py/test/sample/teams.py` (1 hunks)
- `api/py/test/test_git_utils.py` (1 hunks)
- `api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala` (2 hunks)
- `api/thrift/api.thrift` (4 hunks)
- `api/thrift/common.thrift` (1 hunks)
- `hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala` (8 hunks)
- `hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala` (9 hunks)
- `online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala` (12 hunks)
- `spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala` (1 hunks)
- `tools/build_rules/thrift/thrift.bzl` (0 hunks)
💤 Files with no reviewable changes (3)
- api/py/ai/chronon/join.py
- api/py/ai/chronon/group_by.py
- tools/build_rules/thrift/thrift.bzl
✅ Files skipped from review due to trivial changes (2)
- hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala
- hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala
🚧 Files skipped from review as they are similar to previous changes (4)
- api/thrift/common.thrift
- api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala
- spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
- api/py/ai/chronon/cli/logger.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/test/sample/teams.py
8-8: Undefined name EnvironmentVariables
(F821)
45-45: Undefined name EnvironmentVariables
(F821)
66-66: Undefined name EnvironmentVariables
(F821)
api/py/ai/chronon/cli/compile/parse_teams.py
4-4: ai.chronon.cli.compile.parse_configs.from_file imported but unused
Remove unused import: ai.chronon.cli.compile.parse_configs.from_file
(F401)
api/py/ai/chronon/cli/git_utils.py
4-4: typing.Set imported but unused
Remove unused import: typing.Set
(F401)
33-33: Do not use bare except
(E722)
73-73: Do not use bare except
(E722)
api/py/ai/chronon/cli/compile/compile_context.py
2-2: logging.error imported but unused
Remove unused import: logging.error
(F401)
107-107: Loop control variable sub_dirs not used within loop body
Rename unused sub_dirs to _sub_dirs
(B007)
api/py/ai/chronon/cli/compile/validator.py
421-422: Use a single if statement instead of nested if statements
(SIM102)
api/py/ai/chronon/cli/compile/compiler.py
20-20: Undefined name console
(F821)
25-25: Undefined name console
(F821)
28-28: Undefined name console
(F821)
28-28: f-string without any placeholders
Remove extraneous f prefix
(F541)
48-48: Local variable status is assigned to but never used
Remove assignment to unused variable status
(F841)
api/py/ai/chronon/cli/compile/fill_templates.py
4-4: ai.chronon.cli.compile.parse_teams imported but unused
Remove unused import: ai.chronon.cli.compile.parse_teams
(F401)
api/py/ai/chronon/cli/compile/parse_configs.py
4-4: logging imported but unused
Remove unused import: logging
(F401)
6-6: typing.Dict imported but unused
Remove unused import
(F401)
6-6: typing.Tuple imported but unused
Remove unused import
(F401)
9-9: ai.chronon.cli.compile.parse_teams imported but unused
Remove unused import: ai.chronon.cli.compile.parse_teams
(F401)
api/py/ai/chronon/cli/compile/serializer.py
19-19: thrift.protocol.TJSONProtocol.TJSONProtocolFactory imported but unused
Remove unused import: thrift.protocol.TJSONProtocol.TJSONProtocolFactory
(F401)
78-81: Combine if branches using logical or operator
Combine if branches
(SIM114)
api/py/ai/chronon/cli/entrypoint.py
2-2: sys imported but unused
Remove unused import: sys
(F401)
5-5: enum.Enum imported but unused
Remove unused import: enum.Enum
(F401)
17-17: Do not use bare except
(E722)
api/py/test/test_git_utils.py
5-5: pathlib.Path imported but unused
Remove unused import: pathlib.Path
(F401)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: table_utils_delta_format_spark_tests
- GitHub Check: other_spark_tests
- GitHub Check: mutation_spark_tests
- GitHub Check: join_spark_tests
- GitHub Check: fetcher_spark_tests
- GitHub Check: build-and-push
🔇 Additional comments (40)
api/py/ai/chronon/cli/compile/compile_context.py (1)
111-112: ⚠️ Potential issue: Fix inverted logic for ignoring hidden files.
Currently, the code processes only hidden files. Correct it to skip hidden files.

```diff
-if not f.startswith("."):  # ignore hidden files - such as .DS_Store
+if f.startswith("."):  # ignore hidden files - such as .DS_Store
     continue
```

Likely invalid or redundant comment.
api/py/ai/chronon/cli/logging.py (1)
37-39: Prevent duplicate log handlers.
Clear existing handlers when creating a logger to avoid multiple handler attachments.

```diff
 logger = logging.getLogger(__name__)
+logger.handlers.clear()
 logger.addHandler(handler)
 logger.setLevel(log_level)
```

api/py/ai/chronon/cli/compile/parse_configs.py (2)
14-20: LGTM! Well-structured dataclass. The `CompiledObj` dataclass effectively encapsulates compilation results with clear field definitions.
54-55: ⚠️ Potential issue: Add path validation.

```diff
 module_name = rel_path_without_extension.replace("/", ".")
+if not all(part.isidentifier() for part in module_name.split(".")):
+    raise ValueError(f"Invalid module name: {module_name}")
+if ".." in module_name:
+    raise ValueError(f"Path traversal detected in: {module_name}")
 conf_type, team_name, rest = module_name.split(".", 2)
```

Likely invalid or redundant comment.
api/py/ai/chronon/cli/entrypoint.py (1)
17-18: ⚠️ Potential issue: Specify exception type.

```diff
-    except:
+    except subprocess.CalledProcessError as e:
+        logger.warning(f"Failed to get branch: {e}")
         return "main"
```

Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
17-17: Do not use bare
except(E722)
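The fix the review asks for can be sketched end to end; the function name and the `"main"` fallback mirror the PR's entrypoint as quoted above, but this is an illustrative reconstruction, not the PR's exact code:

```python
import subprocess


def get_current_branch(fallback: str = "main") -> str:
    """Return the current git branch name, or a fallback when unavailable."""
    try:
        return (
            subprocess.check_output(
                ["git", "rev-parse", "--abbrev-ref", "HEAD"],
                stderr=subprocess.DEVNULL,
            )
            .decode("utf-8")
            .strip()
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Not inside a git repo, or git is missing - fall back instead of crashing.
        return fallback
```

Catching the two concrete exceptions keeps unrelated failures (e.g. `KeyboardInterrupt`) from being silently swallowed, which is the point of the E722 lint.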
api/py/test/sample/teams.py (1)
1-1: ⚠️ Potential issue: Add missing import.

```diff
-from ai.chronon.api.ttypes import Team
+from ai.chronon.api.ttypes import Team, EnvironmentVariables
```

Likely invalid or redundant comment.
api/py/test/test_git_utils.py (2)
13-24: LGTM! Well-structured fixture.
The git_repo fixture properly handles setup and cleanup.

26-83: LGTM! Comprehensive test coverage.
Test cases thoroughly verify Git utility functions with proper assertions.
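Setup and teardown of the kind this fixture performs can be sketched with a stdlib context manager (a hypothetical reconstruction; the PR's fixture uses `pytest.fixture` and its exact commands may differ):

```python
import contextlib
import os
import shutil
import subprocess
import tempfile


@contextlib.contextmanager
def temp_git_repo():
    """Yield a disposable git repository path and remove it afterwards."""
    repo_dir = tempfile.mkdtemp()
    try:
        # Initialize a throwaway repo for the test to operate on.
        subprocess.run(["git", "init", repo_dir], check=True, capture_output=True)
    except FileNotFoundError:
        pass  # git not installed; still yield the bare temp dir
    try:
        yield repo_dir
    finally:
        # Teardown always runs, even if the test body raised.
        shutil.rmtree(repo_dir, ignore_errors=True)
```

The `try/finally` around the `yield` is what guarantees cleanup, which is the property the review praises in the fixture.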
api/py/ai/chronon/cli/compile/parse_teams.py (5)
13-37: LGTM!
The function correctly loads and validates team configurations.

40-71: LGTM!
The function correctly updates and validates metadata.

74-90: LGTM!
The function correctly merges dictionaries with proper override behavior.

92-111: LGTM!
The function correctly merges environment variables.

4-4: 🛠️ Refactor suggestion: Remove unused import.

```diff
-from ai.chronon.cli.compile.parse_configs import from_file
```

Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
4-4:
ai.chronon.cli.compile.parse_configs.from_fileimported but unusedRemove unused import:
ai.chronon.cli.compile.parse_configs.from_file(F401)
api/py/ai/chronon/cli/compile/compiler.py (1)
86-88: ⚠️ Potential issue: Fix success ratio calculation.
The denominator should use `compiled_objects` instead of `compile_result.obj_dict`.

```diff
-    f"{len(compile_result.obj_dict)}/{len(compile_result.obj_dict)}"
+    f"{len(compile_result.obj_dict)}/{len(compiled_objects)}"
```

Likely invalid or redundant comment.
api/py/ai/chronon/cli/git_utils.py (6)
44-59: LGTM!The function correctly retrieves and validates the merge base.
61-68: LGTM!The function correctly retrieves file content from a specific commit.
77-127: LGTM!The function correctly retrieves and validates changed files.
130-157: LGTM!The function correctly identifies and validates changes since fork point.
33-34: 🛠️ Refactor suggestion: Specify exception type.

```diff
-    except:
+    except (OSError, IOError):
         pass
```

Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
33-33: Do not use bare
except(E722)
73-74: 🛠️ Refactor suggestion: Specify exception type.

```diff
-    except:
+    except (OSError, IOError):
         return None
```

Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
73-73: Do not use bare
except(E722)
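The `git show <commit>:<path>` pattern behind a helper like the file-content function reviewed above can be sketched as follows (the function name and the None-on-missing behavior are assumptions drawn from the review summary, not the PR's exact code):

```python
import subprocess
from typing import Optional


def file_content_at_commit(path: str, commit: str = "HEAD") -> Optional[str]:
    """Return a file's content at a given commit via `git show`, or None."""
    try:
        return subprocess.check_output(
            ["git", "show", f"{commit}:{path}"],
            stderr=subprocess.DEVNULL,
        ).decode("utf-8")
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Missing file/commit, not a git repo, or git unavailable.
        return None
```

Returning `None` lets callers distinguish "file did not exist at that commit" from an empty file, which returns `""`.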
api/py/ai/chronon/repo/compile.py (2)
225-277: LGTM!
The function correctly validates and writes objects with proper error handling.

279-291: LGTM!
The function correctly serializes and writes objects as JSON.
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (4)
58-61: LGTM! Array formatting improvements.
The consistent array formatting enhances readability.
Also applies to: 85-88, 110-114, 146-149, 220-223, 248-252, 288-291, 306-309, 327-331, 342-346, 365-369
317-334: LGTM! Good test coverage for edge cases.
Test case for Long.MAX_VALUE and magicNullLong values improves robustness.

336-350: LGTM! Good test coverage for null values.
Test case for handling all null Long values ensures consistent behavior.

352-373: LGTM! Good test coverage for mixed values.
Test case for mixed null and non-null Long fields improves completeness.
api/thrift/api.thrift (2)
441-446: LGTM! Well-structured environment variables.
Clean separation of environment variables by execution mode.

448-457: LGTM! Good team configuration structure.
Team struct provides essential fields for team management.
api/py/ai/chronon/cli/compile/validator.py (3)
183-191: LGTM! Good encapsulation of topic checks.
Clean helper function for checking topics in sources.

193-195: LGTM! Clear group-by topic check.
Simple and effective check for topics in group-by sources.

197-213: LGTM! Well-structured window check.
Clear logic for checking hourly windows in group-by aggregations.
.github/workflows/test_scala_no_spark.yaml (1)
51-51: LGTM! Clean transition to Bazel.
Consistent use of Bazel test commands across all modules.
Also applies to: 55-55, 59-59, 63-63, 67-67, 71-71, 75-75
.github/image/Dockerfile (8)
1-1: Base image update.
Switched from Alpine to Ubuntu.
6-6: Updated install command.
Using `apt` to install tools on Ubuntu.
22-34: Thrift dependencies.
Updated package names (e.g., `build-essential`, `libboost-dev`).
35-42: Thrift build.
Build commands and cleanup for Thrift look sound.
44-57: Cleanup commands.
Uses `apt purge` and `apt autoremove` appropriately.
60-60: Python packages install.
Installs testing/linting tools with the system break flag.
68-70: ARM Bazelisk setup.
Downloads Bazelisk for ARM and sets proper permissions.
71-73: JVM symlink.
Installs `build-essential` and creates JVM symlink for ARM.
```python
# Create a spinner with custom text
with console.status("[bold green]Loading...", spinner="dots") as status:

    # Simulate some work
    for i in range(3):
        time.sleep(1)
        console.log(f"{i + 1}. Completed task")

    time.sleep(1)
    console.log(f"Done!")
```
Move spinner implementation after console initialization.
The console variable is used before it's imported and initialized. Move this code block after line 46.
🧰 Tools
🪛 Ruff (0.8.2)
20-20: Undefined name console
(F821)
25-25: Undefined name console
(F821)
28-28: Undefined name console
(F821)
28-28: f-string without any placeholders
Remove extraneous f prefix
(F541)
Actionable comments posted: 2
🧹 Nitpick comments (1)
.github/image/Dockerfile (1)
71-73: Redundant Environment Setup
Reinstalling build-essential and re-linking JAVA may be redundant. Double-check if necessary.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
.github/image/Dockerfile (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: table_utils_delta_format_spark_tests
- GitHub Check: fetcher_spark_tests
- GitHub Check: mutation_spark_tests
- GitHub Check: join_spark_tests
- GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (6)
.github/image/Dockerfile (6)
1-1: Base Image Update
Using Ubuntu as the base is clear.
6-6: APT Packages Installed
APT command is correct for Ubuntu.
22-34: Thrift Dependencies Setup
APT installs required build deps; verify the package list meets build needs.
35-42: Thrift Build Process
Thrift is built from source with minimal flags. Confirm these options suit requirements.
44-57: Cleanup Commands
APT purge, autoremove, and list removal are efficient. Ensure no runtime libs are removed.
60-60: Python Package Installation
The pip install command with --break-system-packages is acceptable.
.github/image/Dockerfile
Outdated
```dockerfile
RUN curl -fsSL "https://github.com/bazelbuild/bazelisk/releases/download/v1.18.0/bazelisk-linux-aarch64" -o /usr/local/bin/bazel
ENV PATH="/usr/local/bin/bazel:${PATH}"
```
🛠️ Refactor suggestion
Bazel Installation Conflict
Two installations (v1.18.0 via curl and v1.19.0 via ADD) may conflict. Consolidate to one.
.github/image/Dockerfile
Outdated
```dockerfile
RUN chmod 755 /usr/local/bin/bazel

RUN apt update && apt install -y build-essential
```
🛠️ Refactor suggestion
Bazelisk Redundancy
Using ADD for Bazelisk v1.19.0 after initial curl download causes redundancy.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/image/Dockerfile (1)
1-1: Pin Ubuntu Version.
Using `ubuntu:latest` risks unpredictable builds. Pin to a specific version (e.g., `ubuntu:22.04`).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
.github/image/Dockerfile (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: table_utils_delta_format_spark_tests
- GitHub Check: mutation_spark_tests
- GitHub Check: join_spark_tests
- GitHub Check: fetcher_spark_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: build-and-push
🔇 Additional comments (7)
.github/image/Dockerfile (7)
6-6: Clear package installation.
Command is concise; verify all packages are needed.
18-18: Bazel PATH update.
Ensure Bazel binary is correctly prioritized. Already flagged in past reviews.
22-34: Thrift dependencies setup.
Apt install block is clear and complete.
35-42: Thrift build & cleanup.
Tar extraction and removal of temp files are well-handled.
44-57: Post-build cleanup.
Proper use of `apt purge`, `autoremove`, and list cleanup.
60-60: Python tools installation.
Pip installs essential testing and linting tools correctly.
68-70: Reinstall build-essential & JAVA symlink.
Reinstalling `build-essential` after cleanup and creating the JAVA symlink for ARM are sound.
Actionable comments posted: 0
🧹 Nitpick comments (3)
.github/image/Dockerfile (3)
1-1: Base Image Update.
Consider pinning a specific Ubuntu version instead of `latest` for reproducibility.
6-6: APT Installation.
APT update/install looks fine; consider cleaning the cache to reduce image size.
18-18: PATH Update.
Multiple ENV PATH modifications—verify the ordering is as intended.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
.github/image/Dockerfile (2 hunks)
.github/workflows/test_scala_spark.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build-and-push
🔇 Additional comments (6)
.github/workflows/test_scala_spark.yaml (1)
40-40: Bazel test command is concise.
The Bazel command looks correct if the target exists.

.github/image/Dockerfile (5)
22-34: Thrift Dependencies.
APT dependency block is clear; consider non-interactive options if needed.
35-43: Thrift Build & Cleanup.
Build and cleanup commands look solid. Confirm the configure flags meet requirements.
44-57: APT Purge & Cleanup.
Properly removes build dependencies. Reinstalling build-essential later appears intentional.
60-60: Pip Installation.
Using `--break-system-packages` is acceptable; ensure compatibility with your environment.
68-71: Post-Cleanup Setup.
Reinstalling build-essential and setting up the Java symlink are clear—verify their necessity post-purge.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
.github/workflows/test_scala_spark.yaml (1)
25-25: Invalid runner label.
Change `ubuntu-8_cores-32_gb` to a supported label (e.g. `ubuntu-latest-8-cores`).
🧰 Tools
🪛 actionlint (1.7.4)
25-25: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
🧹 Nitpick comments (4)
.github/workflows/test_scala_spark.yaml (1)
46-46: Remove trailing space.
Trim extra whitespace at the end of the--google_credentialsline.🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 46-46: trailing spaces
(trailing-spaces)
.github/image/Dockerfile (2)
59-59: Consider version pinning.
Pin versions for `pytest`, `tox`, and `flake8` for better reproducibility.
68-69: Verify JAVA symlink.
Ensure the symlink for JAVA_HOME works reliably across environments.

.github/workflows/test_scala_no_spark.yaml (1)
53-53: Remove trailing spaces.
Trim all trailing whitespace in the Bazel command lines.
Also applies to: 60-60, 67-67, 74-74, 81-81, 88-88, 95-95
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 53-53: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (9)
.github/image/Dockerfile (3 hunks)
.github/workflows/build_and_push_docker.yaml (1 hunks)
.github/workflows/test_scala_no_spark.yaml (1 hunks)
.github/workflows/test_scala_spark.yaml (1 hunks)
spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala (1 hunks)
spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala (1 hunks)
tools/build_rules/thrift/thrift.bzl (1 hunks)
✅ Files skipped from review due to trivial changes (3)
- spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala
- spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala
- spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala
🚧 Files skipped from review as they are similar to previous changes (2)
- spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
- tools/build_rules/thrift/thrift.bzl
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/build_and_push_docker.yaml
15-15: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
🪛 YAMLlint (1.35.1)
.github/workflows/test_scala_spark.yaml
[error] 46-46: trailing spaces
(trailing-spaces)
.github/workflows/test_scala_no_spark.yaml
[error] 53-53: trailing spaces
(trailing-spaces)
[error] 60-60: trailing spaces
(trailing-spaces)
[error] 67-67: trailing spaces
(trailing-spaces)
[error] 74-74: trailing spaces
(trailing-spaces)
[error] 81-81: trailing spaces
(trailing-spaces)
[error] 88-88: trailing spaces
(trailing-spaces)
[error] 95-95: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: build-and-push
```diff
 jobs:
   build-and-push:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-8_cores-32_gb
```
Invalid runner label.
Label ubuntu-8_cores-32_gb is not recognized. Use a valid label (e.g. ubuntu-latest-8-cores).
🧰 Tools
🪛 actionlint (1.7.4)
15-15: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/test_python.yaml (1)
37-38: Remove trailing whitespace.
Trailing whitespace detected on line 38.

```diff
-    run: | 
+    run: |
```

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 38-38: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
.github/workflows/test_python.yaml (2 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/test_python.yaml
[error] 38-38: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: other_spark_tests
- GitHub Check: no_spark_scala_tests
- GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (2)
.github/workflows/test_python.yaml (2)
43-43: Explicit shell for lint step.
Using `shell: bash` here provides clarity.
52-52: Explicit shell for tests.
Consistent shell declaration improves readability.
Actionable comments posted: 4
🧹 Nitpick comments (11)
api/py/ai/chronon/cli/compile/parse_teams.py (2)
73-89: Simplify map merging using dictionary comprehension.
Current implementation can be more concise.

```diff
-    result = {}
-    for m in maps:
-        if m is None:
-            continue
-        for key, value in m.items():
-            result[key] = value
-    return result
+    return {k: v for m in maps if m is not None for k, v in m.items()}
```
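The suggested comprehension keeps the loop's semantics — later maps override earlier keys and `None` entries are skipped — as a quick runnable check shows (`merge_maps` and the sample values here are illustrative, not the PR's code):

```python
def merge_maps(*maps):
    """Merge dicts left to right; later maps win, None entries are skipped."""
    return {k: v for m in maps if m is not None for k, v in m.items()}


# Later maps override earlier keys; None arguments are ignored entirely.
team_env = {"SPARK_VERSION": "3.5", "EXECUTOR_CORES": "4"}
override_env = {"EXECUTOR_CORES": "8"}
merged = merge_maps(team_env, None, override_env)
```

Because `dict` comprehensions assign keys in iteration order, the last occurrence of a key wins, exactly matching the original loop's override behavior.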
91-91: Add return type hint for clarity.
Function return type should be explicit.

```diff
-def _merge_environment_variables(*env_vars: List[EnvironmentVariables]):
+def _merge_environment_variables(*env_vars: List[EnvironmentVariables]) -> EnvironmentVariables:
```

api/py/ai/chronon/cli/compile/compile_context.py (2)
2-2: Remove unused import.

```diff
-from logging import error
```

🧰 Tools
🪛 Ruff (0.8.2)
2-2:
logging.errorimported but unusedRemove unused import:
logging.error(F401)
107-107: Rename unused variable.
Keep it as `_sub_dirs` to reflect it's unused.

```diff
-    for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+    for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
```

🧰 Tools
🪛 Ruff (0.8.2)
107-107: Loop control variable
sub_dirsnot used within loop bodyRename unused
sub_dirsto_sub_dirs(B007)
api/py/ai/chronon/cli/compile/conf_validator.py (4)
20-20: Remove unused import.

```diff
-import os
```

🧰 Tools
🪛 Ruff (0.8.2)
20-20:
osimported but unusedRemove unused import:
os(F401)
54-57: Inline condition.

```diff
-    if group_by.metaData.online or group_by.backfillStartDate:
-        return True
-    else:
-        return False
+    return group_by.metaData.online or group_by.backfillStartDate
```

🧰 Tools
🪛 Ruff (0.8.2)
54-57: Return the condition directly
Inline condition
(SIM103)
356-359: Use single if.

```diff
-    if wild_card_derivation_included:
-        if derivation.expression in derived_columns:
-            derived_columns.remove(derivation.expression)
+    if wild_card_derivation_included and derivation.expression in derived_columns:
+        derived_columns.remove(derivation.expression)
```

🧰 Tools
🪛 Ruff (0.8.2)
356-357: Use a single
ifstatement instead of nestedifstatements(SIM102)
421-427: Combine nested if.

```diff
-    if join.metaData.online:
-        if offline_included_group_bys:
-            errors.append(...)
+    if join.metaData.online and offline_included_group_bys:
+        errors.append(...)
```

🧰 Tools
🪛 Ruff (0.8.2)
421-422: Use a single
ifstatement instead of nestedifstatements(SIM102)
api/py/ai/chronon/cli/compile/parse_configs.py (1)
5-5: Remove unused imports.

```diff
-from typing import Any, Dict, List, Tuple
+from typing import Any, List
```

🧰 Tools
🪛 Ruff (0.8.2)
5-5:
typing.Dictimported but unusedRemove unused import
(F401)
5-5:
typing.Tupleimported but unusedRemove unused import
(F401)
api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (1)
78-78: Document aggregation operation value.
Operation value 3 should be documented or use a named constant for clarity.
.github/workflows/test_python.yaml (1)
38-38: Remove trailing spaces.

```diff
-    run: | 
+    run: |
```

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 38-38: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (12)
.github/image/Dockerfile (3 hunks)
.github/workflows/test_python.yaml (1 hunks)
api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
api/py/ai/chronon/cli/compile/conf_validator.py (1 hunks)
api/py/ai/chronon/cli/compile/fill_templates.py (1 hunks)
api/py/ai/chronon/cli/compile/parse_configs.py (1 hunks)
api/py/ai/chronon/cli/compile/parse_teams.py (1 hunks)
api/py/ai/chronon/cli/entrypoint.py (1 hunks)
api/py/ai/chronon/group_by.py (0 hunks)
api/py/ai/chronon/join.py (0 hunks)
api/py/ai/chronon/repo/compile.py (1 hunks)
api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (1 hunks)
💤 Files with no reviewable changes (2)
- api/py/ai/chronon/join.py
- api/py/ai/chronon/group_by.py
🚧 Files skipped from review as they are similar to previous changes (2)
- api/py/ai/chronon/repo/compile.py
- api/py/ai/chronon/cli/compile/fill_templates.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/compile/compile_context.py
2-2: logging.error imported but unused
Remove unused import: logging.error
(F401)
107-107: Loop control variable sub_dirs not used within loop body
Rename unused sub_dirs to _sub_dirs
(B007)
api/py/ai/chronon/cli/compile/conf_validator.py
20-20: os imported but unused
Remove unused import: os
(F401)
54-57: Return the condition directly
Inline condition
(SIM103)
356-357: Use a single if statement instead of nested if statements
(SIM102)
421-422: Use a single if statement instead of nested if statements
(SIM102)
api/py/ai/chronon/cli/compile/parse_configs.py
5-5: typing.Dict imported but unused
Remove unused import
(F401)
5-5: typing.Tuple imported but unused
Remove unused import
(F401)
api/py/ai/chronon/cli/entrypoint.py
15-15: Do not use bare except
(E722)
🪛 YAMLlint (1.35.1)
.github/workflows/test_python.yaml
[error] 38-38: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: no_spark_scala_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: enforce_triggered_workflows
- GitHub Check: build-and-push
🔇 Additional comments (7)
api/py/ai/chronon/cli/compile/parse_teams.py (2)
39-71: LGTM!
Good validation chain and proper environment variable merging.

31-34: Validate required Team attributes before adding to dictionary.
Add validation for required Team fields like outputNamespace.

```diff
 if isinstance(obj, Team):
+    require(obj.outputNamespace is not None, f"Team {name} missing required outputNamespace")
     obj.name = name
     logger.info(f"Found team {name}")
     team_dict[name] = obj
```

api/py/ai/chronon/cli/compile/parse_configs.py (1)
53-53: Check list size before splitting
Could raise ValueError if there aren't enough parts.

```python
conf_parts = module_name.split(".", 2)
if len(conf_parts) < 3:
    raise ValueError(f"Invalid module name: {module_name}")
```

api/py/ai/chronon/cli/entrypoint.py (4)
8-16: Catch specific exception in get_current_branch.

```diff
 def get_current_branch():
     try:
         return (
             subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"])
             .decode("utf-8")
             .strip()
         )
-    except:
+    except subprocess.CalledProcessError as e:
+        logger.warning(f"Failed to get branch: {e}")
         return "main"
```

🧰 Tools
🪛 Ruff (0.8.2)
15-15: Do not use bare
except(E722)
25-31: LGTM! Clean implementation of sync command.
59-64: LGTM! Clean implementation of deploy command.
46-51: 🛠️ Refactor suggestion: Replace scope with explicit run flags.
Per discussion, replace `scope` with explicit `run_upstream` and `run_downstream` flags for clarity.

```diff
-@click.option(
-    "--scope",
-    type=click.Choice(["upstream", "self", "downstream"]),
-    default="upstream",
-    help="Scope of configs to backfill",
-)
+@click.option("--run-upstream", is_flag=True, default=True, help="Run upstream configs")
+@click.option("--run-downstream", is_flag=True, default=False, help="Run downstream configs")
```

Likely invalid or redundant comment.
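A minimal sketch of the flag-based interface this suggestion describes (the command body and echoed output are invented for illustration; only the option names come from the suggestion):

```python
import click


@click.command()
@click.option("--run-upstream", is_flag=True, default=False,
              help="Also run upstream configs")
@click.option("--run-downstream", is_flag=True, default=False,
              help="Also run downstream configs")
def backfill(run_upstream, run_downstream):
    """Two independent booleans replace the single --scope choice option."""
    click.echo(f"upstream={run_upstream} downstream={run_downstream}")
```

Unlike a `click.Choice`, independent flags let a caller request both directions at once (`--run-upstream --run-downstream`), which a single mutually exclusive scope cannot express.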
```python
        f"Team config file: {teams_file} not found. Please add one.",
    )

    team_module = importlib.import_module("teams")
```
Use absolute import path for teams module.
Current import assumes teams.py is in PYTHONPATH. Use absolute path to avoid import errors.
```diff
-    team_module = importlib.import_module("teams")
+    team_module = importlib.import_module(f"{os.path.basename(conf_root)}.teams")
```
```python
if not f.startswith("."):  # ignore hidden files - such as .DS_Store
    continue
```
Fix reversed file-filter condition
Skipping all non-dot files contradicts the comment.
```diff
-if not f.startswith("."):  # ignore hidden files
-    continue
+if f.startswith("."):  # ignore hidden files
+    continue
```
.github/image/Dockerfile
Outdated
```dockerfile
RUN apt update && apt install -y build-essential git
RUN mkdir -p /usr/lib/jvm && ln -s /usr/lib/jvm/java-17-openjdk-amd64/ /usr/lib/jvm/default-jvm
```
🛠️ Refactor suggestion
Move Java symlink creation before verification.
Java verification may fail if symlink doesn't exist. Move these lines before the verification step.
```diff
@@ -1,15 +1,9 @@
-FROM alpine:latest
+FROM ubuntu:latest
```
🛠️ Refactor suggestion
Pin Ubuntu version for reproducibility.
```diff
-FROM ubuntu:latest
+FROM ubuntu:22.04
```
Actionable comments posted: 0
🔭 Outside diff range comments (1)
api/py/test/test_utils.py (1)
175-189: Avoid redefining test function.
test_get_related_table_names_for_group_bysis already defined at line 164.-def test_get_related_table_names_for_group_bys(): +def test_get_related_table_names_for_group_bys_different_name():🧰 Tools
🪛 Ruff (0.8.2)
175-175: Redefinition of unused
test_get_related_table_names_for_group_bysfrom line 164(F811)
🧹 Nitpick comments (8)
api/py/ai/chronon/cli/compile/conf_validator.py (4)
20-20: Remove unused import.
`os` is not used anywhere. Consider removing it.

```diff
-import os
```

🧰 Tools
🪛 Ruff (0.8.2)
20-20:
osimported but unusedRemove unused import:
os(F401)
54-57: Inline the condition.
Directly return the boolean expression instead of using if-else.

```diff
-def _is_batch_upload_needed(group_by: GroupBy) -> bool:
-    if group_by.metaData.online or group_by.backfillStartDate:
-        return True
-    else:
-        return False
+def _is_batch_upload_needed(group_by: GroupBy) -> bool:
+    return group_by.metaData.online or group_by.backfillStartDate
```

🧰 Tools
🪛 Ruff (0.8.2)
54-57: Return the condition directly
Inline condition
(SIM103)
356-357: Merge nested if statements.
Combine to simplify and avoid redundancy.

```diff
-if wild_card_derivation_included:
-    if derivation.expression in derived_columns:
+if wild_card_derivation_included and derivation.expression in derived_columns:
     derived_columns.remove(derivation.expression)
```

🧰 Tools
-if wild_card_derivation_included: - if derivation.expression in derived_columns: +if wild_card_derivation_included and derivation.expression in derived_columns: derived_columns.remove(derivation.expression)🧰 Tools
🪛 Ruff (0.8.2)
356-357: Use a single
ifstatement instead of nestedifstatements(SIM102)
421-422: Combine nested checks.Shorten by joining conditions into a single
if.-if join.metaData.online: - if offline_included_group_bys: +if join.metaData.online and offline_included_group_bys: errors.append(...)🧰 Tools
🪛 Ruff (0.8.2)
421-422: Use a single
ifstatement instead of nestedifstatements(SIM102)
api/py/ai/chronon/cli/compile/compile_context.py (2)
2-2: Remove unused import.
`logging.error` is never used.

```diff
-from logging import error
```

🧰 Tools
🪛 Ruff (0.8.2)
2-2:
logging.errorimported but unusedRemove unused import:
logging.error(F401)
107-107: Rename unused variable.
`sub_dirs` is not used in the loop. Rename it to `_sub_dirs`.

```diff
-for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+for sub_root, _sub_dirs, sub_files in os.walk(output_dir):
```

🧰 Tools
🪛 Ruff (0.8.2)
107-107: Loop control variable
sub_dirsnot used within loop bodyRename unused
sub_dirsto_sub_dirs(B007)
api/py/ai/chronon/cli/compile/compiler.py (2)
14-14: Remove unused import.
`time` is not used.

```diff
-import time
```

🧰 Tools
🪛 Ruff (0.8.2)
14-14:
timeimported but unusedRemove unused import:
time(F401)
36-36: Remove unused variable.
`status` isn't used inside the context.

```diff
-with console.status("[bold green]Loading...", spinner="dots") as status:
+with console.status("[bold green]Loading...", spinner="dots"):
```

🧰 Tools
🪛 Ruff (0.8.2)
36-36: Local variable
statusis assigned to but never usedRemove assignment to unused variable
status(F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (4)
api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
api/py/ai/chronon/cli/compile/conf_validator.py (1 hunks)
api/py/test/test_utils.py (5 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/cli/compile/conf_validator.py
20-20: os imported but unused
Remove unused import: os
(F401)
54-57: Return the condition directly
Inline condition
(SIM103)
356-357: Use a single if statement instead of nested if statements
(SIM102)
421-422: Use a single if statement instead of nested if statements
(SIM102)
api/py/ai/chronon/cli/compile/compile_context.py
2-2: logging.error imported but unused
Remove unused import: logging.error
(F401)
107-107: Loop control variable sub_dirs not used within loop body
Rename unused sub_dirs to _sub_dirs
(B007)
api/py/ai/chronon/cli/compile/compiler.py
14-14: time imported but unused
Remove unused import: time
(F401)
36-36: Local variable status is assigned to but never used
Remove assignment to unused variable status
(F841)
api/py/test/test_utils.py
175-175: Redefinition of unused test_get_related_table_names_for_group_bys from line 164
(F811)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: no_spark_scala_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: other_spark_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
api/py/ai/chronon/cli/compile/compile_context.py (1)
111-112: ⚠️ Potential issue — Fix reversed file-filter condition.

To skip hidden files, invert the check:

-if not f.startswith("."):  # ignore hidden files - such as .DS_Store
+if f.startswith("."):  # ignore hidden files - such as .DS_Store
     continue

Likely invalid or redundant comment.
api/py/ai/chronon/cli/compile/compiler.py (1)

74-75: ⚠️ Potential issue — Fix success ratio calculation.

It's showing X/X instead of X/Total.

- f"{len(compile_result.obj_dict)}/{len(compile_result.obj_dict)}"
+ f"{len(compile_result.obj_dict)}/{len(compiled_objects)}"

Likely invalid or redundant comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
api/py/test/test_utils.py (2)
171-174: Remove debug print and add error handling.

 def dopen(path):
     full_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), path)
-    print(full_path)
-    return open(full_path)
+    try:
+        return open(full_path)
+    except FileNotFoundError as e:
+        raise FileNotFoundError(f"Could not open {full_path}: {e}")
270-270: Remove debug print statement.

- print("Current working directory:", os.getcwd())
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (4)
- api/py/test/test_utils.py (10 hunks)
- api/py/test/test_validator.py (3 hunks)
- api/py/tox.ini (1 hunks)
- tools/build_rules/thrift/thrift.bzl (0 hunks)
💤 Files with no reviewable changes (1)
- tools/build_rules/thrift/thrift.bzl
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/test/test_utils.py
189-189: Redefinition of unused test_get_related_table_names_for_group_bys from line 177
(F811)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: other_spark_tests
- GitHub Check: no_spark_scala_tests
- GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (7)
api/py/tox.ini (2)
8-8: Update allowlist_externals: Added `mkdir` and `cp` to support new commands.

13-20: New compile commands: Two `compile.py` calls with distinct `--input_path` values and subsequent directory creation and copy. Ensure the execution order and output paths align with expectations.

api/py/test/test_validator.py (4)
26-30: LGTM: Improved path handling. Dynamic path generation makes tests more portable.

36-39: LGTM: Enhanced readability. Clean formatting of list comprehensions.

Also applies to: 44-51, 56-64

74-76: LGTM: Better function signature formatting. Improved readability by splitting long lines.

Also applies to: 83-85, 93-95, 102-104

138-140: LGTM: Clean import formatting. Better readability with properly split import statements.

Also applies to: 161-163

api/py/test/test_utils.py (1)

99-337: Test coverage looks comprehensive! Good job on covering various scenarios for group_bys, joins, and table name utilities.
🧰 Tools
🪛 Ruff (0.8.2)
189-189: Redefinition of unused `test_get_related_table_names_for_group_bys` from line 177 (F811)
api/py/test/test_utils.py
Outdated
 def test_get_related_table_names_for_group_bys(
-    group_by_requiring_backfill,
-    online_group_by_requiring_streaming
+    group_by_requiring_backfill, online_group_by_requiring_streaming
 ):
-    with open('test/sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1') as conf_file:
+    with dopen(
+        "sample/production/group_bys/sample_team/entity_sample_group_by_from_module.v1"
+    ) as conf_file:
         json = conf_file.read()
         group_by = json2thrift(json, api.GroupBy)
         tables = utils.get_related_table_names(group_by)
         assert any(table.endswith("_upload") for table in tables)
Fix duplicate test function name.
Two functions with the same name will cause the first one to be ignored.
-def test_get_related_table_names_for_group_bys(
+def test_get_related_table_names_for_group_bys_with_streaming(
     group_by_requiring_backfill, online_group_by_requiring_streaming
 ):

Also applies to: 189-205
[Ticket](https://app.asana.com/0/0/1209184028451140)

This PR includes both frontend and backend updates. Since @nikhil-zlai and I reviewed the high-level changes today, it might be helpful for @sean-zlai and me to go over them during our sync tomorrow. There are some key points I'd like to highlight, but I wanted to get this up early for review.

**Reminder:** @sean-zlai will handle merging the new drift/summary functionality into the observability frontend.

- Added new endpoints for Conf, Confs, search, drift, summary, etc., currently called on the `/thrift` route for demonstration. The old endpoints can be removed once we wire in the new ones.
- Created `LogicalNodeTable.svelte` to display all `ConfTypes`, including empty states.
- Updated search functionality to work for all `ConfTypes`, though only `risk.user_transactions.txn_join` is enabled for now.
- Implemented new endpoints using `RouteHandlerWrapper`.
- Removed old endpoints (except `/timeseries`, which is still needed for charts).
- Separated drift and summary into two distinct endpoints.
- Introduced:
  - `DriftHandler` (handles drift and summaries).
  - `ConfHandler` (handles Conf, Confs, and Conf search).
- Added filtering by column to `DriftStore`.
- Improved and added various tests.
- Developed a new serializer that integrates well with `RouteHandlerWrapper`.

Let me know if you have any questions!

- [x] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

- **New Features**
  - Added configuration management capabilities across joins, models, group bys, and staging queries.
  - Introduced drift analysis and summary tracking for configurations.
  - Enhanced API endpoints for retrieving and searching configurations.
  - New `LogicalNodeTable` component for displaying configuration data.
  - Added new structures and enumerations for handling configuration requests in the API.
- **Improvements**
  - Refactored API response handling to use JSON serialization.
  - Improved null value handling in data processing with predefined constants.
  - Updated type definitions for better type safety.
  - Streamlined configuration retrieval and search mechanisms.
  - Enhanced handling of search results by categorizing entity types.
- **Bug Fixes**
  - Resolved issues with null value representation in long and double iterators.
  - Improved error handling in API request processing.
- **Breaking Changes**
  - Removed previous model and join metadata endpoints.
  - Updated API method signatures for configuration retrieval.
  - Eliminated deprecated response types related to joins and models.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Actionable comments posted: 3
♻️ Duplicate comments (4)
api/py/ai/chronon/cli/compile/compiler.py (1)
74-76: ⚠️ Potential issue — Fix success ratio calculation.

The success ratio uses the same value for numerator and denominator.

Apply this diff to fix the calculation:

- f"{len(compile_result.obj_dict)}/{len(compile_result.obj_dict)}"
+ f"{len(compile_result.obj_dict)}/{len(compiled_objects)}"

api/py/ai/chronon/cli/git_utils.py (2)
33-34: 🛠️ Refactor suggestion — Use specific exception type.

The bare except clause could hide important errors.

Apply this diff to use a more specific exception type:

- except:
+ except OSError:
      pass

🧰 Tools
🪛 Ruff (0.8.2)
33-33: Do not use bare `except` (E722)
73-74: 🛠️ Refactor suggestion — Use specific exception type.

The bare except clause could hide important errors.

Apply this diff to use a more specific exception type:

- except:
+ except OSError:
      return None

🧰 Tools
🪛 Ruff (0.8.2)
73-73: Do not use bare `except` (E722)
.github/image/Dockerfile (1)
1-1: 🛠️ Refactor suggestion — Pin Base Image Version:
Use a fixed tag (e.g., `ubuntu:22.04`) instead of `latest` for reproducibility.
🧹 Nitpick comments (14)
spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1)
41-68: Consider removing commented code. Since the functionality is moved to interactive.py, consider removing the commented test class instead of keeping it as reference.
api/py/ai/chronon/cli/compile/compile_context.py (2)
2-2: Remove unused import.

The `error` import is never used.

-from logging import error

🧰 Tools
🪛 Ruff (0.8.2)
2-2: `logging.error` imported but unused. Remove unused import: `logging.error` (F401)
107-107: Rename unused variable.

`sub_dirs` is unused. Use `_sub_dirs` to clarify intent.

-for sub_root, sub_dirs, sub_files in os.walk(output_dir):
+for sub_root, _sub_dirs, sub_files in os.walk(output_dir):

🧰 Tools
🪛 Ruff (0.8.2)
107-107: Loop control variable `sub_dirs` not used within loop body. Rename unused `sub_dirs` to `_sub_dirs` (B007)
api/py/ai/chronon/cli/compile/parse_configs.py (1)
5-5: Remove unused imports.

`Dict` and `Tuple` are unused.

-from typing import Any, Dict, List, Tuple
+from typing import Any, List

🧰 Tools
🪛 Ruff (0.8.2)
5-5: `typing.Dict` imported but unused. Remove unused import (F401)
5-5: `typing.Tuple` imported but unused. Remove unused import (F401)
api/py/ai/chronon/cli/entrypoint.py (1)
46-51: Consider more explicit scope options.

Replace `scope` with `run_upstream` and `run_downstream` flags for clarity.

api/py/test/test_git_utils.py (1)

5-5: Remove unused import.

-from pathlib import Path

🧰 Tools
🪛 Ruff (0.8.2)
5-5: `pathlib.Path` imported but unused. Remove unused import: `pathlib.Path` (F401)
api/py/ai/chronon/cli/compile/compiler.py (1)
14-14: Remove unused import.

The `time` module is imported but not used.

Apply this diff to remove the unused import:

-import time

🧰 Tools
🪛 Ruff (0.8.2)
14-14: `time` imported but unused. Remove unused import: `time` (F401)
api/py/ai/chronon/cli/compile/serializer.py (1)
78-81: Combine if branches for integer types.

The branches for I64, I32, I16, and BYTE can be combined using the logical OR operator.

Apply this diff to combine the branches:

- elif ttype == TType.I64:
-     ret = int(val)
- elif ttype == TType.I32 or ttype == TType.I16 or ttype == TType.BYTE:
-     ret = int(val)
+ elif ttype in (TType.I64, TType.I32, TType.I16, TType.BYTE):
+     ret = int(val)

🧰 Tools
🪛 Ruff (0.8.2)
78-81: Combine `if` branches using logical `or` operator. Combine `if` branches (SIM114)
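The tuple-membership form behaves identically to the chained `or` comparisons; a quick sketch with stand-in integer codes (the real `TType` constants come from the thrift runtime, and these values are illustrative only):

```python
# Stand-ins for thrift's TType integer codes (illustrative values only).
I64, I32, I16, BYTE, STRING = 10, 8, 6, 3, 11

def coerce_chained(ttype, val):
    # The original shape flagged by SIM114: two branches, same body.
    if ttype == I64:
        return int(val)
    elif ttype == I32 or ttype == I16 or ttype == BYTE:
        return int(val)
    return val

def coerce_membership(ttype, val):
    # The suggested shape: one branch via tuple membership.
    if ttype in (I64, I32, I16, BYTE):
        return int(val)
    return val
```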
api/thrift/api.thrift (1)
237-238: Consider removing the deprecated field.

Since `modeToEnvMap` is deprecated and `env` is its replacement, consider removing the old field.

Also applies to: 254-254
api/py/ai/chronon/cli/compile/conf_validator.py (3)
18-20: Remove unused import.

The `os` module is imported but never used.

-import os

🧰 Tools
🪛 Ruff (0.8.2)
20-20: `os` imported but unused. Remove unused import: `os` (F401)
53-58: Simplify boolean return.

Return the condition directly.

-def _is_batch_upload_needed(group_by: GroupBy) -> bool:
-    if group_by.metaData.online or group_by.backfillStartDate:
-        return True
-    else:
-        return False
+def _is_batch_upload_needed(group_by: GroupBy) -> bool:
+    return group_by.metaData.online or group_by.backfillStartDate

🧰 Tools
🪛 Ruff (0.8.2)
54-57: Return the condition directly. Inline condition (SIM103)
356-358: Simplify nested if statements.

Combine nested if statements into a single condition.

- if wild_card_derivation_included:
-     if derivation.expression in derived_columns:
+ if wild_card_derivation_included and derivation.expression in derived_columns:

- if join.metaData.online:
-     if offline_included_group_bys:
+ if join.metaData.online and offline_included_group_bys:

Also applies to: 421-423

🧰 Tools
🪛 Ruff (0.8.2)
356-357: Use a single `if` statement instead of nested `if` statements (SIM102)
.github/workflows/test_python.yaml (1)
37-38: Remove trailing spaces.

Line 38 contains trailing spaces (the removed and added lines below differ only in trailing whitespace).

- run: |
+ run: |

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 38-38: trailing spaces (trailing-spaces)
.github/image/Dockerfile (1)
59-59: Pip Package Management:
Consider pinning package versions to ensure stability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (36)
- .bazelrc (1 hunks)
- .github/image/Dockerfile (3 hunks)
- .github/workflows/test_python.yaml (1 hunks)
- .github/workflows/test_scala_no_spark.yaml (1 hunks)
- .github/workflows/test_scala_spark.yaml (1 hunks)
- api/py/ai/chronon/cli/compile/compile_context.py (1 hunks)
- api/py/ai/chronon/cli/compile/compiler.py (1 hunks)
- api/py/ai/chronon/cli/compile/conf_validator.py (1 hunks)
- api/py/ai/chronon/cli/compile/fill_templates.py (1 hunks)
- api/py/ai/chronon/cli/compile/parse_configs.py (1 hunks)
- api/py/ai/chronon/cli/compile/parse_teams.py (1 hunks)
- api/py/ai/chronon/cli/compile/serializer.py (1 hunks)
- api/py/ai/chronon/cli/entrypoint.py (1 hunks)
- api/py/ai/chronon/cli/git_utils.py (1 hunks)
- api/py/ai/chronon/cli/logger.py (1 hunks)
- api/py/ai/chronon/cli/logging.py (1 hunks)
- api/py/ai/chronon/group_by.py (0 hunks)
- api/py/ai/chronon/join.py (0 hunks)
- api/py/ai/chronon/repo/compile.py (1 hunks)
- api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1 (1 hunks)
- api/py/test/sample/teams.py (1 hunks)
- api/py/test/test_git_utils.py (1 hunks)
- api/py/test/test_utils.py (10 hunks)
- api/py/test/test_validator.py (3 hunks)
- api/py/tox.ini (1 hunks)
- api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (2 hunks)
- api/thrift/api.thrift (4 hunks)
- api/thrift/common.thrift (1 hunks)
- hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala (8 hunks)
- hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala (9 hunks)
- online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (12 hunks)
- spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala (1 hunks)
- tools/build_rules/thrift/thrift.bzl (0 hunks)
💤 Files with no reviewable changes (3)
- tools/build_rules/thrift/thrift.bzl
- api/py/ai/chronon/join.py
- api/py/ai/chronon/group_by.py
🚧 Files skipped from review as they are similar to previous changes (16)
- spark/src/test/scala/ai/chronon/spark/test/OfflineSubcommandTest.scala
- api/thrift/common.thrift
- hub/src/test/scala/ai/chronon/hub/handlers/DriftHandlerTest.scala
- .bazelrc
- api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala
- .github/workflows/test_scala_no_spark.yaml
- api/py/tox.ini
- spark/src/test/scala/ai/chronon/spark/test/MetadataExporterTest.scala
- hub/src/test/scala/ai/chronon/hub/handlers/ConfHandlerTest.scala
- api/py/test/test_utils.py
- api/py/test/sample/production/joins/sample_team/sample_join_from_group_by_from_join.v1
- spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
- api/py/ai/chronon/cli/logging.py
- online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala
- api/py/ai/chronon/cli/compile/fill_templates.py
- api/py/test/test_validator.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/test/sample/teams.py
8-8: Undefined name EnvironmentVariables
(F821)
45-45: Undefined name EnvironmentVariables
(F821)
66-66: Undefined name EnvironmentVariables
(F821)
api/py/ai/chronon/cli/compile/conf_validator.py
20-20: os imported but unused
Remove unused import: os
(F401)
54-57: Return the condition directly
Inline condition
(SIM103)
356-357: Use a single if statement instead of nested if statements
(SIM102)
421-422: Use a single if statement instead of nested if statements
(SIM102)
api/py/ai/chronon/cli/compile/compile_context.py
2-2: logging.error imported but unused
Remove unused import: logging.error
(F401)
107-107: Loop control variable sub_dirs not used within loop body
Rename unused sub_dirs to _sub_dirs
(B007)
api/py/ai/chronon/cli/compile/compiler.py
14-14: time imported but unused
Remove unused import: time
(F401)
36-36: Local variable status is assigned to but never used
Remove assignment to unused variable status
(F841)
api/py/ai/chronon/cli/compile/parse_configs.py
5-5: typing.Dict imported but unused
Remove unused import
(F401)
5-5: typing.Tuple imported but unused
Remove unused import
(F401)
api/py/ai/chronon/cli/compile/serializer.py
19-19: thrift.protocol.TJSONProtocol.TJSONProtocolFactory imported but unused
Remove unused import: thrift.protocol.TJSONProtocol.TJSONProtocolFactory
(F401)
78-81: Combine if branches using logical or operator
Combine if branches
(SIM114)
api/py/ai/chronon/cli/entrypoint.py
15-15: Do not use bare except
(E722)
api/py/ai/chronon/cli/git_utils.py
4-4: typing.Set imported but unused
Remove unused import: typing.Set
(F401)
33-33: Do not use bare `except` (E722)
73-73: Do not use bare `except` (E722)
api/py/test/test_git_utils.py
5-5: pathlib.Path imported but unused
Remove unused import: pathlib.Path
(F401)
🪛 YAMLlint (1.35.1)
.github/workflows/test_python.yaml
[error] 38-38: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: no_spark_scala_tests
- GitHub Check: other_spark_tests
🔇 Additional comments (22)
spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1)
26-37: Verify if LocalDataLoaderTest object is still needed. The object contains cleanup code but might be unused now that tests are moved.
✅ Verification successful
LocalDataLoaderTest appears to be isolated to its own test file.
- Only self-references exist (including a commented-out test class).
- No other Scala files reference it.
It seems safe to remove it if its cleanup code is no longer required.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for references to LocalDataLoaderTest object
rg "LocalDataLoaderTest\." --type scala

Length of output: 41
Script:
#!/bin/bash
# Search all Scala files for any mention of LocalDataLoaderTest.
rg "LocalDataLoaderTest" -g "*.scala"

Length of output: 437
api/py/ai/chronon/cli/compile/compile_context.py (1)
111-112: Fix reversed condition.

This logic skips non-hidden files and processes hidden ones, contradicting the comment.

-if not f.startswith("."):  # ignore hidden files - such as .DS_Store
-    continue
+if f.startswith("."):
+    continue

api/py/ai/chronon/cli/logger.py (1)
58-58: Specify exception type.

Raise statement should specify an exception type as previously noted.

-raise
+raise ValueError(message)

api/py/ai/chronon/cli/entrypoint.py (1)
8-16: ⚠️ Potential issue — Specify exception type.

 def get_current_branch():
     try:
         return (
             subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"])
             .decode("utf-8")
             .strip()
         )
-    except:
+    except subprocess.CalledProcessError as e:
+        logger.warning(f"Failed to get branch: {e}")
         return "main"

Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
15-15: Do not use bare `except` (E722)
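The suggested diff can be exercised without a real repository by injecting the command runner; the `runner` parameter below is an illustrative addition for testability, not part of the PR:

```python
import subprocess

def get_current_branch(runner=subprocess.check_output):
    """Return the checked-out branch name, falling back to 'main' on git failure."""
    try:
        return (
            runner(["git", "rev-parse", "--abbrev-ref", "HEAD"])
            .decode("utf-8")
            .strip()
        )
    except (subprocess.CalledProcessError, OSError):
        # Specific exceptions only: typos and logic errors still surface loudly.
        return "main"
```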
api/py/test/sample/teams.py (1)
1-1: ⚠️ Potential issue — Add missing import.

-from ai.chronon.api.ttypes import Team
+from ai.chronon.api.ttypes import Team, EnvironmentVariables

Likely invalid or redundant comment.
api/py/ai/chronon/cli/compile/parse_teams.py (1)
21-21: ⚠️ Potential issue — Use absolute import path.

- team_module = importlib.import_module("teams")
+ team_module = importlib.import_module(f"{os.path.basename(conf_root)}.teams")

Likely invalid or redundant comment.
api/py/ai/chronon/repo/compile.py (1)
241-275: LGTM! The enhanced validation and error handling logic is well-implemented with proper error messages and warnings.
api/thrift/api.thrift (2)
441-446: LGTM: Well-structured environment variables. The `EnvironmentVariables` struct provides a clear organization of environment variables across different contexts.

448-457: LGTM: Team metadata structure. The `Team` struct effectively encapsulates team-related information.

api/py/ai/chronon/cli/compile/conf_validator.py (3)
334-378: LGTM: Well-structured derivation validation. The derivation validation logic is comprehensive and handles all edge cases.
🧰 Tools
🪛 Ruff (0.8.2)
356-357: Use a single
ifstatement instead of nestedifstatements(SIM102)
379-440: LGTM: Thorough join validation. The join validation covers all necessary checks including production status and online/offline compatibility.
🧰 Tools
🪛 Ruff (0.8.2)
421-422: Use a single
ifstatement instead of nestedifstatements(SIM102)
441-515: LGTM: Comprehensive group by validation. The group by validation effectively handles all validation scenarios.
.github/workflows/test_scala_spark.yaml (2)
38-41: LGTM: Secure credential handling. Properly decodes and stores Bazel cache credentials.

42-47: LGTM: Efficient test configuration. Uses Bazel's remote caching for improved test performance.
.github/workflows/test_python.yaml (2)
49-49: LGTM: Updated linting configuration. Appropriate exclusion of cli directory from flake8 checks.

37-37: LGTM: Explicit shell specification. Clear specification of bash shell for command execution.
Also applies to: 43-43, 52-52
.github/image/Dockerfile (6)
6-6: Dependencies Check:
Ensure `python3.12-venv` and `openjdk-17-jdk` are required.

17-19: Bazel Setup:
Bazelisk binary is set correctly; verify that version `v1.18.0` meets current needs.

23-56: Thrift Installation & Cleanup:
The build, install, and purge flow is solid. Confirm purged packages aren't needed later.

61-61: Tool Installation:
Installing `build-essential` and `git` is clear and straightforward.

62-62: Java Symlink:
Symlink creation is concise; confirm that the target directory exists.

65-71: Installation Verification:
All key tools (Java, Thrift, Python, Pip, Bazel, Git) are verified.
// Not needed since we are doing this via interactive.py
💡 Codebase verification
Interactive.py misses equivalent test coverage.
- No test functions were found in `api/py/ai/chronon/repo/interactive.py`.
- No references to `LocalDataLoader` appear there.
It looks like the transition from Scala tests to Python testing isn’t fully complete. Please add tests in interactive.py or elsewhere to cover this functionality.
🔗 Analysis chain
Verify test coverage in interactive.py.
Please confirm that equivalent test coverage exists in interactive.py.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for corresponding test coverage in interactive.py
rg -l "test_table_1_data" | grep "interactive.py"

Length of output: 47
Script:
#!/bin/bash
# List any interactive.py files in the repository.
echo "=== Listing interactive.py files ==="
fd interactive.py
# For each interactive.py file found, display any test functions and references to LocalDataLoader.
echo "=== Checking interactive.py for test coverage and LocalDataLoader references ==="
for file in $(fd interactive.py); do
echo "File: $file"
echo "-- Test functions --"
grep -H "def test_" "$file" || echo "No test functions found in $file"
echo "-- LocalDataLoader references --"
grep -H "LocalDataLoader" "$file" || echo "No LocalDataLoader references found in $file"
done

Length of output: 1016
sample_team = Team(
    namespace="test",
    env=EnvironmentVariables(
        common={
            "GCP_BIGTABLE_INSTANCE_ID": "test-instance"  # example, custom bigtable instance
        },
        backfill={  # smaller jobs for testing
            "EXECUTOR_CORES": "2",
            "DRIVER_MEMORY": "15G",
            "EXECUTOR_MEMORY": "4G",
            "PARALLELISM": "4",
            "MAX_EXECUTORS": "4",
        },
        upload={
            "PARALLELISM": "2",
            "MAX_EXECUTORS": "4",
        },
    ),
)
🛠️ Refactor suggestion
Remove duplicate team configuration.
sample_team is identical to test team. Remove the duplicate.
🧰 Tools
🪛 Ruff (0.8.2)
66-66: Undefined name EnvironmentVariables
(F821)
metadata = obj.metaData
🛠️ Refactor suggestion
Standardize metadata access.
- metadata = obj.metaData
-
- require(obj.metaData is not None, "Cannot update empty metadata")
+ require(obj.metaData is not None, "Cannot update empty metadata")
+ metadata = obj.metaData📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
require(obj.metaData is not None, "Cannot update empty metadata")
metadata = obj.metaData
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/test_python.yaml (1)
41-43: Remove trailing whitespace.
Extra whitespace in the run block header (line 43) should be removed.

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 43-43: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
.github/workflows/test_python.yaml(1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/test_python.yaml
[error] 43-43: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: no_spark_scala_tests
- GitHub Check: other_spark_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
.github/workflows/test_python.yaml (3)
36-40: Git config step approved.
Git user configuration enhances commit traceability.
47-55: Lint step updated.
Using bash and excluding the 'cli' directory is intentional and consistent.
57-67: Tests step updated.
The bash shell setting ensures a consistent test environment.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
project/ThriftGen.scala (1)
32-37: Use UTF-8 for writing and add proper resource management.

- val writer = new java.io.PrintWriter(file)
+ val writer = new java.io.PrintWriter(file, "UTF-8")
  try {
    writer.write(newContent)
  } finally {
    writer.close()
  }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
- .github/workflows/test_scala_fmt.yaml (1 hunks)
- project/ThriftGen.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/test_scala_fmt.yaml
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: no_spark_scala_tests
- GitHub Check: other_spark_tests
val source = scala.io.Source.fromFile(file, "UTF-8")
val content = source.mkString
source.close()
🛠️ Refactor suggestion
Good addition of UTF-8 encoding, but needs consistent application.
While specifying UTF-8 for reading is good, the writer should also use UTF-8.
- val source = scala.io.Source.fromFile(file, "UTF-8")
- val content = source.mkString
- source.close()
+ var source: scala.io.Source = null
+ try {
+ source = scala.io.Source.fromFile(file, "UTF-8")
+ content = source.mkString
+ } finally {
+ if (source != null) source.close()
+ }Committable suggestion skipped: line range outside the PR's diff.
## Summary

Newer version with better ergonomics and a more maintainable codebase - smaller files, simpler logic with little indirection, etc. - along with user-facing simplification.

- [x] always compile the whole repo
- [x] teams thrift and teams python
- [x] compile context as its own object
- [ ] integration test on sample
- [ ] progress bar for compile
- [ ] sync to remote w/ progress bar
- [ ] display workflow
- [ ] display workflow progress

## Checklist

- [x] Added Unit Tests
- [ ] Covered by existing CI
- [x] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- **New CLI Commands**
  - Introduced streamlined commands for synchronizing, backfilling, and deploying operations for easier management.
- **Enhanced Logging**
  - Improved colored, structured log outputs for clearer real-time monitoring and debugging.
- **Configuration & Validation Upgrades**
  - Strengthened configuration management and validation processes to ensure reliable operations.
  - Added a comprehensive validation framework for Chronon API thrift objects.
- **Build & Infrastructure Improvements**
  - Transitioned to a new container base and modernized testing/build systems for better performance and stability.
- **Team & Utility Enhancements**
  - Expanded team configuration options and refined utility processes to streamline overall workflows.
  - Introduced new data classes and methods for improved configuration and compilation context management.
  - Enhanced templating functionality for dynamic replacements based on object properties.
  - Improved handling of Git operations and enhanced error logging for better traceability.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Ken Morton <[email protected]>
Co-authored-by: Kumar Teja Chippala <[email protected]>
Co-authored-by: Kumar Teja Chippala <[email protected]>