Conversation

@nikhil-zlai nikhil-zlai commented Jul 11, 2025

Summary

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • Build System & Dependencies

    • Migrated to Bazel's bzlmod module system with a new MODULE.bazel file and removed the legacy WORKSPACE setup.
    • Upgraded Bazel to 8.3.1 and Scala to 2.12.20; updated Bazel and Scala configuration files accordingly.
    • Overhauled Maven dependency management with reorganized and expanded artifact lists, version updates, and improved repository handling.
    • Removed custom shading logic and related build rules; simplified dependency imports and repository references.
    • Updated Spark dependency exports to explicit Maven targets for clarity.
  • Testing

    • Improved robustness of test resource path resolution and environment variable handling across multiple test suites.
    • Removed an obsolete Java test class and updated test resource paths for consistency.
    • Added verification steps in select tests to ensure expected interactions.
    • Adjusted test assertions to conditionally check exception keys based on response content.
  • Chores

    • Cleaned up unused files and updated .gitignore to include Bazel module files.
    • Added new Bazel build rules for configuration, scalafmt export, and custom dependencies management.
    • Removed or refactored deprecated internal build scripts and utility files.
    • Temporarily disabled Scalafmt check in CI workflow with plans to re-enable later.

coderabbitai bot commented Jul 11, 2025

Walkthrough

This change migrates the build system from legacy Bazel WORKSPACE and shading rules to Bazel's bzlmod module system. It upgrades Bazel and Scala versions, removes shaded library logic, deletes legacy dependency management files, hardcodes Scala versions, centralizes resource references, and updates dependency declarations to use Maven with strict version pinning.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Bazel Build System Migration: .bazeliskrc, .bazelrc, .bazelversion, MODULE.bazel, WORKSPACE, BUILD.bazel, .gitignore | Upgrades Bazel to 8.3.1, enables bzlmod, adds .bazelversion, rewrites MODULE.bazel for module-based deps, removes WORKSPACE, and updates .gitignore to track Bazel mod files. |
| Scala Version and Repository Handling: tools/build_rules/scala_config.bzl, tools/build_rules/common.bzl, tools/build_rules/artifact.bzl, tools/build_rules/jar_library.bzl, tools/build_rules/prelude_bazel, tools/build_rules/jvm_binary.bzl, tools/build_rules/scala_junit_test_suite.bzl | Hardcodes Scala version to 2.12.20, updates all references to use @rules_scala, removes legacy Scala repo logic, and updates build rules to align with bzlmod and new Scala config. |
| Dependency Management Refactor: tools/build_rules/dependencies/defs.bzl, tools/build_rules/dependencies/all_repositories.bzl, tools/build_rules/dependencies/load_dependencies.bzl, tools/build_rules/dependencies/scala_repository.bzl | Deletes all legacy dependency management files and functions for Maven/Scala repositories and artifact parsing. |
| Extensions for Custom Deps: tools/build_rules/extensions/BUILD.bazel, tools/build_rules/extensions/custom_deps.bzl | Adds Bazel extension and build rule for custom dependencies (e.g., zlib) for use with bzlmod. |
| Maven Dependency and Artifact Updates: maven_install.json, tools/build_rules/spark/BUILD | Updates Maven dependency graph, versions, and artifacts; hardcodes Spark jar exports in build rules. |
| Shaded Library Removal: cloud_gcp/BUILD.bazel, spark/BUILD.bazel, tools/build_rules/artifact.bzl | Removes all create_shaded_library rules and logic for Bigtable and SnakeYAML, switches to direct Maven artifacts. |
| Test Resource Path and Env Handling: api/src/test/scala/ai/chronon/api/test/planner/GroupByPlannerTest.scala, api/src/test/scala/ai/chronon/api/test/planner/MonolithJoinPlannerTest.scala, api/src/test/scala/ai/chronon/api/test/planner/LocalRunnerTest.scala, api/src/test/scala/ai/chronon/api/test/planner/StagingQueryPlannerTest.scala, cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala, spark/src/test/scala/ai/chronon/spark/test/submission/JobSubmitterTest.scala, spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherMetadataTest.scala | Updates resource paths to new Bazel module layout, improves environment variable handling for robustness. |
| Test and Mock Updates: cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala, spark/src/test/scala/ai/chronon/spark/test/fetcher/JavaFetchTypesTest.java | Removes explicit mock-maker config in tests; deletes Java fetcher test class. |

Sequence Diagram(s)

sequenceDiagram
    participant Dev as Developer
    participant Bazel as Bazel (bzlmod)
    participant Maven as Maven Repo
    participant Scala as Scala Rules
    participant Custom as Custom Deps Ext

    Dev->>Bazel: Run build/test
    Bazel->>MODULE.bazel: Parse module/deps config
    Bazel->>Maven: Fetch pinned artifacts
    Bazel->>Scala: Setup Scala 2.12.20 toolchain
    Bazel->>Custom: Register custom deps (e.g., zlib)
    Bazel->>Dev: Build/test output

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90 minutes

Possibly related PRs

Suggested reviewers

  • kumar-zlai

Poem

🛠️ The build is reborn, the modules aligned,
Shading is gone, old configs resigned.
Scala’s new version, dependencies tight,
Bazel’s bright modules now build it right.
With zlib and Maven, the future is clear—
Code rabbits rejoice, for modernity’s here! 🐇✨

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch nikhil/bzlmod_migration

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (1)
tools/build_rules/dependencies/all_repositories.bzl (1)

14-18: get_repository is broken – iterates over callables.

maven_repository / scala_repository are functions, so repository.name raises AttributeError. Return the symbol directly or store repo objects.

-    for repository in all_repositories:
-        if repository.name == repository_name:
-            return repository
+    return {
+        "maven_repository": maven_repository,
+        "scala_repository": scala_repository,
+    }.get(repository_name)
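
The difference between the two shapes can be sketched in plain Python, which shares this syntax with Starlark. This is only an illustration: the function bodies below are placeholders rather than the real repository macros, the name get_repository_by_attribute is hypothetical, and the exact failure Bazel reports for the original loop may be worded differently from Python's AttributeError.

```python
# Placeholder stand-ins for the two repository macros (assumed bodies).
def maven_repository():
    return "maven"


def scala_repository():
    return "scala"


all_repositories = [maven_repository, scala_repository]


def get_repository_by_attribute(repository_name):
    # Original shape: assumes each entry carries a `.name` field.
    # Plain functions have no such field, so the first comparison fails.
    for repository in all_repositories:
        if repository.name == repository_name:
            return repository
    return None


def get_repository(repository_name):
    # Suggested shape: look the symbol up by key; None when the key is absent.
    return {
        "maven_repository": maven_repository,
        "scala_repository": scala_repository,
    }.get(repository_name)
```

With the dictionary form, an unknown name yields None instead of an error, so callers that relied on a failure for bad names would need an explicit check.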
🧹 Nitpick comments (3)
tools/build_rules/scala_junit_test_suite.bzl (1)

8-11: Docstring param mismatch.

Signature uses srcs; docstring still says src_glob. Tiny but confusing.

-        src_glob: A glob pattern to locate the Scala test files (e.g., "src/test/scala/**/*.scala").
+        srcs: List / glob of Scala test files (e.g., ["src/test/scala/**/*.scala"]).
tools/build_rules/dependencies/all_repositories.bzl (1)

1-1: Scala version comment stale.

The comment says “Hardcoded Scala versions removed”, but the file no longer references any Scala versions; the comment can be dropped.

.bazelrc (1)

8-10: Scala 2.12.20 bump – ensure shaded/ABI-sensitive jars rebuilt.

Old 2.12.18 artifacts will silently mismatch at runtime.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 436bed2 and 75c0220.

📒 Files selected for processing (13)
  • .bazeliskrc (1 hunks)
  • .bazelrc (1 hunks)
  • cloud_gcp/BUILD.bazel (0 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (2 hunks)
  • spark/BUILD.bazel (4 hunks)
  • tools/build_rules/artifact.bzl (2 hunks)
  • tools/build_rules/common.bzl (2 hunks)
  • tools/build_rules/dependencies/all_repositories.bzl (1 hunks)
  • tools/build_rules/dependencies/scala_repository.bzl (2 hunks)
  • tools/build_rules/jvm_binary.bzl (1 hunks)
  • tools/build_rules/prelude_bazel (1 hunks)
  • tools/build_rules/scala_junit_test_suite.bzl (1 hunks)
  • tools/build_rules/spark/BUILD (1 hunks)
💤 Files with no reviewable changes (1)
  • cloud_gcp/BUILD.bazel
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (13)
Learnt from: chewy-zlai
PR: zipline-ai/chronon#50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#657
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:93-97
Timestamp: 2025-04-21T15:10:40.819Z
Learning: The BigTableKVStoreImpl in the chronon codebase only interacts with 4 BigTable tables total, so unbounded caching in tableToContext is not a concern.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:13-16
Timestamp: 2024-10-31T18:27:44.973Z
Learning: In `MockKVStore.scala`, the `create` method should reset the dataset even if the dataset already exists.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#22
File: hub/test/controllers/TimeSeriesControllerSpec.scala:40-49
Timestamp: 2024-09-27T18:47:26.941Z
Learning: When testing with mocked data, it's acceptable to not add detailed assertions on the data contents.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/store/DynamoDBMonitoringStore.scala:98-143
Timestamp: 2024-10-15T15:30:15.514Z
Learning: In the Scala file `hub/app/store/DynamoDBMonitoringStore.scala`, within the `makeLoadedConfs` method, the `.recover` method is correctly applied to the `Try` returned by `response.values` to handle exceptions from the underlying store.
tools/build_rules/spark/BUILD (3)
Learnt from: chewy-zlai
PR: zipline-ai/chronon#47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
🪛 GitHub Actions: Test non-spark modules on scala 2.12
tools/build_rules/prelude_bazel

[error] 14-14: Bazel error loading package 'service': Unable to find package for unknown repo 'rules_scala' requested from //scala/scalafmt:phase_scalafmt_ext.bzl. The repository '@rules_scala' could not be resolved. WORKSPACE file is disabled by default in Bazel 8 and will be removed in Bazel 9. Please migrate to Bzlmod.

🪛 GitHub Actions: Test Bazel Config
tools/build_rules/prelude_bazel

[error] 5-5: Unable to find package for unknown repo 'rules_jvm_external'. The repository '@rules_jvm_external' could not be resolved. The WORKSPACE file is disabled by default in Bazel 8 and will be removed in Bazel 9. Please migrate to Bzlmod as per https://bazel.build/external/migration.

tools/build_rules/common.bzl

[error] 1-1: Unable to find package for unknown repo 'rules_jvm_external'. The repository '@rules_jvm_external' could not be resolved. The WORKSPACE file is disabled by default in Bazel 8 and will be removed in Bazel 9. Please migrate to Bzlmod as per https://bazel.build/external/migration.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: analyzer_tests
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
🔇 Additional comments (15)
tools/build_rules/scala_junit_test_suite.bzl (1)

1-1: Update looks right – double-check repo name exists.

@rules_scala must be declared in MODULE.bazel/WORKSPACE; a missing repo alias will break every Scala rule.

.bazeliskrc (1)

1-1: Version bump ok – verify CI image.

Ensure CI runners already have Bazel 8.3.1 cached; otherwise builds may stall on download.

tools/build_rules/jvm_binary.bzl (1)

2-5: Path change fine – confirm rule locations.

advanced_usage/scala.bzl and scalafmt:phase_scalafmt_ext.bzl moved between releases; make sure these paths still exist in rules_scala 7.

.bazelrc (1)

3-5: Enabling bzlmod is great – watch for transitive repo name drift.

Some third-party rules still rely on legacy WORKSPACE macros; test a clean build.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (1)

257-257: LGTM - Mock maker simplification aligns with shading removal.

Removing explicit mock maker specification is consistent with the broader elimination of shading infrastructure.

Also applies to: 279-279

tools/build_rules/common.bzl (2)

3-4: LGTM - Hardcoded Scala version for bzlmod migration.

Consistent with the migration approach across build files.


16-16: LGTM - Updated to use hardcoded version.

Correctly references the new hardcoded _SCALA_MAJOR_VERSION.

spark/BUILD.bazel (1)

193-193: LGTM - Corrected relative paths and added allow_empty flag.

Paths are now correctly relative to the BUILD file location, and allow_empty = True handles missing test resources gracefully.

Also applies to: 203-203, 214-214, 224-224

tools/build_rules/dependencies/scala_repository.bzl (2)

4-5: LGTM - Hardcoded Scala version for bzlmod migration.

Version 2.12.20 is explicitly set, ensuring deterministic builds.


16-17: LGTM - Updated to use hardcoded version.

Correctly references the new hardcoded _SCALA_VERSION.

tools/build_rules/prelude_bazel (2)

3-3: LGTM - Updated comment for bzlmod migration.

Correctly reflects the new approach of hardcoding versions.


9-9: LGTM - Updated to rules_scala repository.

Consistent with rules_scala 6->7 upgrade.

Also applies to: 10-10, 13-13, 14-14

tools/build_rules/artifact.bzl (2)

7-9: LGTM!

Standard approach for bzlmod migration.


53-54: Scala artifact resolution verified

  • No BUILD files reference SCALA_REPOSITORY_NAME (only its definition and load in .bzl remain).
  • Routing Scala artifacts to MAVEN_REPOSITORY_NAME is safe.
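
As a rough sketch of what routing Scala artifacts through the Maven repository could look like, the Python below builds a label from a Maven coordinate. The contents of artifact.bzl are not shown in this review, so everything here is an assumption: the value "maven" for MAVEN_REPOSITORY_NAME, the helper names maven_label and scala_label, and the rule of replacing every non-alphanumeric character with an underscore (modelled on the label convention used by rules_jvm_external).

```python
import re

# Assumed values; the PR pins Scala 2.12.20, the repository name is a guess.
SCALA_MAJOR_VERSION = "2.12"
MAVEN_REPOSITORY_NAME = "maven"


def maven_label(coordinate, repository_name=MAVEN_REPOSITORY_NAME):
    # Turn "group:artifact" into "@repo//:group_artifact" by replacing
    # every character that is not a letter, digit or underscore.
    target = re.sub(r"[^A-Za-z0-9_]", "_", coordinate)
    return "@%s//:%s" % (repository_name, target)


def scala_label(coordinate):
    # Scala artifacts carry the major version as a suffix on the artifact id
    # and then resolve through the same Maven repository.
    return maven_label(coordinate + "_" + SCALA_MAJOR_VERSION)
```

Under these assumptions a Scala artifact differs from a plain Maven one only by its version suffix, which is why a separate Scala repository name would no longer be needed.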
tools/build_rules/spark/BUILD (1)

67-79: LGTM!

Explicit deps improve clarity.

@@ -1,6 +1,8 @@
load("@io_bazel_rules_scala_config//:config.bzl", "SCALA_MAJOR_VERSION", "SCALA_VERSION")
load("@rules_jvm_external//:defs.bzl", "artifact")

💡 Verification agent

🧩 Analysis chain

Fix bzlmod migration - repository resolution failing.

Pipeline shows @rules_jvm_external cannot be resolved, indicating incomplete bzlmod migration.


🏁 Script executed:

#!/bin/bash
# Check if MODULE.bazel exists and contains rules_jvm_external
if [ -f "MODULE.bazel" ]; then
    echo "MODULE.bazel exists"
    grep -n "rules_jvm_external" MODULE.bazel || echo "rules_jvm_external not found in MODULE.bazel"
else
    echo "MODULE.bazel not found"
fi

# Check if .bazelrc has bzlmod enabled
grep -n "common.*--enable_bzlmod" .bazelrc || echo "bzlmod not enabled in .bazelrc"

Length of output: 150


Add a MODULE.bazel to complete the bzlmod migration

bzlmod is enabled in .bazelrc (line 4), but there’s no MODULE.bazel at the repo root, so @rules_jvm_external won’t be resolved.

• Create a MODULE.bazel in the project root
• Add a bazel_dep entry for rules_jvm_external with the correct version and repo_name
• Ensure other bzlmod deps (if any) are listed there as well
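
The same check the shell script performs can be sketched in Python for use in a pre-merge script. This is a minimal sketch: the helper names are hypothetical, the regular expression only recognises bazel_dep calls that pass name as the first argument, and the version string in the sample text is a placeholder, not the version this repository should use.

```python
import re

# Matches bazel_dep(name = "...") when `name` is the first argument.
_BAZEL_DEP = re.compile(r'bazel_dep\(\s*name\s*=\s*"([^"]+)"')


def declared_bazel_deps(module_text):
    """Return the set of module names declared via bazel_dep in the text."""
    return set(_BAZEL_DEP.findall(module_text))


def missing_deps(module_text, required):
    """Return the required module names that the text does not declare."""
    declared = declared_bazel_deps(module_text)
    return [name for name in required if name not in declared]


# Sample MODULE.bazel contents; "0.0.0" is a placeholder version.
SAMPLE_MODULE = (
    'module(name = "example")\n'
    'bazel_dep(name = "rules_jvm_external", version = "0.0.0")\n'
)
```

Calling missing_deps(SAMPLE_MODULE, ["rules_jvm_external", "rules_scala"]) reports rules_scala as undeclared, which corresponds to the second resolution failure seen in the pipeline.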

🧰 Tools
🪛 GitHub Actions: Test Bazel Config

[error] 1-1: Unable to find package for unknown repo 'rules_jvm_external'. The repository '@rules_jvm_external' could not be resolved. The WORKSPACE file is disabled by default in Bazel 8 and will be removed in Bazel 9. Please migrate to Bzlmod as per https://bazel.build/external/migration.

🤖 Prompt for AI Agents
In tools/build_rules/common.bzl at line 1, the repository uses bzlmod but lacks
a MODULE.bazel file at the root, causing @rules_jvm_external not to resolve. To
fix this, create a MODULE.bazel file in the project root directory, add a
bazel_dep entry specifying the rules_jvm_external dependency with the correct
version and repo_name, and include any other bzlmod dependencies used in the
project.

load("@contrib_rules_jvm//docs:stardoc-input.bzl", "java_test_suite")
load("@io_bazel_rules_scala//scala:scala.bzl", "scala_test", "scala_test_suite")
load("//tools/build_rules:artifact.bzl", "maven_artifact", "maven_artifact_with_suffix", "scala_artifact", "create_shaded_library")
load("@rules_scala//scala:scala.bzl", "scala_test", "scala_test_suite")

💡 Verification agent

🧩 Analysis chain

Critical: @rules_scala repository not resolvable.

Pipeline shows repository cannot be resolved, indicating incomplete bzlmod migration.


🏁 Script executed:

#!/bin/bash
# Check MODULE.bazel for rules_scala configuration
if [ -f "MODULE.bazel" ]; then
    echo "Checking MODULE.bazel for rules_scala:"
    grep -n "rules_scala" MODULE.bazel || echo "rules_scala not found in MODULE.bazel"
else
    echo "MODULE.bazel not found - bzlmod migration incomplete"
fi

# Check if WORKSPACE is disabled
if [ -f "WORKSPACE" ]; then
    echo "WORKSPACE file still exists"
fi

Length of output: 228


Incomplete bzlmod migration: rules_scala missing and WORKSPACE still present

No MODULE.bazel found to declare rules_scala, so Bazel can’t resolve @rules_scala. The leftover WORKSPACE does not help either: Bazel 8 ignores it by default.

• Add a MODULE.bazel at repo root and register rules_scala
• Remove or disable WORKSPACE once MODULE.bazel is in place
• Affected file: tools/build_rules/prelude_bazel line 9 (load("@rules_scala//scala:scala.bzl",…))

🤖 Prompt for AI Agents
In tools/build_rules/prelude_bazel at line 9, the load statement references
@rules_scala, but there is no MODULE.bazel declaring rules_scala, causing Bazel
to fail resolving it. To fix this, create a MODULE.bazel file at the repository
root that registers rules_scala as a module. Bazel 8 ignores WORKSPACE by
default, so remove the existing WORKSPACE file once MODULE.bazel covers its
dependencies. This will allow the load statement in prelude_bazel to correctly
resolve @rules_scala.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75c0220 and 8ca2bdd.

📒 Files selected for processing (207)
  • .bazelversion (1 hunks)
  • WORKSPACE.bzlmod (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/base/MinHeap.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala (6 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala (5 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/base/UniqueOrderByLimit.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/BucketedColumnAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (6 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/DirectColumnAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/MapColumnAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala (4 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala (2 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/Resolution.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala (4 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (5 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala (4 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregationBuffer.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala (3 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala (2 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (5 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/MomentTest.scala (1 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/NaiveAggregator.scala (1 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala (3 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothOnlineAggregatorTest.scala (2 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (2 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/UniqueTopKTest.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/Builders.scala (13 hunks)
  • api/src/main/scala/ai/chronon/api/DataPointer.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/DataRange.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/DataType.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (21 hunks)
  • api/src/main/scala/ai/chronon/api/ParametricMacro.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/PartitionSpec.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/QueryUtils.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/Row.scala (4 hunks)
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/DependencyResolver.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/GroupByPlanner.scala (5 hunks)
  • api/src/main/scala/ai/chronon/api/planner/LocalRunner.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/RelevantLeftForJoinPart.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/StagingQueryPlanner.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (2 hunks)
  • api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/DataPointerTest.scala (2 hunks)
  • api/src/test/scala/ai/chronon/api/test/DataTypeConversionTest.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/DateMacroSpec.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala (2 hunks)
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (2 hunks)
  • api/src/test/scala/ai/chronon/api/test/planner/GroupByPlannerTest.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/planner/MonolithJoinPlannerTest.scala (1 hunks)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/AwsApiImpl.scala (2 hunks)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (2 hunks)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/EmrSubmitter.scala (9 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryExternal.scala (3 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryNative.scala (8 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (8 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala (16 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (5 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpApiImpl.scala (4 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (1 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/Spark2BigTableLoader.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (6 hunks)
  • flink/src/main/scala/ai/chronon/flink/AvroCodecFn.scala (3 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (11 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkKafkaItemEventDriver.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/MetricsSink.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/RichMetricsOperators.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala (3 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/deser/ChrononDeserializationSchema.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/deser/CustomSchemaSerDe.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/deser/DeserializationSchema.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/deser/FlinkSerDeProvider.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/deser/SchemaRegistrySerDe.scala (3 hunks)
  • flink/src/main/scala/ai/chronon/flink/source/FlinkSource.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/source/FlinkSourceProvider.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/source/KafkaFlinkSource.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala (4 hunks)
  • flink/src/main/scala/ai/chronon/flink/validation/SparkExprEvalComparisonFn.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/validation/ValidationFlinkJob.scala (7 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (7 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/Trigger.scala (4 hunks)
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/FlinkPubSubItemEventDriver.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/PubSubFlinkSource.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/PubSubSchemaSerDe.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/fastack/DeserializationSchemaWrapper.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/fastack/PubSubSource.scala (2 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala (3 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala (4 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (4 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (10 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/deser/SchemaRegistryDeSerSchemaProviderSpec.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/window/FlinkRowAggregationFunctionTest.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/validation/SparkExprEvalComparisonTest.scala (2 hunks)
  • flink/src/test/scala/ai/chronon/flink/validation/ValidationFlinkJobIntegrationTest.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/Api.scala (6 hunks)
  • online/src/main/scala/ai/chronon/online/CatalystTransformBuilder.scala (7 hunks)
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/JoinCodec.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataEndPoint.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (6 hunks)
  • online/src/main/scala/ai/chronon/online/SparkInternalRowConversions.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/TileCodec.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetchContext.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (12 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherMain.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (12 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (12 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/LRUCache.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (17 hunks)
  • online/src/main/scala/ai/chronon/online/metrics/InstrumentedThreadPoolExecutor.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/metrics/Metrics.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/metrics/MetricsReporter.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/metrics/OtelMetricsReporter.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/metrics/TTLCache.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (5 hunks)
  • online/src/main/scala/ai/chronon/online/serde/SerDe.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (6 hunks)
  • online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/stats/TileDriftCalculator.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/AvroCompatibilityTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/DataStreamBuilderTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (3 hunks)
  • online/src/test/scala/ai/chronon/online/test/FetcherCacheTest.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/GroupByDerivationsTest.scala (4 hunks)
  • online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/DriftMetricsTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (12 hunks)
  • spark/BUILD.bazel (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (16 hunks)
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/Comparison.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (20 hunks)
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala (7 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (21 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/Join.scala (12 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala (8 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinDerivationJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (12 hunks)
  • spark/src/main/scala/ai/chronon/spark/KvRdd.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/LabelJoin.scala (8 hunks)
  • spark/src/main/scala/ai/chronon/spark/LocalDataLoader.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/LocalTableExporter.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/LogUtils.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/MetadataExporter.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (18 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala (5 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (5 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (8 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala (5 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/CreationUtils.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/DefaultFormatProvider.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/DeltaLake.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/Format.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/FormatProvider.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/Hive.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/Iceberg.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (11 hunks)
  • spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/join/SawtoothUdf.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala (7 hunks)
  • spark/src/main/scala/ai/chronon/spark/scripts/DataServer.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/CompareBaseJob.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/CompareJob.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala (7 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/ConsistencyJob.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/PartitionRunner.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/Expressions.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/Summarizer.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/scripts/PrepareData.scala (8 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala (5 hunks)
  • spark/src/main/scala/ai/chronon/spark/submission/JobSubmitter.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/submission/SparkSessionBuilder.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/submission/StorageClient.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (5 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/CompareTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/ExternalSourcesTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MigrationCompareTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala (1 hunks)
⛔ Files not processed due to max files limit (58)
  • spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala
  • spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/SnapshotAggregator.scala
  • spark/src/test/scala/ai/chronon/spark/test/StagingQueryTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/StatsComputeTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/TableTestUtils.scala
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsFormatTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/TestUtils.scala
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/AnalyzerTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/batch/EvalTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala
  • spark/src/test/scala/ai/chronon/spark/test/batch/ModularJoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/batch/ShortNamesTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/BootstrapUtils.scala
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/LogBootstrapTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/ChainingFetcherTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherFailureTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherMetadataTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/DifferentPartitionColumnsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalWithGBDerivation.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/FeatureWithLabelJoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/KeyMappingOverlappingFieldsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/SawtoothUdfPerformanceTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/SawtoothUdfSpec.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/SkipBloomFilterJoinBackfillTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/UnionJoinSpec.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/UnionJoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/streaming/MutationsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/streaming/StreamingTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/submission/JobSubmitterTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/udafs/ApproxDistinctTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/udafs/HistogramTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/udafs/NullnessCountersAggregatorTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/udafs/UDAFSQLUsageTest.scala
  • tools/build_rules/extensions/BUILD.bazel
  • tools/build_rules/extensions/custom_deps.bzl
  • tools/build_rules/scala_config.bzl
✅ Files skipped from review due to trivial changes (176)
  • .bazelversion
  • WORKSPACE.bzlmod
  • api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala
  • api/src/main/scala/ai/chronon/api/planner/StagingQueryPlanner.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/base/MinHeap.scala
  • api/src/main/scala/ai/chronon/api/planner/RelevantLeftForJoinPart.scala
  • flink/src/main/scala/ai/chronon/flink/deser/CustomSchemaSerDe.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/MapColumnAggregator.scala
  • flink/src/main/scala/ai/chronon/flink/deser/FlinkSerDeProvider.scala
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/PubSubSchemaSerDe.scala
  • api/src/main/scala/ai/chronon/api/planner/LocalRunner.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/BucketedColumnAggregator.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothOnlineAggregatorTest.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/NaiveAggregator.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/DirectColumnAggregator.scala
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala
  • flink/src/main/scala/ai/chronon/flink/source/FlinkSource.scala
  • online/src/main/scala/ai/chronon/online/metrics/MetricsReporter.scala
  • api/src/test/scala/ai/chronon/api/test/CollectionExtensionsTest.scala
  • flink/src/main/scala/ai/chronon/flink/source/KafkaFlinkSource.scala
  • api/src/main/scala/ai/chronon/api/planner/GroupByPlanner.scala
  • api/src/main/scala/ai/chronon/api/DataPointer.scala
  • spark/src/main/scala/ai/chronon/spark/MetadataExporter.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregationBuffer.scala
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • spark/src/main/scala/ai/chronon/spark/batch/SourceJob.scala
  • online/src/test/scala/ai/chronon/online/test/DataStreamBuilderTest.scala
  • spark/src/main/scala/ai/chronon/spark/stats/CompareJob.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/Spark2BigTableLoader.scala
  • flink/src/main/scala/ai/chronon/flink/AvroCodecFn.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/DefaultFormatProvider.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkKafkaItemEventDriver.scala
  • api/src/test/scala/ai/chronon/api/test/RelevantLeftForJoinPartSpec.scala
  • online/src/main/scala/ai/chronon/online/metrics/InstrumentedThreadPoolExecutor.scala
  • online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/AsyncKVStoreWriterTest.scala
  • spark/src/main/scala/ai/chronon/spark/submission/StorageClient.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala
  • api/src/test/scala/ai/chronon/api/test/DataTypeConversionTest.scala
  • online/src/test/scala/ai/chronon/online/test/TileCodecTest.scala
  • spark/src/main/scala/ai/chronon/spark/scripts/DataServer.scala
  • online/src/main/scala/ai/chronon/online/fetcher/LRUCache.scala
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/FlinkPubSubItemEventDriver.scala
  • flink/src/main/scala/ai/chronon/flink/deser/ChrononDeserializationSchema.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/MomentTest.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala
  • api/src/main/scala/ai/chronon/api/DataType.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/FormatProvider.scala
  • online/src/main/scala/ai/chronon/online/SparkInternalRowConversions.scala
  • online/src/main/scala/ai/chronon/online/metrics/TTLCache.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/Resolution.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
  • flink/src/main/scala/ai/chronon/flink/deser/DeserializationSchema.scala
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
  • spark/src/test/scala/ai/chronon/spark/test/MigrationCompareTest.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpApiImpl.scala
  • spark/src/main/scala/ai/chronon/spark/LocalTableExporter.scala
  • spark/src/main/scala/ai/chronon/spark/stats/ConsistencyJob.scala
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
  • flink/src/main/scala/ai/chronon/flink/MetricsSink.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala
  • flink/src/main/scala/ai/chronon/flink/RichMetricsOperators.scala
  • online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala
  • online/src/test/scala/ai/chronon/online/test/FetcherCacheTest.scala
  • spark/src/main/scala/ai/chronon/spark/stats/PartitionRunner.scala
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/fastack/PubSubSource.scala
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherMain.scala
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/AwsApiImpl.scala
  • online/src/main/scala/ai/chronon/online/fetcher/FetchContext.scala
  • api/src/test/scala/ai/chronon/api/test/DateMacroSpec.scala
  • online/src/main/scala/ai/chronon/online/serde/SerDe.scala
  • spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
  • flink/src/main/scala/ai/chronon/flink/source/FlinkSourceProvider.scala
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/CreationUtils.scala
  • spark/src/main/scala/ai/chronon/spark/Comparison.scala
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala
  • flink/src/test/scala/ai/chronon/flink/test/deser/SchemaRegistryDeSerSchemaProviderSpec.scala
  • api/src/main/scala/ai/chronon/api/Extensions.scala
  • api/src/main/scala/ai/chronon/api/planner/DependencyResolver.scala
  • online/src/main/scala/ai/chronon/online/metrics/OtelMetricsReporter.scala
  • spark/src/main/scala/ai/chronon/spark/LocalDataLoader.scala
  • flink/src/test/scala/ai/chronon/flink/validation/ValidationFlinkJobIntegrationTest.scala
  • spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
  • online/src/main/scala/ai/chronon/online/JoinCodec.scala
  • spark/src/main/scala/ai/chronon/spark/stats/CompareBaseJob.scala
  • spark/src/test/scala/ai/chronon/spark/test/ExternalSourcesTest.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryExternal.scala
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
  • api/src/main/scala/ai/chronon/api/Builders.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/PubSubFlinkSource.scala
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala
  • spark/src/main/scala/ai/chronon/spark/stats/drift/scripts/PrepareData.scala
  • online/src/main/scala/ai/chronon/online/stats/TileDriftCalculator.scala
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
  • flink/src/main/scala/ai/chronon/flink/validation/ValidationFlinkJob.scala
  • online/src/test/scala/ai/chronon/online/test/AvroCompatibilityTest.scala
  • spark/src/main/scala/ai/chronon/spark/LabelJoin.scala
  • flink/src/main/scala/ai/chronon/flink/window/Trigger.scala
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala
  • spark/src/main/scala/ai/chronon/spark/JoinDerivationJob.scala
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala
  • online/src/test/scala/ai/chronon/online/test/stats/DriftMetricsTest.scala
  • spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala
  • online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/CompareTest.scala
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala
  • spark/src/main/scala/ai/chronon/spark/LogUtils.scala
  • online/src/main/scala/ai/chronon/online/metrics/Metrics.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/Hive.scala
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
  • flink/src/main/scala/ai/chronon/flink/deser/SchemaRegistrySerDe.scala
  • spark/src/main/scala/ai/chronon/spark/submission/SparkSessionBuilder.scala
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
  • spark/src/main/scala/ai/chronon/spark/KvRdd.scala
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/Format.scala
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala
  • spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala
  • spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala
  • spark/src/main/scala/ai/chronon/spark/join/SawtoothUdf.scala
  • online/src/main/scala/ai/chronon/online/Api.scala
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
  • api/src/main/scala/ai/chronon/api/Row.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/DeltaLake.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
  • spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala
  • spark/src/main/scala/ai/chronon/spark/Join.scala
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryNative.scala
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/Iceberg.scala
  • flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark/BUILD.bazel
🧰 Additional context used
🧠 Learnings (29)
📓 Common learnings
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
aggregator/src/main/scala/ai/chronon/aggregator/base/UniqueOrderByLimit.scala (1)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
api/src/main/scala/ai/chronon/api/PartitionSpec.scala (2)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#192
File: spark/src/main/scala/ai/chronon/spark/GroupBy.scala:296-299
Timestamp: 2025-01-09T17:57:34.451Z
Learning: In Spark SQL date handling:
- date_format() converts dates to strings (used for partition columns which need string format)
- to_date() converts strings to DateType (used when date operations are needed)
These are opposites and should not be standardized to use the same function.
api/src/test/scala/ai/chronon/api/test/planner/MonolithJoinPlannerTest.scala (3)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
api/src/test/scala/ai/chronon/api/test/planner/GroupByPlannerTest.scala (3)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
cloud_aws/src/main/scala/ai/chronon/integrations/aws/EmrSubmitter.scala (4)
Learnt from: david-zlai
PR: zipline-ai/chronon#439
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/EmrSubmitter.scala:198-206
Timestamp: 2025-03-07T20:41:11.525Z
Learning: In AWS EMR, the term "job" is ambiguous and can refer to either a Step (single Spark execution) or a JobFlow/Cluster. EMR operations typically require both a clusterId and a stepId, while the JobSubmitter interface expects a single jobId parameter.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala (1)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
flink/src/main/scala/ai/chronon/flink_connectors/pubsub/fastack/DeserializationSchemaWrapper.scala (3)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#924
File: flink/src/main/scala/ai/chronon/flink_connectors/pubsub/fastack/FastAckGrpcPubSubSubscriber.scala:52-61
Timestamp: 2025-07-02T14:54:50.615Z
Learning: In FastAckGrpcPubSubSubscriber class in Flink connectors, exceptions are intentionally swallowed after retries to prevent Flink application restarts. The error handling strategy returns empty sequences instead of propagating exceptions to maintain application stability.
flink/src/test/scala/ai/chronon/flink/test/window/FlinkRowAggregationFunctionTest.scala (2)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
api/src/main/scala/ai/chronon/api/ParametricMacro.scala (2)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#192
File: spark/src/main/scala/ai/chronon/spark/GroupBy.scala:296-299
Timestamp: 2025-01-09T17:57:34.451Z
Learning: In Spark SQL date handling:
- date_format() converts dates to strings (used for partition columns which need string format)
- to_date() converts strings to DateType (used when date operations are needed)
These are opposites and should not be standardized to use the same function.
aggregator/src/test/scala/ai/chronon/aggregator/test/UniqueTopKTest.scala (3)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (4)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (4)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
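The `grouped` learning recorded above for this file can be illustrated with a minimal sketch. `GroupedSketch`, `batches`, and the sample values are hypothetical names chosen for illustration; this is not code from `SummaryUploader.scala`.

```scala
object GroupedSketch {
  // Splits records into batches of at most `size` elements.
  // `grouped` returns an Iterator, so each batch is built only when the
  // consumer asks for it rather than all batches being materialized up front.
  def batches[A](records: List[A], size: Int): Iterator[List[A]] =
    records.grouped(size)

  def main(args: Array[String]): Unit =
    // Prints one line per batch: "1,2", then "3,4", then "5".
    batches(List(1, 2, 3, 4, 5), 2).foreach(b => println(b.mkString(",")))
}
```

In this sketch the source list is already in memory; only the batches are produced lazily. The last batch may be smaller than `size`.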
online/src/main/scala/ai/chronon/online/MetadataEndPoint.scala (2)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: online/src/main/scala/ai/chronon/online/Api.scala:46-50
Timestamp: 2024-10-08T16:18:45.669Z
Learning: When adding new parameters with default values to Scala case classes like `GetRequest`, existing usages don't need updating if backward compatibility is intended.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: online/src/main/scala/ai/chronon/online/Api.scala:46-50
Timestamp: 2024-10-07T15:17:18.494Z
Learning: When adding new parameters with default values to Scala case classes like `GetRequest`, existing usages don't need updating if backward compatibility is intended.
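The default-parameter learning above can be sketched as follows. `Request` and its fields are hypothetical stand-ins, not the actual `GetRequest` definition in `Api.scala`.

```scala
object DefaultParamSketch {
  // Hypothetical request type. `consistent` plays the role of a newly added
  // parameter; its default keeps older call sites compiling.
  final case class Request(dataset: String, key: String, consistent: Boolean = false)

  def main(args: Array[String]): Unit = {
    val existing = Request("ds", "k")                    // pre-existing call site, unchanged
    val updated  = Request("ds", "k", consistent = true) // new call site opting in
    println(existing.consistent) // false
    println(updated.consistent)  // true
  }
}
```

This covers source compatibility at construction sites only. Positional pattern matches such as `case Request(d, k) =>` would need an extra placeholder after the field is added, and already-compiled callers need recompiling because the generated `apply` and `copy` signatures change.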
online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (10)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit `ExecutionContext` parameter can cause serialization issues. In such cases, it's acceptable to use `scala.concurrent.ExecutionContext.Implicits.global`.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#793
File: spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala:95-106
Timestamp: 2025-05-25T15:57:30.687Z
Learning: Spark SQL's array_sort function requires INT casting in comparator expressions, even for timestamp differences. LONG casting is not supported in this context despite potential overflow concerns.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: online/src/main/scala/ai/chronon/online/Api.scala:46-50
Timestamp: 2024-10-08T16:18:45.669Z
Learning: When adding new parameters with default values to Scala case classes like `GetRequest`, existing usages don't need updating if backward compatibility is intended.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: online/src/main/scala/ai/chronon/online/Api.scala:46-50
Timestamp: 2024-10-07T15:17:18.494Z
Learning: When adding new parameters with default values to Scala case classes like `GetRequest`, existing usages don't need updating if backward compatibility is intended.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
api/src/test/scala/ai/chronon/api/test/DataPointerTest.scala (3)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
online/src/test/scala/ai/chronon/online/test/GroupByDerivationsTest.scala (10)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#793
File: spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala:95-106
Timestamp: 2025-05-25T15:57:30.687Z
Learning: Spark SQL's array_sort function requires INT casting in comparator expressions, even for timestamp differences. LONG casting is not supported in this context despite potential overflow concerns.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/store/DynamoDBMonitoringStore.scala:98-143
Timestamp: 2024-10-15T15:30:15.514Z
Learning: In the Scala file `hub/app/store/DynamoDBMonitoringStore.scala`, within the `makeLoadedConfs` method, the `.recover` method is correctly applied to the `Try` returned by `response.values` to handle exceptions from the underlying store.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#726
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:456-461
Timestamp: 2025-05-02T16:19:11.001Z
Learning: When using Map-based tags with metrics reporting in Scala, values that need to be evaluated (like object properties or method calls) should not be enclosed in quotes to ensure the actual value is used rather than the literal string.
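The Map-based tags learning above comes down to the difference between a string literal and an evaluated expression. A minimal sketch, assuming a hypothetical `Dataset` type and tag key (neither is taken from `BigTableKVStoreImpl.scala`):

```scala
object MetricTagsSketch {
  // Hypothetical value whose property should appear in the metric tags.
  final case class Dataset(name: String)

  // Problematic form: the quoted value is reported literally as "ds.name".
  def quotedTags(ds: Dataset): Map[String, String] =
    Map("dataset" -> "ds.name")

  // Intended form: the expression is evaluated, so the real name is reported.
  def evaluatedTags(ds: Dataset): Map[String, String] =
    Map("dataset" -> ds.name)

  def main(args: Array[String]): Unit = {
    val ds = Dataset("user_events")
    println(quotedTags(ds))    // Map(dataset -> ds.name)
    println(evaluatedTags(ds)) // Map(dataset -> user_events)
  }
}
```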
spark/src/main/scala/ai/chronon/spark/submission/JobSubmitter.scala (2)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: david-zlai
PR: zipline-ai/chronon#439
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/EmrSubmitter.scala:198-206
Timestamp: 2025-03-07T20:41:11.525Z
Learning: In AWS EMR, the term "job" is ambiguous and can refer to either a Step (single Spark execution) or a JobFlow/Cluster. EMR operations typically require both a clusterId and a stepId, while the JobSubmitter interface expects a single jobId parameter.
flink/src/test/scala/ai/chronon/flink/validation/SparkExprEvalComparisonTest.scala (10)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#726
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:456-461
Timestamp: 2025-05-02T16:19:11.001Z
Learning: When using Map-based tags with metrics reporting in Scala, values that need to be evaluated (like object properties or method calls) should not be enclosed in quotes to ensure the actual value is used rather than the literal string.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#793
File: spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala:95-106
Timestamp: 2025-05-25T15:57:30.687Z
Learning: Spark SQL's array_sort function requires INT casting in comparator expressions, even for timestamp differences. LONG casting is not supported in this context despite potential overflow concerns.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#17
File: spark/src/test/scala/ai/chronon/spark/test/stats/drift/PrepareData.scala:157-164
Timestamp: 2024-10-25T04:40:54.469Z
Learning: In this codebase, using `return` statements in Scala code is acceptable when it serves the intended logic.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (10)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:29-60
Timestamp: 2025-01-24T23:55:30.256Z
Learning: In BigQuery integration, table existence check is performed outside the BigQueryFormat.createTable method, at a higher level in TableUtils.createTable.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Learnt from: david-zlai
PR: zipline-ai/chronon#222
File: cloud_gcp/src/main/resources/additional-confs.yaml:3-3
Timestamp: 2025-01-15T21:00:35.574Z
Learning: The GCS bucket configuration `spark.chronon.table.gcs.temporary_gcs_bucket: "zl-warehouse"` should remain in the main `additional-confs.yaml` file, not in dev-specific configs.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#47
File: online/src/main/scala/ai/chronon/online/MetadataStore.scala:232-0
Timestamp: 2024-10-17T00:12:09.763Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/KVStore.scala`, there are two `create` methods: `def create(dataset: String): Unit` and `def create(dataset: String, props: Map[String, Any]): Unit`. The version with `props` ignores the `props` parameter, and the simpler version without `props` is appropriate when `props` are not needed.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: online/src/main/scala/ai/chronon/online/Api.scala:69-69
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/Api.scala`, the default implementation of the `create` method (`def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)`) doesn't leverage the `props` parameter, but subclasses like `DynamoDBKVStoreImpl` use the `props` parameter in their overridden implementations.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: online/src/main/scala/ai/chronon/online/Api.scala:69-69
Timestamp: 2024-10-07T15:21:50.787Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/Api.scala`, the default implementation of the `create` method (`def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)`) doesn't leverage the `props` parameter, but subclasses like `DynamoDBKVStoreImpl` use the `props` parameter in their overridden implementations.
online/src/main/scala/ai/chronon/online/CatalystTransformBuilder.scala (4)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit `ExecutionContext` parameter can cause serialization issues. In such cases, it's acceptable to use `scala.concurrent.ExecutionContext.Implicits.global`.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#17
File: spark/src/test/scala/ai/chronon/spark/test/stats/drift/PrepareData.scala:157-164
Timestamp: 2024-10-25T04:40:54.469Z
Learning: In this codebase, using `return` statements in Scala code is acceptable when it serves the intended logic.
online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (5)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
flink/src/main/scala/ai/chronon/flink/validation/SparkExprEvalComparisonFn.scala (3)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#793
File: spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala:95-106
Timestamp: 2025-05-25T15:57:30.687Z
Learning: Spark SQL's array_sort function requires INT casting in comparator expressions, even for timestamp differences. LONG casting is not supported in this context despite potential overflow concerns.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala (1)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
spark/src/main/scala/ai/chronon/spark/stats/drift/Expressions.scala (2)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#726
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala:456-461
Timestamp: 2025-05-02T16:19:11.001Z
Learning: When using Map-based tags with metrics reporting in Scala, values that need to be evaluated (like object properties or method calls) should not be enclosed in quotes to ensure the actual value is used rather than the literal string.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
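The learning above about Map-based metric tags can be illustrated with a minimal sketch. The names `TagsSketch`, `Dataset`, and `user_events` are hypothetical and do not come from the Chronon codebase; the sketch only shows the difference between a quoted and an evaluated tag value.

```scala
object TagsSketch {
  // Hypothetical stand-in for an object whose property is reported as a tag.
  final case class Dataset(name: String)

  val dataset: Dataset = Dataset("user_events")

  // Quoted: the tag value is the literal text "dataset.name".
  val literalTags: Map[String, String] = Map("dataset" -> "dataset.name")

  // Unquoted: the expression is evaluated, so the tag value is "user_events".
  val evaluatedTags: Map[String, String] = Map("dataset" -> dataset.name)
}
```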
flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (5)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (8)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: online/src/main/scala/ai/chronon/online/Api.scala:46-50
Timestamp: 2024-10-07T15:17:18.494Z
Learning: When adding new parameters with default values to Scala case classes like `GetRequest`, existing usages don't need updating if backward compatibility is intended.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: online/src/main/scala/ai/chronon/online/Api.scala:46-50
Timestamp: 2024-10-08T16:18:45.669Z
Learning: When adding new parameters with default values to Scala case classes like `GetRequest`, existing usages don't need updating if backward compatibility is intended.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/store/DynamoDBMonitoringStore.scala:98-143
Timestamp: 2024-10-15T15:30:15.514Z
Learning: In the Scala file `hub/app/store/DynamoDBMonitoringStore.scala`, within the `makeLoadedConfs` method, the `.recover` method is correctly applied to the `Try` returned by `response.values` to handle exceptions from the underlying store.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:37-40
Timestamp: 2024-11-04T20:04:18.082Z
Learning: Avoid using `Await.result` in production code; prefer handling `Future`s asynchronously when possible to prevent blocking.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/controllers/SearchController.scala:46-47
Timestamp: 2024-10-15T15:20:23.362Z
Learning: In `SearchController.scala`, the `monitoringStore.getModels` function retrieves items from an in-memory cache, so aside from the very first call, it should be a quick operation.
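The learning above about `grouped` can be sketched as follows. `GroupedSketch` and `batchSizes` are hypothetical names used only for illustration; this is not the `SummaryUploader` code, just a small example of consuming batches one at a time from the iterator that `grouped` returns.

```scala
object GroupedSketch {
  // `grouped` returns an Iterator, so each batch is produced only when requested.
  def batchSizes(records: Seq[Int], batchSize: Int): List[Int] =
    records.grouped(batchSize).map(_.size).toList
}
```

For example, `GroupedSketch.batchSizes(1 to 10, 4)` walks three batches of sizes 4, 4, and 2.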
online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (1)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
spark/src/main/scala/ai/chronon/spark/stats/drift/Summarizer.scala (2)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
🧬 Code Graph Analysis (9)
api/src/test/scala/ai/chronon/api/test/planner/MonolithJoinPlannerTest.scala (1)
api/src/main/scala/ai/chronon/api/CollectionExtensions.scala (1)
  • foreach (13-21)
api/src/test/scala/ai/chronon/api/test/planner/GroupByPlannerTest.scala (1)
api/src/main/scala/ai/chronon/api/CollectionExtensions.scala (1)
  • foreach (13-21)
api/src/main/scala/ai/chronon/api/ParametricMacro.scala (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)
  • partitionSpec (1231-1236)
  • query (390-398)
aggregator/src/test/scala/ai/chronon/aggregator/test/UniqueTopKTest.scala (1)
api/src/main/scala/ai/chronon/api/DataType.scala (6)
  • StructType (240-269)
  • StructType (271-275)
  • StructField (224-224)
  • StringType (213-213)
  • LongType (201-201)
  • IntType (199-199)
online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (3)
online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (2)
  • OnlineDerivationUtil (13-140)
  • buildDerivationFunction (53-73)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (6)
  • toScala (16-22)
  • toScala (32-38)
  • toScala (41-43)
  • toScala (52-54)
  • toScala (62-68)
  • toScala (80-86)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
  • cleanName (160-160)
api/src/test/scala/ai/chronon/api/test/DataPointerTest.scala (1)
api/src/main/scala/ai/chronon/api/DataPointer.scala (3)
  • URIDataPointer (14-23)
  • DataPointer (4-12)
  • DataPointer (31-101)
flink/src/test/scala/ai/chronon/flink/validation/SparkExprEvalComparisonTest.scala (1)
api/src/main/java/ai/chronon/api/thrift/Option.java (1)
  • Some (93-111)
api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala (5)
api/src/main/scala/ai/chronon/api/planner/GroupByPlanner.scala (1)
  • buildPlan (111-125)
api/src/main/scala/ai/chronon/api/planner/MonolithJoinPlanner.scala (1)
  • buildPlan (76-85)
api/src/main/scala/ai/chronon/api/planner/StagingQueryPlanner.scala (1)
  • buildPlan (17-37)
api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala (1)
  • buildPlan (239-239)
api/src/test/scala/ai/chronon/api/test/planner/ConfPlannerTest.scala (1)
  • buildPlan (15-15)
online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (1)
api/src/main/scala/ai/chronon/api/DataType.scala (2)
  • StructField (224-224)
  • IntType (199-199)
🔇 Additional comments (72)
aggregator/src/main/scala/ai/chronon/aggregator/base/UniqueOrderByLimit.scala (2)

11-15: LGTM - Formatting improvement.

Multi-line parameter formatting enhances readability.


18-24: LGTM - Consistent formatting.

Parameter alignment matches the State case class style.

api/src/main/scala/ai/chronon/api/QueryUtils.scala (1)

47-52: LGTM! Good formatting improvement.

Multi-line parameter formatting enhances readability.

flink/src/main/scala/ai/chronon/flink_connectors/pubsub/fastack/DeserializationSchemaWrapper.scala (1)

22-23: Formatting improvement looks good.

Clean multi-line formatting for the exception.

online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (1)

177-191: LGTM! Clean formatting improvement.

The multi-line formatting enhances readability of the method chaining.

spark/src/main/scala/ai/chronon/spark/submission/JobSubmitter.scala (1)

19-25: LGTM - formatting improvement enhances readability.

Clean multi-line parameter formatting with consistent indentation.

online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (2)

25-25: Import reorganization looks good.

Clean import formatting aligns with the broader codebase standardization effort.


50-53: Improved readability with multi-line formatting.

The require statement reformatting enhances code readability while maintaining identical functionality.

api/src/test/scala/ai/chronon/api/test/DataPointerTest.scala (1)

17-25: LGTM! Formatting improvements enhance readability.

The multi-line formatting of URIDataPointer constructor calls improves code readability with no functional changes.

Also applies to: 44-47, 51-54

online/src/test/scala/ai/chronon/online/test/GroupByDerivationsTest.scala (3)

6-6: LGTM: Import consolidation.

Clean single-line import formatting.


55-55: LGTM: Cleaner test assertions.

Removing unnecessary parentheses improves readability.

Also applies to: 58-58, 62-62, 82-82


125-135: LGTM: Better structure formatting.

Multi-line formatting significantly improves readability of nested structures.

api/src/main/scala/ai/chronon/api/ParametricMacro.scala (3)

68-80: LGTM - Clear documentation improvement.

Enhanced ScalaDoc with detailed parameter descriptions and usage examples.


112-124: LGTM - Comprehensive documentation with example.

The SQL query example clearly demonstrates macro usage patterns.


126-127: LGTM - Consistent formatting.

Multi-line parameter formatting aligns with codebase standards.

online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala (3)

24-24: Import reordering looks good.


46-50: Multi-line formatting improves readability.


124-125: Formatting enhancement approved.

api/src/main/scala/ai/chronon/api/DataRange.scala (1)

80-81: LGTM: Formatting improvement enhances readability.

The multi-line formatting makes the concatenation logic clearer.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (6)

3-3: Import consolidation improves readability.


21-36: ScalaDoc formatting enhancement.


87-88: Exception message formatting improved.


96-97: Exception message formatting improved.


113-118: Map construction formatting enhanced.


144-149: Method signature formatting improved.

api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala (2)

11-15: LGTM - Improved Scaladoc formatting


20-24: LGTM - Better parameter list formatting

api/src/test/scala/ai/chronon/api/test/planner/MonolithJoinPlannerTest.scala (1)

146-146: LGTM - Cleaner lambda syntax

api/src/test/scala/ai/chronon/api/test/planner/GroupByPlannerTest.scala (1)

79-79: LGTM - Cleaner lambda syntax

online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala (1)

62-82: LGTM! Formatting improvements enhance readability.

Multi-line StructType constructors are easier to read and maintain.

aggregator/src/test/scala/ai/chronon/aggregator/test/UniqueTopKTest.scala (1)

73-80: LGTM! Consistent formatting improvements.

Multi-line StructType constructors improve readability across test files.

Also applies to: 105-111

aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala (1)

96-101: LGTM! Clean formatting improvements.

Multi-line constructor parameters and consistent indentation enhance readability.

Also applies to: 135-136

cloud_aws/src/main/scala/ai/chronon/integrations/aws/EmrSubmitter.scala (12)

55-62: LGTM - Parameter formatting improvement.

Multi-line parameter formatting with trailing commas enhances readability and version control diffs.


76-77: LGTM - Exception formatting improvement.

Clean multi-line exception formatting improves readability.


83-84: LGTM - Method call formatting improvement.

Consistent multi-line method call formatting.


88-93: LGTM - Properties formatting improvement.

Multi-line Map formatting with trailing commas improves maintainability.


104-108: LGTM - Method call formatting improvement.

Consistent multi-line getOrElse formatting enhances readability.


126-127: LGTM - Builder pattern formatting improvement.

Clean multi-line build method formatting.


147-152: LGTM - Method signature formatting improvement.

Multi-line parameter formatting with trailing commas follows Scala best practices.


180-186: LGTM - Override method formatting improvement.

Clean multi-line parameter formatting for the submit method.


201-202: LGTM - Method call formatting improvement.

Consistent multi-line method invocation formatting.


207-208: LGTM - String interpolation formatting improvement.

Clean multi-line string formatting improves readability.


224-225: LGTM - Print statement formatting improvement.

Consistent multi-line string formatting.


246-251: LGTM - Constructor call formatting improvement.

Multi-line constructor formatting with trailing commas enhances readability.

spark/src/main/scala/ai/chronon/spark/stats/drift/Expressions.scala (2)

73-78: Formatting improvement enhances readability.

Multi-line parameter formatting improves code clarity.


102-111: Consistent formatting applied to method signature and assertion.

Multi-line formatting improves readability of both method parameters and assertion logic.

spark/src/main/scala/ai/chronon/spark/stats/drift/Summarizer.scala (6)

38-48: Constructor formatting improved for readability.

Multi-line parameter formatting enhances code clarity.


225-226: Log message formatting improved.

Multi-line formatting enhances readability of error messages.


242-246: For-comprehension formatting adopts idiomatic Scala style.

Multi-line formatting improves readability and follows Scala best practices.


302-307: Constructor formatting improved for readability.

Multi-line parameter formatting enhances code clarity.


351-352: Closing brackets properly formatted.

Consistent formatting style applied.


363-369: Method signature formatting improved.

Multi-line parameter formatting enhances readability.

api/src/main/scala/ai/chronon/api/PartitionSpec.scala (1)

89-90: LGTM - formatting improvement.

The multi-line formatting enhances readability.

online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1)

55-58: LGTM - Clean formatting improvement

Multi-line format enhances readability of the assertion.

online/src/main/scala/ai/chronon/online/MetadataEndPoint.scala (1)

19-20: LGTM - Consistent formatting

Multi-line parameter formatting improves readability.

online/src/main/scala/ai/chronon/online/CatalystTransformBuilder.scala (4)

5-6: LGTM - Clean import consolidation

Grouped imports improve readability.


27-29: LGTM - Consistent ScalaDoc formatting

Multi-line format enhances documentation readability.


169-171: LGTM - Improved logging readability

Multi-line string formatting makes the log message clearer.


380-404: LGTM - Consistent parameter formatting

Multi-line parameter lists improve readability for complex method signatures.

online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (4)

45-50: LGTM - Consistent case class formatting

Multi-line parameter formatting improves readability.


83-97: LGTM - Clean ScalaDoc formatting

Consistent documentation style enhances readability.


103-114: LGTM - Improved constructor formatting

Multi-line parameter lists make complex constructors more readable.


261-273: LGTM - Consistent method formatting

Multi-line parameter formatting improves code readability.

flink/src/test/scala/ai/chronon/flink/test/window/FlinkRowAggregationFunctionTest.scala (1)

183-184: LGTM - formatting improvement.

Multi-line string formatting improves readability.

flink/src/test/scala/ai/chronon/flink/validation/SparkExprEvalComparisonTest.scala (2)

34-42: LGTM - test data formatting improvement.

Multi-line Map formatting enhances readability.

Also applies to: 44-52


106-114: LGTM - consistent formatting.

Multi-line Map formatting matches the pattern above.

Also applies to: 116-124

flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (3)

16-16: LGTM - import organization.

UUID added to import list for clarity.


78-84: LGTM - method signature formatting.

Multi-line parameter formatting improves readability.


95-102: LGTM - consistent test data formatting.

Multi-line Map formatting enhances readability across all test cases.

Also applies to: 110-116, 124-131, 139-147, 157-165, 172-180, 187-194, 201-208

flink/src/main/scala/ai/chronon/flink/validation/SparkExprEvalComparisonFn.scala (3)

8-14: LGTM - case class formatting.

Multi-line parameter formatting improves readability.


28-38: LGTM - ScalaDoc formatting.

Multi-line ScalaDoc style is more idiomatic.


39-43: LGTM - method signature formatting.

Multi-line parameter formatting enhances readability.

       else
         throw new UnsupportedOperationException(
-        s"Partition Intervals should be either hour or day - found ${spanMillis / 60 * 1000} minutes")
+        s"Partition Intervals should be either hour or day - found ${spanMillis / 60 * 1000} minutes"

⚠️ Potential issue

Fix calculation error in exception message.

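The minutes calculation is incorrect. Because `/` and `*` have the same precedence and associate left to right, `spanMillis / 60 * 1000` evaluates as `(spanMillis / 60) * 1000`; it should be `spanMillis / (60 * 1000)`.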

-        s"Partition Intervals should be either hour or day - found ${spanMillis / 60 * 1000} minutes"
+        s"Partition Intervals should be either hour or day - found ${spanMillis / (60 * 1000)} minutes"
🤖 Prompt for AI Agents
In api/src/main/scala/ai/chronon/api/PartitionSpec.scala at line 89, the
calculation for minutes in the exception message is incorrect because it divides
spanMillis by 60 and then multiplies by 1000, which does not convert
milliseconds to minutes properly. Fix this by changing the calculation to divide
spanMillis by (60 * 1000) to correctly convert milliseconds to minutes.
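The precedence issue flagged above can be checked with a minimal sketch. `PartitionSpanSketch` is a hypothetical object, and treating `spanMillis` as a `Long` holding one hour in milliseconds is an assumption made for illustration; only the two expressions are taken from the comment.

```scala
object PartitionSpanSketch {
  // Assumed input: one hour expressed in milliseconds.
  val spanMillis: Long = 60L * 60L * 1000L

  // Original expression: parsed as (spanMillis / 60) * 1000.
  val buggyMinutes: Long = spanMillis / 60 * 1000

  // Suggested expression: divides by the number of milliseconds in a minute.
  val fixedMinutes: Long = spanMillis / (60 * 1000)
}
```

With a one-hour span, the original expression yields 60000000 while the suggested one yields 60, which is the value the exception message intends to report.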

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ca2bdd and a7b0d36.

⛔ Files ignored due to path filters (1)
  • MODULE.bazel.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • .gitignore (1 hunks)
  • MODULE.bazel (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
.gitignore (3)
Learnt from: chewy-zlai
PR: zipline-ai/chronon#46
File: docker-init/generate_anomalous_data.py:0-0
Timestamp: 2024-10-15T19:03:19.403Z
Learning: If `generate_anomalous_data.py` contains unintended changes due to accidental commits, disregard reviewing this file unless instructed otherwise.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#30
File: api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_user:37-38
Timestamp: 2024-10-03T17:12:58.693Z
Learning: In this project, certain JSON files are automatically generated, so suggestions to modify them manually may not be applicable.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#30
File: api/py/test/sample/production/joins/risk/user_transactions.txn_join:217-218
Timestamp: 2024-10-03T17:15:03.325Z
Learning: The JSON files in this project are automatically generated and should not be manually modified or refactored.
MODULE.bazel (1)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: fetcher_tests
🔇 Additional comments (2)
.gitignore (1)

111-112: LGTM! Correctly enables tracking of MODULE.bazel files for bzlmod migration.

MODULE.bazel (1)

1-286: Comprehensive bzlmod configuration successfully replaces WORKSPACE dependencies.

The module configuration is well-structured with proper Scala versioning, extensive Maven dependencies, and conflict exclusions.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
MODULE.bazel (2)

47-258: Clean up formatting inconsistencies in artifacts list.

The artifacts list has inconsistent empty line spacing that affects readability.

Consider standardizing the spacing between logical groups of dependencies for better maintainability.


273-276: Clarify the Scala artifacts inclusion comment.

The comment about including Scala artifacts in the main Maven repository could be more explicit about why this differs from the legacy approach.

-        # Note: Include scala artifacts in main maven repository for bzlmod
-        # "org.scala-lang:scala-library",
-        # "org.scala-lang:scala-reflect",
+        # Note: Scala artifacts are included in main Maven repository for bzlmod compatibility
+        # (previously excluded in WORKSPACE-based setup)
+        # "org.scala-lang:scala-library",
+        # "org.scala-lang:scala-reflect",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0336f6a and deda660.

⛔ Files ignored due to path filters (1)
  • MODULE.bazel.lock is excluded by !**/*.lock
📒 Files selected for processing (181)
  • BUILD.bazel (1 hunks)
  • MODULE.bazel (6 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala (4 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/base/UniqueOrderByLimit.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/BucketedColumnAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (5 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/DirectColumnAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/MapColumnAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala (4 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/Resolution.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala (4 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (6 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala (4 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregationBuffer.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala (3 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala (2 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/MomentTest.scala (1 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/NaiveAggregator.scala (1 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/Builders.scala (6 hunks)
  • api/src/main/scala/ai/chronon/api/DataPointer.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (16 hunks)
  • api/src/main/scala/ai/chronon/api/ParametricMacro.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/QueryUtils.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/Row.scala (4 hunks)
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/DependencyResolver.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/GroupByPlanner.scala (5 hunks)
  • api/src/main/scala/ai/chronon/api/planner/LocalRunner.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/RelevantLeftForJoinPart.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (2 hunks)
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (2 hunks)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/AwsApiImpl.scala (2 hunks)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (1 hunks)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/EmrSubmitter.scala (9 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryExternal.scala (3 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryNative.scala (8 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (12 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala (17 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (5 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpApiImpl.scala (2 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (1 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/Spark2BigTableLoader.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala (5 hunks)
  • flink/src/main/scala/ai/chronon/flink/AvroCodecFn.scala (3 hunks)
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala (10 hunks)
  • flink/src/main/scala/ai/chronon/flink/MetricsSink.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala (3 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/deser/ChrononDeserializationSchema.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/deser/DeserializationSchema.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/deser/SchemaRegistrySerDe.scala (3 hunks)
  • flink/src/main/scala/ai/chronon/flink/source/FlinkSource.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/source/FlinkSourceProvider.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/source/KafkaFlinkSource.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala (5 hunks)
  • flink/src/main/scala/ai/chronon/flink/validation/SparkExprEvalComparisonFn.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/validation/ValidationFlinkJob.scala (7 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (4 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/Trigger.scala (4 hunks)
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/PubSubFlinkSource.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/fastack/PubSubSource.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala (3 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala (4 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala (4 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala (10 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/deser/SchemaRegistryDeSerSchemaProviderSpec.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/validation/ValidationFlinkJobIntegrationTest.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/Api.scala (6 hunks)
  • online/src/main/scala/ai/chronon/online/CatalystTransformBuilder.scala (5 hunks)
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/JoinCodec.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (5 hunks)
  • online/src/main/scala/ai/chronon/online/TileCodec.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetchContext.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala (11 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherMain.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala (10 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala (12 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/LRUCache.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (16 hunks)
  • online/src/main/scala/ai/chronon/online/metrics/InstrumentedThreadPoolExecutor.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/metrics/Metrics.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/metrics/MetricsReporter.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/metrics/OtelMetricsReporter.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/metrics/TTLCache.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala (5 hunks)
  • online/src/main/scala/ai/chronon/online/serde/SerDe.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (6 hunks)
  • online/src/main/scala/ai/chronon/online/stats/TileDriftCalculator.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/GroupByDerivationsTest.scala (3 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/DriftMetricsTest.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (15 hunks)
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/Comparison.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (20 hunks)
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala (7 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (20 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/Join.scala (12 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala (8 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinDerivationJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (13 hunks)
  • spark/src/main/scala/ai/chronon/spark/KvRdd.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/LabelJoin.scala (8 hunks)
  • spark/src/main/scala/ai/chronon/spark/LocalDataLoader.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/LogUtils.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (17 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (5 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (8 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/CreationUtils.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/DeltaLake.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/Format.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/FormatProvider.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/Hive.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/Iceberg.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala (11 hunks)
  • spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/join/SawtoothUdf.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala (7 hunks)
  • spark/src/main/scala/ai/chronon/spark/scripts/DataServer.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/CompareBaseJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/CompareJob.scala (5 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/PartitionRunner.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/Expressions.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/Summarizer.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/scripts/PrepareData.scala (8 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/submission/JobSubmitter.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/submission/SparkSessionBuilder.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/submission/StorageClient.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/CompareTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/ExternalSourcesTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala (5 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/SnapshotAggregator.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/StagingQueryTest.scala (6 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/StatsComputeTest.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/TableTestUtils.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsFormatTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala (7 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/TestUtils.scala (9 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/batch/EvalTest.scala (7 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/ChainingFetcherTest.scala (7 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherMetadataTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (18 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByTest.scala (15 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/FeatureWithLabelJoinTest.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (7 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/SawtoothUdfPerformanceTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala (7 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/streaming/MutationsTest.scala (25 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/streaming/StreamingTest.scala (2 hunks)
✅ Files skipped from review due to trivial changes (48)
  • spark/src/test/scala/ai/chronon/spark/test/StatsComputeTest.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/Spark2BigTableLoader.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/SawtoothUdfPerformanceTest.scala
  • flink/src/main/scala/ai/chronon/flink/MetricsSink.scala
  • api/src/main/scala/ai/chronon/api/planner/GroupByPlanner.scala
  • BUILD.bazel
  • online/src/main/scala/ai/chronon/online/metrics/InstrumentedThreadPoolExecutor.scala
  • online/src/main/scala/ai/chronon/online/stats/TileDriftCalculator.scala
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsTest.scala
  • flink/src/main/scala/ai/chronon/flink/source/FlinkSourceProvider.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/SelectedJoinPartsTest.scala
  • spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
  • spark/src/test/scala/ai/chronon/spark/test/StagingQueryTest.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/TableBootstrapTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/JoinUtilsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala
  • online/src/main/scala/ai/chronon/online/metrics/Metrics.scala
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
  • spark/src/test/scala/ai/chronon/spark/test/TableTestUtils.scala
  • spark/src/main/scala/ai/chronon/spark/batch/StagingQuery.scala
  • spark/src/test/scala/ai/chronon/spark/test/streaming/StreamingTest.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala
  • flink/src/main/scala/ai/chronon/flink/deser/SchemaRegistrySerDe.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
  • spark/src/test/scala/ai/chronon/spark/test/TableUtilsFormatTest.scala
  • api/src/main/scala/ai/chronon/api/Builders.scala
  • spark/src/test/scala/ai/chronon/spark/test/SnapshotAggregator.scala
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByTest.scala
  • flink/src/main/scala/ai/chronon/flink/FlinkJob.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/ChainingFetcherTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/FeatureWithLabelJoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/streaming/MutationsTest.scala
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala
  • spark/src/test/scala/ai/chronon/spark/test/batch/EvalTest.scala
  • api/src/main/scala/ai/chronon/api/Extensions.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • spark/src/main/scala/ai/chronon/spark/join/SawtoothUdf.scala
  • spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala
  • spark/src/test/scala/ai/chronon/spark/test/TestUtils.scala
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherCache.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala
🚧 Files skipped from review as they are similar to previous changes (130)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala
  • online/src/main/scala/ai/chronon/online/metrics/MetricsReporter.scala
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/fastack/PubSubSource.scala
  • api/src/main/scala/ai/chronon/api/planner/RelevantLeftForJoinPart.scala
  • api/src/main/scala/ai/chronon/api/DataPointer.scala
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/Resolution.scala
  • flink/src/main/scala/ai/chronon/flink/source/FlinkSource.scala
  • online/src/test/scala/ai/chronon/online/test/GroupByDerivationsTest.scala
  • flink/src/main/scala/ai/chronon/flink/deser/DeserializationSchema.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpApiImpl.scala
  • online/src/main/scala/ai/chronon/online/fetcher/FetchContext.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEventIntegrationTest.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/MapColumnAggregator.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregationBuffer.scala
  • flink/src/main/scala/ai/chronon/flink/AvroCodecFn.scala
  • spark/src/main/scala/ai/chronon/spark/stats/CompareBaseJob.scala
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/EmrSubmitter.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala
  • flink/src/main/scala/ai/chronon/flink/source/KafkaFlinkSource.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/base/UniqueOrderByLimit.scala
  • online/src/main/scala/ai/chronon/online/metrics/OtelMetricsReporter.scala
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala
  • online/src/main/scala/ai/chronon/online/serde/SerDe.scala
  • spark/src/test/scala/ai/chronon/spark/test/ExternalSourcesTest.scala
  • online/src/main/scala/ai/chronon/online/CatalystTransformBuilder.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/BucketedColumnAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala
  • online/src/main/scala/ai/chronon/online/JoinCodec.scala
  • flink/src/test/scala/ai/chronon/flink/test/deser/SchemaRegistryDeSerSchemaProviderSpec.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/DirectColumnAggregator.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/NaiveAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/stats/PartitionRunner.scala
  • online/src/main/scala/ai/chronon/online/metrics/TTLCache.scala
  • api/src/main/scala/ai/chronon/api/planner/LocalRunner.scala
  • spark/src/main/scala/ai/chronon/spark/submission/StorageClient.scala
  • spark/src/main/scala/ai/chronon/spark/scripts/DataServer.scala
  • online/src/test/scala/ai/chronon/online/test/stats/DriftMetricsTest.scala
  • flink/src/main/scala/ai/chronon/flink/deser/ChrononDeserializationSchema.scala
  • spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/AwsApiImpl.scala
  • spark/src/main/scala/ai/chronon/spark/stats/CompareJob.scala
  • spark/src/main/scala/ai/chronon/spark/Comparison.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/FormatProvider.scala
  • spark/src/main/scala/ai/chronon/spark/submission/JobSubmitter.scala
  • flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala
  • spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala
  • api/src/main/scala/ai/chronon/api/QueryUtils.scala
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala
  • online/src/main/scala/ai/chronon/online/fetcher/LRUCache.scala
  • spark/src/main/scala/ai/chronon/spark/LocalDataLoader.scala
  • spark/src/main/scala/ai/chronon/spark/JoinDerivationJob.scala
  • api/src/main/scala/ai/chronon/api/planner/DependencyResolver.scala
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/MomentTest.scala
  • spark/src/main/scala/ai/chronon/spark/submission/SparkSessionBuilder.scala
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
  • flink/src/main/scala/ai/chronon/flink/window/Trigger.scala
  • online/src/main/scala/ai/chronon/online/ExternalSourceRegistry.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
  • online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala
  • api/src/main/scala/ai/chronon/api/planner/ConfPlanner.scala
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
  • flink/src/test/scala/ai/chronon/flink/validation/ValidationFlinkJobIntegrationTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkJobEntityIntegrationTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/deser/CatalystUtilComplexAvroTest.scala
  • spark/src/main/scala/ai/chronon/spark/JoinBase.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala
  • spark/src/main/scala/ai/chronon/spark/stats/CompareMetrics.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryExternal.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/CreationUtils.scala
  • spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinBootstrapJob.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByFetcher.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala
  • spark/src/main/scala/ai/chronon/spark/LabelJoin.scala
  • spark/src/main/scala/ai/chronon/spark/batch/MergeJob.scala
  • online/src/main/scala/ai/chronon/online/fetcher/JoinPartFetcher.scala
  • spark/src/test/scala/ai/chronon/spark/test/CompareTest.scala
  • spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala
  • spark/src/main/scala/ai/chronon/spark/stats/drift/Expressions.scala
  • spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroConversions.scala
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/PubSubFlinkSource.scala
  • flink/src/main/scala/ai/chronon/flink/validation/SparkExprEvalComparisonFn.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherMetadataTest.scala
  • spark/src/main/scala/ai/chronon/spark/stats/drift/scripts/PrepareData.scala
  • api/src/main/scala/ai/chronon/api/ParametricMacro.scala
  • spark/src/main/scala/ai/chronon/spark/BootstrapInfo.scala
  • online/src/main/scala/ai/chronon/online/fetcher/FetcherMain.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala
  • spark/src/main/scala/ai/chronon/spark/Extensions.scala
  • spark/src/main/scala/ai/chronon/spark/LogUtils.scala
  • spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/Format.scala
  • online/src/main/scala/ai/chronon/online/fetcher/Fetcher.scala
  • spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala
  • spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala
  • spark/src/main/scala/ai/chronon/spark/stats/drift/Summarizer.scala
  • flink/src/main/scala/ai/chronon/flink/validation/ValidationFlinkJob.scala
  • flink/src/main/scala/ai/chronon/flink/types/FlinkTypes.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/DeltaLake.scala
  • spark/src/main/scala/ai/chronon/spark/KvRdd.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/Hive.scala
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
  • spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/Iceberg.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothOnlineAggregator.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala
  • online/src/main/scala/ai/chronon/online/Api.scala
  • spark/src/main/scala/ai/chronon/spark/Join.scala
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
  • api/src/main/scala/ai/chronon/api/Row.scala
  • spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala
  • flink/src/test/scala/ai/chronon/flink/test/FlinkTestUtils.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryNative.scala
  • online/src/main/scala/ai/chronon/online/fetcher/GroupByResponseHandler.scala
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/TableUtils.scala
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionTest.scala (6)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala (9)
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:13-16
Timestamp: 2024-10-31T18:27:44.973Z
Learning: In `MockKVStore.scala`, the `create` method should reset the dataset even if the dataset already exists.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/store/DynamoDBMonitoringStore.scala:98-143
Timestamp: 2024-10-15T15:30:15.514Z
Learning: In the Scala file `hub/app/store/DynamoDBMonitoringStore.scala`, within the `makeLoadedConfs` method, the `.recover` method is correctly applied to the `Try` returned by `response.values` to handle exceptions from the underlying store.
MODULE.bazel (4)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#789
File: api/python/ai/chronon/repo/cluster.py:16-16
Timestamp: 2025-05-23T22:52:27.605Z
Learning: The project uses terraform to provision the required "dataproc@${project_id}.iam.gserviceaccount.com" service account, making hardcoded service account patterns in Dataproc configurations reliable and acceptable.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: streaming_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: batch_tests
  • GitHub Check: groupby_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: service_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: online_tests
  • GitHub Check: flink_tests
  • GitHub Check: api_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (14)
spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionTest.scala (4)

272-274: Good formatting improvement.

Method parameter list reformatted to multiline style for better readability.


299-305: Good formatting improvement.

Method parameter list reformatted to multiline style for better readability.


319-320: Good formatting improvement.

Chain method call reformatted for better readability.


326-329: Good formatting improvement.

Require statement reformatted to multiline style for better readability.

spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala (5)

225-226: LGTM - Clean formatting improvement


232-233: LGTM - Clean formatting improvement


249-250: LGTM - Clean formatting improvement


292-293: LGTM - Clean formatting improvement


296-297: LGTM - Clean formatting improvement

MODULE.bazel (5)

1-11: Module configuration looks good.

Core Bazel dependencies are properly declared with appropriate versions.


13-19: Python toolchain configuration is correct.

Setting ignore_root_user_error = True is appropriate for CI environments that run Bazel as root.


21-40: Scala configuration is properly structured.

Scala version 2.12.20 is consistently used throughout the configuration.


277-285: Maven repository configuration is well-structured.

The repository URLs, conflict policy, and fetch settings are appropriate for the migration.


289-293: Custom extensions setup is correct.

The legacy dependency handling through custom extensions maintains compatibility during the migration.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
MODULE.bazel (1)

1-4: Consider using semantic versioning for the module.

Using "0.0.0" as the module version might not be ideal for tracking and dependency management.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between deda660 and 8fb1ba2.

⛔ Files ignored due to path filters (1)
  • MODULE.bazel.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • MODULE.bazel (7 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
MODULE.bazel (5)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#789
File: api/python/ai/chronon/repo/cluster.py:16-16
Timestamp: 2025-05-23T22:52:27.605Z
Learning: The project uses terraform to provision the required "dataproc@${project_id}.iam.gserviceaccount.com" service account, making hardcoded service account patterns in Dataproc configurations reliable and acceptable.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: streaming_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: spark_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: batch_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: api_tests
  • GitHub Check: service_tests
  • GitHub Check: online_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: flink_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (5)
MODULE.bazel (5)

51-52: Scala version consistency looks good.

The Scala version 2.12.20 is consistently used throughout the file.


95-103: Spark version consistency resolved.

All Spark dependencies now use version 3.5.3 consistently, addressing the previous review concern.
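
A consistency check like this one can be automated. The following is a minimal sketch, assuming the artifacts are available as plain `group:artifact:version` strings; the coordinates listed are hypothetical examples, not copied from MODULE.bazel, and `versions_by_group` is an illustrative helper name.

```python
from collections import defaultdict


def versions_by_group(coordinates, group_prefix):
    """Map each version to the artifacts under `group_prefix` that use it."""
    found = defaultdict(list)
    for coord in coordinates:
        parts = coord.split(":")
        if len(parts) < 3:
            continue  # not a group:artifact:version coordinate
        group, artifact, version = parts[0], parts[1], parts[-1]
        if group.startswith(group_prefix):
            found[version].append(artifact)
    return dict(found)


# Hypothetical coordinates for illustration only.
artifacts = [
    "org.apache.spark:spark-sql_2.12:3.5.3",
    "org.apache.spark:spark-core_2.12:3.5.3",
    "org.apache.spark:spark-hive_2.12:3.5.3",
    "org.slf4j:slf4j-api:2.0.12",
]

spark_versions = versions_by_group(artifacts, "org.apache.spark")
print(spark_versions)
```

More than one key in the result would indicate that Spark artifacts are pinned to different versions.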


261-278: Exclusions and Maven configuration look comprehensive.

The excluded artifacts avoid common conflicts, and the commented-out Scala artifact inclusion for bzlmod is annotated correctly.


279-287: Maven repository configuration follows best practices.

The strict version pinning, source fetching, and repository list are well-configured for reproducible builds.


164-169: Prometheus exporter v1.x is alpha-only

Maven Central shows that opentelemetry-exporter-prometheus only has stable releases up to 0.13.1; all 1.x versions (including 1.49.0) are published as alpha. Using 1.49.0-alpha alongside the core 1.49.0 modules is therefore expected. No changes needed.

@nikhil-zlai nikhil-zlai force-pushed the nikhil/bzlmod_migration branch 5 times, most recently from e4de28e to 950068c Compare July 18, 2025 02:22

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
api/python/test/canary/compiled/joins/gcp/item_event_join.canary_streaming_v1 (1)

9-9: Fix duplicate airflow dependency.

🧹 Nitpick comments (5)
spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationLoggingTest.scala (1)

60-181: Consider breaking down this test method.

The method is functional but quite long. Consider extracting setup logic into helper methods for better readability.

api/python/test/canary/compiled/joins/gcp/item_event_join.canary_batch_v1 (1)

16-17: Address TODO placeholders.

Replace placeholder values with actual configuration.

Also applies to: 35-36

spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (1)

145-159: Simplify complex map transformation.

Consider extracting this logic to a helper method for clarity.

-      val inputTablesToRange = Option(metadata.executionInfo.getTableDependencies)
-        .map(_.asScala.toArray)
-        .getOrElse(Array.empty)
-        .map((td) => {
-          val inputPartSpec =
-            Option(td.getTableInfo).map(_.partitionSpec(tableUtils.partitionSpec)).getOrElse(tableUtils.partitionSpec)
-          td.getTableInfo.table -> DependencyResolver.computeInputRange(range, td).map(_.translate(inputPartSpec))
-        })
-        .toMap
+      val inputTablesToRange = computeInputTableRanges(metadata, range, tableUtils)
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (2)

152-157: Add parameter documentation.

Document the purpose of each parameter, especially enableTiling.

+  /**
+   * Compute a join until endDs and compare the result of fetching the aggregations with the computed join values.
+   * @param joinConf Join configuration to test
+   * @param endDs End date string for the test
+   * @param namespace Namespace for test data
+   * @param consistencyCheck Whether to run consistency validation
+   * @param dropDsOnWrite Whether to drop ds column when writing
+   * @param enableTiling Whether to enable tiling functionality
+   */
   def compareTemporalFetch(joinConf: api.Join,

223-246: Consider extracting consistency check logic.

This section could be a separate method for clarity.

     if (consistencyCheck) {
-      val lagMs = -100000
-      val laggedRequests = buildRequests(lagMs)
-      // ... rest of the consistency check logic
+      runConsistencyCheck(spark, joinConf, buildRequests, mockApi, tableUtils, inMemoryKvStore, today)
     }

"common": {},
"modeClusterConfigs": {
"upload": {
"dataproc.config": "{\"gceClusterConfig\": {\"subnetworkUri\": \"default\", \"serviceAccount\": \"[email protected]\", \"serviceAccountScopes\": [\"https://www.googleapis.com/auth/cloud-platform\", \"https://www.googleapis.com/auth/cloud.useraccounts.readonly\", \"https://www.googleapis.com/auth/devstorage.read_write\", \"https://www.googleapis.com/auth/logging.write\"], \"metadata\": {\"hive-version\": \"3.1.2\", \"SPARK_BQ_CONNECTOR_URL\": \"gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.1.jar\", \"artifact_prefix\": \"gs://zipline-artifacts-canary\"}, \"tags\": []}, \"masterConfig\": {\"numInstances\": 1, \"machineTypeUri\": \"n2-highmem-8\", \"diskConfig\": {\"bootDiskType\": \"pd-standard\", \"bootDiskSizeGb\": 1024}}, \"workerConfig\": {\"numInstances\": 2, \"machineTypeUri\": \"n2-highmem-4\", \"diskConfig\": {\"bootDiskType\": \"pd-standard\", \"bootDiskSizeGb\": 64, \"numLocalSsds\": 2}}, \"softwareConfig\": {\"imageVersion\": \"2.2.50-debian12\", \"optionalComponents\": [\"FLINK\", \"JUPYTER\"], \"properties\": {}}, \"initializationActions\": [{\"executable_file\": \"gs://zipline-artifacts-canary/scripts/copy_java_security.sh\"}], \"endpointConfig\": {\"enableHttpPortAccess\": true}, \"lifecycleConfig\": {\"idleDeleteTtl\": \"7200s\"}}"


💡 Verification agent

🧩 Analysis chain

Verify Spark BigQuery connector version.

Ensure the connector version is compatible and up-to-date.


🌐 Web query:

What is the latest version of the Spark BigQuery connector for Spark 3.5?

💡 Result:

The latest version of the Spark BigQuery connector compatible with Spark 3.5 is 0.42.2. This version is available for download at:

  • Direct download link: gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.2.jar

Alternatively, you can include the connector in your project using Maven with the following dependency:

<dependency>
  <groupId>com.google.cloud.spark</groupId>
  <artifactId>spark-3.5-bigquery</artifactId>
  <version>0.42.2</version>
</dependency>

For more details and additional usage instructions, refer to the official GitHub repository: (github.com)



Update Spark BigQuery connector to 0.42.2

The hardcoded connector URL is still at 0.42.1; please bump to the latest 0.42.2.

Locations:

  • api/python/test/canary/compiled/joins/gcp/item_event_join.canary_batch_v1 (line 117)

Suggested diff:

- "SPARK_BQ_CONNECTOR_URL": "gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.1.jar"
+ "SPARK_BQ_CONNECTOR_URL": "gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.2.jar"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"dataproc.config": "{\"gceClusterConfig\": {\"subnetworkUri\": \"default\", \"serviceAccount\": \"[email protected]\", \"serviceAccountScopes\": [\"https://www.googleapis.com/auth/cloud-platform\", \"https://www.googleapis.com/auth/cloud.useraccounts.readonly\", \"https://www.googleapis.com/auth/devstorage.read_write\", \"https://www.googleapis.com/auth/logging.write\"], \"metadata\": {\"hive-version\": \"3.1.2\", \"SPARK_BQ_CONNECTOR_URL\": \"gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.1.jar\", \"artifact_prefix\": \"gs://zipline-artifacts-canary\"}, \"tags\": []}, \"masterConfig\": {\"numInstances\": 1, \"machineTypeUri\": \"n2-highmem-8\", \"diskConfig\": {\"bootDiskType\": \"pd-standard\", \"bootDiskSizeGb\": 1024}}, \"workerConfig\": {\"numInstances\": 2, \"machineTypeUri\": \"n2-highmem-4\", \"diskConfig\": {\"bootDiskType\": \"pd-standard\", \"bootDiskSizeGb\": 64, \"numLocalSsds\": 2}}, \"softwareConfig\": {\"imageVersion\": \"2.2.50-debian12\", \"optionalComponents\": [\"FLINK\", \"JUPYTER\"], \"properties\": {}}, \"initializationActions\": [{\"executable_file\": \"gs://zipline-artifacts-canary/scripts/copy_java_security.sh\"}], \"endpointConfig\": {\"enableHttpPortAccess\": true}, \"lifecycleConfig\": {\"idleDeleteTtl\": \"7200s\"}}"
"dataproc.config": "{\"gceClusterConfig\": {\"subnetworkUri\": \"default\", \"serviceAccount\": \"[email protected]\", \"serviceAccountScopes\": [\"https://www.googleapis.com/auth/cloud-platform\", \"https://www.googleapis.com/auth/cloud.useraccounts.readonly\", \"https://www.googleapis.com/auth/devstorage.read_write\", \"https://www.googleapis.com/auth/logging.write\"], \"metadata\": {\"hive-version\": \"3.1.2\", \"SPARK_BQ_CONNECTOR_URL\": \"gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.2.jar\", \"artifact_prefix\": \"gs://zipline-artifacts-canary\"}, \"tags\": []}, \"masterConfig\": {\"numInstances\": 1, \"machineTypeUri\": \"n2-highmem-8\", \"diskConfig\": {\"bootDiskType\": \"pd-standard\", \"bootDiskSizeGb\": 1024}}, \"workerConfig\": {\"numInstances\": 2, \"machineTypeUri\": \"n2-highmem-4\", \"diskConfig\": {\"bootDiskType\": \"pd-standard\", \"bootDiskSizeGb\": 64, \"numLocalSsds\": 2}}, \"softwareConfig\": {\"imageVersion\": \"2.2.50-debian12\", \"optionalComponents\": [\"FLINK\", \"JUPYTER\"], \"properties\": {}}, \"initializationActions\": [{\"executable_file\": \"gs://zipline-artifacts-canary/scripts/copy_java_security.sh\"}], \"endpointConfig\": {\"enableHttpPortAccess\": true}, \"lifecycleConfig\": {\"idleDeleteTtl\": \"7200s\"}}"
🤖 Prompt for AI Agents
In api/python/test/canary/compiled/joins/gcp/item_event_join.canary_batch_v1 at
line 117, update the Spark BigQuery connector URL from version 0.42.1 to 0.42.2
by changing the URL string
"gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.1.jar" to
"gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.2.jar" within the
dataproc.config JSON configuration.
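
Because dataproc.config is a JSON document serialized as a string inside the outer JSON, a plain text search-and-replace works, but a structured edit is less error-prone. The sketch below assumes the object containing modeClusterConfigs is already loaded as a dict (the enclosing keys are not shown in the excerpt); `bump_connector` and the trimmed `sample` are illustrative, not part of the repository. If the compiled file is generated from the Python source, the change may belong in the source instead.

```python
import json


def bump_connector(compiled: dict, mode: str, new_url: str) -> dict:
    """Return a copy of `compiled` with SPARK_BQ_CONNECTOR_URL replaced for `mode`."""
    updated = json.loads(json.dumps(compiled))  # deep copy via a JSON round trip
    mode_conf = updated["modeClusterConfigs"][mode]
    # "dataproc.config" holds a JSON document encoded as a string.
    dataproc = json.loads(mode_conf["dataproc.config"])
    dataproc["gceClusterConfig"]["metadata"]["SPARK_BQ_CONNECTOR_URL"] = new_url
    mode_conf["dataproc.config"] = json.dumps(dataproc)
    return updated


# Trimmed, illustrative stand-in for the compiled config shown above.
sample = {
    "common": {},
    "modeClusterConfigs": {
        "upload": {
            "dataproc.config": json.dumps({
                "gceClusterConfig": {
                    "metadata": {
                        "hive-version": "3.1.2",
                        "SPARK_BQ_CONNECTOR_URL": "gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.1.jar",
                    }
                }
            })
        }
    },
}

patched = bump_connector(
    sample, "upload", "gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.2.jar"
)
inner = json.loads(patched["modeClusterConfigs"]["upload"]["dataproc.config"])
print(inner["gceClusterConfig"]["metadata"]["SPARK_BQ_CONNECTOR_URL"])
```

The function leaves its input untouched, so the original and patched configs can be diffed before anything is written back.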

@nikhil-zlai nikhil-zlai force-pushed the nikhil/bzlmod_migration branch from 950068c to c2e5965 Compare July 18, 2025 02:30

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (6)
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (2)

159-159: Consider making thread pool size configurable.

A fixed pool size of 1 may limit parallelism in tests.

-    implicit val executionContext: ExecutionContext = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))
+    implicit val executionContext: ExecutionContext = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(
+      sys.props.getOrElse("chronon.test.thread.pool.size", "1").toInt
+    ))

223-246: Extract consistency check logic for better readability.

This section handles multiple responsibilities and could be clearer as a separate method.

spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationLoggingTest.scala (1)

148-149: Add descriptive assertion message.

-    assertTrue(baseColumns.forall(logDf.columns.contains))
+    assertTrue(s"Log table should contain all base columns: ${baseColumns.mkString(", ")}",
+               baseColumns.forall(logDf.columns.contains))
api/python/test/sample/group_bys/walmart/customer_features.py (2)

1-2: Remove unused import and avoid wildcard imports

Dict is unused. Replace wildcard import with explicit imports.

-from typing import Dict
-from ai.chronon.types import *
+from ai.chronon.types import (
+    EventSource, EntitySource, GroupBy, Join, JoinPart, JoinSource,
+    Query, Aggregation, Operation, selects
+)
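
Both problems flagged here (an unused import and a wildcard import) can be detected mechanically. The following is a rough sketch using only the standard-library `ast` module; `import_issues` is an illustrative helper, and the sample source is a shortened, hypothetical stand-in for customer_features.py. Linters such as pyflakes report the same issues more thoroughly.

```python
import ast


def import_issues(source: str):
    """Return (wildcard_modules, unused_names) for a Python source string."""
    tree = ast.parse(source)
    wildcard, imported = [], {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name == "*":
                    wildcard.append(node.module)
                else:
                    imported[alias.asname or alias.name] = node.lineno
        elif isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a"
                name = alias.asname or alias.name.split(".")[0]
                imported[name] = node.lineno
    # Every bare identifier that appears anywhere in the module
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    unused = sorted(name for name in imported if name not in used)
    return wildcard, unused


# Hypothetical excerpt; it is only parsed, never executed.
sample = '''
from typing import Dict
from ai.chronon.types import *

source = EventSource(table="orders")
'''

wildcard, unused = import_issues(sample)
print(wildcard)  # modules imported with *
print(unused)    # imported names never referenced
```

The check is name-based, so it will miss names referenced only inside strings or `__all__`; it is meant as a quick review aid, not a replacement for a linter.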

4-16: Minor comment improvements

Comments have typos and could be clearer.

-# step 1. create source with all necessary entity ids - we will enrich with advisory, product_type and pi_hash use_counts
+# Step 1: Create source with all necessary entity IDs - will enrich with advisory, product_type, and pi_hash use_counts
-            order_id="order_number", # we are going to enrich this with advisory
+            order_id="order_number",  # Will enrich with advisory
-            payment_instrument_id="pi_hash", # self join with previous use_counts                         
+            payment_instrument_id="pi_hash",  # Self-join with previous use_counts
-            items="items"  # we are going to enrich this array with a new "product_type_last" field inside the payload struct
+            items="items"  # Will enrich array with "product_type_last" field
spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (1)

173-173: Consider timeout for KV store operations.

Using Duration.Inf could cause indefinite blocking if KV store is unresponsive.

-Await.result(kvStoreUpdates, Duration.Inf)
+Await.result(kvStoreUpdates, Duration(30, "seconds"))

Also applies to: 204-204

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 950068c and c2e5965.

⛔ Files ignored due to path filters (1)
  • MODULE.bazel.lock is excluded by !**/*.lock
📒 Files selected for processing (157)
  • .bazeliskrc (1 hunks)
  • .bazelrc (1 hunks)
  • .bazelversion (1 hunks)
  • .github/workflows/test_scala_2_12_non_spark.yaml (2 hunks)
  • .gitignore (1 hunks)
  • BUILD.bazel (1 hunks)
  • MODULE.bazel (6 hunks)
  • WORKSPACE (0 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala (4 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/Resolution.scala (1 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala (2 hunks)
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregationBuffer.scala (1 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (2 hunks)
  • api/python/ai/chronon/repo/hub_runner.py (5 hunks)
  • api/python/ai/chronon/repo/hub_uploader.py (2 hunks)
  • api/python/ai/chronon/repo/hub_utils.py (2 hunks)
  • api/python/test/canary/compiled/joins/gcp/item_event_join.canary_batch_v1 (1 hunks)
  • api/python/test/canary/compiled/joins/gcp/item_event_join.canary_combined_v1 (1 hunks)
  • api/python/test/canary/compiled/joins/gcp/item_event_join.canary_streaming_v1 (1 hunks)
  • api/python/test/canary/joins/gcp/item_event_join.py (1 hunks)
  • api/python/test/sample/group_bys/walmart/customer_features.py (1 hunks)
  • api/src/main/scala/ai/chronon/api/Constants.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/DataPointer.scala (3 hunks)
  • api/src/main/scala/ai/chronon/api/DataRange.scala (4 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/PartitionSpec.scala (4 hunks)
  • api/src/main/scala/ai/chronon/api/planner/DependencyResolver.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/GroupByPlanner.scala (3 hunks)
  • api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala (4 hunks)
  • api/src/main/scala/ai/chronon/api/planner/LocalRunner.scala (2 hunks)
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala (4 hunks)
  • api/src/main/scala/ai/chronon/api/planner/NodeRunner.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/StagingQueryPlanner.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (4 hunks)
  • api/src/test/scala/ai/chronon/api/test/DateMacroSpec.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (2 hunks)
  • api/src/test/scala/ai/chronon/api/test/planner/GroupByPlannerTest.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/planner/LocalRunnerTest.scala (4 hunks)
  • api/src/test/scala/ai/chronon/api/test/planner/MonolithJoinPlannerTest.scala (2 hunks)
  • api/src/test/scala/ai/chronon/api/test/planner/StagingQueryPlannerTest.scala (1 hunks)
  • api/thrift/orchestration.thrift (1 hunks)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/AwsApiImpl.scala (2 hunks)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (1 hunks)
  • cloud_gcp/BUILD.bazel (0 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpApiImpl.scala (2 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala (1 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala (2 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (2 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (9 hunks)
  • flink/src/main/scala/ai/chronon/flink/MetricsSink.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala (3 hunks)
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/deser/DeserializationSchema.scala (2 hunks)
  • flink/src/main/scala/ai/chronon/flink/deser/FlinkSerDeProvider.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/source/FlinkSource.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala (3 hunks)
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/PubSubSchemaSerDe.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/fastack/DeserializationSchemaWrapper.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/fastack/PubSubSource.scala (1 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/window/FlinkRowAggregationFunctionTest.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataEndPoint.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/SparkInternalRowConversions.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/TileCodec.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/metrics/MetricsReporter.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/metrics/OtelMetricsReporter.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/DataStreamBuilderTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (3 hunks)
  • online/src/test/scala/ai/chronon/online/test/GroupByDerivationsTest.scala (3 hunks)
  • online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (12 hunks)
  • revert_whitespace_changes.py (1 hunks)
  • spark/BUILD.bazel (5 hunks)
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinDerivationJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/LabelJoin.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/LocalTableExporter.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/MetadataExporter.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/catalog/FormatProvider.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/join/SawtoothUdf.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/kv_store/KVUploadNodeRunner.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/scripts/DataServer.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala (4 hunks)
  • spark/src/main/scala/ai/chronon/spark/submission/JobSubmitter.scala (0 hunks)
  • spark/src/main/scala/ai/chronon/spark/submission/StorageClient.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MigrationCompareTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/StatsComputeTest.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationBootstrapTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationLoggingTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/batch/BatchNodeRunnerTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/batch/EvalTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/batch/ModularJoinTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/batch/ShortNamesTest.scala (5 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherDeterministicTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherFailureTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherGeneratedTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherMetadataTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala (0 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (13 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTiledTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherUniqueTopKTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/JavaFetchTypesTest.java (0 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala (0 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/SawtoothUdfPerformanceTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/SawtoothUdfSpec.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/UnionJoinSpec.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/UnionJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/submission/JobSubmitterTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/udafs/ApproxDistinctTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/udafs/HistogramTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/udafs/NullnessCountersAggregatorTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/udafs/UDAFSQLUsageTest.scala (3 hunks)
  • tools/build_rules/artifact.bzl (2 hunks)
  • tools/build_rules/common.bzl (2 hunks)
  • tools/build_rules/dependencies/all_repositories.bzl (0 hunks)
  • tools/build_rules/dependencies/defs.bzl (0 hunks)
  • tools/build_rules/dependencies/load_dependencies.bzl (0 hunks)
  • tools/build_rules/dependencies/scala_repository.bzl (0 hunks)
  • tools/build_rules/extensions/BUILD.bazel (1 hunks)
  • tools/build_rules/extensions/custom_deps.bzl (1 hunks)
  • tools/build_rules/jar_library.bzl (1 hunks)
  • tools/build_rules/jvm_binary.bzl (1 hunks)
  • tools/build_rules/prelude_bazel (2 hunks)
  • tools/build_rules/scala_config.bzl (1 hunks)
  • tools/build_rules/scala_junit_test_suite.bzl (1 hunks)
  • tools/build_rules/spark/BUILD (1 hunks)
  • tools/build_rules/thrift/thrift.bzl (1 hunks)
💤 Files with no reviewable changes (11)
  • tools/build_rules/dependencies/all_repositories.bzl
  • cloud_gcp/BUILD.bazel
  • WORKSPACE
  • spark/src/main/scala/ai/chronon/spark/join/AggregationInfo.scala
  • spark/src/main/scala/ai/chronon/spark/submission/JobSubmitter.scala
  • tools/build_rules/dependencies/load_dependencies.bzl
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/JavaFetchTypesTest.java
  • tools/build_rules/dependencies/defs.bzl
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala
  • tools/build_rules/dependencies/scala_repository.bzl
  • spark/src/test/scala/ai/chronon/spark/test/join/LabelJoinTest.scala
✅ Files skipped from review due to trivial changes (17)
  • aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala
  • api/src/main/scala/ai/chronon/api/planner/DependencyResolver.scala
  • tools/build_rules/scala_junit_test_suite.bzl
  • spark/src/test/scala/ai/chronon/spark/test/udafs/NullnessCountersAggregatorTest.scala
  • spark/src/main/scala/ai/chronon/spark/join/SawtoothUdf.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala
  • flink/src/test/scala/ai/chronon/flink/test/window/FlinkRowAggregationFunctionTest.scala
  • flink/src/main/scala/ai/chronon/flink/MetricsSink.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala
  • online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala
  • api/src/main/scala/ai/chronon/api/planner/MetaDataUtils.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala
  • online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala
  • api/src/test/scala/ai/chronon/api/test/DateMacroSpec.scala
  • spark/src/main/scala/ai/chronon/spark/JoinDerivationJob.scala
  • api/src/test/scala/ai/chronon/api/test/planner/MonolithJoinPlannerTest.scala
🚧 Files skipped from review as they are similar to previous changes (120)
  • .bazelversion
  • .bazeliskrc
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/fastack/DeserializationSchemaWrapper.scala
  • .gitignore
  • api/src/main/scala/ai/chronon/api/planner/StagingQueryPlanner.scala
  • flink/src/main/scala/ai/chronon/flink/window/KeySelectorBuilder.scala
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
  • api/src/main/scala/ai/chronon/api/planner/NodeRunner.scala
  • api/src/main/scala/ai/chronon/api/Constants.scala
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/PubSubSchemaSerDe.scala
  • online/src/test/scala/ai/chronon/online/test/GroupByDerivationsTest.scala
  • flink/src/main/scala/ai/chronon/flink/deser/FlinkSerDeProvider.scala
  • .github/workflows/test_scala_2_12_non_spark.yaml
  • api/thrift/orchestration.thrift
  • spark/src/main/scala/ai/chronon/spark/LocalTableExporter.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregationBuffer.scala
  • online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala
  • spark/src/main/scala/ai/chronon/spark/MetadataExporter.scala
  • online/src/main/scala/ai/chronon/online/metrics/OtelMetricsReporter.scala
  • spark/src/main/scala/ai/chronon/spark/submission/StorageClient.scala
  • flink/src/main/scala/ai/chronon/flink/source/FlinkSource.scala
  • online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala
  • online/src/main/scala/ai/chronon/online/metrics/MetricsReporter.scala
  • spark/src/main/scala/ai/chronon/spark/LabelJoin.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala
  • flink/src/main/scala/ai/chronon/flink_connectors/pubsub/fastack/PubSubSource.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/NoHistoricalBackfillTest.scala
  • flink/src/main/scala/ai/chronon/flink/deser/DeserializationSchema.scala
  • tools/build_rules/jvm_binary.bzl
  • spark/src/test/scala/ai/chronon/spark/test/StatsComputeTest.scala
  • spark/src/main/scala/ai/chronon/spark/JoinUtils.scala
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala
  • online/src/test/scala/ai/chronon/online/test/DataStreamBuilderTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/UnionJoinSpec.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/udafs/HistogramTest.scala
  • online/src/main/scala/ai/chronon/online/TileCodec.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/Resolution.scala
  • online/src/main/scala/ai/chronon/online/SparkInternalRowConversions.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpFormatProvider.scala
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala
  • spark/src/main/scala/ai/chronon/spark/GroupBy.scala
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/AwsApiImpl.scala
  • .bazelrc
  • online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala
  • api/src/main/scala/ai/chronon/api/PartitionSpec.scala
  • online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEvalFn.scala
  • api/src/main/scala/ai/chronon/api/planner/LocalRunner.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherFailureTest.scala
  • tools/build_rules/common.bzl
  • online/src/main/scala/ai/chronon/online/CatalystUtil.scala
  • spark/src/main/scala/ai/chronon/spark/catalog/FormatProvider.scala
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/GcpApiImpl.scala
  • flink/src/main/scala/ai/chronon/flink/window/FlinkRowAggregators.scala
  • online/src/main/scala/ai/chronon/online/serde/AvroCodec.scala
  • spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala
  • spark/src/test/scala/ai/chronon/spark/test/udafs/ApproxDistinctTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEntitiesSnapshotTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/HeterogeneousPartitionColumnsTest.scala
  • tools/build_rules/scala_config.bzl
  • online/src/main/scala/ai/chronon/online/OnlineDerivationUtil.scala
  • api/src/test/scala/ai/chronon/api/test/planner/StagingQueryPlannerTest.scala
  • tools/build_rules/thrift/thrift.bzl
  • api/src/test/scala/ai/chronon/api/test/planner/GroupByPlannerTest.scala
  • tools/build_rules/extensions/BUILD.bazel
  • spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala
  • spark/src/main/scala/ai/chronon/spark/stats/StatsCompute.scala
  • spark/src/main/scala/ai/chronon/spark/join/UnionJoin.scala
  • api/python/ai/chronon/repo/hub_runner.py
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigQueryCatalogTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala
  • api/src/main/scala/ai/chronon/api/planner/GroupByPlanner.scala
  • spark/src/test/scala/ai/chronon/spark/test/batch/EvalTest.scala
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala
  • spark/src/main/scala/ai/chronon/spark/batch/Eval.scala
  • spark/src/test/scala/ai/chronon/spark/test/batch/ModularJoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/SawtoothUdfSpec.scala
  • spark/src/test/scala/ai/chronon/spark/test/udafs/UDAFSQLUsageTest.scala
  • api/src/test/scala/ai/chronon/api/test/planner/LocalRunnerTest.scala
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala
  • tools/build_rules/jar_library.bzl
  • BUILD.bazel
  • api/src/main/scala/ai/chronon/api/DataRange.scala
  • spark/src/main/scala/ai/chronon/spark/Analyzer.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherMetadataTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/MigrationCompareTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/submission/JobSubmitterTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/SawtoothUdfPerformanceTest.scala
  • api/src/main/scala/ai/chronon/api/planner/JoinPlanner.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherUniqueTopKTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala
  • spark/src/main/scala/ai/chronon/spark/scripts/DataServer.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherDeterministicTest.scala
  • aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala
  • api/python/test/canary/joins/gcp/item_event_join.py
  • api/python/ai/chronon/repo/hub_utils.py
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherGeneratedTest.scala
  • api/python/test/canary/compiled/joins/gcp/item_event_join.canary_batch_v1
  • spark/src/main/scala/ai/chronon/spark/kv_store/KVUploadNodeRunner.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTiledTest.scala
  • api/src/main/scala/ai/chronon/api/Extensions.scala
  • tools/build_rules/extensions/custom_deps.bzl
  • spark/BUILD.bazel
  • api/src/main/scala/ai/chronon/api/DataPointer.scala
  • flink/src/main/scala/ai/chronon/flink/SparkExpressionEval.scala
  • api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
  • tools/build_rules/prelude_bazel
  • spark/src/test/scala/ai/chronon/spark/test/join/UnionJoinTest.scala
  • online/src/main/scala/ai/chronon/online/MetadataEndPoint.scala
  • api/python/ai/chronon/repo/hub_uploader.py
  • api/python/test/canary/compiled/joins/gcp/item_event_join.canary_streaming_v1
  • online/src/main/scala/ai/chronon/online/fetcher/MetadataStore.scala
  • api/python/test/canary/compiled/joins/gcp/item_event_join.canary_combined_v1
  • spark/src/test/scala/ai/chronon/spark/test/batch/ShortNamesTest.scala
  • tools/build_rules/spark/BUILD
  • aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationBootstrapTest.scala (10)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit `ExecutionContext` parameter can cause serialization issues. In such cases, it's acceptable to use `scala.concurrent.ExecutionContext.Implicits.global`.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/store/DynamoDBMonitoringStore.scala:98-143
Timestamp: 2024-10-15T15:30:15.514Z
Learning: In the Scala file `hub/app/store/DynamoDBMonitoringStore.scala`, within the `makeLoadedConfs` method, the `.recover` method is correctly applied to the `Try` returned by `response.values` to handle exceptions from the underlying store.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#47
File: online/src/main/scala/ai/chronon/online/MetadataStore.scala:232-0
Timestamp: 2024-10-17T00:12:09.763Z
Learning: In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/KVStore.scala`, there are two `create` methods: `def create(dataset: String): Unit` and `def create(dataset: String, props: Map[String, Any]): Unit`. The version with `props` ignores the `props` parameter, and the simpler version without `props` is appropriate when `props` are not needed.
spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationLoggingTest.scala (4)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
tools/build_rules/artifact.bzl (2)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
spark/src/test/scala/ai/chronon/spark/test/batch/BatchNodeRunnerTest.scala (7)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:13-16
Timestamp: 2024-10-31T18:27:44.973Z
Learning: In `MockKVStore.scala`, the `create` method should reset the dataset even if the dataset already exists.
Learnt from: tchow-zlai
PR: zipline-ai/chronon#263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
revert_whitespace_changes.py (1)
Learnt from: chewy-zlai
PR: zipline-ai/chronon#46
File: docker-init/generate_anomalous_data.py:0-0
Timestamp: 2024-10-15T19:03:19.403Z
Learning: If `generate_anomalous_data.py` contains unintended changes due to accidental commits, disregard reviewing this file unless instructed otherwise.
api/python/test/sample/group_bys/walmart/customer_features.py (4)
Learnt from: chewy-zlai
PR: zipline-ai/chronon#30
File: api/py/test/sample/group_bys/risk/transaction_events.py:15-25
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the file `api/py/test/sample/group_bys/risk/transaction_events.py`, the function `create_transaction_source` is example code, and parameterizing the table name `"data.txn_events"` is unnecessary.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#30
File: api/py/test/sample/group_bys/risk/transaction_events.py:15-25
Timestamp: 2024-10-03T19:00:27.898Z
Learning: In the file `api/py/test/sample/group_bys/risk/transaction_events.py`, the function `create_transaction_source` is example code, and parameterizing the table name `"data.txn_events"` is unnecessary.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#30
File: api/py/test/sample/group_bys/risk/transaction_events.py:29-46
Timestamp: 2024-10-03T19:03:22.508Z
Learning: In `api/py/test/sample/group_bys/risk/transaction_events.py`, the code is example code and does not need parameterizing.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#30
File: api/py/test/sample/group_bys/risk/transaction_events.py:29-46
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `api/py/test/sample/group_bys/risk/transaction_events.py`, the code is example code and does not need parameterizing.
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (11)
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In `hub/test/store/DynamoDBMonitoringStoreTest.scala`, the current implementation of the `generateListResponse` method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In `MockKVStore` located at `spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala`, the `multiPut` method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/store/DynamoDBMonitoringStore.scala:98-143
Timestamp: 2024-10-15T15:30:15.514Z
Learning: In the Scala file `hub/app/store/DynamoDBMonitoringStore.scala`, within the `makeLoadedConfs` method, the `.recover` method is correctly applied to the `Try` returned by `response.values` to handle exceptions from the underlying store.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:13-16
Timestamp: 2024-10-31T18:27:44.973Z
Learning: In `MockKVStore.scala`, the `create` method should reset the dataset even if the dataset already exists.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:19-47
Timestamp: 2024-11-03T14:51:40.825Z
Learning: In Scala, the `grouped` method on collections returns an iterator, allowing for efficient batch processing without accumulating all records in memory.
Learnt from: piyush-zlai
PR: zipline-ai/chronon#44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
MODULE.bazel (4)
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Learnt from: nikhil-zlai
PR: zipline-ai/chronon#70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Learnt from: chewy-zlai
PR: zipline-ai/chronon#789
File: api/python/ai/chronon/repo/cluster.py:16-16
Timestamp: 2025-05-23T22:52:27.605Z
Learning: The project uses terraform to provision the required "dataproc@${project_id}.iam.gserviceaccount.com" service account, making hardcoded service account patterns in Dataproc configurations reliable and acceptable.
🧬 Code Graph Analysis (2)
spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationBootstrapTest.scala (1)
spark/src/main/scala/ai/chronon/spark/submission/SparkSessionBuilder.scala (1)
  • SparkSessionBuilder (31-206)
spark/src/test/scala/ai/chronon/spark/test/batch/BatchNodeRunnerTest.scala (7)
spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (3)
  • batch (53-65)
  • BatchNodeRunner (44-236)
  • runFromArgs (136-208)
api/src/main/scala/ai/chronon/api/planner/TableDependencies.scala (2)
  • TableDependencies (9-140)
  • fromTable (111-138)
spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala (1)
  • MockKVStore (9-35)
spark/src/test/scala/ai/chronon/spark/test/TableTestUtils.scala (1)
  • TableTestUtils (6-41)
spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (1)
  • MockApi (51-151)
api/src/main/scala/ai/chronon/api/PartitionSpec.scala (1)
  • at (46-49)
api/src/main/scala/ai/chronon/api/ThriftJsonCodec.scala (2)
  • ThriftJsonCodec (41-137)
  • toJsonStr (51-53)
🔇 Additional comments (28)
tools/build_rules/artifact.bzl (2)

4-9: LGTM - Clean bzlmod migration.

Hardcoding constants and repository names is the correct approach for bzlmod.


49-54: Approve repository consolidation.

Routing all artifacts through the main maven repository simplifies dependency management in bzlmod.

MODULE.bazel (5)

1-4: Solid bzlmod module declaration.

Clean project setup with appropriate versioning.


6-22: Comprehensive core dependencies.

Python toolchain configuration with CI-friendly root user handling is well done.


24-44: Excellent Scala integration.

Proper Scala version pinning and extension configuration.


46-288: Thorough dependency management.

Comprehensive artifact list with logical grouping and proper conflict resolution via exclusions.


291-294: Clean custom dependency handling.

Legacy zlib extension properly integrated.

revert_whitespace_changes.py (3)

36-67: LGTM!

Clean diff parsing implementation.


69-90: LGTM!

Correctly identifies whitespace-only changes.


1-198: Is this file related to the Bazel upgrade?

This utility script seems unrelated to the PR objectives.

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (4)

3-50: LGTM!

Import reorganization and additions support the new temporal fetch comparison functionality.


51-61: Good formatting improvement.

Multi-line parameters enhance readability.


337-365: Clean schema formatting.

Multi-line field definitions improve readability.


604-672: Well-structured aggregation definitions.

Consistent formatting makes complex configurations easier to understand.

spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationLoggingTest.scala (2)

234-240: Excellent test documentation.

Clear comments explain the expected behavior for each scenario.


309-344: Well-structured group-by derivation test.

Properly validates derived column computation.

spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationBootstrapTest.scala (1)

36-42: Good test separation.

Class rename reflects focused bootstrap testing scope.

api/python/test/sample/group_bys/walmart/customer_features.py (4)

36-45: LGTM!

Clean implementation for tracking payment instrument usage.


47-66: Well-structured entity source

Properly configured with snapshot and mutation sources.


68-86: Clean reward tracking setup

Good separation of source and aggregation.


134-141: Fix syntax error - missing comma

Add comma after electronic_amt expression.

                         )
-                    """,            
+                    """,

Likely an incorrect or invalid review comment.

spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala (4)

3-19: LGTM!

Imports are appropriate for the enhanced functionality.


98-124: Good enhancement to pass stepDays parameter.

Correctly extracts stepDays from metadata and passes it to join computation.


136-208: Excellent partition validation implementation.

The enhanced logic properly validates input partitions before execution and tracks output partitions after completion.


145-153: Validate PartitionSpec compatibility in BatchNodeRunner.scala

The call to _.translate(inputPartSpec) (spark/src/main/scala/ai/chronon/spark/batch/BatchNodeRunner.scala:151) may produce incorrect ranges if inputPartSpec doesn’t match the spec used by DependencyResolver.computeInputRange. Please ensure:

  • The PartitionSpec returned by td.getTableInfo.partitionSpec aligns in unit/format with the one used to compute ranges.
  • Add a runtime check or document the supported spec conversions to fail fast on mismatches.

Snippet for review:

  td.getTableInfo.table ->
    DependencyResolver.computeInputRange(range, td)
      .map(_.translate(inputPartSpec)) // ← verify spec compatibility here
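
The "fail fast on mismatches" suggestion above could look roughly like the following, written in Python for brevity rather than Scala. The `PartitionSpec` class here is a hypothetical stand-in with assumed fields (`fmt`, `span_millis`), not the Chronon class, and the rule that only the span must match is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionSpec:
    # Hypothetical stand-in; the real Chronon class may carry different fields.
    fmt: str          # date format string, e.g. "yyyy-MM-dd"
    span_millis: int  # width of one partition in milliseconds


def require_compatible(input_spec, range_spec):
    """Raise ValueError before translating if the specs cannot line up.

    Assumption: differing formats are translatable, differing spans are not.
    """
    if input_spec.span_millis != range_spec.span_millis:
        raise ValueError(
            f"PartitionSpec mismatch: input span {input_spec.span_millis} ms "
            f"!= range span {range_spec.span_millis} ms"
        )
```

Calling such a check just before the translation step would surface a mismatch as an explicit error instead of a silently wrong range.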
spark/src/test/scala/ai/chronon/spark/test/batch/BatchNodeRunnerTest.scala (3)

39-56: Well-designed mock for tracking KV store operations.

Clean implementation for test verification.


58-194: Comprehensive test setup and helpers.

Good use of local Spark session and proper lifecycle management.


195-431: Excellent test coverage for partition validation logic.

Tests cover all critical scenarios including edge cases and partition translation.

Comment on lines 165 to 167
    with open(filename, 'w') as f:
        f.write(new_content)

🛠️ Refactor suggestion

Add error handling for file write

File operations can fail.

-    with open(filename, 'w') as f:
-        f.write(new_content)
+    try:
+        with open(filename, 'w') as f:
+            f.write(new_content)
+    except IOError as e:
+        print(f"  Error writing file: {e}")
+        return False
🤖 Prompt for AI Agents
In revert_whitespace_changes.py around lines 165 to 167, the file write
operation lacks error handling which can cause the program to crash if the write
fails. Wrap the file open and write calls in a try-except block to catch
exceptions like IOError or OSError, and handle them appropriately, such as
logging the error or raising a custom exception.
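
A self-contained version of the suggested change might look like this. The wrapper function `write_file` and its boolean return value are assumptions for illustration; the surrounding function in `revert_whitespace_changes.py` is not shown in the review. In Python 3, `IOError` is an alias of `OSError`, so catching `OSError` covers both names mentioned above.

```python
def write_file(filename, new_content):
    """Write new_content to filename; return True on success, False on failure.

    Hypothetical wrapper around the two lines under review.
    """
    try:
        with open(filename, "w") as f:
            f.write(new_content)
    except OSError as e:  # IOError is an alias of OSError in Python 3
        print(f"  Error writing file: {e}")
        return False
    return True
```

Returning a boolean lets the caller skip or report a failed file and continue with the rest, rather than aborting the whole run.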

@nikhil-zlai nikhil-zlai force-pushed the nikhil/bzlmod_migration branch 2 times, most recently from d370da6 to 97b41f3 Compare July 30, 2025 17:50
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
MODULE.bazel (1)

96-104: Spark version consistency resolved.

All Spark dependencies now use 3.5.3. Good fix!

Also applies to: 247-248

🧹 Nitpick comments (1)
tools/build_rules/artifact.bzl (1)

8-9: Consider centralizing Scala version configuration.

Hardcoding versions here creates maintenance overhead. Consider loading from //tools/build_rules:scala_config.bzl instead.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c2e5965 and e90530c.

⛔ Files ignored due to path filters (1)
  • MODULE.bazel.lock is excluded by !**/*.lock
📒 Files selected for processing (70)
  • .bazeliskrc (1 hunks)
  • .bazelrc (1 hunks)
  • .bazelversion (1 hunks)
  • .gitignore (1 hunks)
  • BUILD.bazel (1 hunks)
  • MODULE.bazel (6 hunks)
  • WORKSPACE (0 hunks)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (2 hunks)
  • api/python/test/canary/compiled/joins/gcp/item_event_join.canary_batch_v1 (1 hunks)
  • api/python/test/canary/compiled/joins/gcp/item_event_join.canary_combined_v1 (1 hunks)
  • api/python/test/canary/compiled/joins/gcp/item_event_join.canary_streaming_v1 (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala (2 hunks)
  • api/src/test/scala/ai/chronon/api/test/planner/GroupByPlannerTest.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/planner/LocalRunnerTest.scala (1 hunks)
  • api/src/test/scala/ai/chronon/api/test/planner/MonolithJoinPlannerTest.scala (3 hunks)
  • api/src/test/scala/ai/chronon/api/test/planner/StagingQueryPlannerTest.scala (2 hunks)
  • api/thrift/orchestration.thrift (1 hunks)
  • cloud_gcp/BUILD.bazel (0 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (2 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (9 hunks)
  • flink/src/test/scala/ai/chronon/flink/test/window/FlinkRowAggregationFunctionTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/DataStreamBuilderTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala (3 hunks)
  • online/src/test/scala/ai/chronon/online/test/GroupByDerivationsTest.scala (3 hunks)
  • online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala (1 hunks)
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (12 hunks)
  • spark/BUILD.bazel (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/StatsComputeTest.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/batch/BatchNodeRunnerTest.scala (9 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherDeterministicTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherFailureTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherGeneratedTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherMetadataTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala (16 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTiledTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherUniqueTopKTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/JavaFetchTypesTest.java (0 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/SawtoothUdfPerformanceTest.scala (4 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/SawtoothUdfSpec.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/UnionJoinSpec.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/UnionJoinTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/submission/JobSubmitterTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/udafs/ApproxDistinctTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/udafs/HistogramTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/udafs/NullnessCountersAggregatorTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/udafs/UDAFSQLUsageTest.scala (3 hunks)
  • tools/build_rules/artifact.bzl (2 hunks)
  • tools/build_rules/common.bzl (2 hunks)
  • tools/build_rules/dependencies/all_repositories.bzl (0 hunks)
  • tools/build_rules/dependencies/defs.bzl (0 hunks)
  • tools/build_rules/dependencies/load_dependencies.bzl (0 hunks)
  • tools/build_rules/dependencies/scala_repository.bzl (0 hunks)
  • tools/build_rules/extensions/BUILD.bazel (1 hunks)
  • tools/build_rules/extensions/custom_deps.bzl (1 hunks)
  • tools/build_rules/jar_library.bzl (1 hunks)
  • tools/build_rules/jvm_binary.bzl (1 hunks)
  • tools/build_rules/prelude_bazel (2 hunks)
  • tools/build_rules/scala_config.bzl (1 hunks)
  • tools/build_rules/scala_junit_test_suite.bzl (1 hunks)
  • tools/build_rules/spark/BUILD (1 hunks)
💤 Files with no reviewable changes (7)
  • tools/build_rules/dependencies/all_repositories.bzl
  • cloud_gcp/BUILD.bazel
  • tools/build_rules/dependencies/load_dependencies.bzl
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/JavaFetchTypesTest.java
  • WORKSPACE
  • tools/build_rules/dependencies/scala_repository.bzl
  • tools/build_rules/dependencies/defs.bzl
✅ Files skipped from review due to trivial changes (12)
  • spark/src/test/scala/ai/chronon/spark/test/join/NoAggTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsTemporalTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/UnionJoinTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/SawtoothUdfSpec.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/VersioningTest.scala
  • tools/build_rules/scala_junit_test_suite.bzl
  • spark/src/test/scala/ai/chronon/spark/test/udafs/ApproxDistinctTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/udafs/NullnessCountersAggregatorTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/groupby/GroupByUploadTest.scala
  • api/src/test/scala/ai/chronon/api/test/planner/MonolithJoinPlannerTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/udafs/UDAFSQLUsageTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EventsEventsCumulativeTest.scala
🚧 Files skipped from review as they are similar to previous changes (45)
  • .bazelversion
  • .gitignore
  • spark/src/test/scala/ai/chronon/spark/test/join/StructJoinTest.scala
  • aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala
  • online/src/test/scala/ai/chronon/online/test/ThriftDecodingTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/EntitiesEntitiesTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/udafs/HistogramTest.scala
  • .bazeliskrc
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherFailureTest.scala
  • tools/build_rules/scala_config.bzl
  • online/src/test/scala/ai/chronon/online/test/ListJoinsTest.scala
  • api/src/test/scala/ai/chronon/api/test/planner/LocalRunnerTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/DynamicPartitionOverwriteTest.scala
  • .bazelrc
  • tools/build_rules/jvm_binary.bzl
  • spark/src/test/scala/ai/chronon/spark/test/join/UnionJoinSpec.scala
  • flink/src/test/scala/ai/chronon/flink/test/window/FlinkRowAggregationFunctionTest.scala
  • online/src/test/scala/ai/chronon/online/test/FetcherBaseTest.scala
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/join/MigrationTest.scala
  • online/src/test/scala/ai/chronon/online/test/GroupByDerivationsTest.scala
  • tools/build_rules/common.bzl
  • spark/src/test/scala/ai/chronon/spark/test/StatsComputeTest.scala
  • BUILD.bazel
  • api/src/test/scala/ai/chronon/api/test/planner/GroupByPlannerTest.scala
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/submission/JobSubmitterTest.scala
  • api/src/test/scala/ai/chronon/api/test/planner/StagingQueryPlannerTest.scala
  • online/src/test/scala/ai/chronon/online/test/DataStreamBuilderTest.scala
  • spark/BUILD.bazel
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala
  • api/src/test/scala/ai/chronon/api/test/TileSeriesSerializationTest.scala
  • tools/build_rules/jar_library.bzl
  • tools/build_rules/extensions/custom_deps.bzl
  • tools/build_rules/extensions/BUILD.bazel
  • api/thrift/orchestration.thrift
  • api/python/test/canary/compiled/joins/gcp/item_event_join.canary_batch_v1
  • tools/build_rules/prelude_bazel
  • spark/src/test/scala/ai/chronon/spark/test/join/SawtoothUdfPerformanceTest.scala
  • api/python/test/canary/compiled/joins/gcp/item_event_join.canary_combined_v1
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherMetadataTest.scala
  • api/python/test/canary/compiled/joins/gcp/item_event_join.canary_streaming_v1
  • spark/src/test/scala/ai/chronon/spark/test/batch/BatchNodeRunnerTest.scala
  • tools/build_rules/spark/BUILD
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTestUtil.scala
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherDeterministicTest.scala (8)

Learnt from: piyush-zlai
PR: #33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within DynamoDBKVStoreTest.scala is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.

Learnt from: nikhil-zlai
PR: #70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import scala.util.ScalaVersionSpecificCollectionsConverter in service/src/main/java/ai/chronon/service/ApiProvider.java is correct and should not be flagged in future reviews.

Learnt from: piyush-zlai
PR: #33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In DynamoDBKVStoreImpl.scala, refactoring methods like extractTimedValues and extractListValues to eliminate code duplication is discouraged if it would make the code more convoluted.

Learnt from: piyush-zlai
PR: #44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to MockDataService in hub/test/controllers/SearchControllerSpec.scala and hub/test/controllers/ModelControllerSpec.scala are needed for tests and should not be removed.

Learnt from: chewy-zlai
PR: #62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit ExecutionContext parameter can cause serialization issues. In such cases, it's acceptable to use scala.concurrent.ExecutionContext.Implicits.global.

Learnt from: chewy-zlai
PR: #50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In MockKVStore located at spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala, the multiPut method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.

Learnt from: chewy-zlai
PR: #47
File: online/src/main/scala/ai/chronon/online/MetadataStore.scala:232-0
Timestamp: 2024-10-17T00:12:09.763Z
Learning: In the KVStore trait located at online/src/main/scala/ai/chronon/online/KVStore.scala, there are two create methods: def create(dataset: String): Unit and def create(dataset: String, props: Map[String, Any]): Unit. The version with props ignores the props parameter, and the simpler version without props is appropriate when props are not needed.

Learnt from: chewy-zlai
PR: #50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:13-16
Timestamp: 2024-10-31T18:27:44.973Z
Learning: In MockKVStore.scala, the create method should reset the dataset even if the dataset already exists.

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherGeneratedTest.scala (8)

Learnt from: nikhil-zlai
PR: #70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import scala.util.ScalaVersionSpecificCollectionsConverter in service/src/main/java/ai/chronon/service/ApiProvider.java is correct and should not be flagged in future reviews.

Learnt from: piyush-zlai
PR: #33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within DynamoDBKVStoreTest.scala is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.

Learnt from: piyush-zlai
PR: #44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to MockDataService in hub/test/controllers/SearchControllerSpec.scala and hub/test/controllers/ModelControllerSpec.scala are needed for tests and should not be removed.

Learnt from: piyush-zlai
PR: #43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In hub/app/controllers/TimeSeriesController.scala, the generateMockTimeSeriesPercentilePoints method contains placeholder code that will be replaced with the actual implementation soon.

Learnt from: piyush-zlai
PR: #44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In hub/test/store/DynamoDBMonitoringStoreTest.scala, the current implementation of the generateListResponse method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.

Learnt from: chewy-zlai
PR: #62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit ExecutionContext parameter can cause serialization issues. In such cases, it's acceptable to use scala.concurrent.ExecutionContext.Implicits.global.

Learnt from: piyush-zlai
PR: #53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in hub/app/controllers/TimeSeriesController.scala, potential NumberFormatException exceptions due to parsing errors (e.g., when using val featureId = name.split("_").last.toInt) are acceptable and will be addressed when adding the concrete backend.

Learnt from: chewy-zlai
PR: #50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In MockKVStore located at spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala, the multiPut method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTiledTest.scala (7)

Learnt from: nikhil-zlai
PR: #70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import scala.util.ScalaVersionSpecificCollectionsConverter in service/src/main/java/ai/chronon/service/ApiProvider.java is correct and should not be flagged in future reviews.

Learnt from: piyush-zlai
PR: #33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within DynamoDBKVStoreTest.scala is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.

Learnt from: chewy-zlai
PR: #47
File: online/src/main/scala/ai/chronon/online/MetadataStore.scala:232-0
Timestamp: 2024-10-17T00:12:09.763Z
Learning: In the KVStore trait located at online/src/main/scala/ai/chronon/online/KVStore.scala, there are two create methods: def create(dataset: String): Unit and def create(dataset: String, props: Map[String, Any]): Unit. The version with props ignores the props parameter, and the simpler version without props is appropriate when props are not needed.

Learnt from: chewy-zlai
PR: #62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit ExecutionContext parameter can cause serialization issues. In such cases, it's acceptable to use scala.concurrent.ExecutionContext.Implicits.global.

Learnt from: piyush-zlai
PR: #33
File: online/src/main/scala/ai/chronon/online/Api.scala:69-69
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the KVStore trait located at online/src/main/scala/ai/chronon/online/Api.scala, the default implementation of the create method (def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)) doesn't leverage the props parameter, but subclasses like DynamoDBKVStoreImpl use the props parameter in their overridden implementations.

Learnt from: piyush-zlai
PR: #33
File: online/src/main/scala/ai/chronon/online/Api.scala:69-69
Timestamp: 2024-10-07T15:21:50.787Z
Learning: In the KVStore trait located at online/src/main/scala/ai/chronon/online/Api.scala, the default implementation of the create method (def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)) doesn't leverage the props parameter, but subclasses like DynamoDBKVStoreImpl use the props parameter in their overridden implementations.

Learnt from: chewy-zlai
PR: #50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In MockKVStore located at spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala, the multiPut method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherUniqueTopKTest.scala (9)

Learnt from: nikhil-zlai
PR: #70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import scala.util.ScalaVersionSpecificCollectionsConverter in service/src/main/java/ai/chronon/service/ApiProvider.java is correct and should not be flagged in future reviews.

Learnt from: piyush-zlai
PR: #33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within DynamoDBKVStoreTest.scala is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.

Learnt from: chewy-zlai
PR: #50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In MockKVStore located at spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala, the multiPut method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.

Learnt from: piyush-zlai
PR: #33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In DynamoDBKVStoreImpl.scala, refactoring methods like extractTimedValues and extractListValues to eliminate code duplication is discouraged if it would make the code more convoluted.

Learnt from: piyush-zlai
PR: #44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to MockDataService in hub/test/controllers/SearchControllerSpec.scala and hub/test/controllers/ModelControllerSpec.scala are needed for tests and should not be removed.

Learnt from: piyush-zlai
PR: #33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In the codebase, the KVStore implementation provides an implicit ExecutionContext in scope, so it's unnecessary to import another.

Learnt from: chewy-zlai
PR: #62
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:9-10
Timestamp: 2024-11-06T21:54:56.160Z
Learning: In Spark applications, when defining serializable classes, passing an implicit ExecutionContext parameter can cause serialization issues. In such cases, it's acceptable to use scala.concurrent.ExecutionContext.Implicits.global.

Learnt from: chewy-zlai
PR: #47
File: online/src/main/scala/ai/chronon/online/MetadataStore.scala:232-0
Timestamp: 2024-10-17T00:12:09.763Z
Learning: In the KVStore trait located at online/src/main/scala/ai/chronon/online/KVStore.scala, there are two create methods: def create(dataset: String): Unit and def create(dataset: String, props: Map[String, Any]): Unit. The version with props ignores the props parameter, and the simpler version without props is appropriate when props are not needed.

Learnt from: chewy-zlai
PR: #50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:13-16
Timestamp: 2024-10-31T18:27:44.973Z
Learning: In MockKVStore.scala, the create method should reset the dataset even if the dataset already exists.

tools/build_rules/artifact.bzl (2)

Learnt from: tchow-zlai
PR: #393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

Learnt from: nikhil-zlai
PR: #70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import scala.util.ScalaVersionSpecificCollectionsConverter in service/src/main/java/ai/chronon/service/ApiProvider.java is correct and should not be flagged in future reviews.

MODULE.bazel (4)

Learnt from: tchow-zlai
PR: #393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

Learnt from: nikhil-zlai
PR: #70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import scala.util.ScalaVersionSpecificCollectionsConverter in service/src/main/java/ai/chronon/service/ApiProvider.java is correct and should not be flagged in future reviews.

Learnt from: chewy-zlai
PR: #47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files spark-assembly-0.1.0-SNAPSHOT.jar and cloud_aws-assembly-0.1.0-SNAPSHOT.jar are generated by sbt and located in the target directory after the build.

Learnt from: chewy-zlai
PR: #789
File: api/python/ai/chronon/repo/cluster.py:16-16
Timestamp: 2025-05-23T22:52:27.605Z
Learning: The project uses terraform to provision the required "dataproc@${project_id}.iam.gserviceaccount.com" service account, making hardcoded service account patterns in Dataproc configurations reliable and acceptable.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: analyzer_tests
  • GitHub Check: spark_tests
  • GitHub Check: streaming_tests
  • GitHub Check: groupby_tests
  • GitHub Check: join_tests
  • GitHub Check: batch_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: flink_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: service_tests
  • GitHub Check: online_tests
  • GitHub Check: api_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: python_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (9)
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherGeneratedTest.scala (2)

33-33: LGTM! Consistent implicit parameter pattern.

Making tableUtils implicit aligns with the refactoring across fetcher tests.


46-46: LGTM! Simplified method call.

Removing explicit spark parameter leverages implicit resolution.

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherDeterministicTest.scala (2)

33-33: LGTM! Consistent implicit parameter pattern.

Making tableUtils implicit aligns with the refactoring across fetcher tests.


43-43: LGTM! Simplified method call.

Removing explicit spark parameter leverages implicit resolution.

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherUniqueTopKTest.scala (2)

33-33: LGTM! Consistent implicit parameter pattern.

Making tableUtils implicit aligns with the refactoring across fetcher tests.


43-43: LGTM! Simplified method call.

Removing explicit spark parameter leverages implicit resolution.

spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTiledTest.scala (2)

33-33: LGTM! Consistent implicit parameter pattern.

Making tableUtils implicit aligns with the refactoring across fetcher tests.


44-44: LGTM! Simplified method call.

Removing explicit spark parameter leverages implicit resolution.

tools/build_rules/artifact.bzl (1)

4-5: LGTM!

Clean consolidation to single Maven repository for bzlmod.

Also applies to: 46-46, 53-54

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
MODULE.bazel (1)

63-63: Consider removing mockito-core.

mockito-inline includes mockito-core functionality.

Also applies to: 70-70

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36b2473 and 674ebde.

📒 Files selected for processing (1)
  • MODULE.bazel (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: service_tests
  • GitHub Check: scala_compile_fmt_fix


# Delta Lake and Iceberg
"io.delta:delta-spark_2.12:3.2.0",
"org.apache.hudi:hudi-spark3.5-bundle_2.12:1.0.0",

⚠️ Potential issue

Remove duplicate Hudi dependency.

The artifact org.apache.hudi:hudi-spark3.5-bundle_2.12:1.0.0 is declared at both line 128 and line 183 of MODULE.bazel.

-        "org.apache.hudi:hudi-spark3.5-bundle_2.12:1.0.0",

Also applies to: 183-183

🤖 Prompt for AI Agents
In MODULE.bazel at lines 128 and 183, the dependency
"org.apache.hudi:hudi-spark3.5-bundle_2.12:1.0.0" is duplicated. Remove one of
these duplicate entries to avoid redundancy and potential conflicts in
dependency resolution.
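
A duplicate like this can be caught mechanically before review. The following is a minimal Python sketch, not part of the PR: it assumes the artifact coordinates have already been extracted into a plain list of strings, and the helper name `find_duplicate_artifacts` is hypothetical. Only the Delta and Hudi coordinates are taken from the comment above.

```python
from collections import Counter


def find_duplicate_artifacts(artifacts):
    """Return coordinates that occur more than once, in first-seen order."""
    counts = Counter(artifacts)
    seen = set()
    duplicates = []
    for coord in artifacts:
        if counts[coord] > 1 and coord not in seen:
            seen.add(coord)
            duplicates.append(coord)
    return duplicates


# Hypothetical excerpt of an artifact list; the repetition mirrors the
# duplicate flagged in the review comment.
artifacts = [
    "io.delta:delta-spark_2.12:3.2.0",
    "org.apache.hudi:hudi-spark3.5-bundle_2.12:1.0.0",
    "org.apache.hudi:hudi-spark3.5-bundle_2.12:1.0.0",
]

for coord in find_duplicate_artifacts(artifacts):
    print("duplicate artifact:", coord)
```

The check compares full coordinate strings, so two entries for the same artifact at different versions would not be reported here.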

@nikhil-zlai nikhil-zlai force-pushed the nikhil/bzlmod_migration branch from d68ce7e to 5a6bf89 Compare July 30, 2025 21:49

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (3)
maven_install.json (3)

1814-1944: Netty versions inconsistent (4.1.111 vs 4.1.117).
Runtime class clashes likely. Pin all Netty artifacts to one version.


1260-1280: Both okhttp 2.x and okhttp3 present.
Different APIs + transitive deps → conflict & CVEs. Drop 2.x unless truly needed.

Also applies to: 6338-6344, 11305-11312


229-300: Jackson 2.18.2 mixed with 2.13.4.
Incompatible core/annotations; consolidate to one series.

Also applies to: 264-285
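
The three findings above share one pattern: a single Maven group resolved at more than one version. A rough Python sketch of that check follows. It assumes coordinates in `group:artifact:version` form have already been pulled out of the lockfile (the lockfile layout itself is not modeled here), and the function name and the sample artifact names are hypothetical; only the version numbers come from the comments above.

```python
from collections import defaultdict


def find_version_skew(coordinates):
    """Map each Maven group to its sorted versions when more than one is present."""
    versions_by_group = defaultdict(set)
    for coord in coordinates:
        parts = coord.split(":")
        # First field is the group, last field is the version; any
        # packaging or classifier fields in between are ignored.
        versions_by_group[parts[0]].add(parts[-1])
    return {
        group: sorted(versions)
        for group, versions in versions_by_group.items()
        if len(versions) > 1
    }


# Illustrative coordinates (artifact names are assumptions).
sample = [
    "io.netty:netty-buffer:4.1.111.Final",
    "io.netty:netty-common:4.1.117.Final",
    "com.fasterxml.jackson.core:jackson-core:2.18.2",
    "com.fasterxml.jackson.core:jackson-annotations:2.13.4",
    "org.example:single-version-lib:1.0.0",
]

for group, versions in find_version_skew(sample).items():
    print(group, "->", ", ".join(versions))
```

Grouping by Maven group is a heuristic: some groups legitimately ship artifacts on independent version lines, so hits should be treated as candidates for inspection rather than confirmed conflicts.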

♻️ Duplicate comments (1)
MODULE.bazel (1)

127-127: Remove duplicate Hudi dependency.

org.apache.hudi:hudi-spark3.5-bundle_2.12:1.0.0 appears twice.

-        "org.apache.hudi:hudi-spark3.5-bundle_2.12:1.0.0",

Also applies to: 183-183

🧹 Nitpick comments (2)
scripts/distribution/run_gcp_hub_quickstart.sh (1)

145-149: LGTM! Consider simplifying the curl command.

The conditional authentication is correctly implemented.

-        if [[ -z "$GCP_ID_TOKEN" ]]; then
-          response=$(curl -s -X GET "$hub_url")
-        else
-          response=$(curl -s -X GET "$hub_url" -H "Authorization: Bearer $GCP_ID_TOKEN")
-        fi
+        auth_args=()
+        [[ -n "$GCP_ID_TOKEN" ]] && auth_args=(-H "Authorization: Bearer $GCP_ID_TOKEN")
+        response=$(curl -s -X GET "$hub_url" "${auth_args[@]}")
maven_install.json (1)

22140-22153: Lockfile ballooning with unused libs (bnd, caliper, picocli, etc.).
Consider the excluded_artifacts attribute of maven.install (rules_jvm_external) to shrink fetch & build times.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d68ce7e and 5a6bf89.

⛔ Files ignored due to path filters (1)
  • MODULE.bazel.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • .github/workflows/test_python.yaml (0 hunks)
  • .github/workflows/test_scala_2_12_non_spark.yaml (0 hunks)
  • .github/workflows/test_scala_fmt.yaml (0 hunks)
  • MODULE.bazel (6 hunks)
  • api/python/ai/chronon/repo/zipline_hub.py (2 hunks)
  • api/thrift/hub.thrift (0 hunks)
  • api/thrift/orchestration.thrift (0 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala (2 hunks)
  • maven_install.json (125 hunks)
  • scripts/distribution/run_gcp_hub_quickstart.sh (2 hunks)
💤 Files with no reviewable changes (5)
  • .github/workflows/test_python.yaml
  • api/thrift/hub.thrift
  • .github/workflows/test_scala_fmt.yaml
  • .github/workflows/test_scala_2_12_non_spark.yaml
  • api/thrift/orchestration.thrift
🚧 Files skipped from review as they are similar to previous changes (1)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreTest.scala
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
MODULE.bazel (5)

Learnt from: nikhil-zlai
PR: #70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import scala.util.ScalaVersionSpecificCollectionsConverter in service/src/main/java/ai/chronon/service/ApiProvider.java is correct and should not be flagged in future reviews.

Learnt from: tchow-zlai
PR: #393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

Learnt from: piyush-zlai
PR: #33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In DynamoDBKVStoreImpl.scala, refactoring methods like extractTimedValues and extractListValues to eliminate code duplication is discouraged if it would make the code more convoluted.

Learnt from: chewy-zlai
PR: #47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files spark-assembly-0.1.0-SNAPSHOT.jar and cloud_aws-assembly-0.1.0-SNAPSHOT.jar are generated by sbt and located in the target directory after the build.

Learnt from: chewy-zlai
PR: #789
File: api/python/ai/chronon/repo/cluster.py:16-16
Timestamp: 2025-05-23T22:52:27.605Z
Learning: The project uses terraform to provision the required "dataproc@${project_id}.iam.gserviceaccount.com" service account, making hardcoded service account patterns in Dataproc configurations reliable and acceptable.

maven_install.json (4)

Learnt from: tchow-zlai
PR: #393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

Learnt from: nikhil-zlai
PR: #70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import scala.util.ScalaVersionSpecificCollectionsConverter in service/src/main/java/ai/chronon/service/ApiProvider.java is correct and should not be flagged in future reviews.

Learnt from: chewy-zlai
PR: #47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files spark-assembly-0.1.0-SNAPSHOT.jar and cloud_aws-assembly-0.1.0-SNAPSHOT.jar are generated by sbt and located in the target directory after the build.

Learnt from: piyush-zlai
PR: #44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to MockDataService in hub/test/controllers/SearchControllerSpec.scala and hub/test/controllers/ModelControllerSpec.scala are needed for tests and should not be removed.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: streaming_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: spark_tests
  • GitHub Check: batch_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: service_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: online_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: api_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: flink_tests
  • GitHub Check: python_tests
  • GitHub Check: bazel_config_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
scripts/distribution/run_gcp_hub_quickstart.sh (1)

185-185: Correct error handling.

Good fix - returning 1 on failure is the proper convention.

maven_install.json (1)

26-38: ASM 3.1 and 7.2 both listed.
Old 3.x shaded packages collide with 7.x; verify consumers or exclude.

Also applies to: 4378-4400

Comment on lines 15 to 26
print("Using Google Cloud authentication for ZiplineHub.")
credentials, project_id = google.auth.default()
credentials.refresh(Request())
self.id_token = credentials.id_token

# First try to get ID token from environment (GitHub Actions)
self.id_token = os.getenv('GCP_ID_TOKEN')
if self.id_token:
print("Using ID token from environment")
else:
# Fallback to Google Cloud authentication
print("Generating ID token from default credentials")
credentials, project_id = google.auth.default()
credentials.refresh(Request())
self.id_token = credentials.id_token

🛠️ Refactor suggestion

Remove authentication method disclosure from logs.

The print statements reveal authentication details that could aid attackers.

-            print("Using Google Cloud authentication for ZiplineHub.")
-
             # First try to get ID token from environment (GitHub Actions)
             self.id_token = os.getenv('GCP_ID_TOKEN')
             if self.id_token:
-                print("Using ID token from environment")
+                pass  # Token loaded from environment
             else:
                 # Fallback to Google Cloud authentication
-                print("Generating ID token from default credentials")
                 credentials, project_id = google.auth.default()
                 credentials.refresh(Request())
                 self.id_token = credentials.id_token
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print("Using Google Cloud authentication for ZiplineHub.")
credentials, project_id = google.auth.default()
credentials.refresh(Request())
self.id_token = credentials.id_token
# First try to get ID token from environment (GitHub Actions)
self.id_token = os.getenv('GCP_ID_TOKEN')
if self.id_token:
print("Using ID token from environment")
else:
# Fallback to Google Cloud authentication
print("Generating ID token from default credentials")
credentials, project_id = google.auth.default()
credentials.refresh(Request())
self.id_token = credentials.id_token
            # First try to get ID token from environment (GitHub Actions)
            self.id_token = os.getenv('GCP_ID_TOKEN')
            if self.id_token:
                pass  # Token loaded from environment
            else:
                # Fallback to Google Cloud authentication
                credentials, project_id = google.auth.default()
                credentials.refresh(Request())
                self.id_token = credentials.id_token
🤖 Prompt for AI Agents
In api/python/ai/chronon/repo/zipline_hub.py between lines 15 and 26, remove or
replace the print statements that disclose authentication methods and token
usage to avoid revealing sensitive information. Instead of printing these
details, use logging at a debug level or remove them entirely to prevent
exposing authentication mechanisms to potential attackers.
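
One way to follow the debug-level option is sketched below. This is an illustration under stated assumptions, not the code in zipline_hub.py: `resolve_id_token` and `fetch_default_token` are hypothetical names, and the fallback callable stands in for the `google.auth.default()` plus `credentials.refresh(Request())` sequence so that the sketch runs without Google credentials.

```python
import logging
import os

logger = logging.getLogger(__name__)


def resolve_id_token(fetch_default_token, env=None):
    """Return the ID token from GCP_ID_TOKEN, else from the fallback callable.

    The token source is reported at DEBUG level only, so default logging
    configurations do not reveal which authentication path was taken.
    """
    env = os.environ if env is None else env
    token = env.get("GCP_ID_TOKEN")
    if token:
        logger.debug("ID token found in environment")
        return token
    logger.debug("ID token not in environment; using fallback")
    return fetch_default_token()


# Usage with a stand-in fallback; real code would refresh default
# credentials here instead of returning a constant.
token = resolve_id_token(lambda: "token-from-default-credentials", env={})
```

Passing the environment and the fallback as parameters keeps the function testable without touching real credentials; the token value itself is never logged.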

nikhil-zlai and others added 7 commits July 30, 2025 15:07
…tibility

Bazel 6 used chronon/ prefix while Bazel 8 requires _main/ prefix for RunFiles
paths in test files.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Keep only the essential Bazel migration changes (build files, test files for
RunFiles prefix updates) and revert all other code changes to maintain
compatibility with existing functionality.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Remove unnecessary test file changes that don't contain the essential
_main/ RunFiles prefix updates needed for Bazel 8 compatibility.

Keep only test files that have actual RunFiles path changes:
- GroupByPlannerTest.scala
- LocalRunnerTest.scala
- MonolithJoinPlannerTest.scala
- StagingQueryPlannerTest.scala
- DataprocSubmitterTest.scala
- FetcherMetadataTest.scala
- JobSubmitterTest.scala

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

…pec files

- Revert orchestration.thrift to pre-migration state
- Remove incorrectly named Python canary compiled files
- Restore original Python canary files with proper __0 suffix
- Revert SawtoothUdfSpec.scala and UnionJoinSpec.scala to pre-migration state

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

…taTest

Remove references to NameByTeamEndPointName and team metadata assertions
that don't exist in current codebase, while preserving the essential
RunFiles path change (_main/) needed for Bazel 8 compatibility.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
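
The runfiles prefix change described in these commits (chronon/ under Bazel 6, _main/ under Bazel 8) can be illustrated with a small lookup that tries both prefixes. The sketch below is in Python for brevity and is not the code from the Scala tests in this PR; `resolve_runfile` and the sample path are hypothetical, and the runfiles root is passed in explicitly rather than read from the Bazel environment.

```python
import os
import tempfile


def resolve_runfile(runfiles_root, relative_path, prefixes=("_main", "chronon")):
    """Return the first existing path under runfiles_root, trying each prefix in order."""
    for prefix in prefixes:
        candidate = os.path.join(runfiles_root, prefix, relative_path)
        if os.path.exists(candidate):
            return candidate
    return None


# Demonstration against a temporary directory laid out like a Bazel 8
# runfiles tree (the file name is made up for the example).
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "_main", "spark", "data.txt")
    os.makedirs(os.path.dirname(target))
    open(target, "w").close()
    found = resolve_runfile(root, os.path.join("spark", "data.txt"))
    missing = resolve_runfile(root, "absent.txt")
```

Trying `_main` first favors the newer layout while still resolving paths in a tree that uses the older prefix.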
@nikhil-zlai nikhil-zlai force-pushed the nikhil/bzlmod_migration branch from 5a6bf89 to 422694a Compare July 30, 2025 22:07

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9386951 and 0868673.

📒 Files selected for processing (1)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherFailureTest.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherFailureTest.scala (11)

Learnt from: chewy-zlai
PR: #50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:19-28
Timestamp: 2024-10-31T18:29:45.027Z
Learning: In MockKVStore located at spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala, the multiPut method is intended to be a simple implementation without dataset existence validation, duplicate validation logic elimination, or actual storage of key-value pairs for verification.

Learnt from: piyush-zlai
PR: #33
File: cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175
Timestamp: 2024-10-07T15:09:51.567Z
Learning: Hardcoding future timestamps in tests within DynamoDBKVStoreTest.scala is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.

Learnt from: piyush-zlai
PR: #44
File: hub/test/store/DynamoDBMonitoringStoreTest.scala:69-86
Timestamp: 2024-10-15T15:33:22.265Z
Learning: In hub/test/store/DynamoDBMonitoringStoreTest.scala, the current implementation of the generateListResponse method is acceptable as-is, and changes for resource handling and error management are not necessary at this time.

Learnt from: piyush-zlai
PR: #44
File: hub/app/store/DynamoDBMonitoringStore.scala:98-143
Timestamp: 2024-10-15T15:30:15.514Z
Learning: In the Scala file hub/app/store/DynamoDBMonitoringStore.scala, within the makeLoadedConfs method, the .recover method is correctly applied to the Try returned by response.values to handle exceptions from the underlying store.

Learnt from: piyush-zlai
PR: #53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in hub/app/controllers/TimeSeriesController.scala, potential NumberFormatException exceptions due to parsing errors (e.g., when using val featureId = name.split("_").last.toInt) are acceptable and will be addressed when adding the concrete backend.

Learnt from: piyush-zlai
PR: #33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260
Timestamp: 2024-10-08T16:18:45.669Z
Learning: In DynamoDBKVStoreImpl.scala, refactoring methods like extractTimedValues and extractListValues to eliminate code duplication is discouraged if it would make the code more convoluted.

Learnt from: chewy-zlai
PR: #50
File: spark/src/test/scala/ai/chronon/spark/test/MockKVStore.scala:13-16
Timestamp: 2024-10-31T18:27:44.973Z
Learning: In MockKVStore.scala, the create method should reset the dataset even if the dataset already exists.

Learnt from: chewy-zlai
PR: #47
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:294-0
Timestamp: 2024-10-17T01:02:40.431Z
Learning: In DynamoDBKVStoreImpl.scala, when handling errors indicating that a row is missing a field in the database, it's acceptable to use the generic Exception type in methods like extractListValues.

Learnt from: nikhil-zlai
PR: #70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import scala.util.ScalaVersionSpecificCollectionsConverter in service/src/main/java/ai/chronon/service/ApiProvider.java is correct and should not be flagged in future reviews.

Learnt from: piyush-zlai
PR: #33
File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:223-242
Timestamp: 2024-10-07T15:39:43.506Z
Learning: In DynamoDBKVStoreImpl.scala, handling ProvisionedThroughputExceededException covers the scenarios for ThrottlingException in the handleDynamoDbOperation method.

Learnt from: tchow-zlai
PR: #263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: cloud_aws_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: api_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: service_tests
  • GitHub Check: online_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: streaming_tests
  • GitHub Check: python_tests
  • GitHub Check: flink_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: batch_tests
  • GitHub Check: bazel_config_tests
  • GitHub Check: enforce_triggered_workflows


3 participants