Clean history #779
## Summary

Changed the backend code to only compute 3 percentiles (p5, p50, p95) for returning to the frontend.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **Bug Fixes**
  - Enhanced statistical data processing to consistently handle cases with missing values by using a robust placeholder, ensuring clearer downstream analytics.
  - Adjusted the percentile chart configuration so that the 95th, 50th, and 5th percentiles are accurately rendered, providing more reliable insights for users.
  - Relaxed the null ratio validation in summary data, allowing for a broader acceptance of null values, which may affect drift metric interpretations.
- **New Features**
  - Introduced methods for converting percentile strings to index values and filtering percentiles based on user-defined requests, improving data handling and representation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

Changes to support builds/tests with both Scala 2.12 and 2.13. By default we build against 2.12; pass the "--config scala_2.13" option to "bazel build/test" to override it.

ScalaFmt seems to break for 2.13 with the bazel rules_scala package. A [fix](bazel-contrib/rules_scala#1631) is already deployed, but a release with that change is not available yet, so ScalaFmt checks are temporarily disabled for 2.13 and will be re-enabled once the fix is released.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Enabled flexible Scala version selection (2.12 and 2.13) for smoother builds and enhanced compatibility.
  - Introduced a default Scala version constant and a repository rule for improved version management.
  - Added support for additional Scala 2.13 dependencies in the build configuration.
- **Refactor and Improvements**
  - Streamlined build and dependency management for increased stability and performance.
  - Consolidated collection conversion utilities to boost reliability in tests and runtime processing.
  - Enhanced type safety and clarity in collection handling across various modules.
  - Improved handling of Scala collections and maps throughout the codebase for better type consistency and safety.
  - Updated method implementations to ensure explicit type conversions, enhancing clarity and preventing runtime errors.
  - Modified method signatures and internal logic to utilize `Seq` for improved type clarity and consistency.
  - Enhanced the `maven_artifact` function to accept an optional version parameter for better dependency management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

- #381 introduced the ability to configure a partition column at the node-level. This PR simply fixes a missed spot on the plumbing of the new StagingQuery attribute.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Enhanced the query builder to support specifying a partition column, providing greater customization for query formation and partitioning.
- **Improvements**
  - Improved handling of partition columns by introducing a fallback mechanism to ensure valid values are used when necessary.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
---------
Co-authored-by: Thomas Chow <[email protected]>
## Summary

Adds CI checks to make sure we are able to build and test all modules on both Scala 2.12 and 2.13.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **Chores**
  - Updated automated testing workflows to support Scala 2.12 and added new workflows for Scala 2.13, ensuring consistent testing for both Spark and non-Spark modules.
- **Documentation**
  - Enhanced build instructions with updated commands for creating Uber Jars and new automation shortcuts to streamline code formatting, committing, and pushing changes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

Added pinning support for both our maven and spark repositories so we don't have to resolve them during builds. Going forward, whenever we update the artifacts in either the maven or spark repository, we need to re-pin the changed repos using the following commands and check in the updated JSON files.

```
REPIN=1 bazel run @maven//:pin
REPIN=1 bazel run @spark//:pin
```

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Integrated enhanced repository management for Maven and Spark, providing improved dependency installation.
  - Added support for JSON configuration files for Maven and Spark installations.
- **Chores**
  - Updated documentation to include instructions on pinning Maven artifacts and managing dependency versions effectively.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
A VSCode plugin for feature authoring that detects errors and uses data sampling to speed up the iteration cycle. The goal is to reduce the amount of memorizing commands, typing / clicking, and waiting for clusters to spin up and jobs to finish.

In this example, we have a complex expression operating on nested data. The Eval button appears above Chronon types. When you click the Eval button, it samples your data, runs your code, and shows errors or the transformed result within seconds.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [x] Integration tested (see above)
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Introduced a new Visual Studio Code extension that enhances Python development.
  - The extension displays an evaluation button alongside specific assignment statements in Python files, allowing users to trigger evaluation commands directly in the terminal.
  - Added a command to execute evaluation actions related to Zipline AI configurations.
- **Documentation**
  - Added a new LICENSE file containing the MIT License text.
- **Configuration**
  - Introduced new configuration files for TypeScript and Webpack to support the extension's development and build processes.
- **Exclusions**
  - Updated `.gitignore` and added `.vscodeignore` to streamline version control and packaging processes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

Moved scala dependencies to separate scala_2_12 and scala_2_13 repositories so we can load the right repo based on config instead of loading both.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **Chores**
  - Upgraded Scala dependencies to newer versions with updated verification, ensuring improved stability.
  - Removed outdated package references to streamline dependency management.
  - Introduced new repository configurations for Scala 2.12 and 2.13 to enhance dependency management.
  - Added `.gitignore` entry to exclude `node_modules` in the `authoring/vscode` path.
  - Created `LICENSE` file with MIT License text for the new extension.
- **New Features**
  - Introduced a Visual Studio Code extension with a CodeLens provider for Python files, allowing users to evaluate variables directly in the editor.
- **Refactor**
  - Updated dependency declarations to utilize a new method for handling Scala artifacts, improving consistency across the project.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Nikhil Simha <[email protected]>
## Summary

Adds AWS build and push commands to the distribution script.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Introduced an automated quickstart process for GCP deployments.
  - Enhanced the build and upload tool with flexible command-line options, supporting artifact creation for both AWS and GCP environments.
  - Added a new script for running the Zipline quickstart on GCP.
- **Refactor**
  - Updated the AWS quickstart process to ensure consistent execution.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…FilePath and replacing `/` to `.` in MetaData names (#398)

## Summary

^^^ Tested on the our clients laptop.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **Bug Fixes**
  - Improved error handling to explicitly report when configuration values are missing.
- **New Features**
  - Introduced standardized constants for various configuration types, ensuring consistent key naming.
- **Refactor**
  - Unified metadata processing by using direct metadata names instead of file paths.
  - Enhanced type safety in configuration options for clearer and more reliable behavior.
- **Tests**
  - Updated test cases and parameters to reflect the improved metadata and configuration handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Reverts #373 Passing in options to push to only one customer is broken. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Refactor** - Streamlined the deployment process to automatically build and upload artifacts exclusively to Google Cloud Platform. - Removed configuration options and handling for an alternative cloud provider, resulting in a simpler, more focused workflow. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

Building the join output schema belongs in the metadata store. Moving it there also reduces the size of the fetcher.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Introduced an optimized caching mechanism for data join operations, resulting in improved performance and reliability.
  - Added new methods to facilitate the creation and management of join codecs.
- **Bug Fixes**
  - Enhanced error handling for join codec operations, ensuring clearer context for failures.
- **Documentation**
  - Improved code readability and clarity through updated comments and method signatures.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…#422)

## Summary

Add support to run the fetcher service in docker. Also add rails to publish to docker hub as a private image - [ziplineai/chronon-fetcher](https://hub.docker.com/repository/docker/ziplineai/chronon-fetcher)

I wasn't able to sort out logback / log4j2 logging as there are a lot of deps messing things up. Vert.x supports JUL configs and that seems to be working, so starting with that for now.

Tested with:
```
docker run -v ~/.config/gcloud/application_default_credentials.json:/gcp/credentials.json \
  -p 9000:9000 \
  -e "GCP_PROJECT_ID=canary-443022" \
  -e "GOOGLE_CLOUD_PROJECT=canary-443022" \
  -e "GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance" \
  -e "STATSD_HOST=127.0.0.1" \
  -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/credentials.json \
  ziplineai/chronon-fetcher
```
And then you can `curl http://localhost:9000/ping`

On our clients side just need to swap out the project and bt instance id and then can curl the actual join:
```
curl -X POST http://localhost:9000/v1/fetch/join/search.ranking.v1_web_zipline_cdc_and_beacon_external -H 'Content-Type: application/json' -d '[{"listing_id":"632126370","shop_id":"53908089","shipping_profile_id":"235561688531"}]'
{"results":[{"status":"Success","entityKeys":{"listing_id":"632126370","shop_id":"53908089","shipping_profile_id":"235561688531"},"features":{...
```

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Added an automation script that streamlines the container image build and publication process with improved error handling.
  - Introduced a new container configuration that installs essential dependencies, sets environment variables, and incorporates a health check for enhanced reliability.
  - Implemented a robust logging setup that standardizes console and file outputs with log rotation.
  - Provided a startup script for the service that verifies required settings and applies platform-specific options for seamless execution.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

Adds the ability to push artifacts to aws in addition to gcp. Also adds the ability to specify specific customer ids to push to.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Introduced a new automation script that streamlines the process of building artifacts and deploying them to both AWS and GCP with improved error handling and user confirmation.
- **Chores**
  - Removed a legacy artifact upload script that previously handled only GCP deployments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Supporting StagingQueries for configurable compute engines. To support
BigQuery, the simplest way is to just write bigquery sql and run it on
bq to create the final table. Let's first make the API change.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Added an option for users to specify the compute engine when
processing queries, offering choices such as Spark and BigQuery.
- Introduced validation to ensure that queries run only with the
designated engine.
- **Style**
- Streamlined code organization for enhanced readability.
- Consolidated and reordered import statements for improved clarity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
---------
Co-authored-by: Thomas Chow <[email protected]>
## Summary

The fetcher has grown over time into a large file with many large functions that are hard to work with. This refactoring doesn't change any functionality, just placement.

- Made some of the Scala code more idiomatic: `if(try.isFailed)` vs `try.recoverWith`
- Made Metadata methods more explicit
- FetcherBase -> JoinPartFetcher + GroupByFetcher + GroupByResponseHandler
- Added fetch context to replace 10 constructor params

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Introduced a unified configuration context that enhances data fetching, including improved group-by and join operations with more robust error handling.
  - Added a new `FetchContext` class to manage fetching operations and execution contexts.
  - Implemented a new `GroupByFetcher` class for efficient group-by data retrieval.
- **Refactor**
  - Upgraded serialization and deserialization to use a more efficient, compact protocol.
  - Standardized API definitions and type declarations across modules to improve clarity and maintainability.
  - Enhanced error handling in various methods to provide more informative messages.
- **Chores**
  - Removed outdated utilities and reorganized dependency imports.
  - Updated test suites to align with the refactored architecture.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- Staging query should in theory already work for external tables
without additional code changes as long as we do some setup work to pin
up a view first.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
---------
Co-authored-by: Thomas Chow <[email protected]>
## Summary

The existing aggregations configure the items sketch incorrectly. Split it into two: one that works purely with skewed data, and one that makes a best-effort attempt to collect the most frequent items.

## Checklist
- [x] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Introduced new utility functions to streamline expression composition and cleanup.
  - Enhanced aggregation descriptions for clearer operation choices.
  - Added new aggregation types for improved data analysis.
- **Refactor**
  - Revamped frequency analysis logic with improved error handling and optimized sizing.
  - Replaced legacy histogram approaches with a more robust frequent item detection mechanism.
- **Tests**
  - Added tests to validate heavy hitter detection and skewed data scenarios, while removing obsolete histogram tests.
  - Updated existing tests to reflect changes in aggregation parameters.
- **Chores**
  - Removed deprecated interactive modules for a leaner deployment.
- **Configuration**
  - Adjusted default aggregation parameters for more consistent processing, including changes to the `k` value in multiple configurations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Add a couple of APIs to help with the our clients ***REMOVED*** integration. One is to
list out all online joins and the second is to retrieve the join schema
details for a given Join.
As part of wiring up list support, I tweaked a couple of properties like
the list pagination key / list call limit to make things consistent
between DynamoDB and BigTable.
For the BT implementation we issue a range query under the 'joins/'
prefix. Subsequent calls (in case of pagination) continue off this range
(verified this via unit tests and also basic sanity checks on our clients).
APIs added are:
* /v1/joins -> Return the list of online joins
* /v1/join/schema/join-name -> Return a payload consisting of
{"joinName": "..", "keySchema": "avro schema", "valueSchema": "avro
schema", "schemaHash": "hash"}
Tested by dropping the docker container and confirming things on the
our clients side:
```
$ curl http://localhost:9000/v1/joins
{"joinNames":["search.ranking.v1_web_zipline_cdc_and_beacon_external" ...}
```
And
```
curl http://localhost:9000/v1/join/schema/search.ranking.v1_web_zipline_cdc_and_beacon_external
{ big payload }
```
## Checklist
- [X] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced new API endpoints that let users list available joins and
retrieve detailed join schema information.
- Added enhanced configuration options to support complex join
workflows.
- New test cases for validating join listing and schema retrieval
functionalities.
- Added new constants for pagination and entity type handling.
- **Improvements**
- Standardized pagination and entity handling across cloud integrations,
ensuring a consistent and reliable data listing experience.
- Enhanced error handling and response formatting for join-related
requests.
- Expanded testing capabilities with additional dependencies and
resource inclusion.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
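
The prefix range query with pagination described in the summary above can be sketched roughly as follows. This is a minimal illustration over an in-memory sorted key list, not the BigTable or DynamoDB code from this change; `list_page`, `list_all`, the `limit` default, and the use of the last returned key as the continuation key are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def list_page(sorted_keys, prefix="joins/", limit=2, continuation_key=None):
    """Return (join_names, next_key) for one page of keys under `prefix`.

    Hypothetical helper: the first call starts at the beginning of the prefix
    range, and later calls resume just after `continuation_key`.
    """
    if continuation_key is None:
        start = bisect_left(sorted_keys, prefix)
    else:
        start = bisect_right(sorted_keys, continuation_key)
    end = start
    while (
        end < len(sorted_keys)
        and end - start < limit
        and sorted_keys[end].startswith(prefix)
    ):
        end += 1
    page = sorted_keys[start:end]
    # There is another page only if the next key is still inside the prefix range.
    more = end < len(sorted_keys) and sorted_keys[end].startswith(prefix)
    next_key = page[-1] if page and more else None
    return [key[len(prefix):] for key in page], next_key


def list_all(sorted_keys, prefix="joins/", limit=2):
    """Follow continuation keys until the prefix range is exhausted."""
    names, key = [], None
    while True:
        page, key = list_page(sorted_keys, prefix, limit, key)
        names.extend(page)
        if key is None:
            return names
```

Each call resumes from the previous page's last key rather than from an offset, so a caller of an endpoint like `/v1/joins` would only need to pass that one key back to continue the listing.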
## Summary

#398 updated the module path from `"/"` to `"."`, but not all code was migrated to the new convention, causing frontend API calls to fail when retrieving joins.

@david-zlai – Can you review the code to ensure it fully aligns with the new convention?

@sean-zlai – Can you tear down all Docker images and rebuild on this branch to confirm observability works as expected?

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **Refactor**
  - Streamlined how configuration names are handled in observability views. Names are now displayed as originally provided without extra formatting, ensuring a consistent and straightforward presentation. The fallback label remains “Unknown” when a name is not available.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

- Everywhere else we want to handle partitions that could be non-string types. This is similar to the change in: https://github.com/zipline-ai/chronon/blob/0d78a99e44f97f95d05e528a749837bc9a38b32e/cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala#L122-L128

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Enhanced partition date display by introducing configurable date formatting.
  - Partition dates are now consistently formatted based on user configuration, ensuring reliable and predictable output across the system.
  - Improved retrieval of partition format for BigQuery operations, allowing for broader usage across different packages.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
---------
Co-authored-by: Thomas Chow <[email protected]>
## Summary

Enable batch IR caching by default and fix an issue where our Vertx init code tries to connect to BT at startup, which takes a second or two on the worker threads and results in the warning 'Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 2976 ms, time limit is 2000 ms'.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **Refactor**
  - Streamlined caching configuration and logic with a consistent default setting for improved behavior.
  - Enhanced service startup by shifting to asynchronous initialization with better error handling for a more robust launch.
- **Tests**
  - Removed an outdated test case that validated previous caching behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

This PR allows the frontend to specify which percentiles it retrieves from the backend. The percentiles can be passed as a query parameter:

```
percentiles=p0,p10,p90
```

If omitted, the default percentiles are used:

```
percentiles=p5,p50,p95
```

### Example Requests

*(App must be running)*

#### Default (uses `p5,p50,p95`)

```sh
curl "http://localhost:5173/api/v1/join/risk.user_transactions.txn_join/column/txn_by_user_transaction_amount_count_1h/summary?startTs=1672531200000&endTs=1677628800000"
```

#### Equivalent Explicit Default

```sh
curl "http://localhost:5173/api/v1/join/risk.user_transactions.txn_join/column/txn_by_user_transaction_amount_count_1h/summary?startTs=1672531200000&endTs=1677628800000&percentiles=p5,p50,p95"
```

#### Custom Percentiles (`p0,p10,p90`)

```sh
curl "http://localhost:5173/api/v1/join/risk.user_transactions.txn_join/column/txn_by_user_transaction_amount_count_1h/summary?startTs=1672531200000&endTs=1677628800000&percentiles=p0,p10,p90"
```

### Notes

- Omitting the `percentiles` parameter is the same as explicitly setting `percentiles=p5,p50,p95`.
- You can test using `curl` or Postman.
- We need to let users change these percentiles via checkboxes or another UI control.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Added support for customizable percentile parameters in summary data requests, with a default setting of "p5, p50, p95".
  - Enhanced the ability to retrieve detailed statistical summaries by allowing users to specify percentile values when querying data.
  - Introduced two new optional dependencies for improved functionality.
- **Bug Fixes**
  - Adjusted method signatures to ensure compatibility with the new percentile parameters in various components.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
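
As a rough illustration of the `percentiles` query parameter described above, the sketch below parses a value such as `p0,p10,p90` into integer percentiles and falls back to the default when the parameter is omitted. It is not the backend code from this PR (which is Scala); `parse_percentiles`, `pick_percentiles`, and the assumption that a full array holds one value per percentile from p0 to p100 are hypothetical.

```python
DEFAULT_PERCENTILES = "p5,p50,p95"


def parse_percentiles(param=None):
    """Convert a query value like 'p0,p10,p90' into [0, 10, 90].

    An omitted or empty parameter falls back to the default, matching the
    note that omitting it is the same as passing p5,p50,p95.
    """
    raw = param if param else DEFAULT_PERCENTILES
    result = []
    for token in raw.split(","):
        token = token.strip().lower()
        if not token.startswith("p") or not token[1:].isdecimal():
            raise ValueError(f"invalid percentile label: {token!r}")
        value = int(token[1:])
        if not 0 <= value <= 100:
            raise ValueError(f"percentile out of range: {token!r}")
        result.append(value)
    return result


def pick_percentiles(all_values, requested):
    """Select the requested entries from an array indexed by percentile.

    Assumes (for illustration only) that all_values[i] holds the value of
    percentile p{i}, for i from 0 to 100.
    """
    return [all_values[p] for p in requested]
```

Rejecting malformed labels up front keeps a bad query string from turning into an index error deeper in the request path.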
## Summary

I noticed we were missing the core chronon fetcher logs during feature lookup requests. As we wanted to rip out JUL and logback anyway, I went ahead and dropped those for a log4j2 properties file. Confirmed that I am seeing the relevant fetcher logs from classes like the SawtoothOnlineAggregator when I hit the service with a feature lookup request.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **Refactor**
  - Consolidated service deployment paths and streamlined startup configuration.
  - Improved metrics handling by conditionally enabling reporting based on environment settings.
- **Chores**
  - Optimized resource packaging and removed legacy dependencies.
  - Upgraded logging configuration to enhance performance and log management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
#438)

## Summary

1. Added offset and bound support to staging query macros. `{{ start_date }}` is valid as before; now `{{ start_date(offset=-10, lower_bound='2023-01-01') }}` is also valid.
2. Previously we required users to pass in quotes around the macro separately. This PR removes the need for it: `{{ start_date }}` used to become `2023-01-01`, it now becomes `'2023-01-01'`.
3. Added a unified top-level module `api.chronon.types` that contains everything that users need.
4. Added wrappers on source sub types to directly return sources:

```py
ttypes.Source(events=ttypes.EventSource(...))
# now becomes
EventSource(...)
```

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Added new functions for creating event, entity, and join data sources.
  - Introduced enhanced date macro utilities to enable flexible SQL query substitutions.
- **Refactor**
  - Streamlined naming conventions and standardized parameter formatting.
  - Consolidated and simplified import structures for improved consistency.
  - Updated method signatures and calls from `select` to `selects` across various components.
  - Removed reliance on `ttypes` for source definitions and standardized parameter naming conventions.
  - Simplified macro substitution logic in the `StagingQuery` object.
- **Tests**
  - Implemented comprehensive tests for date manipulation features to ensure robust behavior.
  - Updated existing tests to reflect changes in method names and query formatting.
  - Adjusted data generation parameters in tests to increase transaction volumes.
- **Documentation**
  - Updated configuration descriptions to clearly illustrate new date template options and parameter adjustments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
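
To make the macro behavior above concrete, here is a minimal sketch of how such a substitution could work. It is not the `StagingQuery` implementation from this PR; the `render` function, the regexes, and the semantics (offset counted in days, `lower_bound` clamping the shifted date from below) are assumptions for illustration only.

```python
import re
from datetime import date, timedelta

# Matches {{ name }} and {{ name(arg=value, ...) }}.
_MACRO = re.compile(r"\{\{\s*(\w+)\s*(?:\((.*?)\))?\s*\}\}")
# Matches key=-10 or key='2023-01-01' inside the parentheses.
_ARG = re.compile(r"(\w+)\s*=\s*('([^']*)'|-?\d+)")


def render(sql, values):
    """Replace date macros in `sql` with quoted ISO dates.

    `values` maps macro names (e.g. "start_date") to ISO date strings.
    Unknown macros are left untouched.
    """

    def substitute(match):
        name, raw_args = match.group(1), match.group(2) or ""
        if name not in values:
            return match.group(0)
        day = date.fromisoformat(values[name])
        args = {}
        for key, literal, quoted in _ARG.findall(raw_args):
            args[key] = quoted if literal.startswith("'") else int(literal)
        # Assumed semantics: shift by `offset` days, then clamp to the bound.
        day += timedelta(days=args.get("offset", 0))
        if "lower_bound" in args:
            day = max(day, date.fromisoformat(args["lower_bound"]))
        # The quotes are emitted by the macro itself, so callers no longer add them.
        return f"'{day.isoformat()}'"

    return _MACRO.sub(substitute, sql)
```

With `start_date` set to `2023-01-11`, `ds = {{ start_date }}` renders as `ds = '2023-01-11'` under these assumptions, and an offset of -10 with a lower bound of `2023-01-05` renders `'2023-01-05'`.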
## Summary

Cleaning up the top-level dir.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **Chores**
  - Refined version control and build settings by updating ignored paths and tool versions.
  - Removed obsolete internal configurations, tooling, and Docker build files for a cleaner project structure.
- **Documentation**
  - Updated installation guidance links for clearer setup instructions.
  - Eliminated legacy contributor, governance, and quickstart guides to reduce clutter.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

No turning back now

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **Refactor**
  - Removed legacy internal components from workflow orchestration and task management to streamline operations.
- **Documentation**
  - Updated deployment guidance by removing outdated references.

These internal improvements enhance maintainability and performance without altering your current user experience.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

Move OSS docsite release scripts.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **Chores**
  - Made behind-the-scenes updates to streamline our internal release management processes. There are no visible changes to functionality for end-users in this release.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Chores**
- Consolidated and streamlined build dependencies for improved
integration with AWS services and data processing libraries.
- Expanded the set of supported third-party libraries, including new
artifacts for enhanced performance and compatibility.
- Added new dependencies for Hudi, Jackson, and Zookeeper to enhance
functionality.
- Introduced additional Hudi artifacts for Scala 2.12 and 2.13 to
broaden available functionalities.
- **Tests**
- Added a new test class to verify reliable write/read operations on
Hudi tables using a Spark session.
- **Refactor**
- Enhanced serialization registration to support a broader range of data
types, improving overall processing stability.
- Introduced a new variable for shared library dependencies to simplify
dependency management.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
---------
Co-authored-by: Thomas Chow <[email protected]>
## Summary
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Refactor**
- Improved the internal setup for fetch operations by reorganizing the underlying structure. This update streamlines background processing and enhances overall maintainability while keeping user-facing functionality unchanged.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Style**
- Reorganized import statements for improved readability.
- **Chores**
- Removed debugging print statements from partition insertion to clean
up console output.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: thomaschow <[email protected]>
## Summary
Run push_to_platform on pull request merge only. Also use the default message.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Chores**
- Updated workflow to run only after a pull request is merged into the main branch, instead of on every push.
- Adjusted the commit message behavior for subtree updates to use the default message.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Chores**
- Removed the synthetic dataset generation script for browser and device
fingerprinting data.
- Removed related test configurations and documentation for AWS Zipline
and Plaid data processing.
- Updated AWS release workflow to exclude the "our clients" customer ID from
S3 uploads.
- Cleaned up commented-out AWS S3 and Glue deletion commands in
deployment scripts.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
---------
Co-authored-by: thomaschow <[email protected]>
## Summary
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Chores**
- Removed references to "our clients" as a customer ID from workflows, scripts, and documentation.
- Deleted test and configuration files related to "our clients" and sample teams.
- Updated Avro schema namespaces and default values from "com.our clients" to "com.customer" and related URLs.
- Improved indentation and formatting in sample configuration files.
- **Tests**
- Updated test arguments and removed obsolete test data related to "our clients".
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…-passing-candidate to line up with publish_release (#760)
## Summary
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Chores**
- Updated storage paths for artifact uploads to cloud storage in deployment workflows.
- **Documentation**
- Corrected a type annotation in the documentation for a query parameter.
- **Tests**
- Enhanced a test to include and verify a new query parameter.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…mapping (#728)
## Summary
Updating the JoinSchemaResponse to include a mapping from feature -> listing key. This PR updates our JoinSchemaResponse to include a value info case class with these details.
## Checklist
- [X] Added Unit Tests
- [X] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **New Features**
- Added detailed metadata for join value fields, including feature names, group names, prefixes, left keys, and schema descriptions, now available in join schema responses.
- **Bug Fixes**
- Improved consistency and validation between join configuration keys and value field metadata.
- **Tests**
- Enhanced and added tests to validate the presence and correctness of value field metadata in join schema responses.
- Introduced new test suites covering fetcher failure scenarios and metadata store functionality.
- Refactored existing fetcher tests to use external utility methods for data generation.
- Added utility methods for generating deterministic, random, and event-only test data configurations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [x] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Bug Fixes**
- Improved the handling of the `--mode` command-line option to ensure all available choices are displayed as strings. This enhances compatibility and usability when selecting modes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
As we will be publishing from platform for now, delete this workflow from chronon.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Chores**
- Removed the automated release publishing workflow, including all related build, validation, artifact promotion, and cleanup steps.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
## Checklist
- [ ] Added Unit Tests
- [X] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Refactor**
- Updated test cases to use a new event schema with revised field names and structure.
- Renamed and adjusted test data and helper methods to align with the new schema and naming conventions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Pulling this out from PR #751, as we're waiting on a review there and it shows up as noise in various places, so let's just fix it.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Bug Fixes**
- Improved handling of metrics exporter URL configuration to prevent errors when the URL is not defined.
- Ensured metrics are only initialized when both metrics are enabled and an exporter URL is present.
- **Refactor**
- Enhanced internal logic for safer initialization of metrics reporting, reducing the risk of misconfiguration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
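The guard described in these release notes (initialize metrics only when they are enabled and an exporter URL is present) might look like the following sketch. The actual change is in the project's Scala code; `should_init_metrics` and its parameters are hypothetical names used only for illustration.

```python
from typing import Optional


def should_init_metrics(metrics_enabled: bool, exporter_url: Optional[str]) -> bool:
    """Return True only when metrics are enabled and an exporter URL is set."""
    # Treat a missing, empty, or whitespace-only URL as "not configured".
    has_url = exporter_url is not None and exporter_url.strip() != ""
    return metrics_enabled and has_url


# A caller would skip exporter setup entirely when the guard is False,
# rather than failing on an undefined URL.
for enabled, url in [(True, "http://localhost:4318"), (True, None), (False, "http://localhost:4318")]:
    print(enabled, url, should_init_metrics(enabled, url))
```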
## Summary
Add Cloud GCP Embedded Jar to canary build process.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Chores**
- Enhanced CI/CD workflow to build, upload, and manage a new embedded GCP jar artifact throughout the deployment process.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…rfaces (#751)
## Summary
Refactor some of the schema-provider-shaped code to:
* Use the existing SerDe class interfaces we have.
* Work with Mutation types via the SerDe classes.
* Primary shuffling is around pulling the Avro deser out of the existing BaseAvroDeserializationSchema and delegating that to the SerDe to get a Mutation back, as well as shifting things a bit to call CatalystUtil with the Mutation Array[Any] types.
* Provide rails for users to provide a custom schema provider. I used this to test a version of the beacon app out in canary; I'll put up a separate PR for the test job in a follow-up.
* Other misc piled-up fixes: check that GBUs don't compute empty results; fix our Otel metrics code to be turned off by default to reduce log spam.
## Checklist
- [X] Added Unit Tests
- [X] Covered by existing CI
- [X] Integration tested -- Tested via canary on our env / cust env and confirmed we pass the validation piece as well as see the jobs come up and write out data to BT.
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **New Features**
- Added Avro serialization and deserialization support for online data processing.
- Introduced flexible schema registry and custom schema provider selection for Flink streaming sources.
- **Refactor**
- Unified and renamed the serialization/deserialization interface to `SerDe` across modules.
- Centralized and simplified schema provider and deserialization logic for Flink jobs.
- Improved visibility and type safety for internal utilities.
- **Bug Fixes**
- Enhanced error handling and robustness in metrics initialization and deserialization workflows.
- **Tests**
- Added and updated tests for Avro deserialization and schema registry integration.
- Removed outdated or redundant test suites.
- **Chores**
- Updated external dependencies to include Avro support.
- Cleaned up unused files and legacy code.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Builds on top of PR #751. This PR adds a streaming GroupBy that can be run as a canary to sanity check and test things out while making Flink changes. I used this to sanity check the creation and use of a mock schema serde that some users have been asking for.
Can be submitted via:
```
$ CHRONON_ROOT=`pwd`/api/python/test/canary
$ zipline compile --chronon-root=$CHRONON_ROOT
$ zipline run --repo=$CHRONON_ROOT --version $VERSION --mode streaming --conf compiled/group_bys/gcp/item_event_canary.actions_v1 --kafka-bootstrap=bootstrap.zipline-kafka-cluster.us-central1.managedkafka.canary-443022.cloud.goog:9092 --groupby-name gcp.item_event_canary.actions_v1 --validate
```
(Needs the Flink event driver to be running; it is triggered via DataProcSubmitterTest.)
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **New Features**
- Introduced a new group-by aggregation for item event actions, supporting real-time analytics by listing ID with data sourced from GCP Kafka and BigQuery.
- Added a mock schema provider for testing item event ingestion.
- **Bug Fixes**
- Updated test configurations to use new event schemas, topics, and data paths for improved accuracy in Flink Kafka ingest job tests.
- **Refactor**
- Renamed and restructured the event driver to focus on item events, with a streamlined schema and updated job naming.
- **Chores**
- Added new environment variable for Flink state storage configuration.
- Updated build configuration to reference the renamed event driver.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Adding a field `LogicalType` to `conf` thrift, and fixing a typo.
## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **New Features**
- Added an optional field for logical type classification to the configuration in the orchestration service API.
- **Style**
- Updated a parameter name in a method signature for improved clarity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: ezvz <[email protected]>
)
## Summary
This is the command we expect users to run in their Airflow setup:
```
zipline run --mode streaming deploy --kafka-bootstrap=<KAFKA_BOOTSTRAP> --conf <CONF> --version-check --latest-savepoint --disable-cloud-logging
```
- This command first does a version check that compares the local zipline version with the zipline version of the running Flink app. If they're equal, it is a no-op.
- If they're different, we proceed with deploying. We get the latest savepoint/checkpoint and then deploy the Flink app with that. Then, in the CLI, we poll for the manifest file that the Flink app writes out, to update with the new Flink app id and new Dataproc id.

In addition to `--latest-savepoint`, we're going to support `--no-savepoint` and `--custom-savepoint` deployment strategies.
In addition, we're also going to support:
```
zipline run --mode streaming check-if-job-is-running --conf <CONF>
```
to check if there is a running Flink job. We implement this by using the Dataproc client to filter active jobs with custom labels we set on job-type and metadata-name.
## Checklist
- [x] Added Unit Tests
- [ ] Covered by existing CI
- [x] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **New Features**
- Added Google Cloud Storage client with file listing, existence checks, and in-memory downloads.
- Enhanced Flink streaming job management with checkpointing, savepoint strategies, version checks, and deployment verification.
- Extended CLI and environment variables to support advanced Flink and Spark job deployment controls.
- Introduced new configuration templates and test resources for quickstart and team metadata.
- Added new Flink job option to write internal manifest linking Flink job ID and parent job ID.
- **Improvements**
- Upgraded Python and Scala dependencies for improved compatibility and security.
- Improved logging consistency, error handling, and job state tracking for Dataproc deployments.
- Refactored job submission logic for better modularity and streaming support.
- Enhanced deployment scripts with optional git check skipping.
- **Bug Fixes**
- Standardized logging and refined error detection in deployment scripts.
- Improved error handling during streaming job polling and deployment verification.
- **Tests**
- Added extensive tests for GCS client, Dataproc submitter, job submission workflows, and configuration handling.
- **Chores**
- Updated build scripts and Bazel files to include new dependencies and test resources.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
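The deploy flow described in this commit message (version check first, then a savepoint strategy) can be sketched roughly as below. This is not the CLI's code: `DeployPlan`, `plan_deploy`, and the strategy strings are hypothetical names chosen to mirror the `--latest-savepoint`, `--no-savepoint`, and `--custom-savepoint` flags, and treating "no running app" as "deploy" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeployPlan:
    deploy: bool                         # False means the version check made this a no-op
    savepoint_uri: Optional[str] = None  # None means start without restored state


def plan_deploy(
    local_version: str,
    running_version: Optional[str],
    strategy: str,
    latest_savepoint: Optional[str] = None,
    custom_savepoint: Optional[str] = None,
) -> DeployPlan:
    # Version check: identical versions mean nothing needs to be redeployed.
    # Assumption: if no app is running (running_version is None), we deploy.
    if running_version is not None and running_version == local_version:
        return DeployPlan(deploy=False)
    if strategy == "latest-savepoint":
        return DeployPlan(deploy=True, savepoint_uri=latest_savepoint)
    if strategy == "custom-savepoint":
        if not custom_savepoint:
            raise ValueError("custom-savepoint strategy needs a savepoint URI")
        return DeployPlan(deploy=True, savepoint_uri=custom_savepoint)
    if strategy == "no-savepoint":
        return DeployPlan(deploy=True, savepoint_uri=None)
    raise ValueError(f"unknown savepoint strategy: {strategy}")


print(plan_deploy("0.1.2", "0.1.2", "latest-savepoint"))
print(plan_deploy("0.1.3", "0.1.2", "latest-savepoint", latest_savepoint="gs://bucket/savepoints/sp-1"))
```

Keeping the decision in a pure function like this makes the three strategies easy to unit test without a running Flink or Dataproc environment; the manifest polling step described above is deliberately left out of the sketch.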
## Summary
It seems when I copied the workflow to push_to_platform.yaml, I forgot to delete the trigger workflow. They are now racing with each other since both repos are currently private.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Chores**
- Removed the automated workflow that triggered platform subtree updates on new changes to the main branch.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…ons (#771)
## Summary
Currently, we'll see unexpected behavior if multiple people are iterating on the same GroupBy/Join and changing the conf, because we upload to the same GCS path. This change adds the job id to the destination GCS path.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [x] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Refactor**
- Streamlined job submission to upload a single metadata configuration file, simplifying the process.
- Enhanced job ID management by requiring and propagating a job ID argument, improving job tracking and consistency.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
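A small sketch of why adding the job id to the destination path avoids the collision described above. The path layout and the name `metadata_destination` are hypothetical; the commit message only says that the job id becomes part of the GCS path.

```python
import posixpath


def metadata_destination(bucket: str, conf_name: str, job_id: str) -> str:
    """Build a per-job upload path so concurrent runs do not overwrite each other."""
    if not job_id:
        raise ValueError("job_id is required")
    # Hypothetical layout: gs://<bucket>/metadata/<job_id>/<conf_name>
    return "gs://" + posixpath.join(bucket, "metadata", job_id, conf_name)


# Two people iterating on the same conf now upload to different objects.
print(metadata_destination("my-bucket", "gcp.item_event_canary.actions_v1", "job-a1"))
print(metadata_destination("my-bucket", "gcp.item_event_canary.actions_v1", "job-b2"))
```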
…partition (#772)
## Summary
- Fix the partition sensor check; it needs to check that the primary partition value is present.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **New Features**
- Enhanced logging to show detailed partition keys and values during partition checks for improved transparency.
- **Style**
- Improved organization and grouping of import statements for clarity and consistency.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
Co-authored-by: thomaschow <[email protected]>
## Summary
Adding a flag so that the Airflow integration knows whether to schedule a join or not.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **New Features**
- Enhanced join metadata to include a flag indicating the presence of label parts.
- **Tests**
- Updated sample join test to include label part information in join instantiation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: ezvz <[email protected]>
## Summary
- We should be running setups regardless of whether things are
partitioned.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Refactor**
- Adjusted the timing of SQL setup command execution to occur earlier in
the staging query process, ensuring setups run before any query
execution or partition checks. No changes to user-facing features or
functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: thomaschow <[email protected]>
…gInfo (#774)
## Summary
When we add fields in our API, we can run into backwards / forwards compat issues depending on when the JSON updates make their way out to the GroupByServingInfo (on the orch side / serving side). Turning off the round-trip check to help cut the noise on these issues. If we can deserialize the thrift JSON we proceed; otherwise this code will throw a JsonException. Some details: [slack thread](https://zipline-2kh4520.slack.com/archives/C08345NBWH4/p1747092844340579)
## Checklist
- [ ] Added Unit Tests
- [X] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Bug Fixes**
- Improved compatibility when loading certain configuration data by relaxing validation during data processing.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Caution: Review failed. An error occurred during the review process. Please try again later.
Walkthrough
This change introduces Bazel build and GitHub Actions CI configuration, migrates Python API compilation and runtime logic to a new modular structure, adds GCP/AWS runner support, implements a new CLI, restructures sample/test data, and removes legacy Airflow DAG/operator code. It also updates aggregation APIs and window handling, and modernizes test and requirements files.
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI
participant Compiler
participant Validator
participant Runner
participant CloudProvider
User->>CLI: zipline compile/run/init
CLI->>Compiler: CompileContext setup
Compiler->>Validator: Validate GroupBy/Join/StagingQuery
Validator-->>Compiler: Validation results
Compiler->>CLI: Write compiled objects / errors
User->>CLI: zipline run --mode ...
CLI->>Runner: Prepare job, set env, download jars
Runner->>CloudProvider: Submit Spark/Flink job (GCP/AWS/Local)
CloudProvider-->>Runner: Job status/results
Runner-->>CLI: Output results/status
## Summary
Replace partition spec with column -> partition spec.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Refactor**
- Simplified partition specification handling across planners and utilities by removing the custom partition spec wrapper and standardizing on a single partition spec type.
- Updated related methods and class constructors to use the new partition spec approach, streamlining partition metadata access.
- Removed unused fields and imports related to the old partition spec wrapper.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: ezvz <[email protected]>
## Summary
Setting the partition spec correctly in table dependencies.
## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
- **Bug Fixes**
- Corrected the partition column used for snapshot table dependencies to ensure accurate data partitioning.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: ezvz <[email protected]>
Actionable comments posted: 53
🔭 Outside diff range comments (6)
api/python/test/sample/joins/sample_team/sample_chaining_join.py (1)
1-34: 💡 Verification agent
🧩 Analysis chain
File deletion seems appropriate.
Part of broader cleanup of sample configurations.
🏁 Script executed:
    #!/bin/bash
    # Check if any other files reference this deleted sample
    rg -l "sample_chaining_join" --type py

Length of output: 134
Remove stale references to deleted sample_chaining_join.py
- In api/python/test/test_utils.py: remove the sample/joins/sample_team/sample_chaining_join.py entry
- In api/python/test/sample/group_bys/sample_team/chaining_group_by.py: strip out any doc-comment links to sample_chaining_join.py
api/python/ai/chronon/cli/logger.py (1)
54-62: ⚠️ Potential issue
Fix `done()` signature. The `cond` parameter is unused; simplify to `def done(message):`.
api/python/ai/chronon/cli/compile/compiler.py (1)
151-161: ⚠️ Potential issue
Return type mismatch.
`_write_object` is annotated to return `Optional[List[BaseException]]` but returns nothing. Either return collected errors or change the annotation to `None`.
api/python/ai/chronon/repo/validator.py (1)
159-176: ⚠️ Potential issue
`build_derived_columns` mutates caller input and assumes `set` APIs on a `list`, which leads to a runtime crash.
`output_columns = pre_derived_columns` keeps a reference to the caller's collection and later calls `.clear()`, `.add()`, and `.remove()` on it.
Because `get_join_output_columns` now passes a list (183-188), the first call to `.add()` will raise `AttributeError`.
Fix: copy into a `set` up front and only sort when returning.

    - output_columns = pre_derived_columns
    + # Work on an internal set copy; never mutate the caller.
    + output_columns = set(pre_derived_columns)

Also replace `.remove(...)` with `.discard(...)` to avoid `KeyError`.
api/python/ai/chronon/join.py (1)
65-91: 🛠️ Refactor suggestion
Avoid global `__import__` monkey-patching.
Mutating `__builtins__["__import__"]` even temporarily is risky in multithreaded processes; any concurrent import during this window may break. Wrap this in a small context manager (or avoid it entirely).
api/python/ai/chronon/group_by.py (1)
608-610: ⚠️ Potential issue
`{source}` is not interpolated.
The second string isn't an f-string, so `{source}` is printed literally.

    - "in source {source}. Please specify only the `timeColumn`"
    + f"in source {source}. Please specify only the `timeColumn`"
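The pitfall flagged here is easy to reproduce in isolation: in implicit string concatenation, the `f` prefix applies per literal, not to the whole expression. The sketch below uses a made-up first fragment ("Time column problem"), since only the second string of the original message is quoted in the review.

```python
def broken_message(source: str) -> str:
    # Only the first literal is an f-string; the second is a plain string,
    # so "{source}" is emitted literally.
    return (
        f"Time column problem "
        "in source {source}."
    )


def fixed_message(source: str) -> str:
    # Each literal that contains a placeholder needs its own f prefix.
    return (
        "Time column problem "
        f"in source {source}."
    )


print(broken_message("events_table"))
print(fixed_message("events_table"))
```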
♻️ Duplicate comments (2)
.github/workflows/push_to_canary.yaml (2)
392-397: Repeat of path/quoting issue in `push_to_aws_passing`
Fix as above for wheel & jars.
Also add quotes around `${{ needs.build_artifacts.outputs.version }}` to eliminate SC2086 warnings.
452-462: Duplicate GCP passing step – path + quoting
Same corrections required for GCP promotion step. Consider extracting an action/step template to DRY.
🧹 Nitpick comments (87)
api/python/README.md (1)
134-136: Add language specifier to code fence. Markdown linter flags missing language identifier.
-``` +```bash🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
134-134: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
api/python/test/sample/README.md (1)
1-7: Duplicate of canary README. Content is identical to the canary README. Consider consolidating or differentiating the content.
.github/pull_request_template.md (1)
9-9: Remove stray character. The line contains only "9", which appears to be unintentional.

    -9

.github/release.yml (2)
9-9: Fix trailing spaces. Remove trailing space after "Minor features".

    - - title: Minor features
    + - title: Minor features

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 9-9: trailing spaces
(trailing-spaces)
17-17: Add newline at EOF. Add newline character at end of file.

    - "*"
    +

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 17-17: no new line character at the end of file
(new-line-at-end-of-file)
api/python/ai/chronon/eval/query_parsing.py (1)
1-20: SQL table extraction utility is well-implemented. The function cleanly extracts table names from SQL queries using sqlglot with the BigQuery dialect. Consider adding error handling for malformed queries.

      def get_tables_from_query(sql_query) -> List[str]:
          import sqlglot
    -
    -     # Parse the query
    -     parsed = sqlglot.parse_one(sql_query, dialect="bigquery")
    +     try:
    +         # Parse the query
    +         parsed = sqlglot.parse_one(sql_query, dialect="bigquery")
    -     # Extract all table references
    -     tables = parsed.find_all(sqlglot.exp.Table)
    +         # Extract all table references
    +         tables = parsed.find_all(sqlglot.exp.Table)
    -     table_names = []
    -     for table in tables:
    -         name_parts = [part for part in [table.catalog, table.db, table.name] if part]
    -         table_name = ".".join(name_parts)
    -         table_names.append(table_name)
    +         table_names = []
    +         for table in tables:
    +             name_parts = [part for part in [table.catalog, table.db, table.name] if part]
    +             table_name = ".".join(name_parts)
    +             table_names.append(table_name)
    -     return table_names
    +         return table_names
    +     except Exception as e:
    +         # Return empty list or raise a more specific error
    +         return []

aggregator/src/test/scala/ai/chronon/aggregator/test/EditDistanceTest.scala (1)
25-25: Consider a better test description. "basic" is vague. Suggest a more descriptive name like "correctly calculate edit distances" to improve test clarity.
api/python/test/sample/joins/sample_team/sample_chaining_join_parent.py (2)
1-24: Add docstring. Missing explanatory docstring.
12-19: Extract repeated key_mapping. Duplicate key_mapping. Consider creating a constant.
api/python/pyproject.toml (1)
35-35: Remove commented line. Decide on the E402 rule or remove the comment.
api/python/ai/chronon/cli/compile/display/diff_result.py (2)
14-21: Move signage methods up. Define at class level if reused elsewhere.
5-6: Remove extra blank line. Consistency improvement.
api/python/ai/chronon/resources/gcp/joins/test/data.py (2)
1-1: Use absolute import for `group_bys.test.data`. Use an absolute import path to avoid potential import resolution issues.

    -from group_bys.test.data import group_by_v1
    +from api.python.ai.chronon.resources.gcp.group_bys.test.data import group_by_v1
23-28: Consider using more descriptive variable names. Generic variable names like `v1` lack context.

    -v1 = Join(
    +checkout_features_join = Join(
         left=source,
         right_parts=[
             JoinPart(group_by=group_by_v1)
         ],
     )

.github/workflows/build_and_push_docker.yaml (1)
35-35: Add newline at end of file. File should end with a newline.

      push: true
      tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
    +

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 35-35: no new line character at the end of file
(new-line-at-end-of-file)
api/python/ai/chronon/repo/explore.py (1)
76-76: Key renamed for consistency. Changed from "output_namespace" to 'outputNamespace' to align with camelCase convention.
.github/image/Dockerfile (2)
59-59: Use `--no-cache-dir` with pip.
Avoid inflating the final image layer by adding `--no-cache-dir` to `pip3 install …`.
61-63: Combine the second `apt update`.
You run `apt update` twice; merge into the earlier layer to shrink the image and speed up builds.
api/python/test/canary/group_bys/aws/purchases.py (2)
19-19: `day` shadowing is fine but lint-unfriendly.
Minor: use `days` or `length` to appease linters/readers.
21-46: Duplicate definitions inflate config.
`v1_dev` and `v1_test` differ only in name; build one helper and clone it to cut noise.
.github/workflows/push_to_platform.yaml (1)
34-34: Fix formatting issues. Remove trailing spaces and add newline at end of file.
-chmod 600 ~/.ssh/id_rsa
-
+chmod 600 ~/.ssh/id_rsa
+
-ssh-keyscan github.com >> ~/.ssh/known_hosts
-
+ssh-keyscan github.com >> ~/.ssh/known_hosts
+
-ssh-add ~/.ssh/id_rsa
-
+ssh-add ~/.ssh/id_rsa
+
-EOF
-
+EOF
+
-git remote add chronon git@github.com:zipline-ai/chronon.git || true
-
+git remote add chronon git@github.com:zipline-ai/chronon.git || true
+
-run: git push origin main
+run: git push origin main
+
Also applies to: 37-37, 41-41, 49-49, 52-52, 57-57
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 34-34: trailing spaces
(trailing-spaces)
api/python/ai/chronon/windows.py (1)
50-51: Consider supporting minutes and weeks. Function only supports hours and days. Consider adding minutes and weeks for greater flexibility.
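The function in `windows.py` is not shown in this review, so the following is only a sketch of what minute and week support might look like. The name `parse_window_seconds`, the `_UNIT_SECONDS` table, and the `"30m"` / `"2w"` string format are all hypothetical and chosen for illustration; the real module may represent windows differently.

```python
import re

# Hypothetical unit table (assumption): seconds per supported unit suffix.
_UNIT_SECONDS = {
    "m": 60,                 # minutes
    "h": 60 * 60,            # hours
    "d": 24 * 60 * 60,       # days
    "w": 7 * 24 * 60 * 60,   # weeks
}

# A window string is a positive integer followed by one unit suffix, e.g. "30m".
_WINDOW_RE = re.compile(r"(\d+)([mhdw])")


def parse_window_seconds(text: str) -> int:
    """Convert a window string such as "7d" or "2w" into seconds."""
    match = _WINDOW_RE.fullmatch(text.strip().lower())
    if match is None:
        raise ValueError(
            f"Unsupported window {text!r}; expected <int> followed by one of m, h, d, w"
        )
    length, unit = match.groups()
    return int(length) * _UNIT_SECONDS[unit]
```

Keeping the units in one table means that adding another unit later is a one-line change plus an extra character in the regex.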
.github/workflows/test_scala_fmt.yaml (2)
50-50: Missing newline at end of file. Add a newline at the end to follow standard coding practices.
 {}.format-test
+
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 50-50: no new line character at the end of file
(new-line-at-end-of-file)
42-43: Add error handling for credential decoding. Base64 decoding may fail silently.
 - name: Setup Bazel cache credentials
   run: |
-    echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
+    if ! echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json; then
+      echo "Failed to decode BAZEL_CACHE_CREDENTIALS"
+      exit 1
+    fi
.github/workflows/test_bazel_config.yaml (2)
22-23: Remove extra blank line. For consistency, remove the extra newline.
   - 'WORKSPACE'
-
 concurrency:
43-45: Add error handling for credential decoding. Base64 decoding may fail silently.
 - name: Setup Bazel cache credentials
   run: |
-    echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json
+    if ! echo "${{ secrets.BAZEL_CACHE_CREDENTIALS }}" | base64 -d > bazel-cache-key.json; then
+      echo "Failed to decode BAZEL_CACHE_CREDENTIALS"
+      exit 1
+    fi
api/python/test/canary/teams.py (1)
111-111: Consider additional AWS configuration. AWS team has minimal configuration compared to GCP.
Would you like me to suggest additional AWS-specific configurations to match GCP's comprehensive setup?
api/python/ai/chronon/cli/plan/controller_iface.py (1)
23-23: Fix typo in method name. Rename `upload_branch_mappsing` to `upload_branch_mapping`.
api/python/ai/chronon/resources/gcp/teams.py (2)
27-32: Placeholders in config. Values like `<customer_id>` remain; add runtime validation or template injection.
41-48: Unfilled environment placeholders. Consider fail-fast on missing values.
aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala (1)
92-100: Clean up dead code. Remove or re-enable the commented-out naive aggregator block.
api/python/ai/chronon/eval/__init__.py (2)
50-54: String replacement can mangle nested names
`str.replace()` might inadvertently rewrite substrings inside other identifiers (e.g. `project.table` vs `project.table2`). Use a word-boundary regex or `sqlglot` rewriting instead.
+import re
+...
+clean_query = re.sub(rf'\b{re.escape(table_name)}\b', clean_name, clean_query)
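To make the difference concrete, here is a small sketch comparing plain `str.replace()` with the word-boundary regex suggested above. The helper name `replace_table_name` and the sample query are illustrative assumptions, not code from the PR. It also assumes table names start and end with a word character; backtick-quoted names would need a different pattern.

```python
import re


def replace_table_name(query: str, table_name: str, clean_name: str) -> str:
    """Replace whole occurrences of table_name, leaving longer identifiers alone."""
    # re.escape keeps the dots in "project.table" literal; \b anchors both ends
    # so that "project.table2" is not treated as a match.
    pattern = re.compile(rf"\b{re.escape(table_name)}\b")
    # A function replacement avoids backslash interpretation in clean_name.
    return pattern.sub(lambda _match: clean_name, query)


query = "SELECT * FROM project.table JOIN project.table2 USING (id)"

# Plain replace also rewrites the prefix of project.table2.
naive = query.replace("project.table", "project_table")

# The regex version only touches the exact table name.
safe = replace_table_name(query, "project.table", "project_table")
```

Here `naive` ends up referencing `project_table2`, a table that was never in the query, while `safe` leaves `project.table2` untouched.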
102-108: Optional typing for `_spark`
Initialising `_spark: SparkSession = None` violates the declared non-nullable type. Declare as `Optional[SparkSession]` (Python 3.10+: `SparkSession | None`) to appease type-checkers.
aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (2)
137-139: Box helpers already exist
`java.lang.Long.valueOf` / `Double.valueOf` are used directly two lines below, making `toJLong` / `toJDouble` redundant. Consider inlining to cut noise.
268-290: FrequentItems mapper covers boxed numerics only
The `FrequentItems` generic type must be boxed; the conversions provided handle this, good. Be mindful that the `k` default of 8 may be too low for realistic heavy-hitter detection; verify with domain use-cases.
README.md (1)
15-16: Tighten wording. Replace “on a regular basis” with “regularly” for brevity.
🧰 Tools
🪛 LanguageTool
[style] ~15-~15: ‘on a regular basis’ might be wordy. Consider a shorter alternative.
Context: ...on are picked and merged into this repo on a regular basis, and improvements made to this reposito...(EN_WORDINESS_PREMIUM_ON_A_REGULAR_BASIS)
api/python/ai/chronon/repo/zipline.py (4)
1-3: Consolidate importlib imports. Combine to a single line, e.g.: `from importlib.metadata import version as ver, PackageNotFoundError`
11-27: Extract large ASCII logo. Consider moving `LOGO` to a separate text/resource file or trimming inline art.
30-36: Cache package version. You call `ver("zipline-ai")` twice; store its result in a module-level constant to avoid duplicate lookups.
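One way to avoid the repeated lookup, sketched under assumptions: the helper name `package_version` and the `"unknown"` fallback are hypothetical, and only the `"zipline-ai"` distribution name comes from the comment above. `functools.lru_cache` memoizes the result so `importlib.metadata` is consulted at most once per name.

```python
from functools import lru_cache
from importlib.metadata import PackageNotFoundError, version


@lru_cache(maxsize=None)
def package_version(name: str = "zipline-ai") -> str:
    """Return the installed version of `name`, looked up at most once."""
    try:
        return version(name)
    except PackageNotFoundError:
        # Assumed fallback for source checkouts where the package is not installed.
        return "unknown"
```

A module-level constant, as the comment suggests, works just as well; the cached function merely defers the lookup until the first call.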
39-45: Avoid duplicate version retrieval. `_set_package_version()` is invoked in both the decorator and function body; call it once and reuse the value.
api/python/ai/chronon/types.py (1)
5-11: Reorder and simplify imports. Follow PEP8: standard libs first, then local; use `from ai.chronon.api.common import ttypes as common`.
.github/workflows/test_scala_2_12_non_spark.yaml (2)
47-59: Pin checkout action. Consider using `actions/checkout@v3` for a stable major release instead of `@v4`.
35-203: DRY up repeated jobs. All module tests share identical steps; use a matrix or reusable workflow to minimize duplication.
.github/workflows/test_scala_2_13_non_spark.yaml (2)
45-46: Upgrade checkout. Recommend `actions/checkout@v3` over `@v4` for major-release stability.
33-209: Centralize repeated definitions. Leveraging a matrix or composite action will make this DRY and easier to maintain.
api/python/ai/chronon/cli/logger.py (3)
1-4: Reorder imports. Follow stdlib → third-party → local convention: e.g. `import logging, sys; from datetime import datetime`.
17-29: Include exception details. Current `format()` omits `record.exc_info` / `stack_info`; extend it to capture tracebacks when present.
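A minimal sketch of a `format()` override that keeps tracebacks, assuming a custom `logging.Formatter` subclass similar to the one under review. The class name `TracebackFormatter` and the message layout are hypothetical; `formatException` and `formatStack` are standard `logging.Formatter` methods.

```python
import logging


class TracebackFormatter(logging.Formatter):
    """Custom formatter that still appends exception and stack details."""

    def format(self, record: logging.LogRecord) -> str:
        # Hypothetical layout; the real formatter also adds colours and timestamps.
        message = f"{record.levelname}: {record.getMessage()}"
        if record.exc_info:
            # Populated by logger.exception(...) or exc_info=True.
            message += "\n" + self.formatException(record.exc_info)
        if record.stack_info:
            # Populated when a log call passes stack_info=True.
            message += "\n" + self.formatStack(record.stack_info)
        return message
```

With this in place, `logger.exception("compile failed")` prints the message followed by the full traceback instead of silently dropping it.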
46-52: Leverage click styles. Replace custom ANSI wrappers with `click.style(text, fg='red')` for portability.
.github/workflows/test_scala_2_12_spark.yaml (2)
33-46: Trim duplication with a matrix / reusable step
Seven jobs repeat identical checkout + cache-credential + Bazel test logic. A `strategy.matrix` or a reusable workflow would shrink ~100 lines and ease maintenance.
Also applies to: 59-71, 85-97, 111-123, 137-149, 163-175, 189-201
201-201: Missing trailing newline
Add a final newline to keep linters quiet.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 201-201: no new line character at the end of file
(new-line-at-end-of-file)
.github/workflows/test_python.yaml (1)
71-71: Add trailing newline
Ends-with-newline = happier YAML linters.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 71-71: no new line character at the end of file
(new-line-at-end-of-file)
aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala (1)
398-406: Magic constants – extract for clarity
`0.75 * 0.5` comes from internal purge + load-factor assumptions. Consider:
private val PurgeFactor = 0.5
private val LoadFactor = 0.75
val sketchSize = nextPowerOfTwo(math.ceil(mapSize / (PurgeFactor * LoadFactor)).toInt max 2)
Easier to tune later.
api/python/ai/chronon/eval/sample_tables.py (2)
13-14: Simplified variable naming needed. Variable `raw_scan_query` is redundant since it just stores the `query` parameter.
-    raw_scan_query = query
-    print(f"Sampling {table} with query: {raw_scan_query}")
+    print(f"Sampling {table} with query: {query}")
58-59: Remove duplicate import. `os` is already imported at module level.
-    import os
-
api/python/ai/chronon/resources/gcp/README.md (6)
33-34: Fix incomplete code block. Missing closing backtick.
```bash ./zipline-cli-install.sh - + ```
98-98: Fix heading style. Remove trailing period per markdown standards.
-## 🧪 Running a GroupBy upload (GBU) job.
+## 🧪 Running a GroupBy upload (GBU) job
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
98-98: Trailing punctuation in heading
Punctuation: '.'(MD026, no-trailing-punctuation)
111-111: Fix heading style. Remove trailing period.
-## 🧪 Upload the GBU values to online KV store.
+## 🧪 Upload the GBU values to online KV store
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
111-111: Trailing punctuation in heading
Punctuation: '.'(MD026, no-trailing-punctuation)
122-122: Fix heading style. Remove trailing period.
-## 🧪 Upload the metadata of Chronon GroupBy or Join to online KV store for serving.
+## 🧪 Upload the metadata of Chronon GroupBy or Join to online KV store for serving
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
122-122: Trailing punctuation in heading
Punctuation: '.'(MD026, no-trailing-punctuation)
140-140: Fix heading style. Remove trailing period.
-## 🧪 Fetch feature values from Chronon GroupBy or Join.
+## 🧪 Fetch feature values from Chronon GroupBy or Join
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
140-140: Trailing punctuation in heading
Punctuation: '.'(MD026, no-trailing-punctuation)
167-167: Update GitHub link. Link points to airbnb/chronon but should be zipline-ai/chronon.
-[GitHub](https://github.com/airbnb/chronon)
+[GitHub](https://github.com/zipline-ai/chronon)
api/python/ai/chronon/repo/aws.py (1)
49-61: Prefer logger over print
Use `LOG.info` / `LOG.exception` to keep output consistent and structured.
api/python/ai/chronon/cli/compile/display/compile_status.py (1)
32-40: Key may be `None`
Using `None` as a dict key for trackers is error-prone; derive the key from `compiled.obj_type` when absent.
api/python/ai/chronon/cli/compile/display/class_tracker.py (1)
68-72: Initialize `closed` flag
`self.closed` is first used here; define it in `__init__` for clarity.
 def __init__(self):
     ...
     self.deleted_names: List[str] = []
+    self.closed: bool = False
api/python/ai/chronon/cli/compile/parse_configs.py (1)
29-30: Consider checking file existence before import. Add a file existence check before attempting to import to prevent unexpected errors.
-    try:
-        results_dict = from_file(f, cls, input_dir)
+    try:
+        if not os.path.isfile(f):
+            raise FileNotFoundError(f"File {f} does not exist")
+        results_dict = from_file(f, cls, input_dir)
api/python/ai/chronon/source.py (1)
59-61: Documentation mix-up in EntitySource. Parameter descriptions appear to be misaligned: the description for `mutationTopic` seems to describe what `query` should do.
-    - mutationTopic: The logic used to scan both the table and the topic. Contains row level transformations
-      and filtering expressed as Spark SQL statements.
-    - query: If each new hive partition contains not just the current day's events but the entire set
+    - mutationTopic: Kafka topic containing mutation events for the entity.
+    - query: The logic used to scan both the table and the topic. Contains row level transformations
+      and filtering expressed as Spark SQL statements.
api/python/ai/chronon/repo/run.py (2)
74-74: Remove commented-out fetch_online_jar reference. Since there's a clear note explaining why it's not used, consider removing the commented code.
-    # NOTE: We don't want to ever call the fetch_online_jar.py script since we're working
-    # on our internal zipline fork of the chronon repo
-    # "online_jar_fetch": os.path.join(chronon_repo_path, "scripts/fetch_online_jar.py"),
239-239: Enhance error message for missing configuration. Provide more actionable guidance when the config file is missing.
-    raise ValueError(f"Conf file {conf_path} does not exist.")
+    raise ValueError(f"Conf file {conf_path} does not exist. Ensure the file path is correct relative to {repo}.")
api/python/ai/chronon/repo/default_runner.py (1)
186-193: Shell-string command building invites injection & quoting bugs
You interpolate user-supplied strings straight into a shell command, then execute via `utils.check_call`. Prefer `subprocess.run([...])` with an arg list, or `shlex.quote` every piece.
api/python/ai/chronon/cli/compile/compiler.py (1)
7-14: Duplicate import
`ai.chronon.cli.compile.display.compiled_obj` is imported twice (lines 7 and 13). Drop one.
.github/workflows/test_scala_2_13_spark.yaml (2)
36-39: Securely handling credentials, but consider GitHub OIDC. The credential handling works but GitHub OIDC is more secure for GCP auth.
Also applies to: 63-66, 90-93, 117-120, 144-147, 171-174, 198-201
209-209: Missing newline at end of file. Add a newline at the end of the file to fix the YAMLlint warning.
 //spark:streaming_test
+
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 209-209: no new line character at the end of file
(new-line-at-end-of-file)
api/python/ai/chronon/staging_query.py (1)
18-22: Use `list` default_factory for safety
`additional_partitions` will be shared if later changed in-place.
-    additional_partitions: Optional[List[str]] = None
+    additional_partitions: Optional[List[str]] = field(default_factory=list)
api/python/ai/chronon/cli/compile/parse_teams.py (2)
42-48: Don’t shadow built-in
Parameter name hides the global; rename to `verbose`.
28-38: Handle missing loader
`spec` or `spec.loader` can be `None`, causing `AttributeError`. Add a guard.
api/python/ai/chronon/repo/compile.py (2)
89-91: Path split is OS-specific
Hard-coding `"/"` breaks on Windows. Use `os.sep` or `pathlib.Path(input_path).parts`.
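As a sketch of the `pathlib` option, the helper below splits a path into components without naming a separator. `split_path` and its `path_cls` parameter are hypothetical names for illustration; the pure path classes are used so the behaviour of both platforms can be shown on any machine.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def split_path(input_path: str, path_cls=PurePath) -> tuple:
    """Split a path into its components using the platform's own rules."""
    # PurePath resolves to the POSIX or Windows flavour of the running OS;
    # passing a flavour explicitly is only needed for demonstration.
    return path_cls(input_path).parts


posix_parts = split_path("group_bys/team/file.py", PurePosixPath)
windows_parts = split_path(r"group_bys\team\file.py", PureWindowsPath)
```

Both calls yield the same three components, whereas `input_path.split("/")` would return the Windows path as a single element.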
238-239: Avoid mutating loop var `name`
Reassigning `name` hampers traceability; use a new variable.
api/python/ai/chronon/eval/table_scan.py (1)
64-68: Side-effect free helper wanted.
`coalesce(self.query.reversalColumn, "is_before")` mutates nothing, yet later `base_selects["is_before"]` is added even if the original key exists, silently overwriting it. Consider `setdefault`.
api/python/ai/chronon/cli/compile/compile_context.py (1)
142-156: File handle leak.
`open(full_path)` without a context manager can leave file descriptors open; use `Path.read_text()`.
api/python/ai/chronon/repo/utils.py (1)
50-53: Param style
`ignoreError` deviates from snake_case (`ignore_error`). Rename for consistency.
api/python/ai/chronon/cli/git_utils.py (3)
61-68: Potential large-file load
`git show` reads the full file into memory; for big binaries this may exhaust memory. Consider a size cap or streaming.
36-41: Use logger, not print
Call `logger.error(...)` directly.
130-149: Minor inefficiency
Building `real_changes` could use a list comprehension with a predicate instead of a loop.
api/python/ai/chronon/repo/validator.py (1)
389-393: Variable shadowing hurts readability
The list comprehension reuses the outer name `errors`, shadowing the surrounding list.
-    for errors in group_by_errors
-    for error in errors
+    for gb_errs in group_by_errors
+    for error in gb_errs
api/python/ai/chronon/utils.py (1)
223-228: `dict_to_bash_commands` does not quote values
If a value contains spaces or shell-special characters, the generated CLI string will break.
-    cmd = (
-        f"--{key.replace('_', '-')}={value}"
-        if value
-        else f"--{key.replace('_', '-')}"
-    )
+    safe_val = shlex.quote(str(value))  # requires `import shlex`
+    cmd = f"--{key.replace('_', '-')}" + (f"={safe_val}" if value else "")
api/python/ai/chronon/repo/gcp.py (1)
248-276: `final_args` template may swallow `{}` inside `user_args`
`str.format` interprets braces as replacement fields; if `user_args` already contains `{}`, the format call can fail. Consider `string.Template` or double-brace escaping.
api/python/ai/chronon/group_by.py (2)
226-238: Avoid relying on private helpers.
`window_utils._from_str` is underscored ⇒ internal API; risk of breakage. Expose a public `from_str()` wrapper or inline the parsing logic.
183-185: Use public enum names.
`common.TimeUnit._VALUES_TO_NAMES` is private. Prefer a `common.TimeUnit._NAMES_TO_VALUES` reverse lookup, or store `window.timeUnit.name.lower()`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
⛔ Files ignored due to path filters (5)
api/python/test/sample/data/checkouts.csv is excluded by !**/*.csv
api/python/test/sample/data/purchases.csv is excluded by !**/*.csv
api/python/test/sample/data/purchases_new.csv is excluded by !**/*.csv
api/python/test/sample/data/returns.csv is excluded by !**/*.csv
api/python/test/sample/data/users.csv is excluded by !**/*.csv
📒 Files selected for processing (207)
.bazelignore(1 hunks).bazeliskrc(1 hunks).bazelproject(1 hunks).bazelrc(1 hunks).github/ISSUE_TEMPLATE/bug_report.md(1 hunks).github/ISSUE_TEMPLATE/feature_request.md(1 hunks).github/image/Dockerfile(1 hunks).github/pull_request_template.md(1 hunks).github/release.yml(1 hunks).github/workflows/build_and_push_docker.yaml(1 hunks).github/workflows/push_to_canary.yaml(1 hunks).github/workflows/push_to_platform.yaml(1 hunks).github/workflows/require_triggered_status_checks.yaml(1 hunks).github/workflows/test_bazel_config.yaml(1 hunks).github/workflows/test_python.yaml(1 hunks).github/workflows/test_scala_2_12_non_spark.yaml(1 hunks).github/workflows/test_scala_2_12_spark.yaml(1 hunks).github/workflows/test_scala_2_13_non_spark.yaml(1 hunks).github/workflows/test_scala_2_13_spark.yaml(1 hunks).github/workflows/test_scala_fmt.yaml(1 hunks).gitignore(2 hunks).plugin-versions(1 hunks).scalafix.conf(0 hunks).scalafmt.conf(1 hunks).tool-versions(1 hunks)AUTHORS(0 hunks)CONTRIBUTING.md(0 hunks)GOVERNANCE.md(0 hunks)LICENSE(0 hunks)README.md(1 hunks)WORKSPACE(1 hunks)aggregator/BUILD.bazel(1 hunks)aggregator/src/main/scala/ai/chronon/aggregator/base/MinHeap.scala(1 hunks)aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala(2 hunks)aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala(2 hunks)aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala(3 hunks)aggregator/src/main/scala/ai/chronon/aggregator/row/MapColumnAggregator.scala(1 hunks)aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala(1 hunks)aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala(5 hunks)aggregator/src/main/scala/ai/chronon/aggregator/stats/EditDistance.scala(1 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala(1 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/Resolution.scala(2 
hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala(1 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala(4 hunks)aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala(2 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxDistinctTest.scala(2 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxHistogramTest.scala(0 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxPercentilesTest.scala(3 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala(4 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/EditDistanceTest.scala(1 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala(7 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/MinHeapTest.scala(1 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/MomentTest.scala(2 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/NaiveAggregator.scala(1 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/RowAggregatorTest.scala(4 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala(3 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothOnlineAggregatorTest.scala(2 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/TwoStackLiteAggregatorTest.scala(5 hunks)aggregator/src/test/scala/ai/chronon/aggregator/test/VarianceTest.scala(2 hunks)airflow/constants.py(0 hunks)airflow/decorators.py(0 hunks)airflow/group_by_dag_constructor.py(0 hunks)airflow/helpers.py(0 hunks)airflow/join_dag_constructor.py(0 hunks)airflow/online_offline_consistency_dag_constructor.py(0 hunks)airflow/operators.py(0 hunks)airflow/readme.md(0 hunks)airflow/staging_query_dag_constructor.py(0 hunks)api/BUILD.bazel(1 hunks)api/py/ai/__init__.py(0 hunks)api/py/ai/chronon/__init__.py(0 hunks)api/py/ai/chronon/repo/run.py(0 
hunks)api/py/ai/chronon/scheduler/adapters/airflow_adapter.py(0 hunks)api/py/ai/chronon/scheduler/interfaces/flow.py(0 hunks)api/py/ai/chronon/scheduler/interfaces/node.py(0 hunks)api/py/ai/chronon/scheduler/interfaces/orchestrator.py(0 hunks)api/py/example.py(0 hunks)api/py/requirements/base.in(0 hunks)api/py/requirements/base.txt(0 hunks)api/py/test/sample/group_bys/risk/merchant_data.py(0 hunks)api/py/test/sample/group_bys/risk/user_data.py(0 hunks)api/py/test/sample/joins/risk/user_transactions.py(0 hunks)api/py/test/sample/joins/sample_team/sample_chaining_join.py(0 hunks)api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_merchant(0 hunks)api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_user(0 hunks)api/py/test/sample/production/joins/sample_team/sample_join_from_shorthand.v1(0 hunks)api/py/test/sample/production/models/quickstart/test.v1(0 hunks)api/py/test/sample/production/models/risk/transaction_model.v1(0 hunks)api/py/test/sample/sources/test_sources.py(0 hunks)api/py/test/sample/teams.json(0 hunks)api/py/test/scheduler/test_flow.py(0 hunks)api/py/test/test_join.py(0 hunks)api/py/test/test_run.py(0 hunks)api/py/tox.ini(0 hunks)api/python/README.md(1 hunks)api/python/ai/chronon/airflow_helpers.py(1 hunks)api/python/ai/chronon/cli/compile/compile_context.py(1 hunks)api/python/ai/chronon/cli/compile/compiler.py(1 hunks)api/python/ai/chronon/cli/compile/conf_validator.py(1 hunks)api/python/ai/chronon/cli/compile/display/class_tracker.py(1 hunks)api/python/ai/chronon/cli/compile/display/compile_status.py(1 hunks)api/python/ai/chronon/cli/compile/display/compiled_obj.py(1 hunks)api/python/ai/chronon/cli/compile/display/console.py(1 hunks)api/python/ai/chronon/cli/compile/display/diff_result.py(1 hunks)api/python/ai/chronon/cli/compile/fill_templates.py(1 hunks)api/python/ai/chronon/cli/compile/parse_configs.py(1 hunks)api/python/ai/chronon/cli/compile/parse_teams.py(1 
hunks)api/python/ai/chronon/cli/compile/serializer.py(1 hunks)api/python/ai/chronon/cli/git_utils.py(1 hunks)api/python/ai/chronon/cli/logger.py(1 hunks)api/python/ai/chronon/cli/plan/controller_iface.py(1 hunks)api/python/ai/chronon/eval/__init__.py(1 hunks)api/python/ai/chronon/eval/query_parsing.py(1 hunks)api/python/ai/chronon/eval/sample_tables.py(1 hunks)api/python/ai/chronon/eval/table_scan.py(1 hunks)api/python/ai/chronon/group_by.py(14 hunks)api/python/ai/chronon/join.py(9 hunks)api/python/ai/chronon/model.py(1 hunks)api/python/ai/chronon/query.py(3 hunks)api/python/ai/chronon/repo/__init__.py(1 hunks)api/python/ai/chronon/repo/aws.py(1 hunks)api/python/ai/chronon/repo/compile.py(8 hunks)api/python/ai/chronon/repo/compilev2.py(1 hunks)api/python/ai/chronon/repo/compilev3.py(1 hunks)api/python/ai/chronon/repo/constants.py(1 hunks)api/python/ai/chronon/repo/default_runner.py(1 hunks)api/python/ai/chronon/repo/explore.py(6 hunks)api/python/ai/chronon/repo/extract_objects.py(2 hunks)api/python/ai/chronon/repo/gcp.py(1 hunks)api/python/ai/chronon/repo/hub_uploader.py(1 hunks)api/python/ai/chronon/repo/init.py(1 hunks)api/python/ai/chronon/repo/run.py(1 hunks)api/python/ai/chronon/repo/runner.py(1 hunks)api/python/ai/chronon/repo/serializer.py(4 hunks)api/python/ai/chronon/repo/team_json_utils.py(2 hunks)api/python/ai/chronon/repo/utils.py(1 hunks)api/python/ai/chronon/repo/validator.py(15 hunks)api/python/ai/chronon/repo/zipline.py(1 hunks)api/python/ai/chronon/resources/gcp/README.md(1 hunks)api/python/ai/chronon/resources/gcp/group_bys/test/data.py(1 hunks)api/python/ai/chronon/resources/gcp/joins/test/data.py(1 hunks)api/python/ai/chronon/resources/gcp/sources/test/data.py(1 hunks)api/python/ai/chronon/resources/gcp/teams.py(1 hunks)api/python/ai/chronon/resources/gcp/zipline-cli-install.sh(1 hunks)api/python/ai/chronon/source.py(1 hunks)api/python/ai/chronon/staging_query.py(1 hunks)api/python/ai/chronon/types.py(1 hunks)api/python/ai/chronon/utils.py(10 
hunks)api/python/ai/chronon/windows.py(1 hunks)api/python/pyproject.toml(1 hunks)api/python/requirements/base.in(1 hunks)api/python/requirements/base.txt(1 hunks)api/python/requirements/dev.in(1 hunks)api/python/requirements/dev.txt(3 hunks)api/python/setup.py(3 hunks)api/python/test/canary/README.md(1 hunks)api/python/test/canary/deprecated_teams.json(1 hunks)api/python/test/canary/group_bys/aws/purchases.py(1 hunks)api/python/test/canary/group_bys/gcp/item_event_canary.py(1 hunks)api/python/test/canary/group_bys/gcp/purchases.py(1 hunks)api/python/test/canary/joins/gcp/training_set.py(1 hunks)api/python/test/canary/teams.py(1 hunks)api/python/test/conftest.py(1 hunks)api/python/test/sample/README.md(1 hunks)api/python/test/sample/aws/teams.json(1 hunks)api/python/test/sample/deprecated_teams.json(1 hunks)api/python/test/sample/group_bys/kaggle/clicks.py(2 hunks)api/python/test/sample/group_bys/kaggle/outbrain.py(3 hunks)api/python/test/sample/group_bys/quickstart/purchases.py(2 hunks)api/python/test/sample/group_bys/quickstart/returns.py(1 hunks)api/python/test/sample/group_bys/quickstart/schema.py(2 hunks)api/python/test/sample/group_bys/quickstart/users.py(2 hunks)api/python/test/sample/group_bys/risk/merchant_data.py(1 hunks)api/python/test/sample/group_bys/risk/transaction_events.py(2 hunks)api/python/test/sample/group_bys/risk/user_data.py(1 hunks)api/python/test/sample/group_bys/sample_team/chaining_group_by.py(1 hunks)api/python/test/sample/group_bys/sample_team/entity_sample_group_by_from_module.py(1 hunks)api/python/test/sample/group_bys/sample_team/event_sample_group_by.py(1 hunks)api/python/test/sample/group_bys/sample_team/group_by_with_kwargs.py(2 hunks)api/python/test/sample/group_bys/sample_team/label_part_group_by.py(1 hunks)api/python/test/sample/group_bys/sample_team/mutation_sample_group_by.py(1 hunks)api/python/test/sample/group_bys/sample_team/sample_chaining_group_by.py(2 
hunks)api/python/test/sample/group_bys/sample_team/sample_group_by.py(3 hunks)api/python/test/sample/group_bys/sample_team/sample_group_by_from_join_part.py(1 hunks)api/python/test/sample/group_bys/sample_team/sample_group_by_from_module.py(2 hunks)api/python/test/sample/group_bys/sample_team/sample_group_by_group_by.py(1 hunks)api/python/test/sample/group_bys/sample_team/sample_group_by_missing_input_column.py(1 hunks)api/python/test/sample/group_bys/sample_team/sample_group_by_with_derivations.py(1 hunks)api/python/test/sample/group_bys/sample_team/sample_group_by_with_incorrect_derivations.py(1 hunks)api/python/test/sample/group_bys/sample_team/sample_non_prod_group_by.py(1 hunks)api/python/test/sample/joins/kaggle/outbrain.py(1 hunks)api/python/test/sample/joins/quickstart/training_set.py(1 hunks)api/python/test/sample/joins/risk/user_transactions.py(1 hunks)api/python/test/sample/joins/sample_team/sample_backfill_mutation_join.py(1 hunks)api/python/test/sample/joins/sample_team/sample_chaining_join.py(2 hunks)api/python/test/sample/joins/sample_team/sample_chaining_join_parent.py(1 hunks)api/python/test/sample/joins/sample_team/sample_join.py(2 hunks)api/python/test/sample/joins/sample_team/sample_join_bootstrap.py(3 hunks)api/python/test/sample/joins/sample_team/sample_join_derivation.py(1 hunks)api/python/test/sample/joins/sample_team/sample_join_external_parts.py(1 hunks)api/python/test/sample/joins/sample_team/sample_join_from_group_by_from_join.py(1 hunks)api/python/test/sample/joins/sample_team/sample_join_from_module.py(1 hunks)api/python/test/sample/joins/sample_team/sample_join_from_module_skipped.py(1 hunks)api/python/test/sample/joins/sample_team/sample_join_with_derivations_on_external_parts.py(1 hunks)api/python/test/sample/joins/sample_team/sample_label_join.py(1 hunks)api/python/test/sample/joins/sample_team/sample_label_join_with_agg.py(1 hunks)api/python/test/sample/joins/sample_team/sample_online_join.py(1 hunks)
💤 Files with no reviewable changes (40)
- api/py/ai/__init__.py
- airflow/readme.md
- api/py/tox.ini
- api/py/requirements/base.in
- AUTHORS
- api/py/test/sample/joins/sample_team/sample_chaining_join.py
- .scalafix.conf
- api/py/requirements/base.txt
- api/py/ai/chronon/scheduler/interfaces/node.py
- api/py/test/sample/production/models/quickstart/test.v1
- CONTRIBUTING.md
- api/py/test/sample/group_bys/risk/merchant_data.py
- airflow/decorators.py
- LICENSE
- api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_merchant
- api/py/test/sample/joins/risk/user_transactions.py
- aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxHistogramTest.scala
- api/py/ai/chronon/scheduler/interfaces/flow.py
- api/py/test/test_join.py
- airflow/staging_query_dag_constructor.py
- api/py/ai/chronon/scheduler/interfaces/orchestrator.py
- airflow/constants.py
- api/py/test/sample/group_bys/risk/user_data.py
- api/py/ai/chronon/__init__.py
- api/py/test/sample/production/group_bys/risk/transaction_events.txn_group_by_user
- api/py/test/sample/production/joins/sample_team/sample_join_from_shorthand.v1
- airflow/online_offline_consistency_dag_constructor.py
- api/py/test/scheduler/test_flow.py
- airflow/join_dag_constructor.py
- airflow/group_by_dag_constructor.py
- airflow/helpers.py
- api/py/test/sample/teams.json
- api/py/ai/chronon/scheduler/adapters/airflow_adapter.py
- api/py/test/sample/production/models/risk/transaction_model.v1
- api/py/example.py
- api/py/test/test_run.py
- api/py/test/sample/sources/test_sources.py
- api/py/ai/chronon/repo/run.py
- airflow/operators.py
- GOVERNANCE.md
🧰 Additional context used
🧬 Code Graph Analysis (47)
aggregator/src/test/scala/ai/chronon/aggregator/test/EditDistanceTest.scala (1)
aggregator/src/main/scala/ai/chronon/aggregator/stats/EditDistance.scala (1)
EditDistance(19-112)
api/python/test/sample/group_bys/sample_team/chaining_group_by.py (3)
api/python/ai/chronon/group_by.py (1)
Operation(60-146)
api/python/ai/chronon/source.py (1)
JoinSource(74-88)
api/python/ai/chronon/query.py (1)
selects(103-126)
api/python/test/sample/group_bys/kaggle/outbrain.py (2)
api/python/test/sample/sources/kaggle/outbrain.py (1)
outbrain_left_events(28-40)
api/python/ai/chronon/group_by.py (2)
Accuracy(56-57)
Operation(60-146)
aggregator/src/main/scala/ai/chronon/aggregator/row/MapColumnAggregator.scala (1)
api/src/main/scala/ai/chronon/api/ScalaJavaConversions.scala (1)
ScalaJavaConversions(6-97)
api/python/ai/chronon/eval/query_parsing.py (3)
online/src/main/scala/ai/chronon/online/connectors/Catalog.scala (1)
Table(8-13)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DelegatingBigQueryMetastoreCatalog.scala (1)
name(172-172)
api/python/ai/chronon/eval/table_scan.py (1)
table_name(36-37)
api/python/ai/chronon/resources/gcp/sources/test/data.py (2)
api/python/ai/chronon/source.py (1)
EventSource(8-35)
api/python/ai/chronon/query.py (1)
selects(103-126)
aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala (2)
api/src/main/scala/ai/chronon/api/Row.scala (1)
Row(72-126)
api/src/main/scala/ai/chronon/api/TsUtils.scala (1)
TsUtils(23-42)
api/python/test/sample/joins/kaggle/outbrain.py (1)
api/python/test/sample/sources/kaggle/outbrain.py (1)
outbrain_left_events(28-40)
aggregator/src/test/scala/ai/chronon/aggregator/test/NaiveAggregator.scala (2)
api/src/main/scala/ai/chronon/api/Row.scala (1)
Row(72-126)
api/src/main/scala/ai/chronon/api/TsUtils.scala (1)
TsUtils(23-42)
api/python/test/sample/joins/quickstart/training_set.py (3)
api/python/ai/chronon/source.py (1)
EventSource(8-35)
api/src/main/scala/ai/chronon/api/Builders.scala (1)
Source(106-140)
api/python/ai/chronon/query.py (1)
selects(103-126)
api/python/test/sample/group_bys/sample_team/mutation_sample_group_by.py (1)
api/python/ai/chronon/group_by.py (1)
Accuracy(56-57)
aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala (3)
api/src/main/scala/ai/chronon/api/Builders.scala (1)
AggregationPart(68-85)
api/src/main/scala/ai/chronon/api/Row.scala (1)
Row(72-126)
api/src/main/scala/ai/chronon/api/TsUtils.scala (1)
TsUtils(23-42)
api/python/test/sample/group_bys/sample_team/sample_group_by_from_module.py (1)
api/python/ai/chronon/group_by.py (1)
Operation(60-146)
api/python/test/sample/joins/sample_team/sample_join_bootstrap.py (2)
api/python/ai/chronon/query.py (1)
selects(103-126)
api/python/ai/chronon/utils.py (1)
get_join_output_table_name(317-332)
api/python/test/sample/group_bys/sample_team/sample_group_by.py (1)
api/python/ai/chronon/group_by.py (1)
Operation(60-146)
api/python/test/sample/group_bys/risk/user_data.py (4)
api/python/ai/chronon/source.py (1)
EntitySource(38-71)
api/src/main/scala/ai/chronon/api/Builders.scala (1)
Source(106-140)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
query(382-390)
api/python/ai/chronon/query.py (1)
selects(103-126)
api/python/ai/chronon/resources/gcp/teams.py (1)
api/python/ai/chronon/repo/constants.py (1)
RunMode(4-30)
api/python/test/sample/group_bys/sample_team/sample_chaining_group_by.py (3)
api/python/ai/chronon/group_by.py (1)
Operation(60-146)
api/python/ai/chronon/source.py (1)
JoinSource(74-88)
api/python/ai/chronon/query.py (1)
selects(103-126)
api/python/ai/chronon/resources/gcp/group_bys/test/data.py (1)
api/python/ai/chronon/group_by.py (3)
Operation(60-146)
TimeUnit(178-180)
Window(244-245)
api/python/test/sample/group_bys/quickstart/purchases.py (3)
api/python/ai/chronon/source.py (1)
EventSource(8-35)
api/python/ai/chronon/group_by.py (1)
Operation(60-146)
api/python/ai/chronon/query.py (1)
selects(103-126)
api/python/test/sample/group_bys/quickstart/schema.py (3)
api/python/ai/chronon/source.py (1)
EventSource(8-35)
api/python/ai/chronon/group_by.py (1)
Operation(60-146)
api/python/ai/chronon/query.py (1)
selects(103-126)
api/python/test/sample/group_bys/sample_team/sample_non_prod_group_by.py (1)
api/python/ai/chronon/group_by.py (1)
Operation(60-146)
api/python/ai/chronon/eval/sample_tables.py (2)
api/python/ai/chronon/eval/__init__.py (1)
eval(17-42)
api/python/ai/chronon/eval/table_scan.py (2)
output_path(30-31)
raw_scan_query(52-58)
api/python/test/sample/group_bys/quickstart/returns.py (3)
api/python/ai/chronon/source.py (1)
EventSource(8-35)
api/python/ai/chronon/group_by.py (1)
Operation(60-146)
api/python/ai/chronon/query.py (1)
selects(103-126)
api/python/test/sample/joins/risk/user_transactions.py (3)
api/python/ai/chronon/source.py (1)
EventSource(8-35)
api/src/main/scala/ai/chronon/api/Builders.scala (1)
Source(106-140)
api/python/ai/chronon/query.py (1)
selects(103-126)
api/python/ai/chronon/resources/gcp/zipline-cli-install.sh (1)
scripts/distribution/build_and_upload_artifacts.sh (1)
print_usage(3-12)
api/python/test/sample/joins/sample_team/sample_label_join.py (1)
api/python/ai/chronon/join.py (1)
LabelParts(252-287)
api/python/test/sample/group_bys/risk/transaction_events.py (3)
api/python/ai/chronon/source.py (1)
EventSource(8-35)
api/python/ai/chronon/group_by.py (1)
Operation(60-146)
api/python/ai/chronon/query.py (1)
selects(103-126)
api/python/ai/chronon/repo/explore.py (1)
api/python/ai/chronon/cli/compile/parse_teams.py (1)
load_teams(42-69)
api/python/ai/chronon/windows.py (1)
api/python/ai/chronon/group_by.py (2)
Window(244-245)
TimeUnit(178-180)
api/python/test/sample/group_bys/kaggle/clicks.py (4)
api/python/ai/chronon/source.py (1)
EventSource(8-35)
api/python/ai/chronon/group_by.py (1)
Operation(60-146)
api/python/ai/chronon/query.py (1)
selects(103-126)
api/python/ai/chronon/utils.py (1)
get_staging_query_output_table_name(304-309)
api/python/test/sample/joins/sample_team/sample_label_join_with_agg.py (1)
api/python/ai/chronon/join.py (1)
LabelParts(252-287)
api/python/test/sample/group_bys/risk/merchant_data.py (3)
api/python/ai/chronon/source.py (1)
EntitySource(38-71)
api/src/main/scala/ai/chronon/api/Builders.scala (1)
Source(106-140)
api/python/ai/chronon/query.py (1)
selects(103-126)
api/python/test/sample/joins/sample_team/sample_join.py (2)
api/python/ai/chronon/repo/constants.py (1)
RunMode(4-30)
api/python/ai/chronon/join.py (1)
LabelParts(252-287)
api/python/ai/chronon/eval/__init__.py (3)
api/python/ai/chronon/eval/query_parsing.py (1)
get_tables_from_query(4-19)
api/python/ai/chronon/eval/sample_tables.py (2)
sample_tables(20-24)
sample_with_query(7-17)
api/python/ai/chronon/eval/table_scan.py (10)
TableScan(23-86)
clean_table_name(12-13)
table_scans_in_group_by(151-155)
table_scans_in_join(158-186)
table_scans_in_source(113-139)
table_name(36-37)
output_path(30-31)
raw_scan_query(52-58)
view_name(33-34)
scan_query(60-86)
api/python/test/sample/group_bys/sample_team/group_by_with_kwargs.py (1)
api/python/ai/chronon/group_by.py (1)
Operation(60-146)
api/python/ai/chronon/cli/compile/display/compile_status.py (2)
api/python/ai/chronon/cli/compile/display/class_tracker.py (7)
ClassTracker(10-107)
add(26-42)
add_existing(23-24)
close(68-71)
to_status(73-93)
to_errors(95-103)
diff(106-107)
api/python/ai/chronon/cli/compile/display/compiled_obj.py (1)
CompiledObj(6-12)
api/python/test/sample/group_bys/sample_team/event_sample_group_by.py (1)
api/python/ai/chronon/group_by.py (1)
Operation(60-146)
aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (3)
api/python/ai/chronon/group_by.py (1)
Operation(60-146)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
getInt(215-223)
api/src/main/scala/ai/chronon/api/DataType.scala (6)
IntType(138-138)
LongType(140-140)
ShortType(146-146)
DoubleType(142-142)
FloatType(144-144)
StringType(152-152)
api/python/test/canary/teams.py (1)
api/python/ai/chronon/repo/constants.py (1)
RunMode(4-30)
api/python/ai/chronon/types.py (7)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)
query(382-390)
api/python/ai/chronon/query.py (1)
selects(103-126)
api/src/main/scala/ai/chronon/api/Builders.scala (2)
Source(106-140)
MetaData(261-315)
api/python/ai/chronon/source.py (3)
EventSource(8-35)
EntitySource(38-71)
JoinSource(74-88)
api/python/ai/chronon/group_by.py (5)
Operation(60-146)
Window(244-245)
TimeUnit(178-180)
DefaultAggregation(157-175)
Accuracy(56-57)
api/python/ai/chronon/join.py (1)
LabelParts(252-287)
api/python/ai/chronon/staging_query.py (1)
TableDependency(18-21)
api/python/ai/chronon/repo/aws.py (3)
api/python/ai/chronon/repo/default_runner.py (3)
Runner(19-284)
run(173-247)
_gen_final_args(249-284)
api/python/ai/chronon/repo/utils.py (5)
JobType(20-22)
check_call(66-68)
extract_filename_from_path(62-63)
get_customer_id(58-59)
split_date_range(442-467)
api/python/ai/chronon/repo/gcp.py (1)
run(363-563)
api/python/ai/chronon/source.py (2)
api/src/main/scala/ai/chronon/api/Extensions.scala (3)
query(382-390)
topic(453-463)
isCumulative(447-451)
api/src/main/scala/ai/chronon/api/Builders.scala (2)
Source(106-140)
joinSource(132-139)
api/python/ai/chronon/airflow_helpers.py (3)
api/python/ai/chronon/group_by.py (1)
GroupBy(429-674)
api/python/ai/chronon/join.py (1)
Join(355-554)
api/python/ai/chronon/utils.py (1)
get_query(135-136)
aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala (2)
api/src/main/scala/ai/chronon/api/Constants.scala (1)
Constants(23-100)
api/src/main/scala/ai/chronon/api/DataType.scala (2)
FloatType(144-144)
LongType(140-140)
api/python/ai/chronon/repo/compilev2.py (5)
api/python/ai/chronon/repo/serializer.py (1)
thrift_simple_json_protected(127-141)
api/python/ai/chronon/repo/validator.py (1)
ChrononRepoValidator(195-485)
api/python/ai/chronon/repo/extract_objects.py (1)
from_folderV2(44-65)
api/python/ai/chronon/repo/team_json_utils.py (1)
get_team_conf(38-46)
api/python/ai/chronon/utils.py (2)
log_table_name(300-301)
output_table_name(241-248)
api/python/ai/chronon/cli/git_utils.py (2)
api/python/ai/chronon/repo/utils.py (1)
check_output(71-73)
api/python/ai/chronon/repo/runner.py (1)
info(189-205)
🪛 markdownlint-cli2 (0.17.2)
api/python/README.md
134-134: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
api/python/ai/chronon/resources/gcp/README.md
98-98: Trailing punctuation in heading
Punctuation: '.'
(MD026, no-trailing-punctuation)
111-111: Trailing punctuation in heading
Punctuation: '.'
(MD026, no-trailing-punctuation)
122-122: Trailing punctuation in heading
Punctuation: '.'
(MD026, no-trailing-punctuation)
140-140: Trailing punctuation in heading
Punctuation: '.'
(MD026, no-trailing-punctuation)
🪛 LanguageTool
README.md
[style] ~15-~15: ‘on a regular basis’ might be wordy. Consider a shorter alternative.
Context: ...on are picked and merged into this repo on a regular basis, and improvements made to this reposito...
(EN_WORDINESS_PREMIUM_ON_A_REGULAR_BASIS)
api/python/requirements/base.txt
[duplication] ~31-~31: Possible typo: you repeated a word.
Context: ...irements/base.in face==24.0.0 # via glom glom==24.11.0 # via -r requirements/base...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~48-~48: Possible typo: you repeated a word.
Context: ...e.in google-cloud-core==2.4.3 # via google-cloud-storage google-cloud-storage==2.19.0 # via -r requirements/base....
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~54-~54: Possible typo: you repeated a word.
Context: ...ia # google-cloud-storage # google-resumable-media google-resumable-media==2.7.2 # via google-cloud-storage g...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~64-~64: Possible typo: you repeated a word.
Context: ... # via # google-api-core # grpcio-status grpcio-status==1.71.0 # via google-api-core idna=...
(ENGLISH_WORD_REPEAT_RULE)
api/python/requirements/dev.txt
[duplication] ~63-~63: Possible typo: you repeated a word.
Context: ...1 # via tox pytest==8.3.5 # via pytest-cov pytest-cov==6.1.1 # via -r requirements/dev.in...
(ENGLISH_WORD_REPEAT_RULE)
🪛 YAMLlint (1.35.1)
.github/workflows/test_scala_fmt.yaml
[error] 50-50: no new line character at the end of file
(new-line-at-end-of-file)
.github/release.yml
[error] 9-9: trailing spaces
(trailing-spaces)
[error] 17-17: no new line character at the end of file
(new-line-at-end-of-file)
.github/workflows/build_and_push_docker.yaml
[error] 35-35: no new line character at the end of file
(new-line-at-end-of-file)
.github/workflows/test_python.yaml
[error] 71-71: no new line character at the end of file
(new-line-at-end-of-file)
.github/workflows/test_scala_2_12_spark.yaml
[error] 201-201: no new line character at the end of file
(new-line-at-end-of-file)
.github/workflows/push_to_platform.yaml
[error] 34-34: trailing spaces
(trailing-spaces)
[error] 37-37: trailing spaces
(trailing-spaces)
[error] 41-41: trailing spaces
(trailing-spaces)
[error] 49-49: trailing spaces
(trailing-spaces)
[error] 52-52: trailing spaces
(trailing-spaces)
[error] 57-57: no new line character at the end of file
(new-line-at-end-of-file)
.github/workflows/test_scala_2_13_spark.yaml
[error] 209-209: no new line character at the end of file
(new-line-at-end-of-file)
.github/workflows/push_to_canary.yaml
[error] 79-79: trailing spaces
(trailing-spaces)
[error] 81-81: trailing spaces
(trailing-spaces)
[error] 102-102: trailing spaces
(trailing-spaces)
[error] 105-105: trailing spaces
(trailing-spaces)
[error] 108-108: trailing spaces
(trailing-spaces)
[warning] 287-287: too many spaces after colon
(colons)
[warning] 336-336: too many spaces after colon
(colons)
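The `trailing-spaces` and `new-line-at-end-of-file` findings above are mechanical to fix. A minimal sketch, assuming the workflow files contain no YAML block scalars whose trailing whitespace is significant; the helper name `normalize_yaml_text` is hypothetical and not part of this repository:

```python
def normalize_yaml_text(text: str) -> str:
    """Strip trailing spaces/tabs from every line and end each line with a newline."""
    # splitlines() drops the line terminators, so the last line is handled
    # the same way whether or not the input ended with a newline.
    lines = [line.rstrip(" \t") for line in text.splitlines()]
    return "".join(line + "\n" for line in lines)


# Hypothetical snippet with two trailing spaces and no final newline.
sample = "name: Push To Canary  \non: push"
print(repr(normalize_yaml_text(sample)))
```

Running the helper over each flagged file and re-running YAMLlint would confirm whether the two error classes are gone; the `colons` warnings need a separate fix.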
🪛 actionlint (1.7.4)
.github/workflows/build_and_push_docker.yaml
24-24: the runner of "docker/login-action@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
.github/workflows/test_scala_2_12_spark.yaml
23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
48-48: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
74-74: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
100-100: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
126-126: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
152-152: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
178-178: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
.github/workflows/test_scala_2_13_spark.yaml
23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
50-50: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
77-77: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
104-104: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
131-131: label "ubuntu_32_core_128gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
158-158: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
185-185: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
.github/workflows/push_to_canary.yaml
58-58: shellcheck reported issue in this script: SC2086:info:9:20: Double quote to prevent globbing and word splitting
(shellcheck)
74-74: shellcheck reported issue in this script: SC2086:info:13:46: Double quote to prevent globbing and word splitting
(shellcheck)
74-74: shellcheck reported issue in this script: SC2086:info:14:38: Double quote to prevent globbing and word splitting
(shellcheck)
503-503: shellcheck reported issue in this script: SC2086:info:2:36: Double quote to prevent globbing and word splitting
(shellcheck)
503-503: shellcheck reported issue in this script: SC2086:info:4:16: Double quote to prevent globbing and word splitting
(shellcheck)
503-503: shellcheck reported issue in this script: SC2086:info:8:19: Double quote to prevent globbing and word splitting
(shellcheck)
503-503: shellcheck reported issue in this script: SC2086:info:10:17: Double quote to prevent globbing and word splitting
(shellcheck)
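The SC2086 notes above concern unquoted shell expansions. As a rough illustration in Python (not taken from the workflow itself), `shlex.split` can model how a POSIX shell splits an unquoted value into several words, and `shlex.quote` shows the effect of quoting. The path value is hypothetical, and this models word splitting only, not globbing:

```python
import shlex

# Hypothetical value containing a space; not from push_to_canary.yaml.
artifact_path = "build output/app.jar"

# Unquoted interpolation: the shell-style parser sees two separate words.
unquoted_words = shlex.split("cp " + artifact_path + " /tmp/dest")

# Quoted interpolation: the value stays a single argument.
quoted_words = shlex.split("cp " + shlex.quote(artifact_path) + " /tmp/dest")

print(unquoted_words)
print(quoted_words)
```

In the workflow scripts themselves, the equivalent fix is to wrap each flagged expansion in double quotes.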
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: groupby_tests
- GitHub Check: spark_tests
- GitHub Check: batch_tests
- GitHub Check: spark_tests
- GitHub Check: batch_tests
🔇 Additional comments (300)
api/python/requirements/base.txt (1)
1-126: Autogenerated pinned deps
Approved; ensures reproducible builds.
api/python/requirements/base.in (1)
1-12: Remove lingering refs
Ensure CI, docs, and scripts no longer reference `base.in`.
api/python/README.md (1)
137-137: Path reference updated correctly.
Updated directory reference from `api/py` to `api/python` in pre-commit setup instructions.
aggregator/src/test/scala/ai/chronon/aggregator/test/DataGen.scala (8)
68-71: Formatting improvement.
Method signature formatting improved for readability.
97-102: Improved chain formatting.
Chain of method calls reformatted for better readability.
166-168: Simplified API by removing redundant parameter.
Removed partitionColumn parameter since it can be derived from partitionSpec.
175-176: Local derivation of partitionColumn.
Now deriving partitionColumn locally instead of passing as parameter.
185-185: Alignment fix.
Fixed alignment with other case patterns.
192-192: Updated method call.
Removed redundant partitionColumn parameter from genImpl call.
194-196: Updated method calls.
Removed redundant partitionColumn parameter from genImpl calls.
201-202: Simplified method signature.
Simplified gen method to only take partitionSpec.
api/python/test/sample/joins/sample_team/sample_backfill_mutation_join.py (1)
20-22: Import reordering.
Imports reordered for better organization.
api/python/test/sample/joins/sample_team/sample_join_from_module_skipped.py (1)
20-20: Reordered import statement.
Import reordering is consistent with style conventions.
api/python/ai/chronon/model.py (1)
3-4: Import reordering looks good.
Added spacing improves readability.
aggregator/src/main/scala/ai/chronon/aggregator/base/MinHeap.scala (1)
77-77: Comment formatting fixed.
Added space after comment slashes for consistency.
api/python/test/conftest.py (1)
18-19: Import reordering looks good.
Added blank line improves readability.
aggregator/src/main/scala/ai/chronon/aggregator/stats/EditDistance.scala (1)
17-17: Package declaration updated to reflect new structure.
Package changed from `spark.stats` to `aggregator.stats`, aligning with module restructuring.
.bazeliskrc (1)
1-1: Bazel version pinned for reproducible builds.
Using Bazel 6.4.0 ensures consistent build environment across development machines.
.bazelignore (1)
1-2: Appropriate exclusion of Git directory.
Excluding `.git` from Bazel scanning improves build performance.
api/python/test/sample/joins/sample_team/sample_join_external_parts.py (1)
22-22: Import statements consolidated.
Multiple imports consolidated into single line, improving code readability.
api/python/requirements/dev.in (1)
8-9: Additions of zipp and importlib-metadata packages
These packages support Python package metadata handling, commonly used together.
api/python/ai/chronon/repo/team_json_utils.py (2)
1-1: Docstring formatting improved
Single-line docstring is more concise.
20-20: Consistent string quoting style
Changed from single to double quotes for consistency.
api/python/test/sample/group_bys/sample_team/sample_group_by_with_derivations.py (1)
17-17: Import consolidation
Simplified multiple imports into a single line.
aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala (2)
75-75: Comment formatting fix
Added space after comment slashes for readability.
95-95: Comment formatting fix
Added space after comment slashes for consistency.
aggregator/src/test/scala/ai/chronon/aggregator/test/NaiveAggregator.scala (1)
21-21: Import consolidation looks good.
Importing multiple elements from the same package is cleaner.
api/python/test/sample/group_bys/sample_team/sample_group_by_with_incorrect_derivations.py (1)
18-18: Clean import statement.
Single-line import improves readability.
api/python/ai/chronon/cli/compile/display/console.py (1)
1-3: Good console setup.
Simple and effective Rich console initialization for formatted output.
api/python/test/sample/group_bys/sample_team/sample_group_by_missing_input_column.py (3)
17-17: Proper spacing after imports.
Blank line improves code readability.
20-20: Good import reordering.
Alphabetical order is more maintainable.
28-28: Helpful test case documentation.
Comment clarifies the intentional error case being tested.
.scalafmt.conf (2)
1-2: Version upgrade with dialect specification.
Version updated and dialect explicitly set to Scala 2.12.
7-8: Docstring wrapping disabled.
Formatting preference added to prevent docstring wrapping.
api/python/test/canary/README.md (1)
1-7: Minimal documentation added.
Basic structure with cloud-specific links established.
.tool-versions (1)
1-7: Tool versions pinned.
Development environment dependencies properly specified for consistent builds.
aggregator/src/main/scala/ai/chronon/aggregator/row/MapColumnAggregator.scala (1)
21-21: Import update looks good
Replaced Scala's utility with project-specific conversion that handles nulls gracefully.
.plugin-versions (1)
1-6: Pinned plugin versions as expected
Version pinning ensures consistent development environment.
api/python/test/sample/joins/kaggle/outbrain.py (1)
17-21: Import cleanup looks good
Simplified and consolidated imports while maintaining functionality.
aggregator/src/main/scala/ai/chronon/aggregator/windowing/HopsAggregator.scala (1)
21-21: Import consolidation looks good
Grouped related imports from same package for better readability.
api/python/test/sample/group_bys/sample_team/entity_sample_group_by_from_module.py (2)
21-21: Clean import consolidation.
Imports now use the `ai.chronon.types` module for all type definitions.
28-28: Improved window readability.
Window format changed from explicit object to cleaner string notation.
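To make the string shorthand concrete, here is a minimal sketch of how a value such as `"7d"` could be parsed into a length and a unit. This is an illustration only: `WindowSketch`, `parse_window`, and the supported `d`/`h` suffixes are assumptions, not the actual `ai.chronon` implementation, which is not shown in this review.

```python
import re
from dataclasses import dataclass


# Hypothetical stand-in; not the real Window class from ai.chronon.
@dataclass(frozen=True)
class WindowSketch:
    length: int
    unit: str  # "DAYS" or "HOURS" (assumed unit names)


_UNITS = {"d": "DAYS", "h": "HOURS"}


def parse_window(text: str) -> WindowSketch:
    """Parse shorthand such as '7d' or '24h' into a WindowSketch."""
    match = re.fullmatch(r"(\d+)([dh])", text.strip().lower())
    if match is None:
        raise ValueError(f"unsupported window string: {text!r}")
    length = int(match.group(1))
    if length <= 0:
        raise ValueError("window length must be positive")
    return WindowSketch(length, _UNITS[match.group(2)])


print(parse_window("7d"))
```

Rejecting malformed strings early keeps a typo such as `"7 days"` from silently producing a wrong window.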
aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothAggregator.scala (1)
20-20: Consolidated imports.
Import statement now concisely groups all imports from `ai.chronon.api`.
api/python/test/sample/group_bys/sample_team/sample_group_by_from_module.py (2)
21-25: Consolidated type imports.
Imports now consistently use `ai.chronon.types` for all type-related classes.
35-35: Simplified window syntax.
Window format changed from explicit object to cleaner string notation `"7d"`.
api/python/test/sample/group_bys/sample_team/mutation_sample_group_by.py (1)
21-26: Organized imports.
Import statement now consistently groups all type-related classes from `ai.chronon.types`.
.github/pull_request_template.md (1)
1-8: Looks good.
Clean and simple PR template.
api/python/test/sample/joins/sample_team/sample_join_from_group_by_from_join.py (2)
15-17: Import ordering improvement.
Cleaner import structure.
23-27: ✅ Verification successful
Verify version change.
Changed from using `v1` to `v2` for group_by reference. Confirm this is intentional.
🏁 Script executed:
#!/bin/bash
# Check if sample_group_by_from_join_part.v2 exists and compare with v1
grep -rn "v2 = " group_bys/sample_team/sample_group_by_from_join_part.py
Length of output: 163
🏁 Script executed:
#!/bin/bash
# locate group_by part file
file=$(fd sample_group_by_from_join_part.py)
echo "file: $file"
# list v1 and v2 defs
grep -nE 'v1\s*=|v2\s*=' "$file"
Length of output: 427
Version change verified: `sample_group_by_from_join_part.py` only defines `v2` (at line 26), and `v1` is absent; referencing `v2` is intentional.
aggregator/src/main/scala/ai/chronon/aggregator/windowing/TwoStackLiteAggregator.scala (2)
47-53: Improved lambda syntax.
More concise and readable code structure.
164-217: Better documentation format.
Converted to proper ScalaDoc style. Content preserved.
api/python/test/sample/group_bys/sample_team/sample_non_prod_group_by.py (3)
15-21: Simplified imports.
Consolidated imports from ai.chronon.types.
26-29: Improved window syntax.
Using string shorthand "7d" is cleaner than Window object.
30-33: Added trailing comma.
Good practice for cleaner diffs when parameters are added.
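The diff benefit of a trailing comma can be checked directly. The sketch below uses `difflib` from the standard library to count changed lines when one argument is appended; the `make_config` call and its argument names are hypothetical and chosen only for illustration:

```python
import difflib


def changed_lines(before, after):
    """Return only the added/removed lines of a unified diff (file headers excluded)."""
    diff = difflib.unified_diff(before, after, lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Without a trailing comma, appending an argument also touches the previous line.
no_comma_before = ["make_config(", "    sources=src,", "    keys=keys", ")"]
no_comma_after = ["make_config(", "    sources=src,", "    keys=keys,", "    online=True", ")"]

# With a trailing comma, only the new line appears in the diff.
comma_before = ["make_config(", "    sources=src,", "    keys=keys,", ")"]
comma_after = ["make_config(", "    sources=src,", "    keys=keys,", "    online=True,", ")"]

print(changed_lines(no_comma_before, no_comma_after))
print(changed_lines(comma_before, comma_after))
```

The first call reports three changed lines and the second reports one, which is why reviewers tend to prefer the trailing-comma style for multi-line argument lists.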
.github/release.yml (1)
1-17: Solid changelog configuration.
Clean setup for semantic versioning labels.
api/python/test/sample/group_bys/sample_team/sample_group_by_group_by.py (1)
17-20: Import reorganization looks good. Cleaner import structure.
.github/ISSUE_TEMPLATE/feature_request.md (1)
1-20: Well-structured feature request template. Standard GitHub template with clear sections.
aggregator/src/main/scala/ai/chronon/aggregator/row/RowAggregator.scala (1)
39-63: Good lambda refactoring. Improved pattern matching syntax while maintaining functionality.
api/python/test/sample/joins/sample_team/sample_join_derivation.py (1)
21-21: Import changes follow best practices. Reordering imports alphabetically and grouping by source improves readability.
Also applies to: 23-23, 25-25
api/python/test/sample/group_bys/sample_team/group_by_with_kwargs.py (2)
21-25: Consolidated imports enhance maintainability. Moving all imports to `ai.chronon.types` simplifies the import structure.
36-36: Simplified window specification improves usability. String-based window specification `"7d"` replaces Window object, aligning with API simplification goals.
aggregator/src/test/scala/ai/chronon/aggregator/test/MinHeapTest.scala (1)
21-21: Test framework migration improves consistency. Migrating from JUnit to ScalaTest's AnyFlatSpec aligns with the project's test standardization while preserving test logic.
Also applies to: 26-27
api/python/test/sample/group_bys/quickstart/users.py (3)
15-15: Import changes align with API updates. Import order updated and `selects` replaces `select`.
Also applies to: 19-19
26-35: Correctly migrated to `selects` function. Updated to use the new `selects` function with proper formatting.
37-42: Added trailing commas for consistency. Formatting improvements with trailing commas.
.github/workflows/require_triggered_status_checks.yaml (1)
1-14: Good branch protection workflow. Correctly configured to enforce required status checks before allowing pushes.
api/python/ai/chronon/cli/compile/display/compiled_obj.py (1)
1-12: Well-structured CompiledObj dataclass. Clean implementation with proper type hints.
api/python/test/sample/joins/quickstart/training_set.py (4)
19-22: Import statements properly reorganized. Imports are now better grouped and organized.
27-37: Source definition correctly updated. Migrated to `selects` function with improved formatting.
39-44: Join constructor nicely reformatted. Better multi-line style with clear comments.
46-52: Second join constructor properly formatted. Consistent with the first join's style improvements.
api/python/test/sample/joins/sample_team/sample_join_with_derivations_on_external_parts.py (3)
19-22: Import reordering looks good. The reorganization of imports is clean and follows a logical grouping.
23-23: Import moved to improve organization. Moving test_sources import after group_bys imports follows a better organizational pattern.
25-33: Import reordering for better organization. The reordering groups related concepts together: contextual elements, data types, and operations.
api/python/test/sample/group_bys/quickstart/schema.py (4)
1-3: Updates API imports correctly. Properly updates imports to use `selects` instead of deprecated `select`.
9-12: Updates API usage and improves formatting. Correctly changes `select` to `selects` and adds trailing comma for better git diffs.
22-22: Simplified aggregation formatting. Converted multi-line list to more concise single-line format.
24-24: Added trailing comma for consistency. Improves maintainability by enabling cleaner diffs in future changes.
.bazelproject (1)
1-22: Good Bazel workspace configuration. Properly configures the project for Bazel builds with appropriate language support for Python, Scala, TypeScript, and Java.
api/python/test/sample/group_bys/sample_team/sample_group_by.py (4)
17-17: Imports organized on a single line. Consolidated related imports for better readability.
29-30: Added trailing comma for consistency. Improves git diff readability for future changes.
32-32: Added online=True parameter. Explicitly enables online processing mode.
46-48: Improved derivations formatting. Converted to more concise inline format with proper trailing commas.
aggregator/src/test/scala/ai/chronon/aggregator/test/EditDistanceTest.scala (1)
17-23: Framework migration looks good. Clean migration from JUnit to ScalaTest with proper imports and class inheritance.
api/python/test/sample/joins/sample_team/sample_join_bootstrap.py (3)
26-27: Import consolidation looks good. Properly consolidated imports from ai.chronon submodules.
32-44: String literal style change is consistent. Good conversion from single to double quotes for dictionary keys.
59-63: API update is correct. Properly updated to use `selects` instead of `select` function.
api/python/test/sample/aws/teams.json (2)
7-7: Placeholder needs attention.
`<CUSTOMER_ID>` placeholder aligns with PR objective to remove customer names.
1-16: ✅ Verification successful
New configuration file looks good.
Clean JSON structure with appropriate defaults for AWS environment.
🏁 Script executed:
#!/bin/bash
# Validate JSON syntax
jq '.' api/python/test/sample/aws/teams.json > /dev/null
echo $?
Length of output: 55
Configuration file approved
Syntax validation passed (`jq .` exit code 0). JSON is clean with correct AWS defaults. Ready to merge.
api/python/test/sample/group_bys/sample_team/sample_chaining_group_by.py (6)
21-23: Import restructuring looks good. Clean import organization.
25-35: Simplified imports from ai.chronon.types. Consolidated imports improve maintainability.
42-43: Standardized string quotes in key_mapping. Consistent double quote usage.
Also applies to: 46-47
50-51: Added check_consistency flag. Important safety feature.
54-66: Updated sources formatting and API usage. Proper list format for sources and using selects instead of select.
76-77: Added trailing comma. Style consistency improvement.
.github/ISSUE_TEMPLATE/bug_report.md (1)
1-38: Standard GitHub bug report template. Template follows GitHub best practices with clear sections for bug description, reproduction steps, expected behavior, and environment details.
api/python/test/sample/group_bys/sample_team/chaining_group_by.py (2)
1-4: Clean imports. Well-organized imports.
5-29: Well-structured GroupBy definition. GroupBy correctly uses JoinSource with appropriate configuration.
api/python/test/sample/group_bys/sample_team/sample_group_by_from_join_part.py (4)
15-17: Reordered import. Import restructuring improves readability.
18-24: Consolidated imports from ai.chronon.types. Better module organization.
26-29: Renamed variable and improved formatting. Renamed from v1 to v2 with better multi-line formatting.
35-36: Added trailing comma. Consistent style.
api/python/ai/chronon/resources/gcp/sources/test/data.py (4)
1-3: Clean imports. Good structure of imports with clear separation.
4-9: Clear documentation. Docstring explains purpose concisely.
14-21: Well-structured Source definition. Source configuration is clear with good inline comments explaining each component.
23-24: Helpful usage note. Good indication of how the object can be used.
api/python/test/sample/joins/sample_team/sample_join_from_module.py (4)
19-22: Import reordering. Imports are now better organized.
25-25: Cleaner import structure. Clean single-line import format.
28-28: Consistent spacing. Removed extra spaces around the equals sign.
32-33: Standardized string quotes. Switched from single to double quotes for consistency and added trailing comma for better git diffs.
Also applies to: 36-37
api/python/ai/chronon/resources/gcp/group_bys/test/data.py (3)
2-4: Clean imports. Good separation of imports with appropriate blank line.
6-6: Concise window definition. Good use of list comprehension with clear inline comment.
8-33: Well-structured GroupBy configuration. Clear organization with good comments explaining each aggregation type.
aggregator/src/main/scala/ai/chronon/aggregator/row/StatsGenerator.scala (8)
21-21: Updated import for ScalaJavaConversions. Standardized Scala-Java conversion utility.
28-35: Improved comment formatting. Better spacing in ScalaDoc comment.
47-54: Fixed comment formatting. Improved ScalaDoc comment structure.
61-62: Consistent ScalaDoc style. Fixed spacing in comment.
69-71: Improved comment formatting. Better spacing in ScalaDoc.
114-124: Cleaner pattern matching. More idiomatic Scala with case expressions.
145-153: Fixed comment formatting. Better spacing in documentation.
160-165: More readable method chaining. Better line breaks and indentation in method calls.
api/python/test/sample/group_bys/risk/user_data.py (1)
1-28: Clean implementation of user data GroupBy. Good structure with clear documentation and well-organized fields selection for user data.
api/python/test/sample/group_bys/quickstart/purchases.py (4)
15-17: Import refactoring looks good. Updated imports align with new API conventions.
24-35: Source definition update looks good. Changed `selects` usage and added bucket_rand column.
37-37: Window format simplification. Good conversion to string-based window format.
43-66: GroupBy aggregation updates look good. Added LAST_K operations with appropriate configurations.
api/python/ai/chronon/repo/__init__.py (1)
15-32: Good mapping structure for folder names to classes. Clean implementation of class mapping for the new compilation framework.
api/python/test/sample/group_bys/kaggle/outbrain.py (3)
15-22: Import cleanup looks good. Removed unnecessary imports.
43-53: Simplified window format. Good conversion to string-based window format for aggregations.
79-81: Improved formatting. Better multi-line formatting for readability.
aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothAggregatorTest.scala (4)
26-26: Appropriate framework migration. ScalaTest import replaces JUnit.
49-49: Good modernization. Updated to extend AnyFlatSpec instead of TestCase.
51-51: Improved test style. Converted to ScalaTest's BDD style.
122-122: Consistent test style. Matches ScalaTest convention used throughout file.
aggregator/src/test/scala/ai/chronon/aggregator/test/VarianceTest.scala (3)
21-21: Framework modernization. ScalaTest import added.
25-25: Modern test base class. AnyFlatSpec is appropriate for this test.
63-63: Better test declaration style. Converted to BDD "it should" format.
api/python/test/canary/joins/gcp/training_set.py (1)
1-36: Clean training set join definition. Join setup correctly references GCP group_bys. Well-documented.
aggregator/src/test/scala/ai/chronon/aggregator/test/RowAggregatorTest.scala (7)
22-22: Framework migration. ScalaTest import added.
51-52: Modern test structure. Updated to AnyFlatSpec with BDD style.
88-88: Helpful comment added. Explanation clarifies calculation.
118-121: Cleaner pattern matching. Improved lambda syntax.
123-126: Consistent pattern matching. Matches style used elsewhere.
132-135: Cleaner lambda syntax. Pattern matching improves readability.
138-140: Simplified lambda. Cleaner pattern matching syntax.
api/python/requirements/dev.txt (1)
1-77: Dependency updates look good. Version upgrades maintain compatibility.
🧰 Tools
🪛 LanguageTool
[duplication] ~63-~63: Possible typo: you repeated a word.
Context: ...1 # via tox pytest==8.3.5 # via pytest-cov pytest-cov==6.1.1 # via -r requirements/dev.in...(ENGLISH_WORD_REPEAT_RULE)
api/python/ai/chronon/cli/compile/display/diff_result.py (1)
1-46: Well-structured diff display. Clean implementation of colored diff output.
api/python/ai/chronon/resources/gcp/joins/test/data.py (1)
11-21: Good implementation with clear comments. The source definition is well-structured with helpful inline comments.
.gitignore (4)
14-32: Good updates for new project structure. Directory path updates reflect migration from `api/py/` to `api/python/`.
39-44: Path updates aligned with directory restructuring. Python distribution patterns updated properly.
84-86: Good addition for frontend generated files. Properly excludes Thrift-generated types from version control.
103-114: Appropriate Bazel-related exclusions. Correctly excludes Bazel temporary files and local configuration.
.bazelrc (4)
1-2: Good practice for local overrides. Allows developers to customize build settings locally.
3-14: Well-structured Scala version configuration. Clear documentation and configuration options for different Scala versions.
21-31: Good platform-specific configurations. Properly handles compiler warning suppressions for different platforms.
15-17: ✅ Verification successful
Consider securing the remote cache URL.
Hardcoded GCS URL could expose sensitive infrastructure details.
Consider moving the remote cache URL to the `.bazelrc.local` file or using an environment variable.
🏁 Script executed:
#!/bin/bash
# Check if this URL is publicly accessible or contains sensitive information
curl -I https://storage.googleapis.com/zipline-bazel-cache 2>/dev/null | head -n 1
Length of output: 84
Remote cache URL is private (HTTP 403)
The GCS endpoint rejects anonymous requests, so the bucket isn’t publicly accessible. No change required.
Optional: if you’d like to abstract infra details, you can move the URL to `.bazelrc.local` or reference an environment variable instead.
aggregator/src/test/scala/ai/chronon/aggregator/test/SawtoothOnlineAggregatorTest.scala (2)
27-27: Successful migration to ScalaTest. Clean conversion from JUnit to ScalaTest's AnyFlatSpec style.
Also applies to: 34-34, 36-36
52-123: Improved formatting consistency. Parameter alignment in Builders.Aggregation calls is now more consistent.
aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxPercentilesTest.scala (4)
23-23: Appropriate import for ScalaTest migration. Clean import for AnyFlatSpec.
29-29: Good migration to ScalaTest. Proper inheritance from AnyFlatSpec instead of TestCase.
59-65: Well-formatted ScalaTest syntax. Clean conversion from JUnit to ScalaTest style using "it should".
77-95: Clean test migration. Good conversion of "testPSIDrifts" to ScalaTest format.
api/python/ai/chronon/repo/serializer.py (5)
18-25: Well-organized imports. Better organization of Thrift-related imports.
31-41: Improved code formatting. Better indentation in ThriftJSONDecoder methods.
62-75: Better readability for complex dictionary comprehension. Multi-line formatting improves readability.
95-102: Useful new utility function. Good addition of json2binary for Thrift serialization.
109-112: Improved error message. Better formatted error message with more context.
aggregator/src/main/scala/ai/chronon/aggregator/windowing/SawtoothMutationAggregator.scala (4)
41-41: Made parameter public. Converting parameter to val makes it accessible.
112-115: Performance optimization. Storing windowMillis avoids repeated property access.
131-135: Loop optimization. Reduces repeated calls to windowMappings.
148-153: Consistent optimization. Same pattern applied throughout the class.
api/python/test/sample/group_bys/kaggle/clicks.py (4)
15-25: Better import organization. Appropriate import grouping and updated to use selects.
44-52: Updated query construction. Using selects instead of select aligns with API changes.
58-60: Enhanced aggregations. Added COUNT operation and simplified window syntax.
64-65: Consistent formatting. Added trailing comma for better git diffs.
api/python/test/sample/group_bys/quickstart/returns.py (4)
15-15: Imports updated to use EventSource before Source. Updated import order provides better structure for dependency resolution.
21-21: Updated to use selects instead of select. API updated to use the newer selects function from query module.
38-38: Simplified window sizes with string durations. Replaced Window objects with more readable string format.
44-53: Reordered aggregations for clarity. Updated the order of operations to be more logical (SUM, COUNT, AVERAGE).
api/python/test/sample/joins/sample_team/sample_online_join.py (4)
26-27: Added important imports for environment configuration. New imports enable proper environment configuration.
34-43: Standardized key format to double quotes. Changed key_mapping values to use consistent double quotes.
45-49: Improved environment variable configuration. Replaced legacy parameters with structured EnvironmentVariables object.
50-51: Added important join configuration flags. Added online and consistency checking parameters.
api/python/ai/chronon/repo/init.py (4)
14-28: Well-structured CLI command definition. Good use of click decorators with appropriate options.
33-39: Added safeguard for existing directories. Prevents accidental overwrites with user confirmation.
42-47: Clear user feedback with helpful instructions. Good UX with success message and PYTHONPATH instructions.
48-49: Robust error handling. Exception handling prints full traceback for debugging.
api/python/test/sample/joins/sample_team/sample_label_join_with_agg.py (5)
23-23: Import structure updated. Cleaner import from dedicated module.
26-26: Consolidated import from types module. Good simplification using the centralized types module.
34-35: String format standardization. Consistent use of double quotes in key mappings.
Also applies to: 38-39
41-48: Updated label parts structure. Migrated to the newer `LabelParts` API that wraps a list of `JoinPart` objects.
49-50: Simplified Join parameters. Removed deprecated parameters and improved formatting.
api/python/test/sample/joins/risk/user_transactions.py (4)
1-4: Well-organized imports. Good organization of domain-specific group-bys.
5-7: Clear API imports. Clean import structure for Chronon types.
9-13: Well-defined source. Clean source definition with appropriate table and query configuration.
15-23: Well-structured join. Effective organization of join parts with appropriate prefixes to differentiate fields.
api/python/test/sample/joins/sample_team/sample_label_join.py (5)
23-23: Import from dedicated module. Improved structure using dedicated module.
26-26: Organized imports. Good import organization from join module.
34-35: Consistent string format. Standardized double quotes in key mappings.
Also applies to: 38-39
41-48: Updated label parts structure. Using the newer `LabelParts` API with list of `JoinPart` objects.
49-50: Simplified constructor parameters. Removed deprecated parameters for cleaner interface.
api/python/test/sample/group_bys/sample_team/event_sample_group_by.py (2)
17-17: Consolidated imports. Good use of centralized types module.
23-30: Streamlined aggregations. Improved aggregation structure with:
- Simplified window specification ("7d")
- Better organization of operations
- Clearer percentile parameters
api/python/ai/chronon/repo/explore.py (4)
147-148: Updated team attribute access. Now using `__dict__` attribute access instead of direct dictionary access, aligning with the new Team object model.
315-316: Fixed variable shadowing. Renamed `file` to `filepath` to avoid shadowing the parameter name.
370-384: Added support for Python-based team configuration. New function supports both JSON and Python module loading, enhancing flexibility.
403-403: Updated call to load_team_data. Now passing teams_root parameter to support the enhanced team loading functionality.
api/python/test/sample/group_bys/sample_team/label_part_group_by.py (1)
1-24: LGTM - Clean GroupBy definitions. The file properly defines two GroupBy objects with different configurations.
aggregator/src/test/scala/ai/chronon/aggregator/test/ApproxDistinctTest.scala (3)
21-23: Updated test framework. Migrated from JUnit to ScalaTest by extending AnyFlatSpec.
53-57: Modernized test style. Converted to ScalaTest's more readable "it should" syntax.
59-63: Consistent test style updates. Matches the ScalaTest pattern applied to other tests.
api/python/setup.py (7)
30-32: Added explicit version default. Set default version to "0.0.1" for better versioning control.
52-52: Added resource collection. New glob pattern collects test samples recursively.
55-55: Updated Python version requirement. Now requiring Python 3.11 in classifiers.
59-63: Added CLI entry point. Console script maps "zipline" command to ai.chronon.repo.zipline:zipline.
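As a rough illustration of what such a console-script mapping encodes, the sketch below parses a `name=module:attribute` string with the standard library's `importlib.metadata.EntryPoint` (Python 3.9+ for the `module` and `attr` properties). The exact spelling of the entry in `setup.py` is not shown in this review, so the `CONSOLE_SCRIPT` string and the `parse_console_script` helper are assumptions.

```python
from importlib.metadata import EntryPoint

# Assumed spelling of the console_scripts entry; setup.py itself is not shown here.
CONSOLE_SCRIPT = "zipline=ai.chronon.repo.zipline:zipline"


def parse_console_script(spec: str) -> EntryPoint:
    """Split 'command=module:attribute' into a standard-library EntryPoint."""
    name, _, target = spec.partition("=")
    return EntryPoint(name=name.strip(), value=target.strip(), group="console_scripts")


entry = parse_console_script(CONSOLE_SCRIPT)
# entry.name is the shell command; entry.module and entry.attr locate the callable.
```

Here `entry.name` is the command installed on the PATH, while `entry.module` and `entry.attr` identify the function the generated wrapper script would import and call.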
64-67: Package rebranding. Changed name from "chronon-ai" to "zipline-ai" with updated description.
68-70: Improved package data handling. Now includes resources directory content in the package.
74-78: Updated packaging settings. Increased minimum Python version to 3.11 and set zip_safe to False.
api/python/test/canary/group_bys/aws/purchases.py (1)
42-44: Confirm `Operation.LAST_K(10)` API.
If `LAST_K` is an enum not a function, this will throw at runtime. Please verify.
api/python/test/sample/group_bys/risk/transaction_events.py (8)
1-3: Updated imports for standardized API. Import changes align with API updates, using `selects` instead of `select` and removing unused imports.
9-9: Line spacing improvement. Added space for better code readability.
16-17: Updated to use selects function and improved formatting. Correctly updated to use `selects` function and added trailing comma for consistent formatting.
22-23: Enhanced readability with line spacing. Added space for better code organization.
23-23: Simplified window specification. Changed from explicit `Window` objects to string literals for more concise window specification.
35-36: Updated window specifications to use string format. String-based window specifications replace explicit `Window` objects, consistent with API updates.
Also applies to: 40-41
45-45: Improved code organization. Added space for better visual separation between function and implementation.
50-50: Added trailing comma for consistency. Fixed formatting consistency.
api/python/test/sample/group_bys/risk/merchant_data.py (1)
1-29: New GroupBy for merchant data. Well-structured merchant data source and GroupBy definition. Selects relevant merchant fields and correctly sets up the GroupBy keyed by merchant_id.
.github/workflows/push_to_platform.yaml (4)
3-11: Well-configured workflow trigger. Correctly set to run only when PRs to main are closed and merged.
14-20: Proper repository checkout. Correctly checks out platform repo with appropriate depth and reference.
22-25: Git configuration for Actions user. Properly configures Git user identity for automated commits.
27-54: SSH setup for repository access. Comprehensive SSH configuration for secure subtree operations.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 34-34: trailing spaces
(trailing-spaces)
[error] 37-37: trailing spaces
(trailing-spaces)
[error] 41-41: trailing spaces
(trailing-spaces)
[error] 49-49: trailing spaces
(trailing-spaces)
[error] 52-52: trailing spaces
(trailing-spaces)
api/python/ai/chronon/windows.py (3)
1-51: Well-structured window parsing implementation. Clean implementation for converting string duration formats to Window objects with proper error handling.
1-10: Helper functions for window creation. Simple, focused helper functions that encapsulate window creation logic.
12-50: Robust string parsing with comprehensive error handling. The `_from_str` function handles all edge cases with clear error messages.
api/python/test/sample/joins/sample_team/sample_join.py (4)
17-17: New imports added for enhanced functionality. Imports restructured to include RunMode, EnvironmentVariables, and LabelParts.
Also applies to: 22-23
27-27: Removed experimental tag. Experimental tag removed from JoinPart instances.
30-36: Modernized environment variable handling. Replaced legacy env dictionary with structured EnvironmentVariables class. Added online flag and label_part configuration.
43-43: Updated string formatting. Changed single quotes to double quotes for consistency.
api/python/ai/chronon/cli/compile/fill_templates.py (3)
6-14: Well-structured helper function. Simple utility function for template substitution.
17-31: Clean implementation for join template handling. Function properly processes bootstrapParts and dependencies.
32-40: Label dependencies template handling. Correctly handles join_backfill_table template replacement for labelParts.
api/python/ai/chronon/repo/compilev3.py (2)
11-32: Well-designed CLI command. Good use of Click decorators with appropriate defaults and help text. Path handling is robust.
35-52: Clean compilation helper function. Proper directory validation and error handling. Good separation of concerns.
api/python/test/sample/deprecated_teams.json (2)
1-36: Appropriate default configurations. Generic default settings with placeholders for sensitive values.
37-64: Generic team definitions. Teams defined using generic names instead of actual customer names, consistent with PR objective.
api/python/test/canary/deprecated_teams.json (1)
9-11: Replace TODO placeholders. Replace the temporary placeholder values with actual paths and classes.
These placeholders indicate incomplete configuration that should be addressed before production use.
api/python/test/canary/teams.py (2)
7-7: Update email placeholder. Replace customer placeholder with actual email.
- email="ml-infra@<customer>.com", # TODO: Infra team email
+ email="[email protected]", # Replace with actual team email
18-20: Replace TODO placeholders. Three TODO items need resolution before production use.
These placeholders indicate incomplete configuration:
- Hadoop directory path
- Online class specification
- Online args configuration
api/python/test/canary/group_bys/gcp/purchases.py (1)
30-37: Verify `GroupBy` signature. Ensure a `name` parameter isn’t required by `GroupBy`.
aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala (1)
142-148: `Double` → `Long` conversion silently truncates
Casting a `Double` (or `Float`) to `Long` via `longValue()` discards the fractional part. If callers expect rounding, this will surprise them; document or adjust.
api/python/ai/chronon/repo/zipline.py (2)
6-9: Internal CLI imports look solid. Paths and aliases align with the new modular layout.
47-49: Subcommand registration is correct. The CLI group is wired to compile, run, and init as intended.
api/python/ai/chronon/types.py (7)
13-20: Approve source re-exports. Clear aliases for `Query` and `Source` improve discoverability.
21-28: Approve aggregation exports. Exposing `GroupBy`, `Aggregation`, and friends in one place is helpful.
29-32: Accuracy enums correct. `TEMPORAL` and `SNAPSHOT` are properly aliased.
33-44: Join primitives consolidated. All join-related types are neatly re-exported.
46-49: StagingQuery & MetaData aliases. Good centralization of staging-query concepts.
51-55: Common types re-exported. `ExecutionInfo` and `TableDependency` are critical for workflow definitions.
56-57: Verify `Team` existence. Ensure `ttypes.Team` is defined in your Thrift schema to prevent import failures.
.github/workflows/test_scala_2_12_non_spark.yaml (1)
31-34: Concurrency is well set. Cancelling in-progress runs for the same branch is a solid safeguard.
.github/workflows/test_scala_2_13_non_spark.yaml (2)
51-58: Scala 2.13 config applied. `--config=scala_2.13` is correctly included in the test commands.
29-32: Concurrency block is good. Settings ensure orderly execution.
api/python/ai/chronon/eval/sample_tables.py (1)
24-24: ✅ Verification successful
Verify path handling.
Ensure `local_warehouse` is a Path object if using `/` operator.
🏁 Script executed:
#!/bin/bash
# Check local_warehouse definition
grep -r "local_warehouse" api/python/ai/chronon/eval/
Length of output: 686
Path handling validated
`local_warehouse` is defined as a `pathlib.Path` in `api/python/ai/chronon/eval/table_scan.py`, so using the `/` operator is safe.
api/python/ai/chronon/cli/compile/serializer.py (2)
24-84: LGTM! Well-structured code with good type handling.
86-116: Clean implementation with good error handling. Functions properly handle serialization/deserialization.
aggregator/BUILD.bazel (3)
1-21: LGTM! Well-structured Scala library definition.
36-52: LGTM! Test configuration looks good.
23-23: ✅ Verification successful
Verify dependency definition.
`_SCALA_TEST_DEPS` is used but definition not shown.
🏁 Script executed:
#!/bin/bash
# Find where _SCALA_TEST_DEPS is defined
grep -r "_SCALA_TEST_DEPS" --include="BUILD*" .
Length of output: 463
🏁 Script executed:
#!/bin/bash
# locate definition of _SCALA_TEST_DEPS
grep -R "_SCALA_TEST_DEPS" -n . --include="*.bzl"
Length of output: 49
🏁 Script executed:
#!/bin/bash
grep -R "load" -n aggregator/BUILD.bazel
Length of output: 40
🏁 Script executed:
#!/bin/bash
# search repo for definition
rg -n '_SCALA_TEST_DEPS' .
Length of output: 522
_SCALA_TEST_DEPS definition found
The `_SCALA_TEST_DEPS` list is declared in `tools/build_rules/prelude_bazel:39`. No further action needed here.
api/python/ai/chronon/query.py (6)
15-16: Added OrderedDict import and reordered imports for new selects function.
27-27: Fixed mutable default argument pattern. Changed from `setups: List[str] = []` to `setups: List[str] = None` to avoid potential bugs with shared mutable defaults.
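The pitfall behind this fix can be reproduced in a few lines. The sketch below is illustrative only: `query_shared` and `query_safe` are hypothetical names, not functions from `query.py`, and the appended string is a placeholder.

```python
from typing import List, Optional


def query_shared(setups: List[str] = []) -> List[str]:
    """Buggy variant: the default list is created once and reused by every call."""
    setups.append("SET a = 1")
    return setups


def query_safe(setups: Optional[List[str]] = None) -> List[str]:
    """Fixed variant: None is the sentinel and a fresh list is built per call."""
    setups = list(setups) if setups is not None else []
    setups.append("SET a = 1")
    return setups
```

Calling `query_shared()` twice returns the same list object, which grows with every call; `query_safe()` returns an independent one-element list each time and also avoids mutating a list passed in by the caller.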
30-32: Added new partition-related parameters. New parameters enhance query configuration flexibility.
77-85: Well-documented new parameters. Clear descriptions for all new parameters.
89-99: Updated API object construction with explicit keyword arguments. Properly added new parameters to the API Query constructor.
103-126: Renamed select to selects and improved implementation. The new implementation:
- Uses OrderedDict to preserve argument order
- Handles both positional and keyword arguments
- Has clear documentation with examples
This is a better design that offers more flexibility.
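A minimal sketch of the pattern described above, assuming that positional arguments name columns selected as-is and keyword arguments map an alias to an expression. `selects_sketch` is a hypothetical name, and the real `selects` in `query.py` may differ in signature and behavior.

```python
from collections import OrderedDict
from typing import Dict


def selects_sketch(*args: str, **kwargs: str) -> Dict[str, str]:
    """Build an ordered alias -> expression mapping (illustrative only)."""
    result: Dict[str, str] = OrderedDict()
    for column in args:
        # Assumption: a bare column name is selected under its own name.
        result[column] = column
    for alias, expression in kwargs.items():
        # Keyword arguments keep their call order and may override a positional entry.
        result[alias] = expression
    return result
```

With these assumptions, `selects_sketch("user_id", total="price * quantity")` yields the entries `user_id -> user_id` and `total -> price * quantity`, in that order.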
api/python/ai/chronon/cli/compile/parse_configs.py (1)
15-74: Well-structured function for recursive config parsing. Function properly handles:
- File discovery
- Object extraction
- Metadata updates
- Error handling with graceful recovery
api/python/ai/chronon/source.py (3)
8-35: Well-designed EventSource wrapper with comprehensive documentation. Clean function signature with clear docstring explaining event source semantics.
38-71: EntitySource wrapper with thorough documentation. Function correctly constructs the entity source with appropriate parameters.
74-88: Concise JoinSource wrapper with clear online/offline flow documentation. Function correctly constructs a join source with informative description of how it's used.
api/python/ai/chronon/repo/run.py (2)
52-86: Comprehensive default value handling. Function properly initializes defaults from environment variables with fallbacks.
247-268: Cloud provider-specific runner dispatch logic. The code effectively handles different cloud environments by dispatching to appropriate runners.
api/python/ai/chronon/repo/hub_uploader.py (1)
12-18: `changed_entity_names` logic is placeholder but misleading
Assigning the full dict to `changed_entity_names` then iterating over it always returns all entities. Add a TODO comment or implement real diffing to avoid accidental full uploads.
.github/workflows/test_scala_2_13_spark.yaml (2)
1-19: Well-structured workflow trigger configuration. Properly configured to run only when relevant files change.
23-23: ❓ Verification inconclusive
Custom runner labels need verification
These runner labels aren't standard GitHub Actions runners.
Also applies to: 50-50, 77-77, 104-104, 131-131, 158-158, 185-185
🏁 Script executed:
#!/bin/bash
# Check if these are self-hosted runners defined in the repository
gh api repos/$GITHUB_REPOSITORY/actions/runners | jq '.runners[] | .name'
Length of output: 140
Verify custom runner labels exist
These labels aren’t standard GitHub-hosted runners. Ensure you have matching self-hosted runners registered with these exact labels in your repo/org; otherwise these jobs will fail.
Affected locations:
- .github/workflows/test_scala_2_13_spark.yaml: line 23 (`runs-on: ubuntu-8_cores-32_gb`)
- Same pattern at lines 50, 77, 104, 131, 158, 185
🧰 Tools
🪛 actionlint (1.7.4)
23-23: label "ubuntu-8_cores-32_gb" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
api/python/ai/chronon/repo/extract_objects.py (4)
25-41: Simplified from_folder with better parameter alignment
Properly removed unused root_path parameter.
44-65: Enhanced error tracking in V2 implementation
Good addition of structured error handling with target file tracking.
83-103: Improved from_file with better path handling
Now uses dedicated helper functions for path conversion.
106-131: New robust path handling utilities
Good assertions to catch invalid paths early.
api/BUILD.bazel (3)
1-19: Well-structured Thrift generation and Java library setup
Clear dependencies with appropriate visibility.
21-43: Clean Scala library configuration with conditional formatting
Smart conditional formatting based on Scala version.
45-72: Comprehensive test setup
Well-structured test dependencies and suite configuration.
aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (4)
8-17: Proper migration to ScalaTest
Well-executed transition from JUnit to ScalaTest.
83-96: Updated sketch size expectations
Reordered and updated expected sketch size mappings.
166-182: Excellent test data generation helper
Creates realistic skewed data distribution for testing.
184-210: Comprehensive test cases for frequent items behavior
Tests both frequent items and heavy hitters configuration modes.
api/python/ai/chronon/staging_query.py (1)
90-92: Caller-path team detection is brittle
`inspect.stack()[1].filename.split("/")[-2]` fails on shallow paths & Windows. Consider an explicit `team` arg or `pathlib.Path`.
api/python/ai/chronon/cli/compile/compile_context.py (1)
106-113: Dot-split assumes team present.
`compiled_obj.name.split(".", 1)` will fail if no dot. Guard to avoid `ValueError`.
api/python/ai/chronon/repo/constants.py (1)
72-78: Check Spark/Scala mapping.
Spark 3.2.1 is usually built for Scala 2.12, not 2.13. Confirm before releasing.
api/python/ai/chronon/join.py (1)
523-529: Verify `ExecutionInfo` field names
`common.ExecutionInfo` is populated with `env=env_vars`; schema often uses `envVars` (camel-case). Confirm to avoid lost data.
api/python/ai/chronon/group_by.py (1)
143-146: Default `k` dropped from 128→20 – please confirm.
Lower `k` increases error on `APPROX_PERCENTILE`. Is 20 empirically justified?
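The two path/name concerns raised above (caller-path team detection in staging_query.py and the dot-split in compile_context.py) could be addressed along these lines. This is only a sketch: `team_from_path` and `split_team_and_name` are hypothetical helper names, not functions in the repository, and the choice to return `None` when no team can be determined is an assumption.

```python
from pathlib import PurePath
from typing import Optional, Tuple


def team_from_path(filename: str) -> Optional[str]:
    """Return the parent directory name of `filename`, or None for shallow paths.

    PurePath parses with the platform's own separator rules, so this avoids
    hard-coding "/" the way filename.split("/")[-2] does.
    """
    parent = PurePath(filename).parent.name
    return parent or None


def split_team_and_name(qualified: str) -> Tuple[Optional[str], str]:
    """Split 'team.name' on the first dot; tolerate names with no team prefix."""
    team, sep, rest = qualified.partition(".")
    # partition never raises: sep is "" when there is no dot at all.
    return (team, rest) if sep else (None, qualified)
```

An explicit `team` argument, as the comment suggests, would still be more robust than inferring the team from the call stack; the helper only removes the separator and depth assumptions.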
| esac | ||
| done | ||
|
|
||
| gcloud storage cp "${ARTIFACT_PREFIX%/}/release/$VERSION/wheels/zipline_ai-$VERSION-py3-none-any.whl" . |
🛠️ Refactor suggestion
Add error handling for download.
Missing check if download fails.
-gcloud storage cp "${ARTIFACT_PREFIX%/}/release/$VERSION/wheels/zipline_ai-$VERSION-py3-none-any.whl" .
+if ! gcloud storage cp "${ARTIFACT_PREFIX%/}/release/$VERSION/wheels/zipline_ai-$VERSION-py3-none-any.whl" .; then
+ echo "Error: Failed to download wheel file."
+ exit 1
+fi
🤖 Prompt for AI Agents
In api/python/ai/chronon/resources/gcp/zipline-cli-install.sh at line 48, the
script downloads a file using gcloud storage cp but does not check if the
download succeeds. Add error handling by checking the exit status of the gcloud
command immediately after it runs. If the command fails, output an error message
and exit the script with a non-zero status to prevent further execution.
|
|
||
| trap 'rm -f ./zipline_ai-$VERSION-py3-none-any.whl' EXIT | ||
|
|
||
| pip3 uninstall zipline-ai |
🛠️ Refactor suggestion
Add -y flag to uninstall.
Pip uninstall prompts for confirmation.
-pip3 uninstall zipline-ai
+pip3 uninstall -y zipline-ai || true
🤖 Prompt for AI Agents
In api/python/ai/chronon/resources/gcp/zipline-cli-install.sh at line 52, the
pip3 uninstall command lacks the -y flag, causing it to prompt for confirmation.
Add the -y flag to the pip3 uninstall command to enable automatic confirmation
and prevent the prompt during uninstallation.
|
|
||
| pip3 uninstall zipline-ai | ||
|
|
||
| pip3 install ./zipline_ai-$VERSION-py3-none-any.whl |
🛠️ Refactor suggestion
Add error handling for install.
Missing check if install fails.
-pip3 install ./zipline_ai-$VERSION-py3-none-any.whl
+if ! pip3 install ./zipline_ai-$VERSION-py3-none-any.whl; then
+ echo "Error: Failed to install wheel file."
+ exit 1
+fi
🤖 Prompt for AI Agents
In api/python/ai/chronon/resources/gcp/zipline-cli-install.sh at line 54, the
pip3 install command lacks error handling. Modify the script to check the exit
status of the pip3 install command immediately after it runs, and if the
installation fails (non-zero exit code), output an error message and exit the
script with a failure status to prevent further execution.
| if [ $# -ne 4 ]; then | ||
| print_usage | ||
| exit 1 | ||
| fi |
Fix argument validation logic.
Script requires exactly 4 args but doesn't verify both required params are present.
-if [ $# -ne 4 ]; then
+if [ $# -lt 2 ]; then
print_usage
exit 1
fi
🤖 Prompt for AI Agents
In api/python/ai/chronon/resources/gcp/zipline-cli-install.sh around lines 11 to
14, the argument validation only checks if the number of arguments is not equal
to 4 but does not verify that all required parameters are present. Update the
validation logic to explicitly check that each required argument is provided and
not empty, ensuring the script only proceeds when all four necessary parameters
are correctly passed.
| uses: docker/login-action@v1 | ||
| with: | ||
| registry: ${{ env.REGISTRY }} | ||
| username: ${{ github.actor }} | ||
| password: ${{ secrets.GITHUB_TOKEN }} |
Update docker/login-action version
The runner for docker/login-action@v1 is too old for GitHub Actions.
- - name: Log in to the Container registry
- uses: docker/login-action@v1
+ - name: Log in to the Container registry
+ uses: docker/login-action@v2
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 actionlint (1.7.4)
24-24: the runner of "docker/login-action@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
In .github/workflows/build_and_push_docker.yaml around lines 24 to 28, the
docker/login-action is using version v1, which is outdated for the current
GitHub Actions runner. Update the action version from docker/login-action@v1 to
the latest stable version, such as docker/login-action@v2, to ensure
compatibility and improved functionality.
| - name: Push Jars to s3 Bucket | ||
| shell: bash | ||
| run: | | ||
| set -eo pipefail | ||
| aws s3 cp ${{ needs.build_artifacts.outputs.wheel_file }} s3://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/wheels/ --metadata="updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)" | ||
| aws s3 cp flink_assembly_deploy.jar s3://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/jars/flink_assembly_deploy.jar --metadata="updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)" | ||
| aws s3 cp cloud_aws_lib_deploy.jar s3://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/jars/cloud_aws_lib_deploy.jar --metadata="updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)" | ||
| aws s3 cp service_assembly_deploy.jar s3://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/jars/service_assembly_deploy.jar --metadata="updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)" | ||
Artifact paths wrong ⇒ aws s3 cp will 404
actions/download-artifact restores files under a directory named after the artifact (flink-assembly-jar/…).
aws s3 cp flink_assembly_deploy.jar … therefore fails.
-aws s3 cp flink_assembly_deploy.jar ...
+aws s3 cp "flink-assembly-jar/flink_assembly_deploy.jar" ...
Apply the same prefix for all jar & wheel uploads, and quote variables to silence SC2086.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In .github/workflows/push_to_canary.yaml around lines 185 to 193, the aws s3 cp
commands use incorrect artifact paths causing 404 errors because the downloaded
artifacts are inside directories named after the artifact. Fix this by prefixing
the jar and wheel file paths with their respective artifact directory names.
Also, quote all variable expansions in the commands to prevent word splitting
and silence shellcheck SC2086 warnings.
| set -eo pipefail | ||
| gcloud storage cp ${{ needs.build_artifacts.outputs.wheel_file }} gs://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/wheels/ | ||
| gcloud storage objects update gs://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/wheels/${{ needs.build_artifacts.outputs.wheel_file }} --custom-metadata="updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)" | ||
| gcloud storage cp flink_assembly_deploy.jar gs://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/jars/flink_assembly_deploy.jar | ||
| gcloud storage objects update gs://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/jars/flink_assembly_deploy.jar --custom-metadata="updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)" | ||
| gcloud storage cp cloud_gcp_lib_deploy.jar gs://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/jars/cloud_gcp_lib_deploy.jar | ||
| gcloud storage objects update gs://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/jars/cloud_gcp_lib_deploy.jar --custom-metadata="updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)" | ||
| gcloud storage cp cloud_gcp_embedded_lib_deploy.jar gs://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/jars/cloud_gcp_embedded_lib_deploy.jar | ||
| gcloud storage objects update gs://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/jars/cloud_gcp_embedded_lib_deploy.jar --custom-metadata="updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)" | ||
| gcloud storage cp service_assembly_deploy.jar gs://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/jars/service_assembly_deploy.jar | ||
| gcloud storage objects update gs://zipline-artifacts-canary/release/${{ needs.build_artifacts.outputs.version }}/jars/service_assembly_deploy.jar --custom-metadata="updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)" |
🛠️ Refactor suggestion
Same missing sub-folder with gcloud storage cp
All gcloud storage cp lines assume flat files. Prefix with the download directory & quote paths.
Example:
-gcloud storage cp flink_assembly_deploy.jar ...
+gcloud storage cp "flink-assembly-jar/flink_assembly_deploy.jar" ...
🤖 Prompt for AI Agents
In .github/workflows/push_to_canary.yaml around lines 248 to 258, the gcloud
storage cp commands do not include the source sub-folder path and do not quote
file paths, which can cause errors if files are not in the current directory or
paths contain spaces. Fix this by prefixing each source file with its download
directory path and enclosing all file paths in quotes to ensure correct file
referencing and handling of spaces.
| def LabelParts( | ||
| labels: List[api.JoinPart], | ||
| left_start_offset: int, | ||
| left_end_offset: int, | ||
| label_offline_schedule: str = "@daily", | ||
| ) -> api.LabelPart: | ||
| ) -> api.LabelParts: | ||
| """ | ||
| Used to describe labels in join. Label part can be viewed as regular join part but represent | ||
| label data instead of regular feature data. Once labels are mature, label join job would join | ||
| labels with features in the training window user specified using `leftStartOffset` and | ||
| `leftEndOffset`. | ||
| The offsets are relative days compared to given label landing date `label_ds`. This parameter is required to be | ||
| passed in for each label join job. For example, given `label_ds = 2023-04-30`, `left_start_offset = 30`, and | ||
| `left_end_offset = 10`, the left size start date will be computed as 30 days before `label_ds` (inclusive), | ||
| which is 2023-04-01. Similarly, the left end date will be 2023-04-21. Labels will be refreshed within this window | ||
| [2023-04-01, 2023-04-21] in this specific label job run. | ||
| labels with features in the training window user specified within the label GroupBy-s. | ||
| Since label join job will run continuously based on the schedule, multiple labels could be generated but with | ||
| different label_ds or label version. Label join job would have all computed label versions available, as well as | ||
| a view of latest version for easy label retrieval. | ||
| LabelPart definition can be updated along the way, but label join job can only accommodate these changes going | ||
| LabelParts definition can be updated along the way, but label join job can only accommodate these changes going | ||
| forward unless a backfill is manually triggered. | ||
| Label aggregation is also supported but with conditions applied. Single aggregation with one window is allowed | ||
| for now. If aggregation is present, we would infer the left_start_offset and left_end_offset same as window size | ||
| and the param input will be ignored. | ||
| :param labels: List of labels | ||
| :param left_start_offset: Relative integer to define the earliest date label should be refreshed | ||
| compared to label_ds date specified. For labels with aggregations, | ||
| this param has to be same as aggregation window size. | ||
| :param left_end_offset: Relative integer to define the most recent date(inclusive) label should be refreshed. | ||
| e.g. left_end_offset = 3 most recent label available will be 3 days | ||
| prior to 'label_ds' (including `label_ds`). For labels with aggregations, this param | ||
| has to be same as aggregation window size. | ||
| :param label_offline_schedule: Cron expression for Airflow to schedule a DAG for offline | ||
| label join compute tasks | ||
| """ | ||
|
|
||
| label_metadata = api.MetaData(offlineSchedule=label_offline_schedule) | ||
| exec_info = common.ExecutionInfo( | ||
| scheduleCron=label_offline_schedule, | ||
| ) | ||
| label_metadata = api.MetaData(executionInfo=exec_info) | ||
|
|
||
| for label in labels: | ||
| if label.groupBy.aggregations is not None: | ||
| assert len(labels) == 1, "Multiple label joinPart is not supported yet" | ||
| valid_agg = ( | ||
| len(label.groupBy.aggregations) == 1 | ||
| and label.groupBy.aggregations[0].windows is not None | ||
| and len(label.groupBy.aggregations[0].windows) == 1 | ||
| ) | ||
| assert valid_agg, ( | ||
| "Too many aggregations or invalid windows found. " | ||
| "Single aggregation with one window allowed." | ||
| ) | ||
| valid_time_unit = ( | ||
| label.groupBy.aggregations[0].windows[0].timeUnit == api.TimeUnit.DAYS | ||
| ) | ||
| assert valid_time_unit, "Label aggregation window unit must be DAYS" | ||
| window_size = label.groupBy.aggregations[0].windows[0].length | ||
| if left_start_offset != window_size or left_start_offset != left_end_offset: | ||
| assert ( | ||
| left_start_offset == window_size and left_end_offset == window_size | ||
| ), ( | ||
| "left_start_offset and left_end_offset will be inferred to be same as aggregation" | ||
| "window {window_size} and the incorrect values will be ignored. " | ||
| ) | ||
|
|
||
| return api.LabelPart( | ||
| return api.LabelParts( | ||
| labels=labels, | ||
| leftStartOffset=left_start_offset, | ||
| leftEndOffset=left_end_offset, | ||
| metaData=label_metadata, | ||
| ) |
left_start_offset / left_end_offset silently ignored
The parameters remain in the signature & docstring but are never used, so callers believe offsets are honored while they’re discarded. Either accept + store them or delete from the API.
🤖 Prompt for AI Agents
In api/python/ai/chronon/join.py between lines 252 and 287, the parameters
left_start_offset and left_end_offset are present in the function signature and
docstring but are not used in the function body, causing confusion. To fix this,
either remove these parameters from the function signature and update the
docstring accordingly, or modify the function to accept and store these offset
values properly, ensuring they are utilized or reflected in the returned
api.LabelParts object or related metadata.
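If removing the parameters outright would break existing callers, a transitional option (not one proposed in the review) is to keep them in the signature for a while but warn whenever they are supplied. The sketch below assumes a simplified stand-in, `LabelPartsSketch`, in place of the real `api.LabelParts` Thrift struct, and a hypothetical `label_parts` wrapper. The warning text assumes the window now comes from the label GroupBy, as the updated docstring suggests.

```python
import warnings
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LabelPartsSketch:
    """Hypothetical stand-in for api.LabelParts; not the real Thrift struct."""
    labels: List[str] = field(default_factory=list)


def label_parts(
    labels: List[str],
    left_start_offset: Optional[int] = None,
    left_end_offset: Optional[int] = None,
) -> LabelPartsSketch:
    # Warn loudly instead of silently discarding values the caller supplied.
    if left_start_offset is not None or left_end_offset is not None:
        warnings.warn(
            "left_start_offset/left_end_offset are ignored; "
            "the window is taken from the label GroupBy",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
    return LabelPartsSketch(labels=list(labels))
```

Callers that still pass offsets then see a `DeprecationWarning` at their own call site, which makes the behavior change visible before the parameters are deleted.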
| """ | ||
| assert name != "contextual", "Please use `ContextualSource`" | ||
| return api.ExternalSource( | ||
| metadata=api.MetaData(name=name, team=team, customJson=custom_json), | ||
| metadata=api.MetaData(name=name, team=team), | ||
| keySchema=DataType.STRUCT(f"ext_{name}_keys", *key_fields), | ||
| valueSchema=DataType.STRUCT(f"ext_{name}_values", *value_fields), | ||
| ) |
Field name typo breaks Thrift serialization
api.ExternalSource expects metaData, not metadata; current code silently drops the metadata, making source.metaData.name later None and crashing duplicate checks.
- metadata=api.MetaData(name=name, team=team),
+ metaData=api.MetaData(name=name, team=team),
🤖 Prompt for AI Agents
In api/python/ai/chronon/join.py around lines 200 to 206, the argument name for
the metadata parameter in the api.ExternalSource constructor is incorrectly
written as "metadata" instead of the expected "metaData". This typo causes the
metadata to be ignored, leading to None values and crashes later. Fix this by
renaming the argument from "metadata" to "metaData" to ensure proper
serialization and avoid runtime errors.
| pre_derived_columns: Set[str], derivations: List[Derivation] | ||
| ) -> List[str]: | ||
| """ | ||
| Build the derived columns from pre-derived columns and derivations. | ||
| """ | ||
| # if derivations contain star, then all columns are included except the columns which are renamed | ||
| output_columns = pre_derived_columns | ||
| if derivations: | ||
| found = any(derivation.expression == "*" for derivation in derivations) | ||
| if not found: | ||
| output_columns.clear() | ||
| for derivation in derivations: | ||
| if found and is_identifier(derivation.expression): | ||
| output_columns.remove(derivation.expression) | ||
| if derivation.name != "*": | ||
| output_columns.add(derivation.name) | ||
| return list(output_columns) |
🛠️ Refactor suggestion
Don’t mutate caller’s set
output_columns = pre_derived_columns shares the original set; later .clear() wipes data upstream. Copy first.
- output_columns = pre_derived_columns
+ output_columns = set(pre_derived_columns) # local copy
🤖 Prompt for AI Agents
In api/python/ai/chronon/cli/compile/conf_validator.py around lines 150 to 166,
the variable output_columns is assigned directly from pre_derived_columns,
causing mutations like clear() to affect the caller's set. To fix this, create a
copy of pre_derived_columns when assigning to output_columns to avoid mutating
the original set passed by the caller.
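The aliasing problem and the suggested fix can be reproduced in isolation. The sketch below uses stand-ins: `Derivation` is a plain dataclass, not the Thrift struct; `is_identifier` is assumed to mean "a bare column name"; and the function name `build_derived_columns` is assumed, since the excerpt does not show it. Two deliberate deviations from the snippet under review: `discard` replaces `remove` so a missing column does not raise `KeyError`, and the result is sorted to make the output deterministic.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Derivation:
    """Hypothetical stand-in for the Thrift Derivation struct."""
    name: str
    expression: str


def is_identifier(s: str) -> bool:
    # Assumption: a plain column reference looks like a Python identifier.
    return s.isidentifier()


def build_derived_columns(
    pre_derived_columns: Set[str], derivations: List[Derivation]
) -> List[str]:
    output_columns = set(pre_derived_columns)  # local copy; caller's set is untouched
    if derivations:
        found = any(d.expression == "*" for d in derivations)
        if not found:
            # Without a star, only explicitly derived columns survive.
            output_columns.clear()
        for d in derivations:
            if found and is_identifier(d.expression):
                # The source column is renamed, so drop the original name.
                output_columns.discard(d.expression)
            if d.name != "*":
                output_columns.add(d.name)
    return sorted(output_columns)
```

With the copy in place, calling the function twice on the same input set gives consistent results, whereas the aliased version would empty the caller's set on the first call that has no `*` derivation.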
Summary
Remove customer names from commit history
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Refactor
Chores
Style
Tests
Revert