
Conversation


@nikhil-zlai commented Mar 25, 2025

## Summary

## Checklist

- Added Unit Tests
- Covered by existing CI
- Integration tested
- Documentation update

## Summary by CodeRabbit

- **Tests**
  - Added a test to verify that tile key decoding works reliably.

- **Refactor**
  - Adjusted method access levels to improve internal code encapsulation without affecting external behavior.

ken-zlai and others added 30 commits January 21, 2025 09:32
## Summary 
This PR aligns the UI with the Figma design (~95% match), allowing for
minor padding, color, and typography differences.

### Key Updates  
- **Job Status Colors:** Updated to match Figma; defined in `app.css`.  
- **Zipline Logo:** Replaced with the new official logo.  
- **Reusable Metadata Table:** Added `MetadataTable.svelte` component
for flexible use.
- **Support Buttons:** Adjusted behavior in the bottom-left navigation
bar.
- **Route Change:** Renamed `job-tracker` to `job-tracking`.  

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Added a new `MetadataTable` component for structured data display.
  - Introduced new icons for improved visual navigation.
- Enhanced job tracking and observability pages with modular components.

- **UI/UX Improvements**
  - Updated favicon from PNG to SVG.
  - Refined styling for buttons, tabs, and status indicators.
  - Improved navigation bar with new feedback and support buttons.

- **Styling Changes**
  - Revamped job status color and border styling.
  - Adjusted component padding and margins.
  - Updated color variables in CSS and Tailwind configuration.

- **Component Enhancements**
  - Added flexibility to existing components with new props.
- Removed `LogsDownloadButton` and replaced with a more generic button.
  - Improved metadata and job tracking displays.
  - Enhanced collapsible sections for better consistency.
  - Introduced new properties for better customization in components.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Sean Lynch <[email protected]>
## Summary

Accept `1h`, `30d`, etc. as valid entries for the `windows` arg of
`Aggregation`.
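
A minimal sketch of the new usage, assuming the Python `Aggregation` helper in `ai.chronon.group_by`:

```python
# Hedged sketch: duration strings in place of explicit Window/TimeUnit objects.
from ai.chronon.group_by import Aggregation, Operation

purchase_sum = Aggregation(
    input_column="purchase_price",
    operation=Operation.SUM,
    windows=["1h", "30d"],  # previously required Window(length=..., timeUnit=...) objects
)
```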


## Checklist
- [x] Added Unit Tests
- [x] Covered by existing CI
- [x] Integration tested
- [x] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **New Features**
- Enhanced window specification in GroupBy functionality, now supporting
string-based time window definitions (e.g., "7d", "1h").

- **Improvements**
	- Simplified window configuration syntax across multiple files.
	- Removed deprecated `Window` and `TimeUnit` class imports.
	- Improved code readability in various sample and test files.

- **Tests**
	- Added new test case to validate string-based window specifications.

- **Documentation**
- Updated documentation examples to reflect new window specification
method.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Switched over to using the layerstack `format` function. This should preserve
functionality (and treat dates in the job tracker "as is", without converting
for timezone).

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Changes in Date Formatting**
	- Removed `formatDate` function from multiple components.
- Updated date formatting approach to utilize a new utility function for
consistency.
- Simplified date display logic, enhancing clarity in job run
information.

- **Impact on User Experience**
	- Date representations may appear slightly different.
	- Consistent date formatting across the application.
	- Minor visual changes in job tracking and observability views.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Migrated `Flink` module to Bazel from sbt.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
	- Added Scala library definitions for the main application and testing.
- Introduced new Apache Flink dependencies for streaming, metrics, and
client interactions.

- **Chores**
- Updated build configuration to support a modular Scala project
structure.
	- Enhanced Maven repository with additional Flink-related libraries.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Eugene changed some of the job tracker colors; they are updated here.
([link, they are below this
view](https://www.figma.com/design/zM3rfODJTgJ6iu5zbXIPqk/%5BCS%5D-ZIpline-AI-Platform?node-id=1108-7802&m=dev))

I also noticed that Tailwind was ignoring the hover styles (weird, since
it worked for me yesterday; must be a bug), so I added a regex to catch
those.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Style**
	- Updated color values for job state styles in the application
- Added new CSS custom properties for job states, including waiting and
invalid states
	- Enhanced visual representation of job status colors

- **Chores**
- Modified Tailwind CSS configuration to support dynamic job status
border styles

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
… partitioned tables. (#257)

… partitioned tables.

## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Bug Fixes**
- Improved BigQuery data writing method to support partitioned tables by
changing the write method from "direct" to "indirect".

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Migrated `cloud_gcp` module to Bazel from sbt. The items below are still pending
and will be fixed in a separate PR:
* A few tests are failing; this is a general problem across the
project and is being worked on in parallel.
* The cloud_gcp_submitter target logic also needs to be migrated.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

- **Dependency Updates**
  - Downgraded `rules_jvm_external` from version 6.6 to 4.5
  - Updated Mockito version to 5.12.0
- Added new Google Cloud service libraries (BigQuery, Bigtable, PubSub,
Dataproc)
  - Added Apache Curator artifact

- **Build Configuration**
  - Updated library visibility settings
  - Modified logging dependencies in Spark library
  - Enhanced Maven repository with additional artifacts

- **Development Improvements**
  - Expanded testing and dependency management capabilities

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Migrated `service` module to Bazel from sbt.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Build Configuration**
	- Added new Bazel targets for Java library and test suite
	- Configured dependencies for library and testing

- **Test Improvements**
- Updated test code to use more concise collection initialization
methods
	- Simplified test code structure without changing functionality

- **Dependency Management**
	- Added Vert.x Unit testing library version 4.5.10
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: nikhil-zlai <[email protected]>
Co-authored-by: Kumar Teja Chippala <[email protected]>
## Summary
Migrated `hub` module to Bazel from sbt.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **Dependency Management**
  - Added new dependency lists for Flink, Vert.x, and Circe libraries
- Consolidated and standardized dependency management across multiple
projects
  - Introduced new Maven artifact dependencies for Circe and Vert.x

- **Build Configuration**
  - Refactored dependency references in multiple `BUILD.bazel` files
  - Updated build rules to use centralized dependency variables
  - Simplified library and test suite configurations

- **Library Updates**
  - Added Scala libraries for hub project
  - Updated service commons library dependencies
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: nikhil-zlai <[email protected]>
Co-authored-by: Kumar Teja Chippala <[email protected]>
## Summary
Migrated `orchestration` module to Bazel from sbt.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
	- Added Scala library configurations for main and test code.
	- Introduced new Log4j dependencies for enhanced logging capabilities.

- **Chores**
	- Updated build configuration to support Scala project structure.
	- Expanded Maven repository with additional logging libraries.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
eval("select ... from ... where ...") is now supported. easier for
exploration.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [x] Integration tested
- [x] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
	- Updated setup instructions for Chronon Python interactive runner
	- Modified environment variable paths and gateway service instructions

- **New Features**
	- Added ability to specify result limit when evaluating queries
	- Enhanced SQL query evaluation with error handling

- **Bug Fixes**
	- Simplified interactive usage example
	- Updated Spark session method in DataFrame creation

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

^^^

- Did have to add the `bigtable.tables.create` permission to the Dataproc
service account `[email protected]`
since one of the tables, `CHRONON_ENTITY_BY_TEAM`, didn't exist at the
time. See job:

https://console.cloud.google.com/dataproc/jobs/a04f9ba8-583c-475e-8956-9b53d28f3ed6/monitoring?region=us-central1&inv=1&invt=AbnIoA&project=canary-443022

cc @chewy-zlai 

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

Successful test in Dataproc:
https://console.cloud.google.com/dataproc/jobs/04ff550c-4df6-47ce-a36b-3238c1cb63a1/monitoring?region=us-central1&inv=1&invt=AbnIzw&project=canary-443022

See:
```
(dev_chronon) davidhan@Davids-MacBook-Pro: ~/zipline/chronon/api/py/test/sample (davidhan/metadata_upload) $ cbt -project=canary-443022 -instance=zipline-canary-instance read CHRONON_METADATA
2025/01/17 21:44:34 -creds flag unset, will use gcloud credential
----------------------------------------
CHRONON_METADATA#purchases.v1
  cf:value                                 @ 2025/01/17-21:30:36.130000
    "{\"metaData\":{\"name\":\"quickstart.purchases.v1\",\"online\":1,\"customJson\":\"{\\\"lag\\\": 0, \\\"groupby_tags\\\": null, \\\"column_tags\\\": {}}\",\"dependencies\":[\"{\\\"name\\\": \\\"wait_for_data.purchases_external_ds\\\", \\\"spec\\\": \\\"data.purchases_external/ds={{ ds }}\\\", \\\"start\\\": null, \\\"end\\\": null}\"],\"outputNamespace\":\"data\",\"team\":\"quickstart\",\"offlineSchedule\":\"@daily\"},\"sources\":[{\"events\":{\"table\":\"data.purchases_external\",\"query\":{\"selects\":{\"user_id\":\"user_id\",\"purchase_price\":\"purchase_price\"},\"timeColumn\":\"ts\",\"setups\":[]}}}],\"keyColumns\":[\"user_id\"],\"aggregations\":[{\"inputColumn\":\"purchase_price\",\"operation\":7,\"argMap\":{},\"windows\":[{\"length\":3,\"timeUnit\":1},{\"length\":14,\"timeUnit\":1},{\"length\":30,\"timeUnit\":1}]},{\"inputColumn\":\"purchase_price\",\"operation\":6,\"argMap\":{},\"windows\":[{\"length\":3,\"timeUnit\":1},{\"length\":14,\"timeUnit\":1},{\"length\":30,\"timeUnit\":1}]},{\"inputColumn\":\"purchase_price\",\"operation\":8,\"argMap\":{},\"windows\":[{\"length\":3,\"timeUnit\":1},{\"length\":14,\"timeUnit\":1},{\"length\":30,\"timeUnit\":1}]},{\"inputColumn\":\"purchase_price\",\"operation\":13,\"argMap\":{\"k\":\"10\"}}],\"backfillStartDate\":\"2023-11-20\"}"
  cf:value                                 @ 2025/01/17-21:28:45.503000
    "{\"metaData\":{\"name\":\"quickstart.purchases.v1\",\"online\":1,\"customJson\":\"{\\\"lag\\\": 0, \\\"groupby_tags\\\": null, \\\"column_tags\\\": {}}\",\"dependencies\":[\"{\\\"name\\\": \\\"wait_for_data.purchases_external_ds\\\", \\\"spec\\\": \\\"data.purchases_external/ds={{ ds }}\\\", \\\"start\\\": null, \\\"end\\\": null}\"],\"outputNamespace\":\"data\",\"team\":\"quickstart\",\"offlineSchedule\":\"@daily\"},\"sources\":[{\"events\":{\"table\":\"data.purchases_external\",\"query\":{\"selects\":{\"user_id\":\"user_id\",\"purchase_price\":\"purchase_price\"},\"timeColumn\":\"ts\",\"setups\":[]}}}],\"keyColumns\":[\"user_id\"],\"aggregations\":[{\"inputColumn\":\"purchase_price\",\"operation\":7,\"argMap\":{},\"windows\":[{\"length\":3,\"timeUnit\":1},{\"length\":14,\"timeUnit\":1},{\"length\":30,\"timeUnit\":1}]},{\"inputColumn\":\"purchase_price\",\"operation\":6,\"argMap\":{},\"windows\":[{\"length\":3,\"timeUnit\":1},{\"length\":14,\"timeUnit\":1},{\"length\":30,\"timeUnit\":1}]},{\"inputColumn\":\"purchase_price\",\"operation\":8,\"argMap\":{},\"windows\":[{\"length\":3,\"timeUnit\":1},{\"length\":14,\"timeUnit\":1},{\"length\":30,\"timeUnit\":1}]},{\"inputCol
```

```
(dev_chronon) davidhan@Davids-MacBook-Pro: ~/zipline/chronon/api/py/test/sample (davidhan/metadata_upload) $ cbt -project=canary-443022 -instance=zipline-canary-instance read CHRONON_ENTITY_BY_TEAM
2025/01/17 21:45:43 -creds flag unset, will use gcloud credential
----------------------------------------
CHRONON_ENTITY_BY_TEAM#group_bys/quickstart
  cf:value                                 @ 2025/01/17-21:30:36.621000
    "cHVyY2hhc2VzLnYx"
```



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **New Features**
- Added configuration type specification option for command-line
interfaces.
- Enhanced metadata processing with more flexible configuration
handling.

- **Improvements**
  - Refined command execution logic for better control flow.
  - Updated metadata parsing to support additional configuration types.

- **Documentation**
- Added clarifying comments about asynchronous task behaviors in dataset
and table creation processes.

The release introduces more flexible configuration management and
improved command-line argument handling across various components.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

Input/output schema for the Job Tracker and lineage API on ZiplineHub.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced a comprehensive job tracking system with support for
tracking job execution, status, and metadata.
	- Added a new API endpoint for handling job type requests.
- Defined structures for job tracking requests and responses, including
job metadata.
- Introduced enumerations for job processing direction, execution modes,
and statuses.
	- Added a new handler for processing job tracking requests.

- **Bug Fixes**
- Corrected a typographical error in a comment within the orchestration
Thrift file.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: ezvz <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
## Summary

- Should be using the same reader as every other case. 
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update


<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Improvements**
- Enhanced data loading consistency for Hive, Delta, and Iceberg table
formats
- Ensured that DataFrameReader options are consistently applied when
loading tables

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Thomas Chow <[email protected]>
## Summary

- remove the implicit dependency on FormatProvider for CreationUtils.
Step 1 to supporting format-specific table creation.


## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update


<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Code Improvements**
- Updated method signatures in `TableUtils` and `CreationUtils` to
enhance clarity and flexibility of table creation and partition
insertion processes
- Refined how file formats and table types are specified during table
operations
	- Streamlined parameter handling for table creation methods

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Thomas Chow <[email protected]>
## Summary

- Set `writeOptions` instead of the global conf to avoid mutation and races
between threads (see the sketch below).
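
A rough PySpark illustration of the difference (the specific option shown is illustrative; the PR moves BigQuery settings from the session conf into per-write `writeOptions`):

```python
# Before (racy): every thread shares and mutates the same session conf.
# spark.conf.set("spark.datasource.bigquery.writeMethod", "indirect")

# After (thread-safe): the setting is scoped to this particular write only.
(df.write
   .format("bigquery")
   .option("writeMethod", "indirect")  # illustrative option name
   .save("dataset.table"))
```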

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update


<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Configuration Management**
	- Simplified BigQuery configuration handling in Spark session
	- Removed configuration restoration logic after method execution

Note: The changes are primarily internal and do not directly impact
end-user functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Thomas Chow <[email protected]>
## Summary

- Initial implementation of Python `compile` -- compiles the whole repo
- Placeholder implementation of `backfill` -- next step will be to wire
up to the orchestrator for implementation
- Documentation for the broader user API surface area and how it should
behave (should maybe move this to pydocs)

Currently, the interaction looks like this:

<img width="729" alt="image"
src="https://github.com/user-attachments/assets/a52cf582-4394-449a-8567-32c0f108d2ed"
/>

Next steps:
- Connect with ZiplineHub for lineage, and backfill execution

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [x] Documentation update




<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Expanded CLI documentation with detailed descriptions of `plan`,
`backfill`, `deploy`, and `fetch` commands.
	- Added comprehensive usage instructions and examples for each command.

- **New Features**
	- Enhanced repository management functions for Chronon objects.
- Added methods for compiling, uploading, and managing configuration
details.
- Introduced new utility functions for module and variable name
retrieval.
- Added functionality to compute and upload differences in a local
repository.
	- New function to convert JSON to Thrift binary format.

- **Improvements**
- Streamlined function signatures in extraction and compilation modules.
	- Added error handling and logging improvements.
	- Enhanced object serialization and diff computation capabilities.
- Dynamic method bindings for `backfill`, `deploy`, and `info` functions
in `GroupBy` and `Join`.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: ezvz <[email protected]>
Co-authored-by: Nikhil Simha <[email protected]>
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes**
- Improved BigQuery partition loading mechanism to use more direct
method calls when retrieving partition columns and values.

Note: The changes appear technical and primarily affect internal data
loading processes, with no direct end-user visible impact.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
	- Updated Java version to Corretto 11.0.25.9.1
	- Upgraded Google Cloud SDK (gcloud) from version 504.0.1 to 507.0.0
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary
We have a strict dependency on Java 11 for all the Dataproc stuff, so
it's good to be consistent across our project. Currently only the
service_commons package has a strict dependency on Java 17, so I made changes
to be compatible with Java 11.

I was able to successfully build using both sbt and Bazel.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Configuration**
- Downgraded Java language and runtime versions from 17 to 11 in Bazel
configuration

- **Code Improvement**
- Updated type handling in `RouteHandlerWrapper` method signature for
enhanced type safety

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Set up a Flink job that can take beacon data as Avro (configured in GCS)
and emit it at a configurable rate to Kafka. We can use this stream in
our GroupBy streaming jobs.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
Kicked off the job and you can see events flowing in topic
[test-beacon-main](https://console.cloud.google.com/managedkafka/us-central1/clusters/zipline-kafka-cluster/topics/test-beacon-main?hl=en&invt=Abnpeg&project=canary-443022)

- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Added Kafka data ingestion capabilities using Apache Flink.
- Introduced a new driver for streaming events from GCS to Kafka with
configurable delay.

- **Dependencies**
  - Added Apache Flink connectors for Kafka, Avro, and file integration.
- Integrated managed Kafka authentication handler for cloud
environments.

- **Infrastructure**
  - Created new project configuration for Kafka data ingestion.
  - Updated build settings to support advanced streaming workflows.
  - Updated cluster name configuration for Dataproc submitter.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

- The way tables are created can differ depending on the
underlying storage catalog that we are using. In the case of BigQuery,
we cannot issue a Spark SQL statement to do this operation. Let's
abstract that out into the `Format` layer for now, but eventually we
will need a `Catalog` abstraction that supports this.
- Try to remove the dependency on a SparkSession in `Format` by inverting the
dependency with a HOF (see the sketch below).
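
A conceptual Python sketch of the HOF inversion described in the second bullet (the real code is Scala, and these names are hypothetical):

```python
from typing import Callable, List

# Instead of a Format method holding a SparkSession, the caller passes in a
# function that can run SQL, so Format no longer depends on Spark directly.
def primary_partitions(table: str, run_sql: Callable[[str], List[dict]]) -> List[str]:
    rows = run_sql(f"SHOW PARTITIONS {table}")
    return [row["partition"] for row in rows]
```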

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
	- Enhanced BigQuery integration with new client parameter
	- Added support for more flexible table creation methods
	- Improved logging capabilities for table operations

- **Bug Fixes**
	- Standardized table creation process across different formats
	- Removed unsupported BigQuery table creation operations

- **Refactor**
	- Simplified table creation and partition insertion logic
- Updated method signatures to support more comprehensive table
management
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary


- https://app.asana.com/0/1208949807589885/1209206040434612/f 
- Support explicit BigQuery table creation.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
---
- To see the specific tasks where the Asana app for GitHub is being
used, see below:
  - https://app.asana.com/0/0/1209206040434612
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced BigQuery table creation functionality with improved schema
and partitioning support.
	- Streamlined table creation process in Spark's TableUtils.

- **Refactor**
	- Simplified table existence checking logic.
	- Consolidated import statements for better readability.
	- Removed unused import in BigQuery catalog test.
- Updated import statement in GcpFormatProviderTest for better
integration with Spark BigQuery connector.

- **Bug Fixes**
	- Improved error handling for table creation scenarios.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
…obs (#274)

## Summary
Changes to avoid runtime errors while running Spark and Flink jobs on
the cluster using Dataproc submit with the newly built Bazel assembly
jars. Successfully ran the DataprocSubmitterTest jobs using the newly
built Bazel jars for testing.

Testing Steps
1. Build the assembly jars
```
bazel build //cloud_gcp:lib_deploy.jar
bazel build //flink:assembly_deploy.jar
bazel build //flink:kafka-assembly_deploy.jar
```
2. Copy the jars to the GCS bucket used by our jobs
```
gsutil cp bazel-bin/cloud_gcp/lib_deploy.jar gs://zipline-jars/bazel-cloud-gcp.jar
gsutil cp bazel-bin/flink/assembly_deploy.jar gs://zipline-jars/bazel-flink.jar
gsutil cp bazel-bin/flink/kafka-assembly_deploy.jar gs://zipline-jars/bazel-flink-kafka.jar
```
3. Modify the jar paths in the DataprocSubmitterTest file and run the
tests

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Added Kafka and Avro support for Flink.
  - Introduced new Flink assembly and Kafka assembly binaries.

- **Dependency Management**
  - Updated Maven artifact dependencies, including Kafka and Avro.
- Excluded specific Hadoop and Spark-related artifacts to prevent
runtime conflicts.

- **Build Configuration**
  - Enhanced build rules for Flink and Spark environments.
- Improved dependency management to prevent class compatibility issues.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
The string formatting here was breaking zipline commands when I built a
new wheel:

```
  File "/Users/davidhan/zipline/chronon/dev_chronon/lib/python3.11/site-packages/ai/chronon/repo/hub_uploader.py", line 73
    print(f"\n\nUploading:\n {"\n".join(diffed_entities.keys())}")
                                ^
SyntaxError: unexpected character after line continuation character
```
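
For context, an f-string cannot reuse the surrounding quote character inside its braces before Python 3.12; an illustrative fix (not necessarily the exact patch) is to hoist the join out of the f-string:

```python
# Hoist the joined string into a variable so the f-string no longer nests double quotes.
entity_names = "\n".join(diffed_entities.keys())
print(f"\n\nUploading:\n {entity_names}")
```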

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Style**
- Improved string formatting and error message construction for better
code readability
    - Enhanced log message clarity in upload process

Note: These changes are internal improvements that do not impact
end-user functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
- With #263 we control table
creation ourselves. We don't need to rely on indirect writes to do
the table creation (and partitioning) for us; we simply use the
Storage API to write directly into the table we created (see the sketch
below). This should be much more performant and is preferred over
indirect writes because we don't need to stage data and then load it as
a temp BQ table, and it uses the BigQuery Storage API directly.
- Remove configs that are used only for indirect writes.
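
A hedged PySpark sketch of the direct path, assuming the spark-bigquery-connector's `writeMethod` option (the exact Chronon wiring may differ):

```python
# Direct writes use the BigQuery Storage Write API: no staging to GCS and no
# temp-table load job. The table (and its partitioning) is created by Chronon
# beforehand, so the connector only has to append rows.
(df.write
   .format("bigquery")
   .option("writeMethod", "direct")
   .mode("append")
   .save("project.dataset.table"))
```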

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **Improvements**
- Enhanced BigQuery data writing process with more precise configuration
options.
  - Simplified table creation and partition insertion logic.
- Improved handling of DataFrame column arrangements during data
operations.

- **Changes**
  - Updated BigQuery write method to use a direct writing approach.
- Introduced a new option to prevent table creation if it does not
exist.
  - Modified table creation process to be more format-aware.
  - Streamlined partition insertion mechanism.

These updates improve data management and writing efficiency in cloud
data processing workflows.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary

- Pull alter table properties functionality into `Format` 

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added a placeholder method for altering table properties in BigQuery
format
- Introduced a new method to modify table properties across different
Spark formats
- Enhanced table creation utility to use format-specific property
alteration methods

- **Refactor**
- Improved table creation process by abstracting table property
modifications
- Standardized approach to handling table property changes across
different formats

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Added configurable repartitioning option for DataFrame writes.
- Introduced a new configuration setting to control repartitioning
behavior.
  - Enhanced test suite with functionality to handle empty DataFrames.

- **Chores**
  - Improved code formatting and logging for DataFrame writing process.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: tchow-zlai <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>
)

## Summary

We cannot represent absent data in time series with null values because
the Thrift serializer doesn't allow nulls in `list<double>`, so we create
a magic double value (derived from the string "chronon") to represent nulls.
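
A Python sketch of the idea (the actual implementation is in Scala; the constant is the value quoted later in this PR history):

```python
# Sentinel used in place of null inside Thrift list<double> payloads.
MAGIC_NULL_DOUBLE = -2.7980863399423856e16

def encode_series(values):
    """Replace Nones with the sentinel before Thrift serialization."""
    return [MAGIC_NULL_DOUBLE if v is None else v for v in values]
```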

## Checklist
- [x] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

- **New Features**
- Introduced a new constant `magicNullDouble` for handling null value
representations

- **Bug Fixes**
- Improved null value handling in serialization and data processing
workflows

- **Tests**
- Added new test case for `TileDriftSeries` serialization to validate
null value management

The changes enhance data processing consistency and provide a
standardized approach to managing null values across the system.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
This fixes an issue where Infinity/NaN values in drift calculations were
causing JSON parse errors in the frontend. These special values are now
converted to our standard magic null value (-2.7980863399423856E16)
before serialization.
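
Extending the earlier sketch, special doubles get mapped to the same sentinel so the serialized JSON stays parseable (conceptual Python; the real code is Scala):

```python
import math

MAGIC_NULL_DOUBLE = -2.7980863399423856e16

def sanitize(v):
    """Map None, NaN, and +/-Infinity to the sentinel before serialization."""
    if v is None or math.isnan(v) or math.isinf(v):
        return MAGIC_NULL_DOUBLE
    return v
```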

## Checklist
- [x] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Bug Fixes**
- Enhanced error handling for special double values (infinity and NaN)
in data processing
- Improved serialization test case to handle null and special values
more robustly

- **Tests**
- Updated test case for `TileDriftSeries` serialization to cover edge
cases with special double values

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
david-zlai and others added 21 commits March 14, 2025 17:17
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
- Revised the upload process so that release artifacts are now organized
into distinct directories based on their type. This update separates
application packages into dedicated folders, ensuring a more orderly and
reliable distribution of the latest releases.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Improved cloud artifact deployment now ensures that release items are
available on both legacy and updated storage paths for enhanced
reliability.
- Enhanced data processing now dynamically determines partition
configurations and provides clear notifications when sub-partition
filtering isn’t supported.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

```bash
➜  chronon git:(tchow/zipline-init) ✗ zipline init
Warning: /Users/thomaschow/zipline-ai/chronon/zipline is not empty. Proceed? [y/N]: Y
Generating scaffolding at /Users/thomaschow/zipline-ai/chronon/zipline...
Project scaffolding created successfully! 🎉
```

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced a new CLI tool for safely initializing and scaffolding
projects with user confirmation to avoid accidental overwrites.
- Integrated the new command into the main interface for streamlined
project setup.
- Added a configuration file for custom sample team join and aggregation
settings.

- **Chores**
- Updated dependency management by reintroducing essential packages and
adding new ones.
- Revised package metadata and version requirements to support a newer
Python version.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary

- Do not use the `DelegatingBigQueryMetastore` as a session catalog,
just have it be a custom catalog.

This will change the following configuration set 


From:

```bash
spark.sql.catalog.spark_catalog.warehouse: "gs://zipline-warehouse-etsy/data/tables/"
spark.sql.catalog.spark_catalog.gcp_location: "us"
spark.sql.catalog.spark_catalog.gcp_project: "etsy-zipline-dev"
spark.sql.catalog.spark_catalog.catalog-impl: org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog
spark.sql.catalog.spark_catalog: ai.chronon.integrations.cloud_gcp.DelegatingBigQueryMetastoreCatalog
spark.sql.catalog.spark_catalog.io-impl: org.apache.iceberg.io.ResolvingFileIO
spark.sql.catalog.default_iceberg:  ai.chronon.integrations.cloud_gcp.DelegatingBigQueryMetastoreCatalog
spark.sql.catalog.default_iceberg.catalog-impl: org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog
spark.sql.catalog.default_iceberg.io-impl: org.apache.iceberg.io.ResolvingFileIO
spark.sql.catalog.default_iceberg.warehouse:  "gs://zipline-warehouse-etsy/data/tables/"
spark.sql.catalog.default_iceberg.gcp_location:  "us"
spark.sql.catalog.default_iceberg.gcp_project: "etsy-zipline-dev"
spark.sql.defaultUrlStreamHandlerFactory.enabled: "false"
spark.kryo.registrator: "ai.chronon.integrations.cloud_gcp.ChrononIcebergKryoRegistrator"
```

to:


```bash

spark.sql.defaultCatalog: "default_iceberg"
spark.sql.catalog.default_iceberg:  "ai.chronon.integrations.cloud_gcp.DelegatingBigQueryMetastoreCatalog"
spark.sql.catalog.default_iceberg.catalog-impl: "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog"
spark.sql.catalog.default_iceberg.io-impl: "org.apache.iceberg.io.ResolvingFileIO"
spark.sql.catalog.default_iceberg.warehouse:  "gs://zipline-warehouse-etsy/data/tables/"
spark.sql.catalog.default_iceberg.gcp_location:  "us"
spark.sql.catalog.default_iceberg.gcp_project: "etsy-zipline-dev"
spark.sql.defaultUrlStreamHandlerFactory.enabled: "false"
spark.kryo.registrator: "ai.chronon.integrations.cloud_gcp.ChrononIcebergKryoRegistrator"
spark.sql.catalog.default_bigquery: "ai.chronon.integrations.cloud_gcp.DelegatingBigQueryMetastoreCatalog"
```

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Improved internal table processing by restructuring class integrations
and enhancing error messaging when a table isn’t found.
- **Tests**
- Updated integration settings and adjusted reference parameters to
ensure validations remain aligned with the new catalog implementation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
…493)

## Summary

Adding function stubs and thrift schemas for orchestration with column level lineage.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Streamlined CLI operations with a simplified backfill command and
removal of an obsolete synchronization command.
- Enhanced configuration processing that now automatically compiles and
returns results.
- Expanded physical planning capabilities with new mechanisms for
managing computation graphs.
- Extended tracking of data transformations through improved column
lineage support in metadata.
- Introduction of new classes and methods for managing physical nodes
and graphs, enhancing data processing workflows.
- New interface for orchestrators to handle configuration fetching and
uploading.
- New structures for detailed column lineage tracking throughout data
processing stages.
- Added support for generating Python code from new Thrift definitions
in the testing workflow.

- **Bug Fixes**
- Improved documentation for clarity on the functionality and logic of
certain fields.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: ezvz <[email protected]>
## Summary
Initial skeleton implementation for the execution layer with the
following
1. Custom Thrift payload converter for our inputs/outputs
2. NodeExecutionWorkflow implementation which utilizes
NodeRunnerActivity implementation to recursively trigger and wait for
dependencies before submitting the job for the node
3. NodeExecutionWorker and NodeExecutionWorkflowRunner implementation
for local testing (this can be refactored and modified for the actual
production deployment)
4. Added basic unit tests for the DagExecution workflow and will add
more in future iterations

## Checklist
- [x] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced a new orchestration system for executing node-based
operations with dependency management, including enhanced worker and
runner components.
- Added a new `DummyNode` structure for improved data representation in
workflows.
- Implemented a `ThriftPayloadConverter` for custom serialization and
deserialization of Thrift objects in workflows.
- Developed a `NodeExecutionWorkflow` for managing the execution of
nodes within a Directed Acyclic Graph (DAG).
- Created a `NodeExecutionActivity` interface for handling node
dependencies and job submissions.
- Added a `TaskQueue` trait to facilitate load balancing and independent
scaling of task queues.
- Introduced a `WorkflowOperations` interface for managing workflow
operations and statuses.

- **Documentation**
- Added a README.md file detailing the orchestration module's
functionalities and usage instructions.

- **Tests**
- Expanded test coverage for workflow executions, activity handling, and
serialization, ensuring improved reliability.
- Introduced new unit and integration tests for `NodeExecutionWorkflow`,
`NodeExecutionActivity`, and `ThriftPayloadConverter`.

- **Chores**
- Streamlined build configurations and updated dependency management for
enhanced performance and maintainability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: david-zlai <[email protected]>
Co-authored-by: chewy-zlai <[email protected]>
Co-authored-by: Nikhil Simha <[email protected]>
Co-authored-by: Ken Morton <[email protected]>
Co-authored-by: Sean Lynch <[email protected]>
Co-authored-by: Sean Lynch <[email protected]>
Co-authored-by: Piyush Narang <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>
Co-authored-by: Thomas Chow <[email protected]>
…#489)

## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Refactor**
- Updated the method for loading data in tests to enhance consistency
and abstraction across various cases.
- **Style**
- Reorganized import statements within tests for improved clarity and
maintainability.

*Note: These behind‑the‑scenes improvements bolster overall system
quality without affecting end‑user functionality.*
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

Co-authored-by: Thomas Chow <[email protected]>
… exec nodes (#515)

## Summary
The updated Etsy listing actions GroupBy consists of a select which has
"explode(transform(...))". When we run that through our Validation Flink
job we see no matches. This is due to the explode + transform resulting
in Spark splitting up the whole-stage codegen into multiple stages (with
one of the stages being a 'GenerateExec' node for the explode; see the
illustrative snippet below). This PR reworks our CU code quite a bit to
support this.
* With these changes we consistently do see the upfront validation
succeeding.
* Added a set of tests that mimic the listing actions behavior in terms
of Avro shape and mock data examples, and confirmed they pass.
* There are some perf issues still: we need higher parallelism to
keep up with the Kafka stream (96 is needed). This is something we can tune
in a follow-up.
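
For reference, an illustrative PySpark projection of the shape described above (column names are hypothetical):

```python
# An exploded transform like this forces Spark to break whole-stage codegen into
# multiple stages, one of which is a GenerateExec node for the explode.
exploded = df.selectExpr(
    "explode(transform(listing_actions, a -> a.listing_id)) AS listing_id"
)
```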

## Checklist
- [X] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
	- Introduced a new utility for Avro data encoding and deserialization.
	- Added a test suite for validating complex Avro payload processing.
- **Refactor**
- Streamlined data processing and transformation logic for improved
performance and maintainability.
	- Enhanced logging for execution plan processing.
- **Tests**
- Expanded test coverage for Avro deserialization and SQL transformation
scenarios.
	- Added new test cases for list and array containers.
- **Chores**
	- Made minor formatting updates for better code clarity.
	- Deleted obsolete helper methods.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Nikhil Simha <[email protected]>
## Summary
Our wheel seems to be broken after we added the lineage.thrift file - 
```
zipline compile --input_path group_bys/search/beacons/listing.py                                             
Traceback (most recent call last):
  File "/Users/pnarang/workspace/zipline/dev_chronon/bin/zipline", line 5, in <module>
    from ai.chronon.repo.zipline import zipline
  File "/Users/pnarang/workspace/zipline/dev_chronon/lib/python3.11/site-packages/ai/chronon/repo/__init__.py", line 15, in <module>
    from ai.chronon.api.ttypes import GroupBy, Join, StagingQuery, Model
  File "/Users/pnarang/workspace/zipline/dev_chronon/lib/python3.11/site-packages/ai/chronon/api/ttypes.py", line 16, in <module>
    import ai.chronon.lineage.ttypes
ModuleNotFoundError: No module named 'ai.chronon.lineage'
```

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced the automated generation process to support all Thrift files
in the `api/thrift/` directory, improving efficiency.
- Updated the build routine to streamline the generation of Thrift
files, ensuring consistency and smooth integration.
- Consolidated ignored directory paths in the version control system for
better management.

- **Bug Fixes**
- Removed the deprecated script responsible for generating Python Thrift
files, eliminating associated issues.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit


- **New Features**
- Enhanced data write configuration by adding options for distribution
mode (none) and target file size (512 MB) during the write operation.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary
This can help us dig into why our Flink jobs are slow. [Flink
docs](https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/flame_graphs/)
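
Per the linked docs, Flink's built-in flame graphs are gated behind a REST option; a minimal sketch of the property this presumably turns on, expressed as a Python dict of Flink config:

```python
# Flink exposes per-operator flame graphs in its UI/REST API when this flag is set.
flink_conf = {
    "rest.flamegraph.enabled": "true",
}
```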

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested - Tested by running on Etsy side
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Enhanced job monitoring through the addition of flame graph generation
for improved REST API insights.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Simplified tests to only have unit tests using PostgreSQL test
containers. Deleted the integration tests, as they were not truly integration
tests and required slightly complicated mix-in composition to
remove duplicate code.

Ideally we would have used a test container that starts a PGAdapter
connection to the Spanner emulator, but that implementation doesn't
exist yet, and creating our own had challenges: the Docker image for local
PGAdapter doesn't have good multi-platform support (it is only supported on
Linux and fails on Mac), which made it hard to get the tests working
on all platforms.

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Introduced enhanced batch processing for managing dependency
relationships, strengthening orchestration performance.

- **Chores**
- Streamlined testing infrastructure for improved database management
and tighter code encapsulation.
- Retired outdated integration tests and refined dependency
configurations to optimize the build process.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

- Create a `/canary` dir for integration testing
- Migrate the `gcp` integration test script. 

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced new configuration files and pipelines for purchase metric
aggregations across production, development, and test environments for
both AWS and GCP.
- New README files added for canary configurations and AWS Zipline
project, providing structured guides and cloud-specific instructions.

- **Documentation**
- Released new and updated guides with cloud-specific instructions and
sample pipelines.

- **Refactor**
- Enhanced clarity and consistency in variable initialization for cloud
file uploads.
- Streamlined quickstart scripts by centralizing repository paths and
removing redundant setup steps.

- **Bug Fixes**
- Improved handling of variable initialization for cloud file uploads,
ensuring consistent behavior across different operational modes.
- Consolidated customer ID handling in upload scripts for improved
usability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary

- We want users to be able to specify write options. In the case of
Iceberg tables, these can be set at the table level through table
properties. The Chronon API already supports table properties, so let's
just ride on those rails.
- Users can set write distribution options through this mechanism (a sketch
follows below):
https://iceberg.apache.org/docs/latest/spark-writes/#writing-distribution-modes
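
A hedged sketch of the kind of properties a user could set (the Iceberg property names are standard; routing them through Chronon's table properties is the assumption):

```python
# Iceberg write-distribution and file-size options, passed through table properties.
table_properties = {
    "write.distribution-mode": "none",
    "write.target-file-size-bytes": str(512 * 1024 * 1024),  # 512 MB
}
```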



## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enabled dynamic updates of table properties during partition
insertions, offering more flexible table management.
- **Refactor**
- Streamlined table write operations by removing legacy configuration
options to simplify data handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update


<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Refactor**
  - Streamlined internal table data handling for improved consistency.
  - Removed outdated partition processing functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Thomas Chow <[email protected]>
## Summary

This updates the Push To Canary workflow to push the updated jars and
wheel to the cloud canaries and run integration tests.

The jobs are as follows:
* **build_artifacts**
builds the jars and wheel from the main branch and uploads them to
GitHub so the downstream jobs can consume them in parallel
* **push_to_aws**
downloads the jars and wheel from GitHub and uploads them to the S3
zipline-artifacts-canary bucket under version "candidate"
* **push_to_gcp**
downloads the jars and wheel from GitHub and uploads them to the GCS
zipline-artifacts-canary bucket under version "candidate"
* **run_aws_integration_tests**
runs the quickstart compile and backfill based on
[scripts/distribution/run_aws_quickstart.sh](https://github.com/zipline-ai/chronon/blob/main/scripts/distribution/run_aws_quickstart.sh)
* **run_gcp_integration_tests**
runs the quickstart compile, backfill, group-by-upload, upload-to-kv,
metadata-upload, and fetch based on
[scripts/distribution/run_gcp_quickstart.sh](https://github.com/zipline-ai/chronon/blob/main/scripts/distribution/run_gcp_quickstart.sh)
* **clean_up_artifacts**
removes the jars and wheel from GitHub to minimize storage costs
* **merge_to_main-passing-tests**
if all tests pass, this merges the latest commits to the branch
"main-passing-tests" and creates a txt file artifact, saved by GitHub,
recording which commit SHA passed
* **on_fail_notify_slack**
if the tests fail, this sends a notification to the "integration-tests"
channel of our Slack.

A test run of the workflow can be seen here:
https://github.com/zipline-ai/chronon/actions/runs/13913346353

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [x] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Enhanced our release pipeline to streamline package generation and
distribution.
	- Automated deployment of new build packages across AWS and GCP.
	- Introduced cleanup routines to optimize the post-deployment process.
- Updated job structure for improved artifact management and deployment.
- Added new jobs for building artifacts and uploading them to cloud
storage, including `build_artifacts`, `push_to_aws`, `push_to_gcp`, and
`clean_up_artifacts`.
	- Defined outputs for generated artifacts to improve tracking.
- Improved flexibility in deployment scripts for different environments
(dev and canary).
- Updated scripts to conditionally handle resource management based on
environment settings.
- Introduced environment-specific configurations for AWS and GCP
operations.
	- Added integration testing jobs for AWS and GCP post-deployment.
- Enhanced scripts for AWS and GCP to support environment-specific
operations and configurations.
- Introduced a mechanism to switch between different environments
affecting resource management and command configurations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary
Based on flamegraph debugging, pulled initialization-related code out of
the CU hot path (the row => {...} methods).
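
For illustration only (the names are hypothetical, not the actual Chronon
classes), the general shape of the change is to hoist work that depends
only on schema or config out of the per-row closure:

```scala
// Hypothetical sketch of the hoisting pattern; not the real CU code.
case class Row(values: Array[Any])

class Transformer(schema: Seq[String]) {
  // Before: the index lookup runs inside the row lambda, once per row --
  // exactly the kind of per-row init work that shows up in a flamegraph.
  def slowTransform(rows: Iterator[Row], col: String): Iterator[Any] =
    rows.map { row =>
      val idx = schema.indexOf(col) // repeated per-row initialization
      row.values(idx)
    }

  // After: the lookup is computed once, outside the hot path; the closure
  // now only does the cheap per-row array access.
  def fastTransform(rows: Iterator[Row], col: String): Iterator[Any] = {
    val idx = schema.indexOf(col)
    rows.map(row => row.values(idx))
  }
}
```

The same idea applies to building serializers, resolving field indices, or
compiling expressions: do it once before iterating rather than per row.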

## Checklist
- [ ] Added Unit Tests
- [X] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Extended validation reports now include additional row count metrics,
offering enhanced insight into data processing performance.
  
- **Refactor**
- Optimized transformation logic for greater efficiency and consistency.
  - Improved error handling and logging for better performance tracking.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Integrated a new linting tool into the development workflow for
improved code quality.

- **Refactor/Style**
- Streamlined and reorganized code structure by reordering import
statements and adjusting default parameters.
- Enhanced error handling to provide clearer diagnostics without
impacting functionality.

- **Chores**
- Updated configuration and build settings to standardize the codebase
and CI processes.
- Added a new configuration file for the Ruff linter, specifying linting
settings and rules.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

This PR is mostly a continuation of the initial Tailwind 4
[upgrade](#430), applying some
additional best practices and simplification, with the goal of
introducing minimal visual changes.

With that said, there are some minor visual changes due to the removal
of all the
[custom](https://github.com/zipline-ai/chronon/pull/505/files#diff-8fdab0c1827a182dfaa61b1253086037a44105350409f6701c49ebe130f6705bL168-L243)
font sizes. This removal helps to simplify the config and usage
(Tailwind already provides many
[sizes](https://tailwindcss.com/docs/font-size) and
[weights](https://tailwindcss.com/docs/font-weight) by default). We can
still refine the sizing if desired with one-off custom sizes (e.g.
`text-[13px]`), but the simplification improves readability. I already
manually refined a few sizes to tighten up the design.

- Switch from `@tailwindcss/postcss` to `@tailwindcss/vite`
([recommended](https://tailwindcss.com/docs/upgrade-guide#using-vite))
- Migrate config from
[legacy](https://tailwindcss.com/docs/upgrade-guide#using-a-javascript-config-file)
`tailwind.config.js` (js-based) to `app.css` (css-first with directives)
  - Cleanup / simplify
    - Remove unused `@tailwindcss/typography` plugin
- Remove unused container config (no longer
[supported](https://tailwindcss.com/docs/upgrade-guide#container-configuration)
in v4 as well)
- Remove use of deprecated/unsupported
[safelist](https://tailwindcss.com/docs/upgrade-guide#using-a-javascript-config-file)
- Simplify job tracking styling by using css directly (remove js /
tailwind indirection)
- Rename `MetadataTableSection` to `MetadataTable` and replace
`MetadataTable` usage with simple div
      - `MetadataTableSection` was the actual `Table`
- Removes dynamic `grid-col-{$columns}` class string which isn't
[supported](https://tailwindcss.com/docs/detecting-classes-in-source-files#dynamic-class-names)
by Tailwind (especially without `safelist` support)
- Update
[LayerChart](https://github.com/techniq/layerchart/pull/449/files) and
[Svelte UX](techniq/svelte-ux#571) to
`2.0.0@next` (improved compat with Tailwind 4 and
[later](techniq/layerchart#458) Svelte 5)

## Screenshots

### Dark mode

Before | After
--- | ---
![image](https://github.com/user-attachments/assets/2dc3a358-2146-4973-b997-e5005d83bbe1) | ![image](https://github.com/user-attachments/assets/541b78b2-20a0-48e5-8142-3a396a71170d)
![image](https://github.com/user-attachments/assets/c938134b-95bc-4acf-84bb-56f9cc4fefd2) | ![image](https://github.com/user-attachments/assets/441be3de-bc18-43ad-93a5-4e1331a3c2a5)
![image](https://github.com/user-attachments/assets/550e998a-0629-49e6-9a5f-5a0ef3aa00b1) | ![image](https://github.com/user-attachments/assets/e1a331d4-f0ee-4083-991c-6ce88aeee385)
![image](https://github.com/user-attachments/assets/94a77457-b1c2-49fc-94ae-b4a2aa14b8ab) | ![image](https://github.com/user-attachments/assets/4c060367-f54f-4f44-b79e-90f2958eb019)


### Light mode

Before | After
--- | ---
![image](https://github.com/user-attachments/assets/953859d2-e056-4a0b-b272-b95b303a4bc9) | ![image](https://github.com/user-attachments/assets/20667afd-49cb-4182-bf9f-ec7f6f14031c)
![image](https://github.com/user-attachments/assets/de159fe6-eace-45b2-a741-4e6600aa1580) | ![image](https://github.com/user-attachments/assets/82b332ae-865b-4f1a-b61f-8b0be178ff9f)
![image](https://github.com/user-attachments/assets/deb19053-c8a9-4205-b8d4-0f594f97fb35) | ![image](https://github.com/user-attachments/assets/28819745-0598-4eca-a65b-942bd5db79e6)
![image](https://github.com/user-attachments/assets/c46b9a2f-8250-42a0-a7c1-991053df3abf) | ![image](https://github.com/user-attachments/assets/7fdfd8cb-b8c6-44e7-a225-d381956b67a6)




## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Introduced enhanced theming with a comprehensive set of CSS custom
properties for improved color schemes and typography.
- Streamlined metadata and status displays for a cleaner and more
intuitive presentation.

- **Style**
- Refined UI elements including navigation bars, headers, and buttons
for better readability and visual consistency.
- Updated layout adjustments across key pages, offering a more cohesive
look and feel.

- **Chores**
- Upgraded dependencies and optimized build configurations while
removing legacy styling tools for improved performance.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Sean Lynch <[email protected]>
## Summary

Ran the canary integration tests! 

```bash

                    <-----------------------------------------------------------------------------------
                    ------------------------------------------------------------------------------------
                                                      DATAPROC LOGS
                    ------------------------------------------------------------------------------------
                    ------------------------------------------------------------------------------------>

++ cat tmp_metadata_upload.out
++ grep 'Dataproc submitter job id'
++ cut -d ' ' -f5
+ METADATA_UPLOAD_JOB_ID=0f0a7099-3f03-4487-b291-de26f3d56cdb
+ check_dataproc_job_state 0f0a7099-3f03-4487-b291-de26f3d56cdb
+ JOB_ID=0f0a7099-3f03-4487-b291-de26f3d56cdb
+ '[' -z 0f0a7099-3f03-4487-b291-de26f3d56cdb ']'
+ echo -e '\033[0;32m <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>\033[0m'
 <<<<<<<<<<<<<<<<-----------------JOB STATUS----------------->>>>>>>>>>>>>>>>>
++ gcloud dataproc jobs describe 0f0a7099-3f03-4487-b291-de26f3d56cdb --region=us-central1 --format=flattened
++ grep status.state:
+ JOB_STATE='status.state:                    DONE'
+ echo status.state: DONE
status.state: DONE
+ '[' -z 'status.state:                    DONE' ']'
+ echo -e '\033[0;32m<<<<<.....................................FETCH.....................................>>>>>\033[0m'
<<<<<.....................................FETCH.....................................>>>>>
+ touch tmp_fetch.out
+ zipline run --repo=/Users/thomaschow/zipline-ai/chronon/api/py/test/canary --mode fetch --conf=production/group_bys/gcp/purchases.v1_dev -k '{"user_id":"5"}' --name gcp.purchases.v1_dev
+ tee tmp_fetch.out
+ grep -q purchase_price_average_14d
+ cat tmp_fetch.out
+ grep purchase_price_average_14d
  "purchase_price_average_14d" : 72.5,
+ '[' 0 -ne 0 ']'
+ echo -e '\033[0;32m<<<<<.....................................SUCCEEDED!!!.....................................>>>>>\033[0m'
<<<<<.....................................SUCCEEDED!!!.....................................>>>>>
```

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Chores**
	- Upgraded several core dependency versions from 1.5.2 to 1.6.1.
- Updated artifact metadata, including revised checksums and additional
dependency entries, to enhance concurrency handling and HTTP client
support.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track
the status of stacks when using Aviator. Please do not delete or edit
this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: Thomas Chow <[email protected]>
## Summary

Adding stubs and schemas for the orchestrator.

## Checklist
- [x] Added Unit Tests
- [x] Covered by existing CI
- [x] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced an orchestration service with HTTP endpoints for health
checks, configuration display, and diff uploads.
- Expanded branch mapping capabilities and enhanced data exchange
processes.
- Added new data models and database access logic for configuration
management.
- Implemented a new upload handler for managing configuration uploads
and diff operations.
- Added a new `UploadHandler` class for handling uploads and diff
requests.
- Introduced a new `WorkflowHandler` class for future workflow
management.

- **Refactor**
- Streamlined job orchestration for join, merge, and source operations
by adopting unified node-based configurations.
- Restructured metadata and persistence handling for greater clarity and
efficiency.
- Updated import paths to reflect new organizational structures within
the codebase.

- **Chore**
  - Updated dependency integrations and package structures.
- Removed obsolete components and tests to improve overall
maintainability.
- Introduced new test specifications for validating database access
logic.
  - Added new tests for the `NodeDao` class to ensure functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: ezvz <[email protected]>
Co-authored-by: Nikhil Simha <[email protected]>
@coderabbitai
Contributor

coderabbitai bot commented Mar 25, 2025

Walkthrough

This update adds a new test case in the API to verify that a base64-encoded tile key written in Flink format is properly deserialized using TilingUtils.deserializeTileKey. Additionally, the access modifier of the avroConvertTileToPutRequest method in the Flink codec function is changed from public to private, ensuring internal encapsulation without affecting functionality.

Changes

File(s) | Change Summary
--- | ---
`api/src/test/scala/ai/chronon/.../TilingUtilSpec.scala` | Added a test case to decode a base64-encoded tile key and verify the non-null deserialized output using `TilingUtils.deserializeTileKey`.
`flink/src/main/scala/ai/chronon/.../AvroCodecFn.scala` | Changed the access modifier of `avroConvertTileToPutRequest` in `TiledAvroCodecFn` from public to private; functionality remains unchanged.

Possibly related PRs

Suggested reviewers

  • tchow-zlai

Poem

In our code realm, a test takes flight,
A secret function tucked away from sight,
Base64 mysteries now unfurled,
Private methods shape a safer world,
CodeRabbit strides with a joyful spark!
🚀 Happy coding in the light of the dark!

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between a29efd2 and 3871d23.

📒 Files selected for processing (2)
  • api/src/test/scala/ai/chronon/api/test/TilingUtilSpec.scala (1 hunks)
  • flink/src/main/scala/ai/chronon/flink/AvroCodecFn.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/src/test/scala/ai/chronon/api/test/TilingUtilSpec.scala
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: non_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: non_spark_tests
  • GitHub Check: frontend_tests
  • GitHub Check: build-and-push
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (1)
flink/src/main/scala/ai/chronon/flink/AvroCodecFn.scala (1)

156-156: Good encapsulation improvement.

Making this method private properly restricts its visibility to the class scope, which aligns with its internal-only usage pattern.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
api/src/test/scala/ai/chronon/api/test/TilingUtilSpec.scala (1)

27-32: Test only verifies non-null, needs to check key properties

Test verifies deserialization succeeds but doesn't validate key contents. Consider assertions for dataset, keyBytes, etc.

  "TilingUtils" should "deser Flink written tile key" in {
    val base64String = "GChTRUFSQ0hfQkVBQ09OU19MSVNUSU5HX0FDVElPTlNfU1RSRUFNSU5HGWMCkJHL6Q0WgLq3AxaAyJ/uuWUA" //  "Aiw=" // "ArL4u9gL"
    val bytes = java.util.Base64.getDecoder.decode(base64String)
    val deserializedKey = TilingUtils.deserializeTileKey(bytes)
    deserializedKey should not be null
+   deserializedKey.getDataset should be("SEARCH_BEACONS_LISTING_ACTIONS_STREAMING")
+   deserializedKey.getTileSizeMillis should be > 0L
+   deserializedKey.getTileStartTimestampMillis should be > 0L
  }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between e7706f0 and a29efd2.

📒 Files selected for processing (2)
  • api/src/main/scala/ai/chronon/api/TilingUtils.scala (2 hunks)
  • api/src/test/scala/ai/chronon/api/test/TilingUtilSpec.scala (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
api/src/test/scala/ai/chronon/api/test/TilingUtilSpec.scala (1)
api/src/main/scala/ai/chronon/api/TilingUtils.scala (2)
  • TilingUtils (11-35)
  • deserializeTileKey (30-34)
🔇 Additional comments (4)
api/src/main/scala/ai/chronon/api/TilingUtils.scala (4)

17-19: Protocol change needs validation

Changed from binary to compact serialization. Ensure compatibility with existing data.


22-24: Protocol change needs compatibility testing

Switching from binary to compact deserialization may affect reading existing data.


26-28: Serialization method update looks good

Updated to use compact serializer.


30-34: Deserialization method update looks good

Updated to use compact deserializer.
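
For readers verifying the protocol switch, a minimal sketch of the
compact-protocol round trip with plain Thrift (the object name is
illustrative, and the `TileKey` import path is assumed rather than taken
from the diff):

```scala
import org.apache.thrift.{TDeserializer, TSerializer}
import org.apache.thrift.protocol.TCompactProtocol

import ai.chronon.fetcher.TileKey // assumed location of the generated struct

object CompactTileKeySerde {
  // Serialize a tile key with the compact protocol.
  def serialize(key: TileKey): Array[Byte] =
    new TSerializer(new TCompactProtocol.Factory()).serialize(key)

  // Deserialize compact-protocol bytes back into a tile key.
  def deserialize(bytes: Array[Byte]): TileKey = {
    val key = new TileKey()
    new TDeserializer(new TCompactProtocol.Factory()).deserialize(key, bytes)
    key
  }
}
```

The compatibility caveat above still applies: bytes written with the binary
protocol are not readable with a compact deserializer (and vice versa), so
any previously persisted keys would need the old protocol factory or a
rewrite.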

@nikhil-zlai nikhil-zlai force-pushed the nikhil/tile_serde_bug branch from a29efd2 to 3871d23 Compare March 25, 2025 21:09
@nikhil-zlai nikhil-zlai changed the title fix: use compact serde for tiling chore: refactor flink code to re-use stream creation logic Mar 25, 2025
@nikhil-zlai nikhil-zlai changed the title chore: refactor flink code to re-use stream creation logic fix: use tiling flag from groupby + code re-use Mar 25, 2025