
@varant-zlai (Collaborator) commented Dec 6, 2024

Summary

  • Initial implementation of Python compile -- compiles the whole repo
  • Placeholder implementation of backfill -- next step is to wire it up to the orchestrator for execution
  • Documentation for the broader user API surface area and how it should behave (should maybe move this to pydocs)
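The whole-repo compile step could be sketched roughly as follows: walk the config folder, import each Python module, and collect the Chronon objects defined at module level. This is an illustrative stand-in, not the actual `compilev2` implementation; the function name and signature below are hypothetical.

```python
# Hypothetical sketch of whole-repo compilation: import every .py file
# under a folder and collect module-level objects of a given type.
# `from_folder` and its signature are illustrative, not the real API.
import importlib.util
from pathlib import Path


def from_folder(folder: str, obj_class: type) -> dict:
    """Return {"<module>.<var_name>": obj} for every module-level
    instance of `obj_class` found in .py files under `folder`."""
    results = {}
    for path in Path(folder).rglob("*.py"):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # real code should catch and log failures
        for name, obj in vars(module).items():
            # Skip dunders/privates; keep only objects of the requested type.
            if not name.startswith("_") and isinstance(obj, obj_class):
                results[f"{path.stem}.{name}"] = obj
    return results
```

A real implementation would also serialize each collected object (e.g. to Thrift JSON) and write it under an output root, but the discovery loop is the core of it.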

Currently, the interaction looks like this:

(screenshot of the CLI interaction)

Next steps:

  • Connect with ZiplineHub for lineage, and backfill execution

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • Documentation

    • Expanded CLI documentation with detailed descriptions of plan, backfill, deploy, and fetch commands.
    • Added comprehensive usage instructions and examples for each command.
  • New Features

    • Enhanced repository management functions for Chronon objects.
    • Added methods for compiling, uploading, and managing configuration details.
    • Introduced new utility functions for module and variable name retrieval.
    • Added functionality to compute and upload differences in a local repository.
    • New function to convert JSON to Thrift binary format.
  • Improvements

    • Streamlined function signatures in extraction and compilation modules.
    • Added error handling and logging improvements.
    • Enhanced object serialization and diff computation capabilities.
    • Dynamic method bindings for backfill, deploy, and info functions in GroupBy and Join.
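The dynamic method bindings mentioned above follow a common Python pattern: define free functions and attach them to the config classes with `setattr`. The sketch below uses a stand-in `GroupBy` class and placeholder bodies; the real Chronon classes and method signatures may differ.

```python
# Stand-in for the real Chronon GroupBy class, for illustration only.
class GroupBy:
    def __init__(self, name: str):
        self.name = name


def _backfill(self, start: str, end: str) -> str:
    # Placeholder: the real implementation would submit a job to the orchestrator.
    return f"backfill {self.name} [{start}..{end}]"


def _info(self) -> str:
    return f"GroupBy(name={self.name})"


# Bind the free functions as instance methods; functions are descriptors,
# so `self` is supplied automatically on attribute access.
for fn_name, fn in (("backfill", _backfill), ("info", _info)):
    setattr(GroupBy, fn_name, fn)

gb = GroupBy("daily_metrics")
print(gb.backfill("2024-01-01", "2024-01-07"))
```

This keeps the config classes declarative while the CLI/runner module decides which verbs to expose on them.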

coderabbitai bot (Contributor) commented Dec 6, 2024

Walkthrough

This pull request enhances the Chronon CLI documentation and repository management functionality. It introduces comprehensive documentation for CLI commands such as plan, backfill, deploy, and fetch. Additionally, multiple Python files in the api/py/ai/chronon directory have been modified to add new functions for configuration management, object extraction, and repository interactions, improving error handling and method bindings.

Changes

| File | Change Summary |
|------|----------------|
| cli_readme.md | Added detailed sections for plan, backfill, deploy, and fetch commands with usage examples. |
| api/py/ai/chronon/repo/compilev2.py | Updated extract_and_convert function signature; added new utility functions for configuration management. |
| api/py/ai/chronon/utils.py | Added get_mod_and_var_name_from_gc function for module and variable name retrieval. |
| api/py/ai/chronon/repo/extract_objects.py | Modified from_folder and from_file functions; added from_folderV2 with enhanced error handling. |
| api/py/ai/chronon/repo/runner.py | Introduced ConfigDetails class; added functions for repository and branch management. |
| api/py/ai/chronon/repo/__init__.py | Added OUTPUT_ROOT constant. |
| api/py/ai/chronon/repo/hub_uploader.py | Added functions for computing and uploading repository differences. |
| api/py/ai/chronon/repo/serializer.py | Added json2binary function for JSON to Thrift binary conversion. |
| api/py/ai/chronon/group_by.py | Added method bindings for backfill, deploy, and info. |
| api/py/ai/chronon/join.py | Added method bindings for backfill, upload, and info. |
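The hub_uploader diff-and-upload flow could plausibly work by hashing each compiled config and comparing against the hashes the hub already holds, so only new or changed entries are uploaded. A minimal stdlib sketch, with hypothetical function names:

```python
# Illustrative sketch of local-vs-remote diff computation before upload.
# `config_hash` and `compute_diff` are hypothetical names, not the
# actual hub_uploader API.
import hashlib
import json


def config_hash(config: dict) -> str:
    """Stable content hash: canonical JSON (sorted keys) -> sha256 hex."""
    canonical = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


def compute_diff(local: dict, remote_hashes: dict) -> dict:
    """Return {name: config} for configs that are new or changed locally,
    given a remote snapshot of {name: hash}."""
    return {
        name: cfg
        for name, cfg in local.items()
        if remote_hashes.get(name) != config_hash(cfg)
    }
```

Hashing canonical JSON (rather than comparing full payloads) keeps the remote snapshot small and makes the "changed?" check a single string comparison per entity.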

Suggested Reviewers

  • tchow-zlai
  • nikhil-zlai
  • sean-zlai
  • piyush-zlai

Poem

🚀 Chronon's CLI, a tale unfurled
Commands dancing, features swirled
Backfill, deploy, fetch with grace
Repository magic in every trace
Code poetry in motion's embrace! 🌟

Warning

Review ran into problems

🔥 Problems

GitHub Actions: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between c7b29fc and 718cef7.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/hub_uploader.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/py/ai/chronon/repo/hub_uploader.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (2)
cli_readme.md (2)

4-10: Enhance Plan command documentation clarity

The explanation of the plan command could be more precise and include:

  1. What specific types of changes are tracked (schema changes, transformations, etc.)
  2. A more structured example with before/after scenarios
  3. Clearer explanation of environment behavior

Consider expanding the explanation like this:

The `plan` command analyzes and reports:
- Schema changes (added/removed columns)
- Transformation logic changes
- Dependency changes
- Configuration updates

Example:
```bash
# Initial GroupBy configuration
zipline plan
> No changes detected

# After adding a new column 'revenue'
zipline plan
> Changes detected:
> - GroupBy 'daily_metrics': Added column 'revenue'
🧰 Tools
🪛 LanguageTool

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)


6-6: Fix grammar and style issues

Several grammar improvements are needed:

  1. Line 6: Separate "however" with semicolons or periods
  2. Line 8: Add comma after "By default"
  3. Line 21: Change "This produces" to "These produce"
  4. Line 39: Remove "of" for conciseness

Apply these corrections:

-plan, you would see no change to your `GroupBy` `backfill` plan, however, there would still be
+plan, you would see no change to your `GroupBy` `backfill` plan; however, there would still be

-By default it is the branch
+By default, it is the branch

-This produces computed values
+These produce computed values

-it will run for all of the `GroupBy`s
+it will run for all the `GroupBy`s

Also applies to: 8-8, 21-21, 39-39


📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 60c30cb and 06f71bb.

📒 Files selected for processing (1)
  • cli_readme.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
cli_readme.md

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)


[grammar] ~21-~21: Did you mean “these”?
Context: ...or the specified entity and date range. This produces computed values for the `Entit...

(THIS_NNS_VB)


[style] ~39-~39: Consider removing “of” to be more concise
Context: .... If run for a Join, it will run for all of the GroupBy`s included in that join, as we...

(ALL_OF_THE)

🪛 Markdownlint (0.35.0)
cli_readme.md

12-12: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


23-23: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


43-43: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


60-60: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


15-15: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


29-29: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


49-49: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


66-66: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Comment on lines 1 to 68

# Plan

Describes the logical changes in your current environment versus the last run for each of the changed entities.

For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.

`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment, reflecting what would happen should you merge and deploy your change.

Note that `plan` does not fully describe any particular computation because it is agnostic of date ranges and other computation-specific arguments. To see a full compute plan for a given `backfill`, for example, you can call the `backfill` command with the `--plan` argument.

### Usage: `zipline plan [OPTIONS]`

Options:
```
--env the environment against which to compare the local state. Can be set to `prod` or a different `branch_id`. Defaults to the current branch if one is set, else `prod`.
```

# Backfill

Runs a backfill for the specified entity and date range. This produces computed values for the `Entity`s defined transformation. Commonly used with `Join` to produce point-in-time correct historical data of feature values. For `GroupBy`s it produces snapshots of values as of some boundary (usually midnight on each day in the range), and for `StagingQuery` it simply runs the provided SQL to produce output in the range.

### Usage: `zipline backfill ENTITY_ID [OPTIONS]`

### Valid entity types: `GroupBy`, `Join`, `StagingQuery`

Options:

```
--env the environment to backfill into. Defaults to the current branch if one is set, otherwise is a required argument. If set to `prod` then it will overwrite production tables - use with caution.
--start start date for which you want data backfilled for this entity. Defaults to the configured start date.
--end end date for which you want data backfilled for this entity. Defaults to today - 2 days.
--force-recompute recomputes the backfill for the specified entity, even if the date range is already present. For Joins, also recomputes all join_part intermediate tables. Defaults to false.
--plan visualizes the computation entailed in this backfill, but does not start compute. Useful to sanity check a job prior to running. If everything looks ok, then rerun the command but omit this flag to begin the job.
```

# Upload

Populates the online serving index with data for the specified `Entity`. If run for a `Join`, it will run for all of the `GroupBy`s included in that join, as well as run the `Join` metadata upload job, which is required for fetching data for the `Join`.

After calling `Upload` for any `Entity`, you can then call `fetch` to get values.

### Usage: `zipline upload ENTITY_ID [OPTIONS]`

### Valid entity types: `GroupBy`, `Join`

Options:

```
--ds The date to use for the batch upload. Defaults to
--stream Only applies to `GroupBy`s that use a streaming source. Runs the streaming job after the batch upload completes (or only runs the streaming job if the batch upload is already completed for the given `ds`).
--plan visualizes the computation entailed in this upload, but does not start compute. Useful to sanity check a job prior to running. If everything looks ok, then rerun the command but omit this flag to begin the job.
```


# Fetch

Fetches data for the given `Entity` and keys. Useful for testing online serving.

### Usage: `zipline fetch ENTITY_ID [OPTIONS]`

### Valid entity types: `GroupBy`, `Join`

Options:

```
--keys the keys to use in the fetch request, map of key name to value. Required argument.
```
Contributor

🛠️ Refactor suggestion

Improve document structure and formatting

The document needs some structural improvements:

  1. Add a main title at the top (e.g., "Zipline CLI Documentation")
  2. Fix heading hierarchy - use consistent levels (h1 for main title, h2 for commands, h3 for subsections)
  3. Add language specifiers to code blocks (e.g., ```bash for command examples)

Here's how to fix the heading structure:

+# Zipline CLI Documentation
+
-# Plan
+## Plan

-### Usage: `zipline plan [OPTIONS]`
+#### Usage: `zipline plan [OPTIONS]`

-# Backfill
+## Backfill

-### Usage: `zipline backfill ENTITY_ID [OPTIONS]`
+#### Usage: `zipline backfill ENTITY_ID [OPTIONS]`

And for code blocks:

-```
+```bash
 --env         the environment against which to compare...



Comment on lines 21 to 35
Runs a backfill for the specified entity and date range. This produces computed values for the `Entity`s defined transformation. Commonly used with `Join` to produce point-in-time correct historical data of feature values. For `GroupBy`s it produces snapshots of values as of some boundary (usually midnight on each day in the range), and for `StagingQuery` it simply runs the provided SQL to produce output in the range.

### Usage: `zipline backfill ENTITY_ID [OPTIONS]`

### Valid entity types: `GroupBy`, `Join`, `StagingQuery`

Options:

```
--env the environment to backfill into. Defaults to the current branch if one is set, otherwise is a required argument. If set to `prod` then it will overwrite production tables - use with caution.
--start start date for which you want data backfilled for this entity. Defaults to the configured start date.
--end end date for which you want data backfilled for this entity. Defaults to today - 2 days.
--force-recompute recomputes the backfill for the specified entity, even if the date range is already present. For Joins, also recomputes all join_part intermediate tables. Defaults to false.
--plan visualizes the computation entailed in this backfill, but does not start compute. Useful to sanity check a job prior to running. If everything looks ok, then rerun the command but omit this flag to begin the job.
```
Contributor

⚠️ Potential issue

Add critical operational details to Backfill command

The backfill documentation is missing important operational details:

  1. Expected date formats for --start and --end
  2. Error handling and recovery procedures
  3. Concurrent execution behavior
  4. Resource requirements and execution time expectations

Please add these missing details. For example:

Options:
--start               Start date in YYYY-MM-DD format
--end                 End date in YYYY-MM-DD format

Error Handling:
- If a backfill fails, it can be safely retried
- Use --force-recompute to restart a failed job from the beginning

Performance:
- Multiple backfills can run concurrently for different entities
- Large date ranges may take several hours to complete

Comment on lines 50 to 53
--ds The date to use for the batch upload. Defaults to
--stream Only applies to `GroupBy`s that use a streaming source. Runs the streaming job after the batch upload completes (or only runs the streaming job if the batch upload is already completed for the given `ds`).
--plan visualizes the computation entailed in this upload, but does not start compute. Useful to sanity check a job prior to running. If everything looks ok, then rerun the command but omit this flag to begin the job.
```
Contributor

⚠️ Potential issue

Fix incomplete option description and add missing details

The --ds option description is cut off, and critical operational details are missing.

Fix the incomplete description and add missing details:

- --ds        The date to use for the batch upload. Defaults to
+ --ds        The date to use for the batch upload. Defaults to current date. Format: YYYY-MM-DD
+
+ Important Notes:
+ - Ensure backfill is complete before uploading
+ - Streaming jobs will process data after the specified --ds date
+ - Data consistency checks are performed before upload
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
--ds The date to use for the batch upload. Defaults to
--stream Only applies to `GroupBy`s that use a streaming source. Runs the streaming job after the batch upload completes (or only runs the streaming job if the batch upload is already completed for the given `ds`).
--plan visualizes the computation entailed in this upload, but does not start compute. Useful to sanity check a job prior to running. If everything looks ok, then rerun the command but omit this flag to begin the job.
```
--ds The date to use for the batch upload. Defaults to current date. Format: YYYY-MM-DD
Important Notes:
- Ensure backfill is complete before uploading
- Streaming jobs will process data after the specified --ds date
- Data consistency checks are performed before upload
--stream Only applies to `GroupBy`s that use a streaming source. Runs the streaming job after the batch upload completes (or only runs the streaming job if the batch upload is already completed for the given `ds`).
--plan visualizes the computation entailed in this upload, but does not start compute. Useful to sanity check a job prior to running. If everything looks ok, then rerun the command but omit this flag to begin the job.


### Usage: `zipline upload ENTITY_ID [OPTIONS]`

### Valid entity types: `GroupBy`, `Join`
Collaborator

Does the user need to specify an entity type if we're already asking for the entity_id? Or do we want to skip a lookup by asking them to specify?

Collaborator Author

Ah I was thinking they don't need to specify the type, but just documenting it here. If they try on a different entity we would just err.

cli_readme.md Outdated

Options:
```
--env the environment against which to compare the local state. Can be set to `prod` or a different `branch_id`. Defaults to the current branch if one is set, else `prod`.
Collaborator

It looks like branches and env are used interchangeably here - I think what we're really shooting for is env rather than branch. The distinction is that these environments are global (and probably lower cardinality than literal git "branches"), and multiple users would develop against the same environment, and their changes would be seen by others. Does that match your understanding?

Collaborator Author

Ok yeah, that's a good simplification; we should update accordingly.

Contributor

env is too general imo - users and our code will be referring to git branches anyway - and we will probably also map env to branch 1:1. I think branch is a simpler and easier term to use.

Contributor

How do we capture legit environments like qa? The branch master in qa and master in prod in most companies tend to run slightly different jobs (qa master might be test GroupBys / Joins as the prod datasets aren't present in qa).

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
cli_readme.md (2)

21-21: Fix grammar in description

Change "This produces" to "These produce" for better grammar as it refers to multiple computed values.

-This produces computed values for the `Entity`s defined transformation.
+These produce computed values for the `Entity`s defined transformation.


39-39: Improve conciseness

Remove unnecessary words while maintaining clarity.

-If run for a `Join`, it will run for all of the `GroupBy`s included in that join
+If run for a `Join`, it will run for all `GroupBy`s included in that join

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 06f71bb and cda621b.

📒 Files selected for processing (1)
  • cli_readme.md (1 hunks)

🔇 Additional comments (2)
cli_readme.md (2)

1-68: Improve document structure and formatting

The document structure needs improvement:

  1. Add a main title (e.g., "Zipline CLI Documentation")
  2. Fix heading hierarchy (use h1 for title, h2 for commands, h3 for subsections)
  3. Add language specifiers to code blocks (e.g., ```bash)

27-35: Add critical operational details

The backfill documentation needs additional operational details:

  1. Date format specifications
  2. Error handling procedures
  3. Performance expectations
  4. Resource requirements

Comment on lines +6 to +8
For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.

`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment, reflecting what would happen should you merge and deploy your change.

🛠️ Refactor suggestion

Improve clarity and fix grammatical issues

The sentences using "however" need restructuring for better readability. Additionally, based on previous discussions, we should clarify the distinction between environments and branches.

-If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.
+If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan. However, there would still be a column addition reflected in your `upload` plan.

-`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment
+`plan` is always run against a particular environment. By default, it runs against your current development environment. However, it can also be set to `prod`, in which case it will display the plan of your local changes against the production environment
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.
`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment, reflecting what would happen should you merge and deploy your change.
For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan. However, there would still be a column addition reflected in your `upload` plan.
`plan` is always run against a particular environment. By default, it runs against your current development environment. However, it can also be set to `prod`, in which case it will display the plan of your local changes against the production environment, reflecting what would happen should you merge and deploy your change.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (7)
api/py/ai/chronon/api_helpers.py (3)

5-6: Consider passing OUTPUT_ROOT from environment config.

Hardcoding "production" may not suit all environments.


8-22: Make target directories configurable or documented.

It might be safer to externalize or document the set of target dirs.


72-75: Rename or remove the test function.

It only prints the caller file. If it's for debugging, clarify its purpose.

cli_readme.md (4)

6-6: Rephrase “however” for clarity.

Example: use a period and start a new sentence instead, per style guidelines.

-..., however, there would ...
+... . However, there would ...


8-8: Add comma after “By default”.

Enhances sentence readability.

-By default it is
+By default, it is


12-12: Adopt consistent heading levels.

Use H2 for major sections, H3 for sub-sections. Current usage triggers lint warnings.

Also applies to: 23-23, 43-43, 60-60



15-15: Specify language for fenced code blocks.

Markdownlint suggests adding a language type (e.g., ```bash).

Also applies to: 29-29, 49-49, 66-66


📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between cda621b and 88149e1.

📒 Files selected for processing (3)
  • api/py/ai/chronon/api_helpers.py (1 hunks)
  • api/py/ai/chronon/group_by.py (2 hunks)
  • cli_readme.md (1 hunks)

🔇 Additional comments (6)
api/py/ai/chronon/api_helpers.py (3)

24-45: Handle errors from extract_and_convert.callback.

Add try-catch or a validation step to handle any failures gracefully.


47-65: Ensure variable_name is always defined.

If it's missing, the returned path might be invalid.


67-70: Unused parameters in backfill.

They're never referenced in the function body. Confirm usage or remove them.

api/py/ai/chronon/group_by.py (2)

19-19: Import looks fine.

Ensures backfill is directly accessible. No issues found.


589-590: Binding backfill to group_by is correct.

Enables in-place usage of the method. Looks good.

cli_readme.md (1)

60-60: ⚠️ Potential issue

Correct usage label.

It should say “fetch” rather than “upload” under the Fetch heading.

-### Usage: `zipline upload ENTITY_ID [OPTIONS]`
+### Usage: `zipline fetch ENTITY_ID [OPTIONS]`

Likely invalid or redundant comment.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
api/py/ai/chronon/repo/compilev2.py (2)

52-55: Refactor to a ternary operator.
Shorten the log level assignment:

- if debug:
-     log_level = logging.DEBUG
- else:
-     log_level = logging.INFO
+ log_level = logging.DEBUG if debug else logging.INFO


62-62: Remove unnecessary .keys().
Use if d in FOLDER_NAME_TO_CLASS: for clarity.

- obj_folder_names = [d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS.keys()]
+ obj_folder_names = [d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS]

api/py/ai/chronon/repo/compile.py (1)

94-94: Remove .keys() usage.

- obj_folder_names = [d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS.keys()]
+ obj_folder_names = [d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS]

api/py/ai/chronon/group_by.py (1)

23-23: Remove unused import.
types isn't referenced in the final implementation.

- import types

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 88149e1 and c968f31.

📒 Files selected for processing (5)
  • api/py/ai/chronon/api_helpers.py (1 hunks)
  • api/py/ai/chronon/group_by.py (2 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/utils.py (1 hunks)

🔇 Additional comments (3)
api/py/ai/chronon/utils.py (1)

187-200: Validate dynamic imports and performance.
Ensure error handling around importlib.import_module(mod_name) in case the module is missing or invalid. Also verify that frequent gc.collect() calls won't impair performance.

api/py/ai/chronon/api_helpers.py (1)

1-39: General checks
This addition looks consistent. No immediate issues found.

api/py/ai/chronon/group_by.py (1)

589-589: Consider enabling the backfill binding.
It is commented out. If it’s intentional, remove the unused code. Otherwise, uncomment it for functionality.

elif os.path.isfile(full_input_path):
assert full_input_path.endswith(
".py"
), f"Input Path: {input_path} isn't a python file"

⚠️ Potential issue

Undefined variable input_path.
Replace input_path with the correct variable or define it in scope.

- assert full_input_path.endswith(".py"), f"Input Path: {input_path} isn't a python file"
+ assert full_input_path.endswith(".py"), f"Input Path: {full_input_path} isn't a python file"

Committable suggestion skipped: line range outside the PR's diff.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (12)
api/py/ai/chronon/repo/runner.py (2)

7-7: Make OUTPUT_ROOT configurable

- OUTPUT_ROOT = "production"
+ OUTPUT_ROOT = os.getenv("OUTPUT_ROOT", "production")

63-67: Backfill not implemented

Let me know if you'd like a stub implementation or usage example.

api/py/ai/chronon/repo/extract_objects.py (2)

Line range hint 25-37: Redundant folder processing

from_folderV2 partly duplicates from_folder; consider merging functionality.


43-66: Error capture

If multiple files fail, consider collating failures into a single error log for better visibility.
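One way to collate the failures, sketched with assumed names (`compile_one` stands in for whatever per-file compile step the extraction code performs):

```python
import logging

def compile_files(files, compile_one):
    # Collect every per-file failure, then emit one consolidated error log
    # instead of interleaving errors with progress output.
    errors = {}
    for path in files:
        try:
            compile_one(path)
        except Exception as err:
            errors[path] = err
    if errors:
        summary = "\n".join(f"- {p}: {e}" for p, e in errors.items())
        logging.error("Compilation failed for %d file(s):\n%s", len(errors), summary)
    return errors
```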

api/py/ai/chronon/repo/compile.py (2)

92-92: Drop .keys()

- d in FOLDER_NAME_TO_CLASS.keys()
+ d in FOLDER_NAME_TO_CLASS

123-125: Raise specialized exception

Use a more specific exception type for clarity.
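For example (the exception name and helper are illustrative; the PR does not define them):

```python
class ChrononCompileError(Exception):
    """Raised when a repo object fails validation or compilation."""

def require(condition, message):
    # Replace a bare `raise Exception(...)` with a domain-specific type
    # so callers can catch compile failures precisely.
    if not condition:
        raise ChrononCompileError(message)
```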

api/py/ai/chronon/repo/compilev2.py (6)

55-58: Use ternary

- if debug:
-     log_level = logging.DEBUG
- else:
-     log_level = logging.INFO
+ log_level = logging.DEBUG if debug else logging.INFO


65-65: Simplify membership check

- if d in FOLDER_NAME_TO_CLASS.keys():
+ if d in FOLDER_NAME_TO_CLASS:


109-109: Unused variable

Remove target_object_written if not used.



152-152: Unused variable

Remove e if it's not used.



189-189: Simplify membership check

- for file in compile_errors.keys()
+ for file in compile_errors


273-273: Unused variable

class_name is never used. Remove it.


📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between c968f31 and b45b935.

📒 Files selected for processing (4)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/repo/extract_objects.py (3 hunks)
  • api/py/ai/chronon/repo/runner.py (1 hunks)

🔇 Additional comments (3)
api/py/ai/chronon/repo/runner.py (1)

31-31: Remove hardcoded "group_bys" if other types are expected

Ensure this reflects all relevant object types.

api/py/ai/chronon/repo/extract_objects.py (1)

83-83: from_file usage

No issues spotted here.

api/py/ai/chronon/repo/compile.py (1)

98-100: Index safety

path_split[0] could fail if path_split is empty. Verify input handling.
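A defensive sketch of that indexing guard (the function name and message are illustrative):

```python
def first_segment(path):
    # Guard against empty input before indexing into the split result.
    path_split = [p for p in path.split("/") if p]
    if not path_split:
        raise ValueError(f"Cannot extract a path segment from {path!r}")
    return path_split[0]
```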

@varant-zlai varant-zlai changed the title Creating docs for CLI Python compile and API skeleton/docs Dec 31, 2024
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
api/py/ai/chronon/repo/compilev2.py (4)

105-106: Remove unused variable.

The variable target_object_written is assigned but never used.

- if obj == target_object:
-     target_object_written = True


149-150: Use underscore for unused exception.

Replace unused exception variable with underscore.

- except Exception as e:
+ except Exception as _:


186-186: Simplify dictionary iteration.

Remove redundant .keys() call.

- "\n".join([f"- {os.path.relpath(file, chronon_root_path)}" for file in compile_errors.keys()]) +
+ "\n".join([f"- {os.path.relpath(file, chronon_root_path)}" for file in compile_errors]) +


259-259: Remove unused variable.

The variable class_name is assigned but never used.

- class_name = obj_class.__name__

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between b45b935 and 22e301f.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)

🔇 Additional comments (2)
api/py/ai/chronon/repo/compilev2.py (2)

190-202: LGTM!

Clean implementation with proper type hints and nested metadata handling.


233-256: LGTM!

Robust implementation with proper validation and error handling.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (4)
api/py/ai/chronon/repo/runner.py (3)

13-23: Consider adding error handling for directory traversal.

The while loop could potentially reach the root directory without finding target directories.

 while current_path != os.path.dirname(current_path):  # Stop at root directory
+    if current_path == '/':  # Additional safety check
+        break
     dir_name = os.path.basename(current_path)

31-37: Add type hints for better code maintainability.

The function parameters and return values lack type annotations.

-def get_target_compiled_file(zipline_obj):
+def get_target_compiled_file(zipline_obj: Any) -> Tuple[str, str, str]:

102-103: Add logging for better observability.

The placeholder implementations should log their status.

-    print("\n\n TODO -- Implement \n\n")
+    logging.info("TODO: Implement upload functionality")

Also applies to: 117-118, 134-134

api/py/ai/chronon/repo/compilev2.py (1)

55-58: Simplify log level assignment.

Use ternary operator for cleaner code.

-    if debug:
-        log_level = logging.DEBUG
-    else:
-        log_level = logging.INFO
+    log_level = logging.DEBUG if debug else logging.INFO

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 22e301f and 825202f.

📒 Files selected for processing (2)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/repo/runner.py (1 hunks)

🔇 Additional comments (1)
api/py/ai/chronon/repo/compilev2.py (1)

291-292: ⚠️ Potential issue

Add required arguments to main function.

The function call is missing required arguments.

 if __name__ == "__main__":
-    extract_and_convert()
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--chronon-root", required=True)
+    parser.add_argument("--output-root", required=True)
+    parser.add_argument("--target-object", required=True)
+    parser.add_argument("--target-object-file", required=True)
+    args = parser.parse_args()
+    extract_and_convert(
+        args.chronon_root,
+        args.output_root,
+        args.target_object,
+        args.target_object_file
+    )

Likely invalid or redundant comment.

Comment on lines 80 to 104
module_file, compiled_file, chronon_root = get_target_compiled_file(zipline_obj)
extract_and_convert(chronon_root, chronon_root, zipline_obj, module_file)
# TODO: actually kick off backfill for compiled_file
print("\n\n TODO -- Implement \n\n")

💡 Codebase verification

Add compilation status check before backfill

The extract_and_convert function in compilev2.py already handles compilation failures for the target object by exiting with a failure message. However, the runner code doesn't verify this status.

  • Add error handling around extract_and_convert call in runner.py to verify compilation success
  • Check if compiled_file exists before proceeding with backfill
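A sketch of that guard in the runner; the function names follow the diff above, but the wiring (passing them in as parameters) is assumed for self-containment:

```python
import os

def backfill_with_check(zipline_obj, get_target_compiled_file, extract_and_convert):
    module_file, compiled_file, chronon_root = get_target_compiled_file(zipline_obj)
    extract_and_convert(chronon_root, chronon_root, zipline_obj, module_file)
    # Verify compilation actually produced the artifact before kicking off backfill.
    if not os.path.isfile(compiled_file):
        raise RuntimeError(f"Compilation did not produce {compiled_file}; aborting backfill")
    return compiled_file
```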
🔗 Analysis chain

Verify compilation success before proceeding.

The backfill function should verify if compilation was successful.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if compiled file exists
fd -t f "$(basename $compiled_file)" "$(dirname $compiled_file)"

Length of output: 282


Script:

#!/bin/bash
# Find extract_and_convert function implementation
ast-grep --pattern 'def extract_and_convert($$$)'

# Find get_target_compiled_file function implementation
ast-grep --pattern 'def get_target_compiled_file($$$)'

Length of output: 108


Script:

#!/bin/bash
# Search for function implementations
rg "def (extract_and_convert|get_target_compiled_file)" -A 10

Length of output: 2564

@nikhil-zlai nikhil-zlai left a comment

nits and some refactoring.

I also think the coderabbit suggestions on the readme are reasonable.

import ai.chronon.api.common.ttypes as common
import ai.chronon.utils as utils
from copy import deepcopy
#from ai.chronon.api_helpers import backfill

cleanup

import logging
import inspect
import json
import types

cleanup

derivations=derivations,
)
validate_group_by(group_by)
#group_by.backfill = types.MethodType(backfill, group_by)

cleanup

Comment on lines 52 to 66
# Get the file where zipline_obj is defined
# zipline_obj_file = inspect.getmodule(zipline_obj).__file__
# if not zipline_obj_file:
# raise ValueError("Could not determine file location for zipline_obj")


delete

Comment on lines 33 to 54
zipline_obj_module, zipline_var_name = get_mod_and_var_name_from_gc(zipline_obj, zipline_obj_type)

if zipline_obj_module is None or zipline_var_name is None:
raise ValueError("Could not determine module and variable name for object")

import importlib
module = importlib.import_module(zipline_obj_module)

# find the file of the module
module_file = inspect.getmodule(module).__file__
if not module_file:
raise ValueError(f"Could not determine file location for module {zipline_obj_module}")

this whole bit is better encapsulated as a dataclass

@dataclass
class ConfigDetails:
    module: str
    variable: str
    file: str
    compiled_file: str
    
    def __init__(self, obj, cls):
        self.module = ...
        self.variable = ... 
        self.file = ...
        self.compiled_file = ...
    

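Filling in the reviewer's sketch using the lookup logic from the diff above. `get_mod_and_var_name_from_gc` is the PR's helper; it is passed in here as a parameter so the sketch stays self-contained, and the field set is trimmed to what the diff actually computes:

```python
import importlib
import inspect
from dataclasses import dataclass

@dataclass
class ConfigDetails:
    module: str
    variable: str
    file: str

    @classmethod
    def from_object(cls, obj, obj_type, get_mod_and_var_name_from_gc):
        # Resolve the defining module and variable name, then locate the file.
        module_name, var_name = get_mod_and_var_name_from_gc(obj, obj_type)
        if module_name is None or var_name is None:
            raise ValueError("Could not determine module and variable name for object")
        module = importlib.import_module(module_name)
        module_file = inspect.getfile(module)
        return cls(module=module_name, variable=var_name, file=module_file)
```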
cli_readme.md Outdated

Populates the online serving index with data for the specified `Entity`. If run for a `Join`, it will run for all of the `GroupBy`s included in that join, as well as run the `Join` metadata upload job, which is required for fetching data for the `Join`.

After calling `Upload` for any `Entity`, you can then call `fetch` to get values.

We need the streaming job deployed too in some scenarios, so we'll need upload + stream (or deploy) prior to fetching.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 825202f and 5b07ac4.

📒 Files selected for processing (1)
  • cli_readme.md (1 hunks)

🔇 Additional comments (4)
cli_readme.md (4)

1-1: Add main title and improve document structure

Add a main title (e.g., "Zipline CLI Documentation") at the top.


29-35: Add missing operational details for backfill

Add date formats, error handling, and performance expectations.

🧰 Tools
🪛 Markdownlint (0.37.0)

29-29: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


6-8: 🛠️ Refactor suggestion

Fix grammar and clarify environment terminology

Previous discussions indicate confusion about branches vs environments. Let's clarify this and fix grammar.

-For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.
+For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan. However, there would still be a column addition reflected in your `upload` plan.

-`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment
+`plan` always runs against a specific environment (e.g., development, qa, prod). By default, it uses your current environment. However, when set to `prod`, it will display how your local changes would affect the production environment

Likely invalid or redundant comment.



50-50: ⚠️ Potential issue

Complete the --ds option description

Based on previous discussion, complete the description.

--ds        The date to use for the batch upload. Defaults to two days prior to today (UTC)

Likely invalid or redundant comment.

Comment on lines +60 to +68
### Usage: `zipline upload ENTITY_ID [OPTIONS]`

### Valid entity types: `GroupBy`, `Join`

Options:

```
--keys the keys to use in the fetch request, map of key name to value. Required argument.
```

⚠️ Potential issue

Fix command usage and add examples

Command shows incorrect usage and lacks examples.

-### Usage: `zipline upload ENTITY_ID [OPTIONS]`
+### Usage: `zipline fetch ENTITY_ID [OPTIONS]`

+Example:
+```bash
+zipline fetch user_metrics --keys '{"user_id": "12345"}'
+```

cli_readme.md Outdated

Populates the online serving index with data for the specified `Entity`. If run for a `Join`, it will run for all of the `GroupBy`s included in that join, as well as run the `Join` metadata upload job, which is required for fetching data for the `Join`.

After calling `Upload` for any `Entity`, you can then call `fetch` to get values.

🛠️ Refactor suggestion

Clarify streaming job requirements

Add note about streaming job deployment requirements.

-After calling `Upload` for any `Entity`, you can then call `fetch` to get values.
+After calling `Upload` for any `Entity` and ensuring any required streaming jobs are deployed, you can then call `fetch` to get values.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
api/py/ai/chronon/repo/compilev2.py (2)

116-129: Extract online join validation logic.

Move validation logic to a separate function for better maintainability.

+def _validate_online_join(obj: Join):
+    offline_gbs = [jp.groupBy.metaData.name for jp in obj.joinParts if not jp.groupBy.metaData.online]
+    assert not offline_gbs, (
+        "You must make all dependent GroupBys `online` if you want to make your join `online`. "
+        "You can do this by passing the `online=True` argument to the GroupBy constructor. "
+        "Fix the following: {}".format(offline_gbs)
+    )
+

146-155: Track ZiplineHub integration TODO.

The TODO comment indicates missing ZiplineHub integration for lineage tracking.

Would you like me to create a GitHub issue to track the ZiplineHub integration task?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 5b07ac4 and 774df11.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/compilev2.py

55-58: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


67-67: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


113-113: Local variable target_object_written is assigned to but never used

Remove assignment to unused variable target_object_written

(F841)


156-156: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


193-193: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


266-266: Local variable class_name is assigned to but never used

Remove assignment to unused variable class_name

(F841)

🔇 Additional comments (3)
api/py/ai/chronon/repo/compilev2.py (3)

156-159: 🛠️ Refactor suggestion

Include error details in warning message.

Add exception details to help with debugging.

-    except Exception as e:
-        
-        _print_warning("Failed to connect to ZiplineHub (check connection and VPN settings). \n\n " + 
+    except Exception as e:
+        _print_warning(f"Failed to connect to ZiplineHub: {str(e)}\nCheck connection and VPN settings.\n\n" + 

Likely invalid or redundant comment.



112-113: 🛠️ Refactor suggestion

Remove unused variable.

Replace with logging statement.

-                if obj == target_object:
-                    target_object_written = True
+                if obj == target_object:
+                    logging.info(f"Successfully wrote target object: {target_object}")

Likely invalid or redundant comment.



298-299: ⚠️ Potential issue

Add required arguments to main function.

The function requires mandatory arguments that are missing.

 if __name__ == "__main__":
-    extract_and_convert()
+    import argparse
+    parser = argparse.ArgumentParser(description='Compile Chronon repository')
+    parser.add_argument('--chronon-root', required=True, help='Path to chronon root')
+    parser.add_argument('--output-root', required=True, help='Path to output root')
+    parser.add_argument('--target-object', required=True, help='Target object name')
+    parser.add_argument('--target-object-file', required=True, help='Target object file')
+    parser.add_argument('--debug', action='store_true', help='Enable debug logging')
+    args = parser.parse_args()
+    
+    extract_and_convert(
+        chronon_root=args.chronon_root,
+        output_root=args.output_root,
+        target_object=args.target_object,
+        target_object_file=args.target_object_file,
+        debug=args.debug
+    )

Likely invalid or redundant comment.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (7)
api/py/ai/chronon/repo/extract_objects.py (1)

43-64: Consider adding docstring parameters.

The docstring should document the target_file parameter and return values.

 def from_folderV2(full_path: str, target_file: str, cls: type):
     """
     Recursively consumes a folder, and constructs a map of 
     object qualifier to StagingQuery, GroupBy, or Join
+
+    Args:
+        full_path: Path to the folder to process
+        target_file: Specific file to track errors for
+        cls: Type of object to extract
+
+    Returns:
+        Tuple containing:
+        - Dict mapping object qualifiers to (object, source_file) tuples
+        - Dict mapping files to their errors
+        - Error specific to target_file if any
     """
api/py/ai/chronon/repo/runner.py (2)

7-10: Remove unused imports.

These folder name constants are imported but never used.

 from ai.chronon.repo import (
     FOLDER_NAME_TO_CLASS,
-    JOIN_FOLDER_NAME,
-    GROUP_BY_FOLDER_NAME,
-    STAGING_QUERY_FOLDER_NAME,
-    MODEL_FOLDER_NAME,
     OUTPUT_ROOT
 )


62-66: Remove commented code.

Delete the unused code block for finding zipline_obj file.

-    # Get the file where zipline_obj is defined
-    # zipline_obj_file = inspect.getmodule(zipline_obj).__file__
-    # if not zipline_obj_file:
-    #     raise ValueError("Could not determine file location for zipline_obj")
api/py/ai/chronon/repo/compilev2.py (4)

55-58: Simplify log level assignment.

Use a ternary operator for cleaner code.

-    if debug:
-        log_level = logging.DEBUG
-    else:
-        log_level = logging.INFO
+    log_level = logging.DEBUG if debug else logging.INFO


67-67: Optimize dictionary key check.

Use direct key membership test.

-    obj_folder_names = [d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS.keys()]
+    obj_folder_names = [d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS]


133-143: Add TODO tracking for ZiplineHub integration.

The placeholder implementation needs tracking.

Would you like me to create a GitHub issue to track the ZiplineHub integration task?


253-264: Remove unused variable.

The class_name variable is assigned but never used.

 def _write_obj_as_json(name: str, obj: object, output_file: str, obj_class: type):
-    class_name = obj_class.__name__
     output_folder = os.path.dirname(output_file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 774df11 and e70963a.

📒 Files selected for processing (4)
  • api/py/ai/chronon/repo/__init__.py (1 hunks)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/repo/extract_objects.py (3 hunks)
  • api/py/ai/chronon/repo/runner.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • api/py/ai/chronon/repo/__init__.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/runner.py

7-7: ai.chronon.repo.JOIN_FOLDER_NAME imported but unused

Remove unused import

(F401)


8-8: ai.chronon.repo.GROUP_BY_FOLDER_NAME imported but unused

Remove unused import

(F401)


9-9: ai.chronon.repo.STAGING_QUERY_FOLDER_NAME imported but unused

Remove unused import

(F401)


10-10: ai.chronon.repo.MODEL_FOLDER_NAME imported but unused

Remove unused import

(F401)

api/py/ai/chronon/repo/compilev2.py

55-58: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


67-67: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


181-181: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


254-254: Local variable class_name is assigned to but never used

Remove assignment to unused variable class_name

(F841)

🔇 Additional comments (3)
api/py/ai/chronon/repo/extract_objects.py (1)

Line range hint 25-41: LGTM! Clean implementation of folder traversal.

The function correctly processes Python files recursively and handles exceptions appropriately.

api/py/ai/chronon/repo/runner.py (2)

14-28: LGTM! Robust root directory detection.

The function correctly traverses up the directory tree to find the Chronon root.


90-93: Add compilation status check before backfill.

Verify compilation success before proceeding with backfill.
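A minimal sketch of the check being asked for, assuming the compiled artifact lands at a known path (the `ensure_compiled` name and the path convention are illustrative, not the actual runner API):

```python
import os


def ensure_compiled(compiled_file: str) -> None:
    # Fail fast if compilation did not produce the expected artifact,
    # rather than letting the backfill fail later with a confusing error.
    if not os.path.isfile(compiled_file):
        raise RuntimeError(
            f"Compilation did not produce {compiled_file}; "
            "fix compile errors before running backfill."
        )
```

The runner would call this between `extract_and_convert` and kicking off the backfill.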

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (9)
cli_readme.md (3)

1-2: Add title and introduction

Add a main title and brief introduction to set context.

+# Zipline CLI Documentation
+
+Command-line interface for managing Zipline data pipelines.
+
 # Plan

6-8: Fix sentence structure

Improve readability by properly structuring sentences with "however".

-If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.
+If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan. However, there would still be a column addition reflected in your `upload` plan.

-`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment
+`plan` is always run against a particular environment. By default, it runs against your current development environment. However, it can also be set to `prod`, in which case it will display the plan of your local changes against the production environment.

Also applies to: 8-8



15-17: Add language specifiers to code blocks

Add bash language specifier to CLI command examples.

-```
+```bash
 --branch         the branch against which to compare...


Also applies to: 29-35, 49-53, 66-68

api/py/ai/chronon/repo/extract_objects.py (1)

Line range hint `25-41`: **Improve error handling.**

Consider collecting errors in a dictionary and returning them along with results, similar to `from_folderV2`.

```diff
 def from_folder(full_path: str, cls: type, log_level=logging.INFO):
     if full_path.endswith("/"):
         full_path = full_path[:-1]
 
     python_files = glob.glob(os.path.join(full_path, "**/*.py"), recursive=True)
     result = {}
+    errors = {}
     for f in python_files:
         try:
             result.update(from_file(f, cls, log_level))
         except Exception as e:
-            logging.error(f"Failed to extract: {f}")
-            logging.exception(e)
+            errors[f] = e
-    return result
+    return result, errors
api/py/ai/chronon/repo/compilev2.py (2)

128-160: Implement ZiplineHub integration.

The function is a placeholder with good error handling but needs implementation.

Would you like me to help implement the ZiplineHub integration for lineage tracking?


253-264: Remove unused variable.

The class_name variable is assigned but never used.

 def _write_obj_as_json(name: str, obj: object, output_file: str, obj_class: type):
-    class_name = obj_class.__name__
     output_folder = os.path.dirname(output_file)

api/py/ai/chronon/repo/runner.py (2)

62-66: Remove commented code block.

The commented code block should be removed if no longer needed.

-    # Get the file where zipline_obj is defined
-    # zipline_obj_file = inspect.getmodule(zipline_obj).__file__
-    # if not zipline_obj_file:
-    #     raise ValueError("Could not determine file location for zipline_obj")

5-12: Remove unused imports.

The following imports are not used:

  • JOIN_FOLDER_NAME
  • GROUP_BY_FOLDER_NAME
  • STAGING_QUERY_FOLDER_NAME
  • MODEL_FOLDER_NAME

api/py/ai/chronon/utils.py (1)

195-197: Consider performance optimization.

The iteration through all module variables could be inefficient for large modules. Consider caching results or using more efficient lookup methods if this becomes a bottleneck.
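One way to apply the caching suggested here, as a sketch rather than the actual `utils.py` implementation: build an id-to-name reverse map once per module and reuse it across lookups. The helper names are illustrative, and the cached map goes stale if the module's globals are mutated after the first lookup.

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def _var_names_by_id(mod_name: str) -> dict:
    # Scan the module's globals once; subsequent calls hit the cache.
    module = importlib.import_module(mod_name)
    return {id(value): name for name, value in vars(module).items()}


def cached_var_name(obj: object, mod_name: str):
    """Return the variable name bound to obj in mod_name, or None."""
    return _var_names_by_id(mod_name).get(id(obj))
```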

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between e70963a and 03443f8.

📒 Files selected for processing (6)
  • api/py/ai/chronon/repo/__init__.py (1 hunks)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/repo/extract_objects.py (3 hunks)
  • api/py/ai/chronon/repo/runner.py (1 hunks)
  • api/py/ai/chronon/utils.py (1 hunks)
  • cli_readme.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/py/ai/chronon/repo/__init__.py

🔇 Additional comments (6)
cli_readme.md (2)

60-60: ⚠️ Potential issue

Fix incorrect command in usage section

The fetch command usage shows incorrect command.

-### Usage: `zipline upload ENTITY_ID [OPTIONS]`
+### Usage: `zipline fetch ENTITY_ID [OPTIONS]`

Likely invalid or redundant comment.



50-50: ⚠️ Potential issue

Complete the --ds option description

The --ds option description is incomplete.

---ds        The date to use for the batch upload. Defaults to 
+--ds        The date to use for the batch upload. Defaults to two days prior to today (UTC)

Likely invalid or redundant comment.

api/py/ai/chronon/repo/extract_objects.py (2)

43-64: LGTM! Well-structured error handling.

Good separation of target file errors and tracking of file origins in results.


82-82: LGTM! Clean and focused implementation.

api/py/ai/chronon/repo/runner.py (2)

14-28: LGTM! Clean directory traversal implementation.


90-93: ⚠️ Potential issue

Add error handling for extract_and_convert.

The function should handle potential errors from extract_and_convert.

     module_file, compiled_file, chronon_root = get_target_compiled_file(zipline_obj)
-    extract_and_convert(chronon_root, chronon_root, zipline_obj, module_file)
+    try:
+        extract_and_convert(chronon_root, chronon_root, zipline_obj, module_file)
+    except Exception as e:
+        raise ValueError(f"Failed to compile {module_file}: {str(e)}")
     # TODO: actually kick off backfill for compiled_file

Likely invalid or redundant comment.

Comment on lines 45 to 48
def extract_and_convert(
    chronon_root, output_root, target_object, target_object_file, debug=False
):

⚠️ Potential issue

Add force_overwrite parameter.

The function uses force_overwrite in _write_obj but it's not in the signature.

 def extract_and_convert(
-    chronon_root, output_root, target_object, target_object_file, debug=False
+    chronon_root, output_root, target_object, target_object_file, debug=False, force_overwrite=False
 ):

Comment on lines +194 to +197
    module = importlib.import_module(mod_name)
    for var_name, value in vars(module).items():
        if value is obj:
            return mod_name, var_name

⚠️ Potential issue

Add error handling for module import.

-    module = importlib.import_module(mod_name)
-    for var_name, value in vars(module).items():
-        if value is obj:
-            return mod_name, var_name
+    try:
+        module = importlib.import_module(mod_name)
+        for var_name, value in vars(module).items():
+            if value is obj:
+                return mod_name, var_name
+    except ImportError:
+        return mod_name, None

Comment on lines 187 to 200
def get_mod_and_var_name_from_gc(obj, mod_prefix):
    # Find the variable name within the module
    mod_name = get_mod_name_from_gc(obj, mod_prefix)
    """Get the variable name that points to the obj in the module"""
    if not mod_name:
        return None

    module = importlib.import_module(mod_name)
    for var_name, value in vars(module).items():
        if value is obj:
            return mod_name, var_name

    return mod_name, None


🛠️ Refactor suggestion

Add type hints and proper docstring.

-def get_mod_and_var_name_from_gc(obj, mod_prefix):
-    # Find the variable name within the module
-    mod_name = get_mod_name_from_gc(obj, mod_prefix)
-    """Get the variable name that points to the obj in the module"""
+def get_mod_and_var_name_from_gc(obj: object, mod_prefix: str) -> tuple[str | None, str | None]:
+    """Get the module and variable name that points to the object.
+    
+    Args:
+        obj: Object to find in garbage collector
+        mod_prefix: Module prefix to filter by
+        
+    Returns:
+        Tuple of (module_name, variable_name) or (module_name, None) if variable not found
+    """
+    mod_name = get_mod_name_from_gc(obj, mod_prefix)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-def get_mod_and_var_name_from_gc(obj, mod_prefix):
-    # Find the variable name within the module
-    mod_name = get_mod_name_from_gc(obj, mod_prefix)
-    """Get the variable name that points to the obj in the module"""
-    if not mod_name:
-        return None
-    module = importlib.import_module(mod_name)
-    for var_name, value in vars(module).items():
-        if value is obj:
-            return mod_name, var_name
-    return mod_name, None
+def get_mod_and_var_name_from_gc(obj: object, mod_prefix: str) -> tuple[str | None, str | None]:
+    """Get the module and variable name that points to the object.
+
+    Args:
+        obj: Object to find in garbage collector
+        mod_prefix: Module prefix to filter by
+
+    Returns:
+        Tuple of (module_name, variable_name) or (module_name, None) if variable not found
+    """
+    mod_name = get_mod_name_from_gc(obj, mod_prefix)
+    if not mod_name:
+        return None, None
+    module = importlib.import_module(mod_name)
+    for var_name, value in vars(module).items():
+        if value is obj:
+            return mod_name, var_name
+    return mod_name, None
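For context, the module-scan half of this lookup can be sketched in isolation. This is a standalone approximation: `find_mod_and_var` and the walk over `sys.modules` are illustrative stand-ins, not Chronon's actual `get_mod_name_from_gc` implementation.

```python
import sys

def find_mod_and_var(obj, mod_prefix):
    """Scan loaded modules whose name starts with mod_prefix for a
    variable bound to exactly this object (identity, not equality)."""
    for mod_name, module in list(sys.modules.items()):
        if module is None or not mod_name.startswith(mod_prefix):
            continue
        for var_name, value in list(vars(module).items()):
            if value is obj:
                return mod_name, var_name
    return None, None
```

Because the match uses `is`, two equal-but-distinct objects never collide; an object that lives in no scanned module yields `(None, None)` rather than raising.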

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (9)
cli_readme.md (1)

6-8: Improve readability and clarify environment terminology

-If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.
+If you run `backfill` for the `GroupBy` and then call `plan` again, you would see no change to your `GroupBy` `backfill` plan. However, the column addition would still be reflected in your `upload` plan.

-`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment
+`plan` always runs against a particular environment. By default, it runs against your development environment. However, it can also be set to `prod` to display the plan of your local changes against the production environment
🧰 Tools
🪛 LanguageTool

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)

api/py/ai/chronon/repo/extract_objects.py (1)

43-47: Complete the docstring

 """
-    Recursively consumes a folder, and constructs a map of 
-    object qualifier to StagingQuery, GroupBy, or Join
+    Recursively consumes a folder and constructs a map of object qualifiers to their implementations.
+    
+    Args:
+        full_path: Path to the folder to process
+        target_file: Specific file to track errors for
+        cls: Type of objects to extract (StagingQuery, GroupBy, or Join)
+    
+    Returns:
+        Tuple of (results dict, errors dict, target file error)
 """
api/py/ai/chronon/repo/runner.py (3)

27-28: Chain exceptions for better debugging

-            raise ValueError(f"Can only run one of {valid_types}, got {type(obj).__name__}")
+            raise ValueError(f"Can only run one of {valid_types}, got {type(obj).__name__}") from None
🧰 Tools
🪛 Ruff (0.8.2)

28-28: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


94-95: Implement backfill functionality

The TODO comment indicates missing implementation.

Would you like me to help implement the backfill functionality or create a GitHub issue to track this task?


98-146: Well-documented placeholder functions

Docstrings are comprehensive. Functions need implementation.

Would you like me to help implement these functions or create GitHub issues to track their implementation?

api/py/ai/chronon/repo/compilev2.py (4)

38-38: Remove unused constant.

The DEFAULT_TEAM_NAME constant is not used anywhere in the code.

-DEFAULT_TEAM_NAME = "default"

55-58: Use ternary operator for brevity.

-    if debug:
-        log_level = logging.DEBUG
-    else:
-        log_level = logging.INFO
+    log_level = logging.DEBUG if debug else logging.INFO
🧰 Tools
🪛 Ruff (0.8.2)

55-58: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


137-146: Track TODO for ZiplineHub integration.

The TODO comment indicates missing ZiplineHub integration for lineage and schema display.

Would you like me to create a GitHub issue to track this task?


257-257: Remove unused variable.

The class_name variable is assigned but never used.

-    class_name = obj_class.__name__
🧰 Tools
🪛 Ruff (0.8.2)

257-257: Local variable class_name is assigned to but never used

Remove assignment to unused variable class_name

(F841)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 03443f8 and 0ddaabc.

📒 Files selected for processing (6)
  • api/py/ai/chronon/repo/__init__.py (1 hunks)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/repo/extract_objects.py (3 hunks)
  • api/py/ai/chronon/repo/runner.py (1 hunks)
  • api/py/ai/chronon/utils.py (1 hunks)
  • cli_readme.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • api/py/ai/chronon/repo/__init__.py
  • api/py/ai/chronon/utils.py
🧰 Additional context used
🪛 LanguageTool
cli_readme.md

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)


[grammar] ~21-~21: Did you mean “these”?
Context: ...or the specified entity and date range. This produces computed values for the `Entit...

(THIS_NNS_VB)


[style] ~39-~39: Consider removing “of” to be more concise
Context: .... If run for a Join, it will run for all of the GroupBy`s included in that join, as we...

(ALL_OF_THE)

🪛 Markdownlint (0.37.0)
cli_readme.md

12-12: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


23-23: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


43-43: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


60-60: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


15-15: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


29-29: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


49-49: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


66-66: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/runner.py

28-28: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

api/py/ai/chronon/repo/compilev2.py

55-58: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


70-70: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


184-184: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


257-257: Local variable class_name is assigned to but never used

Remove assignment to unused variable class_name

(F841)

🔇 Additional comments (6)
cli_readme.md (3)

41-41: 🛠️ Refactor suggestion

Add streaming job deployment requirement

-After calling `Upload` for any `Entity`, you can then call `fetch` to get values.
+After calling `Upload` for any `Entity` and ensuring any required streaming jobs are deployed, you can then call `fetch` to get values.

Likely invalid or redundant comment.


60-60: ⚠️ Potential issue

Fix incorrect command in usage section

-### Usage: `zipline upload ENTITY_ID [OPTIONS]`
+### Usage: `zipline fetch ENTITY_ID [OPTIONS]`

Likely invalid or redundant comment.

🧰 Tools
🪛 Markdownlint (0.37.0)

60-60: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


50-50: ⚠️ Potential issue

Complete the --ds option description

---ds        The date to use for the batch upload. Defaults to 
+--ds        The date to use for the batch upload. Defaults to two days prior to today (UTC)

Likely invalid or redundant comment.

api/py/ai/chronon/repo/extract_objects.py (2)

Line range hint 25-41: LGTM!

Function simplification improves clarity.


Line range hint 82-101: LGTM!

Function simplification improves clarity.

api/py/ai/chronon/repo/compilev2.py (1)

45-47: ⚠️ Potential issue

Add force_overwrite parameter.

The function uses force_overwrite in _write_obj but it's not in the signature.

-def extract_and_convert(
-    chronon_root, target_object, target_object_file, debug=False, output_root=None
-):
+def extract_and_convert(
+    chronon_root, target_object, target_object_file, debug=False, output_root=None, force_overwrite=False
+):

Likely invalid or redundant comment.

Comment on lines +31 to +32
--start               start date for which you want data backfilled for this entity. Defaults to the configured start date.
--end                 end date for which you want data backfilled for this entity. Defaults to today - 2 days.

🛠️ Refactor suggestion

Add date format specifications

---start               start date for which you want data backfilled for this entity. Defaults to the configured start date.
---end                 end date for which you want data backfilled for this entity. Defaults to today - 2 days.
+--start               start date (YYYY-MM-DD format) for which you want data backfilled. Defaults to the configured start date.
+--end                 end date (YYYY-MM-DD format) for which you want data backfilled. Defaults to today - 2 days.
📝 Committable suggestion


Suggested change
--start               start date for which you want data backfilled for this entity. Defaults to the configured start date.
--end                 end date for which you want data backfilled for this entity. Defaults to today - 2 days.
--start               start date (YYYY-MM-DD format) for which you want data backfilled. Defaults to the configured start date.
--end                 end date (YYYY-MM-DD format) for which you want data backfilled. Defaults to today - 2 days.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
api/py/ai/chronon/repo/runner.py (2)

64-81: Consider normalizing paths

Add os.path.normpath for consistent path handling across platforms.

-    current_path = os.path.dirname(os.path.abspath(filepath))
+    current_path = os.path.dirname(os.path.normpath(os.path.abspath(filepath)))
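Normalization collapses redundant separators, `.` components, and `..` backtracking before any string comparison, and rewrites separators to the platform convention. A minimal demonstration (the path itself is made up for illustration):

```python
import os

raw = "configs//group_bys/./team_a/../team_a/gb.py"
clean = os.path.normpath(raw)
# On POSIX this yields "configs/group_bys/team_a/gb.py"; on Windows the
# separators become backslashes. Normalizing first is what makes
# cross-platform path comparisons reliable.
```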

104-104: Track TODOs systematically

Consider creating GitHub issues for tracking these TODOs.

Would you like me to create GitHub issues for implementing these functions?

Also applies to: 124-124, 141-141, 160-160

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 334c43b and e16558e.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/runner.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/runner.py

27-29: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (2)
api/py/ai/chronon/repo/runner.py (2)

10-62: LGTM! Clean dataclass implementation

Well-structured configuration encapsulation with thorough validation.

🧰 Tools
🪛 Ruff (0.8.2)

27-29: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


101-102: Add error handling for compilation

Verify compilation success before proceeding.

-    extract_and_convert(config_details.chronon_root, zipline_obj, config_details.module)
+    try:
+        extract_and_convert(config_details.chronon_root, zipline_obj, config_details.module)
+    except Exception as e:
+        raise ValueError(f"Failed to compile object: {e}") from e
+    
+    if not os.path.exists(config_details.compiled_file):
+        raise ValueError("Compilation failed: compiled file not found")

Comment on lines +25 to +29
        except StopIteration:
            valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
            raise ValueError(
                f"Can only run one of {valid_types}, got {type(obj).__name__}"
            )

🛠️ Refactor suggestion

Add exception chaining

Add from err to preserve traceback:

-        except StopIteration:
+        except StopIteration as err:
             valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
-            raise ValueError(
+            raise ValueError(
                 f"Can only run one of {valid_types}, got {type(obj).__name__}"
-            )
+            ) from err
📝 Committable suggestion


Suggested change
-        except StopIteration:
-            valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
-            raise ValueError(
-                f"Can only run one of {valid_types}, got {type(obj).__name__}"
-            )
+        except StopIteration as err:
+            valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
+            raise ValueError(
+                f"Can only run one of {valid_types}, got {type(obj).__name__}"
+            ) from err
🧰 Tools
🪛 Ruff (0.8.2)

27-29: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
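The effect of `raise ... from err` can be seen in a small self-contained example (the `parse_port` function is invented for illustration): the original exception is stored on `__cause__`, so the traceback shows both the low-level and the high-level failure instead of the misleading "During handling of the above exception, another exception occurred" message.

```python
def parse_port(value):
    try:
        return int(value)
    except ValueError as err:
        # "from err" records the original exception on __cause__, so the
        # re-raised ValueError carries the full causal chain.
        raise ValueError(f"invalid port: {value!r}") from err

try:
    parse_port("http")
except ValueError as exc:
    assert isinstance(exc.__cause__, ValueError)
```

Using `from None` instead would deliberately suppress the chain, which is the other option Ruff's B904 accepts.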

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
api/py/ai/chronon/repo/runner.py (1)

107-160: Add type hints to placeholder functions

Add return type annotations to improve code clarity.

-def upload(zipline_obj, date, force_recompute=False, plan=False):
+def upload(zipline_obj, date, force_recompute=False, plan=False) -> None:

-def stream(zipline_obj):
+def stream(zipline_obj) -> None:

-def info(zipline_obj, branch=None):
+def info(zipline_obj, branch=None) -> None:
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between e16558e and 8c9fbfe.

📒 Files selected for processing (2)
  • api/py/ai/chronon/repo/runner.py (1 hunks)
  • api/py/ai/chronon/utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/py/ai/chronon/utils.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/runner.py

27-29: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (1)
api/py/ai/chronon/repo/runner.py (1)

64-81: LGTM!

Well-structured helper function with clear error handling.

Comment on lines 10 to 17
@dataclass
class ConfigDetails:
    module: str
    variable: str
    file: str
    compiled_file: str
    chronon_root: str


🛠️ Refactor suggestion

Simplify ConfigDetails implementation

  1. Remove manual __init__ since @dataclass auto-generates it
  2. Add exception chaining
  3. Simplify error messages
@dataclass
class ConfigDetails:
    module: str
    variable: str
    file: str
    compiled_file: str
    chronon_root: str

-    def __init__(self, obj):
+    @classmethod
+    def from_obj(cls, obj):
         try:
             obj_type = next(
                 k for k, v in FOLDER_NAME_TO_CLASS.items() if isinstance(obj, v)
             )
-        except StopIteration:
+        except StopIteration as err:
             valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
             raise ValueError(
                 f"Can only run one of {valid_types}, got {type(obj).__name__}"
-            )
+            ) from err

         module, variable = get_mod_and_var_name_from_gc(obj, obj_type)
         if module is None or variable is None:
             raise ValueError("Could not determine module and variable name for object")

         module_obj = importlib.import_module(module)
         file = inspect.getmodule(module_obj).__file__
         if not file:
-            raise ValueError(
-                f"""
-                                Could not determine file location for module {module}, {variable}.\n
-                                Runner currently only supports working on files saved within a valid Chronon
-                                root directory.\n Make sure you have your Zipline python files within the
-                                right directory, and then you can import them to your desired runtime.
-                             """
-            )
+            raise ValueError(f"Could not determine file location for module {module}, {variable}. Ensure file is within Chronon root directory.")

         path_parts = module.split(".")
         if path_parts[0] != obj_type:
             raise ValueError(f"Expected module path to start with '{obj_type}', got {module}")

         chronon_root = get_chronon_root(file)
         team = path_parts[1]
         path = ".".join(path_parts[2:])
         compiled_file = f"{chronon_root}/{OUTPUT_ROOT}/{obj_type}/{team}/{path}.{variable}"
         
+        return cls(module=module, variable=variable, file=file, 
+                  compiled_file=compiled_file, chronon_root=chronon_root)

Also applies to: 18-62
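The factory-method shape suggested above can be reduced to a minimal sketch. The name-derivation here is a hypothetical stand-in for `get_mod_and_var_name_from_gc`; the point is only that `@classmethod` construction keeps the auto-generated `__init__` intact for callers that already know the fields.

```python
from dataclasses import dataclass

@dataclass
class ConfigDetails:
    module: str
    variable: str

    @classmethod
    def from_obj(cls, obj):
        # Stand-in lookup: derive names from the object's type rather
        # than scanning the gc. The dataclass-generated __init__ is
        # untouched, so both construction paths remain available.
        return cls(module=type(obj).__module__, variable=type(obj).__name__)

details = ConfigDetails.from_obj(42)
assert details.module == "builtins" and details.variable == "int"
```

Writing `__init__` by hand inside a `@dataclass`, as the reviewed code does, silently discards the generated one, which is why the factory classmethod is the idiomatic alternative.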

Comment on lines 83 to 105
def backfill(zipline_obj, start_date, end_date, force_recompute=False, plan=False):
    """
    Backfills a Chronon object for a specified date range.

    Args:
        zipline_obj: The Chronon object (GroupBy, Join, StagingQuery) to backfill
        start_date: Start date for the backfill period
        end_date: End date for the backfill period
        force_recompute: If True, recomputes data even if it already exists (default: False)
        plan: If True, only shows execution plan without running backfill (default: False)

    Returns:
        None

    Raises:
        ValueError: If the object cannot be compiled or backfilled
    """
    config_details = ConfigDetails(zipline_obj)
    # module_file, compiled_file, chronon_root = get_target_compiled_file(zipline_obj)
    extract_and_convert(config_details.chronon_root, zipline_obj, config_details.module)
    # TODO: actually kick off backfill for compiled_file
    print("\n\n TODO -- Implement \n\n")


⚠️ Potential issue

Add compilation verification and update ConfigDetails usage

  1. Verify compilation success
  2. Update to use new factory method
 def backfill(zipline_obj, start_date, end_date, force_recompute=False, plan=False):
-    config_details = ConfigDetails(zipline_obj)
+    config_details = ConfigDetails.from_obj(zipline_obj)
-    extract_and_convert(config_details.chronon_root, zipline_obj, config_details.module)
+    if not extract_and_convert(config_details.chronon_root, zipline_obj, config_details.module):
+        raise ValueError("Compilation failed")
+    if not os.path.exists(config_details.compiled_file):
+        raise ValueError("Compiled file not found")

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
api/py/ai/chronon/repo/runner.py (3)

28-32: ⚠️ Potential issue

Add exception chaining to preserve traceback

Include the original exception when raising a new one.

Apply this diff:

-            except StopIteration:
+            except StopIteration as err:
                 valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
-                raise ValueError(
+                raise ValueError(
                     f"Can only run one of {valid_types}, got {type(obj).__name__}"
-                )
+                ) from err
🧰 Tools
🪛 Ruff (0.8.2)

30-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


21-66: 🛠️ Refactor suggestion

Refactor ConfigDetails using a factory method

Convert __init__ to a @classmethod for clarity.

Apply this diff:

 @dataclass
 class ConfigDetails:
     module: str
     variable: str
     file: str
     compiled_file: str
     chronon_root: str

-    def __init__(self, obj):
+    @classmethod
+    def from_obj(cls, obj):

         try:
             obj_type = next(
                 k for k, v in FOLDER_NAME_TO_CLASS.items() if isinstance(obj, v)
             )
         except StopIteration as err:
             valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
             raise ValueError(
                 f"Can only run one of {valid_types}, got {type(obj).__name__}"
             ) from err

         # Rest of the code...

+        return cls(
+            module=module,
+            variable=variable,
+            file=file,
+            compiled_file=compiled_file,
+            chronon_root=chronon_root
+        )
🧰 Tools
🪛 Ruff (0.8.2)

30-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


144-160: ⚠️ Potential issue

Update docstring to match parameters

The backfill method's docstring references zipline_obj, which isn't a parameter.

🧹 Nitpick comments (8)
api/py/ai/chronon/repo/extract_objects.py (3)

Line range hint 25-41: Fix incomplete docstring.

Docstring ends mid-sentence and lacks return type.

 def from_folder(full_path: str, cls: type, log_level=logging.INFO):
     """
-    Recursively consumes a folder, and constructs a map
-    Creates a map of object qualifier to
+    Recursively processes Python files in a folder.
+
+    Args:
+        full_path: Directory path to process
+        cls: Type of objects to extract
+        log_level: Logging level
+
+    Returns:
+        dict: Map of object qualifiers to extracted objects
     """

44-66: Add return type hints.

Missing return type annotation for clarity.

-def from_folderV2(full_path: str, target_file: str, cls: type):
+def from_folderV2(full_path: str, target_file: str, cls: type) -> tuple[dict, dict[str, Exception], Exception | None]:

83-86: Use f-strings for better readability.

Replace .format() with f-string.

-    logger.debug("Loading objects of type {cls} from {file_path}".format(**locals()))
+    logger.debug(f"Loading objects of type {cls} from {file_path}")
cli_readme.md (3)

6-8: Fix usage of 'however' and punctuation

Separate 'however' into new sentences and add commas.

Apply this diff:

-...plan, however, there would still...
+...plan. However, there would still...
-...on, however, it can also be...
+...on. However, it can also be...
🧰 Tools
🪛 LanguageTool

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)


12-12: Correct heading level

Change heading from H3 to H2 for consistency.

Apply this diff:

-### Usage: `zipline plan [OPTIONS]`
+## Usage: `zipline plan [OPTIONS]`
🧰 Tools
🪛 Markdownlint (0.37.0)

12-12: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


15-15: Specify language in code block

Add language specifier for syntax highlighting.

Apply this diff:

-```
+```bash
🧰 Tools
🪛 Markdownlint (0.37.0)

15-15: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

api/py/ai/chronon/repo/compilev2.py (2)

56-59: Simplify log level assignment

Use a ternary operator for brevity.

Apply this diff:

-    if debug:
-        log_level = logging.DEBUG
-    else:
-        log_level = logging.INFO
+    log_level = logging.DEBUG if debug else logging.INFO
🧰 Tools
🪛 Ruff (0.8.2)

56-59: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


71-71: Simplify dictionary key check

Use key in dict instead of key in dict.keys().

Apply this diff:

-            d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS.keys()
+            d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS
🧰 Tools
🪛 Ruff (0.8.2)

71-71: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between bb6c942 and fc7909f.

📒 Files selected for processing (12)
  • api/py/ai/chronon/group_by.py (2 hunks)
  • api/py/ai/chronon/join.py (3 hunks)
  • api/py/ai/chronon/repo/__init__.py (1 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/repo/extract_objects.py (3 hunks)
  • api/py/ai/chronon/repo/hub_uploader.py (1 hunks)
  • api/py/ai/chronon/repo/runner.py (1 hunks)
  • api/py/ai/chronon/repo/serializer.py (2 hunks)
  • api/py/ai/chronon/utils.py (1 hunks)
  • api/thrift/hub.thrift (1 hunks)
  • cli_readme.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • api/py/ai/chronon/repo/__init__.py
  • api/py/ai/chronon/repo/compile.py
  • api/thrift/hub.thrift
  • api/py/ai/chronon/repo/serializer.py
  • api/py/ai/chronon/utils.py
  • api/py/ai/chronon/group_by.py
  • api/py/ai/chronon/repo/hub_uploader.py
🧰 Additional context used

🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/runner.py

30-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

api/py/ai/chronon/repo/compilev2.py

56-59: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


71-71: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


198-198: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (5)
api/py/ai/chronon/repo/extract_objects.py (1)

Line range hint 83-102: Verify module import error handling.

Check if import errors are properly caught and logged.

✅ Verification successful

Import errors are properly handled at caller level

Error handling exists in the calling functions with appropriate logging and error preservation. No changes needed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for potential import error handling gaps
rg -n "importlib\.(machinery|util)" --type py

Length of output: 186


Script:

#!/bin/bash
# Check for error handling around imports
rg -B 2 -A 2 "try.*:|except.*:|import_module" api/py/ai/chronon/repo/extract_objects.py

Length of output: 1132

api/py/ai/chronon/repo/compilev2.py (1)

264-265: Address commented-out code

Decide whether to implement or remove the safe_to_overwrite check.

api/py/ai/chronon/join.py (3)

27-27: LGTM!

Clean import of MethodType for dynamic method binding.


Line range hint 645-656: LGTM!

Join object creation is well-structured with appropriate parameters.


657-664: Rename upload to deploy for consistency.

Previous review comment suggested renaming upload to deploy for consistency.

Apply this diff:

-    join.upload = MethodType(deploy, join)
+    join.deploy = MethodType(deploy, join)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
api/py/ai/chronon/repo/compilev2.py (1)

264-265: ⚠️ Potential issue

Implement the safe_to_overwrite check.

The commented code represents a critical safety check. Either implement it or document why it's safe to remove.

🧹 Nitpick comments (2)
api/py/ai/chronon/repo/compilev2.py (2)

56-59: Simplify log_level assignment.

Use ternary operator for better readability.

-    if debug:
-        log_level = logging.DEBUG
-    else:
-        log_level = logging.INFO
+    log_level = logging.DEBUG if debug else logging.INFO
🧰 Tools
🪛 Ruff (0.8.2)

56-59: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


54-54: Implement ZiplineHub integration.

The TODO comments indicate missing ZiplineHub integration for lineage and schema functionality.

Would you like me to help implement the ZiplineHub integration or create an issue to track this task?

Also applies to: 143-153

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between fc7909f and aa9830b.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/compilev2.py

56-59: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


71-71: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


198-198: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (2)
api/py/ai/chronon/repo/compilev2.py (2)

1-40: LGTM!

The license, imports, and constants are properly defined.


46-48: ⚠️ Potential issue

Add force_overwrite parameter.

Based on usage in _write_obj, the parameter is required.

 def extract_and_convert(
-    chronon_root, target_object, target_object_file, debug=False, output_root=None
+    chronon_root, target_object, target_object_file, debug=False, output_root=None, force_overwrite=False
 ):

Likely invalid or redundant comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
api/py/ai/chronon/group_by.py (1)

605-613: Consider using a helper function for method binding.

The method binding implementation is correct but could be more maintainable.

-    # Import locally to avoid circular dependency issue
-    from ai.chronon.repo.runner import backfill, deploy, info
-
-    # Attach functions directly to group_by
-    group_by.backfill = MethodType(backfill, group_by)
-    group_by.deploy = MethodType(deploy, group_by)
-    group_by.info = MethodType(info, group_by)
+    def _bind_methods(obj):
+        # Import locally to avoid circular dependency issue
+        from ai.chronon.repo.runner import backfill, deploy, info
+        
+        methods = {'backfill': backfill, 'deploy': deploy, 'info': info}
+        for name, func in methods.items():
+            setattr(obj, name, MethodType(func, obj))
+
+    _bind_methods(group_by)
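For context on what these bindings do: `types.MethodType` wraps a plain function so that calling it on an instance passes that instance as `self`. A standalone sketch with hypothetical names (not the actual Chronon classes):

```python
from types import MethodType

class Job:
    def __init__(self, name: str):
        self.name = name

def backfill(self, start: str, end: str) -> str:
    # `self` is the Job instance this function was bound to
    return f"backfilling {self.name} from {start} to {end}"

job = Job("user_metrics")
# Bind per-instance, mirroring group_by.backfill = MethodType(backfill, group_by)
job.backfill = MethodType(backfill, job)
print(job.backfill("2024-01-01", "2024-01-02"))
# → backfilling user_metrics from 2024-01-01 to 2024-01-02
```

Binding per-instance (rather than defining the method on the class) is what allows the runner functions to be imported lazily, which is how the PR sidesteps the circular-import issue noted in the comment.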
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 12f5d47 and 4238e17.

📒 Files selected for processing (3)
  • api/py/ai/chronon/group_by.py (2 hunks)
  • api/py/ai/chronon/join.py (4 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/py/ai/chronon/repo/compile.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (5)
api/py/ai/chronon/group_by.py (1)

25-25: LGTM!

The import of MethodType is correctly placed with other imports.

api/py/ai/chronon/join.py (4)

27-27: LGTM!

The import is necessary for the method bindings.


73-79: LGTM!

The condition correctly filters out method references from garbage collector.


Line range hint 653-664: LGTM!

The modification allows method binding before returning the Join instance.


665-672: Use consistent method naming: deploy instead of upload.

Rename join.upload to join.deploy for consistency.

-    join.upload = MethodType(deploy, join)
+    join.deploy = MethodType(deploy, join)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
cli_readme.md (1)

74-74: Fix typo.

-Provides information about a given `Entity`, including upstream/downstream lineage and schema informatio.
+Provides information about a given `Entity`, including upstream/downstream lineage and schema information.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4238e17 and c7b29fc.

📒 Files selected for processing (1)
  • cli_readme.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
cli_readme.md

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)


[grammar] ~21-~21: Did you mean “these”?
Context: ...or the specified entity and date range. This produces computed values for the `Entit...

(THIS_NNS_VB)


[style] ~39-~39: Consider removing “of” to be more concise
Context: .... If run for a Join, it will run for all of the GroupBy`s included in that join, as we...

(ALL_OF_THE)

🪛 Markdownlint (0.37.0)
cli_readme.md

12-12: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


23-23: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


45-45: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


62-62: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


76-76: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


15-15: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


29-29: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


51-51: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


68-68: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (5)
cli_readme.md (5)

6-8: Fix grammatical issues and clarify environment terminology.

-For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.
+For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan. However, there would still be a column addition reflected in your `upload` plan.

-`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment
+`plan` is always run against a particular environment. By default, it runs against your current development branch. However, it can also be set to `prod`, in which case it will display the plan of your local changes against the production environment
🧰 Tools
🪛 LanguageTool

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)


15-17: Add language specifier to code block.

-```
+```bash
 --branch         the branch against which to compare the local state. Can be set to `prod` or a different `branch_id`. Defaults to the current branch if one is set, else `prod`.

🧰 Tools
🪛 Markdownlint (0.37.0)

15-15: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

31-32: Add date format specifications.

---start               start date for which you want data backfilled for this entity. Defaults to the configured start date.
---end                 end date for which you want data backfilled for this entity. Defaults to today - 2 days.
+--start               start date (YYYY-MM-DD format) for which you want data backfilled. Defaults to the configured start date.
+--end                 end date (YYYY-MM-DD format) for which you want data backfilled. Defaults to today - 2 days.

52-52: Complete the --ds option description.

---ds        The date to use for the batch upload. Defaults to 
+--ds        The date to use for the batch upload. Defaults to two days prior to today (UTC)

62-70: Fix command usage and add examples.

-### Usage: `zipline upload ENTITY_ID [OPTIONS]`
+### Usage: `zipline fetch ENTITY_ID [OPTIONS]`

+Example:
+```bash
+zipline fetch user_metrics --keys '{"user_id": "12345"}'
+```
🧰 Tools
🪛 Markdownlint (0.37.0)

62-62: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


68-68: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


@nikhil-zlai nikhil-zlai left a comment


@varant-zlai varant-zlai merged commit d07aca9 into main Jan 23, 2025
5 checks passed
@varant-zlai varant-zlai deleted the cli_documentation branch January 23, 2025 19:07
@coderabbitai coderabbitai bot mentioned this pull request Feb 4, 2025
12 tasks
@coderabbitai coderabbitai bot mentioned this pull request Mar 3, 2025
4 tasks
@coderabbitai coderabbitai bot mentioned this pull request Apr 16, 2025
4 tasks
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
## Summary

- Initial implementation of Python `compile` -- compiles the whole repo
- Placeholder implementation of `backfill` -- next step will be to wire
up to the orchestrator for implementation
- Documentation for the broader user API surface area and how it should
behave (should maybe move this to pydocs)

Currently, the interaction looks like this:

<img width="729" alt="image"
src="https://github.com/user-attachments/assets/a52cf582-4394-449a-8567-32c0f108d2ed"
/>

Next steps:
- Connect with ZiplineHub for lineage, and backfill execution

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [x] Documentation update




<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Expanded CLI documentation with detailed descriptions of `plan`,
`backfill`, `deploy`, and `fetch` commands.
	- Added comprehensive usage instructions and examples for each command.

- **New Features**
	- Enhanced repository management functions for Chronon objects.
- Added methods for compiling, uploading, and managing configuration
details.
- Introduced new utility functions for module and variable name
retrieval.
- Added functionality to compute and upload differences in a local
repository.
	- New function to convert JSON to Thrift binary format.

- **Improvements**
- Streamlined function signatures in extraction and compilation modules.
	- Added error handling and logging improvements.
	- Enhanced object serialization and diff computation capabilities.
- Dynamic method bindings for `backfill`, `deploy`, and `info` functions
in `GroupBy` and `Join`.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: ezvz <[email protected]>
Co-authored-by: Nikhil Simha <[email protected]>
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
@coderabbitai coderabbitai bot mentioned this pull request May 15, 2025
4 tasks
chewy-zlai pushed a commit that referenced this pull request May 16, 2025