
@varant-zlai (Collaborator) commented Dec 6, 2024

Summary

  • Initial implementation of Python compile -- compiles the whole repo
  • Placeholder implementation of backfill -- next step is to wire it up to the orchestrator for execution
  • Documentation for the broader user API surface area and how it should behave (should maybe move this to pydocs)
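The whole-repo compile step could be sketched roughly as follows: walk the config folder, import each Python module, and collect the Chronon objects defined at module level. This is an illustrative stand-in, not the actual `compilev2` implementation; the function name and signature below are hypothetical.

```python
# Hypothetical sketch of whole-repo compilation: import every .py file
# under a folder and collect module-level objects of a given type.
# `from_folder` and its signature are illustrative, not the real API.
import importlib.util
from pathlib import Path


def from_folder(folder: str, obj_class: type) -> dict:
    """Return {"<module>.<var_name>": obj} for every module-level
    instance of `obj_class` found in .py files under `folder`."""
    results = {}
    for path in Path(folder).rglob("*.py"):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # real code should catch and log failures
        for name, obj in vars(module).items():
            # Skip dunders/privates; keep only objects of the requested type.
            if not name.startswith("_") and isinstance(obj, obj_class):
                results[f"{path.stem}.{name}"] = obj
    return results
```

A real implementation would also serialize each collected object (e.g. to Thrift JSON) and write it under an output root, but the discovery loop is the core of it.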

Currently, the interaction looks like this:

(screenshot of the CLI interaction)

Next steps:

  • Connect with ZiplineHub for lineage, and backfill execution

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • Documentation

    • Expanded CLI documentation with detailed descriptions of plan, backfill, deploy, and fetch commands.
    • Added comprehensive usage instructions and examples for each command.
  • New Features

    • Enhanced repository management functions for Chronon objects.
    • Added methods for compiling, uploading, and managing configuration details.
    • Introduced new utility functions for module and variable name retrieval.
    • Added functionality to compute and upload differences in a local repository.
    • New function to convert JSON to Thrift binary format.
  • Improvements

    • Streamlined function signatures in extraction and compilation modules.
    • Added error handling and logging improvements.
    • Enhanced object serialization and diff computation capabilities.
    • Dynamic method bindings for backfill, deploy, and info functions in GroupBy and Join.
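The dynamic method bindings mentioned above follow a common Python pattern: define free functions and attach them to the config classes with `setattr`. The sketch below uses a stand-in `GroupBy` class and placeholder bodies; the real Chronon classes and method signatures may differ.

```python
# Stand-in for the real Chronon GroupBy class, for illustration only.
class GroupBy:
    def __init__(self, name: str):
        self.name = name


def _backfill(self, start: str, end: str) -> str:
    # Placeholder: the real implementation would submit a job to the orchestrator.
    return f"backfill {self.name} [{start}..{end}]"


def _info(self) -> str:
    return f"GroupBy(name={self.name})"


# Bind the free functions as instance methods; functions are descriptors,
# so `self` is supplied automatically on attribute access.
for fn_name, fn in (("backfill", _backfill), ("info", _info)):
    setattr(GroupBy, fn_name, fn)

gb = GroupBy("daily_metrics")
print(gb.backfill("2024-01-01", "2024-01-07"))
```

This keeps the config classes declarative while the CLI/runner module decides which verbs to expose on them.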

coderabbitai bot (Contributor) commented Dec 6, 2024

Walkthrough

This pull request enhances the Chronon CLI documentation and repository management functionality. It introduces comprehensive documentation for CLI commands such as plan, backfill, deploy, and fetch. Additionally, multiple Python files in the api/py/ai/chronon directory have been modified to add new functions for configuration management, object extraction, and repository interactions, improving error handling and method bindings.

Changes

| File | Change Summary |
|------|----------------|
| cli_readme.md | Added detailed sections for plan, backfill, deploy, and fetch commands with usage examples. |
| api/py/ai/chronon/repo/compilev2.py | Updated extract_and_convert function signature; added new utility functions for configuration management. |
| api/py/ai/chronon/utils.py | Added get_mod_and_var_name_from_gc function for module and variable name retrieval. |
| api/py/ai/chronon/repo/extract_objects.py | Modified from_folder and from_file functions; added from_folderV2 with enhanced error handling. |
| api/py/ai/chronon/repo/runner.py | Introduced ConfigDetails class; added functions for repository and branch management. |
| api/py/ai/chronon/repo/__init__.py | Added OUTPUT_ROOT constant. |
| api/py/ai/chronon/repo/hub_uploader.py | Added functions for computing and uploading repository differences. |
| api/py/ai/chronon/repo/serializer.py | Added json2binary function for JSON to Thrift binary conversion. |
| api/py/ai/chronon/group_by.py | Added method bindings for backfill, deploy, and info. |
| api/py/ai/chronon/join.py | Added method bindings for backfill, upload, and info. |
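The hub_uploader diff-and-upload flow could plausibly work by hashing each compiled config and comparing against the hashes the hub already holds, so only new or changed entries are uploaded. A minimal stdlib sketch, with hypothetical function names:

```python
# Illustrative sketch of local-vs-remote diff computation before upload.
# `config_hash` and `compute_diff` are hypothetical names, not the
# actual hub_uploader API.
import hashlib
import json


def config_hash(config: dict) -> str:
    """Stable content hash: canonical JSON (sorted keys) -> sha256 hex."""
    canonical = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


def compute_diff(local: dict, remote_hashes: dict) -> dict:
    """Return {name: config} for configs that are new or changed locally,
    given a remote snapshot of {name: hash}."""
    return {
        name: cfg
        for name, cfg in local.items()
        if remote_hashes.get(name) != config_hash(cfg)
    }
```

Hashing canonical JSON (rather than comparing full payloads) keeps the remote snapshot small and makes the "changed?" check a single string comparison per entity.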

Suggested Reviewers

  • tchow-zlai
  • nikhil-zlai
  • sean-zlai
  • piyush-zlai

Poem

🚀 Chronon's CLI, a tale unfurled
Commands dancing, features swirled
Backfill, deploy, fetch with grace
Repository magic in every trace
Code poetry in motion's embrace! 🌟

Warning

Review ran into problems

🔥 Problems

GitHub Actions: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between c7b29fc and 718cef7.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/hub_uploader.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/py/ai/chronon/repo/hub_uploader.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (2)
cli_readme.md (2)

4-10: Enhance Plan command documentation clarity

The explanation of the plan command could be more precise and include:

  1. What specific types of changes are tracked (schema changes, transformations, etc.)
  2. A more structured example with before/after scenarios
  3. Clearer explanation of environment behavior

Consider expanding the explanation like this:

The `plan` command analyzes and reports:
- Schema changes (added/removed columns)
- Transformation logic changes
- Dependency changes
- Configuration updates

Example:
```bash
# Initial GroupBy configuration
zipline plan
> No changes detected

# After adding a new column 'revenue'
zipline plan
> Changes detected:
> - GroupBy 'daily_metrics': Added column 'revenue'
🧰 Tools
🪛 LanguageTool

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)


6-6: Fix grammar and style issues

Several grammar improvements are needed:

  1. Line 6: Separate "however" with semicolons or periods
  2. Line 8: Add comma after "By default"
  3. Line 21: Change "This produces" to "These produce"
  4. Line 39: Remove "of" for conciseness

Apply these corrections:

-plan, you would see no change to your `GroupBy` `backfill` plan, however, there would still be
+plan, you would see no change to your `GroupBy` `backfill` plan; however, there would still be

-By default it is the branch
+By default, it is the branch

-This produces computed values
+These produce computed values

-it will run for all of the `GroupBy`s
+it will run for all the `GroupBy`s

Also applies to: 8-8, 21-21, 39-39


📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 60c30cb and 06f71bb.

📒 Files selected for processing (1)
  • cli_readme.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
cli_readme.md

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)


[grammar] ~21-~21: Did you mean “these”?
Context: ...or the specified entity and date range. This produces computed values for the `Entit...

(THIS_NNS_VB)


[style] ~39-~39: Consider removing “of” to be more concise
Context: .... If run for a Join, it will run for all of the GroupBy`s included in that join, as we...

(ALL_OF_THE)

🪛 Markdownlint (0.35.0)
cli_readme.md

12-12: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


23-23: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


43-43: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


60-60: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


15-15: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


29-29: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


49-49: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


66-66: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Comment on lines 1 to 68

# Plan

Describes the logical changes in your current environment versus the last run for each of the changed entities.

For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.

`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment, reflecting what would happen should you merge and deploy your change.

Note that `plan` does not fully describe any particular computation because it is agnostic of date ranges and other computation-specific arguments. To see a full compute plan for a given `backfill`, for example, you can call the `backfill` command with the `--plan` argument.

### Usage: `zipline plan [OPTIONS]`

Options:
```
--env the environment against which to compare the local state. Can be set to `prod` or a different `branch_id`. Defaults to the current branch if one is set, else `prod`.
```

# Backfill

Runs a backfill for the specified entity and date range. This produces computed values for the `Entity`s defined transformation. Commonly used with `Join` to produce point-in-time correct historical data of feature values. For `GroupBy`s it produces snapshots of values as of some boundary (usually midnight on each day in the range), and for `StagingQuery` it simply runs the provided SQL to produce output in the range.

### Usage: `zipline backfill ENTITY_ID [OPTIONS]`

### Valid entity types: `GroupBy`, `Join`, `StagingQuery`

Options:

```
--env the environment to backfill into. Defaults to the current branch if one is set, otherwise is a required argument. If set to `prod` then it will overwrite production tables - use with caution.
--start start date for which you want data backfilled for this entity. Defaults to the configured start date.
--end end date for which you want data backfilled for this entity. Defaults to today - 2 days.
--force-recompute recomputes the backfill for the specified entity, even if the date range is already present. For Joins, also recomputes all join_part intermediate tables. Defaults to false.
--plan visualizes the computation entailed in this backfill, but does not start compute. Useful to sanity check a job prior to running. If everything looks ok, then rerun the command but omit this flag to begin the job.
```

# Upload

Populates the online serving index with data for the specified `Entity`. If run for a `Join`, it will run for all of the `GroupBy`s included in that join, as well as run the `Join` metadata upload job, which is required for fetching data for the `Join`.

After calling `Upload` for any `Entity`, you can then call `fetch` to get values.

### Usage: `zipline upload ENTITY_ID [OPTIONS]`

### Valid entity types: `GroupBy`, `Join`

Options:

```
--ds The date to use for the batch upload. Defaults to
--stream Only applies to `GroupBy`s that use a streaming source. Runs the streaming job after the batch upload completes (or only runs the streaming job if the batch upload is already completed for the given `ds`).
--plan visualizes the computation entailed in this upload, but does not start compute. Useful to sanity check a job prior to running. If everything looks ok, then rerun the command but omit this flag to begin the job.
```


# Fetch

Fetches data for the given `Entity` and keys. Useful for testing online serving.

### Usage: `zipline fetch ENTITY_ID [OPTIONS]`

### Valid entity types: `GroupBy`, `Join`

Options:

```
--keys the keys to use in the fetch request, map of key name to value. Required argument.
```
Contributor

🛠️ Refactor suggestion

Improve document structure and formatting

The document needs some structural improvements:

  1. Add a main title at the top (e.g., "Zipline CLI Documentation")
  2. Fix heading hierarchy - use consistent levels (h1 for main title, h2 for commands, h3 for subsections)
  3. Add language specifiers to code blocks (e.g., ```bash for command examples)

Here's how to fix the heading structure:

+# Zipline CLI Documentation
+
-# Plan
+## Plan

-### Usage: `zipline plan [OPTIONS]`
+#### Usage: `zipline plan [OPTIONS]`

-# Backfill
+## Backfill

-### Usage: `zipline backfill ENTITY_ID [OPTIONS]`
+#### Usage: `zipline backfill ENTITY_ID [OPTIONS]`

And for code blocks:

-```
+```bash
 --env         the environment against which to compare...



Comment on lines 21 to 35
Runs a backfill for the specified entity and date range. This produces computed values for the `Entity`s defined transformation. Commonly used with `Join` to produce point-in-time correct historical data of feature values. For `GroupBy`s it produces snapshots of values as of some boundary (usually midnight on each day in the range), and for `StagingQuery` it simply runs the provided SQL to produce output in the range.

### Usage: `zipline backfill ENTITY_ID [OPTIONS]`

### Valid entity types: `GroupBy`, `Join`, `StagingQuery`

Options:

```
--env the environment to backfill into. Defaults to the current branch if one is set, otherwise is a required argument. If set to `prod` then it will overwrite production tables - use with caution.
--start start date for which you want data backfilled for this entity. Defaults to the configured start date.
--end end date for which you want data backfilled for this entity. Defaults to today - 2 days.
--force-recompute recomputes the backfill for the specified entity, even if the date range is already present. For Joins, also recomputes all join_part intermediate tables. Defaults to false.
--plan visualizes the computation entailed in this backfill, but does not start compute. Useful to sanity check a job prior to running. If everything looks ok, then rerun the command but omit this flag to begin the job.
```
Contributor

⚠️ Potential issue

Add critical operational details to Backfill command

The backfill documentation is missing important operational details:

  1. Expected date formats for --start and --end
  2. Error handling and recovery procedures
  3. Concurrent execution behavior
  4. Resource requirements and execution time expectations

Please add these missing details. For example:

Options:
--start               Start date in YYYY-MM-DD format
--end                 End date in YYYY-MM-DD format

Error Handling:
- If a backfill fails, it can be safely retried
- Use --force-recompute to restart a failed job from the beginning

Performance:
- Multiple backfills can run concurrently for different entities
- Large date ranges may take several hours to complete

Comment on lines 50 to 53
--ds The date to use for the batch upload. Defaults to
--stream Only applies to `GroupBy`s that use a streaming source. Runs the streaming job after the batch upload completes (or only runs the streaming job if the batch upload is already completed for the given `ds`).
--plan visualizes the computation entailed in this upload, but does not start compute. Useful to sanity check a job prior to running. If everything looks ok, then rerun the command but omit this flag to begin the job.
```
Contributor

⚠️ Potential issue

Fix incomplete option description and add missing details

The --ds option description is cut off, and critical operational details are missing.

Fix the incomplete description and add missing details:

- --ds        The date to use for the batch upload. Defaults to
+ --ds        The date to use for the batch upload. Defaults to current date. Format: YYYY-MM-DD
+
+ Important Notes:
+ - Ensure backfill is complete before uploading
+ - Streaming jobs will process data after the specified --ds date
+ - Data consistency checks are performed before upload
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
--ds The date to use for the batch upload. Defaults to
--stream Only applies to `GroupBy`s that use a streaming source. Runs the streaming job after the batch upload completes (or only runs the streaming job if the batch upload is already completed for the given `ds`).
--plan visualizes the computation entailed in this upload, but does not start compute. Useful to sanity check a job prior to running. If everything looks ok, then rerun the command but omit this flag to begin the job.
```
--ds The date to use for the batch upload. Defaults to current date. Format: YYYY-MM-DD
Important Notes:
- Ensure backfill is complete before uploading
- Streaming jobs will process data after the specified --ds date
- Data consistency checks are performed before upload
--stream Only applies to `GroupBy`s that use a streaming source. Runs the streaming job after the batch upload completes (or only runs the streaming job if the batch upload is already completed for the given `ds`).
--plan visualizes the computation entailed in this upload, but does not start compute. Useful to sanity check a job prior to running. If everything looks ok, then rerun the command but omit this flag to begin the job.


### Usage: `zipline upload ENTITY_ID [OPTIONS]`

### Valid entity types: `GroupBy`, `Join`
Collaborator

Does the user need to specify an entity type if we're already asking for the entity_id? Or do we want to skip a lookup by asking them to specify?

Collaborator Author

Ah I was thinking they don't need to specify the type, but just documenting it here. If they try on a different entity we would just err.

cli_readme.md Outdated

Options:
```
--env the environment against which to compare the local state. Can be set to `prod` or a different `branch_id`. Defaults to the current branch if one is set, else `prod`.
Collaborator

It looks like branches and env are used interchangeably here - I think what we're really shooting for is env rather than branch. The distinction is that these environments are global (and probably lower cardinality than literal git "branches"), and multiple users would develop against the same environment, and their changes would be seen by others. Does that match your understanding?

Collaborator Author

Ok yeah, that's a good simplification; we should update accordingly.

Contributor

env is too general imo - users and our code will be referring to git branches anyway - and we will probably also map env to branch 1:1. I think branch is a simpler and easier term to use.

Contributor

How do we capture legit environments like qa? The branch master in qa and master in prod in most companies tend to run slightly different jobs (qa master might be test GroupBys / Joins as the prod datasets aren't present in qa).

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
cli_readme.md (2)

21-21: Fix grammar in description

Change "This produces" to "These produce" for better grammar as it refers to multiple computed values.

-This produces computed values for the `Entity`s defined transformation.
+These produce computed values for the `Entity`s defined transformation.


39-39: Improve conciseness

Remove unnecessary words while maintaining clarity.

-If run for a `Join`, it will run for all of the `GroupBy`s included in that join
+If run for a `Join`, it will run for all `GroupBy`s included in that join

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 06f71bb and cda621b.

📒 Files selected for processing (1)
  • cli_readme.md (1 hunks)

🔇 Additional comments (2)
cli_readme.md (2)

1-68: Improve document structure and formatting

The document structure needs improvement:

  1. Add a main title (e.g., "Zipline CLI Documentation")
  2. Fix heading hierarchy (use h1 for title, h2 for commands, h3 for subsections)
  3. Add language specifiers to code blocks (e.g., ```bash)

27-35: Add critical operational details

The backfill documentation needs additional operational details:

  1. Date format specifications
  2. Error handling procedures
  3. Performance expectations
  4. Resource requirements

Comment on lines +6 to +8
For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.

`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment, reflecting what would happen should you merge and deploy your change.

🛠️ Refactor suggestion

Improve clarity and fix grammatical issues

The sentences using "however" need restructuring for better readability. Additionally, based on previous discussions, we should clarify the distinction between environments and branches.

-If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.
+If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan. However, there would still be a column addition reflected in your `upload` plan.

-`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment
+`plan` is always run against a particular environment. By default, it runs against your current development environment. However, it can also be set to `prod`, in which case it will display the plan of your local changes against the production environment
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.
`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment, reflecting what would happen should you merge and deploy your change.
For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan. However, there would still be a column addition reflected in your `upload` plan.
`plan` is always run against a particular environment. By default, it runs against your current development environment. However, it can also be set to `prod`, in which case it will display the plan of your local changes against the production environment, reflecting what would happen should you merge and deploy your change.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (7)
api/py/ai/chronon/api_helpers.py (3)

5-6: Consider passing OUTPUT_ROOT from environment config.

Hardcoding "production" may not suit all environments.


8-22: Make target directories configurable or documented.

It might be safer to externalize or document the set of target dirs.


72-75: Rename or remove the test function.

It only prints the caller file. If it's for debugging, clarify its purpose.

cli_readme.md (4)

6-6: Rephrase “however” for clarity.

Example: use a period and start a new sentence instead, per style guidelines.

-..., however, there would ...
+... . However, there would ...


8-8: Add comma after “By default”.

Enhances sentence readability.

-By default it is
+By default, it is


12-12: Adopt consistent heading levels.

Use H2 for major sections, H3 for sub-sections. Current usage triggers lint warnings.

Also applies to: 23-23, 43-43, 60-60



15-15: Specify language for fenced code blocks.

Markdownlint suggests adding a language type (e.g., ```bash).

Also applies to: 29-29, 49-49, 66-66


📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between cda621b and 88149e1.

📒 Files selected for processing (3)
  • api/py/ai/chronon/api_helpers.py (1 hunks)
  • api/py/ai/chronon/group_by.py (2 hunks)
  • cli_readme.md (1 hunks)

🔇 Additional comments (6)
api/py/ai/chronon/api_helpers.py (3)

24-45: Handle errors from extract_and_convert.callback.

Add try-catch or a validation step to handle any failures gracefully.


47-65: Ensure variable_name is always defined.

If it's missing, the returned path might be invalid.


67-70: Unused parameters in backfill.

They're never referenced in the function body. Confirm usage or remove them.

api/py/ai/chronon/group_by.py (2)

19-19: Import looks fine.

Ensures backfill is directly accessible. No issues found.


589-590: Binding backfill to group_by is correct.

Enables in-place usage of the method. Looks good.

cli_readme.md (1)

60-60: ⚠️ Potential issue

Correct usage label.

It should say “fetch” rather than “upload” under the Fetch heading.

-### Usage: `zipline upload ENTITY_ID [OPTIONS]`
+### Usage: `zipline fetch ENTITY_ID [OPTIONS]`

Likely invalid or redundant comment.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
api/py/ai/chronon/repo/compilev2.py (2)

52-55: Refactor to a ternary operator.
Shorten the log level assignment:

- if debug:
-     log_level = logging.DEBUG
- else:
-     log_level = logging.INFO
+ log_level = logging.DEBUG if debug else logging.INFO


62-62: Remove unnecessary .keys().
Use if d in FOLDER_NAME_TO_CLASS: for clarity.

- obj_folder_names = [d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS.keys()]
+ obj_folder_names = [d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS]

api/py/ai/chronon/repo/compile.py (1)

94-94: Remove .keys() usage.

- obj_folder_names = [d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS.keys()]
+ obj_folder_names = [d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS]

api/py/ai/chronon/group_by.py (1)

23-23: Remove unused import.
types isn't referenced in the final implementation.

- import types

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 88149e1 and c968f31.

📒 Files selected for processing (5)
  • api/py/ai/chronon/api_helpers.py (1 hunks)
  • api/py/ai/chronon/group_by.py (2 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/utils.py (1 hunks)

🔇 Additional comments (3)
api/py/ai/chronon/utils.py (1)

187-200: Validate dynamic imports and performance.
Ensure error handling around importlib.import_module(mod_name) in case the module is missing or invalid. Also verify that frequent gc.collect() calls won't impair performance.

api/py/ai/chronon/api_helpers.py (1)

1-39: General checks
This addition looks consistent. No immediate issues found.

api/py/ai/chronon/group_by.py (1)

589-589: Consider enabling the backfill binding.
It is commented out. If it’s intentional, remove the unused code. Otherwise, uncomment it for functionality.

elif os.path.isfile(full_input_path):
assert full_input_path.endswith(
".py"
), f"Input Path: {input_path} isn't a python file"

⚠️ Potential issue

Undefined variable input_path.
Replace input_path with the correct variable or define it in scope.

- assert full_input_path.endswith(".py"), f"Input Path: {input_path} isn't a python file"
+ assert full_input_path.endswith(".py"), f"Input Path: {full_input_path} isn't a python file"

Committable suggestion skipped: line range outside the PR's diff.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (12)
api/py/ai/chronon/repo/runner.py (2)

7-7: Make OUTPUT_ROOT configurable

- OUTPUT_ROOT = "production"
+ OUTPUT_ROOT = os.getenv("OUTPUT_ROOT", "production")

63-67: Backfill not implemented

Let me know if you'd like a stub implementation or usage example.

api/py/ai/chronon/repo/extract_objects.py (2)

Line range hint 25-37: Redundant folder processing

from_folderV2 partly duplicates from_folder; consider merging functionality.


43-66: Error capture

If multiple files fail, consider collating failures into a single error log for better visibility.
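One way to collate the failures, sketched with assumed names (`compile_one` stands in for whatever per-file compile step the extraction code performs):

```python
import logging

def compile_files(files, compile_one):
    # Collect every per-file failure, then emit one consolidated error log
    # instead of interleaving errors with progress output.
    errors = {}
    for path in files:
        try:
            compile_one(path)
        except Exception as err:
            errors[path] = err
    if errors:
        summary = "\n".join(f"- {p}: {e}" for p, e in errors.items())
        logging.error("Compilation failed for %d file(s):\n%s", len(errors), summary)
    return errors
```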

api/py/ai/chronon/repo/compile.py (2)

92-92: Drop .keys()

- d in FOLDER_NAME_TO_CLASS.keys()
+ d in FOLDER_NAME_TO_CLASS

123-125: Raise specialized exception

Use a more specific exception type for clarity.
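For example (the exception name and helper are illustrative; the PR does not define them):

```python
class ChrononCompileError(Exception):
    """Raised when a repo object fails validation or compilation."""

def require(condition, message):
    # Replace a bare `raise Exception(...)` with a domain-specific type
    # so callers can catch compile failures precisely.
    if not condition:
        raise ChrononCompileError(message)
```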

api/py/ai/chronon/repo/compilev2.py (6)

55-58: Use ternary

- if debug:
-     log_level = logging.DEBUG
- else:
-     log_level = logging.INFO
+ log_level = logging.DEBUG if debug else logging.INFO


65-65: Simplify membership check

- if d in FOLDER_NAME_TO_CLASS.keys():
+ if d in FOLDER_NAME_TO_CLASS:


109-109: Unused variable

Remove target_object_written if not used.



152-152: Unused variable

Remove e if it's not used.



189-189: Simplify membership check

- for file in compile_errors.keys()
+ for file in compile_errors


273-273: Unused variable

class_name is never used. Remove it.


📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between c968f31 and b45b935.

📒 Files selected for processing (4)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/repo/extract_objects.py (3 hunks)
  • api/py/ai/chronon/repo/runner.py (1 hunks)

🔇 Additional comments (3)
api/py/ai/chronon/repo/runner.py (1)

31-31: Remove hardcoded "group_bys" if other types are expected

Ensure this reflects all relevant object types.

api/py/ai/chronon/repo/extract_objects.py (1)

83-83: from_file usage

No issues spotted here.

api/py/ai/chronon/repo/compile.py (1)

98-100: Index safety

path_split[0] could fail if path_split is empty. Verify input handling.
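A defensive sketch of that indexing guard (the function name and message are illustrative):

```python
def first_segment(path):
    # Guard against empty input before indexing into the split result.
    path_split = [p for p in path.split("/") if p]
    if not path_split:
        raise ValueError(f"Cannot extract a path segment from {path!r}")
    return path_split[0]
```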

@varant-zlai varant-zlai changed the title Creating docs for CLI Python compile and API skeleton/docs Dec 31, 2024
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
api/py/ai/chronon/repo/compilev2.py (4)

105-106: Remove unused variable.

The variable target_object_written is assigned but never used.

- if obj == target_object:
-     target_object_written = True


149-150: Use underscore for unused exception.

Replace unused exception variable with underscore.

- except Exception as e:
+ except Exception as _:


186-186: Simplify dictionary iteration.

Remove redundant .keys() call.

- "\n".join([f"- {os.path.relpath(file, chronon_root_path)}" for file in compile_errors.keys()]) +
+ "\n".join([f"- {os.path.relpath(file, chronon_root_path)}" for file in compile_errors]) +


259-259: Remove unused variable.

The variable class_name is assigned but never used.

- class_name = obj_class.__name__

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between b45b935 and 22e301f.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)

🔇 Additional comments (2)
api/py/ai/chronon/repo/compilev2.py (2)

190-202: LGTM!

Clean implementation with proper type hints and nested metadata handling.


233-256: LGTM!

Robust implementation with proper validation and error handling.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (4)
api/py/ai/chronon/repo/runner.py (3)

13-23: Consider adding error handling for directory traversal.

The while loop could potentially reach the root directory without finding target directories.

 while current_path != os.path.dirname(current_path):  # Stop at root directory
+    if current_path == '/':  # Additional safety check
+        break
     dir_name = os.path.basename(current_path)

31-37: Add type hints for better code maintainability.

The function parameters and return values lack type annotations.

-def get_target_compiled_file(zipline_obj):
+def get_target_compiled_file(zipline_obj: Any) -> Tuple[str, str, str]:

102-103: Add logging for better observability.

The placeholder implementations should log their status.

-    print("\n\n TODO -- Implement \n\n")
+    logging.info("TODO: Implement upload functionality")

Also applies to: 117-118, 134-134

api/py/ai/chronon/repo/compilev2.py (1)

55-58: Simplify log level assignment.

Use ternary operator for cleaner code.

-    if debug:
-        log_level = logging.DEBUG
-    else:
-        log_level = logging.INFO
+    log_level = logging.DEBUG if debug else logging.INFO

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 22e301f and 825202f.

📒 Files selected for processing (2)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/repo/runner.py (1 hunks)

🔇 Additional comments (1)
api/py/ai/chronon/repo/compilev2.py (1)

291-292: ⚠️ Potential issue

Add required arguments to main function.

The function call is missing required arguments.

 if __name__ == "__main__":
-    extract_and_convert()
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--chronon-root", required=True)
+    parser.add_argument("--output-root", required=True)
+    parser.add_argument("--target-object", required=True)
+    parser.add_argument("--target-object-file", required=True)
+    args = parser.parse_args()
+    extract_and_convert(
+        args.chronon_root,
+        args.output_root,
+        args.target_object,
+        args.target_object_file
+    )

Likely invalid or redundant comment.

Comment on lines 80 to 104
module_file, compiled_file, chronon_root = get_target_compiled_file(zipline_obj)
extract_and_convert(chronon_root, chronon_root, zipline_obj, module_file)
# TODO: actually kick off backfill for compiled_file
print("\n\n TODO -- Implement \n\n")

💡 Codebase verification

Add compilation status check before backfill

The extract_and_convert function in compilev2.py already handles compilation failures for the target object by exiting with a failure message. However, the runner code doesn't verify this status.

  • Add error handling around extract_and_convert call in runner.py to verify compilation success
  • Check if compiled_file exists before proceeding with backfill
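A sketch of that guard in the runner; the function names follow the diff above, but the wiring (passing them in as parameters) is assumed for self-containment:

```python
import os

def backfill_with_check(zipline_obj, get_target_compiled_file, extract_and_convert):
    module_file, compiled_file, chronon_root = get_target_compiled_file(zipline_obj)
    extract_and_convert(chronon_root, chronon_root, zipline_obj, module_file)
    # Verify compilation actually produced the artifact before kicking off backfill.
    if not os.path.isfile(compiled_file):
        raise RuntimeError(f"Compilation did not produce {compiled_file}; aborting backfill")
    return compiled_file
```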
🔗 Analysis chain

Verify compilation success before proceeding.

The backfill function should verify if compilation was successful.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if compiled file exists
fd -t f "$(basename $compiled_file)" "$(dirname $compiled_file)"

Length of output: 282


Script:

#!/bin/bash
# Find extract_and_convert function implementation
ast-grep --pattern 'def extract_and_convert($$$)'

# Find get_target_compiled_file function implementation
ast-grep --pattern 'def get_target_compiled_file($$$)'

Length of output: 108


Script:

#!/bin/bash
# Search for function implementations
rg "def (extract_and_convert|get_target_compiled_file)" -A 10

Length of output: 2564

@nikhil-zlai nikhil-zlai left a comment

nits and some refactoring.

I also think the coderabbit suggestions on the readme are reasonable.

import ai.chronon.api.common.ttypes as common
import ai.chronon.utils as utils
from copy import deepcopy
#from ai.chronon.api_helpers import backfill

cleanup

import logging
import inspect
import json
import types

cleanup

derivations=derivations,
)
validate_group_by(group_by)
#group_by.backfill = types.MethodType(backfill, group_by)

cleanup

Comment on lines 52 to 66
# Get the file where zipline_obj is defined
# zipline_obj_file = inspect.getmodule(zipline_obj).__file__
# if not zipline_obj_file:
# raise ValueError("Could not determine file location for zipline_obj")


delete

Comment on lines 33 to 54
zipline_obj_module, zipline_var_name = get_mod_and_var_name_from_gc(zipline_obj, zipline_obj_type)

if zipline_obj_module is None or zipline_var_name is None:
raise ValueError("Could not determine module and variable name for object")

import importlib
module = importlib.import_module(zipline_obj_module)

# find the file of the module
module_file = inspect.getmodule(module).__file__
if not module_file:
raise ValueError(f"Could not determine file location for module {zipline_obj_module}")

this whole bit is better encapsulated as a dataclass

@dataclass
class ConfigDetails:
    module: str
    variable: str
    file: str
    compiled_file: str
    
    def __init__(self, obj, cls):
        self.module = ...
        self.variable = ... 
        self.file = ...
        self.compiled_file = ...
    

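Filling in the reviewer's sketch using the lookup logic from the diff above. `get_mod_and_var_name_from_gc` is the PR's helper; it is passed in here as a parameter so the sketch stays self-contained, and the field set is trimmed to what the diff actually computes:

```python
import importlib
import inspect
from dataclasses import dataclass

@dataclass
class ConfigDetails:
    module: str
    variable: str
    file: str

    @classmethod
    def from_object(cls, obj, obj_type, get_mod_and_var_name_from_gc):
        # Resolve the defining module and variable name, then locate the file.
        module_name, var_name = get_mod_and_var_name_from_gc(obj, obj_type)
        if module_name is None or var_name is None:
            raise ValueError("Could not determine module and variable name for object")
        module = importlib.import_module(module_name)
        module_file = inspect.getfile(module)
        return cls(module=module_name, variable=var_name, file=module_file)
```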
cli_readme.md Outdated

Populates the online serving index with data for the specified `Entity`. If run for a `Join`, it will run for all of the `GroupBy`s included in that join, as well as run the `Join` metadata upload job, which is required for fetching data for the `Join`.

After calling `Upload` for any `Entity`, you can then call `fetch` to get values.

We need the streaming job deployed too in some scenarios, so we'll need upload + stream (or deploy) prior to fetching.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 825202f and 5b07ac4.

📒 Files selected for processing (1)
  • cli_readme.md (1 hunks)

🔇 Additional comments (4)
cli_readme.md (4)

1-1: Add main title and improve document structure

Add a main title (e.g., "Zipline CLI Documentation") at the top.


29-35: Add missing operational details for backfill

Add date formats, error handling, and performance expectations.

🧰 Tools
🪛 Markdownlint (0.37.0)

29-29: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


6-8: 🛠️ Refactor suggestion

Fix grammar and clarify environment terminology

Previous discussions indicate confusion about branches vs environments. Let's clarify this and fix grammar.

-For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.
+For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan. However, there would still be a column addition reflected in your `upload` plan.

-`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment
+`plan` always runs against a specific environment (e.g., development, qa, prod). By default, it uses your current environment. However, when set to `prod`, it will display how your local changes would affect the production environment

Likely invalid or redundant comment.



50-50: ⚠️ Potential issue

Complete the --ds option description

Based on previous discussion, complete the description.

--ds        The date to use for the batch upload. Defaults to two days prior to today (UTC)

Likely invalid or redundant comment.

Comment on lines +60 to +68
### Usage: `zipline upload ENTITY_ID [OPTIONS]`

### Valid entity types: `GroupBy`, `Join`

Options:

```
--keys the keys to use in the fetch request, map of key name to value. Required argument.
```

⚠️ Potential issue

Fix command usage and add examples

Command shows incorrect usage and lacks examples.

-### Usage: `zipline upload ENTITY_ID [OPTIONS]`
+### Usage: `zipline fetch ENTITY_ID [OPTIONS]`

+Example:
+```bash
+zipline fetch user_metrics --keys '{"user_id": "12345"}'
+```

cli_readme.md Outdated

Populates the online serving index with data for the specified `Entity`. If run for a `Join`, it will run for all of the `GroupBy`s included in that join, as well as run the `Join` metadata upload job, which is required for fetching data for the `Join`.

After calling `Upload` for any `Entity`, you can then call `fetch` to get values.

🛠️ Refactor suggestion

Clarify streaming job requirements

Add note about streaming job deployment requirements.

-After calling `Upload` for any `Entity`, you can then call `fetch` to get values.
+After calling `Upload` for any `Entity` and ensuring any required streaming jobs are deployed, you can then call `fetch` to get values.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
api/py/ai/chronon/repo/compilev2.py (2)

116-129: Extract online join validation logic.

Move validation logic to a separate function for better maintainability.

+def _validate_online_join(obj: Join):
+    offline_gbs = [jp.groupBy.metaData.name for jp in obj.joinParts if not jp.groupBy.metaData.online]
+    assert not offline_gbs, (
+        "You must make all dependent GroupBys `online` if you want to make your join `online`. "
+        "You can do this by passing the `online=True` argument to the GroupBy constructor. "
+        "Fix the following: {}".format(offline_gbs)
+    )
+

146-155: Track ZiplineHub integration TODO.

The TODO comment indicates missing ZiplineHub integration for lineage tracking.

Would you like me to create a GitHub issue to track the ZiplineHub integration task?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 5b07ac4 and 774df11.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/compilev2.py

55-58: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


67-67: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


113-113: Local variable target_object_written is assigned to but never used

Remove assignment to unused variable target_object_written

(F841)


156-156: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


193-193: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


266-266: Local variable class_name is assigned to but never used

Remove assignment to unused variable class_name

(F841)

🔇 Additional comments (3)
api/py/ai/chronon/repo/compilev2.py (3)

156-159: 🛠️ Refactor suggestion

Include error details in warning message.

Add exception details to help with debugging.

-    except Exception as e:
-        
-        _print_warning("Failed to connect to ZiplineHub (check connection and VPN settings). \n\n " + 
+    except Exception as e:
+        _print_warning(f"Failed to connect to ZiplineHub: {str(e)}\nCheck connection and VPN settings.\n\n" + 

Likely invalid or redundant comment.



112-113: 🛠️ Refactor suggestion

Remove unused variable.

Replace with logging statement.

-                if obj == target_object:
-                    target_object_written = True
+                if obj == target_object:
+                    logging.info(f"Successfully wrote target object: {target_object}")

Likely invalid or redundant comment.



298-299: ⚠️ Potential issue

Add required arguments to main function.

The function requires mandatory arguments that are missing.

 if __name__ == "__main__":
-    extract_and_convert()
+    import argparse
+    parser = argparse.ArgumentParser(description='Compile Chronon repository')
+    parser.add_argument('--chronon-root', required=True, help='Path to chronon root')
+    parser.add_argument('--output-root', required=True, help='Path to output root')
+    parser.add_argument('--target-object', required=True, help='Target object name')
+    parser.add_argument('--target-object-file', required=True, help='Target object file')
+    parser.add_argument('--debug', action='store_true', help='Enable debug logging')
+    args = parser.parse_args()
+    
+    extract_and_convert(
+        chronon_root=args.chronon_root,
+        output_root=args.output_root,
+        target_object=args.target_object,
+        target_object_file=args.target_object_file,
+        debug=args.debug
+    )

Likely invalid or redundant comment.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (7)
api/py/ai/chronon/repo/extract_objects.py (1)

43-64: Consider adding docstring parameters.

The docstring should document the target_file parameter and return values.

 def from_folderV2(full_path: str, target_file: str, cls: type):
     """
     Recursively consumes a folder, and constructs a map of 
     object qualifier to StagingQuery, GroupBy, or Join
+
+    Args:
+        full_path: Path to the folder to process
+        target_file: Specific file to track errors for
+        cls: Type of object to extract
+
+    Returns:
+        Tuple containing:
+        - Dict mapping object qualifiers to (object, source_file) tuples
+        - Dict mapping files to their errors
+        - Error specific to target_file if any
     """
api/py/ai/chronon/repo/runner.py (2)

7-10: Remove unused imports.

These folder name constants are imported but never used.

 from ai.chronon.repo import (
     FOLDER_NAME_TO_CLASS,
-    JOIN_FOLDER_NAME,
-    GROUP_BY_FOLDER_NAME,
-    STAGING_QUERY_FOLDER_NAME,
-    MODEL_FOLDER_NAME,
     OUTPUT_ROOT
 )


62-66: Remove commented code.

Delete the unused code block for finding zipline_obj file.

-    # Get the file where zipline_obj is defined
-    # zipline_obj_file = inspect.getmodule(zipline_obj).__file__
-    # if not zipline_obj_file:
-    #     raise ValueError("Could not determine file location for zipline_obj")
api/py/ai/chronon/repo/compilev2.py (4)

55-58: Simplify log level assignment.

Use a ternary operator for cleaner code.

-    if debug:
-        log_level = logging.DEBUG
-    else:
-        log_level = logging.INFO
+    log_level = logging.DEBUG if debug else logging.INFO


67-67: Optimize dictionary key check.

Use direct key membership test.

-    obj_folder_names = [d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS.keys()]
+    obj_folder_names = [d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS]


133-143: Add TODO tracking for ZiplineHub integration.

The placeholder implementation needs tracking.

Would you like me to create a GitHub issue to track the ZiplineHub integration task?


253-264: Remove unused variable.

The class_name variable is assigned but never used.

 def _write_obj_as_json(name: str, obj: object, output_file: str, obj_class: type):
-    class_name = obj_class.__name__
     output_folder = os.path.dirname(output_file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 774df11 and e70963a.

📒 Files selected for processing (4)
  • api/py/ai/chronon/repo/__init__.py (1 hunks)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/repo/extract_objects.py (3 hunks)
  • api/py/ai/chronon/repo/runner.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • api/py/ai/chronon/repo/__init__.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/runner.py

7-7: ai.chronon.repo.JOIN_FOLDER_NAME imported but unused

Remove unused import

(F401)


8-8: ai.chronon.repo.GROUP_BY_FOLDER_NAME imported but unused

Remove unused import

(F401)


9-9: ai.chronon.repo.STAGING_QUERY_FOLDER_NAME imported but unused

Remove unused import

(F401)


10-10: ai.chronon.repo.MODEL_FOLDER_NAME imported but unused

Remove unused import

(F401)

api/py/ai/chronon/repo/compilev2.py

55-58: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


67-67: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


181-181: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


254-254: Local variable class_name is assigned to but never used

Remove assignment to unused variable class_name

(F841)

🔇 Additional comments (3)
api/py/ai/chronon/repo/extract_objects.py (1)

Line range hint 25-41: LGTM! Clean implementation of folder traversal.

The function correctly processes Python files recursively and handles exceptions appropriately.

api/py/ai/chronon/repo/runner.py (2)

14-28: LGTM! Robust root directory detection.

The function correctly traverses up the directory tree to find the Chronon root.


90-93: Add compilation status check before backfill.

Verify compilation success before proceeding with backfill.
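A minimal sketch of the check being asked for, assuming the compiled artifact lands at a known path (the `ensure_compiled` name and the path convention are illustrative, not the actual runner API):

```python
import os


def ensure_compiled(compiled_file: str) -> None:
    # Fail fast if compilation did not produce the expected artifact,
    # rather than letting the backfill fail later with a confusing error.
    if not os.path.isfile(compiled_file):
        raise RuntimeError(
            f"Compilation did not produce {compiled_file}; "
            "fix compile errors before running backfill."
        )
```

The runner would call this between `extract_and_convert` and kicking off the backfill.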

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (9)
cli_readme.md (3)

1-2: Add title and introduction

Add a main title and brief introduction to set context.

+# Zipline CLI Documentation
+
+Command-line interface for managing Zipline data pipelines.
+
 # Plan

6-8: Fix sentence structure

Improve readability by properly structuring sentences with "however".

-If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.
+If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan. However, there would still be a column addition reflected in your `upload` plan.

-`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment
+`plan` is always run against a particular environment. By default, it runs against your current development environment. However, it can also be set to `prod`, in which case it will display the plan of your local changes against the production environment.

Also applies to: 8-8



15-17: Add language specifiers to code blocks

Add bash language specifier to CLI command examples.

-```
+```bash
 --branch         the branch against which to compare...


Also applies to: 29-35, 49-53, 66-68

api/py/ai/chronon/repo/extract_objects.py (1)

Line range hint `25-41`: **Improve error handling.**

Consider collecting errors in a dictionary and returning them along with results, similar to `from_folderV2`.

```diff
 def from_folder(full_path: str, cls: type, log_level=logging.INFO):
     if full_path.endswith("/"):
         full_path = full_path[:-1]
 
     python_files = glob.glob(os.path.join(full_path, "**/*.py"), recursive=True)
     result = {}
+    errors = {}
     for f in python_files:
         try:
             result.update(from_file(f, cls, log_level))
         except Exception as e:
-            logging.error(f"Failed to extract: {f}")
-            logging.exception(e)
+            errors[f] = e
-    return result
+    return result, errors
api/py/ai/chronon/repo/compilev2.py (2)

128-160: Implement ZiplineHub integration.

The function is a placeholder with good error handling but needs implementation.

Would you like me to help implement the ZiplineHub integration for lineage tracking?


253-264: Remove unused variable.

The class_name variable is assigned but never used.

 def _write_obj_as_json(name: str, obj: object, output_file: str, obj_class: type):
-    class_name = obj_class.__name__
     output_folder = os.path.dirname(output_file)

api/py/ai/chronon/repo/runner.py (2)

62-66: Remove commented code block.

The commented code block should be removed if no longer needed.

-    # Get the file where zipline_obj is defined
-    # zipline_obj_file = inspect.getmodule(zipline_obj).__file__
-    # if not zipline_obj_file:
-    #     raise ValueError("Could not determine file location for zipline_obj")

5-12: Remove unused imports.

The following imports are not used:

  • JOIN_FOLDER_NAME
  • GROUP_BY_FOLDER_NAME
  • STAGING_QUERY_FOLDER_NAME
  • MODEL_FOLDER_NAME

api/py/ai/chronon/utils.py (1)

195-197: Consider performance optimization.

The iteration through all module variables could be inefficient for large modules. Consider caching results or using more efficient lookup methods if this becomes a bottleneck.
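One way to apply the caching suggested here, as a sketch rather than the actual `utils.py` implementation: build an id-to-name reverse map once per module and reuse it across lookups. The helper names are illustrative, and the cached map goes stale if the module's globals are mutated after the first lookup.

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def _var_names_by_id(mod_name: str) -> dict:
    # Scan the module's globals once; subsequent calls hit the cache.
    module = importlib.import_module(mod_name)
    return {id(value): name for name, value in vars(module).items()}


def cached_var_name(obj: object, mod_name: str):
    """Return the variable name bound to obj in mod_name, or None."""
    return _var_names_by_id(mod_name).get(id(obj))
```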

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between e70963a and 03443f8.

📒 Files selected for processing (6)
  • api/py/ai/chronon/repo/__init__.py (1 hunks)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/repo/extract_objects.py (3 hunks)
  • api/py/ai/chronon/repo/runner.py (1 hunks)
  • api/py/ai/chronon/utils.py (1 hunks)
  • cli_readme.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/py/ai/chronon/repo/__init__.py

🔇 Additional comments (6)
cli_readme.md (2)

60-60: ⚠️ Potential issue

Fix incorrect command in usage section

The fetch command usage shows incorrect command.

-### Usage: `zipline upload ENTITY_ID [OPTIONS]`
+### Usage: `zipline fetch ENTITY_ID [OPTIONS]`

Likely invalid or redundant comment.



50-50: ⚠️ Potential issue

Complete the --ds option description

The --ds option description is incomplete.

---ds        The date to use for the batch upload. Defaults to 
+--ds        The date to use for the batch upload. Defaults to two days prior to today (UTC)

Likely invalid or redundant comment.

api/py/ai/chronon/repo/extract_objects.py (2)

43-64: LGTM! Well-structured error handling.

Good separation of target file errors and tracking of file origins in results.


82-82: LGTM! Clean and focused implementation.

api/py/ai/chronon/repo/runner.py (2)

14-28: LGTM! Clean directory traversal implementation.


90-93: ⚠️ Potential issue

Add error handling for extract_and_convert.

The function should handle potential errors from extract_and_convert.

     module_file, compiled_file, chronon_root = get_target_compiled_file(zipline_obj)
-    extract_and_convert(chronon_root, chronon_root, zipline_obj, module_file)
+    try:
+        extract_and_convert(chronon_root, chronon_root, zipline_obj, module_file)
+    except Exception as e:
+        raise ValueError(f"Failed to compile {module_file}: {str(e)}")
     # TODO: actually kick off backfill for compiled_file

Likely invalid or redundant comment.

Comment on lines 45 to 48
def extract_and_convert(
    chronon_root, output_root, target_object, target_object_file, debug=False
):

⚠️ Potential issue

Add force_overwrite parameter.

The function uses force_overwrite in _write_obj but it's not in the signature.

 def extract_and_convert(
-    chronon_root, output_root, target_object, target_object_file, debug=False
+    chronon_root, output_root, target_object, target_object_file, debug=False, force_overwrite=False
 ):

Comment on lines +194 to +197
    module = importlib.import_module(mod_name)
    for var_name, value in vars(module).items():
        if value is obj:
            return mod_name, var_name

⚠️ Potential issue

Add error handling for module import.

-    module = importlib.import_module(mod_name)
-    for var_name, value in vars(module).items():
-        if value is obj:
-            return mod_name, var_name
+    try:
+        module = importlib.import_module(mod_name)
+        for var_name, value in vars(module).items():
+            if value is obj:
+                return mod_name, var_name
+    except ImportError:
+        return mod_name, None

Comment on lines 187 to 200
def get_mod_and_var_name_from_gc(obj, mod_prefix):
    # Find the variable name within the module
    mod_name = get_mod_name_from_gc(obj, mod_prefix)
    """Get the variable name that points to the obj in the module"""
    if not mod_name:
        return None

    module = importlib.import_module(mod_name)
    for var_name, value in vars(module).items():
        if value is obj:
            return mod_name, var_name

    return mod_name, None


🛠️ Refactor suggestion

Add type hints and proper docstring.

-def get_mod_and_var_name_from_gc(obj, mod_prefix):
-    # Find the variable name within the module
-    mod_name = get_mod_name_from_gc(obj, mod_prefix)
-    """Get the variable name that points to the obj in the module"""
+def get_mod_and_var_name_from_gc(obj: object, mod_prefix: str) -> tuple[str | None, str | None]:
+    """Get the module and variable name that points to the object.
+    
+    Args:
+        obj: Object to find in garbage collector
+        mod_prefix: Module prefix to filter by
+        
+    Returns:
+        Tuple of (module_name, variable_name) or (module_name, None) if variable not found
+    """
+    mod_name = get_mod_name_from_gc(obj, mod_prefix)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-def get_mod_and_var_name_from_gc(obj, mod_prefix):
-    # Find the variable name within the module
-    mod_name = get_mod_name_from_gc(obj, mod_prefix)
-    """Get the variable name that points to the obj in the module"""
-    if not mod_name:
-        return None
-    module = importlib.import_module(mod_name)
-    for var_name, value in vars(module).items():
-        if value is obj:
-            return mod_name, var_name
-    return mod_name, None
+def get_mod_and_var_name_from_gc(obj: object, mod_prefix: str) -> tuple[str | None, str | None]:
+    """Get the module and variable name that points to the object.
+
+    Args:
+        obj: Object to find in garbage collector
+        mod_prefix: Module prefix to filter by
+
+    Returns:
+        Tuple of (module_name, variable_name) or (module_name, None) if variable not found
+    """
+    mod_name = get_mod_name_from_gc(obj, mod_prefix)
+    if not mod_name:
+        return None, None
+    module = importlib.import_module(mod_name)
+    for var_name, value in vars(module).items():
+        if value is obj:
+            return mod_name, var_name
+    return mod_name, None
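For context, the module-scan half of this lookup can be sketched in isolation. This is a standalone approximation: `find_mod_and_var` and the walk over `sys.modules` are illustrative stand-ins, not Chronon's actual `get_mod_name_from_gc` implementation.

```python
import sys

def find_mod_and_var(obj, mod_prefix):
    """Scan loaded modules whose name starts with mod_prefix for a
    variable bound to exactly this object (identity, not equality)."""
    for mod_name, module in list(sys.modules.items()):
        if module is None or not mod_name.startswith(mod_prefix):
            continue
        for var_name, value in list(vars(module).items()):
            if value is obj:
                return mod_name, var_name
    return None, None
```

Because the match uses `is`, two equal-but-distinct objects never collide; an object that lives in no scanned module yields `(None, None)` rather than raising.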

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (9)
cli_readme.md (1)

6-8: Improve readability and clarify environment terminology

-If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.
+If you run `backfill` for the `GroupBy` and then call `plan` again, you would see no change to your `GroupBy` `backfill` plan. However, the column addition would still be reflected in your `upload` plan.

-`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment
+`plan` always runs against a particular environment. By default, it runs against your development environment. However, it can also be set to `prod` to display the plan of your local changes against the production environment
🧰 Tools
🪛 LanguageTool

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)

api/py/ai/chronon/repo/extract_objects.py (1)

43-47: Complete the docstring

 """
-    Recursively consumes a folder, and constructs a map of 
-    object qualifier to StagingQuery, GroupBy, or Join
+    Recursively consumes a folder and constructs a map of object qualifiers to their implementations.
+    
+    Args:
+        full_path: Path to the folder to process
+        target_file: Specific file to track errors for
+        cls: Type of objects to extract (StagingQuery, GroupBy, or Join)
+    
+    Returns:
+        Tuple of (results dict, errors dict, target file error)
 """
api/py/ai/chronon/repo/runner.py (3)

27-28: Chain exceptions for better debugging

-            raise ValueError(f"Can only run one of {valid_types}, got {type(obj).__name__}")
+            raise ValueError(f"Can only run one of {valid_types}, got {type(obj).__name__}") from None
🧰 Tools
🪛 Ruff (0.8.2)

28-28: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


94-95: Implement backfill functionality

The TODO comment indicates missing implementation.

Would you like me to help implement the backfill functionality or create a GitHub issue to track this task?


98-146: Well-documented placeholder functions

Docstrings are comprehensive. Functions need implementation.

Would you like me to help implement these functions or create GitHub issues to track their implementation?

api/py/ai/chronon/repo/compilev2.py (4)

38-38: Remove unused constant.

The DEFAULT_TEAM_NAME constant is not used anywhere in the code.

-DEFAULT_TEAM_NAME = "default"

55-58: Use ternary operator for brevity.

-    if debug:
-        log_level = logging.DEBUG
-    else:
-        log_level = logging.INFO
+    log_level = logging.DEBUG if debug else logging.INFO
🧰 Tools
🪛 Ruff (0.8.2)

55-58: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


137-146: Track TODO for ZiplineHub integration.

The TODO comment indicates missing ZiplineHub integration for lineage and schema display.

Would you like me to create a GitHub issue to track this task?


257-257: Remove unused variable.

The class_name variable is assigned but never used.

-    class_name = obj_class.__name__
🧰 Tools
🪛 Ruff (0.8.2)

257-257: Local variable class_name is assigned to but never used

Remove assignment to unused variable class_name

(F841)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 03443f8 and 0ddaabc.

📒 Files selected for processing (6)
  • api/py/ai/chronon/repo/__init__.py (1 hunks)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/repo/extract_objects.py (3 hunks)
  • api/py/ai/chronon/repo/runner.py (1 hunks)
  • api/py/ai/chronon/utils.py (1 hunks)
  • cli_readme.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • api/py/ai/chronon/repo/__init__.py
  • api/py/ai/chronon/utils.py
🧰 Additional context used
🪛 LanguageTool
cli_readme.md

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)


[grammar] ~21-~21: Did you mean “these”?
Context: ...or the specified entity and date range. This produces computed values for the `Entit...

(THIS_NNS_VB)


[style] ~39-~39: Consider removing “of” to be more concise
Context: .... If run for a Join, it will run for all of the GroupBy`s included in that join, as we...

(ALL_OF_THE)

🪛 Markdownlint (0.37.0)
cli_readme.md

12-12: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


23-23: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


43-43: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


60-60: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


15-15: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


29-29: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


49-49: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


66-66: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/runner.py

28-28: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

api/py/ai/chronon/repo/compilev2.py

55-58: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


70-70: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


184-184: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


257-257: Local variable class_name is assigned to but never used

Remove assignment to unused variable class_name

(F841)

🔇 Additional comments (6)
cli_readme.md (3)

41-41: 🛠️ Refactor suggestion

Add streaming job deployment requirement

-After calling `Upload` for any `Entity`, you can then call `fetch` to get values.
+After calling `Upload` for any `Entity` and ensuring any required streaming jobs are deployed, you can then call `fetch` to get values.

Likely invalid or redundant comment.


60-60: ⚠️ Potential issue

Fix incorrect command in usage section

-### Usage: `zipline upload ENTITY_ID [OPTIONS]`
+### Usage: `zipline fetch ENTITY_ID [OPTIONS]`

Likely invalid or redundant comment.

🧰 Tools
🪛 Markdownlint (0.37.0)

60-60: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


50-50: ⚠️ Potential issue

Complete the --ds option description

---ds        The date to use for the batch upload. Defaults to 
+--ds        The date to use for the batch upload. Defaults to two days prior to today (UTC)

Likely invalid or redundant comment.

api/py/ai/chronon/repo/extract_objects.py (2)

Line range hint 25-41: LGTM!

Function simplification improves clarity.


Line range hint 82-101: LGTM!

Function simplification improves clarity.

api/py/ai/chronon/repo/compilev2.py (1)

45-47: ⚠️ Potential issue

Add force_overwrite parameter.

The function uses force_overwrite in _write_obj but it's not in the signature.

-def extract_and_convert(
-    chronon_root, target_object, target_object_file, debug=False, output_root=None
-):
+def extract_and_convert(
+    chronon_root, target_object, target_object_file, debug=False, output_root=None, force_overwrite=False
+):

Likely invalid or redundant comment.

Comment on lines +31 to +32
--start               start date for which you want data backfilled for this entity. Defaults to the configured start date.
--end                 end date for which you want data backfilled for this entity. Defaults to today - 2 days.

🛠️ Refactor suggestion

Add date format specifications

---start               start date for which you want data backfilled for this entity. Defaults to the configured start date.
---end                 end date for which you want data backfilled for this entity. Defaults to today - 2 days.
+--start               start date (YYYY-MM-DD format) for which you want data backfilled. Defaults to the configured start date.
+--end                 end date (YYYY-MM-DD format) for which you want data backfilled. Defaults to today - 2 days.
📝 Committable suggestion


Suggested change
--start               start date for which you want data backfilled for this entity. Defaults to the configured start date.
--end                 end date for which you want data backfilled for this entity. Defaults to today - 2 days.
--start               start date (YYYY-MM-DD format) for which you want data backfilled. Defaults to the configured start date.
--end                 end date (YYYY-MM-DD format) for which you want data backfilled. Defaults to today - 2 days.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
api/py/ai/chronon/repo/runner.py (2)

64-81: Consider normalizing paths

Add os.path.normpath for consistent path handling across platforms.

-    current_path = os.path.dirname(os.path.abspath(filepath))
+    current_path = os.path.dirname(os.path.normpath(os.path.abspath(filepath)))
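Normalization collapses redundant separators, `.` components, and `..` backtracking before any string comparison, and rewrites separators to the platform convention. A minimal demonstration (the path itself is made up for illustration):

```python
import os

raw = "configs//group_bys/./team_a/../team_a/gb.py"
clean = os.path.normpath(raw)
# On POSIX this yields "configs/group_bys/team_a/gb.py"; on Windows the
# separators become backslashes. Normalizing first is what makes
# cross-platform path comparisons reliable.
```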

104-104: Track TODOs systematically

Consider creating GitHub issues for tracking these TODOs.

Would you like me to create GitHub issues for implementing these functions?

Also applies to: 124-124, 141-141, 160-160

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 334c43b and e16558e.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/runner.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/runner.py

27-29: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (2)
api/py/ai/chronon/repo/runner.py (2)

10-62: LGTM! Clean dataclass implementation

Well-structured configuration encapsulation with thorough validation.

🧰 Tools
🪛 Ruff (0.8.2)

27-29: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


101-102: Add error handling for compilation

Verify compilation success before proceeding.

-    extract_and_convert(config_details.chronon_root, zipline_obj, config_details.module)
+    try:
+        extract_and_convert(config_details.chronon_root, zipline_obj, config_details.module)
+    except Exception as e:
+        raise ValueError(f"Failed to compile object: {e}") from e
+    
+    if not os.path.exists(config_details.compiled_file):
+        raise ValueError("Compilation failed: compiled file not found")

Comment on lines +25 to +29
        except StopIteration:
            valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
            raise ValueError(
                f"Can only run one of {valid_types}, got {type(obj).__name__}"
            )

🛠️ Refactor suggestion

Add exception chaining

Add from err to preserve traceback:

-        except StopIteration:
+        except StopIteration as err:
             valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
-            raise ValueError(
+            raise ValueError(
                 f"Can only run one of {valid_types}, got {type(obj).__name__}"
-            )
+            ) from err
📝 Committable suggestion


Suggested change
-        except StopIteration:
-            valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
-            raise ValueError(
-                f"Can only run one of {valid_types}, got {type(obj).__name__}"
-            )
+        except StopIteration as err:
+            valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
+            raise ValueError(
+                f"Can only run one of {valid_types}, got {type(obj).__name__}"
+            ) from err
🧰 Tools
🪛 Ruff (0.8.2)

27-29: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
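The effect of `raise ... from err` can be seen in a small self-contained example (the `parse_port` function is invented for illustration): the original exception is stored on `__cause__`, so the traceback shows both the low-level and the high-level failure instead of the misleading "During handling of the above exception, another exception occurred" message.

```python
def parse_port(value):
    try:
        return int(value)
    except ValueError as err:
        # "from err" records the original exception on __cause__, so the
        # re-raised ValueError carries the full causal chain.
        raise ValueError(f"invalid port: {value!r}") from err

try:
    parse_port("http")
except ValueError as exc:
    assert isinstance(exc.__cause__, ValueError)
```

Using `from None` instead would deliberately suppress the chain, which is the other option Ruff's B904 accepts.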

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
api/py/ai/chronon/repo/runner.py (1)

107-160: Add type hints to placeholder functions

Add return type annotations to improve code clarity.

-def upload(zipline_obj, date, force_recompute=False, plan=False):
+def upload(zipline_obj, date, force_recompute=False, plan=False) -> None:

-def stream(zipline_obj):
+def stream(zipline_obj) -> None:

-def info(zipline_obj, branch=None):
+def info(zipline_obj, branch=None) -> None:
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between e16558e and 8c9fbfe.

📒 Files selected for processing (2)
  • api/py/ai/chronon/repo/runner.py (1 hunks)
  • api/py/ai/chronon/utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/py/ai/chronon/utils.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/runner.py

27-29: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (1)
api/py/ai/chronon/repo/runner.py (1)

64-81: LGTM!

Well-structured helper function with clear error handling.

Comment on lines 10 to 17
@dataclass
class ConfigDetails:
    module: str
    variable: str
    file: str
    compiled_file: str
    chronon_root: str


🛠️ Refactor suggestion

Simplify ConfigDetails implementation

  1. Remove manual __init__ since @dataclass auto-generates it
  2. Add exception chaining
  3. Simplify error messages
@dataclass
class ConfigDetails:
    module: str
    variable: str
    file: str
    compiled_file: str
    chronon_root: str

-    def __init__(self, obj):
+    @classmethod
+    def from_obj(cls, obj):
         try:
             obj_type = next(
                 k for k, v in FOLDER_NAME_TO_CLASS.items() if isinstance(obj, v)
             )
-        except StopIteration:
+        except StopIteration as err:
             valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
             raise ValueError(
                 f"Can only run one of {valid_types}, got {type(obj).__name__}"
-            )
+            ) from err

         module, variable = get_mod_and_var_name_from_gc(obj, obj_type)
         if module is None or variable is None:
             raise ValueError("Could not determine module and variable name for object")

         module_obj = importlib.import_module(module)
         file = inspect.getmodule(module_obj).__file__
         if not file:
-            raise ValueError(
-                f"""
-                                Could not determine file location for module {module}, {variable}.\n
-                                Runner currently only supports working on files saved within a valid Chronon
-                                root directory.\n Make sure you have your Zipline python files within the
-                                right directory, and then you can import them to your desired runtime.
-                             """
-            )
+            raise ValueError(f"Could not determine file location for module {module}, {variable}. Ensure file is within Chronon root directory.")

         path_parts = module.split(".")
         if path_parts[0] != obj_type:
             raise ValueError(f"Expected module path to start with '{obj_type}', got {module}")

         chronon_root = get_chronon_root(file)
         team = path_parts[1]
         path = ".".join(path_parts[2:])
         compiled_file = f"{chronon_root}/{OUTPUT_ROOT}/{obj_type}/{team}/{path}.{variable}"
         
+        return cls(module=module, variable=variable, file=file, 
+                  compiled_file=compiled_file, chronon_root=chronon_root)

Also applies to: 18-62
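The factory-method shape suggested above can be reduced to a minimal sketch. The name-derivation here is a hypothetical stand-in for `get_mod_and_var_name_from_gc`; the point is only that `@classmethod` construction keeps the auto-generated `__init__` intact for callers that already know the fields.

```python
from dataclasses import dataclass

@dataclass
class ConfigDetails:
    module: str
    variable: str

    @classmethod
    def from_obj(cls, obj):
        # Stand-in lookup: derive names from the object's type rather
        # than scanning the gc. The dataclass-generated __init__ is
        # untouched, so both construction paths remain available.
        return cls(module=type(obj).__module__, variable=type(obj).__name__)

details = ConfigDetails.from_obj(42)
assert details.module == "builtins" and details.variable == "int"
```

Writing `__init__` by hand inside a `@dataclass`, as the reviewed code does, silently discards the generated one, which is why the factory classmethod is the idiomatic alternative.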

Comment on lines 83 to 105
def backfill(zipline_obj, start_date, end_date, force_recompute=False, plan=False):
    """
    Backfills a Chronon object for a specified date range.

    Args:
        zipline_obj: The Chronon object (GroupBy, Join, StagingQuery) to backfill
        start_date: Start date for the backfill period
        end_date: End date for the backfill period
        force_recompute: If True, recomputes data even if it already exists (default: False)
        plan: If True, only shows execution plan without running backfill (default: False)

    Returns:
        None

    Raises:
        ValueError: If the object cannot be compiled or backfilled
    """
    config_details = ConfigDetails(zipline_obj)
    # module_file, compiled_file, chronon_root = get_target_compiled_file(zipline_obj)
    extract_and_convert(config_details.chronon_root, zipline_obj, config_details.module)
    # TODO: actually kick off backfill for compiled_file
    print("\n\n TODO -- Implement \n\n")


⚠️ Potential issue

Add compilation verification and update ConfigDetails usage

  1. Verify compilation success
  2. Update to use new factory method
 def backfill(zipline_obj, start_date, end_date, force_recompute=False, plan=False):
-    config_details = ConfigDetails(zipline_obj)
+    config_details = ConfigDetails.from_obj(zipline_obj)
-    extract_and_convert(config_details.chronon_root, zipline_obj, config_details.module)
+    if not extract_and_convert(config_details.chronon_root, zipline_obj, config_details.module):
+        raise ValueError("Compilation failed")
+    if not os.path.exists(config_details.compiled_file):
+        raise ValueError("Compiled file not found")

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
api/py/ai/chronon/repo/runner.py (3)

28-32: ⚠️ Potential issue

Add exception chaining to preserve traceback

Include the original exception when raising a new one.

Apply this diff:

-            except StopIteration:
+            except StopIteration as err:
                 valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
-                raise ValueError(
+                raise ValueError(
                     f"Can only run one of {valid_types}, got {type(obj).__name__}"
-                )
+                ) from err
🧰 Tools
🪛 Ruff (0.8.2)

30-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


21-66: 🛠️ Refactor suggestion

Refactor ConfigDetails using a factory method

Convert __init__ to a @classmethod for clarity.

Apply this diff:

 @dataclass
 class ConfigDetails:
     module: str
     variable: str
     file: str
     compiled_file: str
     chronon_root: str

-    def __init__(self, obj):
+    @classmethod
+    def from_obj(cls, obj):

         try:
             obj_type = next(
                 k for k, v in FOLDER_NAME_TO_CLASS.items() if isinstance(obj, v)
             )
         except StopIteration as err:
             valid_types = [cls.__name__ for cls in FOLDER_NAME_TO_CLASS.values()]
             raise ValueError(
                 f"Can only run one of {valid_types}, got {type(obj).__name__}"
             ) from err

         # Rest of the code...

+        return cls(
+            module=module,
+            variable=variable,
+            file=file,
+            compiled_file=compiled_file,
+            chronon_root=chronon_root
+        )
🧰 Tools
🪛 Ruff (0.8.2)

30-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


144-160: ⚠️ Potential issue

Update docstring to match parameters

The backfill method's docstring references zipline_obj, which isn't a parameter.

🧹 Nitpick comments (8)
api/py/ai/chronon/repo/extract_objects.py (3)

Line range hint 25-41: Fix incomplete docstring.

Docstring ends mid-sentence and lacks return type.

 def from_folder(full_path: str, cls: type, log_level=logging.INFO):
     """
-    Recursively consumes a folder, and constructs a map
-    Creates a map of object qualifier to
+    Recursively processes Python files in a folder.
+
+    Args:
+        full_path: Directory path to process
+        cls: Type of objects to extract
+        log_level: Logging level
+
+    Returns:
+        dict: Map of object qualifiers to extracted objects
     """

44-66: Add return type hints.

Missing return type annotation for clarity.

-def from_folderV2(full_path: str, target_file: str, cls: type):
+def from_folderV2(full_path: str, target_file: str, cls: type) -> tuple[dict, dict[str, Exception], Exception | None]:

83-86: Use f-strings for better readability.

Replace .format() with f-string.

-    logger.debug("Loading objects of type {cls} from {file_path}".format(**locals()))
+    logger.debug(f"Loading objects of type {cls} from {file_path}")
cli_readme.md (3)

6-8: Fix usage of 'however' and punctuation

Separate 'however' into new sentences and add commas.

Apply this diff:

-...plan, however, there would still...
+...plan. However, there would still...
-...on, however, it can also be...
+...on. However, it can also be...
🧰 Tools
🪛 LanguageTool

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)


12-12: Correct heading level

Change heading from H3 to H2 for consistency.

Apply this diff:

-### Usage: `zipline plan [OPTIONS]`
+## Usage: `zipline plan [OPTIONS]`
🧰 Tools
🪛 Markdownlint (0.37.0)

12-12: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


15-15: Specify language in code block

Add language specifier for syntax highlighting.

Apply this diff:

-```
+```bash
🧰 Tools
🪛 Markdownlint (0.37.0)

15-15: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

api/py/ai/chronon/repo/compilev2.py (2)

56-59: Simplify log level assignment

Use a ternary operator for brevity.

Apply this diff:

-    if debug:
-        log_level = logging.DEBUG
-    else:
-        log_level = logging.INFO
+    log_level = logging.DEBUG if debug else logging.INFO
🧰 Tools
🪛 Ruff (0.8.2)

56-59: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


71-71: Simplify dictionary key check

Use key in dict instead of key in dict.keys().

Apply this diff:

-            d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS.keys()
+            d for d in os.listdir(chronon_root) if d in FOLDER_NAME_TO_CLASS
🧰 Tools
🪛 Ruff (0.8.2)

71-71: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between bb6c942 and fc7909f.

📒 Files selected for processing (12)
  • api/py/ai/chronon/group_by.py (2 hunks)
  • api/py/ai/chronon/join.py (3 hunks)
  • api/py/ai/chronon/repo/__init__.py (1 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
  • api/py/ai/chronon/repo/extract_objects.py (3 hunks)
  • api/py/ai/chronon/repo/hub_uploader.py (1 hunks)
  • api/py/ai/chronon/repo/runner.py (1 hunks)
  • api/py/ai/chronon/repo/serializer.py (2 hunks)
  • api/py/ai/chronon/utils.py (1 hunks)
  • api/thrift/hub.thrift (1 hunks)
  • cli_readme.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • api/py/ai/chronon/repo/__init__.py
  • api/py/ai/chronon/repo/compile.py
  • api/thrift/hub.thrift
  • api/py/ai/chronon/repo/serializer.py
  • api/py/ai/chronon/utils.py
  • api/py/ai/chronon/group_by.py
  • api/py/ai/chronon/repo/hub_uploader.py
🧰 Additional context used

🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/runner.py

30-32: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

api/py/ai/chronon/repo/compilev2.py

56-59: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


71-71: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


198-198: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (5)
api/py/ai/chronon/repo/extract_objects.py (1)

Line range hint 83-102: Verify module import error handling.

Check if import errors are properly caught and logged.

✅ Verification successful

Import errors are properly handled at caller level

Error handling exists in the calling functions with appropriate logging and error preservation. No changes needed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for potential import error handling gaps
rg -n "importlib\.(machinery|util)" --type py

Length of output: 186


Script:

#!/bin/bash
# Check for error handling around imports
rg -B 2 -A 2 "try.*:|except.*:|import_module" api/py/ai/chronon/repo/extract_objects.py

Length of output: 1132

api/py/ai/chronon/repo/compilev2.py (1)

264-265: Address commented-out code

Decide whether to implement or remove the safe_to_overwrite check.

api/py/ai/chronon/join.py (3)

27-27: LGTM!

Clean import of MethodType for dynamic method binding.


Line range hint 645-656: LGTM!

Join object creation is well-structured with appropriate parameters.


657-664: Rename upload to deploy for consistency.

Previous review comment suggested renaming upload to deploy for consistency.

Apply this diff:

-    join.upload = MethodType(deploy, join)
+    join.deploy = MethodType(deploy, join)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
api/py/ai/chronon/repo/compilev2.py (1)

264-265: ⚠️ Potential issue

Implement the safe_to_overwrite check.

The commented code represents a critical safety check. Either implement it or document why it's safe to remove.

🧹 Nitpick comments (2)
api/py/ai/chronon/repo/compilev2.py (2)

56-59: Simplify log_level assignment.

Use ternary operator for better readability.

-    if debug:
-        log_level = logging.DEBUG
-    else:
-        log_level = logging.INFO
+    log_level = logging.DEBUG if debug else logging.INFO
🧰 Tools
🪛 Ruff (0.8.2)

56-59: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


54-54: Implement ZiplineHub integration.

The TODO comments indicate missing ZiplineHub integration for lineage and schema functionality.

Would you like me to help implement the ZiplineHub integration or create an issue to track this task?

Also applies to: 143-153

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between fc7909f and aa9830b.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/compilev2.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/compilev2.py

56-59: Use ternary operator log_level = logging.DEBUG if debug else logging.INFO instead of if-else-block

Replace if-else-block with log_level = logging.DEBUG if debug else logging.INFO

(SIM108)


71-71: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


198-198: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (2)
api/py/ai/chronon/repo/compilev2.py (2)

1-40: LGTM!

The license, imports, and constants are properly defined.


46-48: ⚠️ Potential issue

Add force_overwrite parameter.

Based on usage in _write_obj, the parameter is required.

 def extract_and_convert(
-    chronon_root, target_object, target_object_file, debug=False, output_root=None
+    chronon_root, target_object, target_object_file, debug=False, output_root=None, force_overwrite=False
 ):

Likely invalid or redundant comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
api/py/ai/chronon/group_by.py (1)

605-613: Consider using a helper function for method binding.

The method binding implementation is correct but could be more maintainable.

-    # Import locally to avoid circular dependency issue
-    from ai.chronon.repo.runner import backfill, deploy, info
-
-    # Attach functions directly to group_by
-    group_by.backfill = MethodType(backfill, group_by)
-    group_by.deploy = MethodType(deploy, group_by)
-    group_by.info = MethodType(info, group_by)
+    def _bind_methods(obj):
+        # Import locally to avoid circular dependency issue
+        from ai.chronon.repo.runner import backfill, deploy, info
+        
+        methods = {'backfill': backfill, 'deploy': deploy, 'info': info}
+        for name, func in methods.items():
+            setattr(obj, name, MethodType(func, obj))
+
+    _bind_methods(group_by)
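For context on what these bindings do: `types.MethodType` wraps a plain function so that calling it on an instance passes that instance as `self`. A standalone sketch with hypothetical names (not the actual Chronon classes):

```python
from types import MethodType

class Job:
    def __init__(self, name: str):
        self.name = name

def backfill(self, start: str, end: str) -> str:
    # `self` is the Job instance this function was bound to
    return f"backfilling {self.name} from {start} to {end}"

job = Job("user_metrics")
# Bind per-instance, mirroring group_by.backfill = MethodType(backfill, group_by)
job.backfill = MethodType(backfill, job)
print(job.backfill("2024-01-01", "2024-01-02"))
# → backfilling user_metrics from 2024-01-01 to 2024-01-02
```

Binding per-instance (rather than defining the method on the class) is what allows the runner functions to be imported lazily, which is how the PR sidesteps the circular-import issue noted in the comment.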
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 12f5d47 and 4238e17.

📒 Files selected for processing (3)
  • api/py/ai/chronon/group_by.py (2 hunks)
  • api/py/ai/chronon/join.py (4 hunks)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/py/ai/chronon/repo/compile.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (5)
api/py/ai/chronon/group_by.py (1)

25-25: LGTM!

The import of MethodType is correctly placed with other imports.

api/py/ai/chronon/join.py (4)

27-27: LGTM!

The import is necessary for the method bindings.


73-79: LGTM!

The condition correctly filters out method references from garbage collector.


Line range hint 653-664: LGTM!

The modification allows method binding before returning the Join instance.


665-672: Use consistent method naming: deploy instead of upload.

Rename join.upload to join.deploy for consistency.

-    join.upload = MethodType(deploy, join)
+    join.deploy = MethodType(deploy, join)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
cli_readme.md (1)

74-74: Fix typo.

-Provides information about a given `Entity`, including upstream/downstream lineage and schema informatio.
+Provides information about a given `Entity`, including upstream/downstream lineage and schema information.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4238e17 and c7b29fc.

📒 Files selected for processing (1)
  • cli_readme.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
cli_readme.md

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)


[grammar] ~21-~21: Did you mean “these”?
Context: ...or the specified entity and date range. This produces computed values for the `Entit...

(THIS_NNS_VB)


[style] ~39-~39: Consider removing “of” to be more concise
Context: .... If run for a Join, it will run for all of the GroupBy`s included in that join, as we...

(ALL_OF_THE)

🪛 Markdownlint (0.37.0)
cli_readme.md

12-12: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


23-23: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


45-45: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


62-62: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


76-76: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


15-15: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


29-29: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


51-51: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


68-68: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (5)
cli_readme.md (5)

6-8: Fix grammatical issues and clarify environment terminology.

-For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan, however, there would still be a column addition reflected in your `upload` plan.
+For example, if you add a column to a `GroupBy` and call `plan`, you'll see that change reflected as a column addition. If you then run `backfill` for the `GroupBy`, then call `plan` again, you would see no change to your `GroupBy` `backfill` plan. However, there would still be a column addition reflected in your `upload` plan.

-`plan` is also always run against a particular environment. By default it is the branch which you are authoring on, however, it can also be set to `prod` in which case it will display the plan of your local changes against the prod environment
+`plan` is always run against a particular environment. By default, it runs against your current development branch. However, it can also be set to `prod`, in which case it will display the plan of your local changes against the production environment
🧰 Tools
🪛 LanguageTool

[typographical] ~6-~6: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...change to your GroupBy backfill plan, however, there would still be a column addition ...

(HOWEVER_SENTENCE)


[uncategorized] ~8-~8: Did you mean: “By default,”?
Context: ...s run against a particular environment. By default it is the branch which you are authorin...

(BY_DEFAULT_COMMA)


[typographical] ~8-~8: The word “however” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...is the branch which you are authoring on, however, it can also be set to prod in which c...

(HOWEVER_SENTENCE)


15-17: Add language specifier to code block.

-```
+```bash
 --branch         the branch against which to compare the local state. Can be set to `prod` or a different `branch_id`. Defaults to the current branch if one is set, else `prod`.

🧰 Tools
🪛 Markdownlint (0.37.0)

15-15: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

31-32: Add date format specifications.

---start               start date for which you want data backfilled for this entity. Defaults to the configured start date.
---end                 end date for which you want data backfilled for this entity. Defaults to today - 2 days.
+--start               start date (YYYY-MM-DD format) for which you want data backfilled. Defaults to the configured start date.
+--end                 end date (YYYY-MM-DD format) for which you want data backfilled. Defaults to today - 2 days.

52-52: Complete the --ds option description.

---ds        The date to use for the batch upload. Defaults to 
+--ds        The date to use for the batch upload. Defaults to two days prior to today (UTC)

62-70: Fix command usage and add examples.

-### Usage: `zipline upload ENTITY_ID [OPTIONS]`
+### Usage: `zipline fetch ENTITY_ID [OPTIONS]`

+Example:
+```bash
+zipline fetch user_metrics --keys '{"user_id": "12345"}'
+```
🧰 Tools
🪛 Markdownlint (0.37.0)

62-62: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


68-68: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


@nikhil-zlai nikhil-zlai left a comment


@varant-zlai varant-zlai merged commit d07aca9 into main Jan 23, 2025
5 checks passed
@varant-zlai varant-zlai deleted the cli_documentation branch January 23, 2025 19:07
@coderabbitai coderabbitai bot mentioned this pull request Feb 4, 2025
12 tasks
@coderabbitai coderabbitai bot mentioned this pull request Mar 3, 2025
4 tasks
@coderabbitai coderabbitai bot mentioned this pull request Apr 16, 2025
4 tasks
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
## Summary

- Initial implementation of Python `compile` -- compiles the whole repo
- Placeholder implementation of `backfill` -- next step will be to wire
up to the orchestrator for implementation
- Documentation for the broader user API surface area and how it should
behave (should maybe move this to pydocs)

Currently, the interaction looks like this:

<img width="729" alt="image"
src="https://github.com/user-attachments/assets/a52cf582-4394-449a-8567-32c0f108d2ed"
/>

Next steps:
- Connect with ZiplineHub for lineage, and backfill execution

## Checklist
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
- [x] Documentation update




<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Documentation**
- Expanded CLI documentation with detailed descriptions of `plan`,
`backfill`, `deploy`, and `fetch` commands.
	- Added comprehensive usage instructions and examples for each command.

- **New Features**
	- Enhanced repository management functions for Chronon objects.
- Added methods for compiling, uploading, and managing configuration
details.
- Introduced new utility functions for module and variable name
retrieval.
- Added functionality to compute and upload differences in a local
repository.
	- New function to convert JSON to Thrift binary format.

- **Improvements**
- Streamlined function signatures in extraction and compilation modules.
	- Added error handling and logging improvements.
	- Enhanced object serialization and diff computation capabilities.
- Dynamic method bindings for `backfill`, `deploy`, and `info` functions
in `GroupBy` and `Join`.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: ezvz <[email protected]>
Co-authored-by: Nikhil Simha <[email protected]>
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
@coderabbitai coderabbitai bot mentioned this pull request May 15, 2025
4 tasks
chewy-zlai pushed a commit that referenced this pull request May 16, 2025