Conversation

@ken-zlai ken-zlai commented Feb 7, 2025

Summary

Changed the backend code to compute only 3 percentiles (p5, p50, p95) to return to the frontend.

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced statistical data processing to consistently handle cases with missing values by using a robust placeholder, ensuring clearer downstream analytics.
    • Adjusted the percentile chart configuration so that the 95th, 50th, and 5th percentiles are accurately rendered, providing more reliable insights for users.
    • Relaxed the null ratio validation in summary data, allowing for a broader acceptance of null values, which may affect drift metric interpretations.
  • New Features

    • Introduced methods for converting percentile strings to index values and filtering percentiles based on user-defined requests, improving data handling and representation.

@ken-zlai ken-zlai requested a review from nikhil-zlai February 7, 2025 17:04
coderabbitai bot commented Feb 7, 2025

Walkthrough

The pull request modifies null handling in the backend and updates percentile data mapping in the frontend. Specifically, the transpose method in PivotUtils.scala now returns a constant value (Constants.magicNullDouble) instead of null when processing null lists. In PercentileLineChart.svelte, the indices for the percentile series (p95, p50, p5) are updated, changing how data is accessed from the data.percentiles array.
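The null-handling change described above can be sketched as follows. This is a minimal illustration reconstructed from the PR summary, not the actual implementation: the real `transpose` signature and the value of `Constants.magicNullDouble` live in the repository, and `magicNull` here merely stands in for that constant.

```scala
// Minimal sketch of the null handling described above: when a percentile
// list is missing or too short, emit a sentinel value instead of null so
// downstream consumers never have to null-check.
def transposeColumn(lists: Seq[Option[Seq[Double]]], i: Int, magicNull: Double): Seq[Double] =
  lists.map {
    case Some(vals) if i < vals.length => vals(i)     // normal path: pick the i-th value
    case _                             => magicNull   // sentinel instead of null
  }
```

The design point is that a constant sentinel keeps the transposed arrays rectangular, so the frontend can index them uniformly without guarding against nulls.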

Changes

  • online/.../PivotUtils.scala: Modified transpose to return Constants.magicNullDouble instead of null when encountering null lists.
  • online/.../DriftStore.scala: Added percentileToIndex and filterPercentiles methods for handling percentiles in TileSummary.
  • spark/.../DriftTest.scala: Adjusted the null ratio validation from 0.1 to 0.2 and modified the breaks in the time series conversion.
  • frontend/.../PercentileLineChart.svelte: Updated the percentile series indices from [19, 10, 1] to [2, 1, 0], altering the data mapping in the chart.
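The two new DriftStore helpers can be sketched as below. This is a hypothetical reconstruction from the summary above, assuming the backend stores 21 percentile values (p0, p5, …, p100 in 5% steps), which makes "p5" map to index 1, "p50" to 10, and "p95" to 19; the names mirror the PR but the bodies are assumptions.

```scala
object PercentileFilterSketch {
  // "pN" → index into the assumed 21-element (p0..p100, 5% steps) array.
  def percentileToIndex(p: String): Int =
    p.stripPrefix("p").toInt / 5

  // Keep only the requested percentiles from the full list.
  def filterPercentiles(all: Seq[Double],
                        requested: Seq[String] = Seq("p5", "p50", "p95")): Seq[Double] =
    requested.map(percentileToIndex).map(all)
}
```

With only indices 1, 10, and 19 surviving the backend filter, the frontend's series indices naturally become [2, 1, 0] for [p95, p50, p5], which matches the PercentileLineChart.svelte change listed above.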

Possibly related PRs

  • zipline-ai/chronon#279: Directly related to the modification of the transpose method to use Constants.magicNullDouble.
  • zipline-ai/chronon#293: Addresses handling of null values in data processing, similar to changes in PivotUtils.scala.
  • zipline-ai/chronon#347: Enhances null handling in the transpose method, indicating a direct relationship with the current PR.

Suggested reviewers

  • nikhil-zlai
  • sean-zlai

Poem

In lines of code, a twist so neat,
Nulls transformed with magic beat ✨,
Charts now sing a fresher tune,
Percentiles re-indexed, set to swoon,
Changes weave a joyful code retreat!

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between dd234ef and 7110bf5.

📒 Files selected for processing (1)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: streaming_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: frontend_tests
  • GitHub Check: enforce_triggered_workflows

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@ken-zlai ken-zlai mentioned this pull request Feb 7, 2025

ken-zlai commented Feb 7, 2025

# Conflicts:
#	hub/src/main/scala/ai/chronon/hub/handlers/TimeSeriesHandler.scala
#	hub/src/test/scala/ai/chronon/hub/handlers/TimeSeriesHandlerTest.scala
@ken-zlai ken-zlai removed the request for review from nikhil-zlai February 7, 2025 23:42
@ken-zlai ken-zlai changed the title optimize: only return 3 percentiles from backend [wip] optimize: only return 3 percentiles from backend Feb 7, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (1)

137-138: Consider filtering length percentiles.

lengthPercentiles and stringLengthPercentiles could also benefit from filtering.

-.setLengthPercentiles(iterator(_.getLengthPercentiles).toArray |> transpose)
-.setStringLengthPercentiles(iterator(_.getStringLengthPercentiles).toArray |> transpose)
+.setLengthPercentiles(iterator(_.getLengthPercentiles).toArray |> filterPercentiles |> transpose)
+.setStringLengthPercentiles(iterator(_.getStringLengthPercentiles).toArray |> filterPercentiles |> transpose)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between f862e81 and a6a62e0.

📒 Files selected for processing (1)
  • online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: non_spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (2)

48-48: LGTM!

Using Constants.magicNullDouble for consistency.


131-131: LGTM!

Pipeline modification correctly filters percentiles before transpose.

@ken-zlai ken-zlai changed the title [wip] optimize: only return 3 percentiles from backend optimize: only return 3 percentiles from backend Feb 8, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (1)

54-61: Extract percentile generation into a helper method.

Reduce duplication in test data setup.

+  private def generatePercentiles(offset: Double = 0.0): java.util.List[java.lang.Double] = {
+    (0 to 20).map(i => Double.box(i * 1.0 + offset)).asJava
+  }
+
-    val percentiles1 = (0 to 20).map(i => (i * 1.0)).map(Double.box).asJava
+    val percentiles1 = generatePercentiles()
-    val percentiles2 = (0 to 20).map(i => (i * 1.0 + 21)).map(Double.box).asJava
+    val percentiles2 = generatePercentiles(21.0)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 973de0b and bef3a20.

📒 Files selected for processing (1)
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: non_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (2)

25-27: LGTM! Test data and assertions match the new percentile filtering.

Also applies to: 34-35


177-186: LGTM! Comprehensive null handling test.

Also applies to: 193-197

@ken-zlai
Copy link
Contributor Author

ken-zlai commented Feb 8, 2025

@nikhil-zlai After looking into parameterizing the percentiles end-to-end, I found it would require significant changes. Given the time and risk, I suggest postponing this until we need more than p5, p50, and p95.

For now, we’ll filter the percentiles in filterPercentiles. When we have cycles, we can implement this using this ticket.

@ken-zlai ken-zlai requested a review from nikhil-zlai February 8, 2025 00:56

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (1)

1-419: Consider adding edge case tests.

Add test cases for:

  • Empty percentiles list
  • Single null percentile list
  • List with all null values
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between bef3a20 and d96b2f4.

📒 Files selected for processing (1)
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: non_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (3)

24-26: LGTM! Test case properly validates the filtering of percentiles.

The test correctly verifies that only p5, p50, and p95 are returned from a full list of 21 percentiles.

Also applies to: 33-34


53-55: LGTM! Test case thoroughly validates multiple entries.

The test effectively verifies the filtering and transposition of percentiles across multiple entries.

Also applies to: 58-60, 68-75


180-183: LGTM! Test case properly handles null values.

The test effectively verifies that null values are correctly replaced with magicNullDouble after filtering.

Also applies to: 186-189, 197-201


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)

66-75: Extract percentile indices as constants.

Define the indices (1, 10, 19) as named constants to improve maintainability.

+  private val P5_INDEX = 1
+  private val P50_INDEX = 10
+  private val P95_INDEX = 19
   private def filterPercentiles(summary: TileSummary): TileSummary = {
     val filtered = new TileSummary(summary)
     if (summary.getPercentiles != null) {
       val filteredPercentiles = new java.util.ArrayList[java.lang.Double]()
-      // Keep only 5%, 50%, 95% percentiles (indices 1, 10, 19)
-      Seq(1, 10, 19).foreach(i => filteredPercentiles.add(summary.getPercentiles.get(i)))
+      Seq(P5_INDEX, P50_INDEX, P95_INDEX).foreach(i => filteredPercentiles.add(summary.getPercentiles.get(i)))
       filtered.setPercentiles(filteredPercentiles)
     }
     filtered
   }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 0594c6e and cb8e4f5.

📒 Files selected for processing (2)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: streaming_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: groupby_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: non_spark_tests
  • GitHub Check: spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)

83-90: LGTM!

The implementation correctly filters percentiles before pivoting, aligning with the optimization goal.

spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (2)

147-147: Verify the increased null ratio threshold.

The null ratio threshold has been increased from 0.1 to 0.2. Please confirm if this relaxation is intentional.


221-221: LGTM!

The breaks correctly align with the filtered percentiles (5%, 50%, 95%).


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between cb8e4f5 and 48c3aaf.

📒 Files selected for processing (2)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: streaming_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)

91-98: LGTM!

The implementation efficiently filters percentiles before pivoting, aligning with the optimization goal.

spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (2)

221-221: LGTM!

Fixed sequence of breaks aligns with the optimization goal.


235-281: LGTM!

Comprehensive test coverage for both methods, including edge cases and error conditions.

@ken-zlai ken-zlai merged commit 7ada9c0 into main Feb 19, 2025
13 checks passed
@ken-zlai ken-zlai deleted the ken/reduce-percentiles-from-backend branch February 19, 2025 22:47

  summaryTotals should be > 0
- summaryNulls.toDouble / summaryTotals.toDouble should be < 0.1
+ summaryNulls.toDouble / summaryTotals.toDouble should be < 0.2


This assertion has become flaky since the PR landed; can we revert or forward-fix?

kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
## Summary
Changed the backend code to only compute 3 percentiles (p5, p50, p95)
for returning to the frontend.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes**
- Enhanced statistical data processing to consistently handle cases with
missing values by using a robust placeholder, ensuring clearer
downstream analytics.
- Adjusted the percentile chart configuration so that the 95th, 50th,
and 5th percentiles are accurately rendered, providing more reliable
insights for users.
- Relaxed the null ratio validation in summary data, allowing for a
broader acceptance of null values, which may affect drift metric
interpretations.

- **New Features**
- Introduced methods for converting percentile strings to index values
and filtering percentiles based on user-defined requests, improving data
handling and representation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 16, 2025