Conversation

@ken-zlai ken-zlai commented Feb 7, 2025

Summary

Changed the backend code to compute only 3 percentiles (p5, p50, p95) to return to the frontend.

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced statistical data processing to consistently handle cases with missing values by using a robust placeholder, ensuring clearer downstream analytics.
    • Adjusted the percentile chart configuration so that the 95th, 50th, and 5th percentiles are accurately rendered, providing more reliable insights for users.
    • Relaxed the null ratio validation in summary data, allowing for a broader acceptance of null values, which may affect drift metric interpretations.
  • New Features

    • Introduced methods for converting percentile strings to index values and filtering percentiles based on user-defined requests, improving data handling and representation.

@ken-zlai ken-zlai requested a review from nikhil-zlai February 7, 2025 17:04
coderabbitai bot commented Feb 7, 2025

Walkthrough

The pull request modifies null handling in the backend and updates percentile data mapping in the frontend. Specifically, the transpose method in PivotUtils.scala now returns a constant value (Constants.magicNullDouble) instead of null when processing null lists. In PercentileLineChart.svelte, the indices for the percentile series (p95, p50, p5) are updated, changing how data is accessed from the data.percentiles array.
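The null-handling change described above can be sketched as follows. This is a minimal illustration reconstructed from the PR summary, not the actual implementation: the real `transpose` signature and the value of `Constants.magicNullDouble` live in the repository, and `magicNull` here merely stands in for that constant.

```scala
// Minimal sketch of the null handling described above: when a percentile
// list is missing or too short, emit a sentinel value instead of null so
// downstream consumers never have to null-check.
def transposeColumn(lists: Seq[Option[Seq[Double]]], i: Int, magicNull: Double): Seq[Double] =
  lists.map {
    case Some(vals) if i < vals.length => vals(i)     // normal path: pick the i-th value
    case _                             => magicNull   // sentinel instead of null
  }
```

The design point is that a constant sentinel keeps the transposed arrays rectangular, so the frontend can index them uniformly without guarding against nulls.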

Changes

  • online/.../PivotUtils.scala: Modified transpose to return Constants.magicNullDouble instead of null when encountering null lists.
  • online/.../DriftStore.scala: Added percentileToIndex and filterPercentiles methods for handling percentiles in TileSummary.
  • spark/.../DriftTest.scala: Adjusted the null ratio validation from 0.1 to 0.2 and modified the breaks in the time series conversion.
  • frontend/.../PercentileLineChart.svelte: Updated the percentile series indices from [19, 10, 1] to [2, 1, 0], altering the data mapping in the chart.
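The two new DriftStore helpers can be sketched as below. This is a hypothetical reconstruction from the summary above, assuming the backend stores 21 percentile values (p0, p5, …, p100 in 5% steps), which makes "p5" map to index 1, "p50" to 10, and "p95" to 19; the names mirror the PR but the bodies are assumptions.

```scala
object PercentileFilterSketch {
  // "pN" → index into the assumed 21-element (p0..p100, 5% steps) array.
  def percentileToIndex(p: String): Int =
    p.stripPrefix("p").toInt / 5

  // Keep only the requested percentiles from the full list.
  def filterPercentiles(all: Seq[Double],
                        requested: Seq[String] = Seq("p5", "p50", "p95")): Seq[Double] =
    requested.map(percentileToIndex).map(all)
}
```

With only indices 1, 10, and 19 surviving the backend filter, the frontend's series indices naturally become [2, 1, 0] for [p95, p50, p5], which matches the PercentileLineChart.svelte change listed above.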

Possibly related PRs

  • zipline-ai/chronon#279: Directly related to the modification of the transpose method to use Constants.magicNullDouble.
  • zipline-ai/chronon#293: Addresses handling of null values in data processing, similar to changes in PivotUtils.scala.
  • zipline-ai/chronon#347: Enhances null handling in the transpose method, indicating a direct relationship with the current PR.

Suggested reviewers

  • nikhil-zlai
  • sean-zlai

Poem

In lines of code, a twist so neat,
Nulls transformed with magic beat ✨,
Charts now sing a fresher tune,
Percentiles re-indexed, set to swoon,
Changes weave a joyful code retreat!

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between dd234ef and 7110bf5.

📒 Files selected for processing (1)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: streaming_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: frontend_tests
  • GitHub Check: enforce_triggered_workflows

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@ken-zlai ken-zlai mentioned this pull request Feb 7, 2025

ken-zlai commented Feb 7, 2025

# Conflicts:
#	hub/src/main/scala/ai/chronon/hub/handlers/TimeSeriesHandler.scala
#	hub/src/test/scala/ai/chronon/hub/handlers/TimeSeriesHandlerTest.scala
@ken-zlai ken-zlai removed the request for review from nikhil-zlai February 7, 2025 23:42
@ken-zlai ken-zlai changed the title optimize: only return 3 percentiles from backend [wip] optimize: only return 3 percentiles from backend Feb 7, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (1)

137-138: Consider filtering length percentiles.

lengthPercentiles and stringLengthPercentiles could also benefit from filtering.

-.setLengthPercentiles(iterator(_.getLengthPercentiles).toArray |> transpose)
-.setStringLengthPercentiles(iterator(_.getStringLengthPercentiles).toArray |> transpose)
+.setLengthPercentiles(iterator(_.getLengthPercentiles).toArray |> filterPercentiles |> transpose)
+.setStringLengthPercentiles(iterator(_.getStringLengthPercentiles).toArray |> filterPercentiles |> transpose)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between f862e81 and a6a62e0.

📒 Files selected for processing (1)
  • online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: non_spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (2)

48-48: LGTM!

Using Constants.magicNullDouble for consistency.


131-131: LGTM!

Pipeline modification correctly filters percentiles before transpose.

@ken-zlai ken-zlai changed the title [wip] optimize: only return 3 percentiles from backend optimize: only return 3 percentiles from backend Feb 8, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (1)

54-61: Extract percentile generation into a helper method.

Reduce duplication in test data setup.

+  private def generatePercentiles(offset: Double = 0.0): java.util.List[java.lang.Double] = {
+    (0 to 20).map(i => Double.box(i * 1.0 + offset)).asJava
+  }
+
-    val percentiles1 = (0 to 20).map(i => (i * 1.0)).map(Double.box).asJava
+    val percentiles1 = generatePercentiles()
-    val percentiles2 = (0 to 20).map(i => (i * 1.0 + 21)).map(Double.box).asJava
+    val percentiles2 = generatePercentiles(21.0)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 973de0b and bef3a20.

📒 Files selected for processing (1)
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: non_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (2)

25-27: LGTM! Test data and assertions match the new percentile filtering.

Also applies to: 34-35


177-186: LGTM! Comprehensive null handling test.

Also applies to: 193-197

@ken-zlai
Copy link
Contributor Author

ken-zlai commented Feb 8, 2025

@nikhil-zlai After looking into parameterizing the percentiles end-to-end, I found it would require significant changes. Given the time and risk, I suggest postponing this until we need more than p5, p50, and p95.

For now, we’ll filter the percentiles in filterPercentiles. When we have cycles, we can implement this using this ticket.

@ken-zlai ken-zlai requested a review from nikhil-zlai February 8, 2025 00:56

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (1)

1-419: Consider adding edge case tests.

Add test cases for:

  • Empty percentiles list
  • Single null percentile list
  • List with all null values
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between bef3a20 and d96b2f4.

📒 Files selected for processing (1)
  • online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: non_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (3)

24-26: LGTM! Test case properly validates the filtering of percentiles.

The test correctly verifies that only p5, p50, and p95 are returned from a full list of 21 percentiles.

Also applies to: 33-34


53-55: LGTM! Test case thoroughly validates multiple entries.

The test effectively verifies the filtering and transposition of percentiles across multiple entries.

Also applies to: 58-60, 68-75


180-183: LGTM! Test case properly handles null values.

The test effectively verifies that null values are correctly replaced with magicNullDouble after filtering.

Also applies to: 186-189, 197-201


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)

66-75: Extract percentile indices as constants.

Define the indices (1, 10, 19) as named constants to improve maintainability.

+  private val P5_INDEX = 1
+  private val P50_INDEX = 10
+  private val P95_INDEX = 19
   private def filterPercentiles(summary: TileSummary): TileSummary = {
     val filtered = new TileSummary(summary)
     if (summary.getPercentiles != null) {
       val filteredPercentiles = new java.util.ArrayList[java.lang.Double]()
-      // Keep only 5%, 50%, 95% percentiles (indices 1, 10, 19)
-      Seq(1, 10, 19).foreach(i => filteredPercentiles.add(summary.getPercentiles.get(i)))
+      Seq(P5_INDEX, P50_INDEX, P95_INDEX).foreach(i => filteredPercentiles.add(summary.getPercentiles.get(i)))
       filtered.setPercentiles(filteredPercentiles)
     }
     filtered
   }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 0594c6e and cb8e4f5.

📒 Files selected for processing (2)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: streaming_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: groupby_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: non_spark_tests
  • GitHub Check: spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)

83-90: LGTM!

The implementation correctly filters percentiles before pivoting, aligning with the optimization goal.

spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (2)

147-147: Verify the increased null ratio threshold.

The null ratio threshold has been increased from 0.1 to 0.2. Please confirm if this relaxation is intentional.


221-221: LGTM!

The breaks correctly align with the filtered percentiles (5%, 50%, 95%).


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between cb8e4f5 and 48c3aaf.

📒 Files selected for processing (2)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (2 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: streaming_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)

91-98: LGTM!

The implementation efficiently filters percentiles before pivoting, aligning with the optimization goal.

spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (2)

221-221: LGTM!

Fixed sequence of breaks aligns with the optimization goal.


235-281: LGTM!

Comprehensive test coverage for both methods, including edge cases and error conditions.

@ken-zlai ken-zlai merged commit 7ada9c0 into main Feb 19, 2025
13 checks passed
@ken-zlai ken-zlai deleted the ken/reduce-percentiles-from-backend branch February 19, 2025 22:47

  summaryTotals should be > 0
- summaryNulls.toDouble / summaryTotals.toDouble should be < 0.1
+ summaryNulls.toDouble / summaryTotals.toDouble should be < 0.2


This assertion has become flaky since the PR landed; can we revert or forward-fix?

kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
## Summary
Changed the backend code to only compute 3 percentiles (p5, p50, p95)
for returning to the frontend.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes**
- Enhanced statistical data processing to consistently handle cases with
missing values by using a robust placeholder, ensuring clearer
downstream analytics.
- Adjusted the percentile chart configuration so that the 95th, 50th,
and 5th percentiles are accurately rendered, providing more reliable
insights for users.
- Relaxed the null ratio validation in summary data, allowing for a
broader acceptance of null values, which may affect drift metric
interpretations.

- **New Features**
- Introduced methods for converting percentile strings to index values
and filtering percentiles based on user-defined requests, improving data
handling and representation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 16, 2025