optimize: only return 3 percentiles from backend #346
Conversation
Walkthrough
The pull request modifies null handling in the backend and updates percentile data mapping in the frontend.
```
# Conflicts:
#	hub/src/main/scala/ai/chronon/hub/handlers/TimeSeriesHandler.scala
#	hub/src/test/scala/ai/chronon/hub/handlers/TimeSeriesHandlerTest.scala
```
Actionable comments posted: 1
🧹 Nitpick comments (1)
online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (1)
137-138: Consider filtering length percentiles. lengthPercentiles and stringLengthPercentiles could also benefit from filtering.
```diff
-      .setLengthPercentiles(iterator(_.getLengthPercentiles).toArray |> transpose)
-      .setStringLengthPercentiles(iterator(_.getStringLengthPercentiles).toArray |> transpose)
+      .setLengthPercentiles(iterator(_.getLengthPercentiles).toArray |> filterPercentiles |> transpose)
+      .setStringLengthPercentiles(iterator(_.getStringLengthPercentiles).toArray |> filterPercentiles |> transpose)
```
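An aside for readers puzzled by the |> in these chains: it is not built-in Scala syntax, so the codebase presumably defines a value pipe along these lines (a sketch under that assumption; the repo's actual definition may differ):

```scala
object Pipe {
  // Hypothetical value-pipe operator: x |> f reads as f(x). The repo's actual
  // definition of |> may differ, but the reviewed chains behave this way.
  implicit class PipeOps[A](private val self: A) extends AnyVal {
    def |>[B](f: A => B): B = f(self)
  }
}
```

With import Pipe._, the chain above desugars to transpose(filterPercentiles(iterator(_.getLengthPercentiles).toArray)).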
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
- online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: non_spark_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
online/src/main/scala/ai/chronon/online/stats/PivotUtils.scala (2)
48-48: LGTM! Using Constants.magicNullDouble for consistency.
131-131: LGTM! Pipeline modification correctly filters percentiles before transpose.
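For context, a minimal sketch of what that filter-then-transpose stage could look like, assuming each tile carries 21 equally spaced percentiles and treating the null placeholder as a parameter (the PR uses Constants.magicNullDouble; the signature and names here are illustrative, not the repo's exact code):

```scala
import java.{lang => jl, util => ju}

object PercentileFilterSketch {
  // Keep p5, p50, p95 out of 21 equally spaced percentiles (p0, p5, ..., p100).
  private val keptIndices = Seq(1, 10, 19)

  // nullPlaceholder stands in for Constants.magicNullDouble from the PR.
  def filterPercentiles(rows: Array[ju.List[jl.Double]],
                        nullPlaceholder: jl.Double): Array[ju.List[jl.Double]] =
    rows.map { row =>
      val out: ju.List[jl.Double] = new ju.ArrayList[jl.Double](keptIndices.size)
      keptIndices.foreach { i =>
        val v = if (row == null || row.get(i) == null) nullPlaceholder else row.get(i)
        out.add(v)
      }
      out
    }
}
```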
Actionable comments posted: 0
🧹 Nitpick comments (1)
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (1)
54-61: Extract percentile generation into a helper method. Reduce duplication in test data setup.
```diff
+  private def generatePercentiles(offset: Double = 0.0): java.util.List[java.lang.Double] = {
+    (0 to 20).map(i => Double.box(i * 1.0 + offset)).asJava
+  }
+
-  val percentiles1 = (0 to 20).map(i => (i * 1.0)).map(Double.box).asJava
+  val percentiles1 = generatePercentiles()
-  val percentiles2 = (0 to 20).map(i => (i * 1.0 + 21)).map(Double.box).asJava
+  val percentiles2 = generatePercentiles(21.0)
```
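A side benefit of the suggested helper: the 21.0 offset keeps the second entry's values disjoint from the first, so a transposition mix-up between entries surfaces as an obvious assertion failure rather than a silent overlap.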
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
- online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: non_spark_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (2)
25-27: LGTM! Test data and assertions match the new percentile filtering. Also applies to: 34-35
177-186: LGTM! Comprehensive null handling test. Also applies to: 193-197
@nikhil-zlai After looking into parameterizing the percentiles end-to-end, I found it requires significant changes. Given the time and risk, I suggest postponing that until we need more than p5, p50, and p95. For now, we’ll filter the percentiles in DriftStore.
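For reference, the index arithmetic behind that filtering, assuming the default 21 equally spaced percentiles (p0 through p100 in 5% steps); the helper name is illustrative, echoing the percentile-string-to-index conversion mentioned in the release notes at the end of this page:

```scala
// Assumes 21 equally spaced percentiles: p0, p5, ..., p100.
// "p5" -> index 1, "p50" -> index 10, "p95" -> index 19.
def percentileToIndex(p: String): Int = {
  require(p.startsWith("p"), s"expected a percentile label like p5, got $p")
  val pct = p.stripPrefix("p").toInt
  require(pct >= 0 && pct <= 100 && pct % 5 == 0, s"unsupported percentile: $p")
  pct / 5
}
```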
Actionable comments posted: 0
🧹 Nitpick comments (1)
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (1)
1-419: Consider adding edge case tests. Add test cases for the following (sketched after the list):
- Empty percentiles list
- Single null percentile list
- List with all null values
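A sketch of inputs for those cases, using the TileSummary accessors shown in the DriftStore review below; the expected outputs depend on PivotUtils' actual contract, so assertions are left as TODOs rather than guessed:

```scala
import java.{lang => jl, util => ju}

// Edge-case percentile lists for TileSummary.setPercentiles (names illustrative).
val emptyList = new ju.ArrayList[jl.Double]()   // empty percentiles list

val singleNull = new ju.ArrayList[jl.Double]()  // a single null entry
singleNull.add(null)

val allNulls = new ju.ArrayList[jl.Double]()    // every entry null
(0 to 20).foreach(_ => allNulls.add(null))

// TODO: feed each through the pivot path and assert it degrades to the
// magicNullDouble placeholder rather than throwing.
```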
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
- online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: non_spark_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
online/src/test/scala/ai/chronon/online/test/stats/PivotUtilsTest.scala (3)
24-26: LGTM! Test case properly validates the filtering of percentiles. The test correctly verifies that only p5, p50, and p95 are returned from a full list of 21 percentiles.
Also applies to: 33-34
53-55: LGTM! Test case thoroughly validates multiple entries. The test effectively verifies the filtering and transposition of percentiles across multiple entries.
Also applies to: 58-60, 68-75
180-183: LGTM! Test case properly handles null values. The test effectively verifies that null values are correctly replaced with magicNullDouble after filtering. Also applies to: 186-189, 197-201
Actionable comments posted: 0
🧹 Nitpick comments (1)
online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)
66-75: Extract percentile indices as constants. Define the indices (1, 10, 19) as named constants to improve maintainability.
```diff
+  private val P5_INDEX = 1
+  private val P50_INDEX = 10
+  private val P95_INDEX = 19
+
   private def filterPercentiles(summary: TileSummary): TileSummary = {
     val filtered = new TileSummary(summary)
     if (summary.getPercentiles != null) {
       val filteredPercentiles = new java.util.ArrayList[java.lang.Double]()
-      // Keep only 5%, 50%, 95% percentiles (indices 1, 10, 19)
-      Seq(1, 10, 19).foreach(i => filteredPercentiles.add(summary.getPercentiles.get(i)))
+      Seq(P5_INDEX, P50_INDEX, P95_INDEX).foreach(i => filteredPercentiles.add(summary.getPercentiles.get(i)))
       filtered.setPercentiles(filteredPercentiles)
     }
     filtered
   }
```
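The magic numbers are safe to name because the summaries carry 21 equally spaced percentiles (p0 through p100 in 5% steps, matching the 21-value lists in the tests above), so p5, p50, and p95 land at indices 1, 10, and 19; named constants surface that assumption at the use site.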
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
- online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (2 hunks)
- spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: streaming_tests
- GitHub Check: analyzer_tests
- GitHub Check: groupby_tests
- GitHub Check: join_tests
- GitHub Check: fetcher_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: non_spark_tests
- GitHub Check: spark_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)
83-90: LGTM! The implementation correctly filters percentiles before pivoting, aligning with the optimization goal.
spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (2)
147-147: Verify the increased null ratio threshold. The null ratio threshold has been increased from 0.1 to 0.2. Please confirm if this relaxation is intentional.
221-221: LGTM! The breaks correctly align with the filtered percentiles (5%, 50%, 95%).
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
- online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (2 hunks)
- spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: streaming_tests
- GitHub Check: join_tests
- GitHub Check: fetcher_tests
- GitHub Check: groupby_tests
- GitHub Check: analyzer_tests
- GitHub Check: non_spark_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: spark_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (3)
online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)
91-98: LGTM! The implementation efficiently filters percentiles before pivoting, aligning with the optimization goal.
spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (2)
221-221: LGTM! Fixed sequence of breaks aligns with the optimization goal.
235-281: LGTM! Comprehensive test coverage for both methods, including edge cases and error conditions.
```diff
   summaryTotals should be > 0
-  summaryNulls.toDouble / summaryTotals.toDouble should be < 0.1
+  summaryNulls.toDouble / summaryTotals.toDouble should be < 0.2
```
This has become flaky since the PR - can we revert or forward fix?
Summary
Changed the backend code to only compute 3 percentiles (p5, p50, p95) for returning to the frontend.

Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

Summary by CodeRabbit
Bug Fixes
- Enhanced statistical data processing to consistently handle cases with missing values by using a robust placeholder, ensuring clearer downstream analytics.
- Adjusted the percentile chart configuration so that the 95th, 50th, and 5th percentiles are accurately rendered, providing more reliable insights for users.
- Relaxed the null ratio validation in summary data, allowing for a broader acceptance of null values, which may affect drift metric interpretations.

New Features
- Introduced methods for converting percentile strings to index values and filtering percentiles based on user-defined requests, improving data handling and representation.