
Conversation


@varant-zlai varant-zlai commented Apr 12, 2025

Summary

Modify label join behavior: when temporal accuracy is configured, execute the temporalEvents aggregator so labels are point-in-time correct (PITC). Also add support for sub-day windows.

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit


  • New Features
    • Enhanced temporal data handling with support for multiple time windows, enabling dynamic processing of labeled events.
    • Introduced refined grouping and filtering capabilities for more accurate time-based event computations.
    • Added a method to shift time ranges in milliseconds for improved temporal adjustments.
    • New documentation sections on debugging unit tests and details regarding tailHops behavior.
  • Refactor
    • Updated join processing logic to seamlessly handle various data inputs and event types, resulting in improved flexibility and performance in temporal computations.
  • Tests
    • Enhanced test suite with new temporal dimensions and refined naming conventions in SQL queries.
    • Added a new test case for validating temporal label joins.


coderabbitai bot commented Apr 12, 2025

Walkthrough

The changes enhance the LabelJoinV2 module by introducing a new windows field in the LabelPartOutputInfo case class to store a sequence of api.Window objects. The compute method now iterates over multiple right DataFrames to handle various accuracy types. A dedicated method, computeTemporalLabelJoinPart, manages temporal label joins by adjusting timestamps based on each window. Additionally, new methods genGroupBy and filterGroupByWindows have been added to handle grouping and filtering of windows during computations.
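The grouping/filtering described above can be sketched as follows. `Window`, `Aggregation`, and `filterGroupByWindows` here are simplified stand-ins for the actual Chronon types, not the real API:

```scala
// Simplified stand-ins for the Chronon API objects (illustrative only).
case class Window(length: Int, unit: String)
case class Aggregation(inputColumn: String, windows: Seq[Window])

// Keep only the aggregation parts whose window is in the requested set,
// dropping aggregations that end up with no windows at all.
def filterGroupByWindows(aggs: Seq[Aggregation], keep: Set[Window]): Seq[Aggregation] =
  aggs.flatMap { agg =>
    val kept = agg.windows.filter(keep.contains)
    if (kept.isEmpty) None else Some(agg.copy(windows = kept))
  }

val aggs = Seq(
  Aggregation("time_spent_ms", Seq(Window(2, "HOURS"), Window(1, "DAYS"))),
  Aggregation("clicks", Seq(Window(7, "DAYS")))
)

// Only time_spent_ms survives, restricted to its 2-hour window.
val filtered = filterGroupByWindows(aggs, Set(Window(2, "HOURS")))
println(filtered)
```

This mirrors the intent of limiting each temporal computation pass to the windows it actually needs.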

Changes

| File | Change Summary |
| --- | --- |
| spark/.../LabelJoinV2.scala | Added `windows` field to `LabelPartOutputInfo`; modified `compute` to iterate over multiple right DataFrames; added `computeTemporalLabelJoinPart`; introduced `genGroupBy` and `filterGroupByWindows` for window grouping/filtering. |
| api/.../DataRange.scala | Added `shiftMillis` method to `PartitionRange` for time adjustments in milliseconds. |
| docs/source/authoring_features/Labels.md | Added section on tailHops behavior related to inference timestamps. |
| docs/source/dev/devnotes.md | Added "Debugging Unit Tests with Spark SQL" section detailing steps for debugging Spark unit tests. |
| spark/.../LabelJoinV2Test.scala | Added variables for partition specifications; added unit_test prefixes to SQL queries; added a test case for temporal label parts. |
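The `shiftMillis` addition to `PartitionRange` can be illustrated with a minimal stand-in. The real method works through the partition spec on partition strings; `TimeRange` below is hypothetical:

```scala
// Hypothetical stand-in for a millisecond-shifted time range; the actual
// PartitionRange.shiftMillis converts partition strings via the partition spec.
case class TimeRange(startMillis: Long, endMillis: Long) {
  def shiftMillis(offset: Long): TimeRange =
    TimeRange(startMillis + offset, endMillis + offset)
}

val hourMillis = 60L * 60L * 1000L
val dayMillis  = 24L * hourMillis

val range   = TimeRange(0L, dayMillis - 1)          // one full day
val shifted = range.shiftMillis(-2L * hourMillis)   // shift back by a 2-hour window
println(shifted.startMillis) // starts two hours before the original midnight
```

A signed millisecond offset is what makes sub-day windows expressible, where day-granularity shifting would round them away.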

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant LDF as Left DataFrame
    participant Comp as compute method
    participant Temp as computeTemporalLabelJoinPart
    participant Group as GroupBy Utils
    LDF->>Comp: Provide input data & window info
    Comp->>Temp: Invoke temporal label join
    loop For each window
        Temp->>Group: Generate groupBy configuration
        Group-->>Temp: Return filtered grouping
    end
    Temp-->>Comp: Return computed DataFrame(s)
```

Possibly related PRs

  • Simple LabelJoin flow #546: Introduced the initial LabelJoinV2 class and basic label join flow; this PR extends it with temporal window support and enhanced join logic.

Suggested reviewers

  • nikhil-zlai
  • piyush-zlai

Poem

In the realm of code so keen,
Windows open where data's seen.
Joins grow temporal, crisp and clear,
Grouped and filtered with a cheer.
CodeRabbit crafts with subtle art,
Bringing logic and time to heart.

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.




```scala
// Cache the left DF because it's used multiple times in offsetting in the case that there are Temporal Events
if (joinOutputInfo.labelPartOutputInfos.exists(_.labelPart.groupBy.dataModel == Accuracy.TEMPORAL)) {
joinBaseDf.cache()
```

@tchow-zlai tchow-zlai Apr 12, 2025


assign the cached result to a val and use it later. Otherwise I think you may not be referencing the cached dataframe down below.
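The suggestion can be sketched with a toy stand-in. Spark's `DataFrame.cache()` does return the same object, but binding the result to a val makes every later reference unambiguous; `Frame` below is illustrative, not the Spark API:

```scala
// Toy stand-in for a DataFrame; illustrates the val-binding pattern only.
final class Frame(val name: String) {
  var isCached = false
  def cache(): Frame = { isCached = true; this } // mirrors Spark: cache() returns this
}

val joinBaseDf = new Frame("join_base")
val hasTemporalParts = true // stand-in for the Accuracy.TEMPORAL existence check above

// Suggested pattern: keep the (possibly cached) frame in a val and use it below.
val cachedBaseDf = if (hasTemporalParts) joinBaseDf.cache() else joinBaseDf
println(cachedBaseDf.isCached)
```

Because `cache()` returns the receiver, the val also guards against a later refactor re-deriving an uncached plan by accident.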


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (1)

244-248: Consider storage level
Caching is fine, but for large data, specify a persist level if needed.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between a4fcedf and c7a5ac0.

📒 Files selected for processing (1)
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (6)
api/src/main/scala/ai/chronon/api/DataRange.scala (1)
  • toTimeRange (169-173)
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (5)
  • spark (393-397)
  • GroupBy (54-406)
  • GroupBy (409-769)
  • temporalEvents (291-369)
  • from (484-578)
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (1)
  • JoinUtils (37-622)
spark/src/main/scala/ai/chronon/spark/Extensions.scala (1)
  • range (125-139)
spark/src/main/scala/ai/chronon/spark/LabelJoin.scala (1)
  • joinWithLeft (258-292)
spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1)
  • genGroupBy (158-168)
⏰ Context from checks skipped due to timeout of 90000ms (15)
  • GitHub Check: streaming_tests
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
  • GitHub Check: groupby_tests
  • GitHub Check: join_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: groupby_tests
  • GitHub Check: spark_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: join_tests
  • GitHub Check: batch_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: batch_tests
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (8)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (8)

3-3: Imports for sub-day logic
They appear correct and consistent.

Also applies to: 5-5, 9-9, 11-11


23-25: Add test coverage
Ensure sub-day windows are verified in your unit tests.


84-100: Check window rounding
Using math.ceil could miss partial windows. Test edge cases carefully.


105-105: Aggregates parted windows
No issues spotted.


251-289: Validate sub-day accuracy
Logic looks consistent for snapshot vs. temporal. Add thorough tests.


301-330: Time shift caution
DST or offset issues might arise with minute-based shifts. Confirm correctness.


332-345: GroupBy generation
Implementation is clear. No issues found.


346-362: Filtering windows
Good approach to keep only relevant windows.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (1)

252-256: Consider storing the cached DataFrame in a val
This repeats a prior suggestion.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between c7a5ac0 and 7d10bcf.

📒 Files selected for processing (2)
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (8 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (6)
spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (1)
  • DataFrameGen (39-158)
spark/src/main/scala/ai/chronon/spark/Extensions.scala (1)
  • save (141-150)
spark/src/main/scala/ai/chronon/spark/Join.scala (1)
  • Join (64-539)
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
  • computeBackfill (693-768)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (1)
  • compute (126-132)
spark/src/main/scala/ai/chronon/spark/Comparison.scala (2)
  • Comparison (31-126)
  • sideBySide (63-120)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (4)
api/src/main/scala/ai/chronon/api/DataRange.scala (1)
  • toTimeRange (169-173)
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (4)
  • spark (393-397)
  • GroupBy (54-406)
  • GroupBy (409-769)
  • from (484-578)
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2)
  • JoinUtils (37-622)
  • leftDf (68-95)
spark/src/main/scala/ai/chronon/spark/Extensions.scala (1)
  • range (125-139)
⏰ Context from checks skipped due to timeout of 90000ms (16)
  • GitHub Check: streaming_tests
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: groupby_tests
  • GitHub Check: join_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: batch_tests
  • GitHub Check: batch_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (18)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (1)

25-25: No concerns.

spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (17)

3-14: Imports look correct.


25-27: Helpful comment additions.


43-43: No issues with shifted column name.


52-56: Schema usage is fine.


90-113: Sub-day logic rounding
Confirm that rounding hours to full days is desired. Sub-day windows effectively become day offsets.


260-260: No problem.


264-265: Snapshot vs. temporal
Logic is concise.


279-291: Temporal accuracy flow
Implementation looks correct for multiple sub-day windows.


293-293: Tuple creation is clear.


297-299: Sequential join is good.


312-350: Temporal label join
Liking the shift-based approach. Confirm it handles edge cases for partial hour windows.


352-384: Window filtering
Simple and effective.


387-387: isTemporal usage
Appropriately toggles join behavior.


403-403: Prefix naming
No concern.


406-406: Dropping old label columns
Helps avoid collisions.


408-408: Column prefixing
Works well for clarity.


422-422: Outer join
Logic is correct.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (1)

379-478: Test name vs. actual windows mismatch.

The test name mentions an hour window, but only day-based windows are defined. Consider enabling the commented 12-hour window to fully exercise sub-day logic.

-          windows = Seq(new Window(1, TimeUnit.DAYS), new Window(7, TimeUnit.DAYS))) // new Window(12, TimeUnit.HOURS),
+          windows = Seq(new Window(12, TimeUnit.HOURS), new Window(1, TimeUnit.DAYS), new Window(7, TimeUnit.DAYS)))
🧹 Nitpick comments (1)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (1)

458-458: Label naming inconsistency.

This label name doesn't follow the pattern used elsewhere in the file. Should include the namespace prefix.

-         | SELECT j.*, gb.time_spent_ms_sum_7d as label__time_spent_ms_sum_7d FROM
+         | SELECT j.*, gb.time_spent_ms_sum_7d as label__unit_test_item_views_label_time_spent_ms_sum_7d FROM
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 7d10bcf and 3d677d5.

📒 Files selected for processing (2)
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (8 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala
🧰 Additional context used
🧬 Code Graph Analysis (1)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (4)
spark/src/main/scala/ai/chronon/spark/Join.scala (1)
  • Join (64-539)
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (1)
  • computeBackfill (693-768)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (1)
  • compute (126-132)
spark/src/main/scala/ai/chronon/spark/Comparison.scala (2)
  • Comparison (31-126)
  • sideBySide (63-120)
⏰ Context from checks skipped due to timeout of 90000ms (16)
  • GitHub Check: streaming_tests
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: groupby_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: batch_tests
  • GitHub Check: batch_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (5)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (5)

25-25: Variable added for new test cases.

The new variable thirtyOneDaysAgo is correctly defined to support the new test cases.


480-603: Well-structured temporal label parts test.

The test provides comprehensive coverage of sub-day windows (2h) and daily windows (1d), properly validating the temporal aggregation functionality.


512-512: Sub-day window properly implemented.

Good use of 2-hour window to test sub-day temporal aggregation capabilities.


548-554: Helpful debugging SQL.

Good addition of intermediate SQL validation to assist with troubleshooting.


563-588: Thorough temporal CASE expression testing.

The SQL correctly validates point-in-time consistency with proper time window calculations using millisecond offsets.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (1)

377-500: Test name doesn't accurately reflect window sizes.

The test includes 2-hour window (sub-day) but test name is generic. Consider renaming to clearly indicate sub-day window testing.

The test effectively validates the temporal label join functionality with sub-day windows as required in the PR objectives.

🧹 Nitpick comments (2)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (2)

409-410: Include additional sub-day windows for comprehensive testing.

Currently only testing 2-hour and 1-day windows. Consider adding a 12-hour window to more thoroughly test different sub-day scenarios.

-                             windows = Seq(new Window(2, TimeUnit.HOURS), new Window(1, TimeUnit.DAYS)))
+                             windows = Seq(new Window(2, TimeUnit.HOURS), new Window(12, TimeUnit.HOURS), new Window(1, TimeUnit.DAYS)))

416-417: Remove debug logging in production code.

This debug log statement should be removed before merging.

-    logger.info(s"Labels group by: ${labelsGroupBy.accuracy} ${labelsGroupBy.inferredAccuracy}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between a4734a9 and 71440f6.

📒 Files selected for processing (1)
  • spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (9 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (4)
spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (1)
  • DataFrameGen (39-158)
spark/src/main/scala/ai/chronon/spark/Join.scala (1)
  • Join (64-539)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (1)
  • compute (126-132)
spark/src/main/scala/ai/chronon/spark/Comparison.scala (2)
  • Comparison (31-126)
  • sideBySide (63-120)
⏰ Context from checks skipped due to timeout of 90000ms (16)
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: groupby_tests
  • GitHub Check: batch_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: spark_tests
  • GitHub Check: streaming_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: batch_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (4)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (4)

25-25: LGTM - Added variable for test time window.

This new variable provides consistency with existing date variables.


116-116: Label name update for clarity.

Changed label name to include the feature source identifier, improving readability.


240-275: Improved label naming convention.

Updated label names to consistently include feature source identifiers.


312-352: Label naming updates for second run test case.

Consistent renaming pattern applied to second run test.

@tchow-zlai tchow-zlai force-pushed the vz--make_label_join_temporally_accurate branch from 71440f6 to 13612d5 on April 13, 2025 at 20:23
@varant-zlai varant-zlai force-pushed the vz--make_label_join_temporally_accurate branch from 5744504 to bec88bc on April 13, 2025 at 23:02

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
docs/source/authoring_features/Labels.md (1)

1-3: Consider using more concise wording in the documentation.

Replace "prior to" with "before" for conciseness.

-Especially call out that tailHops can sometimes join a label prior to the inference timestamp if the join key doesn't guarantee that not to happen.
+Especially call out that tailHops can sometimes join a label before the inference timestamp if the join key doesn't guarantee that not to happen.
🧰 Tools
🪛 LanguageTool

[style] ~3-~3: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...hat tailHops can sometimes join a label prior to the inference timestamp if the join key...

(EN_WORDINESS_PREMIUM_PRIOR_TO)

docs/source/dev/devnotes.md (2)

535-537: Add language specifier to code block.

Specify the language for the fenced code block.

-   ```
+   ```text
    Setting default warehouse directory: file:/tmp/chronon/spark-warehouse_f33f00
    ```
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

535-535: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


580-583: Standardize markdown list style.

Use asterisks instead of dashes for consistency with other lists.

-This approach is useful for various test debugging scenarios, such as:
-- Examining actual vs. expected data in failed assertions
-- Checking if data was written correctly by your test
-- Understanding join and aggregation results
-- Verifying partitioning is correct
+This approach is useful for various test debugging scenarios, such as:
+* Examining actual vs. expected data in failed assertions
+* Checking if data was written correctly by your test
+* Understanding join and aggregation results
+* Verifying partitioning is correct
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

580-580: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


581-581: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


582-582: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


583-583: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

api/src/main/scala/ai/chronon/api/DataRange.scala (1)

123-123: Consider adding a comment explaining the end-of-day calculation.

The magic number calculation for end-of-day could use a clarifying comment.

-        val endTimeMillis = partitionSpec.epochMillis(end) + (24 * 60 * 60 * 1000 - 1) // End of day (23:59:59.999)
+        // Add milliseconds for end of day (23:59:59.999) to ensure we cover the full day
+        val endTimeMillis = partitionSpec.epochMillis(end) + (24 * 60 * 60 * 1000 - 1) // End of day (23:59:59.999)
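A worked check of the end-of-day arithmetic in that suggestion:

```scala
// 24 hours in milliseconds, minus one, is the last representable millisecond
// of a day: 23:59:59.999 relative to that day's midnight.
val dayMillis = 24L * 60L * 60L * 1000L

// Assume epochMillis(end) returns the day's midnight; 0L stands in for it here.
val midnightMillis = 0L
val endTimeMillis  = midnightMillis + (dayMillis - 1)
println(endTimeMillis) // 86399999
```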
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (1)

410-410: Test with more sub-day window sizes.

The test currently uses 2-hour and 1-day windows. Consider adding tests with different hour values.

        Builders.Aggregation(operation = Operation.SUM,
                             inputColumn = "time_spent_ms",
-                             windows = Seq(new Window(2, TimeUnit.HOURS), new Window(1, TimeUnit.DAYS)))
+                             windows = Seq(new Window(2, TimeUnit.HOURS), new Window(12, TimeUnit.HOURS), new Window(1, TimeUnit.DAYS)))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 5744504 and bec88bc.

📒 Files selected for processing (5)
  • api/src/main/scala/ai/chronon/api/DataRange.scala (1 hunks)
  • docs/source/authoring_features/Labels.md (1 hunks)
  • docs/source/dev/devnotes.md (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (7 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala
🧰 Additional context used
🧬 Code Graph Analysis (2)
api/src/main/scala/ai/chronon/api/DataRange.scala (2)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)
  • millis (54-60)
  • millis (82-82)
api/src/main/scala/ai/chronon/api/PartitionSpec.scala (2)
  • epochMillis (41-43)
  • at (46-46)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (5)
spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala (1)
  • DataFrameGen (39-158)
spark/src/main/scala/ai/chronon/spark/Extensions.scala (1)
  • save (141-150)
spark/src/main/scala/ai/chronon/spark/Join.scala (1)
  • Join (64-539)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (1)
  • compute (125-131)
spark/src/main/scala/ai/chronon/spark/Comparison.scala (2)
  • Comparison (31-126)
  • sideBySide (63-120)
🪛 markdownlint-cli2 (0.17.2)
docs/source/dev/devnotes.md

535-535: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


580-580: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


581-581: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


582-582: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


583-583: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

🪛 LanguageTool
docs/source/authoring_features/Labels.md

[style] ~3-~3: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...hat tailHops can sometimes join a label prior to the inference timestamp if the join key...

(EN_WORDINESS_PREMIUM_PRIOR_TO)

⏰ Context from checks skipped due to timeout of 90000ms (18)
  • GitHub Check: streaming_tests
  • GitHub Check: streaming_tests
  • GitHub Check: join_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: spark_tests
  • GitHub Check: spark_tests
  • GitHub Check: batch_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: batch_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (5)
docs/source/dev/devnotes.md (1)

518-584: Great addition for debugging tests!

Well-structured guide that will help developers diagnose issues in Spark unit tests.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

535-535: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


580-580: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


581-581: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


582-582: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


583-583: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

api/src/main/scala/ai/chronon/api/DataRange.scala (1)

107-131: New method correctly handles millisecond shifts.

The implementation correctly handles edge cases like null values and zero shifts.

spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (3)

25-25: Added new time variables for test cases.

New variables provide additional test points for more comprehensive coverage.

Also applies to: 33-33


117-117: Updated column naming for clarity.

Column aliases now clearly indicate the source groupBy.

Also applies to: 241-275, 313-353


378-524:

✅ Verification successful

Added test for sub-day windows with temporal accuracy.

Good test case that exercises the new temporal accuracy functionality with both hour and day windows.


🏁 Script executed:

```shell
#!/bin/bash
# Check for any commented-out window definitions that might be testing candidates
grep -A5 -B5 "TimeUnit.HOURS" spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala
```

Length of output: 609


Sub-day window test confirmed

The test effectively exercises temporal accuracy using both hour and day windows, with the window definitions correctly implemented. No further changes are necessary.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (1)

251-253: Good caching for temporal accuracy cases

Caching the base DataFrame when temporal accuracy is involved improves performance when it's used multiple times for offset calculations.

🧹 Nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (1)

329-329: Fix incomplete log message

The log statement ends abruptly with "and".

-        s"Computing temporal label join for ${labelOutputInfo.labelPart.groupBy.metaData.name} with shifted range: $shiftedLeftPartitionRange and ")
+        s"Computing temporal label join for ${labelOutputInfo.labelPart.groupBy.metaData.name} with shifted range: $shiftedLeftPartitionRange")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between bec88bc and 59aea68.

📒 Files selected for processing (1)
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (7 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (7)
api/src/main/scala/ai/chronon/api/DataRange.scala (3)
  • PartitionRange (38-154)
  • PartitionRange (156-200)
  • shiftMillis (107-131)
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (4)
  • spark (393-397)
  • GroupBy (54-406)
  • GroupBy (409-769)
  • from (484-578)
spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala (1)
  • GroupBy (43-202)
spark/src/main/scala/ai/chronon/spark/streaming/JoinSourceRunner.scala (1)
  • outputSchema (154-162)
api/src/main/scala/ai/chronon/api/Builders.scala (3)
  • Builders (27-371)
  • AggregationPart (68-85)
  • Query (41-66)
api/src/main/scala/ai/chronon/api/Constants.scala (1)
  • Constants (23-100)
spark/src/main/scala/ai/chronon/spark/batch/JoinPartJob.scala (1)
  • genGroupBy (158-168)
⏰ Context from checks skipped due to timeout of 90000ms (16)
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: batch_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: batch_tests
  • GitHub Check: spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (9)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (9)

26-26: Added windows field to LabelPartOutputInfo case class

New field helps limit computation needed in TemporalEvents scenarios.


90-107: Correctly handles sub-day windows by converting to day-equivalent lengths

The implementation properly calculates effective window length for various time units, ensuring day-level partitioning works with sub-day windows.
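The rounding described here can be illustrated with a small stand-in sketch (the unit table and helper name are assumptions for illustration, not Chronon's actual `api.Window` API):

```python
# Hypothetical mini-model of rounding a (possibly sub-day) window up to
# whole days, so day-partitioned upstream data still covers the window.
MILLIS = {"MINUTES": 60_000, "HOURS": 3_600_000, "DAYS": 86_400_000}
DAY_MS = MILLIS["DAYS"]

def effective_days(length: int, unit: str) -> int:
    """Smallest whole number of days covering `length` units."""
    window_ms = length * MILLIS[unit]
    return -(-window_ms // DAY_MS)  # ceiling division

# A 2-hour window needs 1 day of partitions; a 36-hour window needs 2.
print(effective_days(2, "HOURS"))   # 1
print(effective_days(36, "HOURS"))  # 2
print(effective_days(7, "DAYS"))    # 7
```

The key point is the ceiling: any sub-day remainder still costs a full day of partitions.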


259-283: Proper handling of accuracy types with structured pattern matching

The code clearly separates logic for SNAPSHOT vs TEMPORAL accuracy, with appropriate comments explaining each path.


288-291: Improved join handling for multiple DataFrames

Logic now supports multiple right DataFrames for each join part, allowing sub-day windows to be processed correctly.


303-339: Well-implemented temporal label join computation

The method correctly:

  1. Calculates time offsets for different window units
  2. Shifts timestamps appropriately
  3. Uses the genGroupBy method with filtered windows
  4. Shifts timestamps back after computation
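The steps above amount to a shift-compute-unshift pattern; a toy sketch under assumed names and values (this is illustrative, not the actual LabelJoinV2 code):

```python
# Toy sketch of point-in-time-correct labeling via timestamp shifting:
# shift left-side event times forward by the label window, aggregate as of
# the shifted time, then shift back so rows align with the original left side.
WINDOW_MS = 2 * 3_600_000  # a 2-hour label window

def shift_millis(ts_list, offset):
    return [ts + offset for ts in ts_list]

left_ts = [1_000, 5_000, 9_000]               # toy left event timestamps
shifted = shift_millis(left_ts, WINDOW_MS)    # step 1/2: shift forward
# step 3: (the temporalEvents aggregation would run against `shifted` here)
restored = shift_millis(shifted, -WINDOW_MS)  # step 4: shift back
assert restored == left_ts
```

Because the shift is applied and then exactly reversed, output rows line up with the input rows while the aggregation itself sees the window-end timestamps.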

341-350: Good modularization with genGroupBy method

Creates GroupBy objects for specific windows with needed filtering, improving code reusability.


352-363: Well-designed filtering of GroupBy windows

Method efficiently filters GroupBy configuration to include only specified windows, preventing unnecessary computation.
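The filtering can be mimicked with a small stand-in model (field and type names are assumptions, not Chronon's GroupBy schema):

```python
# Hypothetical mini-model: keep only the requested windows in each
# aggregation, and drop aggregations left with no windows at all.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Window:
    length: int
    unit: str

@dataclass(frozen=True)
class Aggregation:
    input_column: str
    operation: str
    windows: tuple  # tuple of Window

def filter_agg_windows(aggs, keep):
    out = []
    for agg in aggs:
        kept = tuple(w for w in agg.windows if w in keep)
        if kept:  # skip aggregations with no surviving windows
            out.append(replace(agg, windows=kept))
    return out

aggs = [
    Aggregation("price", "SUM", (Window(2, "HOURS"), Window(7, "DAYS"))),
    Aggregation("clicks", "COUNT", (Window(30, "DAYS"),)),
]
filtered = filter_agg_windows(aggs, {Window(2, "HOURS")})
# Only the price aggregation survives, with just its 2-hour window.
```

Dropping empty aggregations up front is what prevents the unnecessary computation the review calls out.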


367-370: Proper handling of temporal join keys

Join keys now include time and partition columns when temporal accuracy is used, ensuring correct join behavior.
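A minimal sketch of extending join keys conditionally (column-name constants here are assumptions for illustration):

```python
# Hypothetical sketch: extend the base join keys with time/partition
# columns only when the group-by is temporally accurate.
TIME_COLUMN = "ts"
PARTITION_COLUMN = "ds"

def join_keys(right_to_left_keys, is_temporal):
    extra = [TIME_COLUMN, PARTITION_COLUMN] if is_temporal else []
    return list(right_to_left_keys) + extra

print(join_keys(["user_id"], is_temporal=True))   # ['user_id', 'ts', 'ds']
print(join_keys(["user_id"], is_temporal=False))  # ['user_id']
```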


1-406:

✅ Verification successful

Add unit tests for sub-day window support

PR objective mentions that unit tests are still needed. This is critical for validating new sub-day window functionality.


🏁 Script executed:

#!/bin/bash
# Check if any tests exist for the new temporal label join functionality

echo "Searching for existing test cases for temporal label joins..."
rg -A 10 "test.*temporal.*label" --type=scala

Length of output: 4794


Sub-day window tests verified.

  • Found tests for temporal label parts in spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (e.g., testing 2-hour windows).
  • Existing unit tests validate new sub-day window functionality.

val isTemporal = joinPart.groupBy.inferredAccuracy == Accuracy.TEMPORAL
val partLeftKeys =
  joinPart.rightToLeft.values.toArray ++ (if (isTemporal) Seq(Constants.TimeColumn, tableUtils.partitionColumn)
                                          else Seq.empty)
Collaborator

Do we still need the partition column even if it's not a temporal join?

@varant-zlai varant-zlai changed the title WIP -- Update LabelJoin to support sub-day windows and temporal accuracy Update LabelJoin to support sub-day windows and temporal accuracy Apr 14, 2025
@varant-zlai varant-zlai force-pushed the vz--make_label_join_temporally_accurate branch from 7fb6263 to 56babf3 Compare April 14, 2025 19:52
@varant-zlai varant-zlai force-pushed the vz--make_label_join_temporally_accurate branch from a35a299 to 901859a Compare April 14, 2025 21:32
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (6)
docs/source/authoring_features/Labels.md (2)

1-2: Address the TODO.

Add more details or remove the placeholder.


3-3: Use simpler phrase.

Consider changing "prior to" → "before" for brevity.

🧰 Tools
🪛 LanguageTool

[style] ~3-~3: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...hat tailHops can sometimes join a label prior to the inference timestamp if the join key...

(EN_WORDINESS_PREMIUM_PRIOR_TO)

docs/source/dev/devnotes.md (2)

535-535: Specify language in code block.

Add a language tag (e.g., ```bash) to fix lint warning.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

535-535: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


580-583: Use consistent list markers.

Switch from dashes to asterisks to match the existing style.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

580-580: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


581-581: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


582-582: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


583-583: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (1)

379-527: Good coverage of temporal logic.

Optional: add boundary cases or negative scenarios for sub-day windows.

spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (1)

51-56: Check repeated calls.

Caching or reusing the schema might improve performance if invoked often.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 56babf3 and 901859a.

📒 Files selected for processing (5)
  • api/src/main/scala/ai/chronon/api/DataRange.scala (1 hunks)
  • docs/source/authoring_features/Labels.md (1 hunks)
  • docs/source/dev/devnotes.md (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (7 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (10 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/src/main/scala/ai/chronon/api/DataRange.scala
🧰 Additional context used
🧬 Code Graph Analysis (1)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (7)
api/src/main/scala/ai/chronon/api/DataRange.scala (3)
  • PartitionRange (38-154)
  • PartitionRange (156-200)
  • toTimeRange (195-199)
online/src/main/scala/ai/chronon/online/serde/SparkConversions.scala (1)
  • SparkConversions (56-159)
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (5)
  • spark (393-397)
  • GroupBy (54-406)
  • GroupBy (409-769)
  • from (484-578)
  • temporalEvents (291-369)
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2)
  • JoinUtils (37-622)
  • leftDf (68-95)
spark/src/main/scala/ai/chronon/spark/Extensions.scala (3)
  • range (125-139)
  • partitionColumn (289-291)
  • prefixColumnNames (152-156)
spark/src/main/scala/ai/chronon/spark/LabelJoin.scala (1)
  • joinWithLeft (258-292)
api/src/main/scala/ai/chronon/api/Constants.scala (1)
  • Constants (23-100)
🪛 LanguageTool
docs/source/authoring_features/Labels.md

[style] ~3-~3: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...hat tailHops can sometimes join a label prior to the inference timestamp if the join key...

(EN_WORDINESS_PREMIUM_PRIOR_TO)

🪛 markdownlint-cli2 (0.17.2)
docs/source/dev/devnotes.md

535-535: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


580-580: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


581-581: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


582-582: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


583-583: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: spark_tests
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: batch_tests
  • GitHub Check: batch_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (9)
docs/source/dev/devnotes.md (1)

518-519: Good addition.

Nice explanation for Spark SQL debugging.

spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (4)

24-24: No issues found.

Definition is consistent with existing fields.


32-32: No issues found.

Definition is consistent with naming style.


36-37: Looks good.

Test name is descriptive and code is clear.


138-139: Clean addition.

Multiple label parts test is well-structured.

spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (4)

25-27: Helpful addition.

Storing windows per label part clarifies sub-day handling.


253-254: Effective caching.

Good decision to cache when temporal logic is used.


305-340: Clear temporal join logic.

Ensure thorough testing of timestamp offsets, especially sub-day edge cases.


367-405: Join key extension is solid.

Temporal condition looks consistent. No major issues spotted.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (2)

252-254: Consider explicitly uncaching the DataFrame.

The cached DataFrame is never explicitly uncached, which could lead to memory pressure.

 // Cache the left DF because it's used multiple times in offsetting in the case that there are Temporal Events
 if (joinOutputInfo.labelPartOutputInfos.exists(_.labelPart.groupBy.dataModel == Accuracy.TEMPORAL)) {
-  joinBaseDf.cache()
+  val cachedJoinBaseDf = joinBaseDf.cache()
+  // Use cachedJoinBaseDf instead of joinBaseDf in subsequent operations
+  // Add an uncache call after all operations are complete
 }

330-331: Fix incomplete log message.

The log message is cut off.

logger.info(
-  s"Computing temporal label join for ${labelOutputInfo.labelPart.groupBy.metaData.name} with shifted range: $shiftedLeftPartitionRange and ")
+  s"Computing temporal label join for ${labelOutputInfo.labelPart.groupBy.metaData.name} with shifted range: $shiftedLeftPartitionRange")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 901859a and a4ccb8a.

📒 Files selected for processing (1)
  • spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (7 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (4)
spark/src/main/scala/ai/chronon/spark/GroupBy.scala (5)
  • spark (393-397)
  • GroupBy (54-406)
  • GroupBy (409-769)
  • from (484-578)
  • temporalEvents (291-369)
spark/src/main/scala/ai/chronon/spark/JoinUtils.scala (2)
  • JoinUtils (37-622)
  • leftDf (68-95)
spark/src/main/scala/ai/chronon/spark/Extensions.scala (3)
  • range (125-139)
  • partitionColumn (289-291)
  • prefixColumnNames (152-156)
api/src/main/scala/ai/chronon/api/Constants.scala (1)
  • Constants (23-100)
⏰ Context from checks skipped due to timeout of 90000ms (16)
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: groupby_tests
  • GitHub Check: batch_tests
  • GitHub Check: batch_tests
  • GitHub Check: join_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (9)
spark/src/main/scala/ai/chronon/spark/batch/LabelJoinV2.scala (9)

3-15: Good import organization.

Required imports added for new temporal accuracy support.


26-27: Appropriate case class update.

Added windows field to store relevant Window objects for temporal calculations.


51-55: Good optimization in schema derivation.

Using GroupBy schema instead of scanning output table directly.


89-107: Well-implemented sub-day window support.

Effective handling of different time units with proper rounding to maintain day-level partitioning.


260-284: Well-structured accuracy handling.

Clean separation between SNAPSHOT and TEMPORAL accuracy types.


289-293: Efficient multiple DataFrame folding.

Good approach for handling multiple right DataFrames per label join part.


304-340: Well-implemented temporal label join computation.

Correctly shifts timestamps for PITC accuracy with proper handling of different time units.


342-364: Good helper methods for window filtering.

These methods improve code organization and performance by limiting computation to relevant windows.


367-371: Correct join key handling for temporal accuracy.

Including time and partition columns for temporal joins ensures proper alignment.

@varant-zlai varant-zlai merged commit 21e1c5c into main Apr 14, 2025
21 checks passed
@varant-zlai varant-zlai deleted the vz--make_label_join_temporally_accurate branch April 14, 2025 23:09
@coderabbitai coderabbitai bot mentioned this pull request Apr 25, 2025
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
## Summary

Modify label join behavior -- if using temporal accuracy be sure to
execute the temporalEvents aggregator for PITC accuracy. Also modify to
support sub-day windows.

## Checklist
- [x] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced temporal data handling with support for multiple time
windows, enabling dynamic processing of labeled events.
- Introduced refined grouping and filtering capabilities for more
accurate time-based event computations.
- Added a method to shift time ranges in milliseconds for improved
temporal adjustments.
- New documentation sections on debugging unit tests and details
regarding `tailHops` behavior.
- **Refactor**
- Updated join processing logic to seamlessly handle various data inputs
and event types, resulting in improved flexibility and performance in
temporal computations.
- **Tests**
- Enhanced test suite with new temporal dimensions and refined naming
conventions in SQL queries.
	- Added a new test case for validating temporal label joins.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: ezvz <[email protected]>
Co-authored-by: Nikhil Simha <[email protected]>
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 16, 2025