@LantaoJin LantaoJin commented Nov 30, 2025

Description

source=web_logs
| stats count() as request_count by client_ip
| join type=inner client_ip ip_geodata

The query above is optimized to a HashJoin because the left side's output comes from an aggregation. Unlike MergeJoin, HashJoin looks up join keys in a HashMap, which relies on hashCode() and equals() to match equal keys. The value class ExprIpValue of ExprIPType must therefore override hashCode() and equals(). (With a MergeJoin there is no issue, since ExprIpValue already implements Comparable.)
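
To illustrate why the lookup fails, here is a minimal sketch that does not use the real ExprIpValue. The classes IdentityIpKey and ValueIpKey are hypothetical stand-ins that hold the IP as a plain String. Both implement Comparable, but only the second overrides equals() and hashCode(), so only the second can be found again in a HashMap.

```java
import java.util.HashMap;
import java.util.Map;

public class HashJoinKeyDemo {

  /** Hypothetical key that inherits identity-based equals()/hashCode() from Object. */
  static final class IdentityIpKey implements Comparable<IdentityIpKey> {
    final String ip;

    IdentityIpKey(String ip) {
      this.ip = ip;
    }

    @Override
    public int compareTo(IdentityIpKey other) {
      return ip.compareTo(other.ip);
    }
  }

  /** Hypothetical key with value-based equals() and hashCode(). */
  static final class ValueIpKey implements Comparable<ValueIpKey> {
    final String ip;

    ValueIpKey(String ip) {
      this.ip = ip;
    }

    @Override
    public int compareTo(ValueIpKey other) {
      return ip.compareTo(other.ip);
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof ValueIpKey && ip.equals(((ValueIpKey) o).ip);
    }

    @Override
    public int hashCode() {
      return ip.hashCode();
    }
  }

  /** Builds a one-entry hash table and probes it with a new, equal-looking key. */
  public static boolean probeIdentity(String buildIp, String probeIp) {
    Map<IdentityIpKey, String> buildSide = new HashMap<>();
    buildSide.put(new IdentityIpKey(buildIp), "geo");
    return buildSide.containsKey(new IdentityIpKey(probeIp));
  }

  /** Same probe, but the key class defines value-based equality. */
  public static boolean probeValue(String buildIp, String probeIp) {
    Map<ValueIpKey, String> buildSide = new HashMap<>();
    buildSide.put(new ValueIpKey(buildIp), "geo");
    return buildSide.containsKey(new ValueIpKey(probeIp));
  }

  /** A comparison-based match works for both classes; it never touches hashCode(). */
  public static boolean compareMatches(String a, String b) {
    return new IdentityIpKey(a).compareTo(new IdentityIpKey(b)) == 0;
  }

  public static void main(String[] args) {
    System.out.println(probeIdentity("10.0.0.1", "10.0.0.1")); // false
    System.out.println(probeValue("10.0.0.1", "10.0.0.1")); // true
    System.out.println(compareMatches("10.0.0.1", "10.0.0.1")); // true
  }
}
```

The first probe misses because two distinct objects are never equal under Object.equals(), which matches the empty join result described in the linked issue. The comparison-based check succeeds, consistent with MergeJoin being unaffected.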

Related Issues

Resolves #4726

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Improvements

    • Enhanced IP value type handling to ensure proper equality and comparison operations
  • Tests

    • Added integration test validating IP type operations in hash join queries across multiple indices


coderabbitai bot commented Nov 30, 2025

Walkthrough

This pull request addresses a regression where joins on IP-type fields returned empty results. It adds a Lombok annotation to ExprIpValue to generate equals() and hashCode() methods, clarifies requirements in ExprJavaType's Javadoc, and introduces an integration test validating that hash joins on IP types work correctly.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Javadoc documentation: core/src/main/java/org/opensearch/sql/calcite/type/ExprJavaType.java | Added descriptive note to Javadoc explaining that javaClazz should override equals() and hashCode() methods with reference example. |
| IP value type implementation: core/src/main/java/org/opensearch/sql/data/model/ExprIpValue.java | Added Lombok import and @EqualsAndHashCode(callSuper = false) annotation to enable auto-generated equals() and hashCode() methods without including superclass state. |
| Integration test: integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4726.yml | New test case "hash join on IP type should work" that sets up two indices with IP fields, executes a hash-join query, and validates result correctness. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify that @EqualsAndHashCode(callSuper = false) is the correct configuration for ExprIpValue's inheritance hierarchy
  • Confirm the test data and assertions properly validate hash join functionality on IP types
  • Ensure the integration test setup (bulk operations, index mapping) accurately reproduces the original issue scenario

Poem

🐰 A hop, a join, through IP fields we go,
Equals and hashes now help them flow,
Where once results were lost to the void,
Hash joins on IPs can't now be foiled!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding hashCode() and equals() methods to ExprIpValue, which is the core fix implemented across the modified files. |
| Linked Issues check | ✅ Passed | The PR successfully addresses issue #4726 by implementing hashCode() and equals() in ExprIpValue to fix HashJoin operations on IP fields, enabling the PPL aggregate-join-aggregate query to work correctly. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: Javadoc clarification for ExprJavaType, Lombok annotation for ExprIpValue equals/hashCode generation, and integration tests validating the fix for IP field hash joins. |

@LantaoJin LantaoJin changed the title from "Add hashCode() and equals() to the implementation classes of ExprJavaType" to "Add hashCode() and equals() to the value class of ExprJavaType" Nov 30, 2025
@LantaoJin LantaoJin marked this pull request as ready for review November 30, 2025 12:05

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
core/src/main/java/org/opensearch/sql/calcite/type/ExprJavaType.java (1)

16-19: Javadoc clearly encodes the equals/hashCode contract

The added note that javaClazz must override equals() and hashCode() makes the hashing/join requirements explicit and ties nicely to the ExprIpValue example. One small follow-up: please verify that {@link ExprIPType} resolves correctly in Javadoc (and fully-qualify it if needed) so the example link doesn’t break.

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4726.yml (2)

32-57: Minor mapping/name mismatch in test data

The index mapping for test1 defines a timestamp field, but the bulk docs use datetime. It doesn’t affect this test (the query never touches the time field), but for clarity you may want to either rename the mapping property to datetime or adjust the docs to use timestamp so index structure and data are aligned.


67-83: Make result ordering deterministic to avoid potential test flakiness

The match on datarows asserts an exact ordered array of rows. Unless the PPL pipeline guarantees ordering for this query, this can be brittle. Consider adding an explicit sort client_ip (or similar) at the end of the PPL query so the result order is deterministic, or relaxing the assertion if the test framework supports order-insensitive checks.
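
As a rough illustration of the order-insensitive idea (not something the YAML REST test framework is known to provide), the sketch below normalizes two row lists by sorting them on their string form before comparing. The class RowOrderSketch, its helpers, and the sample rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RowOrderSketch {

  /** Convenience helper: builds one result row from its cell values. */
  public static List<Object> row(Object... cells) {
    return Arrays.asList(cells);
  }

  /** Returns a sorted copy of the rows so that the original order no longer matters. */
  public static List<List<Object>> normalized(List<List<Object>> rows) {
    List<List<Object>> copy = new ArrayList<>(rows);
    copy.sort(Comparator.comparing((List<Object> r) -> r.toString()));
    return copy;
  }

  /** True when both lists contain the same rows, duplicates included, in any order. */
  public static boolean sameRows(List<List<Object>> expected, List<List<Object>> actual) {
    return normalized(expected).equals(normalized(actual));
  }

  public static void main(String[] args) {
    // Hypothetical rows; not the data used in 4726.yml.
    List<List<Object>> expected = Arrays.asList(row(2, "10.0.0.1"), row(1, "10.0.0.2"));
    List<List<Object>> actual = Arrays.asList(row(1, "10.0.0.2"), row(2, "10.0.0.1"));
    System.out.println(sameRows(expected, actual)); // true
  }
}
```

Adding an explicit sort to the PPL query remains the simpler option when the test only supports exact matching.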

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f7eae0 and e695cbe.

📒 Files selected for processing (3)
  • core/src/main/java/org/opensearch/sql/calcite/type/ExprJavaType.java (1 hunks)
  • core/src/main/java/org/opensearch/sql/data/model/ExprIpValue.java (1 hunks)
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4726.yml (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
core/src/main/java/org/opensearch/sql/data/model/ExprIpValue.java (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataType.java (1)
  • EqualsAndHashCode (21-323)
opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java (1)
  • EqualsAndHashCode (1072-1089)
🔇 Additional comments (1)
core/src/main/java/org/opensearch/sql/data/model/ExprIpValue.java (1)

8-17: Value-based equals/hashCode for ExprIpValue is appropriate for HashJoin

Using @EqualsAndHashCode(callSuper = false) on this value class makes equals()/hashCode() depend solely on the IPAddress value field, which is exactly what HashMap/HashJoin need and keeps semantics aligned with the existing equal(ExprValue)/compare behavior. If additional fields are ever added to ExprIpValue, it’d be worth double-checking that they’re intended to participate in equality, but for the current shape this looks correct.
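
For readers unfamiliar with Lombok, the following hand-written sketch approximates what callSuper = false means: equality and hashing consider only the fields of the annotated class and skip any superclass state. It is a simplification (Lombok's generated code also includes a canEqual check), and BaseValue, IpValue, and the debugLabel field are hypothetical; a String stands in for the real IPAddress field.

```java
import java.util.Objects;

public class CallSuperFalseSketch {

  /** Hypothetical superclass carrying state that should not affect equality. */
  abstract static class BaseValue {
    protected final String debugLabel;

    BaseValue(String debugLabel) {
      this.debugLabel = debugLabel;
    }
  }

  /** Hypothetical stand-in for a value class annotated with callSuper = false. */
  static final class IpValue extends BaseValue {
    private final String value;

    IpValue(String value, String debugLabel) {
      super(debugLabel);
      this.value = value;
    }

    // Only this class's own field participates; debugLabel is ignored.
    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof IpValue)) {
        return false;
      }
      return Objects.equals(value, ((IpValue) o).value);
    }

    @Override
    public int hashCode() {
      return Objects.hashCode(value);
    }
  }

  /** Compares two values built from the given IPs and (ignored) labels. */
  public static boolean sameIp(String ipA, String labelA, String ipB, String labelB) {
    return new IpValue(ipA, labelA).equals(new IpValue(ipB, labelB));
  }

  /** Equal values must also produce equal hash codes for HashMap lookups to work. */
  public static boolean sameHash(String ipA, String labelA, String ipB, String labelB) {
    return new IpValue(ipA, labelA).hashCode() == new IpValue(ipB, labelB).hashCode();
  }

  public static void main(String[] args) {
    System.out.println(sameIp("10.0.0.1", "left", "10.0.0.1", "right")); // true
    System.out.println(sameHash("10.0.0.1", "left", "10.0.0.1", "right")); // true
    System.out.println(sameIp("10.0.0.1", "left", "10.0.0.2", "left")); // false
  }
}
```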

@LantaoJin LantaoJin merged commit 96370bf into opensearch-project:main Dec 2, 2025
63 of 64 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 2, 2025
…ype (#4885)

(cherry picked from commit 96370bf)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
LantaoJin pushed a commit that referenced this pull request Dec 2, 2025
…ype (#4885) (#4891)

(cherry picked from commit 96370bf)

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@LantaoJin LantaoJin deleted the pr/issues/4726 branch December 2, 2025 07:27

Development

Successfully merging this pull request may close these issues.

[BUG] Regression (3.1 -> 3.3): PPL join with aggregates on IPs is now returning empty results.

4 participants