Fix graph merge stats size calculation #1844

Merged: ryanbogan merged 9 commits into opensearch-project:main from ryanbogan:graph_size_bug_v2 on Aug 8, 2024

ryanbogan (Member) commented Jul 17, 2024

Description

Fixes the merge size calculations in the graph stats section of the KNNStats API. This PR changes the logic so that sizes are rounded to the correct number of bytes.
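
As a rough illustration of the rounding concern, the diff context quoted later in the review thread contains the expression `vectorsSize += vectorsSize % JAVA_ROUNDING_NUMBER`. Adding the remainder does not, in general, land on a multiple of the rounding number. The sketch below contrasts that with a conventional round-up. It is not the PR's code: the class name, method names, and the alignment value of 8 are assumptions made for illustration (the value of `JAVA_ROUNDING_NUMBER` is not shown in this thread), and a later commit in this PR removes the Java overhead from the calculation altogether.

```java
// Hypothetical sketch; names and the alignment value are assumptions.
final class RoundingSketch {
    // Assumed alignment in bytes; the real JAVA_ROUNDING_NUMBER is not shown in the thread.
    static final long ALIGNMENT = 8;

    // Mirrors the quoted expression: adds the remainder to the size.
    static long addRemainder(long size) {
        return size + size % ALIGNMENT;
    }

    // Rounds up to the next multiple of ALIGNMENT; already-aligned sizes are unchanged.
    static long roundUp(long size) {
        long remainder = size % ALIGNMENT;
        return remainder == 0 ? size : size + (ALIGNMENT - remainder);
    }

    public static void main(String[] args) {
        System.out.println(addRemainder(10)); // prints 12, which is not a multiple of 8
        System.out.println(roundUp(10));      // prints 16
        System.out.println(roundUp(16));      // prints 16
    }
}
```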

Continuation of #1818, which had too many merge conflicts to fix cleanly.

Issues Resolved

#1789

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
ryanbogan (Member, Author) commented:

BWC failures are unrelated to this PR

luyuncheng (Collaborator) previously approved these changes Jul 18, 2024

LGTM

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Comment thread src/main/java/org/opensearch/knn/index/codec/util/KNNCodecUtil.java Outdated
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
ryanbogan requested a review from heemin32 July 18, 2024 19:14
ryanbogan added v2.17.0 and removed v2.16.0 labels Jul 23, 2024
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
heemin32 previously approved these changes Jul 26, 2024
vectorsSize += vectorsSize % JAVA_ROUNDING_NUMBER;
}
return vectorsSize;
if (serializationMode == SerializationMode.COLLECTIONS_OF_BYTES) {

Collaborator commented:

can we get rid of this serializationMode attribute completely?

ryanbogan (Member, Author) commented Jul 29, 2024:

This method is typically used to calculate the array size from a Pair object. The issue is that the KNNCodecUtil.Pair class only has doc IDs, a vector address, a dimension, and a serialization mode as instance variables. So, short of reading memory at the vector address, I don't think it's possible to tell whether the data is floats or bytes without the serialization mode.

Collaborator commented:

Can we use the vector data type to tell whether the vector is byte or float?

ryanbogan (Member, Author) commented:

Yeah, I think that would work. The binary type would use the same calculation as byte, right?

Collaborator commented:

Yes, same thing.
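
The outcome of this thread (choose the size calculation by vector data type, with binary treated the same as byte) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the merged implementation: the `VectorType` enum is a hypothetical stand-in for the plugin's vector data type, the method name and parameters are assumptions, and `vectorLength` is assumed to be the number of stored elements per vector (bytes per vector for the byte and binary cases).

```java
// Hypothetical sketch; the enum, method name, and parameters are assumptions.
final class ArraySizeSketch {
    // Stand-in for the plugin's vector data type; not the real class.
    enum VectorType { FLOAT, BYTE, BINARY }

    // Returns the raw payload size in bytes, with no JVM object overhead added.
    static long calculateArraySize(int numVectors, int vectorLength, VectorType type) {
        switch (type) {
            case FLOAT:
                // Each float element occupies Float.BYTES (4) bytes.
                return (long) numVectors * vectorLength * Float.BYTES;
            case BYTE:
            case BINARY:
                // Byte and binary vectors are sized the same way: one byte per element.
                return (long) numVectors * vectorLength;
            default:
                throw new IllegalArgumentException("Unsupported vector type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(calculateArraySize(10, 128, VectorType.FLOAT));  // prints 5120
        System.out.println(calculateArraySize(10, 128, VectorType.BYTE));   // prints 1280
        System.out.println(calculateArraySize(10, 128, VectorType.BINARY)); // prints 1280
    }
}
```

Casting to `long` before multiplying avoids `int` overflow for large segments, which matters when the result feeds a cumulative statistic.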

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
Comment thread CHANGELOG.md Outdated
Comment thread src/main/java/org/opensearch/knn/index/codec/util/KNNCodecUtil.java
Signed-off-by: Ryan Bogan <rbogan@amazon.com>
ryanbogan merged commit e3158f9 into opensearch-project:main Aug 8, 2024
ryanbogan deleted the graph_size_bug_v2 branch August 8, 2024 00:18
opensearch-trigger-bot (Contributor) commented:

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1844-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e3158f990d058b02568da617688fd4857d0d521b
# Push it to GitHub
git push --set-upstream origin backport/backport-1844-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1844-to-2.x.

ryanbogan added a commit that referenced this pull request Aug 8, 2024
* Fix graph merge stats size calculation

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add changelog entry

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Add javadocs

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Make calculations easier to read

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Remove java overhead from calculations

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Change from serialization mode to vector data type for calculations

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

* Minor change to if statements

Signed-off-by: Ryan Bogan <rbogan@amazon.com>

---------

Signed-off-by: Ryan Bogan <rbogan@amazon.com>
(cherry picked from commit e3158f9)
ryanbogan added a commit that referenced this pull request Aug 9, 2024
(cherry picked from commit e3158f9)
jingqimao77-spec pushed a commit to jingqimao77-spec/k-NN that referenced this pull request Mar 15, 2026