Add squared Euclidean distance (l2_squared) in java implementation #25409
feilong-liu merged 1 commit into prestodb:master
This pull request was exported from Phabricator. Differential Revision: D77157252
Summary: l2Squared is commonly used for computing similarity for image and video embeddings. Differential Revision: D77157252
presto-main-base/src/main/java/com/facebook/presto/operator/scalar/MathFunctions.java
Feilong, thank you for the review. I have added support for array(double) and followed the convention used for other math functions. I will look into the internal checks.
By the way, can you also add the C++ version, so that the function stays available in the Prestissimo engine?
@feilong-liu I have a separate PR for C++ support, and thank you!
Does documentation exist for this function?
Does this function not handle null values?
Hi @kaikalur
Presto UDFs are general purpose, so someone could call this function on arrays that contain nulls, and we should always keep it general. The check is also simple in Java: look at the mayHaveNulls() method in Block and error out if it returns true for either of the arrays.
Sounds good, and thanks for the code pointer.
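The null handling discussed above can be illustrated with a minimal sketch. It is not the code merged in this PR and does not call Presto's Block API: the class name `NullRejectingL2Squared`, the helper `containsNull`, and the use of boxed `Double[]` arrays in place of a Block are all assumptions made for illustration. It only shows the "error out if either array has nulls" idea.

```java
// Hypothetical standalone sketch: boxed Double[] stands in for Presto's Block,
// and a direct scan stands in for the Block-level null check from the review.
final class NullRejectingL2Squared {
    private NullRejectingL2Squared() {}

    /** Returns true if any element of the array is null. */
    static boolean containsNull(Double[] values) {
        for (Double value : values) {
            if (value == null) {
                return true;
            }
        }
        return false;
    }

    /** Squared Euclidean distance that rejects mismatched lengths and null elements. */
    static double l2Squared(Double[] left, Double[] right) {
        if (left.length != right.length) {
            throw new IllegalArgumentException("Both arrays must have the same length");
        }
        if (containsNull(left) || containsNull(right)) {
            throw new IllegalArgumentException("Both arrays must not contain nulls");
        }
        double sum = 0.0;
        for (int i = 0; i < left.length; i++) {
            double diff = left[i] - right[i]; // auto-unboxing is safe after the null check
            sum += diff * diff;
        }
        return sum;
    }
}
```

Failing fast on nulls keeps the arithmetic loop free of per-element null branches, at the cost of one extra pass over each input in this sketch.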
Also, if we expect a lot of 0s (I guess that won't happen for dense vectors?), it may be good to short-circuit on either element being 0.
Good suggestions, and thank you! Note that we also have C++ implementations of these functions, and the C++ version relies on the FAISS library.
Description
This PR introduces the squared Euclidean distance (l2_squared) function between two identically sized vectors represented as array(real). The l2_squared distance is commonly used to measure similarity between embeddings of multimedia data.
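As a rough illustration of the computation, the following standalone sketch sums the squared element-wise differences of two equal-length vectors. It is an assumption-laden sketch rather than the implementation in MathFunctions.java: the class name `L2SquaredSketch` is hypothetical, and plain `double[]` arrays are used instead of Presto's array types.

```java
// Hypothetical standalone sketch; not the code merged in this PR.
final class L2SquaredSketch {
    private L2SquaredSketch() {}

    /** Sum of squared element-wise differences between two equal-length vectors. */
    static double l2Squared(double[] left, double[] right) {
        if (left.length != right.length) {
            throw new IllegalArgumentException(
                    "Both arrays must have the same length: " + left.length + " vs " + right.length);
        }
        double sum = 0.0;
        for (int i = 0; i < left.length; i++) {
            double diff = left[i] - right[i];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 5.0, 6.0};
        System.out.println(l2Squared(a, b)); // prints 27.0
    }
}
```

Skipping the final square root is what distinguishes the squared distance from the plain Euclidean distance; since the square root is monotonic, ranking neighbors by either value gives the same order.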
Differential Revision: D77157252
Motivation and Context
We are introducing vector search capabilities into Presto, and this PR takes the first step by adding common distance functions. This functionality will enable users to perform efficient similarity searches on multimedia data.
Impact
The addition of the l2_squared distance function will enhance Presto's capabilities in handling multimedia data and enable users to perform more complex analytics tasks.
Test Plan
Contributor checklist
Release Notes
== RELEASE NOTES ==
General Changes
Add the l2_squared function to calculate the squared Euclidean distance between two identically sized vectors represented as arrays. This function supports both array(real) and array(double) input types. For more information, refer to the Euclidean distance definition.
Example Usage:
-- Using real arrays
SELECT l2_squared(ARRAY[1.0, 2.0, 3.0], ARRAY[4.0, 5.0, 6.0]);
-- Returns: 27.0
-- Using double arrays
SELECT l2_squared(ARRAY[1.0E0, 2.0E0, 3.0E0], ARRAY[4.0E0, 5.0E0, 6.0E0]);
-- Returns: 27.0
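The value 27.0 in both examples follows directly from the definition of the squared Euclidean distance. With a = (1, 2, 3) and b = (4, 5, 6), where the symbols a, b, and n are introduced here only for illustration:

```latex
\[
\mathrm{l2\_squared}(a, b) = \sum_{i=1}^{n} (a_i - b_i)^2
= (1 - 4)^2 + (2 - 5)^2 + (3 - 6)^2 = 9 + 9 + 9 = 27
\]
```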