-
Notifications
You must be signed in to change notification settings - Fork 0
#639: allow metadata fields and score opensearch function #228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## integ-metadata-fields #228 +/- ##
========================================================
Coverage ? 98.47%
Complexity ? 3875
========================================================
Files ? 345
Lines ? 9648
Branches ? 626
========================================================
Hits ? 9501
Misses ? 142
Partials ? 5
Flags with carried forward coverage won't be shown. Click here to find out more. 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
...rc/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexScanQueryBuilder.java
Outdated
Show resolved
Hide resolved
|
not sure if opensearch-project#788 can be resolved as well? |
This problem will occur for any field that shares a name with a function. The workaround for now will be to use backticks. |
I think the issue you're referring to is already fixed here: opensearch-project#1191. I tried to fix |
| * @param context analysis context for the query | ||
| * @return resolved relevance function | ||
| */ | ||
| public Expression visitScoreFunction(ScoreFunction node, AnalysisContext context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add this to opensearch module as storage-specific function. Personally I think we should prioritize opensearch-project#811. It will become more and more difficult as we keep adding more OpenSearch logic to core engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I've got it on our radar, and we can start to scope it out. As is, there's quite a bit of work to do to pull out the opensearch specific classes, but I think it's do-able.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've had storage-specific function support in opensearch-project#1354. Can we start this now instead of adding more and more special logic to core? Agreed there is quite lots of work to move all to opensearch but maybe easy to do this for single PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me take a look. There's already a lot in this PR.
Would you mind if we move the score function out as a proof of concept for a set of OpenSearch storage engine functions?
| public static final String METADATA_FIELD_SCORE = "_score"; | ||
| public static final String METADATA_FIELD_MAXSCORE = "_maxscore"; | ||
| public static final String METADATA_FIELD_SORT = "_sort"; | ||
| public static final java.util.Map<String, ExprCoreType> METADATAFIELD_TYPE_MAP = new HashMap<>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may not want to hardcode this in core. Not sure if adding these fields in OpenSearchIndex.getFieldMapping() can work for you, however we need to consider meta column in other storage engine. For example, S3 may have $path, $partition in each row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't use OpenSearchIndex as it would create a dependency on opensearch in core. Not ideal. We kind of need to wait until we have a user-defined functions that could override core visitors, like QualifiedName.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't mean you need to depend on opensearch module. OpenSearchIndex is subclass of our Table interface. As I understand, Analyzer will fetch field list from Table during query analysis. I saw you've added isMetaField flag, so I don't see why we want to hardcode metadata field list in core module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Interesting. I'll see if I can hook that up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see changes in:
- core/src/main/java/org/opensearch/sql/analysis/TypeEnvironment.java (for interface changes in the environment)
- opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndex.java (specific changes for OpenSearch indexes), and
- core/src/main/java/org/opensearch/sql/analysis/ExpressionAnalyzer.java the visitQualifiedName function
b4912da to
c8f0520
Compare
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
… function Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
| Iterator<ExprValue> hits = response1.iterator(); | ||
| assertTrue(hits.hasNext()); | ||
| assertEquals(exprTupleValue, hits.next()); | ||
| assertEquals(exprTupleValue.tupleValue().get("id"), hits.next().tupleValue().get("id")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was matching the whole object - and failing because the Boolean object wasn't .equal. This could be removed (I think) if I change the Boolean to a boolean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - I got that wrong. We are now returning the metafields in the response from OpenSearch.. so they're appearing in the tuple. This way the spirit of the test remains the same.
sql/src/main/java/org/opensearch/sql/sql/parser/AstExpressionBuilder.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/ast/expression/QualifiedName.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @Test | ||
| public void testMetafieldIdentifierTest() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when getting a meta field from a prometheus data source?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parser will fail. There's no workaround for prometheus data sources for _id, _score, _index, _maxscore and _sort. But it will work for other fields that begin with an underscore (e.g. _routing or __fieldname) that is not strictly defined as an opensearch meta-field.
In other words: this is an improvement on the existing functionality... which would just fail for ALL fields starting with an underscore.
docs/user/dql/functions.rst
Outdated
| ``score_query(search_expression, boost)`` | ||
| ``scorequery(search_expression, boost)`` | ||
|
|
||
| The score function returns the _score of any documents matching the enclosed relevance-search expression. The SCORE function expects two |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a users it is unexpected that SELECT _score, phrase FROM phrases returns _score value yet SELECT _score, phrase FROM phrases WHERE match('phrase', 'my') returns _score of null.
I'd expect _score to be returned if requested since it can be calculated for any OpenSearch query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an OpenSearch value. I would think users need to understand the OpenSearch side of things. Once could ask why OpenSearch doesn't also just calculate the score for everything...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plugin could also set track_scores = true if the user asks for _score field and is using relevance search.
At the very least, it should be mentioned here that _score will be null unless relevance query is wrapped in score(...).
...rc/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexScanQueryBuilder.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/expression/function/OpenSearchFunctions.java
Show resolved
Hide resolved
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndex.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/opensearch/sql/analysis/ExpressionAnalyzerTest.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
| public TypeEnvironment(TypeEnvironment parent, SymbolTable symbolTable) { | ||
| this.parent = parent; | ||
| this.symbolTable = symbolTable; | ||
| this.reservedSymbolTable = new SymbolTable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is a separate symbol table needed? Why can't we add metafields to symbolTable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To separate out 'reserved fields' (that are in all tables) from fields that are specific to the index. The code needs to know the difference between these two lists later on.
…ore OpenSearch function (#228) (opensearch-project#1456) Allow metadata fields and score OpenSearch function. Signed-off-by: Andrew Carbonetto <[email protected]>
…ore OpenSearch function (#228) (opensearch-project#1456) Allow metadata fields and score OpenSearch function. Signed-off-by: Andrew Carbonetto <[email protected]> (cherry picked from commit e805151)
…ore OpenSearch function (#228) (opensearch-project#1456) Allow metadata fields and score OpenSearch function. Signed-off-by: Andrew Carbonetto <[email protected]> (cherry picked from commit e805151)
…e opensearch function (#228) (opensearch-project#1508) * opensearch-project#639: Support OpenSearch metadata fields and the score OpenSearch function (#228) (opensearch-project#1456) Signed-off-by: Andrew Carbonetto <[email protected]> Co-authored-by: Andrew Carbonetto <[email protected]>
…e opensearch function (#228) (opensearch-project#1509) * opensearch-project#639: Support OpenSearch metadata fields and the score OpenSearch function (#228) (opensearch-project#1456) Signed-off-by: Andrew Carbonetto <[email protected]> Co-authored-by: Andrew Carbonetto <[email protected]>
…elds and the score OpenSearch function (#228) (opensearch-project#1456) Allow metadata fields and score OpenSearch function. Signed-off-by: Andrew Carbonetto <[email protected]>
Description
OpenSearch reserved fields (_id, _index, _sort, _score, _max_score) are not allowed to be used in SQL clauses (SELECT, WHERE, ORDER BY) because the field format starting with underscore _ is not allowed.
This ticket allows for the
score(),score_query()andscorequery()function to be used to wrap around relevance-search queries to force the_scoreand_max_scoremetadata fields to be returned with values. For some queries,_scorereturnsnullunless thescore()function is included.scorefunction also allows for an optionalboostargument to be included that boosts the score of the child relevance function.Example - metadata fields returned:
Example - Metadata fields not requested are not displayed:
Example - relevance search without and with score function
Example - boost score on the _sql/_explain call:
Issues Resolved
__is missing in the returned result when query by SQL with extrafilteropensearch-project/sql#783Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.