-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 'key' field to 'function_score' query function definition in explanation response #1711
Comments
@anasalkouz hi there! 👋 Only asking whether you consider this enhancement worth implementing in any of the future versions of OpenSearch? Thanks for any update / prognosis on that! 🙂 |
How do you use this today (when do you match on query body)? How would a uuid make that easier? The details provided seem to be an ordered list. So if "key" was 1, 2, 3, would it be the same or different from a uuid? Note that a summary could easily have a key of 1, 2, 3, ..... |
@dblock the problem is that it's not always present on the same level of the |
@lrynek thanks Feels like I am not against adding a key, but that is an API change and still feels redundant to me. |
@dblock Yes, I know but I don't mean the order or the top level number of functions and their respective results but the fact that the computed value sometimes can be present on the deeper level of the tree within one of those two example outputs. Here is a sample PHP code for the lookup I mean (as for the sake of simplicity and security I haven't shared real query from our system and thus it doesn't serves well a purpose maybe): public function findFactors(FactorInterface ...$factors): Factors
{
$matchedFactors = [];
foreach ($factors as $factor)
{
$factorName = $factor->name();
$factorMatchingDescriptionPart = $factor->matchingDescriptionPart();
$explainedFactorsData = $this->data['details'][0]['details'][0]['details'] ?? [];
foreach ($explainedFactorsData as $data)
{
$descriptionLevel1 = $data['description'] ?? '';
$descriptionLevel2 = $data['details'][0]['description'] ?? '';
$description = $descriptionLevel1 . $descriptionLevel2;
if (false !== \strpos($description, $factorMatchingDescriptionPart))
{
$value = (float)($data['value'] ?? 0);
$initialValue = isset($data['details'][0]['value']) ? (float)($data['details'][0]['value']) : null;
$weight = isset($data['details'][1]['value']) ? (float)($data['details'][1]['value']) : null;
$matchedFactors[] = new Factor($factorName, $value, $initialValue, $weight);
}
}
}
return new Factors(...$matchedFactors);
} As it is mostly related to the modular approach of OpenSearch query building in our PHP app (so I cannot rely on the order of added functions) from the higher level code architecture. With unique keys/ids/whatever I would be able to not care about the query building order of the Regarding your statement:
I think it can be implemented as an optional parameter of the function definition, similar to existing I imagine it looks complicated, so I'm open to have a short call on that topic if you see it's beneficial 🙂 |
@dblock wdyt? 🙂 |
A few thoughts.
For sure, but just because we can still doesn't mean we should :) |
I don't love the idea of an optional parameter that changes the structure of the response as the proposed solution would. It means any code that parses the response would have to be able to handle two different structures based on a parameter in the request.
^ This proposal seems a lot simpler. @lrynek would this solve the problem? There are potential variants of this as well, such as adding a "tag" (or similar) field to the request that then is included as a "tag" field in each corresponding response. The response structure remains the same so it would be backward compatible (assuming an additional field in the response is ignored by anything that doesn't care about it). |
@andrross @dblock Thank you for your detailed responses and help in trying to understand each other 👍 However, looks that we cannot reach the understanding on both sides - would be awesome (and maybe even faster to reach any outcome / consensus) to have a short 15 minutes call around this topic, wdyt? ☎️ (Zoom / Google Meets / whatever) I fully get and understood that we have somewhere deep in the details of each function explanation and those details count matches with the number of functions passed in the request ✔️ However it's not what is a problem. The problem at hand is to match these explanations precisely not depending only on the order of functions as you would have in the higher level application code (as in my case) no control over the order of requested functions mapped to the particular service responsible for the singular function definition building. In that matter is far more convenient and even only this way possible to have that logic segregated and matching afterwards i.e. via the name of the factor (the naming used on higher level in my app) involved in the query.
It won't work as it is what we already do with the only change to pass the entire function definition JSON string to match with that value, so even worse than the actual workaround + it breaks the requirement you guys trying to pursue:
But actual optional Have you checked both {
"_explanation": {
"key_value_pairs": {
"af59aa50-19f4-45c8-90d2-c1a0b91416e1": 65.0,
"f4ff6d9e-96d6-401c-8da7-ff99d8228457": 35.0
},
"value": 100.0,
"description": "sum of:",
"details": []
} |
Generally, I'd rather not, because it's important we provide visibility to the community in all discussions and I don't know how else to do it than in writing. If you still want to just brainstorm, I'd be happy to jump on a call, dblock[at]amazon[dot]com, I'm in EST, and I can loop in @andrross. Back to the problem I actually think we totally understand each-other. Additional input does seem to solve your problem in one easy way, but it is an API change and we must think hard about such things and avoid introducing something that will be here for years. My goal is to provide input and feedback to you so you (or someone else) can implement the best possible solution.
This still sounds like a convenience, and something that can be solved in your application. I am reading that you are saying "I have a client that doesn't know in which order it passed the query parts in, so it cannot rely on the order of the explanation returned, please modify the service so that I don't have to make any changes in the client to track the order of queries". Am I reading this correctly? If so, that's the textbook definition of "designing an API for a specific client" and is an anti-pattern. Reproducing the query does make changing the client a bit easier because you would not need to compare strings, but I understand that doesn't solve the order problem.
This is not a fair comparison because weight is meaningful input to ranking, as opposed to an ID which will be unused by the query engine other than to be reproduced in the results.
I have read that carefully and I disagree with the approach for the many reasons I gave above.
|
It wasn't intended to slip around this GitHub issue thread and prevent full transparency - only a brainstorm and of course I can add a summary back here after that brainstorming call 👍 I'm in CEST timezone, so very close to yours, please provide any timing that works best for you guys to have a call on the topic.
Yes, in terms of the order maintenance itself it's exactly this but please take into account also varying depth of the final value calculated as well. So the convenience is not only for the order but depth included.
Although I fully agree with the one-client API design anti-pattern, I disagree that our case is only one-client one. It's a matter of every client that uses any high level compound solution that would benefit from such a feature. If we consider only direct JSON calls and responses with explanation for direct usage it wouldn't make sense to me either to implement such a feature. However, considering the ease of use on that higher level I still think it's fair enough to consider such move.
OK, agree 👍
No, it's the first time I ran to such situation, I cannot provide at hand any comparison of this sort // maybe any other person from the followers would have some? 🤔
I already have them implemented in the code 😉 But I considered both script parts matching and the order depth manual lookup something that can be leveraged by the API itself as, exactly, a convenience of use. That was the main and only reason to file this issue and as such it won't make any sense to have an outcome decision to go back to the application where I've already been 😜 If you mind some call would be awesome to have a brainstorm on that at some moment, if not I'll accept the refusal but still we'll be facing IMHO unnecessary hassle and complexity in terms of retrieving explained values. The engine itself has the exact knowledge of the level and depth of the final calculated value. For me it can even provide it repeated on the higher level without any dynamic input key from my query and it would be far more convenient than the actual version. I am open to have any conversation on that, thx! 🙂 |
@dblock @lrynek @andrross I have run into couple of scenarios with the similar problem at the end (try to match input and output), and it looks to me that Opensearch has a good foundation to that - named queries [1]. At this moment it does not work exactly the same way but if we add / extend the existing implementation to propagate back the query / filter / function names into the explanation block (we need to introduce named functions), it should by and large address the problem (not sure how difficult it is though). It is very close to option [1] https://www.elastic.co/guide/en/elasticsearch/reference/7.10/query-dsl-bool-query.html#named-queries |
@reta for me it would perfectly work - this is the missing example of such request / response custom tag propagation @dblock you were looking for. For me it's perfectly usable with the same |
Expanding the named queries approach seems like a great solution to me. It's an existing solution to a very similar problem so expanding it to functions is a natural extension of the feature. |
Looking into it ... :-) |
@lrynek @andrross the things turned out to be a bit more complicated than expected, the
The
The complete explanation looks like this:
|
Thanks @reta for resolving our conversation productively! Looking forward to a working |
…anation response (opensearch-project#1711) Signed-off-by: Andriy Redko <[email protected]>
…anation response (#1711) (#2346) Signed-off-by: Andriy Redko <[email protected]>
Documentation: opensearch-project/documentation-website#432 |
Is your feature request related to a problem? Please describe.
When trying to extract current function value from
_explanation
part of OpenSearch JSON response (i.e. for debugging or logging purposes), I can do it only with text matching of a script body (and only with those functions that operates on script language, the filter ones are out of reach).Describe the solution you'd like
I would add a new field
key
(or whatever name suits best) to thefunction_score
queryfunctions
array, as follows:ACTUAL
(see: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html)
Request
Response
EXPECTED
Request
Response
The retrieval of specific computed values will be more precise after such or similar implementation.
Describe alternatives you've considered
🅰️ Another possibility would be to expose these key value pairs on a particular function explanation details if the root one makes it more difficult to implement:
Request
Response
Request
Response
Additional context
The feature has been originally requested on Elasticsearch GitHub repository.
The text was updated successfully, but these errors were encountered: