Integrate "fields" API into QL#68467
Integrate "fields" API into QL#68467astefan merged 3 commits intoelastic:ql_fields_api_implementationfrom
Conversation
|
Pinging @elastic/es-ql (Team:QL) |
|
@elasticmachine run elasticsearch-ci/default-distro |
costin
left a comment
There was a problem hiding this comment.
LGTM. Glad to see this PR land. The -700 lines is just icing on the cake.
Curious about the impact it will have in benchmarks once it lands.
I would mark it as breaking change even if only to indicate the response of the translate endpoint changes.
| } | ||
|
|
||
| public FieldHitExtractor(String name, String fullFieldName, DataType dataType, ZoneId zoneId, boolean useDocValue, String hitName, | ||
| public FieldHitExtractor(String name, DataType dataType, ZoneId zoneId, String hitName, |
| scriptFields.forEach(sourceBuilder::scriptField); | ||
| } | ||
|
|
||
| public boolean noSource() { |
There was a problem hiding this comment.
Likely this needs to go in a future PR since the notion of source is removed.
There was a problem hiding this comment.
Yes, I also believe there is room for improvement when it comes to removing code. There may be other places where things are not needed anymore.
There was a problem hiding this comment.
(seems to only be used once?)
|
|
||
| city:s | location:s | ||
| Amsterdam |POINT (4.850311987102032 52.347556999884546) | ||
| Amsterdam |POINT (4.850312 52.347557) |
There was a problem hiding this comment.
Any idea where does the rounding come from?
There was a problem hiding this comment.
@costin "fields" API is relying (atm) on extraction from _source, which is basically the input the user used when indexing. Before "fields" API we were relying mainly on extraction from _source but there were field types (geo_point and shape being two of them) that we were taking from docvalues (meaning the actual values that were indexed). And in docvalues, things are just a bit different when it comes to floating point numbers and geo stuff.
That specific document with Amsterdam city has been indexed as "city": "Amsterdam", "location": "52.347557,4.850312" (location is a geo_point) and the returned value is pretty much what has been actually sent as _source to ES.
| @@ -187,8 +187,5 @@ private static void optimize(QueryContainer query, SearchSourceBuilder builder) | |||
|
|
|||
| private static void disableSource(SearchSourceBuilder builder) { | |||
| builder.fetchSource(FetchSourceContext.DO_NOT_FETCH_SOURCE); | |||
There was a problem hiding this comment.
Is there any downside to disabling stored fields?
There was a problem hiding this comment.
I hope not @costin . Tests passed. I don't see why we would want to return stored fields... can't think of a reason why we had it there in the first place. In theory, we shouldn't need any kind of way of returning values from Elasticsearch except "fields".
There was a problem hiding this comment.
The why not disable explicitly stored fields:
builder.storedFields(NO_STORED_FIELD);
There was a problem hiding this comment.
@costin I forgot about this part. "fields" API does not allow disabling stored fields.
matriv
left a comment
There was a problem hiding this comment.
LGTM, left a comment for the unit test.
| expected.put("columns", Arrays.asList(columnInfo("plain", "keyword_field", "keyword", JDBCType.VARCHAR, Integer.MAX_VALUE))); | ||
| expected.put("rows", singletonList(singletonList(ignoreAbove ? null : keyword))); | ||
| assertResponse(expected, runSql("SELECT keyword_field FROM test")); | ||
| if (explicitSourceSetting == false || enableSource) { |
There was a problem hiding this comment.
Maybe you can extract a method, where you pass the expected Map and the query.
There was a problem hiding this comment.
Makes sense. Will look into simplifying the code here.
| } | ||
|
|
||
| protected Object unwrapMultiValue(Object values) { | ||
| protected Object unwrapFieldsMultiValue(Object values) { |
There was a problem hiding this comment.
Good question. If I remember well, it's a leftover from a previous attempt where both strategies were used at the same time in code: extraction from source and docvalues (unwrapSourceMultiValue) and extraction from "fields" API output (unwrapFieldsMultiValue). No need for the rename, indeed. I'll revert.
| scriptFields.forEach(sourceBuilder::scriptField); | ||
| } | ||
|
|
||
| public boolean noSource() { |
There was a problem hiding this comment.
(seems to only be used once?)
| * } | ||
| */ | ||
| public void testKeywordField() throws IOException { | ||
| String query = "SELECT keyword_field FROM test"; |
There was a problem hiding this comment.
(nice, prepares the reader for what to expect.)
* Integrate "fields" API into QL (elastic#68467) * QL: retry SQL and EQL requests in a mixed-node (rolling upgrade) cluster (elastic#68602) * Adapt nested fields extraction from "fields" API output to the new un-flattened structure (elastic#68745) (cherry picked from commit ee5cc54)
This is a first step towards switching from using
docvalue_fieldsand_sourceextraction to using the fields API. It helps simplify the QL code, but does also come with a small drawback which will become visible in FieldHitExtractorTests where some tests are not needed anymore. Any query (especially SQL) on fields that have_sourcedisabled, but are indexed (accessible throughdocvalue_fields), will not be possible anymore. The tests in the above mentioned class are changed to adapt to this new restriction by checking that an exception is being thrown in those cases.Future PR(s) against this branch will probably simplify the code even further.
Also, there will be another PR that will add BWC tests similar to this one.
Relates to #67727.