Support nested documents in time-series indices with synthetic id#143151
Support nested documents in time-series indices with synthetic id#143151tlrx merged 17 commits intoelastic:mainfrom
Conversation
|
Hi @tlrx, I've created a changelog YAML for you. |
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
fcofdez
left a comment
There was a problem hiding this comment.
Looks good, I left a couple of comments.
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Show resolved
Hide resolved
| cluster_features: ["mapper.tsdb_nested_field_support"] | ||
| reason: "tsdb index with nested field support enabled" | ||
|
|
||
| - skip: |
There was a problem hiding this comment.
Would this test suite be executed anytime then?
There was a problem hiding this comment.
I think it would be executed on snapshot builds today, not on release tests. Once we enable the feature by default then we can remove the test suite and only rely on the two other.
There was a problem hiding this comment.
The yml tests run in mixed cluster environments and the cluster_feature is only "present" if all members of the cluster has the feature, so this test will run in all cluster combinations pre-9.4.0. So basically
Mixed cluster with old version <9.4.0, run this test.
Mixed cluster with old version >=9.4.0 run the other two.
I think of cluster_feature as a shortcut to check max shared IndexVersion across cluster.
I have never seen nested usage in tsdb. But it is possible to configure, so I think this change make sense. Especially later when we plan to reuse synthetic id for other index modes like logsdb. |
martijnvg
left a comment
There was a problem hiding this comment.
Looks good! I left a question.
| final var parentTimeSeriesId = parentTsIdField.binaryValue(); | ||
| assert parentTimeSeriesId.equals(timeSeriesId); | ||
| assert parentTimeSeriesId.equals(TsidExtractingIdFieldMapper.extractTimeSeriesIdFromSyntheticId(uidEncoded)); | ||
| if (this.useDocValuesSkipper) { |
There was a problem hiding this comment.
The default is that useDocValuesSkipper is enabled for tsdb. Users would need to manually set index.mapping.use_doc_values_skipper to false. Did tests fail if these checks were not added? Just out of curiosity.
There was a problem hiding this comment.
Users would need to manually set index.mapping.use_doc_values_skipper to false
I thought that was some undocumented setting that users would not modify. But if they can, then we should also run our tests with doc values skippers rarely disabled, do you agree?
Asking because I haven't randomized this setting so far, assuming it would be removed and always enabled in a short future.
Did tests fail if these checks were not added?
I would expect the tests to pass but I haven't tried for the previous reason.
Quickly trying on TSDBSyntheticIdsIT returns a lot of failures.
There was a problem hiding this comment.
This setting is reused in logsdb, but there it is disabled by default and we do document this setting iirc.
I don't think we have to support the case were index.mapping.use_doc_values_skipper is disabled. Maybe in another change we can add validation that prevents the use of synthetic id if this this setting is disabled?
There was a problem hiding this comment.
Maybe in another change we can add validation that prevents the use of synthetic id if this this setting is disabled?
Thanks for the details. I think we can support use_doc_values_skipper to be disabled with a couple of extra changes that I can do if a follow up. That would be easier to maintain compare to preventing the use of synthetic id when doc values skippers are disabled.
There was a problem hiding this comment.
I opened #143389 to support disabling the doc values skippers with TSDB synthetic ids.
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Show resolved
Hide resolved
06439d7 to
e8f6c49
Compare
This change disables `index.mapping.use_doc_values_skipper` in about 10% of tests executions of the main TSDB with synthetic id tests, as well as in some unit tests. Relates ES-14224 Relates elastic#143151 (comment)
| assert nestedDoc.getField(TimeSeriesRoutingHashFieldMapper.NAME) == null; | ||
|
|
||
| if (rootParentDoc != null) { | ||
| assert nestedDocFields.isEmpty() == false; |
There was a problem hiding this comment.
nit: maybe we can assert that nestedDocFields size is 4?
burqen
left a comment
There was a problem hiding this comment.
Changes look good. I just had some clarifying comments around cluster feature and suggestions for javadoc improvements. Great job!
.../src/yamlRestTest/resources/rest-api-spec/test/tsdb/161_nested_fields_synthetic_id_false.yml
Outdated
Show resolved
Hide resolved
...c/src/yamlRestTest/resources/rest-api-spec/test/tsdb/162_nested_fields_synthetic_id_true.yml
Outdated
Show resolved
Hide resolved
| cluster_features: ["mapper.tsdb_nested_field_support"] | ||
| reason: "tsdb index with nested field support enabled" | ||
|
|
||
| - skip: |
There was a problem hiding this comment.
The yml tests run in mixed cluster environments and the cluster_feature is only "present" if all members of the cluster has the feature, so this test will run in all cluster combinations pre-9.4.0. So basically
Mixed cluster with old version <9.4.0, run this test.
Mixed cluster with old version >=9.4.0 run the other two.
I think of cluster_feature as a shortcut to check max shared IndexVersion across cluster.
server/src/main/java/org/elasticsearch/index/mapper/IdFieldMapper.java
Outdated
Show resolved
Hide resolved
| return CONTENT_TYPE; | ||
| } | ||
|
|
||
| private void addSyntheticIdFieldsToNestedDocs(DocumentParserContext context, BytesRef timeSeriesId, BytesRef uidEncoded) { |
There was a problem hiding this comment.
Great assertions in this method! 👍
|
Thanks everyone! |
…astic#143151) This change adds the doc values fields that are required for synthetic id to work in nested documents. It also fixes an assertion that tripped during tests execution. The changes in TSDBSyntheticIdsIT allows to run the existing tests with nested docs in 10% executions (it uses rarely()). Relates ES-14224
…astic#143151) This change adds the doc values fields that are required for synthetic id to work in nested documents. It also fixes an assertion that tripped during tests execution. The changes in TSDBSyntheticIdsIT allows to run the existing tests with nested docs in 10% executions (it uses rarely()). Relates ES-14224
…astic#143151) This change adds the doc values fields that are required for synthetic id to work in nested documents. It also fixes an assertion that tripped during tests execution. The changes in TSDBSyntheticIdsIT allows to run the existing tests with nested docs in 10% executions (it uses rarely()). Relates ES-14224
This change adds the doc values fields that are required for synthetic id to work in nested documents. It also fixes an assertion that tripped during tests execution.
The changes in
TSDBSyntheticIdsITallows to run the existing tests with nested docs in 10% executions (it usesrarely(), should we increase this? I have no idea if nested docs are widely used in time-series).Also, I'm chasing a flaky test in
testRecoveredOperationsbut I think we can move forward with the reviews.Relates ES-14224