Support nested documents in time-series indices with synthetic id #143151

Merged
tlrx merged 17 commits into elastic:main from tlrx:2026/02/26-es-14224 on Mar 5, 2026

@tlrx (Member) commented Feb 26, 2026

This change adds the doc values fields that are required for synthetic id to work in nested documents. It also fixes an assertion that tripped during test execution.

The changes in TSDBSyntheticIdsIT allow the existing tests to run with nested docs in about 10% of executions (they use rarely()). Should we increase this? I have no idea if nested docs are widely used in time-series.

Also, I'm chasing a flaky test in testRecoveredOperations but I think we can move forward with the reviews.

Relates ES-14224
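For context on the 10% figure: rarely() comes from the randomized test framework that Elasticsearch tests build on, and it returns true for only a small fraction of calls. The sketch below is a simplified stand-in, not the real implementation: RarelySketch, its 10-in-100 threshold, and the useNestedDocs flag are assumptions made for illustration. It shows why a fixed test seed makes the choice reproducible, while across many seeds nested docs get exercised in roughly one run out of ten.

```java
import java.util.Random;

/** Hypothetical model of a rarely()-style toggle; not the actual test-framework implementation. */
final class RarelySketch {
    private RarelySketch() {}

    /** Returns true for roughly 10% of calls (assumed threshold of 10 out of 100). */
    static boolean rarely(Random random) {
        return random.nextInt(100) < 10;
    }

    /** Counts how many of {@code runs} simulated test executions would enable nested docs. */
    static int countNestedRuns(long seed, int runs) {
        Random random = new Random(seed); // fixed seed => reproducible choices, like a test seed
        int nested = 0;
        for (int i = 0; i < runs; i++) {
            boolean useNestedDocs = rarely(random); // hypothetical per-run toggle
            if (useNestedDocs) {
                nested++;
            }
        }
        return nested;
    }

    public static void main(String[] args) {
        int runs = 100_000;
        int nested = countNestedRuns(42L, runs);
        System.out.printf("nested docs used in %.1f%% of runs%n", 100.0 * nested / runs);
    }
}
```

Raising the frequency in this model would just mean changing the threshold; the trade-off is spending more CI time on a configuration that, per the discussion below, is rarely seen in practice.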

@elasticsearchmachine (Collaborator)

Hi @tlrx, I've created a changelog YAML for you.

tlrx marked this pull request as ready for review February 27, 2026 08:36
tlrx requested review from burqen, fcofdez and martijnvg February 27, 2026 08:36
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@fcofdez (Contributor) left a comment

Looks good, I left a couple of comments.

cluster_features: ["mapper.tsdb_nested_field_support"]
reason: "tsdb index with nested field support enabled"

- skip:
Contributor

Would this test suite be executed anytime then?

Member Author

I think it would be executed on snapshot builds today, not in release tests. Once we enable the feature by default, we can remove this test suite and rely only on the other two.

Contributor

The yml tests run in mixed-cluster environments, and a cluster_feature is only "present" if all members of the cluster have the feature, so this test will run in every cluster combination that includes a pre-9.4.0 node. So basically:

Mixed cluster with an old version < 9.4.0: run this test.
Mixed cluster with an old version >= 9.4.0: run the other two.
I think of cluster_feature as a shortcut for checking the highest IndexVersion shared across the cluster.
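The "every member must have it" rule above can be modeled as an allMatch over the feature sets reported by each node. This is a hypothetical sketch: ClusterFeatureCheck, clusterHasFeature, and suiteToRun are illustrative names, and the mapping of feature sets to 9.4.0+ nodes is taken from the comment above rather than from the actual YAML test runner.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical model of how a YAML test's cluster_features condition is evaluated. */
final class ClusterFeatureCheck {
    private ClusterFeatureCheck() {}

    /** A feature counts as present only if every node in the cluster reports it. */
    static boolean clusterHasFeature(List<Set<String>> nodeFeatures, String feature) {
        // An empty cluster is treated as lacking the feature (a design choice of this sketch).
        return !nodeFeatures.isEmpty()
            && nodeFeatures.stream().allMatch(features -> features.contains(feature));
    }

    /** Per the comment above: the pre-9.4.0 suite runs only when the feature is not present everywhere. */
    static String suiteToRun(List<Set<String>> nodeFeatures, String feature) {
        return clusterHasFeature(nodeFeatures, feature) ? "the other two suites" : "this suite";
    }

    public static void main(String[] args) {
        String feature = "mapper.tsdb_nested_field_support";
        Set<String> upgradedNode = Set.of(feature); // assumed: a node on 9.4.0 or later
        Set<String> oldNode = Set.of();             // assumed: a node older than 9.4.0
        System.out.println(suiteToRun(List.of(upgradedNode, oldNode), feature));      // this suite
        System.out.println(suiteToRun(List.of(upgradedNode, upgradedNode), feature)); // the other two suites
    }
}
```

Because a single old node is enough to make the feature absent, the check behaves like taking the minimum capability across the cluster, which matches the "highest shared IndexVersion" intuition.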

@martijnvg (Member)

I have no idea if nested docs are widely used in time-series

I have never seen nested fields used in tsdb, but it is possible to configure them, so I think this change makes sense, especially later when we plan to reuse synthetic id for other index modes like logsdb.

@martijnvg (Member) left a comment

Looks good! I left a question.

final var parentTimeSeriesId = parentTsIdField.binaryValue();
assert parentTimeSeriesId.equals(timeSeriesId);
assert parentTimeSeriesId.equals(TsidExtractingIdFieldMapper.extractTimeSeriesIdFromSyntheticId(uidEncoded));
if (this.useDocValuesSkipper) {
Member

The default is that useDocValuesSkipper is enabled for tsdb. Users would need to manually set index.mapping.use_doc_values_skipper to false. Did tests fail if these checks were not added? Just out of curiosity.

Member Author

Users would need to manually set index.mapping.use_doc_values_skipper to false

I thought it was an undocumented setting that users would not modify. But if they can, then we should also run our tests with doc values skippers disabled in a small fraction of executions. Do you agree?

Asking because I haven't randomized this setting so far, assuming it would be removed and always enabled in the near future.

Did tests fail if these checks were not added?

I would expect the tests to pass, but I haven't tried, for the reason above.

A quick try with TSDBSyntheticIdsIT returns a lot of failures.

Member

This setting is also used in logsdb, where it is disabled by default, and IIRC we do document it.
I don't think we have to support the case where index.mapping.use_doc_values_skipper is disabled. Maybe in another change we can add validation that prevents the use of synthetic id if this setting is disabled?
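The defaults described in this thread (enabled for tsdb, disabled for logsdb) and the validation floated here could be modeled roughly as follows. Everything in this sketch is hypothetical: the IndexMode enum is reduced to the two modes mentioned, and useDocValuesSkipper and validateSyntheticId are illustrative helpers, not existing Elasticsearch methods. Note that the discussion goes on to favor supporting disabled skippers instead of rejecting them.

```java
/** Hypothetical model of the index.mapping.use_doc_values_skipper default and a proposed check. */
final class SkipperSettingSketch {
    /** Only the two index modes discussed in this thread. */
    enum IndexMode { TIME_SERIES, LOGSDB }

    private SkipperSettingSketch() {}

    /** An explicit value wins; otherwise tsdb defaults to true and logsdb to false (per this thread). */
    static boolean useDocValuesSkipper(IndexMode mode, Boolean explicitValue) {
        if (explicitValue != null) {
            return explicitValue;
        }
        return mode == IndexMode.TIME_SERIES;
    }

    /** Hypothetical validation: reject synthetic id when doc values skippers are disabled. */
    static void validateSyntheticId(boolean syntheticIdEnabled, boolean useDocValuesSkipper) {
        if (syntheticIdEnabled && !useDocValuesSkipper) {
            throw new IllegalArgumentException(
                "synthetic id requires index.mapping.use_doc_values_skipper to be enabled");
        }
    }

    public static void main(String[] args) {
        System.out.println(useDocValuesSkipper(IndexMode.TIME_SERIES, null));  // true
        System.out.println(useDocValuesSkipper(IndexMode.TIME_SERIES, false)); // false
        try {
            validateSyntheticId(true, false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The alternative the author prefers would drop validateSyntheticId and instead make the synthetic id code paths work whether or not the skipper is enabled.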

Member Author

Maybe in another change we can add validation that prevents the use of synthetic id if this setting is disabled?

Thanks for the details. I think we can support disabling use_doc_values_skipper with a couple of extra changes that I can do in a follow-up. That would be easier to maintain than preventing the use of synthetic id when doc values skippers are disabled.

Member Author

I opened #143389 to support disabling the doc values skippers with TSDB synthetic ids.

tlrx force-pushed the 2026/02/26-es-14224 branch from 06439d7 to e8f6c49 on March 2, 2026 09:46
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Mar 2, 2026
This change disables `index.mapping.use_doc_values_skipper` in about
10% of test executions of the main TSDB synthetic id tests, as
well as in some unit tests.

Relates ES-14224
Relates elastic#143151 (comment)
tlrx requested a review from fcofdez March 3, 2026 09:16
tlrx requested a review from martijnvg March 3, 2026 09:16
@fcofdez (Contributor) left a comment

LGTM

assert nestedDoc.getField(TimeSeriesRoutingHashFieldMapper.NAME) == null;

if (rootParentDoc != null) {
assert nestedDocFields.isEmpty() == false;
Contributor

nit: maybe we can assert that nestedDocFields size is 4?

Member Author

I pushed 0777769
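The exact-count assertion suggested in this nit can be pictured with a toy model in which a root document's synthetic-id doc values fields are copied onto its nested documents. This is a hypothetical sketch using plain maps instead of Lucene documents: the four field names are placeholders chosen to match the size of 4 suggested above, routing_hash stands in for the field named by TimeSeriesRoutingHashFieldMapper.NAME in the excerpt, and copyFieldsToNestedDocs is an illustrative name, not the PR's implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model: propagate synthetic-id doc values fields from a root doc to its nested docs. */
final class NestedFieldsSketch {
    // Placeholder names (hypothetical): the thread does not say which fields the PR adds.
    static final List<String> SYNTHETIC_ID_FIELDS = List.of("field_a", "field_b", "field_c", "field_d");
    // Placeholder for the routing hash field that nested docs must not carry.
    static final String ROUTING_HASH_FIELD = "routing_hash";

    private NestedFieldsSketch() {}

    static void copyFieldsToNestedDocs(Map<String, String> rootDoc, List<Map<String, String>> nestedDocs) {
        // Collect the values to copy once, from the root (parent) document.
        Map<String, String> nestedDocFields = new LinkedHashMap<>();
        for (String field : SYNTHETIC_ID_FIELDS) {
            String value = rootDoc.get(field);
            if (value != null) {
                nestedDocFields.put(field, value);
            }
        }
        // Exact-size check, as the nit suggests, instead of only checking non-emptiness.
        // Explicit exceptions are used so the checks run even without the JVM's -ea flag.
        if (nestedDocFields.size() != SYNTHETIC_ID_FIELDS.size()) {
            throw new IllegalStateException(
                "expected " + SYNTHETIC_ID_FIELDS.size() + " fields but found " + nestedDocFields.size());
        }
        for (Map<String, String> nestedDoc : nestedDocs) {
            if (nestedDoc.containsKey(ROUTING_HASH_FIELD)) {
                throw new IllegalStateException("nested doc must not carry the routing hash field");
            }
            nestedDoc.putAll(nestedDocFields); // the routing hash itself is never copied
        }
    }

    public static void main(String[] args) {
        Map<String, String> root = new HashMap<>();
        for (String field : SYNTHETIC_ID_FIELDS) {
            root.put(field, "value-of-" + field);
        }
        root.put(ROUTING_HASH_FIELD, "hash"); // only the root doc carries it
        List<Map<String, String>> nested = List.of(new HashMap<String, String>(), new HashMap<String, String>());
        copyFieldsToNestedDocs(root, nested);
        System.out.println(nested.get(0).keySet());
    }
}
```

Checking the exact size catches a partially populated root document, which an isEmpty() == false check would let through.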

@burqen (Contributor) left a comment

Changes look good. I just had some clarifying comments around the cluster feature and some suggestions for Javadoc improvements. Great job!

return CONTENT_TYPE;
}

private void addSyntheticIdFieldsToNestedDocs(DocumentParserContext context, BytesRef timeSeriesId, BytesRef uidEncoded) {
Contributor

Great assertions in this method! 👍

tlrx requested a review from romseygeek March 3, 2026 15:37
@martijnvg (Member) left a comment

LGTM

tlrx merged commit ec5e325 into elastic:main Mar 5, 2026
35 checks passed
tlrx deleted the 2026/02/26-es-14224 branch March 5, 2026 07:46
@tlrx (Member Author) commented Mar 5, 2026

Thanks everyone!

burqen pushed a commit to burqen/elasticsearch that referenced this pull request Mar 5, 2026
…astic#143151)

This change adds the doc values fields that are
required for synthetic id to work in nested
documents. It also fixes an assertion that
tripped during test execution.

The changes in TSDBSyntheticIdsIT allow the
existing tests to run with nested docs in about
10% of executions (they use rarely()).

Relates ES-14224
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Mar 5, 2026
spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Mar 6, 2026
6 participants