Auto create data streams using index templates v2 #55377
This commit adds the ability to auto create data streams using index templates v2. Index templates (v2) now have a data_stream field that includes a timestamp field. If that field is provided and an index name matches the template, then a data stream (plus its first backing index) is auto created. The index/bulk APIs will redirect to the create data stream API instead of the create index API. Relates to elastic#53100
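The decision described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `AutoCreateSketch`, `Template`, `Target`, `matches` and `resolve` are hypothetical names, not the classes touched by this PR, and pattern matching is reduced to a single trailing `*`.

```java
import java.util.List;

// Simplified, hypothetical stand-ins; these are not the real Elasticsearch classes.
final class AutoCreateSketch {

    enum Target { INDEX, DATA_STREAM }

    static final class Template {
        final String pattern;        // e.g. "logs-*"
        final String timestampField; // null when the template has no data_stream section

        Template(String pattern, String timestampField) {
            this.pattern = pattern;
            this.timestampField = timestampField;
        }
    }

    // Assumption: only a trailing '*' wildcard is supported in this sketch.
    static boolean matches(String pattern, String name) {
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(name);
    }

    // The first matching template decides; without a match a plain index is created.
    static Target resolve(List<Template> templates, String name) {
        for (Template template : templates) {
            if (matches(template.pattern, name)) {
                return template.timestampField != null ? Target.DATA_STREAM : Target.INDEX;
            }
        }
        return Target.INDEX;
    }
}
```

With a `logs-*` template that carries a timestamp field, a write to `logs-foobar` would resolve to a data stream, while names matching no data stream template fall back to a regular index.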
Pinging @elastic/es-core-features (:Core/Features/Data streams)
…w to de-serialize it
removed unused imports
danhermann
left a comment
LGTM. Left some questions about minor, non-essential items.
allowed_warnings:
  - "index template [test] has index patterns [test-*] matching patterns from existing older templates [global] with patterns (global => [*]); this template [test] will take precedence during new index creation"
indices.put_index_template:
  name: generic_logs_template
Do we not need to remove this ITv2 at the end of the test?
V2 templates are removed automatically at the end of each test, so this shouldn't be a concern.
if (!(ExceptionsHelper.unwrapCause(e) instanceof ResourceAlreadyExistsException)) {
    // fail all requests involving this index, if create didn't work
    for (int i = 0; i < bulkRequest.requests.size(); i++) {
        DocWriteRequest<?> item = bulkRequest.requests.get(i);
        if (item != null && setResponseFailureIfIndexMatches(responses, i, item, name, e)) {
            bulkRequest.requests.set(i, null);
        }
    }
}
Duplicates code in the onFailure handler in the doExecute method. Might be worth consolidating.
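One possible shape for such a consolidation, sketched with plain Java types rather than the real `BulkRequest` and response classes. `BulkFailureSketch` and `failRequestsForIndex` are hypothetical names; the target index of each request is represented by a `String` and the per-item failure by an `Exception`.

```java
import java.util.List;

// Hypothetical sketch of a helper that both failure paths could call.
final class BulkFailureSketch {

    /**
     * Marks every pending request that targets the given index as failed and
     * clears it from the pending list, mirroring the loop in the quoted snippet.
     * Returns the number of requests that were failed.
     */
    static int failRequestsForIndex(List<String> requestIndices, Exception[] responses,
                                    String index, Exception cause) {
        int failed = 0;
        for (int i = 0; i < requestIndices.size(); i++) {
            String target = requestIndices.get(i);
            if (target != null && target.equals(index)) {
                responses[i] = cause;        // record the failure for this item
                requestIndices.set(i, null); // the item must not be executed later
                failed++;
            }
        }
        return failed;
    }
}
```

Both handlers could then delegate to the one helper instead of repeating the loop.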
Map<String, DataStreamTemplate> result = TransportBulkAction.resolveAutoCreateDataStreams(metadata, Set.of());
assertThat(result, anEmptyMap());

Set<String> autoCreateIndices = new HashSet<>(Set.of("logs-foobar", "logs-barbaz"));
Would this variable be better named autoCreateDataStreams?
henningandersen
left a comment
Thanks @martijnvg. I added a few comments that I believe need to be addressed.
    }
}

static Map<String, IndexTemplateV2.DataStreamTemplate> resolveAutoCreateDataStreams(Metadata metadata,
Unfortunately, resolving this here on the coordinator is slightly trappy in that the coordinator could be disconnected while the request could still succeed. It may thus create an index if it does not see the template.
I think we could (and should) repair this by checking when creating indices if the template is for a data stream and if so, reject the request. But I also think it would be better to do the switching on master instead by adding a new action that handles the auto-creation of indices/streams. This would also be a good hook for security to validate auto_create privilege against anyway. I also think it could simplify the code in this class a fair bit.
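To make the concern concrete, here is a small sketch of the stale-state problem and of the suggested guard. Everything in it is an assumption for illustration: `StaleCoordinatorSketch`, `StateView`, `decideOnCoordinator` and `createIndexOnMaster` are hypothetical, and a cluster state is reduced to a version plus the name prefixes covered by data stream templates.

```java
import java.util.Set;

// Hypothetical sketch; not the real cluster state or transport action classes.
final class StaleCoordinatorSketch {

    /** A node's view of the cluster state, reduced to data stream template prefixes. */
    static final class StateView {
        final long version;
        final Set<String> dataStreamPrefixes;

        StateView(long version, Set<String> dataStreamPrefixes) {
            this.version = version;
            this.dataStreamPrefixes = dataStreamPrefixes;
        }

        boolean matchesDataStreamTemplate(String name) {
            return dataStreamPrefixes.stream().anyMatch(name::startsWith);
        }
    }

    /** Decision taken on whatever state the coordinating node currently has. */
    static String decideOnCoordinator(StateView coordinatorState, String name) {
        return coordinatorState.matchesDataStreamTemplate(name) ? "data_stream" : "index";
    }

    /** Guard on the master: refuse a plain index when the template is for a data stream. */
    static void createIndexOnMaster(StateView masterState, String name) {
        if (masterState.matchesDataStreamTemplate(name)) {
            throw new IllegalArgumentException("[" + name + "] matches a data stream template");
        }
    }
}
```

A coordinator holding an older state without the template would pick "index", while the guard evaluated against the master's state would reject that request instead of silently creating the wrong resource.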
That is trappy indeed, thanks for spotting this. I will update the PR so that most of the auto create logic here is moved to the to-be-built auto create action.
Iterator<String> autoCreateIndicesIterator = autoCreateIndices.iterator();
while (autoCreateIndicesIterator.hasNext()) {
    String indexName = autoCreateIndicesIterator.next();
    String v2Template = MetadataIndexTemplateService.findV2Template(metadata, indexName, false);
I think this means that if you have both v1 and v2 templates matching the same pattern and the v2 one is a data stream, it takes precedence even when prefer_v2_templates=false. I think this will cause some confusion and I would prefer to stick to respecting the option.
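A minimal sketch of what respecting the option could look like. `TemplatePrecedenceSketch` and `shouldCreateDataStream` are hypothetical, and the fallback to the v2 template when no v1 template matches is an assumption of this sketch, not something stated in the thread.

```java
// Hypothetical sketch of honoring prefer_v2_templates when deciding on a data stream.
final class TemplatePrecedenceSketch {

    /**
     * @param v1Matches          a v1 template matches the name
     * @param v2DataStreamMatches a v2 template with a data stream definition matches the name
     * @param preferV2Templates  the value of the prefer_v2_templates option
     */
    static boolean shouldCreateDataStream(boolean v1Matches, boolean v2DataStreamMatches,
                                          boolean preferV2Templates) {
        if (!v2DataStreamMatches) {
            return false;
        }
        // When a v1 template also matches, the option decides which generation wins.
        // Assumption: with no competing v1 template, the v2 template applies regardless.
        return preferV2Templates || !v1Matches;
    }
}
```

Under these assumptions, a name matched by both generations with prefer_v2_templates=false would get a regular index rather than a data stream.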
…ansportAction to MetadataCreateDataStreamService class. This new class can then also be used from the to be created auto create transport action.
For index and bulk requests that refer to indices/data streams that don't exist yet, the TransportBulkAction will redirect to the auto create action which will either create an index or data stream. Previously this decision was made in the TransportBulkAction, on the coordinating node, but if that node is currently unaware of the existence of specific data streams it will create an index instead of a data stream.
Currently the TransportBulkAction detects whether an index is missing and then decides whether it should be auto created. The coordination of the index creation also happens in the TransportBulkAction on the coordinating node. This change adds a new transport action that the TransportBulkAction delegates to if missing indices need to be created. The reasons for this change:
* Auto creation of data streams can't occur on the coordinating node. Based on the index template (v2), either a regular index or a data stream should be created. However, if the coordinating node is unaware of certain index templates, then the TransportBulkAction could create an index instead of a data stream. Therefore the decision of whether an index or data stream should be created should happen on the master node. See elastic#55377
* From a security perspective, it is useful to know whether index creation originates from the create index API or from auto creating a new index via the bulk or index API. For example, a user could be allowed to auto create an index, but not to use the create index API. The auto create action will allow security to distinguish these two different patterns of index creation.
After adding the auto create action, the PR grew too large to review carefully, and it was making two different changes. So I've split the auto create action into a new PR (#55858). I will update this PR after that PR gets merged.
Currently the TransportBulkAction detects whether an index is missing and then decides whether it should be auto created. The coordination of the index creation also happens in the TransportBulkAction on the coordinating node. This change adds a new transport action that the TransportBulkAction delegates to if missing indices need to be created. The reasons for this change:
* Auto creation of data streams can't occur on the coordinating node. Based on the index template (v2), either a regular index or a data stream should be created. However, if the coordinating node is slow in processing cluster state updates, then it may be unaware of the existence of certain index templates, which can then lead to the TransportBulkAction creating an index instead of a data stream. Therefore the coordination of creating an index or data stream should occur on the master node. See #55377
* From a security perspective, it is useful to know whether index creation originates from the create index API or from auto creating a new index via the bulk or index API. For example, a user could be allowed to auto create an index, but not to use the create index API. The auto create action will allow security to distinguish these two different patterns of index creation.
This change adds the following new transport action: AutoCreateAction. The TransportBulkAction redirects to this action, and this action actually creates the index (instead of the TransportCreateIndexAction). Later, via #55377, the AutoCreateAction can be improved to also determine whether an index or data stream should be created.
The create_index index privilege is also modified, so that a user who is granted this privilege is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege.
Relates to #53100
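The privilege change can be pictured with a small sketch. It is not the security module's implementation: `AutoCreatePrivilegeSketch` and `isAllowed` are hypothetical, and the two action name strings are assumptions chosen for illustration.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a privilege-to-action mapping; not the real security code.
final class AutoCreatePrivilegeSketch {

    // Assumed action names, used here only to tell the two creation paths apart.
    static final String CREATE_INDEX_ACTION = "indices:admin/create";
    static final String AUTO_CREATE_ACTION = "indices:admin/auto_create";

    // create_index covers both explicit creation and auto creation, as described above.
    static final Map<String, Set<String>> PRIVILEGES = Map.of(
        "create_index", Set.of(CREATE_INDEX_ACTION, AUTO_CREATE_ACTION));

    static boolean isAllowed(Set<String> grantedPrivileges, String action) {
        for (String privilege : grantedPrivileges) {
            if (PRIVILEGES.getOrDefault(privilege, Set.of()).contains(action)) {
                return true;
            }
        }
        return false;
    }
}
```

Because auto creation now travels under its own action name, a later privilege could grant that action alone, without also granting the create index API.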
Backport of elastic#55858 to 7.x branch.
…ed in the same update that creates the index or data stream.
when creating data streams via api
henningandersen
left a comment
LGTM.
Boolean preferV2Templates = randomBoolean() ? null : true;
if (randomBoolean()) {
    PutIndexTemplateRequest v1Request = new PutIndexTemplateRequest("logs-foo");
This probably works in master, but will break in 7.x. I think we need to add preferV2Templates=true in this block? OK to just do this in backport (or before, up to you).
I will remove this randomization now that the prefer flag has been removed.
    return createDataStream(metadataCreateIndexService, current, request);
}

public static final class CreateDataSteamClusterStateUpdateRequest extends ClusterStateUpdateRequest {
The class name is missing an 'r', should be CreateDataStreamClusterStateUpdateRequest
response -> {
    if (response.isAcknowledged()) {
        String firstBackingIndexName = firstBackingIndexRef.get();
        assert finalListener != null;
typo?
- assert finalListener != null;
+ assert firstBackingIndexName != null;
Backport: #55377
…egration Relates to elastic#55377