Skip to content

Conversation

@martijnvg
Copy link
Member

@martijnvg martijnvg commented Apr 17, 2020

This PR adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_steam field that includes a timestamp field,
if provided and index name matches with that template then a data stream
(plus first backing index) is auto created. The index/bulk apis will redirect
to the create data stream api instead of the create index api.

Relates to #53100

This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_steam field that includes a timestamp field,
if provided and index name matches with that template then a data stream
(plus first backing index) is auto created. The index/bulk apis will redirect
to the create data stream api instead of the create index api.

Relates to elastic#53100
@martijnvg martijnvg added the :Data Management/Data streams Data streams and their lifecycles label Apr 17, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Data streams)

@martijnvg martijnvg marked this pull request as ready for review April 22, 2020 12:40
removed unused imports
Copy link
Contributor

@danhermann danhermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some questions about minor, non-essential items.

allowed_warnings:
- "index template [test] has index patterns [test-*] matching patterns from existing older templates [global] with patterns (global => [*]); this template [test] will take precedence during new index creation"
indices.put_index_template:
name: generic_logs_template
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not need to remove this ITv2 at the end of the test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V2 templates are removed at the end of each test automatically, so shouldn't be a concern

Comment on lines 337 to 345
if (!(ExceptionsHelper.unwrapCause(e) instanceof ResourceAlreadyExistsException)) {
// fail all requests involving this index, if create didn't work
for (int i = 0; i < bulkRequest.requests.size(); i++) {
DocWriteRequest<?> item = bulkRequest.requests.get(i);
if (item != null && setResponseFailureIfIndexMatches(responses, i, item, name, e)) {
bulkRequest.requests.set(i, null);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicates code in the onFailure handler in the doExecute method. Might be worth consolidating.

Map<String, DataStreamTemplate> result = TransportBulkAction.resolveAutoCreateDataStreams(metadata, Set.of());
assertThat(result, anEmptyMap());

Set<String> autoCreateIndices = new HashSet<>(Set.of("logs-foobar", "logs-barbaz"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this variable be better named autoCreateDataStreams?

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @martijnvg. I added a few comments that I believe need to be addressed.

}
}

static Map<String, IndexTemplateV2.DataStreamTemplate> resolveAutoCreateDataStreams(Metadata metadata,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, resolving this here on the coordinator is slightly trappy in that the coordinator could be disconnected while the request could still succeed. It may thus create an index if it does not see the template.

I think we could (and should) repair this by checking when creating indices if the template is for a data stream and if so, reject the request. But I also think it would be better to do the switching on master instead by adding a new action that handles the auto-creation of indices/streams. This would also be a good hook for security to validate auto_create privilege against anyway. I also think it could simplify the code in this class a fair bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is trappy indeed, thanks for spotting this. I will update the PR, so that most of the auto create logic here is moved to the to be built auto create action.

Iterator<String> autoCreateIndicesIterator = autoCreateIndices.iterator();
while (autoCreateIndicesIterator.hasNext()) {
String indexName = autoCreateIndicesIterator.next();
String v2Template = MetadataIndexTemplateService.findV2Template(metadata, indexName, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this means that if you have both v1 and v2 templates matching the same pattern and the v2 one is a data stream, it takes precedence even when prefer_v2_templates=false. I think this will cause some confusion and I would prefer to stick to respecting the option.

…ansportAction to

MetadataCreateDataStreamService class. This new class can then also be used from
the to be created auto create transport action.
For index and bulk requests that refer to indices/data streams that don't exist yet,
the TransportBulkAction will redirect to the auto create action which will either
create an index or data stream.

Previously this decision was made in the TransportBulkAction,
on the coordinating node, but if that node is currently unaware of the existence of
specific data streams it will create an index instead of a data stream.
@martijnvg martijnvg marked this pull request as draft April 28, 2020 11:40
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Apr 28, 2020
Currently the TransportBulkAction detects whether an index is missing and
then decides whether it should be auto created. The coordination of the
index creation also happens in the TransportBulkAction on the coordinating node.

This change adds a new transport action that the TransportBulkAction delegates to
if missing indices need to be created. The reasons for this change:
* Auto creation of data streams can't occur on the coordinating node.
  Based on the index template (v2) either a regular index or a data stream should be created.
  However if the coordinating node is unaware of certain index templates then the TransportBulkAction
  could create an index instead of a data stream. Therefor the decision of whether an index or
  data stream should be created should happen on the master node. See elastic#55377
* From a security perspective it is useful to know whether index creation originates from the
  create index api or from auto creating a new index via the bulk or index api. For example
  a user would be allowed to auto create an index, but not to use the create index api. The
  auto create action will allow security to distinguish these two different patterns of
  index creation.
@martijnvg
Copy link
Member Author

After adding the auto create action, the size of the PR increased too much to be able to do a careful review. Also the PRs making two different changes. So I've split the auto create action in a new pr (#55858). I will update this pr after that PR gets merged.

martijnvg added a commit that referenced this pull request May 4, 2020
Currently the TransportBulkAction detects whether an index is missing and
then decides whether it should be auto created. The coordination of the
index creation also happens in the TransportBulkAction on the coordinating node.

This change adds a new transport action that the TransportBulkAction delegates to
if missing indices need to be created. The reasons for this change:

* Auto creation of data streams can't occur on the coordinating node.
Based on the index template (v2) either a regular index or a data stream should be created.
However if the coordinating node is slow in processing cluster state updates then it may be
unaware of the existence of certain index templates, which then can load to the
TransportBulkAction creating an index instead of a data stream. Therefor the coordination of
creating an index or data stream should occur on the master node. See #55377

* From a security perspective it is useful to know whether index creation originates from the
create index api or from auto creating a new index via the bulk or index api. For example
a user would be allowed to auto create an index, but not to use the create index api. The
auto create action will allow security to distinguish these two different patterns of
index creation.
This change adds the following new transport actions:

AutoCreateAction, the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later via #55377, can improve the AutoCreateAction to also determine whether an index or data stream should be created.

The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege.

Relates to #53100
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 4, 2020
Backport of elastic#55858 to 7.x branch.

Currently the TransportBulkAction detects whether an index is missing and
then decides whether it should be auto created. The coordination of the
index creation also happens in the TransportBulkAction on the coordinating node.

This change adds a new transport action that the TransportBulkAction delegates to
if missing indices need to be created. The reasons for this change:

* Auto creation of data streams can't occur on the coordinating node.
Based on the index template (v2) either a regular index or a data stream should be created.
However if the coordinating node is slow in processing cluster state updates then it may be
unaware of the existence of certain index templates, which then can load to the
TransportBulkAction creating an index instead of a data stream. Therefor the coordination of
creating an index or data stream should occur on the master node. See elastic#55377

* From a security perspective it is useful to know whether index creation originates from the
create index api or from auto creating a new index via the bulk or index api. For example
a user would be allowed to auto create an index, but not to use the create index api. The
auto create action will allow security to distinguish these two different patterns of
index creation.
This change adds the following new transport actions:

AutoCreateAction, the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later via elastic#55377, can improve the AutoCreateAction to also determine whether an index or data stream should be created.

The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege.

Relates to elastic#53100
@martijnvg martijnvg requested a review from henningandersen May 6, 2020 14:19
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.


Boolean preferV2Templates = randomBoolean() ? null : true;
if (randomBoolean()) {
PutIndexTemplateRequest v1Request = new PutIndexTemplateRequest("logs-foo");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably works in master, but will break in 7.x. I think we need to add preferV2Templates=true in this block? OK to just do this in backport (or before, up to you).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this randomization now that the prefer flag has been removed.

return createDataStream(metadataCreateIndexService, current, request);
}

public static final class CreateDataSteamClusterStateUpdateRequest extends ClusterStateUpdateRequest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class name is missing an 'r', should be CreateDataStreamClusterStateUpdateRequest

response -> {
if (response.isAcknowledged()) {
String firstBackingIndexName = firstBackingIndexRef.get();
assert finalListener != null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo?

Suggested change
assert finalListener != null;
assert firstBackingIndexName != null;

@martijnvg martijnvg merged commit 74e2c01 into elastic:master May 12, 2020
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 12, 2020
This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_steam field that includes a timestamp field,
if provided and index name matches with that template then a data stream
(plus first backing index) is auto created.

Relates to elastic#53100
martijnvg added a commit that referenced this pull request May 12, 2020
Backport: #55377

This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_steam field that includes a timestamp field,
if provided and index name matches with that template then a data stream
(plus first backing index) is auto created.

Relates to #53100
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 12, 2020
martijnvg added a commit that referenced this pull request May 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/Data streams Data streams and their lifecycles >non-issue Team:Data Management Meta label for data/management team v7.9.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants