diff --git a/docs/reference/docs/reindex.asciidoc b/docs/reference/docs/reindex.asciidoc
index 00c606442aa3f..203c7746b7252 100644
--- a/docs/reference/docs/reindex.asciidoc
+++ b/docs/reference/docs/reindex.asciidoc
@@ -252,7 +252,7 @@ POST _reindex
 // CONSOLE
 // TEST[setup:twitter]
 
-
+[[reindex-scripts]]
 Like `_update_by_query`, `_reindex` supports a script that modifies the
 document. Unlike `_update_by_query`, the script is allowed to modify the
 document's metadata. This example bumps the version of the source document:
diff --git a/docs/reference/ilm/ilm-with-existing-indices.asciidoc b/docs/reference/ilm/ilm-with-existing-indices.asciidoc
new file mode 100644
index 0000000000000..60aff62b714c2
--- /dev/null
+++ b/docs/reference/ilm/ilm-with-existing-indices.asciidoc
@@ -0,0 +1,416 @@
+[role="xpack"]
+[testenv="basic"]
+[[ilm-with-existing-indices]]
+== Using {ilm-init} with existing indices
+
+While it is recommended to use {ilm-init} to manage the index lifecycle from
+start to finish, it may be useful to use {ilm-init} with existing indices,
+particularly when transitioning from an alternative method of managing the index
+lifecycle such as Curator, or when migrating from daily indices to
+rollover-based indices. Such use cases are fully supported, but there are some
+configuration differences from when {ilm-init} can manage the complete index
+lifecycle.
+
+This section describes strategies to leverage {ilm-init} for existing periodic
+indices when migrating to fully {ilm-init}-managed indices, which can be done in
+a few different ways, each providing different tradeoffs. As an example, we'll
+walk through a use case of a very simple logging index with just a field for the
+log message and a timestamp.
+
+First, we need to create a template for these indices:
+
+[source,js]
+-----------------------
+PUT _template/mylogs_template
+{
+  "index_patterns": [
+    "mylogs-*"
+  ],
+  "settings": {
+    "number_of_shards": 1,
+    "number_of_replicas": 1
+  },
+  "mappings": {
+    "properties": {
+      "message": {
+        "type": "text"
+      },
+      "@timestamp": {
+        "type": "date"
+      }
+    }
+  }
+}
+-----------------------
+// CONSOLE
+// TEST
+
+And we'll ingest a few documents to create a few daily indices:
+
+[source,js]
+-----------------------
+POST mylogs-pre-ilm-2019.06.24/_doc
+{
+  "@timestamp": "2019-06-24T10:34:00",
+  "message": "this is one log message"
+}
+-----------------------
+// CONSOLE
+// TEST[continued]
+
+[source,js]
+-----------------------
+POST mylogs-pre-ilm-2019.06.25/_doc
+{
+  "@timestamp": "2019-06-25T17:42:00",
+  "message": "this is another log message"
+}
+-----------------------
+// CONSOLE
+// TEST[continued]
+
+Now that we have these indices, we'll look at a few different ways of migrating
+these indices to ILM.
+
+[[ilm-with-existing-periodic-indices]]
+=== Managing existing periodic indices with {ilm-init}
+
+NOTE: The examples in this section assume daily indices as set up in
+<>.
+
+The simplest way to manage existing indices while transitioning to fully
+{ilm-init}-managed indices is to allow all new indices to be fully managed by
+{ilm-init} before attaching {ilm-init} policies to existing indices. To do this,
+all new documents should be directed to {ilm-init}-managed indices - if you are
+using Beats or Logstash data shippers, upgrading all of those shippers to
+version 7.0.0 or higher will take care of that part for you. If you are not
+using Beats or Logstash, you may need to set up ILM for new indices yourself as
+demonstrated in the <>.
+
+NOTE: If you are using Beats through Logstash, you may need to change your
+Logstash output configuration and invoke the Beats setup to use ILM for new
+data.
+
+Once all new documents are being written to fully {ilm-init}-managed indices, it
+is easy to add an {ilm-init} policy to existing indices. However, there are two
+things to keep in mind when doing this, and a trick that makes those two things
+much easier to handle.
+
+The two biggest things to keep in mind are:
+
+1. Existing periodic indices shouldn't use policies with rollover, because
+rollover is used to manage where new data goes. Since existing indices should no
+longer be receiving new documents, there is no point to using rollover for them.
+
+2. {ilm-init} policies attached to existing indices will compare the `min_age`
+for each phase to the original creation date of the index, and so might proceed
+through multiple phases immediately.
+
+The first one is the most important, because it makes it difficult to use the
+same policy for new and existing periodic indices. But that's easy to solve
+with one simple trick: Create a second policy for existing indices, in addition
+to the one for new indices. {ilm-init} policies are cheap to create, so don't be
+afraid to have more than one. Modifying a policy designed for new indices to be
+used on existing indices is generally very simple: just remove the `rollover`
+action.
+
+For example, if you created a policy for your new indices with each phase
+like so:
+[source,js]
+-----------------------
+PUT _ilm/policy/mylogs_policy
+{
+  "policy": {
+    "phases": {
+      "hot": {
+        "actions": {
+          "rollover": {
+            "max_size": "25GB"
+          }
+        }
+      },
+      "warm": {
+        "min_age": "1d",
+        "actions": {
+          "forcemerge": {
+            "max_num_segments": 1
+          }
+        }
+      },
+      "cold": {
+        "min_age": "7d",
+        "actions": {
+          "freeze": {}
+        }
+      },
+      "delete": {
+        "min_age": "30d",
+        "actions": {
+          "delete": {}
+        }
+      }
+    }
+  }
+}
+-----------------------
+// CONSOLE
+// TEST[continued]
+
+You can create a policy for pre-existing indices by removing the `rollover`
+action, and in this case, the `hot` phase is now empty so we can remove that
+too:
+
+[source,js]
+-----------------------
+PUT _ilm/policy/mylogs_policy_existing
+{
+  "policy": {
+    "phases": {
+      "warm": {
+        "min_age": "1d",
+        "actions": {
+          "forcemerge": {
+            "max_num_segments": 1
+          }
+        }
+      },
+      "cold": {
+        "min_age": "7d",
+        "actions": {
+          "freeze": {}
+        }
+      },
+      "delete": {
+        "min_age": "30d",
+        "actions": {
+          "delete": {}
+        }
+      }
+    }
+  }
+}
+-----------------------
+// CONSOLE
+// TEST[continued]
+
+Creating a separate policy for existing indices will also allow using different
+`min_age` values. You may want to use higher values to prevent many indices from
+running through the policy at once, which may be important if your policy
+includes potentially resource-intensive operations like force merge.
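+
+To choose suitable `min_age` values, it can help to check how old the existing
+indices are. As a minimal sketch, assuming the `mylogs-pre-ilm-*` indices
+created in the examples above, the cat indices API can list each index with its
+creation date, oldest first. The column selection and sort order here are
+illustrative choices:
+
+[source,js]
+-----------------------
+GET _cat/indices/mylogs-pre-ilm*?v&h=index,creation.date.string&s=creation.date
+-----------------------
+// CONSOLE
+// TEST[continued]
+
+Any index whose age already exceeds a phase's `min_age` can be expected to enter
+that phase shortly after the policy is attached.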
+
+You can configure the lifecycle for many indices at once by using wildcards in
+the index name when calling the <>
+to set the policy name, but be careful that you don't include any indices that
+you don't want to change the policy for:
+
+[source,js]
+-----------------------
+PUT mylogs-pre-ilm*/_settings <1>
+{
+  "index": {
+    "lifecycle": {
+      "name": "mylogs_policy_existing"
+    }
+  }
+}
+-----------------------
+// CONSOLE
+// TEST[continued]
+
+<1> This pattern will match all indices with names that start with
+`mylogs-pre-ilm`
+
+Once all pre-{ilm-init} indices have aged out and been deleted, the policy for
+older periodic indices can be deleted.
+
+[[ilm-reindexing-into-rollover]]
+=== Reindexing via {ilm-init}
+
+NOTE: The examples in this section assume daily indices as set up in
+<>.
+
+In some cases, it may be useful to reindex data into {ilm-init}-managed indices.
+This is more complex than simply attaching policies to existing indices as
+described in <>, and
+requires pausing indexing during the reindexing process. However, this technique
+may be useful in cases where periodic indices were created with very small
+amounts of data leading to excessive shard counts, or for indices which grow
+steadily over time, but have not been broken up into time-series indices leading
+to shards which are much too large, situations that cause significant
+performance problems.
+
+Before getting started with reindexing data, the new index structure should be
+set up. For this section, we'll be using the same setup described in
+<>.
+
+First, we'll set up a policy with rollover, and can include any additional
+phases required.
+For simplicity, we'll just use rollover:
+
+[source,js]
+-----------------------
+PUT _ilm/policy/mylogs_condensed_policy
+{
+  "policy": {
+    "phases": {
+      "hot": {
+        "actions": {
+          "rollover": {
+            "max_age": "7d",
+            "max_size": "50G"
+          }
+        }
+      }
+    }
+  }
+}
+-----------------------
+// CONSOLE
+// TEST[continued]
+
+And now we'll update the index template for our indices to include the relevant
+{ilm-init} settings:
+
+[source,js]
+-----------------------
+PUT _template/mylogs_template
+{
+  "index_patterns": [
+    "ilm-mylogs-*" <1>
+  ],
+  "settings": {
+    "number_of_shards": 1,
+    "number_of_replicas": 1,
+    "index": {
+      "lifecycle": {
+        "name": "mylogs_condensed_policy", <2>
+        "rollover_alias": "mylogs" <3>
+      }
+    }
+  },
+  "mappings": {
+    "properties": {
+      "message": {
+        "type": "text"
+      },
+      "@timestamp": {
+        "type": "date"
+      }
+    }
+  }
+}
+-----------------------
+// CONSOLE
+// TEST[continued]
+<1> The new index pattern has a prefix compared to the old one; this will
+    make it easier to reindex later
+<2> The name of the policy we defined above
+<3> The name of the alias we'll use to write to and query
+
+And create the first index with the alias specified in the `rollover_alias`
+setting in the index template:
+
+[source,js]
+-----------------------
+PUT ilm-mylogs-000001
+{
+  "aliases": {
+    "mylogs": {
+      "is_write_index": true
+    }
+  }
+}
+-----------------------
+// CONSOLE
+// TEST[continued]
+
+All new documents should be indexed via the `mylogs` alias at this point. Adding
+new data to the old indices during the reindexing process can cause data to be
+added to the old indices, but not be reindexed into the new indices.
+
+NOTE: If you do not want to mix new data and old data in the new ILM-managed
+indices, indexing of new data should be paused entirely while the reindex
+completes.
+Mixing old and new data within one index is safe, but keep in mind
+that the indices with mixed data should be retained in their entirety until you
+are ready to delete both the old and new data.
+
+By default, {ilm-init} only checks rollover conditions every 10 minutes. Under
+normal indexing load, this usually works well, but during reindexing, indices
+can grow very, very quickly. We'll need to set the poll interval to something
+shorter to ensure that the new indices don't grow too large while waiting for
+the rollover check:
+
+[source,js]
+-----------------------
+PUT _cluster/settings
+{
+  "transient": {
+    "indices.lifecycle.poll_interval": "1m" <1>
+  }
+}
+-----------------------
+// CONSOLE
+// TEST[skip:don't want to overwrite this setting for other tests]
+<1> This tells ILM to check for rollover conditions every minute
+
+We're now ready to reindex our data using the <>. If
+you have a timestamp or date field in your documents, as in this example, it may
+be useful to specify that the documents should be sorted by that field - this
+will mean that all documents in `ilm-mylogs-000001` come before all documents in
+`ilm-mylogs-000002`, and so on. However, if this is not a requirement, omitting
+the sort will allow the data to be reindexed more quickly.
+
+IMPORTANT: If your data uses document IDs generated by means other than
+Elasticsearch's automatic ID generation, you may need to do additional
+processing to ensure that the document IDs don't conflict during the reindex, as
+documents will retain their original IDs. One way to do this is to use a
+<> in the reindex call to append the original index name
+to the document ID.
+
+[source,js]
+-----------------------
+POST _reindex
+{
+  "source": {
+    "index": "mylogs-*", <1>
+    "sort": { "@timestamp": "desc" }
+  },
+  "dest": {
+    "index": "mylogs", <2>
+    "op_type": "create" <3>
+  }
+}
+-----------------------
+// CONSOLE
+// TEST[continued]
+<1> This index pattern matches our existing indices.
+    Using the prefix for the new indices makes using this index pattern much
+    easier.
+<2> The alias set up above
+<3> This option will cause the reindex to abort if it encounters multiple
+    documents with the same ID. This is optional, but recommended to prevent
+    accidentally overwriting documents if two documents from different indices
+    have the same ID.
+
+Once this completes, indexing new data can be resumed, as long as all new
+documents are indexed into the alias used above. All data, existing and new, can
+be queried using that alias as well. We should also be sure to set the
+{ilm-init} poll interval back to its default value, because keeping it set too
+low can cause unnecessary load on the current master node:
+
+[source,js]
+-----------------------
+PUT _cluster/settings
+{
+  "transient": {
+    "indices.lifecycle.poll_interval": null
+  }
+}
+
+-----------------------
+// CONSOLE
+// TEST[skip:don't want to overwrite this setting for other tests]
+
+All of the reindexed data should now be accessible via the alias set up above,
+in this case `mylogs`. Once you have verified that all the data has been
+reindexed and is available in the new indices, the existing indices can be
+safely removed.
\ No newline at end of file
diff --git a/docs/reference/ilm/index.asciidoc b/docs/reference/ilm/index.asciidoc
index 50d2e5f6dac22..3ace2efe95bfd 100644
--- a/docs/reference/ilm/index.asciidoc
+++ b/docs/reference/ilm/index.asciidoc
@@ -84,4 +84,6 @@ include::ilm-and-snapshots.asciidoc[]
 
 include::start-stop-ilm.asciidoc[]
 
+include::ilm-with-existing-indices.asciidoc[]
+
 include::getting-started-slm.asciidoc[]