Commit 1abde64
[DOCS] More monitoring docs
1 parent 55471ab

6 files changed, +604 -0 lines changed

Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
[role="xpack"]
[testenv="basic"]
[[es-monitoring-collectors]]
== Collectors

Collectors, as their name implies, collect things. Each collector runs once per
collection interval to obtain data from the public APIs in {es} and {xpack}
that it chooses to monitor. When the data collection is finished, the data is
handed in bulk to the <<es-monitoring-exporters,exporters>> to be sent to the
monitoring clusters. Regardless of the number of exporters, each collector
runs only once per collection interval.

There is only one collector per data type gathered. In other words, any
monitoring document that is created comes from a single collector rather than
being merged from multiple collectors. {monitoring} for {es} currently has only
a handful of collectors because the goal is to minimize overlap between them
for optimal performance.

Each collector can create zero or more monitoring documents. For example, the
`index_stats` collector collects all index statistics at the same time to
avoid many unnecessary calls.

[options="header"]
|=======================
| Collector | Data Types | Description
| Cluster Stats | `cluster_stats`
| Gathers details about the cluster state, including parts of the actual
cluster state (for example, `GET /_cluster/state`) and statistics about it
(for example, `GET /_cluster/stats`). This produces a single document type. In
versions prior to X-Pack 5.5, this was actually three separate collectors that
resulted in three separate types: `cluster_stats`, `cluster_state`, and
`cluster_info`. In 5.5 and later, all three are combined into `cluster_stats`.
+
This only runs on the _elected_ master node and the data collected
(`cluster_stats`) largely controls the UI. When this data is not present, it
indicates either a misconfiguration on the elected master node, timeouts
related to the collection of the data, or issues with storing the data. Only a
single document is produced per collection.
| Index Stats | `indices_stats`, `index_stats`
| Gathers details about the indices in the cluster, both in summary and
individually. This creates many documents that represent parts of the index
statistics output (for example, `GET /_stats`).
+
This information only needs to be collected once, so it is collected on the
_elected_ master node. The most common failure for this collector relates to an
extreme number of indices -- and therefore time to gather them -- resulting in
timeouts. One summary `indices_stats` document is produced per collection and
one `index_stats` document is produced per index, per collection.
| Index Recovery | `index_recovery`
| Gathers details about index recovery in the cluster. Index recovery
represents the assignment of _shards_ at the cluster level. If an index is not
recovered, it is not usable. This also corresponds to shard restoration via
snapshots.
+
This information only needs to be collected once, so it is collected on the
_elected_ master node. The most common failure for this collector relates to an
extreme number of shards -- and therefore time to gather them -- resulting in
timeouts. By default, this creates a single document that contains all
recoveries, which can be quite large, but it gives the most accurate picture
of recovery in the production cluster.
| Shards | `shards`
| Gathers details about all _allocated_ shards for all indices, particularly
including what node each shard is allocated to.
+
This information only needs to be collected once, so it is collected on the
_elected_ master node. Unlike most other collectors, this collector uses the
local cluster state to get the routing table, so it avoids network timeout
issues. Each shard is represented by a separate monitoring document.
| Jobs | `job_stats`
| Gathers details about all machine learning job statistics (for example,
`GET /_xpack/ml/anomaly_detectors/_stats`).
+
This information only needs to be collected once, so it is collected on the
_elected_ master node. However, for the master node to be able to perform the
collection, the master node must have `xpack.ml.enabled` set to `true` (the
default) and a license level that supports {ml}.
| Node Stats | `node_stats`
| Gathers details about the running node, such as memory utilization and CPU
usage (for example, `GET /_nodes/_local/stats`).
+
This runs on _every_ node with {monitoring} enabled. One common failure is a
timeout of the node stats request caused by too many segment files: the
collector spends so long waiting for the file system stats to be calculated
that it eventually times out. A single `node_stats` document is created per
collection. This is collected per node to help discover issues with nodes
communicating with each other, but not with the monitoring cluster (for
example, intermittent network issues or memory pressure).
|=======================

{monitoring} uses a single-threaded scheduler to run the collection of {es}
monitoring data by all of the appropriate collectors on each node. This
scheduler is managed locally by each node and its interval is controlled by
the `xpack.monitoring.collection.interval` setting, which defaults to 10
seconds (`10s`) and can be set at either the node or cluster level.
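
For example, a minimal sketch of what this might look like in
`elasticsearch.yml` (the value shown is simply the documented default):

[source,yaml]
---------------------------------------------------
# Run all enabled collectors every 10 seconds (the default).
xpack.monitoring.collection.interval: 10s
---------------------------------------------------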

Fundamentally, each collector works on the same principle. At each collection
interval, {monitoring} checks whether each collector should run and then runs
the appropriate collectors. The failure of an individual collector does not
impact any other collector.

Once collection has completed, all of the monitoring data is passed to the
exporters to route the monitoring data to the monitoring clusters.

If gaps exist in the monitoring charts in {kib}, it is typically because either
a collector failed or the monitoring cluster did not receive the data (for
example, it was being restarted). If a collector fails, a logged error should
exist on the node that attempted to perform the collection.

NOTE: Collection is currently done serially, rather than in parallel, to avoid
extra overhead on the elected master node. The downside to this approach is
that collectors might observe a different version of the cluster state within
the same collection period. In practice, this does not make a significant
difference and running the collectors in parallel would not prevent such a
possibility.

For more information about the configuration options for the collectors, see
<<monitoring-collection-settings>>.

[float]
[[es-monitoring-stack]]
=== Collecting data from across the Elastic Stack

{monitoring} in {es} also receives monitoring data from other parts of the
Elastic Stack. In this way, it serves as an unscheduled monitoring data
collector for the stack.

By default, data collection is disabled. {es} monitoring data is not collected
and all monitoring data from other sources, such as {kib}, Beats, and
Logstash, is ignored. You must set `xpack.monitoring.collection.enabled` to
`true` to enable the collection of monitoring data. See
<<monitoring-settings>>.
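
For example, a minimal sketch of enabling collection in `elasticsearch.yml`
(depending on the version, this setting may also be updatable dynamically via
the cluster settings API):

[source,yaml]
---------------------------------------------------
# Enable the collection of monitoring data for the whole stack.
xpack.monitoring.collection.enabled: true
---------------------------------------------------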

Once data is received, it is forwarded to the exporters to be routed to the
monitoring cluster like all monitoring data.

WARNING: Because this stack-level "collector" lives outside of the collection
interval of {monitoring} for {es}, it is not impacted by the
`xpack.monitoring.collection.interval` setting. Therefore, data is passed to
the exporters whenever it is received. This behavior can result in indices for
{kib}, Logstash, or Beats being created somewhat unexpectedly.

While the monitoring data is collected and processed, some production cluster
metadata is added to incoming documents. This metadata enables {kib} to link
the monitoring data to the appropriate cluster. If this linkage is unimportant
to the infrastructure that you're monitoring, it might be simpler to configure
Logstash and Beats to report monitoring data directly to the monitoring
cluster. This scenario also prevents the production cluster from adding extra
overhead related to monitoring data, which can be very useful when there are a
large number of Logstash nodes or Beats.

For more information about typical monitoring architectures, see
{xpack-ref}/how-monitoring-works.html[How Monitoring Works].

Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
[role="xpack"]
[testenv="basic"]
[[es-monitoring-exporters]]
== Exporters

The purpose of exporters is to take data collected from any Elastic Stack
source and route it to the monitoring cluster. It is possible to configure
more than one exporter, but the general and default setup is to use a single
exporter.

There are two types of exporters in {es}:

`local`::
The default exporter used by {monitoring} for {es}. This exporter routes data
back into the _same_ cluster. See <<local-exporter>>.

`http`::
The preferred exporter, which you can use to route data into any supported
{es} cluster accessible via HTTP. Production environments should always use a
separate monitoring cluster. See <<http-exporter>>.

Both exporters serve the same purpose: to set up the monitoring cluster and
route monitoring data. However, they perform these tasks in very different
ways. Even though they work differently, both exporters are capable of sending
all of the same data.

Exporters are configurable at both the node and cluster level. Cluster-wide
settings, which are updated with the
<<cluster-update-settings,`_cluster/settings` API>>, take precedence over
settings in the `elasticsearch.yml` file on each node. When you update an
exporter, it is completely replaced by the updated version of the exporter.
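
For example, a minimal sketch of configuring a single `http` exporter in
`elasticsearch.yml`; the exporter name `my_remote` and the host are
illustrative placeholders:

[source,yaml]
---------------------------------------------------
# The exporter name ("my_remote") is arbitrary but must be unique.
xpack.monitoring.exporters.my_remote:
  type: http
  host: ["http://monitoring-cluster.example.com:9200"]
---------------------------------------------------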

IMPORTANT: It is critical that all nodes share the same setup. Otherwise,
monitoring data might be routed in different ways or to different places.

When the exporters route monitoring data into the monitoring cluster, they use
`_bulk` indexing for optimal performance. All monitoring data is forwarded in
bulk to all enabled exporters on the same node. From there, the exporters
serialize the monitoring data and send a bulk request to the monitoring
cluster. There is no queuing -- in memory or persisted to disk -- so any
failure during the export results in the loss of that batch of monitoring
data. This design limits the impact on {es}, and the assumption is that the
next pass will succeed.

Routing monitoring data involves indexing it into the appropriate monitoring
indices. Once the data is indexed, it exists in a monitoring index that, by
default, is named with a daily index pattern. For {es} monitoring data, this
is an index that matches `.monitoring-es-6-*`. From there, the data lives
inside the monitoring cluster and must be curated or cleaned up as necessary.
If you do not curate the monitoring data, it eventually fills up the nodes and
the cluster might fail due to lack of disk space.

TIP: We strongly recommend that you manage the curation of indices,
particularly the monitoring indices. To do so, you can take advantage of the
<<local-exporter-cleaner,cleaner service>> or
{curator-ref-current}/index.html[Elastic Curator].
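
As a rough sketch, one relevant knob for the cleaner service is the retention
period; the setting name (`xpack.monitoring.history.duration`) and the `3d`
value below are not named in this document, so verify them against
<<monitoring-settings>>:

[source,yaml]
---------------------------------------------------
# Ask the cleaner service to delete monitoring indices older than three days.
xpack.monitoring.history.duration: 3d
---------------------------------------------------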

//TO-DO: Add information about index lifecycle management https://github.com/elastic/x-pack-elasticsearch/issues/2814

When using cluster alerts, {watcher} creates daily `.watcher_history*`
indices. These are not managed by {monitoring} and they are not curated
automatically. It is therefore critical that you curate these indices to avoid
an undesirable and unexpected increase in the number of shards and indices
and, eventually, in disk usage. If you are using a `local` exporter, you can
set the `xpack.watcher.history.cleaner_service.enabled` setting to `true` and
curate the `.watcher_history*` indices by using the
<<local-exporter-cleaner,cleaner service>>. See
<<general-notification-settings>>.
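
For example, a minimal sketch of enabling that behavior in `elasticsearch.yml`
on a cluster that uses a `local` exporter:

[source,yaml]
---------------------------------------------------
# Allow the cleaner service to also curate .watcher_history* indices.
xpack.watcher.history.cleaner_service.enabled: true
---------------------------------------------------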

There is also a disk watermark (known as the flood-stage watermark) that
protects clusters from running out of disk space. When this threshold is
crossed, it makes all indices (including monitoring indices) read-only until
the issue is fixed and a user manually makes the index writable again. While
an active monitoring index is read-only, it naturally fails to write (index)
new data and continuously logs errors that indicate the write failure. For
more information, see {ref}/disk-allocator.html[Disk-based Shard Allocation].
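
As a rough sketch, the threshold at which this protection triggers is
controlled by a cluster setting along the following lines; the setting name
(`cluster.routing.allocation.disk.watermark.flood_stage`) is not part of this
document and the `95%` value shown is simply the commonly documented default,
so treat both as illustrative:

[source,yaml]
---------------------------------------------------
# Indices with shards on a node past this watermark are marked read-only.
cluster.routing.allocation.disk.watermark.flood_stage: 95%
---------------------------------------------------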

[float]
[[es-monitoring-default-exporter]]
=== Default exporters

If a node or cluster does not explicitly define an {monitoring} exporter, the
following default exporter is used:

[source,yaml]
---------------------------------------------------
xpack.monitoring.exporters.default_local: <1>
  type: local
---------------------------------------------------
<1> The exporter name uniquely defines the exporter, but it is otherwise
unused. When you specify your own exporters, you do not need to explicitly
overwrite or reference `default_local`.

If another exporter is already defined, the default exporter is _not_ created.
When you define a new exporter, if the default exporter exists, it is
automatically removed.

[float]
[[es-monitoring-templates]]
=== Exporter templates and ingest pipelines

Before exporters can route monitoring data, they must set up certain {es}
resources. These resources include templates and ingest pipelines. The
following table lists the templates that are required before an exporter can
route monitoring data:

[options="header"]
|=======================
| Template | Purpose
| `.monitoring-alerts` | All cluster alerts for monitoring data.
| `.monitoring-beats` | All Beats monitoring data.
| `.monitoring-es` | All {es} monitoring data.
| `.monitoring-kibana` | All {kib} monitoring data.
| `.monitoring-logstash` | All Logstash monitoring data.
|=======================

The templates are ordinary {es} templates that control the default settings
and mappings for the monitoring indices.

By default, monitoring indices are created daily (for example,
`.monitoring-es-6-2017.08.26`). You can change the default date suffix for
monitoring indices with the `index.name.time_format` setting. You can use this
setting to control how frequently monitoring indices are created by a specific
`http` exporter. You cannot use this setting with `local` exporters. For more
information, see <<http-exporter-settings>>.
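
For example, a minimal sketch of switching an `http` exporter to monthly
indices; the exporter name `my_remote`, the host, and the exact date-format
value are illustrative assumptions, so check <<http-exporter-settings>> for
the supported syntax:

[source,yaml]
---------------------------------------------------
xpack.monitoring.exporters.my_remote:
  type: http
  host: ["http://monitoring-cluster.example.com:9200"]
  # Roll monitoring indices monthly instead of daily.
  index.name.time_format: YYYY.MM
---------------------------------------------------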

WARNING: Some users create their own templates that match _all_ index
patterns, which can therefore impact the monitoring indices that get created.
It is critical that you do not disable `_source` storage for the monitoring
indices. If you do, {monitoring} for {kib} does not work and you cannot
visualize monitoring data for your cluster.

The following table lists the ingest pipelines that are required before an
exporter can route monitoring data:

[options="header"]
|=======================
| Pipeline | Purpose
| `xpack_monitoring_2` | Upgrades X-Pack monitoring data coming from X-Pack
5.0 - 5.4 to be compatible with the format used in {monitoring} 5.5.
| `xpack_monitoring_6` | A placeholder pipeline that is empty.
|=======================

Exporters handle the setup of these resources before ever sending data. If
resource setup fails (for example, due to security permissions), no data is
sent and warnings are logged.

NOTE: Empty pipelines are evaluated on the coordinating node during indexing
and they are ignored without any extra effort, which inherently makes them a
safe, no-op operation.

For monitoring clusters that have disabled `node.ingest` on all nodes, it is
possible to disable the use of the ingest pipeline feature. However, doing so
defeats its purpose, which is to upgrade older monitoring data as our mappings
improve over time. Beginning in 6.0, the ingest pipeline feature is a
requirement on the monitoring cluster; you must have `node.ingest` enabled on
at least one node.

WARNING: Once any node running 5.5 or later has set up the templates and
ingest pipeline on a monitoring cluster, you must use {kib} 5.5 or later to
view all subsequent data on the monitoring cluster. The easiest way to
determine whether this update has occurred is by checking for the presence of
indices matching `.monitoring-es-6-*` (or, more concretely, the existence of
the new pipeline). Versions prior to 5.5 used `.monitoring-es-2-*`.

Each resource that is created by an {monitoring} exporter has a `version`
field, which is used to determine whether the resource should be replaced. The
`version` field value represents the latest version of {monitoring} that
changed the resource. If a resource is edited by someone or something external
to {monitoring}, those changes are lost the next time an automatic update
occurs.

include::local-export.asciidoc[]
include::http-export.asciidoc[]
