Merged
43 changes: 43 additions & 0 deletions docs/apm-response-codes.asciidoc
@@ -0,0 +1,43 @@
[[common-response-codes]]
=== APM Server response codes

[[bad-request]]
[float]
==== HTTP 400: Data decoding error / Data validation error

The most likely cause for this error is using incompatible versions of {apm-agent} and APM Server.
See the <<agent-server-compatibility,agent/server compatibility matrix>> to verify compatibility.

[[event-too-large]]
[float]
==== HTTP 400: Event too large

APM agents communicate with the APM Server by sending events in an HTTP request. Each event is sent as its own line in the HTTP request body. If events are too large, consider increasing the <<apm-input-general-settings,maximum size per event>>
setting in the APM integration, and adjusting relevant settings in the agent.
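
For reference, this is roughly what the agent-to-server traffic looks like: the intake request body is newline-delimited JSON, with one event per line, and the size limit applies to each line individually. The sketch below is illustrative only; the host, port, and file name are placeholders, not values taken from this document:

[source,sh]
----
# Illustrative only: send an NDJSON payload to a local APM Server.
# Each line of events.ndjson is one event; the "maximum size per event"
# setting applies to each of these lines individually.
curl -X POST 'http://localhost:8200/intake/v2/events' \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary @events.ndjson
----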

[[unauthorized]]
[float]
==== HTTP 401: Invalid token

Either the <<secret-token>> in the request header doesn't match the secret token configured in the APM integration,
or the <<api-key>> is invalid.
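
One quick way to check which credential is being rejected is to send a request by hand. This is a sketch only; the URL and credential values are placeholders:

[source,sh]
----
# Illustrative only: replace the URL and token with your own values.
# A 401 response confirms the credentials are being rejected.
curl -i http://localhost:8200/ \
  -H 'Authorization: Bearer your-secret-token'

# When using an API key instead of a secret token:
curl -i http://localhost:8200/ \
  -H 'Authorization: ApiKey your-base64-encoded-api-key'
----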

[[forbidden]]
[float]
==== HTTP 403: Forbidden request

Either you are sending requests to a <<apm-rum,RUM>> endpoint without RUM enabled, or a request
is coming from an origin not specified in the APM integration settings.
See the <<apm-input-rum-settings,Allowed origins>> setting for more information.
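
If you suspect the origin check, you can reproduce it by sending a request with an explicit `Origin` header. This is a sketch only; the endpoint, origin, and payload are placeholders:

[source,sh]
----
# Illustrative only: a 403 response for an Origin you expect to be allowed
# suggests the "Allowed origins" setting needs updating. (Once the origin
# is allowed, an incomplete payload like this may instead return a 400.)
curl -i 'http://localhost:8200/intake/v2/rum/events' \
  -H 'Origin: https://your-app.example.com' \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary $'{"metadata":{}}\n'
----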

[[request-timed-out]]
[float]
==== HTTP 503: Request timed out waiting to be processed

This happens when APM Server exceeds the maximum number of requests that it can process concurrently.
To alleviate this problem, you can reduce the sample rate and/or reduce the amount of stack trace information collected.
See <<reduce-apm-storage>> for more information.

Another option is to increase processing power.
This can be done by either migrating your {agent} to a more powerful machine
or adding more APM Server instances.
29 changes: 29 additions & 0 deletions docs/apm-server-down.asciidoc
@@ -0,0 +1,29 @@
[[server-es-down]]
=== What happens when APM Server or {es} is down?

*If {es} is down*

APM Server does not have an internal queue to buffer requests,
but instead leverages an HTTP request timeout to act as back-pressure.
If {es} goes down, the APM Server will eventually deny incoming requests.
Both the APM Server and {apm-agent}(s) will issue logs accordingly.

*If APM Server is down*

Some agents have internal queues or buffers that will temporarily store data if the APM Server goes down.
As a general rule of thumb, queues fill up quickly. Assume data will be lost if APM Server goes down.
Adjusting these queues/buffers can increase the agent's overhead, so use caution when updating default values.

* **Go agent** - Circular buffer with configurable size:
{apm-go-ref}/configuration.html#config-api-buffer-size[`ELASTIC_APM_BUFFER_SIZE`].
// * **iOS agent** - ??
* **Java agent** - Internal buffer with configurable size:
{apm-java-ref}/config-reporter.html#config-max-queue-size[`max_queue_size`].
* **Node.js agent** - No internal queue. Data is lost.
* **PHP agent** - No internal queue. Data is lost.
* **Python agent** - Internal {apm-py-ref}/tuning-and-overhead.html#tuning-queue[Transaction queue]
with configurable size and time between flushes.
* **Ruby agent** - Internal queue with configurable size:
{apm-ruby-ref}/configuration.html#config-api-buffer-size[`api_buffer_size`].
* **RUM agent** - No internal queue. Data is lost.
* **.NET agent** - No internal queue. Data is lost.
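
For example, a couple of these settings can be raised before starting the instrumented service. The values below are illustrative only, not recommendations:

[source,sh]
----
# Illustrative only: enlarge the Go agent's buffer via its environment variable.
export ELASTIC_APM_BUFFER_SIZE=50MB

# Illustrative only: enlarge the Java agent's queue via a system property.
java -Delastic.apm.max_queue_size=2048 -jar my-service.jar
----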
160 changes: 64 additions & 96 deletions docs/common-problems.asciidoc
Review comment (Member):

The "SSL client fails to connect" section is pretty weird IMO. I realise it's preexisting, but I don't think I've ever read it before. Why use telnet to debug? How does one "Verify that the certificate is valid and that the hostname and IP match."?

I guess leave this as a followup, but perhaps we should just suggest that users run curl to check they can make a request to the server?

Reply (Member, author):

Thanks. I'll open an issue to address this.
@@ -1,99 +1,72 @@
[[common-problems]]
=== Common problems

This section describes common problems you might encounter when using a Fleet-managed APM Server.

* <<no-data-indexed>>
* <<common-response-codes>>
* <<common-ssl-problems>>
* <<io-timeout>>
* <<server-es-down>>
* <<field-limit-exceeded-legacy>>

[float]
[[no-data-indexed]]
=== No data is indexed

If no data shows up in {es}, first make sure that your APM components are properly connected.

**Is {agent} healthy?**

In {kib} open **{fleet}** and find the host that is running the APM integration;
confirm that its status is **Healthy**.
If it isn't, check the {agent} logs to diagnose potential causes.
See {fleet-guide}/monitor-elastic-agent.html[Monitor {agent}s] to learn more.

**Is APM Server happy?**

In {kib}, open **{fleet}** and select the host that is running the APM integration.
Open the **Logs** tab and select the `elastic_agent.apm_server` dataset.
Look for any APM Server errors that could help diagnose the problem.

**Can the {apm-agent} connect to APM Server?**

To determine if the {apm-agent} can connect to the APM Server, send requests to the instrumented service and look for lines
containing `[request]` in the APM Server logs.

If no requests are logged, confirm that:

. SSL isn't <<ssl-client-fails, misconfigured>>.
. The host is correct. For example, if you're using Docker, ensure that APM Server binds to the right interface (for example, set
`apm-server.host = 0.0.0.0:8200` to match any IP) and set the `SERVER_URL` setting in the {apm-agent} accordingly.

If you see requests coming through the APM Server but they are not accepted (a response code other than `202`),
see <<common-response-codes>> to narrow down the possible causes.
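
To search the logs from a shell instead of the {fleet} UI, something like the following can work. This is a sketch only; the log path is a placeholder that depends on how {agent} is installed:

[source,sh]
----
# Illustrative only: adjust the path to your Elastic Agent log directory,
# then look at the response code logged for each intake request.
grep '\[request\]' /opt/Elastic/Agent/data/logs/*
----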

**Instrumentation gaps**

APM agents provide auto-instrumentation for many popular frameworks and libraries.
If the {apm-agent} is not auto-instrumenting something that you were expecting, data won't be sent to the {stack}.
Reference the relevant {apm-agents-ref}/index.html[{apm-agent} documentation] for details on what is automatically instrumented.
include::./tab-widgets/no-data-indexed-widget.asciidoc[]

[[data-indexed-no-apm-legacy]]
[float]
=== Data is indexed but doesn't appear in the APM app

The {apm-app} relies on index mappings to query and display data.
If your APM data isn't showing up in the {apm-app}, but is elsewhere in {kib}, like the Discover app,
you may have a missing index mapping.

You can determine if a field was mapped correctly with the `_mapping` API.
For example, run the following command in the {kib} {kibana-ref}/console-kibana.html[console].
This will display the field data type of the `service.name` field.

[source,curl]
----
GET *apm*/_mapping/field/service.name
----

If the `mapping.name.type` is `"text"`, your APM indices were not set up correctly.

[source,yml]
----
".ds-metrics-apm.transaction.1m-default-2023.04.12-000038": {
  "mappings": {
    "service.name": {
      "full_name": "service.name",
      "mapping": {
        "name": {
          "type": "text" <1>
        }
      }
    }
  }
}
----
<1> The `service.name` `mapping.name.type` would be `"keyword"` if this field had been set up correctly.

To fix this problem, install the APM integration by following these steps:

--
include::./legacy/getting-started-apm-server.asciidoc[tag=install-apm-integration]
--

This will reinstall the APM index templates and trigger a data stream index rollover.

You can verify the correct index templates were installed by running the following command in the {kib} console:

[source,curl]
----
GET /_index_template/traces-apm
----

[float]
[[common-ssl-problems]]
@@ -191,31 +164,26 @@ APM agent --> Load Balancer --> APM Server
The APM Server timeout can be configured by updating the
<<apm-input-general-settings,maximum duration for reading an entire request>>.

[[field-limit-exceeded-legacy]]
[float]
=== Field limit exceeded

When you add too many distinct tag keys to a transaction or span,
you risk creating a link:{ref}/mapping.html#mapping-limit-settings[mapping explosion].

For example, avoid using user-specified data,
like URL parameters, as tag keys.
Likewise, using the current timestamp or a user ID as a tag key is not a good idea.
However, tag *values* with a high cardinality are not a problem.
Just try to keep the number of distinct tag keys to a minimum.

The symptom of a mapping explosion is that transactions and spans are no longer indexed after a certain time.
Usually, the spans and transactions are indexed again the next day, because a new index is created each day,
but as soon as the field limit is reached, indexing stops again.

In the agent logs, you won't see any sign of failure, because the APM Server asynchronously sends the data it receives from the agents to {es}. However, the APM Server and {es} log a warning like this:

[source,logs]
----
{\"type\":\"illegal_argument_exception\",\"reason\":\"Limit of total fields [1000] in [INDEX_NAME] has been exceeded\"}
----
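
If you suspect you've hit the limit, you can inspect the affected index's field limit directly in the {kib} console. `INDEX_NAME` is a placeholder for the index named in the warning:

[source,curl]
----
GET INDEX_NAME/_settings/index.mapping.total_fields.limit
----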
2 changes: 0 additions & 2 deletions docs/integrations-index.asciidoc
Expand Up @@ -134,8 +134,6 @@ include::./legacy/getting-started-apm-server.asciidoc[]
:beat-specific-security: {docdir}/legacy/security.asciidoc
include::{libbeat-dir}/shared-securing-beat.asciidoc[leveloffset=+1]

include::./legacy/troubleshooting.asciidoc[leveloffset=+1]

// include::./legacy/breaking-changes.asciidoc[leveloffset=+1]

include::./legacy/redirects.asciidoc[]