Fix metrics destination index and dataset to match v1 #2004

Closed

joshdover
Contributor

What does this PR do?

Fixes the data streams that agent monitoring metrics are sent to, so that they use the elastic_agent.<binary name> pattern from v1 instead of elastic_agent.<unit id>.

Why is it important?

We need to be sure the agent's monitoring metrics get delivered, so the data streams must match what's defined by the elastic_agent package. Also, having a data stream per unique unit would cause an explosion of data streams without any user benefit. Without this change, metrics from individual binaries are dropped with errors like:

{"type":"security_exception","reason":"action [indices:admin/auto_create] is unauthorized for API key id [OgRsP4UBpJowysjwmQ1H] of user [elastic/fleet-server] on indices [metrics-elastic_agent.system_metrics_8c767cd0_82cf_11ed_a802_7913b47c3a5f-default], this action is granted by the index privileges [auto_configure,create_index,manage,all]"}, dropping event!
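To make the difference concrete, here is a minimal sketch of how the two dataset patterns turn into data stream names, assuming the usual <type>-<dataset>-<namespace> scheme visible in the index name above. The helper dataStreamName is hypothetical and is not part of the agent code.

```go
package main

import "fmt"

// dataStreamName is a hypothetical helper (not from the agent code base) that
// joins the three parts of a data stream name: <type>-<dataset>-<namespace>.
func dataStreamName(streamType, dataset, namespace string) string {
	return fmt.Sprintf("%s-%s-%s", streamType, dataset, namespace)
}

func main() {
	// Per-unit dataset: one data stream per unit, as in the error above.
	perUnit := dataStreamName("metrics",
		"elastic_agent.system_metrics_8c767cd0_82cf_11ed_a802_7913b47c3a5f", "default")

	// Per-binary dataset: the v1 pattern this PR restores.
	perBinary := dataStreamName("metrics", "elastic_agent.metricbeat", "default")

	fmt.Println(perUnit)
	// metrics-elastic_agent.system_metrics_8c767cd0_82cf_11ed_a802_7913b47c3a5f-default
	fmt.Println(perBinary)
	// metrics-elastic_agent.metricbeat-default
}
```

The second name is the one referenced in the testing steps of this PR; the first has to be auto-created, which is the action the API key in the error is not authorized to perform.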

This is the same change we made for logging in #1845. The long-term plan for v2 logs and metrics will be discussed in:

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

How to test this PR locally

Enroll an agent with monitoring enabled, then verify that none of the error logs above show up in logs-elastic_agent.metricbeat-default and that metrics are available in metrics-elastic_agent.metricbeat-default.

@joshdover added the labels bug (Something isn't working), V2-Architecture, skip-changelog, and backport-v8.6.0 (Automated backport with mergify) on Dec 23, 2022
@joshdover requested a review from a team as a code owner on December 23, 2022 16:53
@joshdover requested review from aleksmaus and michalpristas and removed the request for a team on December 23, 2022 16:53
@@ -596,20 +596,20 @@ func (b *BeatsMonitor) injectMetricsInput(cfg map[string]interface{}, componentI
idKey: "metrics-monitoring-" + name,
"data_stream": map[string]interface{}{
"type": "metrics",
-"dataset": fmt.Sprintf("elastic_agent.%s", name),
+"dataset": fmt.Sprintf("elastic_agent.%s", binaryName),
Member

We made this change before as well: https://github.com/elastic/elastic-agent/pull/1854/files

Could you instead fix this by changing the following line?

name := strings.ReplaceAll(strings.ReplaceAll(unit, "-", "_"), "/", "_") // conform with index naming policy above

Specifically, use strings.ReplaceAll(binaryName instead of strings.ReplaceAll(unit.
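As a rough illustration of this suggestion, the sketch below applies the same replacement chain to a binary name and builds the dataset from the result. The function names sanitizeName and monitoringDataset and the sample inputs are assumptions made for this example; they are not taken from the agent code.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeName applies the replacement chain quoted in the comment above:
// "-" and "/" both become "_".
func sanitizeName(s string) string {
	return strings.ReplaceAll(strings.ReplaceAll(s, "-", "_"), "/", "_")
}

// monitoringDataset is a hypothetical helper that derives the dataset from
// the sanitized binary name rather than from the unit ID.
func monitoringDataset(binaryName string) string {
	return fmt.Sprintf("elastic_agent.%s", sanitizeName(binaryName))
}

func main() {
	// A plain binary name passes through unchanged.
	fmt.Println(monitoringDataset("metricbeat")) // elastic_agent.metricbeat

	// A hypothetical name containing "-" is normalized.
	fmt.Println(monitoringDataset("apm-server")) // elastic_agent.apm_server
}
```

With this approach the existing name variable keeps feeding both the stream ID and the dataset, so only the input to the sanitization changes.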

Member

Here is this suggested change as a forward-port PR: #2006

Contributor Author

@joshdover joshdover Dec 23, 2022

Hmm, I find doing it this way curious. I believe it will result in multiple streams with the same idKey, which I presumed would be an issue.
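The concern can be sketched as follows, assuming the stream ID is built as "metrics-monitoring-" plus the sanitized name, as in the diff above. The unit type, the sample unit IDs, and the streamIDs helper are hypothetical and exist only for illustration; whether duplicate IDs actually cause a problem is not settled in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// unit is a hypothetical pairing of a unit ID with the binary that runs it.
type unit struct {
	id         string
	binaryName string
}

// sanitize applies the same replacement chain as the code under review.
func sanitize(s string) string {
	return strings.ReplaceAll(strings.ReplaceAll(s, "-", "_"), "/", "_")
}

// streamIDs groups unit IDs by the stream ID that would be generated if the
// ID were derived from the binary name instead of the unit ID.
func streamIDs(units []unit) map[string][]string {
	out := make(map[string][]string)
	for _, u := range units {
		id := "metrics-monitoring-" + sanitize(u.binaryName)
		out[id] = append(out[id], u.id)
	}
	return out
}

func main() {
	// Two hypothetical units share the metricbeat binary.
	units := []unit{
		{id: "system/metrics-default", binaryName: "metricbeat"},
		{id: "http/metrics-monitoring", binaryName: "metricbeat"},
		{id: "log-default", binaryName: "filebeat"},
	}
	for id, ids := range streamIDs(units) {
		fmt.Printf("%s is shared by %d unit(s): %v\n", id, len(ids), ids)
	}
}
```

Any stream ID that maps to more than one unit is a case where the generated monitoring streams would carry the same idKey value.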

Member

> I believe it will result in multiple streams with the same idKey, which I presumed would be an issue.

I'm not very familiar with the constraints on index names here, but the binary name can be anything that is a valid file name, so I could believe we have to do some sanitization/normalization on it.

We made the exact same change for logs on main and 8.6 in https://github.com/elastic/elastic-agent/pull/1845/files, so if we decide to change it, we should change it in both places.

Member

Merged the backport to fix main quickly. We can still adjust the way this is done if needed.

@cmacknz
Member

cmacknz commented Dec 23, 2022

Aha, #1854 was not forward-ported to main; it was merged directly to 8.6.

@cmacknz
Member

cmacknz commented Dec 23, 2022

The logs change was forward-ported (#1845).

@elasticmachine
Contributor

💚 Build Succeeded

Build stats

  • Start Time: 2022-12-23T16:53:14.471+0000

  • Duration: 17 min 35 sec

Test stats 🧪

Test Results
Failed 0
Passed 4749
Skipped 13
Total 4762

💚 Flaky test report

Tests succeeded.

@elasticmachine
Contributor

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 98.305% (58/59) 👍
Files 69.268% (142/205) 👍
Classes 69.231% (270/390) 👍
Methods 54.128% (826/1526) 👍
Lines 39.294% (8979/22851) 👍 0.018
Conditionals 100.0% (0/0) 💚

@joshdover
Contributor Author

Closing this for now; there are no reported issues with the current implementation, and the fix was forward-ported to main.

@joshdover joshdover closed this Jan 9, 2023