
[Heartbeat] Add managed status reporter at monitor factory level #41077

Merged
emilioalvap merged 2 commits into elastic:main from emilioalvap:hb-status-reporter
Oct 4, 2024

Conversation

@emilioalvap
Contributor

Proposed commit message

Add status reporting for monitors when running under elastic-agent. This will allow the Fleet UI to reflect that there's an issue with one or more heartbeat integrations.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Build agentbeat locally with:
     DEV=true SNAPSHOT=true PLATFORMS=linux/amd64 mage package
  2. Build elastic-agent locally with:
     DEV=true SNAPSHOT=true PLATFORMS=linux/amd64 PACKAGES=docker mage package
  3. Enroll a non-complete elastic-agent into a private location policy with a browser monitor assigned.
  4. Check that the agent status is eventually reported as degraded and the integration is marked as failed.

@emilioalvap emilioalvap added enhancement Team:obs-ds-hosted-services Label for the Observability Hosted Services team backport-skip Skip notification from the automated backport with mergify labels Oct 2, 2024
@emilioalvap emilioalvap requested a review from a team as a code owner October 2, 2024 14:59
@elasticmachine
Contributor

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Oct 2, 2024
@emilioalvap
Contributor Author

cc @lucabelluccini

p.Jobs = []jobs.Job{func(event *beat.Event) ([]jobs.Job, error) {
// if statusReporter is set, as it is for running managed-mode, update the input status
// to failed, specifying the error
m.updateStatus(status.Failed, fmt.Sprintf("monitor could not be started: %s, err: %s", m.stdFields.ID, fullErr))
Member

Reading this

	// Failed is status describing unit is failed. This status should
	// only be used in the case the beat should stop running as the failure
	// cannot be recovered.

Could this cause other HB to stop and also other monitors from running? Is this intended?

Contributor Author

Could this cause other HB to stop and also other monitors from running?

It probably won't (it doesn't, as of now). Even if that were the case, since the status is scoped at the monitor level, it should only filter out the failed integrations, but I'm speculating here. There are also multiple status layers; this change only affects the stream status (not even the integration status).
As for the status, either failed or degraded should achieve the same purpose, and I'm open to discussing the implications. I leaned toward failed because the type of error caught in this part is generally not recoverable.

Member

I was worried that it could stop the other monitors. But if that's not the case, I am not super inclined towards changing this.
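For readers following this thread, here is a minimal sketch of the pattern under discussion: a status update that is a no-op unless a reporter was set, so a start failure is reported against one monitor only. Everything here is a local stand-in written for illustration; `Status`, `StatusReporter`, `Monitor`, `failStart` and `recorder` are assumptions and not the actual types in the beats repository. Only the message format and the `updateStatus` name come from the diff above.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a hypothetical stand-in for the agent status enum.
type Status int

const (
	Unknown Status = iota
	Degraded
	Failed
)

// StatusReporter is a hypothetical stand-in for the reporter interface
// referenced in the diff.
type StatusReporter interface {
	UpdateStatus(s Status, msg string)
}

// Monitor is a stripped-down, illustrative monitor.
type Monitor struct {
	ID             string
	statusReporter StatusReporter // nil when not running in managed mode
}

// updateStatus forwards to the reporter only if one has been set.
func (m *Monitor) updateStatus(s Status, msg string) {
	if m.statusReporter != nil {
		m.statusReporter.UpdateStatus(s, msg)
	}
}

// failStart (hypothetical helper) reports a start failure for this monitor.
func (m *Monitor) failStart(err error) {
	m.updateStatus(Failed, fmt.Sprintf("monitor could not be started: %s, err: %s", m.ID, err))
}

// recorder remembers the last update so the example can inspect it.
type recorder struct {
	last  Status
	msg   string
	calls int
}

func (r *recorder) UpdateStatus(s Status, msg string) {
	r.last, r.msg = s, msg
	r.calls++
}

func main() {
	// Standalone mode: no reporter is set, so the call does nothing.
	standalone := &Monitor{ID: "standalone"}
	standalone.failStart(errors.New("boom"))

	// Managed mode: the failure reaches this monitor's reporter only.
	rec := &recorder{}
	managed := &Monitor{ID: "browser-1", statusReporter: rec}
	managed.failStart(errors.New("browser dependencies missing"))
	fmt.Println(rec.last == Failed, rec.msg)
}
```

Whether `Failed` or `Degraded` is the better value is exactly the open question in this thread; the sketch only shows that the update is scoped to the monitor that owns the reporter.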

}

// SetStatusReporter
func (m *Monitor) SetStatusReporter(statusReporter status.StatusReporter) {
Member

Since it's set at the Monitor level, what happens if multiple monitors are configured? Do the errors accumulate, or is there an upper limit on how they're shown in the UI?

Contributor Author

@emilioalvap emilioalvap Oct 3, 2024

Every monitor will map 1:1 to an agent integration, which Fleet UI already shows individually:
image
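A rough sketch of the 1:1 mapping described above, assuming a factory that hands out one reporter per monitor ID. The names `factory`, `reporterFor` and `streamReporter`, the string status values, and the choice to start each reporter as `Running` are all hypothetical and exist only for illustration; the real wiring lives in the PR diff.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for the agent status values.
type Status string

const (
	Running Status = "RUNNING"
	Failed  Status = "FAILED"
)

// streamReporter holds the status of exactly one monitor.
type streamReporter struct {
	status Status
	msg    string
}

func (r *streamReporter) UpdateStatus(s Status, msg string) {
	r.status, r.msg = s, msg
}

// factory keeps one reporter per monitor ID (illustrative only).
type factory struct {
	reporters map[string]*streamReporter
}

func newFactory() *factory {
	return &factory{reporters: map[string]*streamReporter{}}
}

// reporterFor returns the reporter for id, creating it on first use.
// Starting as Running is an assumption made for this example.
func (f *factory) reporterFor(id string) *streamReporter {
	if r, ok := f.reporters[id]; ok {
		return r
	}
	r := &streamReporter{status: Running}
	f.reporters[id] = r
	return r
}

func main() {
	f := newFactory()
	ids := []string{"http-1", "browser-1"}
	for _, id := range ids {
		f.reporterFor(id)
	}

	// Only the browser monitor fails to start.
	f.reporterFor("browser-1").UpdateStatus(Failed, "monitor could not be started")

	for _, id := range ids {
		fmt.Printf("%s: %s\n", id, f.reporterFor(id).status)
	}
}
```

Run as written, this prints `http-1: RUNNING` followed by `browser-1: FAILED`: statuses are kept per monitor rather than accumulated into one message.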

Member

@vigneshshanmugam vigneshshanmugam left a comment

LGTM

@vigneshshanmugam
Member

@emilioalvap Do you intend to add a changelog entry?

@emilioalvap emilioalvap enabled auto-merge (squash) October 4, 2024 15:22
@emilioalvap emilioalvap added backport-8.15 Automated backport to the 8.15 branch with mergify backport-8.x Automated backport to the 8.x branch with mergify and removed backport-skip Skip notification from the automated backport with mergify labels Oct 4, 2024
@emilioalvap emilioalvap merged commit c70d2d8 into elastic:main Oct 4, 2024
mergify bot pushed a commit that referenced this pull request Oct 4, 2024
)

* [Heartbeat] Add status reporting for monitors when running under elastic-agent

(cherry picked from commit c70d2d8)
mergify bot pushed a commit that referenced this pull request Oct 4, 2024
)

* [Heartbeat] Add status reporting for monitors when running under elastic-agent

(cherry picked from commit c70d2d8)
emilioalvap added a commit to emilioalvap/beats that referenced this pull request Oct 4, 2024
mergify bot pushed a commit that referenced this pull request Oct 4, 2024
(cherry picked from commit efb563c)

# Conflicts:
#	heartbeat/monitors/monitor.go
#	heartbeat/monitors/monitor_test.go
mergify bot pushed a commit that referenced this pull request Oct 4, 2024
(cherry picked from commit efb563c)

# Conflicts:
#	heartbeat/monitors/monitor.go
#	heartbeat/monitors/monitor_test.go
emilioalvap added a commit to emilioalvap/beats that referenced this pull request Oct 4, 2024
emilioalvap added a commit that referenced this pull request Oct 10, 2024
…auto-merge #41077 (#41133)

* [Heartbeat] Fix linting issues introduced by auto-merge #41077 (#41128)

* Manual merge

* [Heartbeat] Add status reporter at monitor factory level (#41077)

---------

Co-authored-by: Emilio Alvarez Piñeiro <95703246+emilioalvap@users.noreply.github.com>
Co-authored-by: emilioalvap <emilio.alvarezpineiro@elastic.co>
emilioalvap added a commit that referenced this pull request Oct 16, 2024
…uto-merge #41077 (#41134)

* [Heartbeat] Fix linting issues introduced by auto-merge #41077 (#41128)

(cherry picked from commit efb563c)

# Conflicts:
#	heartbeat/monitors/monitor.go
#	heartbeat/monitors/monitor_test.go

* Merge conflicts

* [Heartbeat] Add status reporter at monitor factory level

* Add unit test and changelog

---------

Co-authored-by: Emilio Alvarez Piñeiro <95703246+emilioalvap@users.noreply.github.com>
Co-authored-by: emilioalvap <emilio.alvarezpineiro@elastic.co>
@khushijain21 khushijain21 mentioned this pull request Jun 23, 2025

Labels

  • backport-8.x: Automated backport to the 8.x branch with mergify
  • backport-8.15: Automated backport to the 8.15 branch with mergify
  • enhancement
  • Team:obs-ds-hosted-services: Label for the Observability Hosted Services team

3 participants