1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
@@ -323,6 +323,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- Added status to monitor run log report.
- Upgrade node to latest LTS v18.20.3. {pull}40038[40038]
- Add journey duration to synthetics browser events. {pull}40230[40230]
- Add monitor status reporter under managed mode. {pull}41077[41077]

*Metricbeat*

18 changes: 18 additions & 0 deletions heartbeat/monitors/monitor.go
@@ -34,6 +34,7 @@ import (
"github.com/elastic/beats/v7/heartbeat/monitors/wrappers"
"github.com/elastic/beats/v7/heartbeat/scheduler"
"github.com/elastic/beats/v7/libbeat/beat"
"github.com/elastic/beats/v7/libbeat/management/status"
)

// ErrMonitorDisabled is returned when the monitor plugin is marked as disabled.
@@ -71,6 +72,12 @@ type Monitor struct {
stats plugin.RegistryRecorder

monitorStateTracker *monitorstate.Tracker
statusReporter status.StatusReporter
}

// SetStatusReporter sets the reporter used to surface this monitor's status to the Elastic Agent when running under managed mode.
func (m *Monitor) SetStatusReporter(statusReporter status.StatusReporter) {
Member

Since it's set at the Monitor level, what happens if multiple monitors are configured? Do the errors get accumulated, or is there an upper limit to how they are shown in the UI?

Contributor Author
@emilioalvap emilioalvap Oct 3, 2024

Every monitor will map 1:1 to an agent integration, which Fleet UI already shows individually:
[screenshot: Fleet UI listing each monitor's integration individually]

m.statusReporter = statusReporter
}

// String prints a description of the monitor in a threadsafe way. It is important that this use threadsafe
@@ -175,6 +182,9 @@ func newMonitorUnsafe(

logp.L().Error(fullErr)
p.Jobs = []jobs.Job{func(event *beat.Event) ([]jobs.Job, error) {
// If statusReporter is set, as it is when running in managed mode, update the input status
// to failed, specifying the error
m.updateStatus(status.Failed, fmt.Sprintf("monitor could not be started: %s, err: %s", m.stdFields.ID, fullErr))
Member

Reading this

	// Failed is status describing unit is failed. This status should
	// only be used in the case the beat should stop running as the failure
	// cannot be recovered.

Could this cause HB to stop, and also prevent other monitors from running? Is this intended?

Contributor Author

Could this cause HB to stop, and also prevent other monitors from running?

It probably won't (it doesn't, as of now). Even if that were the case, since the status is scoped at the monitor level, it should only filter out the failed integrations, but I'm speculating here. There are also multiple status layers; this change only affects the stream status (not even the integration status).
As for the status, either failed or degraded would achieve the same purpose; I'm open to discussion on the implications. I leaned towards failed because the type of error caught in this path is generally not recoverable.

Member

I was worried it could stop the other monitors. But if that's not the case, I'm not particularly inclined to change this.

return nil, fullErr
}}
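
The alternative weighed in the thread above would be a one-constant change; a hypothetical sketch (not part of this diff) that reports Degraded for errors considered recoverable, reusing the same job shape and helper:

// Hypothetical variant: report Degraded for recoverable start-up errors and
// reserve Failed for unrecoverable ones; only the status constant changes.
p.Jobs = []jobs.Job{func(event *beat.Event) ([]jobs.Job, error) {
	m.updateStatus(status.Degraded, fmt.Sprintf("monitor could not be started: %s, err: %s", m.stdFields.ID, fullErr))
	return nil, fullErr
}}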

@@ -237,6 +247,7 @@ func (m *Monitor) Start() {

m.stats.StartMonitor(int64(m.endpoints))
m.state = MON_STARTED
m.updateStatus(status.Running, "")
}

// Stop stops the monitor without freeing it in global dedup
@@ -262,4 +273,11 @@ func (m *Monitor) Stop() {

m.stats.StopMonitor(int64(m.endpoints))
m.state = MON_STOPPED
m.updateStatus(status.Stopped, "")
}

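// updateStatus forwards the status transition to the configured reporter, if one is set (i.e. when running under managed mode).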
func (m *Monitor) updateStatus(status status.Status, msg string) {
if m.statusReporter != nil {
m.statusReporter.UpdateStatus(status, msg)
}
}
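
For reference, the contract these calls depend on is roughly the following. This is a sketch reconstructed from the identifiers used above (status.StatusReporter, UpdateStatus, Running, Degraded, Failed, Stopped), not a verbatim copy of libbeat/management/status, and the exact constant set is assumed:

// Sketch of the status contract used by monitor.go; reconstructed for
// illustration, the real package may differ in details.
package status

type Status int

const (
	Unknown Status = iota
	Starting
	Configuring
	Running
	Degraded
	// Failed should only be used when the failure cannot be recovered,
	// per the package docs quoted in the review thread above.
	Failed
	Stopped
)

// StatusReporter receives status transitions for a unit (here, one monitor's
// input stream) and surfaces them to the Elastic Agent / Fleet UI.
type StatusReporter interface {
	UpdateStatus(status Status, msg string)
}
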
61 changes: 61 additions & 0 deletions heartbeat/monitors/monitor_test.go
@@ -18,12 +18,14 @@
package monitors

import (
"fmt"
"testing"
"time"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"

"github.com/elastic/elastic-agent-libs/config"
conf "github.com/elastic/elastic-agent-libs/config"
"github.com/elastic/elastic-agent-libs/mapstr"
"github.com/elastic/elastic-agent-libs/monitoring"
@@ -32,7 +34,9 @@ import (
"github.com/elastic/go-lookslike/testslike"
"github.com/elastic/go-lookslike/validator"

"github.com/elastic/beats/v7/heartbeat/monitors/plugin"
"github.com/elastic/beats/v7/heartbeat/scheduler"
"github.com/elastic/beats/v7/libbeat/management/status"
)

// TestMonitorBasic tests a basic config
@@ -131,3 +135,60 @@ func TestCheckInvalidConfig(t *testing.T) {

require.Error(t, checkMonitorConfig(serverMonConf, reg))
}

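// MockStatusReporter implements status.StatusReporter by delegating UpdateStatus to the provided callback.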
type MockStatusReporter struct {
us func(status status.Status, msg string)
}

func (sr *MockStatusReporter) UpdateStatus(status status.Status, msg string) {
sr.us(status, msg)
}

func TestStatusReporter(t *testing.T) {
confMap := map[string]interface{}{
"type": "fail",
"urls": []string{"http://example.net"},
"schedule": "@every 1ms",
"name": "myName",
"id": "myId",
}
conf, err := config.NewConfigFrom(confMap)
require.NoError(t, err)

reg, _, _ := mockPluginsReg()
pipel := &MockPipeline{}
monReg := monitoring.NewRegistry()

mockDegradedPluginFactory := plugin.PluginFactory{
Name: "fail",
Aliases: []string{"failAlias"},
Make: func(s string, config *config.C) (plugin.Plugin, error) {
return plugin.Plugin{}, fmt.Errorf("error plugin")
},
Stats: plugin.NewPluginCountersRecorder("fail", monReg),
}
reg.Add(mockDegradedPluginFactory)

sched := scheduler.Create(1, monitoring.NewRegistry(), time.Local, nil, true)
defer sched.Stop()

c, err := pipel.Connect()
require.NoError(t, err)
m, err := newMonitor(conf, reg, c, sched.Add, nil, nil)
require.NoError(t, err)

// Track whether the status is reported as failed during the run_once execution
failed := false
m.SetStatusReporter(&MockStatusReporter{
us: func(s status.Status, msg string) {
if s == status.Failed {
failed = true
}
},
})
m.Start()

sched.WaitForRunOnce()

require.True(t, failed)
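
The production call site for SetStatusReporter is outside this diff; under managed mode, the code that reloads inputs is expected to attach the per-unit reporter before starting each monitor. A hypothetical sketch of that wiring (wireManagedMonitor and unitReporter are invented names for illustration):

// wireManagedMonitor is a hypothetical helper, not part of this PR: it shows
// where SetStatusReporter would be called so that the lifecycle updates in
// monitor.go surface as unit status. unitReporter stands in for whatever
// reporter the Elastic Agent unit exposes for this input stream.
func wireManagedMonitor(m *Monitor, unitReporter status.StatusReporter) {
	m.SetStatusReporter(unitReporter)
	m.Start() // reports status.Running via updateStatus
	// ... later, when the unit is removed or Heartbeat shuts down:
	// m.Stop() // reports status.Stopped
}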
}