Skip to content

[extension/opamp] Implement ReportsHealth capability#35488

Merged
evan-bradley merged 14 commits into
open-telemetry:mainfrom
bacherfl:feat/35433/opampextension-reportshealth
Oct 17, 2024
Merged

[extension/opamp] Implement ReportsHealth capability#35488
evan-bradley merged 14 commits into
open-telemetry:mainfrom
bacherfl:feat/35433/opampextension-reportshealth

Conversation

@bacherfl
Copy link
Copy Markdown
Contributor

@bacherfl bacherfl commented Sep 30, 2024

Description: This PR adds the ReportsHealth capability to the OpAMP extension.

Link to tracking Issue: #35433

Testing: Add unit test to check the configuration being used for the OpAMP client. Also tested manually by running the supervisor and verifying the health messages are received there

Documentation: Describe the new option in the readme

Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Comment thread extension/opampextension/factory.go
Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
@bacherfl bacherfl marked this pull request as ready for review September 30, 2024 12:32
@bacherfl bacherfl requested a review from a team as a code owner September 30, 2024 12:32
Copy link
Copy Markdown
Member

@mwear mwear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be a complete implementation of this functionality? The healthcheckv2extension aggregates component status (via the componentstatus.Watcher interface) and produces a nested representation of collector health. See the "verbose example" here. This representation is structurally similar to the ComponentHealth message for OpAMP. I think we can extract the status aggregator from the healthcheckv2 extension, use it in the OpAMP extension, and convert AggregateStatuses into ComponentHealth messages. I believe work to extract the aggregator is underway as part of: #34692.

@bacherfl
Copy link
Copy Markdown
Contributor Author

bacherfl commented Oct 4, 2024

Is this meant to be a complete implementation of this functionality? The healthcheckv2extension aggregates component status (via the componentstatus.Watcher interface) and produces a nested representation of collector health. See the "verbose example" here. This representation is structurally similar to the ComponentHealth message for OpAMP. I think we can extract the status aggregator from the healthcheckv2 extension, use it in the OpAMP extension, and convert AggregateStatuses into ComponentHealth messages. I believe work to extract the aggregator is underway as part of: #34692.

Thanks for the input @mwear - this implementation is a rudimentary version of the health report to just indicate whether the agent is up or not (i kept the implementation close to how it is implemented in the supervisor). However, when the component health information from the healthcheckv2 extension becomes available to be used here as well using this here makes a lot of sense - I can also leave this PR as a draft for now and then adapt it to populate the component health map once #34692 has been resolved

@mwear
Copy link
Copy Markdown
Member

mwear commented Oct 4, 2024

@bacherfl, if this is helpful as is, we can move forward with it so long as we have an issue describing the end result we want.

Copy link
Copy Markdown
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One request, but I'm fine getting a basic implementation in since it inches us closer to proper health reporting.

Comment thread extension/opampextension/opamp_agent.go Outdated
Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
# Conflicts:
#	extension/opampextension/go.mod
#	extension/opampextension/go.sum
Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Comment thread extension/opampextension/opamp_agent.go Outdated
Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Copy link
Copy Markdown
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me as a first pass. The next step will be using status reporting.

# Conflicts:
#	extension/opampextension/go.mod
#	extension/opampextension/go.sum
@evan-bradley evan-bradley merged commit d8cad1c into open-telemetry:main Oct 17, 2024
@github-actions github-actions Bot added this to the next release milestone Oct 17, 2024
@evan-bradley
Copy link
Copy Markdown
Contributor

I've captured the next step here, which uses component status reporting to improve the health information we report: #35856.

sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants