
feat(cmd): initial integration of OpenTelemetry with OTLP exporter #907

Merged
12 commits merged into main from hlib/tracing on Aug 2, 2022

Conversation

@Wondertan (Member) commented on Jul 13, 2022

  • Initial integration of OpenTelemetry with an OTLP exporter (switched from Jaeger during review)
  • Adds new flag switches for both tracing and metrics, with endpoint customization:
    • --tracing and --tracing.endpoint
    • --metrics and --metrics.endpoint
  • Simple tracing coverage for ipld.Retriever
  • Simple metric for current store height

P.S. The OTLP integration could be separated from the introduced metrics and tracing coverage, but they are the bare minimum needed to manually verify that the integration works as expected by looking at actual data reported by local nodes (see the wiring sketch at the end of this description).

TODO:

  • Ensure different node types advertise as different services
  • Use OTLP instead of Jaeger
  • Integrate OTLP for metrics
  • Store height metric

As always, I recommend reviewing the PR commit by commit (CBC).

Substitutes #810
Closes #934
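
For reviewers less familiar with the otel wiring, the exporter setup boils down to roughly the sketch below. This is a simplified, self-contained illustration rather than the exact code in this PR: the package name, endpoint, and service name are placeholders, and TLS is left insecure here, which is handled separately.

```go
package telemetry // illustrative package name, not the one used in this PR

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)

// setupTracing installs an OTLP trace exporter behind the global tracer
// provider. The endpoint would come from --tracing.endpoint.
func setupTracing(ctx context.Context, endpoint string) (*sdktrace.TracerProvider, error) {
	exp, err := otlptracehttp.New(ctx,
		otlptracehttp.WithEndpoint(endpoint), // e.g. "localhost:4318", the default OTLP/HTTP port
		otlptracehttp.WithInsecure(),         // plain HTTP for a local collector; TLS is a follow-up
	)
	if err != nil {
		return nil, err
	}
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			// each node type should eventually advertise as its own service (see TODO above)
			semconv.ServiceNameKey.String("celestia-node"),
		)),
	)
	otel.SetTracerProvider(tp)
	return tp, nil
}
```

The --tracing / --metrics flags toggle whether this wiring runs at all; the *.endpoint flags only change where the exporter points.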

@Wondertan added the area:ipld and area:metrics labels on Jul 13, 2022
@Wondertan self-assigned this on Jul 13, 2022
@Wondertan added the kind:feat label on Jul 13, 2022
@Wondertan force-pushed the hlib/tracing branch 2 times, most recently from 10a4ab9 to 4813d46, on July 13, 2022
@Wondertan changed the title from "feat(cmd): initial integration of OpenTelemetry and Jaeger exporter" to "feat(cmd): initial integration of OpenTelemetry with OTLP exporter" on Jul 20, 2022
@Wondertan added the area:header label on Jul 20, 2022
@Wondertan marked this pull request as ready for review on July 21, 2022
codecov-commenter commented on Jul 21, 2022

Codecov Report

Merging #907 (255ed8c) into main (0c2daea) will decrease coverage by 0.23%.
The diff coverage is 41.24%.

@@            Coverage Diff             @@
##             main     #907      +/-   ##
==========================================
- Coverage   58.59%   58.35%   -0.24%     
==========================================
  Files         130      132       +2     
  Lines        7798     7937     +139     
==========================================
+ Hits         4569     4632      +63     
- Misses       2755     2831      +76     
  Partials      474      474              
Impacted Files Coverage Δ
cmd/celestia/full.go 42.37% <0.00%> (ø)
cmd/celestia/light.go 42.37% <0.00%> (ø)
cmd/env.go 76.47% <ø> (ø)
header/metrics.go 0.00% <0.00%> (ø)
node/services/service.go 80.66% <0.00%> (-1.22%) ⬇️
node/settings.go 31.37% <0.00%> (-4.19%) ⬇️
cmd/flags_misc.go 42.59% <36.11%> (-11.80%) ⬇️
cmd/celestia/bridge.go 63.26% <100.00%> (ø)
ipld/retriever.go 93.58% <100.00%> (+3.19%) ⬆️
ipld/retriever_byzantine.go 69.23% <100.00%> (ø)
... and 16 more


@rootulp (Contributor) left a comment

Cool! Is it possible for me to see this in action? Is there an OTEL collector that has metrics / traces emitted to it?

@Wondertan (Member, Author) replied:
@rootulp, you can try running a collector locally with the default endpoint/port while passing the new flags described in the PR description to a Light or Full node. You should then see the height of the node being reported, etc. To see traces, you would need to run a Full node.

@renaynay (Member) left a comment

Looks good! It would be nice to get a demo for this sometime soon. One question: tracing happens regardless of whether tracing is enabled, correct? Enabling --tracing just adds the default (or custom) exporter endpoint. So if --tracing isn't passed but tracing still occurs, does it impact performance?

@Wondertan mentioned this pull request on Jul 22, 2022
@Wondertan (Member, Author) commented on Jul 22, 2022

> One question: tracing happens regardless of whether tracing is enabled, correct? Enabling --tracing just adds the default (or custom) exporter endpoint. So if --tracing isn't passed but tracing still occurs, does it impact performance?

That's a good question, and I am glad you asked it. I had the same concern and even looked deeper at the OTel implementation to find out. My conclusion is that it won't noticeably affect performance. The default tracing exporter does nothing besides keeping information about spans to be delegated if tracing is turned on at runtime (see #937).
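
To make that concrete: instrumented code always goes through the global tracer, roughly as in the sketch below (an illustrative placeholder, not the literal ipld.Retriever code). Until a real provider is installed via otel.SetTracerProvider, those Start/End calls hit the default no-op/delegating provider and are essentially free.

```go
package retriever // illustrative package name

import (
	"context"

	"go.opentelemetry.io/otel"
)

// tracer goes through the global provider; while no SDK provider is
// installed, spans created here are no-ops.
var tracer = otel.Tracer("ipld/retriever")

// retrieveQuadrant is a hypothetical stand-in for the real Retriever logic.
func retrieveQuadrant(ctx context.Context, quadrant int) error {
	ctx, span := tracer.Start(ctx, "retrieve-quadrant")
	defer span.End()

	// actual retrieval work would happen here, passing ctx down so child
	// operations can attach their own spans
	_ = ctx
	return nil
}
```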

The same is true for metrics. The default exporter does nothing besides the logic to delegate to some custom exporter that can be enabled at runtime. I could potentially have MonitorHead always be called, without adding any options for the node to enable it, similar to tracing, but I am still unsure about this. Mainly, the confusion here comes from how to marry the global meter pattern with an instance of Store, etc., and the current result looks clean.
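
For context, MonitorHead is essentially an observable gauge hung off the global meter, along the lines of the sketch below. This is a simplified illustration using the stable otel metric API rather than the exact calls in header/metrics.go; the metric name and the heightFn callback are stand-ins for asking the Store for its head.

```go
package header // illustrative package name

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric"
)

// MonitorHead registers an observable gauge on the global meter that reports
// the current head height every time metrics are collected. heightFn is a
// hypothetical stand-in for querying the Store's head.
func MonitorHead(heightFn func(context.Context) int64) error {
	meter := otel.Meter("header")
	_, err := meter.Int64ObservableGauge(
		"head", // illustrative metric name
		metric.WithDescription("current height of the store head"),
		metric.WithInt64Callback(func(ctx context.Context, obs metric.Int64Observer) error {
			obs.Observe(heightFn(ctx))
			return nil
		}),
	)
	return err
}
```

With no metrics SDK configured, the global meter is likewise a no-op, so the callback is simply never invoked.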

@rootulp (Contributor) commented on Jul 23, 2022


Screenshot of a Grafana dashboard featuring the head metric (implemented in this PR). This is for a light node I'm running on AWS, which is emitting metrics to an OTEL Collector deployed on the same instance. The OTEL Collector exports metrics to a Prometheus instance hosted by Grafana Cloud. Setup instructions are in #922.

Tagging @renaynay because I'm happy to hop on a call and demo this, or set it up for you.

@renaynay (Member) replied:

@rootulp would love to do a demo on how to set it up, thanks!

These errors can only happen if the flags are set incorrectly (wrong name, not added to the command, etc.), which is a programmer error and should be reported as a panic accordingly.
@Wondertan requested review from renaynay and rootulp on August 1, 2022
@renaynay (Member) left a comment:

👍🏻

@Wondertan merged commit ace2fda into main on Aug 2, 2022
@Wondertan deleted the hlib/tracing branch on August 2, 2022
@Wondertan mentioned this pull request on Aug 19, 2022
Labels
area:header (Extended header), area:ipld (IPLD plugin), area:metrics (Related to measuring/collecting node metrics), kind:feat (Attached to feature PRs)
Development

Successfully merging this pull request may close these issues.

cmd: OTLP TLS connection
4 participants