-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add receiver for sapm protocol #48
Conversation
e96223b
to
9d24e33
Compare
I think the ci is failing because |
9de6d66
to
477060b
Compare
receiver/sapmreceiver/factory.go
Outdated
// assert config is SAPM config | ||
rCfg := cfg.(*Config) | ||
|
||
if rCfg.IsEnabled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never seen this before. I think this is supposed to be used internally by the pipeline and not the factory. If a component is disabled, pipeline should never ask it's factory to create a receiver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be handled by https://github.com/open-telemetry/opentelemetry-collector/blob/d32cd2a100a49d2dcd34ba4401d50b4ee15adc37/config/config.go#L711
If it is not then we have a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
for _, err := range errs { | ||
fmt.Fprintf(buf, "%s\n", err.Error()) | ||
} | ||
err = errors.New(buf.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otel has a package to combine multiple errors: https://github.com/open-telemetry/opentelemetry-collector/blob/master/oterr/errors.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, I copied that from the jaeger receiver. I'll update to use that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't actually have multiple errors any way. It was a bad copy/paste. I've removed it entirely.
477060b
to
97ffff3
Compare
return func(rw http.ResponseWriter, req *http.Request) { | ||
|
||
// trace this request | ||
_, span := trace.StartSpan(sr.defaultCtx, traceSource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be using the per request context? In fact, I think it should use req.Context()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main thing I wanted to talk about
97ffff3
to
d413c31
Compare
@@ -6,6 +6,8 @@ replace github.com/open-telemetry/opentelemetry-collector-contrib/exporter/azure | |||
|
|||
replace github.com/open-telemetry/opentelemetry-collector-contrib/receiver/collectdreceiver => ./receiver/collectdreceiver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@owais I assume this is no longer needed. Can you please remove it in a separate PR?
@@ -191,6 +191,8 @@ github.com/gogo/protobuf v1.2.2-0.20190723190241-65acae22fc9d/go.mod h1:SlYgWuQ5 | |||
github.com/gogo/protobuf v1.2.2-0.20190730201129-28a6bbf47e48/go.mod h1:SlYgWuQ5SjCEi6WLHjHCa1yvBfUnHcTbrrZtXPKa29o= | |||
github.com/gogo/protobuf v1.3.0 h1:G8O7TerXerS4F6sx9OV7/nRfJdnXgHZu/S/7F2SN+UE= | |||
github.com/gogo/protobuf v1.3.0/go.mod h1:SlYgWuQ5SjCEi6WLHjHCa1yvBfUnHcTbrrZtXPKa29o= | |||
github.com/gogo/protobuf v1.3.1 h1:DqDEcV5aeaTmdFBePNpYsp3FlcVH/2ISVVM9Qf8PSls= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it valid to consume 2 different versions of the same lib? Which one is actually used by "import" statements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really consuming. It's just recording the checksum for the version so it it sees the version again, it'll reject it if the checksum does not match. Probably some dependency is using 1.3.1 and some other 1.3.0 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so this means that according to "Minimal Version Selection" rule 1.3.1 will win.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually 1.3.2 wins. Which exact version of a dependency is used is only dictated by go.mod. go.sum only contains checksums. Relevant line: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/48/files#diff-c391b461a7bdb4bc745d488660866ab0R7
@@ -312,6 +314,8 @@ github.com/inconshreveable/mousetrap v1.0.0/go.mod h1:PxqpIevigyE2G7u3NXJIT2ANyt | |||
github.com/influxdata/influxdb v1.7.7/go.mod h1:qZna6X/4elxqT3yI9iZYdZrWWdeFOOprn86kgg4+IzY= | |||
github.com/jaegertracing/jaeger v1.14.0 h1:C0En+gfcxf3NsAriMAvQ6LcSFrQ5VQGXddqfty1EpTI= | |||
github.com/jaegertracing/jaeger v1.14.0/go.mod h1:LUWPSnzNPGRubM8pk0inANGitpiMOOxihXx0+53llXI= | |||
github.com/jaegertracing/jaeger v1.15.1 h1:7QzNAXq+4ko9GtCjozDNAp2uonoABu+B2Rk94hjQcp4= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
go.sum
Outdated
@@ -491,6 +495,8 @@ github.com/shurcooL/vfsgen v0.0.0-20180825020608-02ddb050ef6b/go.mod h1:TrYk7fJV | |||
github.com/shurcooL/vfsgen v0.0.0-20181202132449-6a9ea43bcacd/go.mod h1:TrYk7fJVaAttu97ZZKrO9UbRa8izdowaMIZcxYMbVaw= | |||
github.com/signalfx/com_signalfx_metrics_protobuf v0.0.0-20190530013331-054be550cb49 h1:a6us2VYa4abLd4FG6F3BaGEzuq6WIvloIx3M40ePaJ0= | |||
github.com/signalfx/com_signalfx_metrics_protobuf v0.0.0-20190530013331-054be550cb49/go.mod h1:muYA2clvwCdj7nzAJ5vJIXYpJsUumhAl4Uu1wUNpWzA= | |||
github.com/signalfx/sapm-proto v0.0.1 h1:At9k1ynR74nmW8VMdfcqCFV9MWq1BdUpvYtyxHB3jeA= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to make this repo public, otherwise this won't build.
receiver/sapmreceiver/config.go
Outdated
"github.com/open-telemetry/opentelemetry-collector/config/configmodels" | ||
) | ||
|
||
// Config defines configuration for Jaeger receiver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Config defines configuration for Jaeger receiver. | |
// Config defines configuration for SAPM receiver. |
|
||
// extract the port number from string in "address:port" format. If the | ||
// port number cannot be extracted returns an error. | ||
func extractPortFromEndpoint(endpoint string) (int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably make this a utility func available for all components. No need to change anything in this PR, maybe just a TODO note for future improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a TODO
receiver/sapmreceiver/factory.go
Outdated
// assert config is SAPM config | ||
rCfg := cfg.(*Config) | ||
|
||
if rCfg.IsEnabled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be handled by https://github.com/open-telemetry/opentelemetry-collector/blob/d32cd2a100a49d2dcd34ba4401d50b4ee15adc37/config/config.go#L711
If it is not then we have a bug.
td.SourceFormat = "sapm" | ||
|
||
// pass the trace data to the next consumer | ||
sr.nextConsumer.ConsumeTraceData(rCtx, td) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ConsumeTraceData
can return an error, it needs to be handled here otherwise we may have silent drops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tigrannajaryan Is returning on error appropriate for converting the batches and consuming their trace data? Or should I attempt to process the rest of the batches before returning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have 2 options:
- Return ErrBadRequest error to the caller. This will ensure caller does not re-send this anymore.
- Return "Success" but increment local "bad input data" counter and process the rest of batches which are good.
(1) is definitely more appropriate if ProtoBatchToOCProto returns an error because it typically means something is wrong with span data.
(2) may be more appropriate for ConsumeTraceData
errors, but I could argue either way.
To keep it simple I would handle errors from both functions the same way and would just return ErrBadRequest (but make sure we actually return this exact error so that SAPM responds with 400 Bad Data and we don't receive this data again).
return func(rw http.ResponseWriter, req *http.Request) { | ||
|
||
// trace this request | ||
_, span := trace.StartSpan(sr.defaultCtx, traceSource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// set up the listener | ||
ln, err = net.Listen("tcp", sr.config.Endpoint) | ||
if err != nil { | ||
err = fmt.Errorf("failed to bind to address %s: %v", sr.config.Endpoint, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we return
here? Is there a point in continuing and trying to setup the http.Server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a return
Nit: capitalize subject line of commit message. |
After discussing with @owais, I'm going to remove the handler from sapm-proto. I'll leave the request parsing function, but the handler itself will be moved into the otel receiver directly. |
d413c31
to
6f907a8
Compare
83137b5
to
b0bcd97
Compare
@tigrannajaryan @owais I think I've addressed all comments on this pr. I'll update the sapm-proto dependency with a tagged version once this pr is approved: signalfx/sapm-proto#7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general. I miss somewhere a description for SAPM (Splunk APM?), best place will be adding a doc.go for the package and describe it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM on my part, please address remaining comments from other reviewers.
// handle the request payload | ||
err := sr.handleRequest(ctx, req) | ||
if err != nil { | ||
// TODO account for this error (throttled logging or metrics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a Github issue or JIRA task to track this issue? I want to make sure we don't forget this, otherwise we will run blind when this happens on production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. @tigrannajaryan is there some kind of convention in otel for handling this? I don't think zap has a rate limited logger (at least i couldn't find one in their docs). I hesitate to put a plain log statement there because log files could blow up from a bad actor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrm I think i found something in zap. They refer to this as "sampling". I'll have to read their doc further to understand how to use that feature. I've never used zap before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is best to record a metric instead of logging. I believe that is how we expose most of bad things happening in Collector that can happen because of input data.
5eab011
to
98abb65
Compare
I'll continue looking into the logging stuff separately and open another pr to address it. @tigrannajaryan @pjanotti @owais I think this pr is ready to merge pending your final approval. |
98abb65
to
1c7bf31
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @codesmith14 LGTM
* Move scope.Active to trace.CurrentSpan * Remove scope / does not build * Global tracer * Checkpoint * Checkpoint * Add key/key.go for key.New * Comments * Remove more EventID and ScopeID * Use Handle to describe static objects * TODOs * Remove empty file * Remove singletons * Update TODOs * TODO about map update * Make stats package option aliases (like key has) * Rename experimental/streaming * streaming SDK builds w/ many TODOs * Get the examples building * Tidy up metric API / add interface check * Remove logic from the registry; this is now a placeholder
* Add additional golden config test cases to file_input
* Update to Collector v0.28.0 Closes open-telemetry#48 Addresses the breaking API change in open-telemetry/opentelemetry-collector#3163, besides the usual version number changes. Signed-off-by: Fangyi Zhou <[email protected]> * Use `go mod tidy` instead of `go mod download` It appears that this magically resolves the go.mod file issue. https://stackoverflow.com/questions/67203641/missing-go-sum-entry-for-module-providing-package-package-name Signed-off-by: Fangyi Zhou <[email protected]>
* Initial commit * Add CODEOWNERS file (open-telemetry#2) * Add CODEOWNERS file * Update CODEOWNERS * Moved from github.com/observatorium/opentelemetry-collector-builder (open-telemetry#3) Signed-off-by: Juraci Paixão Kröhling <[email protected]> * fixed panics (open-telemetry#6) Signed-off-by: Joe Elliott <[email protected]> * Replace master with main in CI and mergify files (open-telemetry#8) Signed-off-by: Juraci Paixão Kröhling <[email protected]> * Bump to OpenTelemetry Collector 0.20.0 (open-telemetry#10) Closes open-telemetry#9 Signed-off-by: Juraci Paixão Kröhling <[email protected]> * Explicitly enable Go modules in quickstart instructions (open-telemetry#13) * Update to collector v0.21.0 (open-telemetry#17) Fixes open-telemetry#16 Signed-off-by: Juraci Paixão Kröhling <[email protected]> * Update to collector v0.22.0 (open-telemetry#19) * Download go modules before building (open-telemetry#20) Fixes open-telemetry#14 * Add version command (open-telemetry#25) Signed-off-by: Ashmita Bohara <[email protected]> * Pass errors from cobra Execute back to main for correct exit code (open-telemetry#28) * pass errors from cobra execute back to main * print the error * Update to collector v0.23.0 (open-telemetry#27) * Generate a warning if the builder and collector base version mismatch (open-telemetry#30) * Generate a warning if the builder and collector base version mismatch * Show current default version in the warning message * Update to OpenTelemetry Collector 0.24.0 * Don't use %w formatting with log.Fatal (open-telemetry#35) * Update to OpenTelemetry Collector 0.25.0 (open-telemetry#36) Signed-off-by: Serge Catudal <[email protected]> * Update to 0.26.0 and update BuildInfo (open-telemetry#39) * Sync build and CI Go versions at latest 1.16 (open-telemetry#34) * Sync build and CI Go versions at latest 1.16 * Run go mod tidy * Set go binary to use in the compilation phase in tests Signed-off-by: Juraci Paixão Kröhling <[email protected]> Co-authored-by: Juraci Paixão Kröhling <[email protected]> * Add option to generate go code only (no compile) (open-telemetry#40) * Issue#24 Add option to generate go code only (no compile) * Update cmd/root.go logging Suggested by @jpkkrohling Co-authored-by: Juraci Paixão Kröhling <[email protected]> * remove verbose help .. created by corba * suggestion by jpkrohling to keep generateandcompile * lint error: remove unused var * reword cmd option and add back help message for default Co-authored-by: Juraci Paixão Kröhling <[email protected]> * Don't reuse exec.Cmd (open-telemetry#42) * Update to OpenTelemetry Collector 0.27.0 (open-telemetry#43) * Add CI Badge (open-telemetry#47) * Update to Collector v0.28.0 (open-telemetry#49) * Update to Collector v0.28.0 Closes open-telemetry#48 Addresses the breaking API change in open-telemetry/opentelemetry-collector#3163, besides the usual version number changes. Signed-off-by: Fangyi Zhou <[email protected]> * Use `go mod tidy` instead of `go mod download` It appears that this magically resolves the go.mod file issue. https://stackoverflow.com/questions/67203641/missing-go-sum-entry-for-module-providing-package-package-name Signed-off-by: Fangyi Zhou <[email protected]> * Account for go mod download in go1.17 not updating go.sum (open-telemetry#50) * Update to collector v0.29.0 (open-telemetry#54) * Update replaces.builder.yaml * Update nocore.builder.yaml * Update config.go * Update README.md * Update main.go * Update to collector v0.30.0 (open-telemetry#57) * cmd: fix module flag default value to github.com/open-telemetry (open-telemetry#58) Signed-off-by: Koichi Shiraishi <[email protected]> * Update to collector v0.31.0 (open-telemetry#60) * Update to v0.33.0 (open-telemetry#62) Signed-off-by: Anthony J Mirabella <[email protected]> * Add excludes support to generated go.mod (open-telemetry#63) Signed-off-by: Anthony J Mirabella <[email protected]> Co-authored-by: Juraci Paixão Kröhling <[email protected]> * Small cleanup for the builder files (open-telemetry#64) Signed-off-by: Bogdan Drutu <[email protected]> * Support building with Go 1.17 (open-telemetry#66) * Support building with Go 1.17 Fixes open-telemetry#65 Signed-off-by: Juraci Paixão Kröhling <[email protected]> * Update workflows to use Go 1.17 Signed-off-by: Juraci Paixão Kröhling <[email protected]> * Add gosec exceptions for exec.Command Signed-off-by: Juraci Paixão Kröhling <[email protected]> * Update to OpenTelemetry core 0.34.0 (open-telemetry#68) Fixes open-telemetry#67 Signed-off-by: Juraci Paixão Kröhling <[email protected]> * Upgrade to OpenTelemetry Collector 0.35.0 (open-telemetry#70) Signed-off-by: Fangyi Zhou <[email protected]> * Upgrade to OpenTelemetry Collector 0.36.0 (open-telemetry#76) * Generate custom service code for Windows (open-telemetry#75) * update main to include windows service code * use main version from tag 0.35.0 * update main function * align with upstream v0.36.0 tag * dummy change to trigger build * Revert "dummy change to trigger build" This reverts commit 629d499461da2d2c240bf1e495b5fe0558e3547f. * Remove Core from Module type (open-telemetry#77) Fixes open-telemetry#15 Signed-off-by: yugo-horie <[email protected]> * release 0.37.0 (open-telemetry#78) * release 0.37.0 * update use of NewCommand * Move builder to subdirectory Signed-off-by: Juraci Paixão Kröhling <[email protected]> Co-authored-by: Bogdan Drutu <[email protected]> Co-authored-by: Bogdan Drutu <[email protected]> Co-authored-by: Joe Elliott <[email protected]> Co-authored-by: Eric Yang <[email protected]> Co-authored-by: Brian Gibbins <[email protected]> Co-authored-by: Ashmita <[email protected]> Co-authored-by: Fangyi Zhou <[email protected]> Co-authored-by: Shaun Creary <[email protected]> Co-authored-by: Patryk Małek <[email protected]> Co-authored-by: Serge Catudal <[email protected]> Co-authored-by: Aaron Stone <[email protected]> Co-authored-by: Patryk Małek <[email protected]> Co-authored-by: Aaron Stone <[email protected]> Co-authored-by: Kelvin Lo <[email protected]> Co-authored-by: Himanshu <[email protected]> Co-authored-by: Y.Horie <[email protected]> Co-authored-by: Koichi Shiraishi <[email protected]> Co-authored-by: Anthony Mirabella <[email protected]> Co-authored-by: Cal Loomis <[email protected]> Co-authored-by: alrex <[email protected]>
No description provided.