Skip to content

Update all deps (take 2) and make new release#849

Merged
DanTulovsky merged 15 commits into
mainfrom
update_deps.2
Sep 10, 2025
Merged

Update all deps (take 2) and make new release#849
DanTulovsky merged 15 commits into
mainfrom
update_deps.2

Conversation

@DanTulovsky
Copy link
Copy Markdown
Contributor

@DanTulovsky DanTulovsky commented May 12, 2025

Description: Update all deps (take 2)

This is currently blocked on:

mgechev/revive#1342

And is blocking:

https://github.com/lightstep/lightstep/pull/64194

@DanTulovsky DanTulovsky requested a review from akehlenbeck May 12, 2025 19:10
@DanTulovsky DanTulovsky changed the title Update all deps (take 2) Update all deps (take 2) and make new release May 12, 2025
@DanTulovsky DanTulovsky requested a review from a team May 12, 2025 19:12
@akehlenbeck
Copy link
Copy Markdown
Contributor

I spent an hour just now trying an alternative approach. Right now we declare our tool dependencies (like on golangci-lint) in an empty tools.go file, which causes all the dependencies of those tools to be pulled in via go.mod. But we don't really care about all that: all we care about is that the tool works. So what I tried was changing the Makefile rule to go install the tools at specific versions instead of go building them. It sorta worked! ...

... But then I hit subsequent errors that I don't think you've reached yet. It seems that some of the unittests in this repo are just outright busted. I suspect that's because some otel defaults have changed underneath us (based on reading the tea leaves of the ambiguous error message) but I haven't actually managed to make it work yet, so.

@akehlenbeck
Copy link
Copy Markdown
Contributor

The test failures I am hitting are variations of

2025/05/13 14:28:14 setup error: failed to create span exporter: component type mismatch: component ID "otel_sdk_trace_otlp" does not have type "otelarrow"
FAIL	github.com/lightstep/otel-launcher-go/examples/metrics	0.008s

but they actually started happening with the previous PR (#848). I have not yet managed to figure out what it wants: the strings in that error message are difficult to grep and there are so many layers of FactoryProviderExporters that I haven't yet figured out what line of code is even throwing that error.

@akehlenbeck
Copy link
Copy Markdown
Contributor

The test failures I am hitting are variations of

2025/05/13 14:28:14 setup error: failed to create span exporter: component type mismatch: component ID "otel_sdk_trace_otlp" does not have type "otelarrow"
FAIL	github.com/lightstep/otel-launcher-go/examples/metrics	0.008s

but they actually started happening with the previous PR (#848). I have not yet managed to figure out what it wants: the strings in that error message are difficult to grep and there are so many layers of FactoryProviderExporters that I haven't yet figured out what line of code is even throwing that error.

The lightstep tracer had a latent bug: it appears that it was trying to use otelarrow even if the settings had disabled arrow. The code in otel-launcher-go/lightstep/sdk/trace/exporters/otlp/otelcol/client.go is roughly this:

func NewExporter(ctx context.Context, cfg Config, opts ...func(options *ExporterOptions)) (trace.SpanExporter, error) {
        ...
        if !cfg.Exporter.Arrow.Disabled {
                c.settings.ID = component.NewID(component.MustNewType("otel_sdk_trace_arrow"))
        } else {
                c.settings.ID = component.NewID(component.MustNewType("otel_sdk_trace_otlp"))
        }
        ...
        // I think this line is the bug: we're trying to use an otelarrow factory even
        // if cfg.Exporter.Arrow.Disabled was true above.
        exp, err := otelarrowexporter.NewFactory().CreateTraces(ctx, c.settings, &cfg.Exporter)
        if err != nil {
                return nil, err
        }
        ...
}

Prior to your previous #848, that latent bug went undetected. But PR 848 pulled in this upstream change open-telemetry/opentelemetry-collector#12381 which is stricter about catching such configuration mismatches and caused our tests to start failing.

I think this is probably a real bug: if cfg.Exporter.Arrow.Disabled is true, it sure sounds like we shouldn't be using an otelarrowexporter.Factory, just based on the names of those things. I haven't yet chased down what to replace it with though.

@DanTulovsky
Copy link
Copy Markdown
Contributor Author

I wonder if it's easier for us to just stop using the launcher in our code :/

lines := readMetricsEndpoint(t.T())

return slices.Contains(lines, `request_size_bucket{job="tester",property="value",service_name="tester",le="0"} 1`)
return slices.Contains(lines, `request_size_bucket{job="tester",otel_scope_name="test-meter",otel_scope_schema_url="",otel_scope_version="",property="value",service_name="tester",le="+Inf"} 1`)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this one changed le="0" to le="+Inf" .. I think the other fields are just additions due to update of prom libraries..

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this one changed le="0" to le="+Inf" .. I think the other fields are just additions due to update of prom libraries..

Was it failing consistently or flakily? I think "0" is correct, but also I think it doesn't really matter if the goal of the test is just to verify that there's some output. (If there's a count of 1 in the "le=0" bucket then there's necessarily a count of 1 in the "le=Inf" bucket as well, and both indicate that the counter.Add call did something... which I think is the goal of this test.) LGTM in any case.

lines := readMetricsEndpoint(t.T())

return slices.Contains(lines, `requests{job="tester",property="value",service_name="tester"} 12`)
return slices.Contains(lines, `requests{job="tester",otel_scope_name="test-meter",otel_scope_schema_url="",otel_scope_version="",property="value",service_name="tester"} 12`)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just additions due to upgrade of libraries

lines := readMetricsEndpoint(t.T())

return slices.Contains(lines, `request_size_bucket{job="tester",property="value",service_name="tester",le="0"} 1`)
return slices.Contains(lines, `request_size_bucket{job="tester",otel_scope_name="test-meter",otel_scope_schema_url="",otel_scope_version="",property="value",service_name="tester",le="+Inf"} 1`)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this one changed le="0" to le="+Inf" .. I think the other fields are just additions due to update of prom libraries..

Was it failing consistently or flakily? I think "0" is correct, but also I think it doesn't really matter if the goal of the test is just to verify that there's some output. (If there's a count of 1 in the "le=0" bucket then there's necessarily a count of 1 in the "le=Inf" bucket as well, and both indicate that the counter.Add call did something... which I think is the goal of this test.) LGTM in any case.

@DanTulovsky DanTulovsky merged commit f852d5f into main Sep 10, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants