feat: Add otelcol.auth.google client auth provider #5526

Merged: kalleep merged 10 commits into grafana:main from dashpole:googleclientauth on Mar 31, 2026
Conversation

@dashpole
Contributor

@dashpole dashpole commented Feb 13, 2026

Brief description of Pull Request

Add the OpenTelemetry Collector's googleclientauthextension as otelcol.auth.google.

Issue(s) fixed by this Pull Request

Fixes #5368

PR Checklist

  • Documentation added
  • Tests updated
  • Config converters updated

@CLAassistant

CLAassistant commented Feb 13, 2026

CLA assistant check
All committers have signed the CLA.

@dashpole dashpole changed the title from "Add otelcol.auth.google client auth provider" to "feat: Add otelcol.auth.google client auth provider" Feb 13, 2026
@dashpole
Contributor Author

cc @kalleep

@kalleep kalleep marked this pull request as ready for review February 13, 2026 14:56
@kalleep kalleep requested review from a team and clayton-cornell as code owners February 13, 2026 14:56
@dashpole dashpole force-pushed the googleclientauth branch 2 times, most recently from 9b11da4 to 0fce6c3 Compare February 18, 2026 18:40
@dashpole
Contributor Author

I'm running into an issue I can't figure out how to solve, and could use some help. I need the google client auth to be started before the OTLP exporter, but I can't seem to figure out how to make that happen. I'm trying this configuration:

tracing {
	// Sample all traces. This value should be lower for production configs!
	sampling_fraction = 1

	write_to = [otelcol.exporter.otlphttp.google.input]
}

otelcol.exporter.otlphttp "google" {
	client {
		endpoint = "telemetry.googleapis.com"
		auth     = otelcol.auth.google.creds.handler
	}
}

otelcol.auth.google "creds" {
	project = "dashpole-dev"
}

I added a panic when the auth extension successfully starts:

ts=2026-02-18T18:44:52.791487865Z level=info boringcrypto_enabled=false
ts=2026-02-18T18:44:52.780691855Z level=info source=/go/pkg/mod/github.com/!kim!machine!gun/automemlimit@v0.7.4/memlimit/memlimit.go:175 msg="memory is not limited, skipping" package=github.com/KimMachineGun/automemlimit/memlimit
ts=2026-02-18T18:44:52.791586579Z level=info msg="no peer discovery configured: both join and discover peers are empty" service=cluster
ts=2026-02-18T18:44:52.791607459Z level=info msg="running usage stats reporter"
ts=2026-02-18T18:44:52.791613605Z level=info msg="starting complete graph evaluation" controller_path=/ controller_id="" trace_id=380a7d5da35b56f60406d43ae83901c8
ts=2026-02-18T18:44:52.791638958Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=380a7d5da35b56f60406d43ae83901c8 node_id=otel duration=1.962µs
ts=2026-02-18T18:44:52.791649155Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=380a7d5da35b56f60406d43ae83901c8 node_id=livedebugging duration=12.675µs
ts=2026-02-18T18:44:52.791657281Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=380a7d5da35b56f60406d43ae83901c8 node_id=otelcol.auth.google.creds duration=163.457µs
ts=2026-02-18T18:44:52.791665418Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=380a7d5da35b56f60406d43ae83901c8 node_id=otelcol.exporter.otlphttp.google duration=495.142µs
ts=2026-02-18T18:44:52.791673476Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=380a7d5da35b56f60406d43ae83901c8 node_id=tracing duration=46.786µs
ts=2026-02-18T18:44:52.791685121Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=380a7d5da35b56f60406d43ae83901c8 node_id=logging duration=213.354µs
ts=2026-02-18T18:44:52.791717352Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=380a7d5da35b56f60406d43ae83901c8 node_id=labelstore duration=8.737µs
ts=2026-02-18T18:44:52.791800016Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=380a7d5da35b56f60406d43ae83901c8 node_id=remotecfg duration=61.797µs
ts=2026-02-18T18:44:52.791827759Z level=info msg="applying non-TLS config to HTTP server" service=http
ts=2026-02-18T18:44:52.791836121Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=380a7d5da35b56f60406d43ae83901c8 node_id=http duration=17.242µs
ts=2026-02-18T18:44:52.791850006Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=380a7d5da35b56f60406d43ae83901c8 node_id=ui duration=2.636µs
ts=2026-02-18T18:44:52.791865438Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=380a7d5da35b56f60406d43ae83901c8 node_id=cluster duration=3.927µs
ts=2026-02-18T18:44:52.791875398Z level=info msg="finished complete graph evaluation" controller_path=/ controller_id="" trace_id=380a7d5da35b56f60406d43ae83901c8 duration=1.339045ms
ts=2026-02-18T18:44:52.79193023Z level=debug msg="changing node state" service=cluster from=viewer to=participant
ts=2026-02-18T18:44:52.791994442Z level=info msg="scheduling loaded components and services"
ts=2026-02-18T18:44:52.792273761Z level=error msg="failed to start scheduled component" component_path=/ component_id=otelcol.exporter.otlphttp.google err="not started"
ts=2026-02-18T18:44:52.792291686Z level=debug msg="finished node evaluation" controller_path=/ controller_id="" node_id=tracing duration=138.264µs
ts=2026-02-18T18:44:52.792321074Z level=info msg="starting cluster node" service=cluster peers_count=0 peers="" advertise_addr=127.0.0.1:12345 minimum_cluster_size=0 minimum_size_wait_timeout=0s
ts=2026-02-18T18:44:52.792314184Z level=error msg="failed to start scheduled component" component_path=/ component_id=otelcol.exporter.otlphttp.google err="not started"
ts=2026-02-18T18:44:52.792371835Z level=info msg="failed to register collector with remote server" service=remotecfg id=8c8cdab3-5d7b-4e81-8224-6b7ac8c31487 name="" err="noop client"
ts=2026-02-18T18:44:52.792393695Z level=error msg="failed to start scheduled component" component_path=/ component_id=otelcol.exporter.otlphttp.google err="not started"
ts=2026-02-18T18:44:52.79245562Z level=info msg="peers changed" service=cluster peers_count=1 min_cluster_size=0 peers=452a81310b8b
ts=2026-02-18T18:44:52.792480855Z level=error msg="node exited with error" node=otelcol.exporter.otlphttp.google err="no components started successfully: not started; not started; not started"
ts=2026-02-18T18:44:52.792275092Z level=debug msg="finished node evaluation" controller_path=/ controller_id="" node_id=otelcol.exporter.otlphttp.google duration=160.248µs
panic: GOT TOKEN

As you can see, the otelcol.exporter.otlphttp.google component attempts to start 3 times before giving up, and only then is the auth extension started.
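
For readers following along, the failure mode in the log above can be modeled in a few lines of Go. This is only a sketch under stated assumptions: `authExtension`, `Token`, and `startExporter` are hypothetical names and are not Alloy's or the collector's real types. The three start attempts follow the description in this comment, and the error text is copied from the log.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync/atomic"
)

// errNotStarted mirrors the "not started" text seen in the log above.
var errNotStarted = errors.New("not started")

// authExtension is a hypothetical stand-in for an auth extension whose
// credentials only become usable after Start has run.
type authExtension struct {
	started atomic.Bool
}

// Start marks the extension as ready to hand out credentials.
func (a *authExtension) Start() { a.started.Store(true) }

// Token fails until Start has been called.
func (a *authExtension) Token() (string, error) {
	if !a.started.Load() {
		return "", errNotStarted
	}
	return "example-token", nil
}

// startExporter models an exporter that needs a token to start. It tries a
// fixed number of times and then gives up, collecting every failure.
func startExporter(auth *authExtension, attempts int) error {
	var failures []string
	for i := 0; i < attempts; i++ {
		if _, err := auth.Token(); err != nil {
			failures = append(failures, err.Error())
			continue
		}
		return nil
	}
	return fmt.Errorf("no components started successfully: %s", strings.Join(failures, "; "))
}

func main() {
	auth := &authExtension{}

	// Wrong order: the exporter starts before the auth extension.
	fmt.Println(startExporter(auth, 3))

	// Right order: once the extension has started, the exporter succeeds.
	auth.Start()
	fmt.Println(startExporter(auth, 3))
}
```

The first call fails with the same message as the `node exited with error` line, and the second call succeeds because `Start` ran first.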

@dashpole
Contributor Author

Test failures seem unrelated to this PR. I still haven't been able to figure out #5526 (comment)

@kalleep
Contributor

kalleep commented Feb 27, 2026

Test failures seem unrelated to this PR

Yeah seems like a flaky test.

I still haven't been able to figure out #5526 (comment)

Yes, this is a known problem in Alloy. I have #5613 up to fix the ordering on both startup and shutdown.

But I am not entirely sure this would solve your problem either, because Start would run in its own goroutine. Looking at it, I am pretty sure this is a problem for other auth extensions as well, so I need to think a bit about this.

@dashpole
Contributor Author

dashpole commented Mar 2, 2026

Sounds good. I'll do some testing on top of #5613 if I have time.

@dashpole
Contributor Author

dashpole commented Mar 9, 2026

I tested this on top of your patch, and verified that it mostly fixes the issue.

The only remaining issue seems to be that components are started in parallel, so it can sometimes still race at startup. I implemented a workaround in 778b653, but I suspect there is probably a better way to do that.
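
The workaround in 778b653 is not shown in this conversation, so the following is not a reconstruction of it. It is a generic Go sketch of one common way to close a startup race between components that start in parallel: the dependent component blocks on a readiness signal. All names here (`authExtension`, `WaitReady`, `startInParallel`, `exporterOnly`) are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// authExtension is a hypothetical extension that signals readiness by
// closing a channel exactly once.
type authExtension struct {
	once  sync.Once
	ready chan struct{}
}

func newAuthExtension() *authExtension {
	return &authExtension{ready: make(chan struct{})}
}

// Start is safe to call more than once and from any goroutine.
func (a *authExtension) Start() {
	a.once.Do(func() { close(a.ready) })
}

// WaitReady blocks until Start has run or the context is done.
func (a *authExtension) WaitReady(ctx context.Context) error {
	select {
	case <-a.ready:
		return nil
	case <-ctx.Done():
		return fmt.Errorf("auth extension not started: %w", ctx.Err())
	}
}

// startExporter models an exporter that waits for its auth extension
// instead of failing immediately when it is scheduled first.
func startExporter(ctx context.Context, auth *authExtension) error {
	return auth.WaitReady(ctx)
}

// startInParallel starts both components concurrently, in no fixed order.
func startInParallel() error {
	auth := newAuthExtension()
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	errCh := make(chan error, 1)
	go func() { errCh <- startExporter(ctx, auth) }()
	go auth.Start()
	return <-errCh
}

// exporterOnly shows the failure path: the extension never starts, so the
// exporter gives up when the (short) timeout expires.
func exporterOnly() error {
	auth := newAuthExtension()
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	return startExporter(ctx, auth)
}

func main() {
	fmt.Println("parallel start:", startInParallel())
	fmt.Println("exporter only:", exporterOnly())
}
```

Waiting with a timeout trades a hard startup failure for a bounded delay. Whether that trade is acceptable depends on the scheduler, which is why a fix in the scheduler itself may be preferable.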

@kalleep
Contributor

kalleep commented Mar 10, 2026

The only remaining issue seems to be that components are started in parallel, so it can sometimes still race at startup. I implemented a workaround in 778b653, but I suspect there is probably a better way to do that.

I just took a quick look at the code managing auth extensions. We do re-create the component on each Update call, but we delay Start until Run is triggered, and that's where the race happens.

Maybe we need some kind of workaround like the one you added, applied to all auth extensions, e.g. trigger Start on Update instead of Run. This also only seems to be a problem when Alloy starts for the first time, not when the config is reloaded.

I will set aside some time today to experiment a bit with the wrapper auth component and make sure this is correct for the other auth extensions we have.
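
The difference between starting on Run and starting on Update can be illustrated with a small Go sketch. It assumes a simplified wrapper with `Update` and `Run` methods. The types `extension`, `handler`, and `wrapper`, and the `startOnUpdate` switch, are hypothetical and do not mirror Alloy's actual wrapper component. Shutting down a previously created extension is left out for brevity.

```go
package main

import "fmt"

// extension is a hypothetical auth extension that must be started before
// anything uses it.
type extension struct {
	name    string
	started bool
}

func (e *extension) Start() error {
	e.started = true
	return nil
}

// handler is the value other components receive, for example an exporter
// that references the auth component's exported handler.
type handler struct {
	ext *extension
}

// wrapper re-creates the extension on every Update. startOnUpdate selects
// between the two behaviors discussed above.
type wrapper struct {
	startOnUpdate bool
	current       *extension
	exported      *handler
}

// Update builds a new extension and exports it. With startOnUpdate set, the
// extension is started before it becomes visible to other components.
func (w *wrapper) Update(name string) error {
	ext := &extension{name: name}
	if w.startOnUpdate {
		if err := ext.Start(); err != nil {
			return err
		}
	}
	w.current = ext
	w.exported = &handler{ext: ext}
	return nil
}

// Run starts the current extension if Update has not already done so.
func (w *wrapper) Run() error {
	if w.current != nil && !w.current.started {
		return w.current.Start()
	}
	return nil
}

func main() {
	lazy := &wrapper{}
	_ = lazy.Update("creds")
	// Between Update and Run, consumers can see an unstarted extension.
	fmt.Println("start on Run, started after Update:", lazy.exported.ext.started)

	eager := &wrapper{startOnUpdate: true}
	_ = eager.Update("creds")
	fmt.Println("start on Update, started after Update:", eager.exported.ext.started)
}
```

With the lazy variant there is a window in which an exporter scheduled in parallel can pick up a handler whose extension has not started, which matches the race described above.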

@kalleep
Contributor

kalleep commented Mar 16, 2026

I am working on updating our OTel dependencies, and with the update I hit issues around scheduling, so I made the fix in that PR. Once that one is merged and the scheduling order is fixed, this should work as expected.

kalleep added a commit that referenced this pull request Mar 24, 2026
### Pull Request Details
#### Relevant change log entries from v0.143.0 to v0.147.0
- feat(otelcol.auth.basic):
  - Add `username_file` and `password_file` options to client_auth config,
    enabling file-based credentials with live rotation via file watching.
- feat(otelcol.auth.oauth2):
  - Support jwt-bearer grant-type
- feat(otelcol.exporter.debug):
  - Add `output_paths` configuration option to control output destination
    when `use_internal_logger` is false.
- feat(otelcol.exporter.file):
  - Add support for rotation when group_by is enabled in file exporter
- feat(otelcol.exporter.kafka):
  - Add `conn_idle_timeout` configuration option to control when idle
    connections are not reused and may be closed.
  - Make `max_message_bytes` and `flush_max_messages` unconditional in
    franz-go producer. Changed `flush_max_messages` default from 0 to 10000
    to match franz-go default.
- feat(otelcol.exporter.loadbalancing):
  - Support metrics routing by attributes in the loadbalancing exporter
- feat(otelcol.processor.k8sattributes):
  - Added `container.image.tags` resource attribute with feature gate
    controls according to OpenTelemetry semantic conventions.
- feat(otelcol.processor.filter):
  - Upstream deprecated legacy filter blocks `traces`, `metrics`, and
    `logs`.
  - Use inferred-context condition blocks instead: `trace_conditions`,
    `metric_conditions`, and `log_conditions`.
- feat(otelcol.processor.resourcedetection):
  - Add Oracle realm resource attribute support for Oracle Cloud detector
  - Added Tencent Cloud CVM resource detector to the Resource Detection
    Processor
  - Add `tags_from_imds` config option to EC2 detector to control instance
    tag retrieval method
  - Add support for GCP resource detector to gather GCE instance labels as
    resource attributes
  - Added Alibaba Cloud ECS resource detector to the Resource Detection
    Processor
- feat(otelcol.processor.tail_sampling):
  - New policy type to return the opposite of the sampling decision of a
    wrapped policy.
  - Add trace_flags policy
  - Provide option to limit maximum trace size kept in memory by the tail
    sampling processor
  - Provide an option, `decision_wait_after_root_received`, to make
    quicker decisions after a root span is received.
- feat(otelcol.receiver.awscloudwatch):
  - Add support for filtering log groups by account ID.
- feat(otelcol.receiver.datadog):
  - Add support for handling the /api/v0.2/stats endpoint to receive and
    process APM trace stats payloads from the Datadog Agent.
- feat(otelcol.receiver.filelog):
  - Suppress repeated permission-denied errors
  - gzip files are auto detected based on their header
- feat(otelcol.receiver.kafka):
  - Add `conn_idle_timeout` configuration option to control when idle
    connections are not reused and may be closed.
- feat(otelcol.receiver.syslog):
  - Add facility_text attribute to syslog parser output

- fix(otelcol.auth.oauth2):
  - Make token refresh context-aware
  - Fix oauth2clientauth client-credentials grant type
- fix(otelcol.connector.count):
  - Basic config should emit default metrics
- fix(otelcol.exporter.datadog):
  - OTLP logs now support array type attributes. Arrays containing
    primitive values or nested maps are now correctly preserved in the log
    output.
  - Fix data race in the Datadog exporter which could cause a crash with
    error message "concurrent map iteration and map write".
- fix(otelcol.exporter.debug):
  - add queue configuration
- fix(otelcol.exporter.kafka):
  - Add `conn_idle_timeout` configuration option to control when idle
    connections are not reused and may be closed.
- fix(otelcol.exporter.loadbalancing):
  - Change default timeout for k8s resolver from 1s to 1m to reduce
    excessive Kubernetes API server load.
  - Fix k8s resolver parsing so loadbalancing exporter works with service
    FQDNs
- fix(otelcol.exporter.syslog):
  - Update the timestamp when using the RFC 3164 formatter to space-pad
    the day of month for single digit days
- fix(otelcol.processor.cumulativetodelta):
  - Fix memory blowup in exponential histogram delta conversion
- fix(otelcol.processor.deltatocumulative):
  - Fix panic when processing exponential histograms with empty bucket
    counts
- fix(otelcol.processor.k8sattributes):
  - Fix concurrent map access panic by cloning pod labels and annotations
    before extraction.
  - Allow key_regex to work without tag_name by using the default tag name
    format
  - Fix k8s.node.uid extraction when node.name is disabled
- fix(otelcol.processor.resourcedetection):
  - IRSA and Pod Identity tokens are checked to determine if running
    within an EKS cluster
- fix(otelcol.processor.tail_sampling):
  - Properly remove trace id from its original batch when using
    `decision_wait_after_root_received`
- fix(otelcol.receiver.awscloudwatch):
  - Use the oldest log timestamp as the next poll start time to prevent
    logs from being ignored
- fix(otelcol.receiver.awss3):
  - Fix data loss when SQS messages contain multiple S3 object
    notifications and some fail to process
- fix(otelcol.receiver.datadog):
  - Fix service check endpoint to handle both array and single object
    payloads
- fix(otelcol.receiver.faro):
  - Updates Faroreceiver to return HTTP 202 Accepted status code upon
    successful data ingestion to comply with the OpenAPI specification.
- fix(otelcol.receiver.filelog):
  - Fixed encoding not being applied to multiline pattern matching
- fix(otelcol.receiver.fluentforward):
  - handle uint64 to int64 overflow by changing to string
- fix(otelcol.receiver.googlecloudpubsub):
  - Fix compression detection when both encoding and compression are set
    in the config
- fix(otelcol.receiver.kafka):
  - Fix deprecated field migration logic for metrics, traces, and profiles
    topic configuration
- fix(otelcol.storage.file):
  - Handle filename too long error in file storage extension by using the
    sha256 of the attempted filename instead.

- BREAKING-CHANGE(otelcol.processor.resourcedetection):
  - Upstream changed resourcedetection so
    `gcp.resource_attributes.faas.id` is no longer a supported config option
  - Changed cloud platform value for Azure EKS from `azure_eks` to
    `azure.eks` to align with OpenTelemetry semantic conventions v1.39.0.
- BREAKING-CHANGE(otelcol.processor.tail_sampling):
  - The deprecated invert decisions are disabled by default.
- BREAKING-CHANGE(otelcol.receiver.kafka):
  - Remove `default_fetch_size` configuration option

### Issue(s) fixed by this Pull Request

Prepare for #5748

### Notes to the Reviewer
1. I had a really hard time handling the Prometheus dependency. To make
the update possible I had to move to the version `prometheusreceiver` was
using. It is a bit newer than what we had before, but there were not
that many changes. To use that version we still need our fork, so I
created a branch from the commit used by the OTel Collector and
cherry-picked @kgeckhart's fix to that branch.
* ExtraMetrics can now be configured per scrape target _and_ be
reloaded. Currently we have it as a global setting and pass it to all
targets.

2. Add a specialized scheduler for auth extensions instead of re-using
the generic OTel one we have. We must call `Start` on auth extensions
before we export them because it is used to set up state that will be
shared by e.g. `RoundTripper`. This was discovered in
#5526 but became a bigger problem
with these updates, e.g. basic auth now does a periodic reload when e.g.
`password_file` is used.
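
The ordering requirement in item 2 (call `Start` before exporting) can be sketched in Go as follows. This is an illustration only, not the scheduler added in this change: `authScheduler`, `extension`, and `Schedule` are hypothetical names, and shutting down the previous extension after the new one is exported is an assumption made for the example.

```go
package main

import "fmt"

// extension is a hypothetical auth extension that records its lifecycle
// calls in a shared event log so the ordering can be inspected.
type extension struct {
	name   string
	events *[]string
}

func (e *extension) Start() error {
	*e.events = append(*e.events, "start "+e.name)
	return nil
}

func (e *extension) Shutdown() error {
	*e.events = append(*e.events, "shutdown "+e.name)
	return nil
}

// authScheduler is a hypothetical scheduler dedicated to auth extensions.
type authScheduler struct {
	current *extension
	export  func(*extension)
}

// Schedule starts the new extension first, exports it only after Start has
// succeeded, and then shuts down the extension it replaces.
func (s *authScheduler) Schedule(next *extension) error {
	if err := next.Start(); err != nil {
		return fmt.Errorf("starting %s: %w", next.name, err)
	}
	s.export(next)

	prev := s.current
	s.current = next
	if prev != nil {
		return prev.Shutdown()
	}
	return nil
}

// runDemo schedules two extensions in a row and returns the recorded order.
func runDemo() []string {
	var events []string
	s := &authScheduler{
		export: func(e *extension) { events = append(events, "export "+e.name) },
	}
	_ = s.Schedule(&extension{name: "a", events: &events})
	_ = s.Schedule(&extension{name: "b", events: &events})
	return events
}

func main() {
	for _, e := range runDemo() {
		fmt.Println(e)
	}
}
```

Because the export happens only after `Start` returns, a consumer such as an exporter building a `RoundTripper` never observes an extension whose shared state has not been set up.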

### PR Checklist

<!-- Remove items that do not apply. For completed items, change [ ] to
[x]. -->

- [x] Documentation added
- [x] Tests updated
- [x] Config converters updated

---------
Co-authored-by: Bejal Lewis <164711649+blewis12@users.noreply.github.com>
Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
@kalleep
Contributor

kalleep commented Mar 25, 2026

We have now merged both PRs, so if you rebase on main it should work now. Do you follow the same release schedule as upstream OTel? If so, you should add v0.147.0 to Alloy :)

@dashpole
Contributor Author

Excellent. I'll rebase in the next day or two.

@dashpole
Contributor Author

Rebased, and removed the workaround for auth startup ordering.

Contributor

@kalleep kalleep left a comment

LGTM, you just have to format one of the files. We have a code freeze during the release, but I expect it to be lifted within a day or two. So once the file is correctly formatted and a new Alloy version is released, I will merge this PR.

@kalleep
Contributor

kalleep commented Mar 27, 2026

I pushed a commit with the format fix

@kalleep kalleep merged commit da99a66 into grafana:main Mar 31, 2026
47 of 49 checks passed
@kalleep
Contributor

kalleep commented Mar 31, 2026

Great work. This took a bit more time than anticipated, thanks for your patience!

gaantunes pushed a commit that referenced this pull request Apr 3, 2026
### Brief description of Pull Request

Add the OpenTelemetry Collector's `googleclientauthextension` as
`otelcol.auth.google`.

### Issue(s) fixed by this Pull Request

Fixes #5368

### PR Checklist

- [x] Documentation added
- [x] Tests updated
- [x] Config converters updated

---------

Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Apr 15, 2026
Labels

frozen-due-to-age type/docs Docs Squad label across all Grafana Labs repos

Development

Successfully merging this pull request may close these issues.

Support the googleclientauth extension

4 participants