feat: Add otelcol.auth.google client auth provider#5526
Conversation
|
cc @kalleep |
7d13d21 to
4e35bdf
Compare
9b11da4 to
0fce6c3
Compare
|
I'm running into an issue I can't figure out how to solve, and could use some help. I need the google client auth to be started before the OTLP exporter, but I can't seem to figure out how to make that happen. I'm trying this configuration: I added a panic when the auth extension successfully starts: As you can see, the |
10f8517 to
6d76e5b
Compare
6d76e5b to
9068885
Compare
|
Test failures seem unrelated to this PR. I still haven't been able to figure out #5526 (comment) |
Yeah seems like a flaky test.
Yes this is a known problem in alloy, I have #5613 up to fix both order on startup and shutdown. But I am not entirely sure this would solve your problem either because start would run in it's own gorutine, and looking at it I am pretty sure this is a problem for other auth extentions as well, need to think a bit about this. |
|
Sounds good. I'll do some testing on top of #5613 if I have time. |
|
I tested this on top of your patch, and verified that it mostly fixes the issue. The only remaining issue seems to be that components are started in parallel, so it can sometimes still race at startup. I implemented a workaround in 778b653, but I suspect there is probably a better way to do that. |
Just took a quick look at the code managing auth extensions and we do re-recreate the component on each update call, but will delay Start until Run is triggered and that's where the race happens. Maybe we need to do some kind of workaround like the one you added for all auth extentions, e.g. trigger Start on update instead of Run. This also only seems to be a problem when alloy start the first time and not if config is reloaded. I will set aside some time today and experiment a bit with the wrapper auth component and make sure this is correct for other auth extentions we have. |
|
I am working on updating our otel dependencies and with the update I hit issues around scheduling. So made the fix in that pr. So when that one is merged and scheduling ordering it should work as expected |
### Pull Request Details #### Relevant change log entries from v0.143.0 to v0.147.0 - feat(otelcol.auth.basic): - Add `username_file` and `password_file` options to client_auth config, enabling file-based credentials with live rotation via file watching. - feat(otelcol.auth.oauth2): - Support jwt-bearer grant-type - feat(otelcol.exporter.debug): - Add `output_paths` configuration option to control output destination when `use_internal_logger` is false. - feat(otelcol.exporter.file): - Add support for rotation when group_by is enabled in file exporter - feat(otelcol.exporter.kafka): - Add `conn_idle_timeout` configuration option to control when idle connections are not reused and may be closed. - Make `max_message_bytes` and `flush_max_messages` unconditional in franz-go producer. Changed `flush_max_messages` default from 0 to 10000 to match franz-go default. - feat(otelcol.exporter.loadbalancing): - Support metrics routing by attributes in the loadbalancing exporter - feat(otelcol.processor.k8sattributes): - Added `container.image.tags` resource attribute with feature gate controls according to OpenTelemetry semantic conventions. - feat(otelcol.processor.filter): - Upstream deprecated legacy filter blocks `traces`, `metrics`, and `logs`. - Use inferred-context condition blocks instead: `trace_conditions`, `metric_conditions`, and `log_conditions`. - feat(otelcol.processor.resourcedetection): - Add Oracle realm resource attribute support for Oracle Cloud detector - Added Tencent Cloud CVM resource detector to the Resource Detection Processor - Add `tags_from_imds` config option to EC2 detector to control instance tag retrieval method - Add support for GCP resource detector to gather GCE instance labels as resource attributes - Added Alibaba Cloud ECS resource detector to the Resource Detection Processor - feat(otelcol.processor.tail_sampling): - New policy type to return the opposite of the sampling decision of a wrapped policy. - Add trace_flags policy - Provide option to limit maximum trace size kept in memory by the tail sampling processor - Provide an option, `decision_wait_after_root_received`, to make quicker decisions after a root span is received. - feat(otelcol.receiver.awscloudwatch): - Add support for filtering log groups by account ID. - feat(otelcol.receiver.datadog): - Add support for handling the /api/v0.2/stats endpoint to receive and process APM trace stats payloads from the Datadog Agent. - feat(otelcol.receiver.filelog): - Suppress repeated permission-denied errors - gzip files are auto detected based on their header - feat(otelcol.receiver.kafka): - Add `conn_idle_timeout` configuration option to control when idle connections are not reused and may be closed. - feat(otelcol.processor.filter): - Upstream deprecated legacy filter blocks `traces`, `metrics`, and `logs`. - Use inferred-context condition blocks instead: `trace_conditions`, `metric_conditions`, and `log_conditions`. - feat(otelcol.receiver.syslog): - Add facility_text attribute to syslog parser output - fix(otelcol.auth.oauth2): - Make token refresh context-aware - Fix oauth2clientauth client-credentials grant type - fix(otelcol.connector.count): - Basic config should emit default metrics - fix(otelcol.exporter.datadog): - OTLP logs now support array type attributes. Arrays containing primitive values or nested maps are now correctly preserved in the log output. - Fix data race in the Datadog exporter which could cause a crash with error message "concurrent map iteration and map write". - fix(otelcol.exporter.debug): - add queue configuration - fix(otelcol.exporter.kafka): - Add `conn_idle_timeout` configuration option to control when idle connections are not reused and may be closed. - fix(otelcol.exporter.loadbalancing): - Change default timeout for k8s resolver from 1s to 1m to reduce excessive Kubernetes API server load. - Fix k8s resolver parsing so loadbalancing exporter works with service FQDNs - fix(otelcol.exporter.syslog): - Update the timestamp when using the RFC 3164 formatter to space-pad the day of month for single digit days - fix(otelcol.processor.cumulativetodelta): - Fix memory blowup in exponential histogram delta conversion - fix(otelcol.processor.deltatocumulative): - Fix panic when processing exponential histograms with empty bucket counts - fix(otelcol.processor.k8sattributes): - Fix concurrent map access panic by cloning pod labels and annotations before extraction. - Allow key_regex to work without tag_name by using the default tag name format - Fix k8s.node.uid extraction when node.name is disabled - fix(otelcol.processor.resourcedetection): - IRSA and Pod Identity tokens are checked to determine if running within an EKS cluster - fix(otelcol.processor.tail_sampling): - Properly remove trace id from its original batch when using `decision_wait_after_root_received` - fix(otelcol.receiver.awscloudwatch): - Use the oldest log timestamp as the next poll start time to prevent logs from being ignored - fix(otelcol.receiver.awss3): - Fix data loss when SQS messages contain multiple S3 object notifications and some fail to process - fix(otelcol.receiver.datadog): - Fix service check endpoint to handle both array and single object payloads - fix(otelcol.receiver.faro): - Updates Faroreceiver to return HTTP 202 Accepted status code upon successful data ingestion to comply with the OpenAPI specification. - fix(otelcol.receiver.filelog): - Fixed encoding not being applied to multiline pattern matching - fix(otelcol.receiver.fluentforward): - handle uint64 to int64 overflow by changing to string - fix(otelcol.receiver.googlecloudpubsub): - Fix compression detection when both encoding and compression are set in the config - fix(otelcol.receiver.kafka): - Fix deprecated field migration logic for metrics, traces, and profiles topic configuration - fix(otelcol.storage.file): - Handle filename too long error in file storage extension by using the sha256 of the attempted filename instead. - BREAKING-CHANGE(otelcol.processor.resourcedetection): - Upstream changed resourcedetection so `gcp.resource_attributes.faas.id` is no longer a supported config option - Changed cloud platform value for Azure EKS from `azure_eks` to `azure.eks` to align with OpenTelemetry semantic conventions v1.39.0. - BREAKING-CHANGE(otelcol.processor.tail_sampling): - The deprecated invert decisions are disabled by default. - BREAKING-CHANGE(otelcol.receiver.kafka): - Remove `default_fetch_size` configuration option ### Issue(s) fixed by this Pull Request Prepare for #5748 ### Notes to the Reviewer 1. I had a really hard time handling promethues dependency. To make it possible to update I had to move to the version `promethuesreciever` was using. It was a bit newer than we had before but there was not really that much changes. And to use that version we still need our fork so I created a branch from the commit used by otel collector and cherry picked @kgeckhart fix to that branch. * ExtraMetrics can now be configured per scrape target _and_ be reloaded. Currently We have it as a global setting and pass to all targets. 2. Add a specialized scheduler for auth extensions and do not re-use the generic otel one we have. We must call `Start` on auth extensions before we export them because it's used to setup state that will be shared by e.g. `RoundTripper`. This was discovered in #5526 but became a bigger problem with these updates, e.g. basic auth now do periodic reload when e.g. `password_file` is used. ### PR Checklist <!-- Remove items that do not apply. For completed items, change [ ] to [x]. --> - [x] Documentation added - [x] Tests updated - [x] Config converters updated --------- Co-authored-by: Bejal Lewis <164711649+blewis12@users.noreply.github.com> Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
|
We have now merged both prs. So if you could rebase on main it should work now, do you follow the same release schedule as upstream otel, if so you should add v0.147.0 to alloy :) |
|
Excellent. I'll rebase in the next day or two. |
0faa0c2 to
7f68e9a
Compare
|
rebased, and removed the workaround for auth startup ordering. |
kalleep
left a comment
There was a problem hiding this comment.
LGTM, you just have to format one of the files. We have a code freeze during release but I expect that to be removed within a day or two. So onece the file is correctly formated and a new Alloy version is release I will merge this pr
|
I pushed a commit with the format fix |
Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
313a0e2 to
6a202e3
Compare
|
Great work. This took a bit more time than anticipated, thanks for your patience! |
### Brief description of Pull Request Add the opentelemetry collector's `googleclientauthextension` as `otelcol.auth.google`. ### Issue(s) fixed by this Pull Request Fixes #5368 ### PR Checklist - [x] Documentation added - [x] Tests updated - [x] Config converters updated --------- Co-authored-by: Clayton Cornell <131809008+clayton-cornell@users.noreply.github.com>
Brief description of Pull Request
Add the opentelemetry collector's
googleclientauthextensionasotelcol.auth.google.Issue(s) fixed by this Pull Request
Fixes #5368
PR Checklist