Datadog Integration (#3407) #3619
* datadog-integration: updated consul-server agent telemetry-config.json with dd specific items as well as additional missing VM based options, unit tests, dd unix socket integration, dd agent acl token generation, deployment override failsafes
* datadog-integration: updated consul-server agent telemetry-config.json with dd specific items as well as additional missing VM based options, unit tests, dd unix socket integration, dd agent acl token generation | final initial-push
* changelog entry update
* datadog-integration: updated consul-server agent server.config (enable_debug) and telemetry.config update | enable_debug to server.config
* curt pr review changes (minus extraConfig templating verification changes)
* global.metrics.AgentMetrics -> global.metrics.enableAgentMetrics
* dogstatsd and otlp mutually exclusive verification checks
* breaking changes now incorporated into consul.validateExtraConfig helper template function as precheck
* extraConfig hash updates post merge conflict update
* fix helpers.tpl consul.extraConfig from merge --> /consul/tmp/extra-config/extra-from-values.json | add labels to rolebinding for datadog secrets
* update changelog .txt to match new PR number
* updated server-statefulset.yaml to correct ad.datadoghq.com/consul.logs annotation to valid single quote string
* update UDP dogstatsdPort behavior to exclude including a port value if using a kube service address (as determined by user overrides)
* update _helpers.tpl consul.ValidateDatadogConfiguration func to account for using 'https' as protocol => should fail
* update server-statefulset.yaml to exclude prometheus.io annotations if enabling datadog openmetrics method for consul server metrics scrape. conflict present with http vs https that breaks openmetrics scrape on consul
* correct otlp protocol helpers.tpl check to lower-case the protocol to match the open-telemetry-deployment.yaml behavior
* fix server-acl-init command_test.go for datadog token policy - datacenter should have been dc1
* add in server-statefulset bats test for extraConfig validation testing
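The commit list mentions two validation rules: dogstatsd and otlp are mutually exclusive, and 'https' as protocol should fail after the protocol is lower-cased. A minimal sketch of that kind of pre-check follows. The function name, its arguments, and the assumption that the 'https' check applies to the otlp protocol are all hypothetical; the real checks live in the chart's helper templates, not in shell.

```shell
# Hypothetical pre-check mirroring the rules named in the commit list.
# Arguments (illustrative, not the chart's real value names):
#   $1 = dogstatsd enabled ("true"/"false")
#   $2 = otlp enabled ("true"/"false")
#   $3 = otlp protocol (for example "http" or "grpc")
validate_datadog_config() {
  dogstatsd_enabled=$1
  otlp_enabled=$2
  # Lower-case the protocol before comparing it.
  otlp_protocol=$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]')

  if [ "$dogstatsd_enabled" = "true" ] && [ "$otlp_enabled" = "true" ]; then
    echo "dogstatsd and otlp are mutually exclusive" >&2
    return 1
  fi
  # Assumption: the 'https' rejection applies to the otlp protocol setting.
  if [ "$otlp_enabled" = "true" ] && [ "$otlp_protocol" = "https" ]; then
    echo "https is not an accepted protocol" >&2
    return 1
  fi
  return 0
}

validate_datadog_config true false "" && echo "dogstatsd only: accepted"
validate_datadog_config true true http || echo "both enabled: rejected"
validate_datadog_config false true HTTPS || echo "https: rejected"
```

Lower-casing first means 'HTTPS' and 'https' are treated the same way, which is the behavior the commit list says the otlp check was corrected to have.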
curtbushko
left a comment
Thanks Nate! It looks like changes from #3000 have leaked into your PR. Those should be removed as we thought they would be too disruptive for upgrades.
| {{- end }} | ||
| "server": true | ||
| "server": true, | ||
| "leave_on_terminate": true, |
suggestion: These extra 'leave_on_terminate' and 'autopilot' settings should be removed as they were deemed destructive.
We need to check the other backports as anything from #3000 should not be in release/1.3.x, release/1.2.x and release/1.1.x (1.4.x is fine)
Corrected as recommended by reverting back to release/1.3.x branch version of affected files.
$ git checkout 'release/1.3.x' -- charts/consul/templates/server-config-configmap.yaml
Re-applied datadog-integration changes into the following files:
- charts/consul/templates/server-config-configmap.yaml
  - Reincorporated enable_debug into server.json (updates server-statefulset.yaml config-checksum)
  - Reapplied all datadog and agent metric-related entries into the telemetry-config.json
- charts/consul/test/unit/server-statefulset.bats
  - Updated config-configmap tests to reflect enable_debug update to server.json config
    - "server/StatefulSet: adds config-checksum annotation when extraConfig is blank"
    - "server/StatefulSet: adds config-checksum annotation when extraConfig is provided"
    - "server/StatefulSet: adds config-checksum annotation when extraConfig is updated"
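For orientation, the telemetry-config.json entries re-applied in this reply might look roughly like the fragment below. The key names are the telemetry options listed in this PR's description; every value is a placeholder chosen for illustration and is not the chart's rendered output. The function name is hypothetical.

```shell
# Prints an illustrative telemetry fragment. All values are placeholders,
# not what the Helm chart actually renders.
render_telemetry_config() {
  cat <<'EOF'
{
  "telemetry": {
    "disable_hostname": true,
    "enable_host_metrics": true,
    "prefix_filter": ["+consul.rpc.server.call"],
    "dogstatsd_addr": "127.0.0.1:8125",
    "dogstatsd_tags": ["env:dev"]
  }
}
EOF
}

render_telemetry_config
```

enable_debug is not part of this block because, per the reply, it goes into server.json instead.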
| component: server | ||
| spec: | ||
| maxUnavailable: {{ template "consul.pdb.maxUnavailable" . }} | ||
| maxUnavailable: {{ template "consul.server.pdb.maxUnavailable" . }} |
suggestion: This is also from #3000 and should be dropped.
Corrected as recommended by reverting back to release/1.3.x branch version of affected files.
$ git checkout 'release/1.3.x' -- charts/consul/templates/server-disruptionbudget.yaml charts/consul/test/unit/server-disruptionbudget.bats charts/consul/template/_helpers.tpl
Applied datadog-integration changes back into _helpers.tpl
Re-ran entirety of bats tests using Makefile - make bats-tests (all passed)
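The revert used throughout these replies relies on `git checkout <branch> -- <path>`, which restores only the named paths from another branch and leaves the current branch checked out. A small sketch in a throwaway repository shows the effect. The branch names, file name, and contents are made up for the demo; `release-demo` stands in for release/1.3.x.

```shell
# Demonstrates `git checkout <branch> -- <path>` in a temporary repository.
# Everything here (branches, file, contents) is invented for the demo.
demo_restore_from_branch() {
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q .
    git config user.email "demo@example.com"
    git config user.name "Demo"
    echo "release version" > helpers.tpl
    git add helpers.tpl
    git commit -q -m "release state"
    git branch release-demo
    git checkout -q -b feature-demo
    echo "unrelated change" > helpers.tpl
    git commit -q -am "picked up extra changes"
    # Restore only this path from the release branch; HEAD stays on feature-demo.
    git checkout release-demo -- helpers.tpl
    cat helpers.tpl
  )
  status=$?
  rm -rf "$repo"
  return $status
}

demo_restore_from_branch
```

After the restore, the file matches the release branch again, so the feature-specific changes have to be re-applied on top, which is what the replies describe doing for _helpers.tpl.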
charts/consul/templates/_helpers.tpl
Outdated
| but if it's reduced by more than 1, the quorum size can change so that's why this is now always hardcoded to 1. | ||
| */}} | ||
| {{- define "consul.pdb.maxUnavailable" -}} | ||
| {{- define "consul.server.pdb.maxUnavailable" -}} |
suggestion: A bunch of changes from #3000 in here
Corrected as recommended by reverting back to release/1.3.x branch version of affected files.
$ git checkout 'release/1.3.x' -- charts/consul/templates/server-disruptionbudget.yaml charts/consul/test/unit/server-disruptionbudget.bats charts/consul/template/_helpers.tpl
Applied datadog-integration changes back into _helpers.tpl
Re-ran entirety of bats tests using Makefile - make bats-tests (all passed)
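The template comment quoted in this thread ties maxUnavailable to quorum size. As background only, a majority quorum for n voting servers is floor(n/2) + 1, which the sketch below computes. The function name is hypothetical and nothing here is taken from the chart itself.

```shell
# Majority quorum for n voting servers: floor(n/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

for n in 3 5; do
  q=$(quorum_size "$n")
  echo "servers=$n quorum=$q tolerated_failures=$(( n - q ))"
done
```

With three servers the quorum is two, so only one server can be unavailable at a time without losing quorum.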
charts/consul/test/unit/helpers.bats
Outdated
| } | ||
| @test "connectInject/Deployment: fails if resource-apis is set and admin partitions are enabled" { | ||
| @test "connectInject/Deployment: fails if resource-apis is set, v2tenancy is unset, and admin partitions are enabled" { |
suggestion: Looks like extra changes were picked up. Running git checkout 'release/1.3.x' helpers.bats will reset the file to its version on that branch.
Corrected as recommended by reverting back to release/1.3.x branch version of affected files.
$ git checkout 'release/1.3.x' -- charts/consul/templates/server-disruptionbudget.yaml charts/consul/test/unit/server-disruptionbudget.bats charts/consul/template/_helpers.tpl
Applied datadog-integration changes back into _helpers.tpl
Re-ran entirety of bats tests using Makefile - make bats-tests (all passed)
| . | tee /dev/stderr | | ||
| yq '.spec.maxUnavailable' | tee /dev/stderr) | ||
| [ "${actual}" = "2" ] | ||
| [ "${actual}" = "1" ] |
suggestion: This file too
Corrected as recommended by reverting back to release/1.3.x branch version of affected files.
$ git checkout 'release/1.3.x' -- charts/consul/templates/server-disruptionbudget.yaml charts/consul/test/unit/server-disruptionbudget.bats charts/consul/template/_helpers.tpl
Applied datadog-integration changes back into _helpers.tpl
Re-ran entirety of bats tests using Makefile - make bats-tests (all passed)
… re-apply datadog-integration branch changes
Backport
This PR was manually cherry-picked from #3407 so it can be assessed for backporting, because automatic cherry-picking via the label failed.
The below text is copied from the body of the original PR.
Changes proposed in this PR
- enable_debug
- telemetry.disable_hostname
- telemetry.enable_host_metrics
- telemetry.prefix_filter
- telemetry.dogstatsd_addr
- telemetry.dogstatsd_tags
- /v1/agent/metrics?format=prometheus endpoint
- /v1/agent/metrics?format=prometheus
- /v1/agent/self
- /v1/status/leader
- /v1/status/peers
- /v1/catalog/services
- /v1/health/service
- /v1/health/state/any
- /v1/coordinate/datacenters
- /v1/coordinate/nodes
- server-acl-init token creation for OpenMetrics and Datadog Consul Integration check methods, allowing default minimal ACL token permission generation for Datadog agent usage as necessary.
How I've tested this PR
- CONTRIBUTING.md steps.
- consul-dev (main) and consul-k8s-control-plane-dev (datadog-integration branch) images on k3d test cluster for each scenario. Test repository here.
- CONTRIBUTING.md steps.
- bats ./charts/consul/test/unit --jobs 8 - ran successfully for all tests.
How I expect reviewers to test this PR
Checklist
Overview of commits