tracing: fix Datadog span name#29932
Conversation
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
zuercher
left a comment
There was a problem hiding this comment.
Please update changeslogs/current.yaml with your release note. Otherwise this looks good.
zuercher
left a comment
There was a problem hiding this comment.
Also, please add a test in test/extensions/tracers/datadog. I think you could probably check this in the existing ExtractionSuccess test in tracer_test.cc.
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
|
Thanks, @zuercher, that was quick! I'll update |
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
|
Ok, changelog updated. Once CI accepts this (it might need a poke) and it's approved, I'd like to revisit the question of onto which release series this change will appear. I don't know the process for requesting a patch to the current release series. |
Co-authored-by: Stephan Zuercher <zuercher@gmail.com> Signed-off-by: David Goffredo <dmgoffredo@gmail.com>
zuercher
left a comment
There was a problem hiding this comment.
RBE is having a sad, so CI is busted but otherwise this looks good.
|
/retest |
….com:DataDog/envoy into david.goffredo/tracing-fix-datadog-span-name
|
I just pushed some changes that merge recent additions to |
|
I just pushed some changes that merge yet more additions to @zuercher Thanks again for your review. How do we move this PR forward? |
|
/retest |
|
We need the tests to pass. I restarted the TSAN test and hopefully it will pass. |
|
Hey, I'm not familiar with envoy releases. Can I see somewhere in which version this fix is or will be available? We wanted to update to Istio 1.19 but the change in span names does block us. |
|
if you look at the commit you can see which branches its on currently its only on @florianmutter if you want to raise backports for other release branches (1.24-7) i would be happy to review - it would then be included in any backported branches when they release, which is probably quite soon |
Backport c3646f9 to 1.26 Additional testing: Also, I ran the sources/extensions/tracers/datadog/demo both with and without these changes. Verified that the produced span's "operation name" before these changes is not as desired. Verified that the produced span's "operation name" after these changes is as desired. Desired: "Operation name" is "envoy.proxy", and "resource name" is the operation_name passed to startSpan. Risk Level: low Testing: See the modified unit test. Docs Changes: n/a Release Notes: updated Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
|
I created the first backport PR for 1.26 here: #30182 |
Backport c3646f9 to 1.27 Additional testing: Also, I ran the sources/extensions/tracers/datadog/demo both with and without these changes. Verified that the produced span's "operation name" before these changes is not as desired. Verified that the produced span's "operation name" after these changes is as desired. Desired: "Operation name" is "envoy.proxy", and "resource name" is the operation_name passed to startSpan. Risk Level: low Testing: See the modified unit test. Docs Changes: n/a Release Notes: updated Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Backport c3646f9 Additional testing: Also, I ran the sources/extensions/tracers/datadog/demo both with and without these changes. Verified that the produced span's "operation name" before these changes is not as desired. Verified that the produced span's "operation name" after these changes is as desired. Desired: "Operation name" is "envoy.proxy", and "resource name" is the operation_name passed to startSpan. Risk Level: low Testing: See the modified unit test. Docs Changes: n/a Release Notes: updated Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Backport c3646f9 Additional testing: Also, I ran the sources/extensions/tracers/datadog/demo both with and without these changes. Verified that the produced span's "operation name" before these changes is not as desired. Verified that the produced span's "operation name" after these changes is as desired. Desired: "Operation name" is "envoy.proxy", and "resource name" is the operation_name passed to startSpan. Risk Level: low Testing: See the modified unit test. Docs Changes: n/a Release Notes: updated Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
|
I tested this fix with Envoy 1.27.2, as part of the resolution of #30235. The operation name has been fixed from |
|
@hiddewie Yes, the handling of special tag names was removed in the new Datadog tracing library because it is no longer based on OpenTracing. However, as far as I can see, "resource.name" is not part of OpenTracing's semantic conventions, and so removing it was a change in Datadog behavior. I can either restore the special meaning of the "resource.name" tag as another patch, or find another solution. Out of curiosity, what is your particular use case for setting the "resource.name"? |
|
OK, thanks, interesting.
We set an internal header in the Envoy configuration responsible for routing the requests (there is Lua involved), to a functional resource name (basically sets of similar paths that have a functional name). This internal header is used in the When there is only a single |
Then making "resource.name" work with I'll prioritize restoring the "resource.name" tag behavior as a bug fix. |
- If a trace sampling decision is extracted from request headers, use it regardless of Envoy's sampling configuration. - Refactor Tracer::startSpan to avoid the move assignment operator of datadog::tracing::Span, which is deleted in newer versions of dd-trace-cpp. - Alter the comments above Tracer::startSpan to more accurately reflect changes made previously in envoyproxy#29932. Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
|
I'm confused. We did update to Istio 1.20.0. Looking at this we do now use envoy 1.28. Looking at this should contain the changed span name but we still don't seeenvoy.proxy being used as span name. Here is a screenshot:![]() The same trace with Istio 1.17.5 that includes envoy 1.25 shows the same trace like this: ![]() Do you have any idea what that could be? |
|
Hi, I believe the final patch was merged last week for both envoy versions 1.27 and 1.28. These patches should be incorporated into the upcoming Envoy patch builds. Ideally, they will be released prior to the next Istio patch build to ensure their inclusion. I asked yesterday whether this issue has been resolved, but it looks like my question only contributed to further confusion. My main concern is that DataDog tracing remains partially broken in both Istio versions |
|
@njegosrailic Thank you very much. I was not aware of this. |
|
Thanks, @njegosrailic. @florianmutter, it is surprising that the operation name of the span you screenshotted is not as expected. One way to verify the versions of things is to look at the trace in all of its undecorated glory via the JSON API, i.e. In the case of the operation name bug, the precise Envoy version is what is important (the fix did not involve Datadog's core tracing library). |
|
Thanks, @dgoffredo. Today, we tested the latest
< "name": "test_app_target_edgegateway_http.test_app.svc.cluster.local_80",
---
> "name": "envoy.proxy",
< "resource": "egress istio-test-app-target-edgegateway-http-c3.service.testzone.internal",
< "resource_hash": "423a0107c5aec597",
---
> "resource": "test-app-target-edgegateway-http.test-app.svc.cluster.local:80/*",
> "resource_hash": "647e5ccae10c6628",
< "version": "envoy f9707e29aa0a36b1430b373d95e6c9abb5deca75/1.27.3-dev/Clean/RELEASE/BoringSSL",
---
> "version": "envoy ea8f1b7c160f265a3fcfaab33446b8f5de98ca11/1.24.8-dev/Clean/RELEASE/BoringSSL",
< "_dd.p.dm": "-1",
< "_dd.tracer_version": "v0.1.8",
---
> "_dd.tracer_version": "v1.2.1", |
|
@njegosrailic That's very helpful. < "version": "envoy f9707e29aa0a36b1430b373d95e6c9abb5deca75/1.27.3-dev/Clean/RELEASE/BoringSSL",
---
> "version": "envoy ea8f1b7c160f265a3fcfaab33446b8f5de98ca11/1.24.8-dev/Clean/RELEASE/BoringSSL",New Istio's Envoy is on top, and old Istio's Envoy is on bottom. The "version" names a git tree in istio/proxy, not in the upstream Envoy. The tree named by the output of the newer Istio, So, the change to operation name shown elsewhere in your diff is unexpected. Please convey this issue to Datadog Support. They will collect configuration information, and we can take it from there. |
|
Ah, I found a clue. Here is some code from the Envoy that is part of the newer Istio (from mine and Njegos's comments above): I subsequently modified the behavior of that function as part of the "resource.name" fix, but that modification is not yet available to Istio. Perhaps the unexpected "operation name" is due not to span creation, but due to a subsequent call to This I could confirm with more specifics about your Istio setup. Please proceed with Datadog Support, and share the new case number with me when you have it. |
I attached two spans of a similar trace. I removed all other spans. The first span The second span Istio gateway span: {
"trace_id": "6214739122545778382",
"span_id": "6214739122545778382",
"parent_id": "0",
"start": 1701244677.028113,
"end": 1701244677.184877,
"duration": 0.156763803,
"type": "web",
"service": "istio-ingressgateway-iap.istio-system",
"name": "gateway.instore.svc.cluster.local_8080_api",
"resource": "egress instore.internal.example.com",
"resource_hash": "f17c52a6c152467c",
"host_id": 563956795,
"hostname": "gke-development-pool3-41f15120-m6ns.europe-west4-a.c.puc-d-ngshop-gke-0228.internal",
"env": "development",
"host_groups": [
"env:development"
],
"meta": {
"istio.mesh_id": "cluster.local",
"version": "envoy 580bf9cc561a012d4eb14af48d496d6748307e59/1.28.1-dev/Clean/RELEASE/BoringSSL",
"http.protocol": "HTTP/2",
"http.status_code": "200",
"runtime-id": "afd3478a-2e0a-413b-a8d1-74ae1360d99c",
"response_flags": "-",
"http.useragent": "axios/1.6.2",
"zone": "europe-west4",
"_dd.p.dm": "-1",
"process_id": "14",
"http.url_details.host": "instore.internal.example.com",
"language": "cpp",
"istio.namespace": "istio-system",
"http.method": "POST",
"upstream_cluster.name": "outbound|8080||gateway.instore.svc.cluster.local",
"peer.address": "10.239.100.136",
"istio.canonical_service": "istio-ingressgateway-iap",
"_dd.agent_hostname": "gke-development-pool3-41f15120-m6ns.europe-west4-a.c.puc-d-ngshop-gke-0228.internal",
"_dd.agent_version": "7.49.0",
"http.url_details.path": "/api/servicepoint/stock-info/click-and-reserve",
"http.url": "https://instore.internal.example.com/api/servicepoint/stock-info/click-and-reserve?tenant=PUC_DE",
"http.path_group": "/api/servicepoint/stock-info/click-and-reserve",
"request_size": "139",
"http.host": "instore.internal.example.com",
"node_id": "router~10.239.66.6~istio-ingressgateway-iap-867b5b4c79-mzht6.istio-system~istio-system.svc.cluster.local",
"_dd.agent_rare_sampler.enabled": "false",
"istio.canonical_revision": "latest",
"_dd.tracer_version": "v0.1.8",
"component": "proxy",
"http.url_details.scheme": "https",
"downstream_cluster": "-",
"guid:x-request-id": "52d096ff-0b0c-9df6-a33e-fbd156869bf4",
"ddtags": "ingestion_reason:auto",
"http.url_details.queryString.tenant": "PUC_DE"
},
"metrics": {
"_dd.agent_errors_sampler.target_tps": 10.0,
"_dd.agent_psr": 1.0,
"_trace_root": 1.0,
"response_size": 74.0,
"_sampling_priority_v1": 1.0,
"_top_level": 1.0,
"_dd.agent_priority_sampler.target_tps": 10.0
},
"ingestion_reason": "auto",
"metadata": {
"sds_info": []
},
"children_ids": [
"2310811855767820061"
]
}Istio sidecar span {
"trace_id": "6214739122545778382",
"span_id": "1767944568465430022",
"parent_id": "5261554865300988373",
"start": 1701244677.057804,
"end": 1701244677.179114,
"duration": 0.121309735,
"type": "web",
"service": "fulfill.order",
"name": "order.fulfill.svc.cluster.local_50051",
"resource": "ingress",
"resource_hash": "9498ccf48d93a1bf",
"host_id": 563547440,
"hostname": "gke-development-pool3-cd9ee151-7dsz.europe-west4-c.c.puc-d-ngshop-gke-0228.internal",
"env": "development",
"host_groups": [
"env:development"
],
"meta": {
"upstream_cluster.name": "inbound|50051||",
"zone": "europe-west4",
"http.status_code": "200",
"downstream_cluster": "-",
"http.useragent": "grpc-java-netty/1.59.0",
"istio.mesh_id": "cluster.local",
"http.url_details.host": "order.fulfill",
"language": "cpp",
"http.url_details.scheme": "http",
"_dd.agent_version": "7.49.0",
"_dd.tracer_version": "v0.1.8",
"component": "proxy",
"istio.namespace": "fulfill",
"http.url_details.path": "/fulfill.order.OrderService/CreateNewRetailOrder",
"grpc.authority": "order.fulfill:50051",
"grpc.path": "/fulfill.order.OrderService/CreateNewRetailOrder",
"version": "rev-d0eb43b93f93612ce20b31bb7ce4e45d83975145",
"guid:x-request-id": "b10e2e1f-4d3b-939e-bd6a-974d35c31184",
"istio.canonical_service": "order",
"node_id": "sidecar~10.239.67.58~order-6cdf99b645-lb582.fulfill~fulfill.svc.cluster.local",
"grpc.status_code": "0",
"peer.address": "10.239.67.87",
"grpc.content_type": "application/grpc",
"http.protocol": "HTTP/2",
"http.url": "http://order.fulfill:50051/fulfill.order.OrderService/CreateNewRetailOrder",
"ddtags": "ingestion_reason:auto",
"_dd.p.dm": "-1",
"env": "development",
"http.url_details.port": "50051",
"runtime-id": "5193c159-e218-44f1-8750-34d1d454fcd9",
"http.path_group": "/fulfill.order.OrderService/CreateNewRetailOrder",
"response_flags": "-",
"request_size": "347",
"_dd.agent_hostname": "gke-development-pool3-cd9ee151-7dsz.europe-west4-c.c.puc-d-ngshop-gke-0228.internal",
"istio.canonical_revision": "latest",
"process_id": "23",
"_dd.agent_rare_sampler.enabled": "false",
"http.host": "order.fulfill",
"http.method": "POST"
},
"metrics": {
"_top_level": 1.0,
"_dd.agent_priority_sampler.target_tps": 10.0,
"response_size": 39.0,
"_dd.agent_errors_sampler.target_tps": 10.0,
"_sampling_priority_v1": 1.0
},
"ingestion_reason": "auto",
"metadata": {
"sds_info": []
},
"children_ids": [
"6764890222525977566"
]
} |


Commit Message
tracing: fix Datadog span name
Additional Description
In a series of pull requests beginning last year, I modified Envoy's Datadog tracing extension to use a new library instead of OpenTracing.
Those changes included a bug.
Previously, a Datadog span produced by Envoy had its "operation name" hard-coded to "envoy.proxy". The
operation_nameparameter to the OpenTracing adapter'sstartSpanthen effectively became the span's "resource name," because of how operation name and resource name were managed by the OpenTracing-based Datadog tracing library.My changes instead set the span's operation name to the
operation_nameparameter ofstartSpan.Those changes were released with envoy
v1.26.0v1.27.0.The result is that upgrading Envoy to
v1.26v1.27 changed spans that used to be:into instead:
To remedy this, this pull request contains changes that restore the old behavior.
If these changes are accepted, I would appreciate if they would be included in the current minor release series as a bug fix patch.
Risk Level
low
Testing
See the modified unit test.
Also, I ran the
sources/extensions/tracers/datadog/demoboth with and without these changes. Verified that the produced span's "operation name" before these changes is not as desired. Verified that the produced span's "operation name" after these changes is as desired.Desired: "Operation name" is "envoy.proxy", and "resource name" is the
operation_namepassed tostartSpan.Docs Changes
none
Release Notes
Recommended wording:
Platform Specific Features
none