
Handling of cross-origin request network timings #3199

Open

t2t2 opened this issue Aug 25, 2022 · 2 comments
Labels: bug (Something isn't working), priority:p2 (Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect)

Comments

t2t2 (Contributor) commented Aug 25, 2022

What happened?

Steps to Reproduce

When a cross-origin request is made (e.g. example.com -> resources.vendor.com, but also example.com -> static.example.com, or even localhost:3000 (a webpack dev server) -> localhost:8080 (an API server)), the PerformanceResourceTiming entry generated for it doesn't include most of the timing values by default:

Note: When CORS is in effect, many of these values are returned as zero unless the server's access policy permits these values to be shared. This requires the server providing the resource to send the Timing-Allow-Origin HTTP response header with a value specifying the origin or origins which are allowed to get the restricted timestamp values.

The properties which are returned as 0 by default when loading a resource from a domain other than the one of the web page itself: redirectStart, redirectEnd, domainLookupStart, domainLookupEnd, connectStart, connectEnd, secureConnectionStart, requestStart, and responseStart.

MDN
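
For context, the restriction is easy to observe directly in the browser: cross-origin entries report 0 for the restricted fields unless the responding server sends a Timing-Allow-Origin header (for example Timing-Allow-Origin: *). A minimal console sketch, separate from the instrumentation, that flags such entries (the field choice is illustrative):

```ts
// Sketch: list resource entries whose detailed timings were withheld by CORS.
const entries = performance.getEntriesByType('resource') as PerformanceResourceTiming[];

for (const entry of entries) {
  // requestStart and responseStart read as 0 when the server did not send
  // Timing-Allow-Origin; fetchStart and responseEnd are always populated.
  if (entry.requestStart === 0 && entry.responseStart === 0) {
    console.log(`detailed timings withheld for ${entry.name}`);
  }
}
```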

Currently otel/sdk-trace-web's addSpanNetworkEvents doesn't check for this, so the 0 values get passed to span.addEvent, where timeOrigin (the time at which the page loaded) is added to them. The result is timestamps that look plausible but usually fall before the span start:

documentLoad
span start time:       1661419670581.9
fetchStart:            1661419670581.9
unloadEventStart:      1661419670378.3
unloadEventEnd:        1661419670378.3
domInteractive:        1661419671104.4

resourceFetch  (to another subdomain)
span start:            1661419671106.4

fetchStart:            1661419671106.4
domainLookupStart:     1661419670378.3
domainLookupEnd:       1661419670378.3
connectStart:          1661419670378.3
secureConnectionStart: 1661419670378.3
connectEnd:            1661419670378.3
requestStart:          1661419670378.3
responseStart:         1661419670378.3
responseEnd:           1661419671153.8
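
Spelled out, the arithmetic behind the numbers above (a sketch using the values from the dump, not SDK code): a withheld field reads as 0, so once timeOrigin is added the event timestamp collapses to the moment the page loaded, which is before the span started.

```ts
// Illustrative values copied from the dump above.
const timeOrigin = 1661419670378.3; // performance.timeOrigin (page load)
const spanStart = 1661419671106.4;  // resourceFetch span start time
const connectStart = 0;             // withheld by CORS, reported as 0

const eventTimestamp = timeOrigin + connectStart;
console.log(eventTimestamp);             // 1661419670378.3
console.log(eventTimestamp < spanStart); // true: the event lands before the span start
```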

This sort of data isn't easy to filter out when processing happens at the span level (there's no indication of which values were 0 in the browser), so the simplest way to build a request-phases chart (e.g. responseEnd - responseStart to calculate response duration) produces timings that are longer than makes sense:

[screenshot: request-phases chart showing inflated response durations]
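
For example, deriving the response phase from the span events above (same illustrative values) inflates it to roughly the full time since page load:

```ts
// responseStart collapsed to timeOrigin (withheld value of 0); responseEnd is real.
const responseStart = 1661419670378.3;
const responseEnd = 1661419671153.8;

console.log(responseEnd - responseStart); // ~775.5 ms "response" phase, far longer than the real transfer
```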

Expected Behavior

Don't include values that are known to be unavailable in this case, so analysis tools can more easily handle it differently. (This might be considered a breaking change that people would object to, so I decided to open this issue to propose it first instead of going straight to a PR.)

If there's too much resistance to changing the current behaviour, this could also be something specced out during the RUM SIG, with the fix implemented once the instrumentation follows those specs (somewhat touching on #3174). But as a user of the current instrumentation in signalfx/splunk, I'd rather fix the current situation.
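
A rough sketch of what the guard could look like (the helper below is a simplified stand-in, not the exact shape of addSpanNetworkEvent in @opentelemetry/sdk-trace-web):

```ts
import type { Span } from '@opentelemetry/api';

// Simplified stand-in for the sdk-trace-web helper: skip events whose
// underlying PerformanceResourceTiming value is 0, which for cross-origin
// resources means "withheld by CORS" rather than a real timestamp.
function addSpanNetworkEvent(
  span: Span,
  performanceName: string,
  entries: Record<string, number | undefined>
): void {
  const value = entries[performanceName];
  if (value === undefined || value === 0) {
    // Unavailable (cross-origin without Timing-Allow-Origin): emitting an
    // event here would collapse its timestamp to the page's timeOrigin.
    return;
  }
  // Entry values are relative to timeOrigin, so shift them to epoch milliseconds.
  span.addEvent(performanceName, performance.timeOrigin + value);
}
```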

t2t2 added the bug (Something isn't working) and triage labels on Aug 25, 2022
scheler (Contributor) commented Aug 26, 2022

This looks helpful; I support this change. Is there any process currently for introducing breaking changes? For example, this change could be gated behind an InstrumentationConfig option, and you could give notice to folks via the changelog / release notes of the npm package.
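
Sketching what such an option could look like (the flag name ignoreZeroedNetworkTimings is made up here for illustration; it is not an existing InstrumentationConfig setting):

```ts
// Hypothetical config flag for the web instrumentations.
interface CrossOriginTimingConfig {
  /**
   * When true, skip span events whose PerformanceResourceTiming value is
   * reported as 0 (withheld because the server did not send Timing-Allow-Origin).
   */
  ignoreZeroedNetworkTimings?: boolean;
}

// Sketch of how instrumentation code could consult the flag.
function shouldEmitNetworkEvent(config: CrossOriginTimingConfig, value: number): boolean {
  return value > 0 || !config.ignoreZeroedNetworkTimings;
}
```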

legendecas (Member) commented Aug 26, 2022

> Is there any process currently for introducing breaking changes?

I don't think there is any, as long as we have a good reason to make the breaking change. I believe that if the current 0-valued events are meaningless on CORS requests, it should be OK for us to not generate these events -- they are not working as expected anyway.

dyladan added the priority:p2 (Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect) label and removed the triage label on Aug 31, 2022