Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk-trace-web): ignore unavailable span network events #2457

Closed

Conversation

kkruk-sumo
Copy link
Contributor

Which problem is this PR solving?

When doing cross-origin XHR/Fetch requests browsers expect Timing-Allow-Origin HTTP header to explicitly allow getting network performance timings. See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Timing-Allow-Origin for more.

Right now, on such cross-origin requests, we record all events with negative timings (relative to the span creation time).
It can be often misleading and can contribute to invalid metrics when calculating same event from multiple spans.

Short description of the changes

Network events equal 0 are not recorded.

@kkruk-sumo kkruk-sumo requested a review from a team September 6, 2021 12:37
@codecov
Copy link

codecov bot commented Sep 6, 2021

Codecov Report

Merging #2457 (a7a42bc) into main (7b01cfa) will increase coverage by 0.14%.
The diff coverage is 100.00%.

❗ Current head a7a42bc differs from pull request most recent head cbe049b. Consider uploading reports for the commit cbe049b to get more accurate results

@@            Coverage Diff             @@
##             main    #2457      +/-   ##
==========================================
+ Coverage   92.56%   92.71%   +0.14%     
==========================================
  Files         134      137       +3     
  Lines        4871     4997     +126     
  Branches     1036     1058      +22     
==========================================
+ Hits         4509     4633     +124     
- Misses        362      364       +2     
Impacted Files Coverage Δ
packages/opentelemetry-sdk-trace-web/src/utils.ts 94.93% <100.00%> (+0.03%) ⬆️
...async-hooks/src/AsyncLocalStorageContextManager.ts 100.00% <0.00%> (ø)
...ontext-async-hooks/src/AsyncHooksContextManager.ts 100.00% <0.00%> (ø)
...sync-hooks/src/AbstractAsyncHooksContextManager.ts 97.01% <0.00%> (ø)

@obecny
Copy link
Member

obecny commented Sep 9, 2021

I don't remember who, but it was already like this and it has been changed to explicitly log such information too

@kkruk-sumo
Copy link
Contributor Author

CC @johnbley . I've found the negative/zero performance timings were allowed in #1769 . I'm good with negative timings, but zero often means that the timing is not available.

@johnbley
Copy link
Member

johnbley commented Sep 10, 2021

CC @johnbley . I've found the negative/zero performance timings were allowed in #1769 . I'm good with negative timings, but zero often means that the timing is not available.

And zero also can mean "instantaneous", "was in cache", "not applicable", or "legitimately 0" (e.g., there were no redirects so redirectStart == 0.0, as Chromium does it at least). I wish the spec explicitly said "just leave the property undefined if it wasn't / isn't / can't be measured". As another special case, the spec states that startTime for PerformanceNavigationTiming entries is supposed to be exactly 0, which we definitely want to treat as "correct"/"true".

I think that for this piece of tech, our telemetry should report whatever the browser says with minimal processing, and let downstream/backend/easier-to-modify components deal with normalization/truncation/processing if desired (e.g., throwing away absurd times like "that page took 6 years or -5 hours to load", or declaring that "no redirects happened so the redirect timing is not 0 but non-existent"). Does that not work in your world for some reason?

@m-tu
Copy link

m-tu commented Sep 10, 2021

Unfortunately I don't think we can always be sure that 0 equals missing value. One example is Firefox which for security reasons reduces precision. It is possible that you have sub 2ms or 100ms(or more depending on settings) connect time which will show up as 0. Here is this reproduced in a non localhost page:

firefoxzeros

There are cases where we know that timings will be missing eg:

  • HTTP span always has missing secureConnectionStart
  • Third party requests without Timing-Allow-Origin header have most timings missing

In those cases we can predict that the timings will be missing and sanitize them server side. Although not sending them at all could save some bandwidth. If it is not possible for someone to have that processing maybe make this configurable? That seems a bit of an overkill tho.

@kkruk-sumo
Copy link
Contributor Author

@johnbley @m-tu Thanks for clarification. I was not aware how complex this matter is.
Your idea about handling it on backend side sounds good.
I think, the biggest issue for me was that the 0 is relative to the page time origin, so on some traces from long-running websites I've seen that span event occurred 30 minutes before trace. That was confusing, because I was checking one particular transaction with no knowledge about user session and how the page was opened. But now I don't think it can be easily fixed just by removing 0. Thanks for your help.

@kkruk-sumo kkruk-sumo closed this Sep 16, 2021
@kkruk-sumo kkruk-sumo deleted the ignore_zero_network_event branch September 16, 2021 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants