Orphaned spans using otelhttptrace instrumentation with Context Cancelled error during http.getconn #2855

Open
GrantOllis opened this issue Oct 10, 2022 · 2 comments
@GrantOllis

GrantOllis commented Oct 10, 2022

When otelhttptrace is used with default settings to generate spans, a timeout (Context Cancelled error) during the http.tls child span of http.getconn appears to prevent ClientTrace.GotConn from being called. As a result, the http.getconn span is never ended, and its http.dns, http.connect and http.tls child spans are all orphaned in the trace.

I simulated the timeout by connecting through an HTTP proxy to a remote port that is blocked by a firewall: the connection succeeds, but TLS negotiation fails. This is one of the specific scenarios we want to identify more easily with tracing in place. From reading instrumentation/net/http/httptrace/otelhttptrace/clienttrace.go, it looks like it may be appropriate to end a "parent span" with the same error that its child span ended with. It may not be possible to generate meaningful field values for the structure expected by the clientTracer.gotConn method, although in this scenario some of the values could be propagated from the clientTracer.connectDone method.
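To make the suggestion concrete, here is a minimal sketch of the "end the parent with the child's error" idea. It is a toy model only: the `span` type, its `end` method and `errCanceled` are hypothetical names invented for illustration, and none of this is the actual otelhttptrace code or the OpenTelemetry span API.

```go
package main

import (
	"errors"
	"fmt"
)

// errCanceled is a stand-in for the error the TLS step failed with.
var errCanceled = errors.New("context canceled")

// span is a toy stand-in for a tracing span (hypothetical, not the
// OpenTelemetry type).
type span struct {
	name   string
	parent *span
	ended  bool
	err    error
}

// end marks the span as finished. If it finished with an error and its
// parent is still open, the parent is ended with the same error so that it
// is not left dangling.
func (s *span) end(err error) {
	if s.ended {
		return
	}
	s.ended = true
	s.err = err
	if err != nil && s.parent != nil {
		s.parent.end(err)
	}
}

func main() {
	getConn := &span{name: "http.getconn"}
	tlsSpan := &span{name: "http.tls", parent: getConn}

	// The TLS handshake fails, so GotConn would never be called.
	tlsSpan.end(errCanceled)

	fmt.Println(getConn.name, "ended:", getConn.ended, "err:", getConn.err)
}
```

In this model a child that ends without an error leaves the parent open, so the normal GotConn path would still be responsible for ending http.getconn.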

Built with Go 1.18, running otelhttptrace v0.35.1 on a Debian 11 Docker image.

@dmathieu
Member

Hi,
Thank you for reporting this. You mention having reproduced the issue on your end.
Could you provide us with a code sample to do the same reproduction?

@GrantOllis
Author

GrantOllis commented Oct 13, 2022

> Hi, Thank you for reporting this. You mention having reproduced the issue on your end. Could you provide us with a code sample to do the same reproduction?

I reproduced the issue in a corporate environment, where connecting to a blocked address results in the proxy accepting the TCP connection while the firewall silently blocks the remote connection, so the TLS leg times out. To reproduce this in a test harness, I suppose you could achieve something similar by setting up a TCP server that calls net.Listen() on a port (e.g. 8443), accepts the connection, and then waits longer than http.Client.Timeout before closing it without responding. You would then just need to use an https://localhost:8443 URL in your request.
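A rough, standard-library-only sketch of that harness might look like the following. It does not use otelhttptrace at all; it only records which net/http/httptrace hooks fire, on the assumption that the missing GotConn callback is what leaves the http.getconn span open. The names `traceStalledTLS` and `demoTimeout` are made up for this example, and it listens on a random local port rather than 8443.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"net/http"
	"net/http/httptrace"
	"sync"
	"time"
)

// demoTimeout is an arbitrary client timeout chosen for this sketch.
const demoTimeout = 500 * time.Millisecond

// traceStalledTLS starts a local TCP listener that accepts connections but
// never answers the TLS ClientHello, sends one HTTPS request to it with the
// given client timeout, and returns the names of the httptrace hooks that
// had fired by the time the request returned, plus the request error.
func traceStalledTLS(timeout time.Duration) ([]string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	defer ln.Close()

	done := make(chan struct{})
	defer close(done)
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			// Hold the connection open without writing anything.
			go func() {
				<-done
				conn.Close()
			}()
		}
	}()

	var mu sync.Mutex
	var events []string
	record := func(name string) {
		mu.Lock()
		defer mu.Unlock()
		events = append(events, name)
	}

	trace := &httptrace.ClientTrace{
		GetConn:           func(string) { record("GetConn") },
		GotConn:           func(httptrace.GotConnInfo) { record("GotConn") },
		ConnectStart:      func(_, _ string) { record("ConnectStart") },
		ConnectDone:       func(_, _ string, _ error) { record("ConnectDone") },
		TLSHandshakeStart: func() { record("TLSHandshakeStart") },
		TLSHandshakeDone:  func(tls.ConnectionState, error) { record("TLSHandshakeDone") },
	}

	req, err := http.NewRequest(http.MethodGet, "https://"+ln.Addr().String(), nil)
	if err != nil {
		return nil, err
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	client := &http.Client{
		Transport: &http.Transport{}, // fresh transport, no proxy configured
		Timeout:   timeout,
	}
	resp, err := client.Do(req)
	if err == nil {
		resp.Body.Close()
	}

	// Copy under the lock: the transport may still call hooks in the background.
	mu.Lock()
	defer mu.Unlock()
	return append([]string(nil), events...), err
}

func main() {
	events, err := traceStalledTLS(demoTimeout)
	fmt.Println("hooks fired:", events)
	fmt.Println("request error:", err)
}
```

I would expect GetConn and TLSHandshakeStart to show up in the list and GotConn to be absent, which matches the orphaned-span behaviour described above. TLSHandshakeDone may or may not appear, since the transport can report the failed handshake after the request has already returned.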

I'm very new to Go, otherwise I would have taken a stab at fixing this in a pull request, but I could have a go at reproducing it in some self-contained code and share that if it helps.
