net: 512 byte DNS response size limit causes "cannot unmarshal DNS" error #51127
Comments
This resolves golang#51127 in the near term by defaulting to a larger buffer size. This is not a permanent fix or implementation of EDNS(0) or [IETF RFC6891](https://datatracker.ietf.org/doc/html/rfc6891). These changes should be reviewed by someone with more experience than I. :) Signed-off-by: Aaron Friel <[email protected]>
Change https://go.dev/cl/384076 mentions this issue: |
Workaround: We were able to work around the problem by adding a DNS entry in the hosts file: |
For what it's worth, there is no generally applicable workaround that fixes users' experience without other side effects and possible downsides. That IP isn't the same IP I see, so I wonder if there's some geographic DNS response occurring. |
previously #11070 |
Even from the linked site, the recommendation for the increased buffer size is for EDNS0 which is not implemented here (ref #6464). Equally important on their site is the support for TCP, and had WSL followed spec and returned a proper truncated response, it would have been retried gracefully. |
I would push back on the notion that this should be resolved elsewhere. Go is the exception to behaving correctly: other userland programs such as dig(1), nslookup(1), host(1), as well as glibc API calls such as getaddrinfo(3) work. I can write Python, C#, Rust, C, etc, and those will work correctly in this networking environment.
Go is adhering strictly to an antiquated standard, EDNS0 has been a standard since 1999 and larger responses are not a new specification or the result of rapidly moving network standards or the ground shifting under Go. Strict adherence to 512 byte responses is not followed by other tools in the same ecosystem, Go ought to "be liberal in what it accepts", within reason and of course, unless doing so would violate memory safety or other safety criteria of the software.
End-users are not in a position to solve their upstream DNS server's issues, nor are software maintainers. We don't have control over our end user's DNS servers.
This error isn't unique to the situation I described, it's just most acute right now for those users in the specific scenario I documented. 112 issues have been reported on GitHub with the text "cannot unmarshal DNS", and a survey of those shows that they have occurred across all platforms and among extraordinarily widely used pieces of software across Mac, Windows, *nix. Those issues show that various other VPN providers, ISPs, routers, have all behaved similarly.
And going back to the earlier points, users don't have control over those things and we shouldn't expect all Go software users to be software engineers or to be able to modify their DNS configuration.
Lastly, I strongly believe that software that works is superior to software that does not, and end-users of the software will not care what link in the chain is causing it not to work.
There is an opportunity to mitigate an issue end-users are facing in one place, I think bringing Golang into alignment with the rest of the ecosystem will positively impact users. |
Thanks for the report.
Why not? It's been a while since I've read DNS RFCs, but my impression is still today that DNS servers are not allowed to send >512-byte responses unless the client explicitly indicates support for such using EDNS. As such, I feel like emphasizing "pre-1999" is unfair. I think Microsoft should update their DNS server to adhere to the DNS specification. I'd prefer we don't add hacks to accommodate non-spec behavior. However, #6464 remains open if someone wants to update Go's DNS client to use EDNS, and to support+advertise a larger buffer size. I think that's the standards-conforming way to address this issue, if folks aren't willing to wait on the issue being fixed in WSL2. |
Hey @mdempsky I would like this re-opened please. Any way we could get on a call to chat about this? |
Just for reference, there is a (currently open) issue over at WSL that should cover this issue: microsoft/WSL#7642. I'd suggest adding your findings there as well. |
Understood, though I'd like to chat with someone on the Go language team about the scope & impact of this issue. It's affecting customers of major Go language-built software & has for about seven years. It's particularly acute because, I suspect, none of the players wants to take responsibility for fixing this. End users do not care why their software is broken, but we have an opportunity here to address, at least partially, thousands of issues raised by users over the past 7 years. And if the Pareto principle is applicable here, I suspect those users knowledgeable enough and motivated enough to comment on GitHub are just a fraction of those impacted. |
Why? What do you hope these requests would accomplish? As stated, the Go DNS client is spec-compliant to the best of my knowledge, and a feature request issue (#6464) already exists that I believe would make it more accommodating to non-compliant DNS software like WSL2. It just needs someone to implement it. I'm happy to review CLs. |
With all due respect, Go is an open source project and I think that your best bet to get a desired change through isn't via a private call with a maintainer. |
Other languages & libraries use larger buffers and accept larger responses in order to "be liberal in what they accept" to tolerate non-compliant implementations, and a concerted effort by a consortium of DNS implementations and stakeholders pushed for a larger acceptable buffer size in 2020, more than two decades after that specification was accepted. And end users do not care why their software does not work. I think a phone call might be a better channel to have an empathetic conversation over the issues I've read & the litany of closed/unsolved issues reported against packages on GitHub, StackOverflow, and elsewhere. Otherwise, I can keep replying, but I don't see any responses to my points on the merits so far. I would like to raise the bar from this text-based conversation to one that's more empathetic toward end-users. I think we should try, here, to solve customer, end-user problems. |
We've identified two ways to do that already: have WSL2 fix their DNS server (microsoft/WSL#7642), or implement #6464. |
Would anything break by using a larger default buffer for responses? I think that's what glibc does, and as observed previously I think Go is an outlier here among languages & libraries in not tolerating a larger response. |
There is something I don't understand here. Apparently some DNS server is out of spec by sending packets greater than 512 bytes without setting the truncated bit. But it can't be the regular Microsoft server, or Go programs on all systems would be reporting problems, not just programs on WSL. Does WSL run a local name server? What is the nameserver entry in resolv.conf? What happens if you change it to 8.8.8.8 or 1.1.1.1? CC @jstarks for WSL issue. |
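The out-of-spec condition described above (a UDP response larger than 512 bytes whose truncation bit is not set) can be checked mechanically. The following is a minimal sketch, not code from the Go resolver; the names `inspectUDPResponse` and `udpResponseInfo` are hypothetical. It relies only on the RFC 1035 header layout, in which the TC flag is bit 0x02 of the third header byte.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	dnsHeaderLen   = 12  // fixed DNS header size (RFC 1035)
	classicUDPSize = 512 // UDP message limit without EDNS(0)
)

var errShortMessage = errors.New("DNS message shorter than the 12-byte header")

// udpResponseInfo is a hypothetical helper type for this sketch.
type udpResponseInfo struct {
	Truncated bool // TC bit set: the client should retry over TCP
	Oversized bool // longer than the classic 512-byte UDP limit
}

// inspectUDPResponse reports whether a raw DNS message has the TC bit set
// and whether it exceeds 512 bytes. It looks only at the header and length.
func inspectUDPResponse(msg []byte) (udpResponseInfo, error) {
	if len(msg) < dnsHeaderLen {
		return udpResponseInfo{}, errShortMessage
	}
	return udpResponseInfo{
		Truncated: msg[2]&0x02 != 0,
		Oversized: len(msg) > classicUDPSize,
	}, nil
}

func main() {
	// A synthetic 700-byte "response": QR=1, TC=0, body left as zeros.
	msg := make([]byte, 700)
	msg[2] = 0x80
	info, err := inspectUDPResponse(msg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Oversized without Truncated is the combination discussed in this thread.
	fmt.Printf("truncated=%v oversized=%v\n", info.Truncated, info.Oversized)
}
```

A response that is oversized but not marked truncated gives a strict 512-byte client no signal to retry over TCP, which is why the failure surfaces as a parse error rather than a retry.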
@ianlancetaylor First, you're right, the WSL2 DNS server is out of spec. No question there.
Second, let's take a step back - this isn't a WSL2 specific issue. Fixing the acute issue users are facing in WSL2 is WSL2 specific, but I'd encourage you to read the many, many comments on GitHub issues. https://github.com/search?o=asc&q=%22cannot+unmarshal+DNS%22&s=created&type=Issues
Starting with these issues which predate WSL2. I'm using a red circle to indicate that a user's problem was never solved, a yellow circle to indicate that a workaround was implemented to mitigate customer issues, but didn't root cause them, and a green circle when a project that is actually a DNS server solved the issue. I'm also using GitHub Markdown's list notation to provide partially unfurled data about the link destination via just pasting in URLs.
Consul
Confd
Docker
Kubernetes
Weave
rakyll/drive / odeke-em/drive
Mesos, again
Resolvable, a Docker DNS resolver
Goproxy
Moby / then Docker
freegeoip
heroku
clair
Docker for Mac
gorush application server
Docker for Mac
|
I think that software that works is better than software that doesn't work, and if a partial mitigation before EDNS0 support lands in Go would have prevented these issues, shouldn't it have been done? How many frustrated users is too many? That's just the first two pages of results from the GitHub issues. I'll continue tomorrow. |
Change https://go.dev/cl/385035 mentions this issue: |
@AaronFriel Can you or someone else with WSL see if https://go.dev/cl/385035 fixes the problem? That CL uses EDNS(0) to advertise a permitted packet size of 1232 bytes. Although I have to say that if there are DNS servers out there that incorrectly send responses larger than 512 bytes in the absence of an EDNS(0) packet length, then I suspect that there are DNS servers out there that will simply ignore the EDNS(0) packet length and send whatever packet size they feel like. So I don't know how much this will actually help. |
@ianlancetaylor I can, with great enthusiasm, report that your CL causes the test case to pass in the issue. 🎉🎉🎉🎉🎉 It took me a bit to figure out how to check out the CL - I used the base64 encoded blob, not sure if that's the easiest way to do it - but I did build Go locally. And the result of running the test command is starkly different. Go 1.17.6:
With patch applied:
I rebuilt the Pulumi toolchain that a user reported this error on and which I was able to reproduce, and I can confirm that issue is mitigated as well. I anticipate this would resolve issues for our friends and colleagues in the infra-as-code ecosystem, as well as anyone else using Go tooling to manage or authenticate with Azure, and likely many of issues folks experienced with non-conforming DNS resolvers out of their control due to being part of a proxy, VPNs, their ISP's routers or otherwise. If this could be included in the next dot release of Go, I would be eternally grateful. 🙇 |
Change https://go.dev/cl/386016 mentions this issue: |
Change https://go.dev/cl/386014 mentions this issue: |
This reverts https://go.dev/cl/385035. For 1.18 we will use a simple change to increase the accepted DNS packet size, to handle what appear to be broken resolvers that don't honor the 512 byte limit. For 1.19 we will restore CL 385035 to make a proper EDNS request, so that it has more testing time before it goes out in a release. For #6464 For #21160 For #44135 For #51127 For #51153 Change-Id: Ie4a0eb85ca0a6a73bee5cd4cfc6b7d2a15ef259f Reviewed-on: https://go-review.googlesource.com/c/go/+/386014 Trust: Ian Lance Taylor <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]> Reviewed-by: Damien Neil <[email protected]>
Change https://go.dev/cl/386034 mentions this issue: |
Change https://go.dev/cl/386035 mentions this issue: |
The existing value of 512 bytes as is specified by RFC 1035. However, the WSL resolver reportedly sends larger packets without setting the truncation bit, which breaks using the Go resolver. For 1.18 and backports, just increase the accepted packet size. This is what GNU glibc does (they use 65536 bytes). For 1.19 we plan to use EDNS to set the accepted packet size. That will give us more time to test whether that causes any problems. No test because I'm not sure how to write one and it wouldn't really be useful anyhow. Fixes #6464 Fixes #21160 Fixes #44135 Fixes #51127 For #51153 Change-Id: I0243f274a06e010ebb714e138a65386086aecf17 Reviewed-on: https://go-review.googlesource.com/c/go/+/386015 Trust: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> Reviewed-by: Damien Neil <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
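To illustrate why the size of the receive buffer matters, here is a small loopback experiment. It is a sketch under stated assumptions, not the Go resolver's code: `receivedLen` is a hypothetical helper, and the behavior described applies to Linux and similar systems, where a UDP read into a too-small buffer silently drops the rest of the datagram (Windows reports an error instead).

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// receivedLen sends one UDP datagram of payloadLen bytes over loopback and
// returns how many bytes a reader using a bufLen-byte buffer gets back.
// Hypothetical helper for illustration only.
func receivedLen(payloadLen, bufLen int) (int, error) {
	recv, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer recv.Close()

	send, err := net.Dial("udp", recv.LocalAddr().String())
	if err != nil {
		return 0, err
	}
	defer send.Close()

	if _, err := send.Write(make([]byte, payloadLen)); err != nil {
		return 0, err
	}

	// Avoid blocking forever if the datagram is lost.
	if err := recv.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return 0, err
	}
	buf := make([]byte, bufLen)
	n, _, err := recv.ReadFrom(buf)
	return n, err
}

func main() {
	// 700 bytes stands in for an oversized DNS response.
	for _, bufLen := range []int{512, 1232} {
		n, err := receivedLen(700, bufLen)
		fmt.Printf("buffer=%d received=%d err=%v\n", bufLen, n, err)
	}
}
```

With the smaller buffer the reader ends up with only part of the message, and a DNS parser handed that partial message would be expected to fail, which is consistent with the "cannot unmarshal DNS" error in this issue. A larger buffer receives the datagram whole.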
…1232 bytes The existing value of 512 bytes as is specified by RFC 1035. However, the WSL resolver reportedly sends larger packets without setting the truncation bit, which breaks using the Go resolver. For 1.18 and backports, just increase the accepted packet size. This is what GNU glibc does (they use 65536 bytes). For 1.19 we plan to use EDNS to set the accepted packet size. That will give us more time to test whether that causes any problems. No test because I'm not sure how to write one and it wouldn't really be useful anyhow. For #6464 For #21160 For #44135 For #51127 For #51153 Fixes #51162 Change-Id: I0243f274a06e010ebb714e138a65386086aecf17 Reviewed-on: https://go-review.googlesource.com/c/go/+/386015 Trust: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> Reviewed-by: Damien Neil <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]> TryBot-Result: Gopher Robot <[email protected]> (cherry picked from commit 6e82ff8) Reviewed-on: https://go-review.googlesource.com/c/go/+/386035 Reviewed-by: Dmitri Shuralyov <[email protected]>
…1232 bytes The existing value of 512 bytes as is specified by RFC 1035. However, the WSL resolver reportedly sends larger packets without setting the truncation bit, which breaks using the Go resolver. For 1.18 and backports, just increase the accepted packet size. This is what GNU glibc does (they use 65536 bytes). For 1.19 we plan to use EDNS to set the accepted packet size. That will give us more time to test whether that causes any problems. No test because I'm not sure how to write one and it wouldn't really be useful anyhow. For #6464 For #21160 For #44135 For #51127 For #51153 Fixes #51161 Change-Id: I0243f274a06e010ebb714e138a65386086aecf17 Reviewed-on: https://go-review.googlesource.com/c/go/+/386015 Trust: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> Reviewed-by: Damien Neil <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]> TryBot-Result: Gopher Robot <[email protected]> (cherry picked from commit 6e82ff8) Reviewed-on: https://go-review.googlesource.com/c/go/+/386034 Reviewed-by: Dmitri Shuralyov <[email protected]>
Advertise to DNS resolvers that we are willing and able to accept up to 1232 bytes in a DNS packet. The value 1232 was chosen based on https://dnsflagday.net/2020/. For #6464 For #21160 For #44135 For #51127 Fixes #51153 Change-Id: If9182d5210bfe047cf0a4d46163effc6812ab677 Reviewed-on: https://go-review.googlesource.com/c/go/+/386016 Trust: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> Reviewed-by: Damien Neil <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
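For readers unfamiliar with how a client advertises a packet size, the sketch below appends an EDNS(0) OPT pseudo-record to an already encoded query, following the RFC 6891 layout in which the CLASS field carries the requester's UDP payload size. This is an illustration only and is not the code from the CL; `appendOPT` is a hypothetical name, and the sketch assumes the query has a well-formed 12-byte header.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	dnsHeaderLen = 12
	typeOPT      = 41 // OPT pseudo-RR type (RFC 6891)
	optRecordLen = 11 // root name (1) + type (2) + class (2) + TTL (4) + RDLENGTH (2)
)

// appendOPT returns a copy of query with an OPT record added to the
// additional section, advertising udpSize as the largest UDP response the
// sender accepts. The input slice is not modified.
func appendOPT(query []byte, udpSize uint16) ([]byte, error) {
	if len(query) < dnsHeaderLen {
		return nil, errors.New("query shorter than DNS header")
	}
	out := make([]byte, len(query), len(query)+optRecordLen)
	copy(out, query)

	// Bump ARCOUNT (header bytes 10-11) to account for the new record.
	arcount := binary.BigEndian.Uint16(out[10:12])
	binary.BigEndian.PutUint16(out[10:12], arcount+1)

	opt := make([]byte, optRecordLen)
	// opt[0] stays 0: the root domain name.
	binary.BigEndian.PutUint16(opt[1:3], typeOPT)
	binary.BigEndian.PutUint16(opt[3:5], udpSize) // CLASS = UDP payload size
	// TTL (extended RCODE, version, flags) and RDLENGTH stay zero.
	return append(out, opt...), nil
}

func main() {
	header := make([]byte, dnsHeaderLen) // empty query header, for demonstration
	msg, err := appendOPT(header, 1232)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("% x\n", msg[dnsHeaderLen:])
}
```

The value 1232 here is the one named in the commit message above; any server that honors EDNS(0) may then send UDP responses up to that size.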
Reference: golang/go#51127 Reference: #157 Reference: #188 Updates the testing and release processes to the latest 1.16.x version, which resolves a longstanding Go resolver issue with responses greater than 512 bytes. Verifies by enabling a previously skipped acceptance test. Also adds CHANGELOG entries for upstream module updates which are bundled with this provider release, which may fix specific EDNS handling issues.
So, you found this issue googling for "cannot unmarshal DNS"
There's good news: your issue has largely been fixed. The issue below was created initially because I discovered it in my network and operating system, but further discovery found that this issue has affected every major OS and users of VPNs, DNS providers written in Go, and more.
If you are a maintainer of code and someone has reported this issue: update your build system to use Go 1.16.15, Go 1.17.8, or Go 1.18. The error should then go away for your users.
If you are a user of a program and see this error, you need to ask the maintainer or creator of that package to do likewise. Unfortunately, there isn't a single set of instructions I can give for a workaround. If you're using a VPN, try using that program not on a VPN; that seems to be the most common user-reported scenario I've seen.
Original bug report:
What version of Go are you using (`go version`)?
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (`go env`)?
Note: WSL2 on Windows. This is relevant, but not the sole scenario in which it can occur, see below.
`go env` Output
What did you do?
Use infrastructure as code tools to manage Azure, and/or attempt to execute `net.LookupIP("management.azure.com")`.
Example program:
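The program from the original report is not preserved in this copy. As a rough stand-in, a minimal reproduction along the lines described might look like the following. The `lookup` helper, the 10-second timeout, and the use of `net.Resolver` with `PreferGo` (to make sure the pure Go resolver is exercised rather than the system one) are assumptions of this sketch, not details taken from the report.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

// lookup resolves host with Go's built-in DNS client. PreferGo asks the net
// package to use the pure Go resolver instead of the platform resolver.
func lookup(host string) ([]net.IP, error) {
	resolver := &net.Resolver{PreferGo: true}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	return resolver.LookupIP(ctx, "ip", host)
}

func main() {
	ips, err := lookup("management.azure.com")
	if err != nil {
		// On an affected Go version and network, the error text would be
		// expected to mention "cannot unmarshal DNS message".
		fmt.Fprintln(os.Stderr, "lookup failed:", err)
		os.Exit(1)
	}
	for _, ip := range ips {
		fmt.Println(ip)
	}
}
```

Whether this fails depends on the Go version used to build it and on the DNS server in the path, so results will differ between environments.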
What did you expect to see?
I expected to see the current IP, 13.86.219.80, as shown by the last line of:
What did you see instead?
Miscellany
It looks like this issue is widely affecting infrastructure as code tools such as Pulumi, Terraform, and others when they make API calls to Microsoft Azure on the Windows Subsystem for Linux 2, on Microsoft Windows.
This is a bit of a rock and a hard place situation. Microsoft is unlikely to update their DNS server to adhere to the pre-1999 DNS specification. The Go language team is in a position to be much more agile and issue a point release update to support a larger buffer size. Even just going up to a single standard MTU of ~1500 bytes would resolve this issue in the near term.
As this problem primarily affects programs written in Go, in this author's estimation it seems unlikely a change in Windows' DNS server behavior could occur as quickly, even if the stars were to align on the need to change the implementation. Note that `host`, `dig`, `nslookup`, etc. all behave correctly.
Collected notes and root cause analysis:
DNS Flag Day 2020 had an explicit goal of ensuring that resolvers had a minimum accepted buffer size of 1232 bytes: https://dnsflagday.net/2020/#action-dns-resolver-operators