cURL IPv4 issue since alpine 3.19
#366
Comments
The description isn't exactly accurate about the behavior. See c-ares/c-ares#551 for a better description, but basically a change was made in c-ares 1.20.0 to not go through the entire timeout sequence if we had at least a partial reply as it is very likely that it won't work. It still waits for the other address family to timeout or have some other issue on the current request. So if someone has Now, there apparently have been reported issues to glibc that does something similar to this as per https://man7.org/linux/man-pages/man5/resolv.conf.5.html:
So likely before c-ares 1.20.0, the retries allowed this to eventually succeed in such an environment. Currently c-ares doesn't honor the glibc single-request option. It would probably be good to know if this is what is really happening in your environment, a tcpdump/pcap would be useful. You should probably open a ticket in https://github.com/c-ares/c-ares/issues with your findings. |
I should also mention that we just added alpine linux automated (CI/CD) testing to c-ares to ensure there are no behavioral differences (e.g. due to musl c). All tests are passing, so I'm pretty sure whatever you are experiencing is outside of alpine's scope. |
Thank you very much for your answers. On my side, I'm not that advanced in networking, so I'm not 100% sure I can handle this. I'll give it a try by looking at Nevertheless, I'm curious about the result you got when trying to reproduce my commands. Did it actually work for you? I mean, when running from Docker containers, I expect almost nothing to be fetched from my local environment, as I thought containers are mainly isolated. I'm aware that core libs from the native OS are used, of course, but I wouldn't expect any difference between my OS, a canonical alpine 3.18 from this OS and a canonical alpine 3.19 from this exact same OS, as what's imported into the containers to make them work seems highly generic and kernel-related to me. Anyway, with the help of the IT colleagues I'll ask and your advice, I'll try to investigate as much as I can to identify why I'm experiencing this issue. |
I haven't tried your exact scenario (using curl), just building and running the c-ares test suite on alpine linux. |
I tried and indeed, I have the same issue:
$ docker run --rm -it --entrypoint=/bin/sh alpine:3.19 -c "apk add curl && curl --trace - --trace-time www.google.com"
fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/community/x86_64/APKINDEX.tar.gz
(1/8) Installing ca-certificates (20230506-r0)
(2/8) Installing brotli-libs (1.1.0-r1)
(3/8) Installing c-ares (1.22.1-r0)
(4/8) Installing libunistring (1.1-r2)
(5/8) Installing libidn2 (2.3.4-r4)
(6/8) Installing nghttp2-libs (1.58.0-r0)
(7/8) Installing libcurl (8.5.0-r0)
(8/8) Installing curl (8.5.0-r0)
Executing busybox-1.36.1-r15.trigger
Executing ca-certificates-20230506-r0.trigger
OK: 12 MiB in 23 packages
14:14:49.612106 == Info: Host www.google.com:80 was resolved.
14:14:49.612159 == Info: IPv6: 2a00:1450:4001:80b::2004
14:14:49.612165 == Info: IPv4: (none)
14:14:49.612212 == Info: Trying [2a00:1450:4001:80b::2004]:80...
14:14:49.612242 == Info: Immediate connect fail for 2a00:1450:4001:80b::2004: Address not available
14:14:49.612258 == Info: Failed to connect to www.google.com port 80 after 2002 ms: Couldn't connect to server
14:14:49.612268 == Info: Closing connection
curl: (7) Failed to connect to www.google.com port 80 after 2002 ms: Couldn't connect to server |
Well, I am running the current c-ares main, not v1.22, which is a couple of releases behind (the current release is v1.24). Perhaps there is some issue in v1.22? Anyhow, in our current c-ares CI system, this is the latest alpine build with tests: If you search for
In theory, that's exactly what curl should see as curl should internally be using the same function as ahost does (ares_getaddrinfo), and we can see both ipv4 and ipv6 addresses. That said, I don't know what alpine test environment you're using, as it could very well be environmental with what DNS servers you are using. Everything after that point is just running the whole test suite. |
On my side, I just gave it a try today with
Preparation
$ > docker run --rm -it --entrypoint=/bin/sh alpine:3.19
# In container:
/ > apk add curl tcpdump
# Downloading…
Logging shell
# Display verbosely with hexadecimal content representation, with IP and port, on interface "eth0" (the default one in a Docker container), where source or destination is my current IP:
/ > tcpdump -vvXnni eth0 src $(hostname -i) or dst $(hostname -i)
# tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
Calling cURL shell
/ > curl --trace - --trace-time www.google.com
Logging shell updating…
# 16:09:59.631641 IP (tos 0x0, ttl 64, id 18664, offset 0, flags [DF], proto UDP (17), length 71)
# 10.1.128.2.39924 > 172.18.21.134.53: [bad udp cksum 0x4be0 -> 0x4f44!] 4305+ [1au] A? www.google.com. ar: . OPT UDPsize=1280 (43)
# 0x0000: 4500 0047 48e8 4000 4011 a622 0a01 8002 E..GH.@.@.."....
# 0x0010: ac12 1586 9bf4 0035 0033 4be0 10d1 0100 .......5.3K.....
# 0x0020: 0001 0000 0000 0001 0377 7777 0667 6f6f .........www.goo
# 0x0030: 676c 6503 636f 6d00 0001 0001 0000 2905 gle.com.......).
# 0x0040: 0000 0000 0000 00 .......
# 16:09:59.631696 IP (tos 0xc0, ttl 64, id 990, offset 0, flags [none], proto ICMP (1), length 99)
# 172.18.21.134 > 10.1.128.2: ICMP 172.18.21.134 udp port 53 unreachable, length 79
# IP (tos 0x0, ttl 64, id 18664, offset 0, flags [DF], proto UDP (17), length 71)
# 10.1.128.2.39924 > 172.18.21.134.53: [bad udp cksum 0x4be0 -> 0x4f44!] 4305+ [1au] A? www.google.com. ar: . OPT UDPsize=1280 (43)
# 0x0000: 45c0 0063 03de 0000 4001 2a61 ac12 1586 E..c....@.*a....
# 0x0010: 0a01 8002 0303 4c41 0000 0000 4500 0047 ......LA....E..G
# 0x0020: 48e8 4000 4011 a622 0a01 8002 ac12 1586 H.@.@.."........
# 0x0030: 9bf4 0035 0033 4be0 10d1 0100 0001 0000 ...5.3K.........
# 0x0040: 0000 0001 0377 7777 0667 6f6f 676c 6503 .....www.google.
# 0x0050: 636f 6d00 0001 0001 0000 2905 0000 0000 com.......).....
# 0x0060: 0000 00 ...
# 16:09:59.631737 IP (tos 0x0, ttl 64, id 48319, offset 0, flags [DF], proto UDP (17), length 71)
# 10.1.128.2.50287 > 172.18.86.200.53: [bad udp cksum 0x8d22 -> 0xfbd0!] 64107+ [1au] AAAA? www.google.com. ar: . OPT UDPsize=1280 (43)
# 0x0000: 4500 0047 bcbf 4000 4011 f108 0a01 8002 E..G..@.@.......
# 0x0010: ac12 56c8 c46f 0035 0033 8d22 fa6b 0100 ..V..o.5.3.".k..
# 0x0020: 0001 0000 0000 0001 0377 7777 0667 6f6f .........www.goo
# 0x0030: 676c 6503 636f 6d00 001c 0001 0000 2905 gle.com.......).
# 0x0040: 0000 0000 0000 00 .......
# 16:09:59.633326 IP (tos 0x0, ttl 124, id 58881, offset 0, flags [none], proto UDP (17), length 99)
# 172.18.86.200.53 > 10.1.128.2.50287: [udp sum ok] 64107 q: AAAA? www.google.com. 1/0/1 www.google.com. AAAA 2a00:1450:4001:806::2004 ar: . OPT UDPsize=4000 (71)
# 0x0000: 4500 0063 e601 0000 7c11 cbaa ac12 56c8 E..c....|.....V.
# 0x0010: 0a01 8002 0035 c46f 004f 7427 fa6b 8180 .....5.o.Ot'.k..
# 0x0020: 0001 0001 0000 0001 0377 7777 0667 6f6f .........www.goo
# 0x0030: 676c 6503 636f 6d00 001c 0001 c00c 001c gle.com.........
# 0x0040: 0001 0000 0050 0010 2a00 1450 4001 0806 .....P..*..P@...
# 0x0050: 0000 0000 0000 2004 0000 290f a000 0000 ..........).....
# 0x0060: 0000 00 ...
Calling cURL shell updating…
# 16:10:01.692831 == Info: Host www.google.com:80 was resolved.
# 16:10:01.692920 == Info: IPv6: 2a00:1450:4001:806::2004
# 16:10:01.692949 == Info: IPv4: (none)
# 16:10:01.693032 == Info: Trying [2a00:1450:4001:806::2004]:80...
# 16:10:01.693118 == Info: Immediate connect fail for 2a00:1450:4001:806::2004: Address not available
# 16:10:01.693162 == Info: Failed to connect to www.google.com port 80 after 2003 ms: Couldn't connect to server
# 16:10:01.693196 == Info: Closing connection
# curl: (7) Failed to connect to www.google.com port 80 after 2003 ms: Couldn't connect to server
Logging shell updating…
# 16:10:04.765537 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.128.1 tell 10.1.128.2, length 28
# 0x0000: 0001 0800 0604 0001 0242 0a01 8002 0a01 .........B......
# 0x0010: 8002 0000 0000 0000 0a01 8001 ............
# 16:10:04.765607 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.128.2 tell 10.1.128.1, length 28
# 0x0000: 0001 0800 0604 0001 0242 0c35 ad1f 0a01 .........B.5....
# 0x0010: 8001 0000 0000 0000 0a01 8002 ............
# 16:10:04.765616 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.1.128.2 is-at 02:42:0a:01:80:02, length 28
# 0x0000: 0001 0800 0604 0002 0242 0a01 8002 0a01 .........B......
# 0x0010: 8002 0242 0c35 ad1f 0a01 8001 ...B.5......
# 16:10:04.765680 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.1.128.1 is-at 02:42:0c:35:ad:1f, length 28
# 0x0000: 0001 0800 0604 0002 0242 0c35 ad1f 0a01 .........B.5....
# 0x0010: 8001 0242 0a01 8002 0a01 8002 ...B........
The only thing I can suspect is I'll check with my IT dept. people working on the DNS configuration, maybe 🤷 |
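For readers who want to check what the first captured packet actually asked for, here is a minimal Python sketch that decodes the 43-byte DNS payload from the hex dump above (the bytes after the 20-byte IPv4 header and the 8-byte UDP header). The helper names `read_name` and `decode_query` are made up for this illustration, and the parser assumes an uncompressed query with at most one OPT record, which is what the dump shows.

```python
import struct

# DNS payload copied from the first tcpdump packet in this comment
# (offsets 0x001c to 0x0046 of the hex dump).
PAYLOAD = bytes.fromhex(
    "10d1 0100 0001 0000 0000 0001"             # header
    "0377 7777 0667 6f6f 676c 6503 636f 6d00"   # QNAME
    "0001 0001"                                 # QTYPE, QCLASS
    "0000 2905 0000 0000 0000 00"               # OPT pseudo-record
)

QTYPES = {1: "A", 28: "AAAA"}


def read_name(buf, off):
    """Read an uncompressed DNS name; return (name, offset after it)."""
    labels = []
    while True:
        length = buf[off]
        off += 1
        if length == 0:
            break
        labels.append(buf[off:off + length].decode("ascii"))
        off += length
    return ".".join(labels) + ".", off


def decode_query(buf):
    """Decode the header, the question and an optional EDNS OPT record."""
    ident, flags, _qd, _an, _ns, arcount = struct.unpack_from("!6H", buf, 0)
    name, off = read_name(buf, 12)
    qtype, _qclass = struct.unpack_from("!2H", buf, off)
    off += 4
    info = {
        "id": ident,
        "recursion_desired": bool(flags & 0x0100),
        "name": name,
        "qtype": QTYPES.get(qtype, str(qtype)),
    }
    if arcount:
        # In an OPT record the CLASS field carries the UDP payload size.
        _root, off = read_name(buf, off)
        rtype, udp_size, _ttl, rdlen = struct.unpack_from("!HHIH", buf, off)
        info["has_opt"] = rtype == 41
        info["udp_size"] = udp_size
        info["edns_option_bytes"] = rdlen
    return info


print(decode_query(PAYLOAD))
```

The decoded fields can be compared with the tcpdump summary line for the same packet (query id 4305, an A question for www.google.com., OPT UDPsize=1280).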
172.18.21.134 received an ICMP unreachable reply, and 172.18.86.200 works. That said, it's not immediately clear to me why the A request went to 172.18.21.134 and the AAAA request went to 172.18.86.200. Can you share your /etc/resolv.conf ? I wonder if you have rotate enabled for the dns servers. |
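A quick way to answer the question about configured servers and the `rotate` option is to list what /etc/resolv.conf declares. The sketch below is only an illustration: `parse_resolv_conf` is a hypothetical helper, and `SAMPLE` is an invented file (documentation-range addresses), not the reporter's configuration.

```python
# Invented example content; the real file from this thread is not shown here.
SAMPLE = """\
# hypothetical example, not the reporter's file
nameserver 127.0.0.1
nameserver 192.0.2.53
options rotate timeout:2
"""


def parse_resolv_conf(text):
    """Return (nameservers, options) found in resolv.conf-style text."""
    nameservers, options = [], []
    for raw in text.splitlines():
        # Drop comments introduced by '#' or ';' and surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if parts[0] == "nameserver" and len(parts) > 1:
            nameservers.append(parts[1])
        elif parts[0] == "options":
            options.extend(parts[1:])
    return nameservers, options


servers, options = parse_resolv_conf(SAMPLE)
print("nameservers:", servers)
print("rotate enabled:", "rotate" in options)
```

To inspect a real system, the same function could be fed the contents of /etc/resolv.conf read from inside the container.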
Is that really the entirety of the tcp dump? Typically an event should be received on an ICMP unreachable which then recv() would be called and then detect the udp destination isn't valid, so we should have seen another "A" record request go out, especially considering the timings shown here. |
The
That was the full tcpdump output I got when filtering on my IP address. Here is the result without the filter, trying to keep it from being too polluted:
/ > tcpdump -vvXnni any
# tcpdump: data link type LINUX_SLL2
# tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
# 17:18:01.248975 eth0 Out IP (tos 0x0, ttl 64, id 10392, offset 0, flags [DF], proto UDP (17), length 71)
# 10.1.128.2.34956 > 172.18.21.134.53: [bad udp cksum 0x4be0 -> 0x4ecf!] 9390+ [1au] A? www.google.com. ar: . OPT UDPsize=1280 (43)
# 0x0000: 4500 0047 2898 4000 4011 c672 0a01 8002 E..G(.@.@..r....
# 0x0010: ac12 1586 888c 0035 0033 4be0 24ae 0100 .......5.3K.$...
# 0x0020: 0001 0000 0000 0001 0377 7777 0667 6f6f .........www.goo
# 0x0030: 676c 6503 636f 6d00 0001 0001 0000 2905 gle.com.......).
# 0x0040: 0000 0000 0000 00 .......
# 17:18:01.249021 eth0 In IP (tos 0xc0, ttl 64, id 25710, offset 0, flags [none], proto ICMP (1), length 99)
# 172.18.21.134 > 10.1.128.2: ICMP 172.18.21.134 udp port 53 unreachable, length 79
# IP (tos 0x0, ttl 64, id 10392, offset 0, flags [DF], proto UDP (17), length 71)
# 10.1.128.2.34956 > 172.18.21.134.53: [bad udp cksum 0x4be0 -> 0x4ecf!] 9390+ [1au] A? www.google.com. ar: . OPT UDPsize=1280 (43)
# 0x0000: 45c0 0063 646e 0000 4001 c9d0 ac12 1586 E..cdn..@.......
# 0x0010: 0a01 8002 0303 4bcc 0000 0000 4500 0047 ......K.....E..G
# 0x0020: 2898 4000 4011 c672 0a01 8002 ac12 1586 (.@.@..r........
# 0x0030: 888c 0035 0033 4be0 24ae 0100 0001 0000 ...5.3K.$.......
# 0x0040: 0000 0001 0377 7777 0667 6f6f 676c 6503 .....www.google.
# 0x0050: 636f 6d00 0001 0001 0000 2905 0000 0000 com.......).....
# 0x0060: 0000 00 ...
# 17:18:01.249058 eth0 Out IP (tos 0x0, ttl 64, id 4793, offset 0, flags [DF], proto UDP (17), length 71)
# 10.1.128.2.57608 > 172.18.86.200.53: [bad udp cksum 0x8d22 -> 0x7146!] 26717+ [1au] AAAA? www.google.com. ar: . OPT UDPsize=1280 (43)
# 0x0000: 4500 0047 12b9 4000 4011 9b0f 0a01 8002 E..G..@.@.......
# 0x0010: ac12 56c8 e108 0035 0033 8d22 685d 0100 ..V....5.3."h]..
# 0x0020: 0001 0000 0000 0001 0377 7777 0667 6f6f .........www.goo
# 0x0030: 676c 6503 636f 6d00 001c 0001 0000 2905 gle.com.......).
# 0x0040: 0000 0000 0000 00 .......
# 17:18:01.251345 eth0 In IP (tos 0x0, ttl 124, id 887, offset 0, flags [none], proto UDP (17), length 99)
# 172.18.86.200.53 > 10.1.128.2.57608: [udp sum ok] 26717 q: AAAA? www.google.com. 1/0/1 www.google.com. AAAA 2a00:1450:4001:80b::2004 ar: . OPT UDPsize=4000 (71)
# 0x0000: 4500 0063 0377 0000 7c11 ae35 ac12 56c8 E..c.w..|..5..V.
# 0x0010: 0a01 8002 0035 e108 004f e9a0 685d 8180 .....5...O..h]..
# 0x0020: 0001 0001 0000 0001 0377 7777 0667 6f6f .........www.goo
# 0x0030: 676c 6503 636f 6d00 001c 0001 c00c 001c gle.com.........
# 0x0040: 0001 0000 0047 0010 2a00 1450 4001 080b .....G..*..P@...
# 0x0050: 0000 0000 0000 2004 0000 290f a000 0000 ..........).....
# 0x0060: 0000 00 ...
# 17:18:06.429450 eth0 Out ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.128.1 tell 10.1.128.2, length 28
# 0x0000: 0001 0800 0604 0001 0242 0a01 8002 0a01 .........B......
# 0x0010: 8002 0000 0000 0000 0a01 8001 ............
# 17:18:06.429507 eth0 In ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.128.2 tell 10.1.128.1, length 28
# 0x0000: 0001 0800 0604 0001 0242 0c35 ad1f 0a01 .........B.5....
# 0x0010: 8001 0000 0000 0000 0a01 8002 ............
# 17:18:06.429513 eth0 Out ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.1.128.2 is-at 02:42:0a:01:80:02, length 28
# 0x0000: 0001 0800 0604 0002 0242 0a01 8002 0a01 .........B......
# 0x0010: 8002 0242 0c35 ad1f 0a01 8001 ...B.5......
# 17:18:06.429547 eth0 In ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.1.128.1 is-at 02:42:0c:35:ad:1f, length 28
# 0x0000: 0001 0800 0604 0002 0242 0c35 ad1f 0a01 .........B.5....
# 0x0010: 8001 0242 0a01 8002 0a01 8002 ...B........
In parallel, I also tried to call Plus, even if that's my IPv6 configuration on my machine which is wrongly set, it doesn't explain why the I'm a bit lost tbh, so thank you very much for your help about that! |
Ok, well that's even more interesting. That means the 10.1.128.2.34956 > 172.18.21.134.53 wasn't generated by c-ares at all, but from your local resolver at 127.0.0.1, as 172.18.21.134 isn't listed in your /etc/resolv.conf at all so there's no way c-ares would try to use that. Can you tcpdump all interfaces on port 53 udp on the machine and try again? I'd expect "lo" listed as an interface with port 53 traffic. |
When I run I'm cleaning up all of them and I'll try again. After cleaning up, you're right, I can see lots of traces on "lo" interface with port 53 traffic.
Could those calls interfere with the curl requests I'm making from my containers? |
I continued to investigate and I found something quite interesting IMO. I think the issue comes from the fact that I can't use IPv6, neither on my machine nor in any container it hosts. But I also think that something could be improved in c-ares to cope with such a situation. I managed to simplify my tests to highlight only the important things, so here are my runs:
# Inside a container from image alpine:3.19, on which I added `curl` via `apk add curl`.
/ > curl -Ivvv www.google.com
# * Host www.google.com:80 was resolved.
# * IPv6: 2a00:1450:4025:401::69, 2a00:1450:4025:401::6a, 2a00:1450:4025:401::67, 2a00:1450:4025:401::93
# * IPv4: (none)
# * Trying [2a00:1450:4025:401::69]:80...
# * Immediate connect fail for 2a00:1450:4025:401::69: Address not available
# * Trying [2a00:1450:4025:401::6a]:80...
# * Immediate connect fail for 2a00:1450:4025:401::6a: Address not available
# * Trying [2a00:1450:4025:401::67]:80...
# * Immediate connect fail for 2a00:1450:4025:401::67: Address not available
# * Trying [2a00:1450:4025:401::93]:80...
# * Immediate connect fail for 2a00:1450:4025:401::93: Address not available
# * Failed to connect to www.google.com port 80 after 2003 ms: Couldn't connect to server
# * Closing connection
# curl: (7) Failed to connect to www.google.com port 80 after 2003 ms: Couldn't connect to server
/ > curl -Ivvv4 www.google.com
# * Host www.google.com:80 was resolved.
# * IPv6: (none)
# * IPv4: 142.250.27.105, 142.250.27.106, 142.250.27.99, 142.250.27.103, 142.250.27.147, 142.250.27.104
# * Trying 142.250.27.105:80...
# * Connected to www.google.com (142.250.27.105) port 80
# > HEAD / HTTP/1.1
# > Host: www.google.com
# > User-Agent: curl/8.5.0
# Blablabla… the response is OK
To me, that means that when I first ran To me, that means 2 issues:
I'll investigate more to make IPv6 work on my environment, and this should solve my issue, but I highly suspect other people to have wrongly set configurations too encountering the problem that PS: Looking at |
It's impossible to tell what is going on with your system with the information provided. The real issue is you have a local dns resolver running at 127.0.0.1 and other servers configured. We can't tell from what you've provided what c-ares is doing vs your local resolver. I don't believe your conclusion is accurate based on the information at hand. Really you either need to remove your local resolver from /etc/resolv.conf and test ... or remove all other dns servers and leave only the local resolver. |
This should fix it: If you want to test locally, you can install c-ares from the |
Hi! Thanks for the information!
At least in my environment, it seems to still fail even with the new version of c-ares. If you see any issue with my test or would like me to test anything else, please let me know. 🙏 Thanks again. |
FWIW, I have the exact same report as @Tithugues above: I still can't connect, with the same issue. That means my assumption that the issue was caused by c-ares v1.22 is wrong. To me, the issue is still related to the fact that I can't connect to anything over IPv6, and the "software" responsible for DNS resolution is failing to do its job properly, i.e. falling back to IPv4. I thought it was c-ares, as it was a new dependency of curl in alpine 3.19 compared to 3.18, but maybe I was wrong, or maybe it actually is c-ares but version 1.24 doesn't fix my problem. When running this on alpine 3.19
/ > curl -vvv www.google.com --trace-time
# 09:55:27.932169 * Host www.google.com:80 was resolved.
# 09:55:27.932296 * IPv6: 2a00:1450:4001:80b::2004
# 09:55:27.932366 * IPv4: (none)
# 09:55:27.932437 * Trying [2a00:1450:4001:80b::2004]:80...
# 09:55:27.932526 * Immediate connect fail for 2a00:1450:4001:80b::2004: Network unreachable
# 09:55:27.932587 * Failed to connect to www.google.com port 80 after 2001 ms: Couldn't connect to server
# 09:55:27.932639 * Closing connection
# curl: (7) Failed to connect to www.google.com port 80 after 2001 ms: Couldn't connect to server
I can clearly see that DNS resolves the IPv6 address faster, but as I can't use IPv6, I just can't connect. IPv6 connectivity should be checked before IPv6 resolution starts, as there's no point in resolving an address that can't be used. If I run the exact same command under alpine 3.18, we can see the DNS resolution is done for both IPv6 and IPv4:
/ > curl -vvv www.google.com --trace-time
# 10:00:08.889585 * Host www.google.com:80 was resolved.
# 10:00:08.889688 * IPv6: 2a00:1450:4001:80b::2004
# 10:00:08.889758 * IPv4: 142.250.186.164
# 10:00:08.889855 * Trying 142.250.186.164:80...
# 10:00:08.892884 * Connected to www.google.com (142.250.186.164) port 80
# 10:00:08.893074 > GET / HTTP/1.1
# 10:00:08.893074 > Host: www.google.com
# 10:00:08.893074 > User-Agent: curl/8.5.0
# 10:00:08.893074 > Accept: */*
# 10:00:08.893074 >
# 10:00:08.938750 < HTTP/1.1 200 OK
# 10:00:08.938830 < Date: Thu, 04 Jan 2024 10:00:08 GMT
# 10:00:08.938886 < Expires: -1
# 10:00:08.938950 < Cache-Control: private, max-age=0
# 10:00:08.939035 < Content-Type: text/html; charset=ISO-8859-1
# 10:00:08.939107 < Content-Security-Policy-Report-Only: object-src 'none';base-uri 'self';script-src 'nonce-btQ4x8jy7FUdjfmcMd8zxQ' 'strict-dynamic' 'report-sample' 'unsafe-eval' 'unsafe-inline' https: http:;report-uri https://csp.withgoogle.com/csp/gws/other-hp
# 10:00:08.939171 < Server: gws
# 10:00:08.939236 < X-XSS-Protection: 0
# 10:00:08.939308 < X-Frame-Options: SAMEORIGIN
# 10:00:08.939367 < Set-Cookie: AEC=Ackid1T8i9FSUMjTgdj_cyfnoIvnHWy4Kp6QBB4EJ6ShA1xNuiHoehcWOw; expires=Tue, 02-Jul-2024 10:00:08 GMT; path=/; domain=.google.com; Secure; HttpOnly; SameSite=lax
# 10:00:08.939431 < Accept-Ranges: none
# 10:00:08.939520 < Vary: Accept-Encoding
# 10:00:08.939577 < Transfer-Encoding: chunked
# 10:00:08.939639 <
# <!doctype html>…</html> # Google's homepage.
So that works, thanks to the IPv4 connection. Whatever "software" is responsible for stopping DNS resolution as soon as either IPv6 or IPv4 is resolved must be improved to either:
With my hands tied on this currently, I don't know how to go further here, unfortunately 😢 |
As stated before, you have both a local dns server running at 127.0.0.1 and configurations of other servers which greatly complicates the ability to debug what is going on. I'd need access to a system that's not working in order to have any chance of determining what is really going on. Likely c-ares/c-ares#551 plays a role in the issue, but it doesn't seem wise to revert that as it will greatly extend DNS resolution times. |
When in my container, if I open the So, indeed, something is related to the configuration of my local DNS on my local machine. Thank you very much for pointing this out 🤟 This leads to 2 questions to me then:
Question 2 is probably for IT dept. of my company 😆 . |
What is the behavior the opposite direction, if you leave only that local DNS server in place? Does it still get an ipv6 address (when running curl with -Ivvv)? |
Nope, it makes "www.google.com" unresolvable:
/ > curl -I -vvv www.google.com
# * Could not resolve host: www.google.com
# * Closing connection
# curl: (6) Could not resolve host: www.google.com
So maybe there's a real big issue with my local DNS that used to be masked by the other nameservers I have. However, I think I still need this nameserver on my local machine in order for my containers to communicate with each other. |
So is your local nameserver meant to only resolve some subset of domains, specific to your internal network? If so, I believe it's supposed to have a |
By the way, my theory is your local DNS server is configured to be recursive, but since IPv6 is not working on the host the ipv6 fails fast, so c-ares sends the ipv6 query to the next configured server. But the ipv4 query tries to recurse within your local DNS server, and eventually fails and returns that failure to c-ares ... however, by the time it fails, c-ares already received a legitimate reply for ipv6 from the next server so any retries for ipv4 are halted and you get only an ipv6 address back. If that is really what is happening, this falls within an "undefined behavior" grey zone. Since your local DNS server can't recurse, recursion should be disabled in its configuration, which in theory should fix the issue. |
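The theory above can be played through with a toy model. This is not c-ares code, only a sketch of the behaviour as described in this thread: each address family tries the configured servers in order, and with the post-1.20.0 behaviour a family stops retrying once the other family already has an answer. The function name `resolve`, the `SCENARIO` layout and all timings are invented for illustration; the two addresses are the ones seen in the curl output earlier in the thread.

```python
# Per family: one (duration_ms, answer_or_None) entry per configured server,
# in the order the servers would be tried. Timings are made up.
SCENARIO = {
    # local resolver fails fast for AAAA, the next server answers
    "AAAA": [(1, None), (2, "2a00:1450:4001:80b::2004")],
    # local resolver fails slowly for A, the next server would answer
    "A": [(2000, None), (2, "142.250.186.164")],
}


def resolve(attempts, stop_after_partial):
    """Return {family: answer} under the given retry policy."""
    results = {}
    # For each family: (index of server being tried, time that attempt ends).
    pending = {fam: (0, tries[0][0]) for fam, tries in attempts.items()}
    while pending:
        # Handle whichever outstanding attempt completes first.
        fam = min(pending, key=lambda f: pending[f][1])
        idx, now = pending.pop(fam)
        answer = attempts[fam][idx][1]
        if answer is not None:
            results[fam] = answer
            continue
        has_next = idx + 1 < len(attempts[fam])
        give_up = stop_after_partial and bool(results)
        if has_next and not give_up:
            # Retry against the next server in the list.
            pending[fam] = (idx + 1, now + attempts[fam][idx + 1][0])
    return results


print("retry until exhausted:", resolve(SCENARIO, stop_after_partial=False))
print("stop after partial   :", resolve(SCENARIO, stop_after_partial=True))
```

In this model the second policy returns only the AAAA record, because the slow A failure arrives after the AAAA answer and is not retried, which would match the "IPv4: (none)" lines in the curl traces if the theory holds.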
Thanks for sharing this, I wasn't aware about that. I'll do that soon.
I really do think so, or kind of. My local DNS is Or maybe I'm misunderstanding something? |
Are the domains you're trying to resolve really ending in ".local"? If so, ".local" is reserved for multicast DNS (mDNS). That would also mean you're not maintaining any form of internal dns records within your local resolver. Perhaps this is a workaround for the fact that the alpine linux musl libc resolver doesn't implement multicast dns, but dnsmasq does, which would explain why you might have your configuration this way. In fact, c-ares doesn't yet support multicast dns either, but it is something we are aware of and is on my task list ( c-ares/c-ares#171 ). The obvious solution here would be to make it so your dnsmasq can fully perform recursive DNS operations properly, and make it the only dns server in your /etc/resolv.conf. That is honestly the only configuration that would make your setup not rely on some undefined behavior (that just so happens to work some or most of the time). |
FYI, I came across this via a different route. I was using
I can verify that using |
Yes, my servers are reachable via
Unfortunately, due to constraints given by my company, I can't remove the other dns servers, otherwise I won't have access to internal servers my company's hosting. |
I'll give it a try with EDIT : aaaaaaand, that's a failure 😆
> docker run --rm -it --entrypoint=/bin/sh alpine:edge
# Unable to find image 'alpine:edge' locally
# edge: Pulling from library/alpine
# dcccee43ad5d: Pull complete
# Digest: sha256:9f867dc20de5aa9690c5ef6c2c81ce35a918c0007f6eac27df90d3166eaa5cc0
# Status: Downloaded newer image for alpine:edge
/ > apk add curl
# fetch https://dl-cdn.alpinelinux.org/alpine/edge/main/x86_64/APKINDEX.tar.gz
# fetch https://dl-cdn.alpinelinux.org/alpine/edge/community/x86_64/APKINDEX.tar.gz
# (1/8) Installing ca-certificates (20230506-r0)
# (2/8) Installing brotli-libs (1.1.0-r1)
# (3/8) Installing c-ares (1.24.0-r0)
# (4/8) Installing libunistring (1.1-r2)
# (5/8) Installing libidn2 (2.3.4-r4)
# (6/8) Installing nghttp2-libs (1.58.0-r0)
# (7/8) Installing libcurl (8.5.0-r0)
# (8/8) Installing curl (8.5.0-r0)
# Executing busybox-1.36.1-r17.trigger
# Executing ca-certificates-20230506-r0.trigger
# OK: 12 MiB in 23 packages
/ > curl -I -vvv www.google.com
# * Host www.google.com:80 was resolved.
# * IPv6: 2a00:1450:4025:401::69, 2a00:1450:4025:401::93, 2a00:1450:4025:401::67, 2a00:1450:4025:401::6a
# * IPv4: (none)
# * Trying [2a00:1450:4025:401::69]:80...
# * Immediate connect fail for 2a00:1450:4025:401::69: Address not available
# * Trying [2a00:1450:4025:401::93]:80...
# * Immediate connect fail for 2a00:1450:4025:401::93: Address not available
# * Trying [2a00:1450:4025:401::67]:80...
# * Immediate connect fail for 2a00:1450:4025:401::67: Address not available
# * Trying [2a00:1450:4025:401::6a]:80...
# * Immediate connect fail for 2a00:1450:4025:401::6a: Address not available
# * Failed to connect to www.google.com port 80 after 2002 ms: Couldn't connect to server
# * Closing connection
# curl: (7) Failed to connect to www.google.com port 80 after 2002 ms: Couldn't connect to server |
The content of EDIT: (by the way, trying to add a If I put my local DNS ( |
FYI this issue seems to be back again with alpine 3.20. |
please be more specific on the exact issue you are having. If you install c-ares 1.34.2, which is the latest version in alpine edge, do you still have your issue? |
I do not have the issue with c-ares 1.27.0-r0 (alpine 3.19); I have the issue with c-ares-1.33.1 (alpine 3.20) and 1.34.2-r0 (alpine edge)
|
can you install c-ares-utils and try the |
Unfortunately I'm not familiar with these tools, I must be doing something wrong:
|
Well, that's going to be a packaging issue on the alpine side; it sounds to me like they're redistributing the libtool wrapper rather than the actual utility. I'll see if I can investigate that and send those guys a PR. In the meantime, if you can use curl to get a tcpdump of the communication with your DNS server during the failure, that would be useful. |
merge request to fix packaging issue here: https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/73955 |
any luck getting a tcpdump of the dns query? Also, can you provide any more information such as if looking up other domains works? If not, what kind of DNS server is running at 10.215.39.1 and 10.215.39.2? If I can replicate the issue it should be easy to solve. |
@remiville any luck getting a tcpdump? I've ping'ed the alpine guys on my merge request to fix the tools, no reply yet. |
ping |
@remiville the alpine package for c-ares-utils has been updated to 1.34.2-r1. The utilities ahost and adig should now work if you can see if they produce any more meaningful info. |
Sorry, busy busy
|
@remiville thanks for the reply. The DNS server is rejecting the query with a FORMERR response which is quite unusual. Can you try I'd also be interested to see if you could install the Finally, if you can provide information about the DNS servers in use so that I can try to reproduce myself, that would be very helpful. |
|
@remiville awesome, thanks for that. That confirms my suspicion that whatever DNS server you're using is non-compliant and BIND is affected by the same issue when trying to communicate. The server is not ignoring unrecognized edns options like it's supposed to. What is the vendor of the DNS server you use? I think it should be possible to detect this particular situation and mark the server as incapable of supporting DNS cookies (or possibly even EDNS completely), and requeue any queries. The drawback of this of course is any applications using c-ares that are not long-lived (say curl on the command line) won't be able to "remember" this, and thus will always have to detect this, leading to additional queries and latency. I'd like to report this issue to your upstream DNS server vendor, so do please let me know who that is so they can get their product fixed. |
I'm not part of the IT team, but someone there said it's Windows Server AD; I don't have much more information than that. |
Hrm, I'd sure hope microsoft's DNS isn't that braindead. Unfortunately that's a system I'm least familiar with to try to test, and even harder to submit bug reports. I'll let you know when we have the detection in place and in a release. Should be less than a week. |
BTW, for clarification for someone reading this in the future, we sort of took over this ticket. This issue has nothing to do with the original issue reported. |
In theory the original issue reported by @niconoe- should be worked-around in c-ares/c-ares@765d558 |
Some DNS servers don't properly ignore unknown EDNS options as the spec says they must, and instead will return EFORMERR. See discussion roughly starting here: alpinelinux/docker-alpine#366 (comment) In this case the DNS server is known to support EDNS in general (as versions prior to c-ares 1.33, which used EDNS, worked), but when adding the EDNS DNS Cookie extension, they return EFORMERR. This is in violation of [RFC6891 6.1.2](https://datatracker.ietf.org/doc/html/rfc6891#section-6.1.2): > Any OPTION-CODE values not understood by a responder or requestor MUST be ignored. The server in this example actually echoes back the EDNS record, further causing confusion that makes you think they might understand the record. We need to catch an EFORMERR and re-attempt the query without EDNS completely since they are really non-compliant with EDNS. We may support additional EDNS extensions in the future and don't want to have to probe each individual extension with a braindead server. Fixes #911 Authored-By: Brad House (@bradh352)
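As a rough illustration of the workaround described in that commit message (and not the actual c-ares implementation), the Python sketch below builds a query carrying an EDNS OPT record with a DNS Cookie option, and retries without EDNS when the reply has RCODE FORMERR. The names `build_query`, `rcode`, `query_with_fallback` and `picky_server` are hypothetical, `send` is a caller-supplied transport, and `picky_server` is a stand-in that mimics a server rejecting any query with an additional record; the UDP size 1280 mirrors the captures earlier in this thread.

```python
import os
import struct

FORMERR = 1   # RCODE 1, "format error"
OPT = 41      # EDNS OPT pseudo-record type (RFC 6891)
COOKIE = 10   # EDNS option code for DNS Cookies (RFC 7873)


def build_query(name, qtype, ident, edns=True):
    """Build a DNS query, optionally with an OPT record holding a cookie."""
    header = struct.pack("!6H", ident, 0x0100, 1, 0, 0, 1 if edns else 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    msg = header + qname + struct.pack("!2H", qtype, 1)
    if edns:
        # Option payload: an 8-byte client cookie only.
        option = struct.pack("!2H", COOKIE, 8) + os.urandom(8)
        msg += b"\x00" + struct.pack("!HHIH", OPT, 1280, 0, len(option)) + option
    return msg


def rcode(response):
    """Extract the 4-bit RCODE from the DNS header flags."""
    return struct.unpack_from("!H", response, 2)[0] & 0x000F


def query_with_fallback(send, name, qtype, ident=0x1234):
    """Try with EDNS first; on FORMERR, re-send the query without EDNS."""
    response = send(build_query(name, qtype, ident, edns=True))
    if rcode(response) == FORMERR:
        response = send(build_query(name, qtype, ident, edns=False))
    return response


def picky_server(query):
    """Stand-in for a non-compliant server: FORMERR if ARCOUNT is non-zero."""
    arcount = struct.unpack_from("!H", query, 10)[0]
    flags = 0x8180 | (FORMERR if arcount else 0)
    return query[:2] + struct.pack("!H", flags) + query[4:]


reply = query_with_fallback(picky_server, "www.google.com", 1)
print("final rcode:", rcode(reply))
```

As noted earlier in the thread, a short-lived process such as command-line curl cannot remember the outcome, so a fallback of this kind costs one extra round trip on every run.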
@remiville please try c-ares 1.34.3 which is now available in edge, hopefully it fixes your issue |
Thanks, I should find some time tomorrow to do a test. |
@bradh352 it is working !, thanks a lot !
|
@remiville great. Please do try to ask your IT dept which DNS server is responding there. I'd really like to inform the vendor. |
Hi, and thank you for your awesome work!
I'm experiencing an issue with alpine 3.19 when using curl: it seems that curl only tries IPv6 rather than switching to the IP version that can actually connect. This doesn't seem to come from cURL itself, as on alpine 3.18 it works like a charm.
How to reproduce
Alpine 3.19 (curl classic)
Alpine 3.19 (curl with --ipv4 option)
Alpine 3.18 (curl classic)
As you can see, the curl version is exactly the same between all tests (8.5.0-r0) but still, there's a difference between Alpine 3.18 and Alpine 3.19.
I expect the curl command from Alpine 3.19 to work without needing to force IPv4.
If you need more info, feel free to ask. Thanks a lot
EDIT: after a very short investigation, I can see that Alpine 3.19 now adds c-ares=1.22.1-r0 as a dependency of cURL, and I just discovered that c-ares/c-ares#652 could be related: AFAIK, when cURL resolves a host name, it queries both IPv4 and IPv6 by default and takes the faster match. c-ares is there to help cURL do that in parallel, so that the DNS resolution for IPv4 and IPv6 is parallelized, resulting in faster cURL calls. But with the issue I just linked above, it looks like when multiple DNS lookups are issued and one fails, the failure status is returned as final. Therefore, as the IPv6 lookup fails faster than the IPv4 one is resolved, c-ares wrongly tells cURL that the host is unreachable.
I'm not 100% sure about this, but I think it deserves a look. I wasn't able to remove c-ares and give it a try without it.