fqdn: Update high-level package docs #11034
Conversation
Please set the appropriate release note label.
I hope my comments are helpful. Very informative!
While there is no common culprit when debugging, the DNS Proxy is the most
one-off component in this chain. The cascading caching scheme is also complex
in its behaviour. Isolating an issue to these components, or to the policy
Isolating an issue to these components
to
Isolating an issue in these components
The proxy uses REFUSED DNS responses to indicate a denied request. Some libc
implementations, notably musl which is common in alpine images, terminate the
whole DNS search in these cases. To work around it, denied responses can be
configured to be NXDOMAIN via ``--tofqdns-dns-reject-response-code``.
change
via
to
by setting the command line argument
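To make the reject-code mapping concrete, here is a minimal Go sketch of the idea. The rcode values come from RFC 1035; the flag value names (``refused``/``nameError``) are assumptions for illustration, not something confirmed in this thread.

```go
package main

import "fmt"

// DNS response codes from RFC 1035.
const (
	rcodeNXDomain = 3 // NXDOMAIN: "no such name"
	rcodeRefused  = 5 // REFUSED: query refused by policy
)

// rejectRcode maps a configured reject-response name (as something like
// --tofqdns-dns-reject-response-code would supply) to the rcode the proxy
// answers denied lookups with. The value names here are illustrative.
func rejectRcode(name string) (int, error) {
	switch name {
	case "refused":
		return rcodeRefused, nil // default: musl stops the whole DNS search
	case "nameError":
		return rcodeNXDomain, nil // musl continues to the next search suffix
	default:
		return 0, fmt.Errorf("unknown reject response code: %q", name)
	}
}

func main() {
	rc, err := rejectRcode("nameError")
	if err != nil {
		panic(err)
	}
	fmt.Println("denied lookups answered with rcode", rc) // prints 3
}
```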
*Monitor Events*
The DNS Proxy emits L7 monitor events. One for the request and one for the
Assuming I understand, I would reword to this:
The DNS Proxy emits multiple L7 monitor events. One for the request and one for the response (if allowed). Often the L7 DNS rules are paired with L3 toFQDNs rules and, thus, emits extra events.
.. Note::
Bu sure to run cilium monitor on the same node as the pod being debugged! |
Be
Otherwise, pods would attempt to make the outbound connection (the thing that
caused the DNS lookup) before the datapath is ready. Many stacks retry the
SYN in such cases but some return an error and some apps further crash as a
response. This delay is configurable via
by setting the command argument
pkg/fqdn/doc.go
Outdated
// - Users must explicitly allow `*.*.svc.cluster.local.` in k8s clusters.
// This is not automatic
// - L7 DNS rules are egress-only
// - The proxy emits L7 cilium-monitor events: one for the request and a
one for the request, an accept/reject event, and the final response.
pkg/fqdn/doc.go
Outdated
// - The proxy emits L7 cilium-monitor events: one for the request and a
// accept/reject event, and the final response
//
// Apart from allowing or denying DNS requests the DNS proxy is used to observe
Apart from allowing or denying DNS requests*,* the DNS proxy is used to observe
pkg/fqdn/doc.go
Outdated
// is because of ipcache limitations when mapping ip->identity. Endpoint
// identities can clobber the FQDN IP identity
// - Despite being tracked per-Endpoint. DNS lookup IPs are collected into a
// global cache. This is historical and can be changed
missing period.
oh, this was an experiment on my part. I usually have proper periods in lists but I think I read somewhere that one isn't supposed to... so I tried it out. I actually did try to look it up and I think I'll add them back in to satisfy my OCD.
// startup. These caches apply TTL expiration and limit the IP count per domain.
// - Keep a global cache to combine all this DNS information and send it to the
// policy system. This cache applies TTL but not per-domain limits.
// This causes a DNS lookup in one endpoint to leak to another!
🤔
There were many hand-wringey conversations about this. I think the conclusion was that:
1- Trusting the DNS data itself would use other tools (including controlling the DNS lookups and where they can go)
2- It's a little messy anyway because the DNS TTL is much shorter than a connection could be and we would expect to allow the whole connection. So letting it leak between endpoints is sort of like that (hand-wave).
Now that we have more wiring to the connection tracking we can clamp down on this quite a bit. Setting aside weird cases like one pod doing a service lookup and giving the IP to another (which I planned to support as a sibling to the DNS discovery), it's unlikely that leaking out of a pod will be needed.
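For readers following along, here is a toy Go sketch of the cascade described above (the type and field names are invented for illustration, not cilium's actual types): per-Endpoint caches apply both TTL and a per-domain IP limit, while the global cache that feeds the policy system applies TTL only, which is exactly why a lookup from one endpoint becomes visible to others.

```go
package main

import (
	"fmt"
	"time"
)

type entry struct {
	ips     []string
	expires time.Time
}

// dnsCache is a toy stand-in for the per-Endpoint and global caches.
// maxIPsPerDomain == 0 means "no per-domain limit" (the global cache case).
type dnsCache struct {
	maxIPsPerDomain int
	data            map[string]entry
}

func newDNSCache(maxIPs int) *dnsCache {
	return &dnsCache{maxIPsPerDomain: maxIPs, data: map[string]entry{}}
}

func (c *dnsCache) update(name string, ips []string, ttl time.Duration) {
	if c.maxIPsPerDomain > 0 && len(ips) > c.maxIPsPerDomain {
		ips = ips[:c.maxIPsPerDomain] // per-domain IP limit, per-Endpoint only
	}
	c.data[name] = entry{ips: ips, expires: time.Now().Add(ttl)}
}

func (c *dnsCache) lookup(name string) []string {
	e, ok := c.data[name]
	if !ok || time.Now().After(e.expires) {
		return nil // TTL expiration applies in both cache layers
	}
	return e.ips
}

func main() {
	perEndpoint := newDNSCache(2) // e.g. --tofqdns-endpoint-max-ip-per-hostname
	global := newDNSCache(0)      // no per-domain limit

	// A lookup observed on one endpoint...
	ips := []string{"104.198.14.52", "104.198.14.53"}
	perEndpoint.update("example.com.", ips, 30*time.Second)
	global.update("example.com.", ips, 30*time.Second)

	// ...is now visible to policy for *all* endpoints via the global cache.
	fmt.Println(global.lookup("example.com."))
}
```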
pkg/fqdn/doc.go
Outdated
// requests to a target with changing IPs (like S3) but another makes few
// requests that are long-lived. We need to ensure "fairness" where one does
// not starve the other. The limits in the per-Endpoint caches allow this, and
// the global cache acts are a collector across different Endpoints (without
acts as?
Thanks @nathanjsweet! I basically took all your comments in as-is. I reworded some bits.
- Endpoint regeneration is slow and the Policy Map has not been updated yet.
This can occur in cases where we have leaked IPs from the DNS cache (i.e. they
were never deleted correctly) or when there are legitimately many IPs. It can
also simply mean an overloaded node or even a deadlock within cilium.
no! an actual lock vs lock deadlock. We have had those because we have a bunch of concurrent stuff that synchronises with a combination of locks, channels and wait groups so it can get messy.
pkg/fqdn/doc.go
Outdated
// restrictions).
//
// Expiration of DNS data is handled by the dns-garbage-collector-job controller.
// Historically, the only expiraion was TTL based and the per-Endpoint and
"expiration"
Identities and Policy
~~~~~~~~~~~~~~~~~~~~~
Once a DNS response has been passed back through the proxy and is places in the |
"placed"
Nice! Noticed two typos though :-)
LGTM
Awesome docs.
I wonder on a couple of the points how well the docs will date (particularly things that reference history without a reference point to say when this information was true), but I don't really have any good suggestions how we can improve this. AFAIK all of this will stay true for the foreseeable future, as we're not actively planning any work at the moment on the core DNS handling engine.
Feel free to either address the comments below in this PR or follow up with the minor fixups.
While there is no common culprit when debugging, the DNS Proxy is the most
one-off component in this chain. The cascading caching scheme is also complex
I'm not quite sure what ``the most one-off component`` means.
I had "most likely to have bugs" in mind but I changed the wording to say that.
The proxy uses REFUSED DNS responses to indicate a denied request. Some libc
implementations, notably musl which is common in alpine images, terminate the
implementations, notably musl which is common in alpine images, terminate the
implementations, notably musl which is common in Alpine Linux images, terminate the
The proxy uses REFUSED DNS responses to indicate a denied request. Some libc
implementations, notably musl which is common in alpine images, terminate the
whole DNS search in these cases. To work around it, denied responses can be
What is the user-facing error in these cases? Could we provide an example of what this looks like?
It's app dependent, I'm afraid. The return from the lookup function is empty and the app would then error. I added a bit more explanation of what to look for.
mean that the policy does not select this endpoint or there is an issue with
the proxy redirection.
In the past, a bug occurred with more permissive L3 rules overriding the
proxy redirect, causing the proxy to never see the requests.
Note, users can check whether there are redirects using ``cilium status --all-redirects``.
In the past, a bug occurred with more permissive L3 rules overriding the
proxy redirect, causing the proxy to never see the requests.
- If the L7 DNS request is blocked, with an explicit denied message, then the
requests are not allowed by the proxy. This may be due to a typo or the
requests are not allowed by the proxy. This may be due to a typo or the
requests are not allowed by the proxy. This may be due to a typo in the network policy, or the
passing policy information to the proxy. There is no way to dump the rules in
the proxy, but a debug log is printed when a rule is added. Look for
``DNS Proxy updating matchNames in allowed list during UpdateRules``.
See pkg/proxy/dns.go
This last fragment doesn't seem fully formed. Maybe..
See pkg/proxy/dns.go
The pkg/proxy/dns.go file contains the DNS proxy implementation.
- A per-Endpoint ``DNSCache`` stores the lookups for this endpoint. It is
restored on cilium startup with the endpoint. Limits are applied here for
``--tofqdns-endpoint-max-ip-per-hostname`` and TTLs are tracked. The
``--tofqdns-min-ttl`` is not used here
``--tofqdns-min-ttl`` is not used here
``--tofqdns-min-ttl`` is not used here.
L3 ``toFQDNs`` rules are egress only, so we would expect to see an ``Egress``
entry with Security Identity ``16777217``. The L7 rule, used to redirect to the
DNS Proxy is also present with a populated ``PROXY PORT``. It has a 0
``IDENTITY`` as it is an L3 wildcard.
``IDENTITY`` as it is an L3 wildcard.
``IDENTITY`` as it is an L3 wildcard, i.e. the policy allows any peer on the specified port.
pkg/fqdn/doc.go
Outdated
// to-evict data internally and only do so on GC method calls from the
// controller.
// When DNS data is evicted from any per-Endpoint cache, for any reason, each
// IP is retained as a "zombie". These are tracked per IP for that endpoint
Not sure if we can work a little more clarity into here, something like:
These "zombies" represent IPs that were previously associated with a resolved DNS name, but the DNS name is no longer known (for example because of TTL expiry). However there may still be an active connection associated with the zombie IP.
(then the following part describes alive/dead zombies)
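To pin the lifecycle down, here is a small hedged Go sketch of the zombie idea (all names invented for illustration; this is not cilium's implementation): evicted IPs are parked per endpoint until a connection-tracking GC pass can say whether a live connection still uses them.

```go
package main

import "fmt"

type zombie struct {
	ip    string
	alive bool // set by the CT GC pass when a connection still references ip
}

// endpointZombies tracks evicted IPs for a single endpoint.
type endpointZombies map[string]*zombie

// evict parks an IP as a zombie instead of dropping it immediately.
func (z endpointZombies) evict(ip string) { z[ip] = &zombie{ip: ip} }

// ctGCPass marks zombies whose IPs appear in live CT entries, then reaps
// the dead ones; the surviving IPs could be re-inserted into the global
// cache for the same domain or a different one.
func (z endpointZombies) ctGCPass(liveIPs map[string]bool) (kept []string) {
	for ip, zb := range z {
		zb.alive = liveIPs[ip]
		if !zb.alive {
			delete(z, ip) // dead zombie: no connection uses this IP anymore
		} else {
			kept = append(kept, ip)
		}
	}
	return kept
}

func main() {
	z := endpointZombies{}
	z.evict("104.198.14.52")
	z.evict("104.198.14.53")
	// Only the first IP still has a live connection.
	fmt.Println("still alive:", z.ctGCPass(map[string]bool{"104.198.14.52": true}))
}
```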
pkg/fqdn/doc.go
Outdated
// connections to the same IPs, however, so these IPs may be inserted into the
// global cache for the same domain or a different one (or both).
// Note: The CT GC has a variable run period. This ranges from 30s to 12 hours
// and is shorter when more connection churn is observed.
minor nit for maintainability, I wonder for these constants like 1000, 30s, 12h whether we should refer to the variable that holds these configurations?
I considered it but I erred on being lazy. These would be the defaults like ``defaults.ToFQDNsMaxDeferredConnectionDeletes``. I wasn't sure how valuable that would be since the names are a tad opaque and finding them would be straightforward. The actual value seemed more important to me, since knowing that something may take 12 hours would change the debugging choices. I added the references in with the numbers.
I was thinking about someone editing the default value, then wanting to grep to find references, so having the variable there should help keep these docs up to date in such a case (hopefully).
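As an aside, a sketch of what a variable-period controller could look like in Go. Only the 30s and 12h bounds come from the text above; the halve/double rule is invented here purely for illustration, and the constant names are placeholders, not cilium's.

```go
package main

import (
	"fmt"
	"time"
)

// Bounds taken from the doc text above; in the tree they would live in
// named constants (the names here are placeholders).
const (
	minInterval = 30 * time.Second
	maxInterval = 12 * time.Hour
)

// nextGCInterval is an illustrative rule only: connection churn since the
// last pass pulls the next run earlier, little churn lets it back off.
func nextGCInterval(prev time.Duration, churn int) time.Duration {
	next := prev
	if churn > 0 {
		next = prev / 2
	} else {
		next = prev * 2
	}
	if next < minInterval {
		next = minInterval
	}
	if next > maxInterval {
		next = maxInterval
	}
	return next
}

func main() {
	iv := 5 * time.Minute
	iv = nextGCInterval(iv, 100) // busy node: run sooner
	fmt.Println(iv)              // 2m30s
}
```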
WARNING: You are looking at unreleased Cilium documentation.
Please use the official rendered version released here:
http://docs.cilium.io
http://docs.cilium.io
https://docs.cilium.io
We switched to the ``https`` version for this URL in all the documentation a few days ago.
The interactions of L3 toFQDNs and L7 DNS rules can be difficult to debug
around. Unlike many other policy rules, these are resolved at runtime with
unknown data. Pods may create large numbers of IPs in the cache or the IPs
returned are not compatible with our datapath implementation. Sometimes we also
returned are not compatible with our datapath implementation. Sometimes we also
returned may not be compatible with our datapath implementation. Sometimes we also
If an IP exists in the FQDN cache (check with ``cilium fqdn cache list``) then
``toFQDNs`` rules that select a domain name, either explicitly via
``matchName`` or via ``matchPattern`` should cause IPs for that domain to have
``matchName`` or via ``matchPattern`` should cause IPs for that domain to have
``matchName`` or via ``matchPattern``, should cause IPs for that domain to have
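As a hedged illustration of how a ``matchPattern`` could be lowered into something matchable, here is a short Go sketch. It is simplified relative to cilium's actual matchpattern handling, and the label character class is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// patternToRegexp lowers a toFQDNs matchPattern such as "*.cilium.io" into
// an anchored regexp. Simplified for illustration: "*" matches zero or more
// characters that are valid inside a DNS label.
func patternToRegexp(pattern string) (*regexp.Regexp, error) {
	escaped := regexp.QuoteMeta(strings.ToLower(pattern))
	// QuoteMeta escapes "*" to `\*`; swap it for a label character class.
	// The optional trailing `\.?` tolerates fully-qualified names.
	expr := "^" + strings.ReplaceAll(escaped, `\*`, `[-a-z0-9_]*`) + `\.?$`
	return regexp.Compile(expr)
}

func main() {
	re, err := patternToRegexp("*.cilium.io")
	if err != nil {
		panic(err)
	}
	fmt.Println(re.MatchString("docs.cilium.io")) // true
	fmt.Println(re.MatchString("cilium.io"))      // false: no subdomain label
}
```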
16777217 cidr:104.198.14.52/32
reserved:world
Note that CIDR identities are allocated locally on the node and have a high-bit set so they are often in the 16 Million range. |
Note that CIDR identities are allocated locally on the node and have a high-bit set so they are often in the 16 Million range.
Note that CIDR identities are allocated locally on the node and have a high-bit set so they are often in the 16-million range.
Also, break the long line for consistency?
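The sixteen-million observation follows directly from a flag bit, as this tiny Go sketch shows. Treat the exact constant as an assumption about how locally-allocated CIDR identities are numbered; it is consistent with the ``16777217`` example above (1<<24 = 16777216) but not confirmed in this thread.

```go
package main

import "fmt"

// Locally-allocated (CIDR) identities carry a high flag bit, which is why
// they show up in the sixteen-million range. Illustrative constant only.
const localIdentityFlag = 1 << 24 // 16777216

func isLocalIdentity(id uint32) bool { return id&localIdentityFlag != 0 }

func main() {
	fmt.Println(isLocalIdentity(16777217)) // true: CIDR identity from the example
	fmt.Println(isLocalIdentity(16))       // false: cluster-scoped identity
}
```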
for IPs that came from the DNS Proxy. Other CIDR identities would not be
included.

*Datapath Plumbing*
Does Sphinx render this as a paragraph title? I'd expect it to join it with the next line.
Nope! I had placeholders for the sections and I forgot to switch this one. Good catch!
16777217
&LabelSelector{MatchLabels:map[string]string{reserved.none: ,},MatchExpressions:[]LabelSelectorRequirement{},} 1
In this example 16777217 is used by two selectors, one with ``matchPattern: "*"`` and another empty one. This is because of the policy in use: |
Break the line?
Egress cidr:104.198.14.52/32 ANY NONE 477 6
reserved:world
Note that the labels for identities are resolved here. This can be skipped, or there may be cases where this doesn't occur: |
Break the line?
The fqdn package has grown considerably since the first version of the package doc. It is very out-of-date. It now includes a more expanded discussion of how data moves within cilium and where the fqdn components sit in that. Signed-off-by: Ray Bejjani <[email protected]>
Debugging FQDN-related issues comes up often, and when developing, the issues are more subtle. We don't have a section for developer debugging so we now do, and it includes a walkthrough of how to investigate the DNS proxy and fqdn DNS and policy components. It also touches briefly on how to interrogate the datapath Policy Map. Signed-off-by: Ray Bejjani <[email protected]>
I'm not too sure if all of this information should go in the package docs. We don't have a place for developer oriented debugging notes, for example, but a lot of what's in here is also stated in the cilium docs around these policy components.
Edit: reviewers, I added the complementary debug doc as well. You can skip it if you'd like but it's there so you have a sense for it when considering what information is where.