
fqdn: Update high-level package docs #11034

Merged 2 commits into cilium:master from raybejjani:fqdn-add-docs on Apr 27, 2020
Conversation

@raybejjani (Contributor) commented Apr 17, 2020:

I'm not too sure if all of this information should go in the package docs. We don't have a place for developer oriented debugging notes, for example, but a lot of what's in here is also stated in the cilium docs around these policy components.

Edit: reviewers, I added the complementary debug doc as well. You can skip it if you'd like but it's there so you have a sense for it when considering what information is where.

@maintainer-s-little-helper commented:

Please set the appropriate release note label.


@coveralls commented Apr 17, 2020:

Coverage decreased (-0.02%) to 44.779% when pulling 52909d2 on raybejjani:fqdn-add-docs into d07eec5 on cilium:master.

@raybejjani (Contributor, Author) commented:

test-docs-please

@raybejjani force-pushed the fqdn-add-docs branch 2 times, most recently from d6c7ebc to 8567323 on April 21, 2020 18:14
@raybejjani (Contributor, Author) commented:

test-docs-please

@nathanjsweet (Member) left a comment:

I hope my comments are helpful. Very informative!


While there is no common culprit when debugging, the DNS Proxy is the most
one-off component in this chain. The cascading caching scheme is also complex
in its behaviour. Isolating an issue to these components, or to the policy
Member:

Change "Isolating an issue to these components" to "Isolating an issue in these components".

The proxy uses REFUSED DNS responses to indicate a denied request. Some libc
implementations, notably musl which is common in alpine images, terminate the
whole DNS search in these cases. To work around it, denied responses can be
configured to be NXDOMAIN via ``--tofqdns-dns-reject-response-code``.
Member:

Change "via" to "by setting the command line argument" in:

"configured to be NXDOMAIN via ``--tofqdns-dns-reject-response-code``."
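For reference, a minimal sketch of what that flag selects, written against the github.com/miekg/dns library the proxy is built on (the helper is illustrative, not Cilium's actual implementation):

```go
package main

import (
	"fmt"

	"github.com/miekg/dns"
)

// rejectResponse sketches what --tofqdns-dns-reject-response-code controls:
// the rcode the proxy answers denied requests with. REFUSED is the default;
// NXDOMAIN (RcodeNameError) avoids aborting the whole search in musl-based
// images, as described above.
func rejectResponse(req *dns.Msg, useNXDomain bool) *dns.Msg {
	rcode := dns.RcodeRefused
	if useNXDomain {
		rcode = dns.RcodeNameError // NXDOMAIN
	}
	reply := new(dns.Msg)
	reply.SetRcode(req, rcode)
	return reply
}

func main() {
	req := new(dns.Msg)
	req.SetQuestion("blocked.example.com.", dns.TypeA)
	fmt.Println(dns.RcodeToString[rejectResponse(req, true).Rcode]) // NXDOMAIN
}
```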

*Monitor Events*
The DNS Proxy emits L7 monitor events. One for the request and one for the
Member:

Assuming I understand, I would reword to this:

The DNS Proxy emits multiple L7 monitor events: one for the request and one for the response (if allowed). Often the L7 DNS rules are paired with L3 toFQDNs rules and thus emit extra events.


.. Note::

Bu sure to run cilium monitor on the same node as the pod being debugged!
Member:

Be

Otherwise, pods would attempt to make the outbound connection (the thing that
caused the DNS lookup) before the datapath is ready. Many stacks retry the
SYN in such cases but some return an error and some apps further crash as a
response. This delay is configurable via
Member:

by setting the command argument

pkg/fqdn/doc.go Outdated
// - Users must explicitly allow `*.*.svc.cluster.local.` in k8s clusters.
// This is not automatic
// - L7 DNS rules are egress-only
// - The proxy emits L7 cilium-monitor events: one for the request and a
Member:

one for the request, an accept/reject event, and the final response.

pkg/fqdn/doc.go Outdated
// - The proxy emits L7 cilium-monitor events: one for the request and a
// accept/reject event, and the final response
//
// Apart from allowing or denying DNS requests the DNS proxy is used to observe
Member:

Add a comma: "Apart from allowing or denying DNS requests, the DNS proxy is used to observe"

pkg/fqdn/doc.go Outdated
// is because of ipcache limitations when mapping ip->identity. Endpoint
// identities can clobber the FQDN IP identity
// - Despite being tracked per-Endpoint. DNS lookup IPs are collected into a
// global cache. This is historical and can be changed
Member:

missing period.

Contributor Author:

oh, this was an experiment on my part. I usually have proper periods in lists but I think I read somewhere that one isn't supposed to... so I tried it out. I actually did try to look it up and I think I'll add them back in to satisfy my OCD.

// startup. These caches apply TTL expiration and limit the IP count per domain.
// - Keep a global cache to combine all this DNS information and send it to the
// policy system. This cache applies TTL but not per-domain limits.
// This causes a DNS lookup in one endpoint to leak to another!
Member:

🤔

Contributor Author:

There were many hand-wringey conversations about this. I think the conclusion was that:
1- Trusting the DNS data itself would use other tools (including controlling the DNS lookups and where they can go)
2- It's a little messy anyway because the DNS TTL is much shorter than a connection could be and we would expect to allow the whole connection. So letting it leak between endpoints is sort of like that (hand-wave).

Now that we have more wiring to the connection tracking we can clamp down on this quite a bit. Setting aside weird cases like one pod doing a service lookup and giving the IP to another (which I planned to support as a sibling to the DNS discovery), it's unlikely that leaking out of a pod will be needed.
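To make the cascade concrete, a toy sketch of the flow under discussion (all names invented; the real code lives in pkg/fqdn):

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// A rough model of the cascade: a response updates the looking-up endpoint's
// cache (TTL plus per-domain IP limits), then merges into the global cache
// (TTL only), which feeds the policy system. The shared global cache is what
// lets one endpoint's lookup "leak" to others.
type cache map[string][]net.IP

func (c cache) update(name string, ips []net.IP, _ time.Duration) {
	c[name] = append(c[name], ips...) // TTL/limit bookkeeping elided
}

var globalCache = cache{}

func onDNSResponse(epCache cache, name string, ips []net.IP, ttl time.Duration) {
	epCache.update(name, ips, ttl)     // per-Endpoint: TTL + IP count limits
	globalCache.update(name, ips, ttl) // global: TTL only, visible to policy
	// ...policy regeneration for toFQDNs selectors matching name follows...
}

func main() {
	ep := cache{}
	onDNSResponse(ep, "s3.amazonaws.com.", []net.IP{net.ParseIP("192.0.2.5")}, 30*time.Second)
	fmt.Println(globalCache["s3.amazonaws.com."]) // visible to all endpoints' policy
}
```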

pkg/fqdn/doc.go Outdated
// requests to a target with changing IPs (like S3) but another makes few
// requests that are long-lived. We need to ensure "fairness" where one does
// not starve the other. The limits in the per-Endpoint caches allow this, and
// the global cache acts are a collector across different Endpoints (without
Member:

acts as?

@raybejjani (Contributor, Author) left a comment:

Thanks @nathanjsweet! I basically took all your comments in as-is. I reworded some bits.

- Endpoint regeneration is slow and the Policy Map has not been updated yet.
This can occur in cases where we have leaked IPs from the DNS cache (i.e. they
were never deleted correctly) or when there are legitimately many IPs. It can
also simply mean an overloaded node or even a deadlock within cilium.
Contributor Author:

no! an actual lock vs lock deadlock. We have had those because we have a bunch of concurrent stuff that synchronises with a combination of locks, channels and wait groups so it can get messy.


@raybejjani force-pushed the fqdn-add-docs branch 2 times, most recently from cbf965a to 4148d8f on April 22, 2020 14:38
@raybejjani (Contributor, Author) commented:

test-docs-please

pkg/fqdn/doc.go Outdated
// restrictions).
//
// Expiration of DNS data is handled by the dns-garbage-collector-job controller.
// Historically, the only expiraion was TTL based and the per-Endpoint and
Member:

"expiration"

Identities and Policy
~~~~~~~~~~~~~~~~~~~~~

Once a DNS response has been passed back through the proxy and is places in the
Member:

"placed"

@jrajahalme (Member) left a comment:

Nice! Noticed two typos though :-)

@raybejjani (Contributor, Author) commented:

test-docs-please

@raybejjani raybejjani marked this pull request as ready for review April 23, 2020 12:03
@raybejjani raybejjani requested a review from a team April 23, 2020 12:03
@raybejjani raybejjani requested a review from a team as a code owner April 23, 2020 12:03
@nebril (Member) left a comment:

LGTM

@raybejjani raybejjani requested a review from nathanjsweet April 23, 2020 13:38
@raybejjani added the area/documentation label (Impacts the documentation, including textual changes, sphinx, or other doc generation code.) Apr 23, 2020
@raybejjani added the pending-review and release-note/misc (This PR makes changes that have no direct user impact.) labels and removed the dont-merge/debug-only label Apr 23, 2020
@joestringer (Member) left a comment:

Awesome docs.

I wonder on a couple of the points how well the docs will date (particularly things that reference history without a reference point to say when this information was true), but I don't really have any good suggestions for how we can improve this. AFAIK all of this will stay true for the foreseeable future, as we're not actively planning any work at the moment on the core DNS handling engine.

Feel free to either address the comments below in this PR or follow up with the minor fixups.

Comment on lines 26 to 27
While there is no common culprit when debugging, the DNS Proxy is the most
one-off component in this chain. The cascading caching scheme is also complex
Member:

I'm not quite sure what the most one-off component means.

Contributor Author:

I had "most likely to have bugs" in mind but I changed the wording to say that.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The proxy uses REFUSED DNS responses to indicate a denied request. Some libc
implementations, notably musl which is common in alpine images, terminate the
Member:

Suggested change:
- implementations, notably musl which is common in alpine images, terminate the
+ implementations, notably musl which is common in Alpine Linux images, terminate the


The proxy uses REFUSED DNS responses to indicate a denied request. Some libc
implementations, notably musl which is common in alpine images, terminate the
whole DNS search in these cases. To work around it, denied responses can be
Member:

What is the user-facing error in these cases? Could we provide an example of what this looks like?

Contributor Author:

It's app dependent, I'm afraid. The return from the lookup function is empty and the app would then error. I added a bit more explanation of what to look for.
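For illustration, what the app-side failure can look like in Go (the hostname is illustrative, and the exact error text varies by resolver and by the configured reject response code; musl-based apps behave differently, as noted above):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// When the proxy denies a lookup, the app's resolver gets an empty
	// result plus an error; how that surfaces is entirely app dependent.
	ips, err := net.LookupHost("blocked.example.com")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(ips)
}
```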

mean that the policy does not select this endpoint or there is an issue with
the proxy redirection.
In the past, a bug occurred with more permissive L3 rules overriding the
proxy redirect, causing the proxy to never see the requests.
Member:

Note, users can check whether there are redirects using `cilium status --all-redirects`.

In the past, a bug occurred with more permissive L3 rules overriding the
proxy redirect, causing the proxy to never see the requests.
- If the L7 DNS request is blocked, with an explicit denied message, then the
requests are not allowed by the proxy. This may be due to a typo or the
Member:

Suggested change:
- requests are not allowed by the proxy. This may be due to a typo or the
+ requests are not allowed by the proxy. This may be due to a typo in the network policy, or the

passing policy information to the proxy. There is no way to dump the rules in
the proxy, but a debug log is printed when a rule is added. Look for
``DNS Proxy updating matchNames in allowed list during UpdateRules``.
See pkg/proxy/dns.go
Member:

This last fragment doesn't seem fully formed. Maybe:

Suggested change:
- See pkg/proxy/dns.go
+ The pkg/proxy/dns.go file contains the DNS proxy implementation.

- A per-Endpoint ``DNSCache`` stores the lookups for this endpoint. It is
restored on cilium startup with the endpoint. Limits are applied here for
``--tofqdns-endpoint-max-ip-per-hostname`` and TTLs are tracked. The
``--tofqdns-min-ttl`` is not used here
Member:

Suggested change:
- ``--tofqdns-min-ttl`` is not used here
+ ``--tofqdns-min-ttl`` is not used here.
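A toy model of the semantics this hunk describes (invented names; the real type is `DNSCache` in pkg/fqdn):

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// lookup is one cached DNS answer with its expiry.
type lookup struct {
	ip      net.IP
	expires time.Time
}

type endpointDNSCache struct {
	maxIPsPerHost int // models --tofqdns-endpoint-max-ip-per-hostname
	entries       map[string][]lookup
}

func (c *endpointDNSCache) update(name string, ips []net.IP, ttl time.Duration) {
	now := time.Now()
	var live []lookup
	for _, l := range c.entries[name] {
		if l.expires.After(now) { // TTL expiration is tracked per answer
			live = append(live, l)
		}
	}
	for _, ip := range ips {
		live = append(live, lookup{ip: ip, expires: now.Add(ttl)})
	}
	if n := len(live) - c.maxIPsPerHost; n > 0 {
		live = live[n:] // enforce the per-hostname IP cap, evicting oldest
	}
	c.entries[name] = live
}

func main() {
	c := &endpointDNSCache{maxIPsPerHost: 2, entries: map[string][]lookup{}}
	c.update("example.com.", []net.IP{
		net.ParseIP("192.0.2.1"), net.ParseIP("192.0.2.2"), net.ParseIP("192.0.2.3"),
	}, time.Minute)
	fmt.Println(len(c.entries["example.com."])) // 2: oldest answer evicted
}
```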

L3 ``toFQDNs`` rules are egress only, so we would expect to see an ``Egress``
entry with Security Identity ``16777217``. The L7 rule, used to redirect to the
DNS Proxy is also present with a populated ``PROXY PORT``. It has a 0
``IDENTITY`` as it is an L3 wildcard.
Member:

Suggested change:
- ``IDENTITY`` as it is an L3 wildcard.
+ ``IDENTITY`` as it is an L3 wildcard, i.e. the policy allows any peer on the specified port.

pkg/fqdn/doc.go Outdated
// to-evict data internally and only do so on GC method calls from the
// controller.
// When DNS data is evicted from any per-Endpoint cache, for any reason, each
// IP is retained as a "zombie". These are tracked per IP for that endpoint
Member:

Not sure if we can work a little more clarity into here, something like:

These "zombies" represent IPs that were previously associated with a resolved DNS name, but the DNS name is no longer known (for example because of TTL expiry). However there may still be an active connection associated with the zombie IP.

(then the following part describes alive/dead zombies)
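For intuition, the alive/dead distinction could be modeled like this (a hypothetical shape, not the actual pkg/fqdn type):

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// zombie models the described behavior: an IP evicted from a per-Endpoint
// DNS cache whose name is no longer known, but which may still back a live
// connection.
type zombie struct {
	ip      net.IP
	names   []string  // domains the IP was last associated with
	aliveAt time.Time // last time the CT GC saw an active connection
}

// collectable reports whether the zombie is "dead": the connection-tracking
// GC has not marked it alive recently, so its IP can be released.
func (z *zombie) collectable(now time.Time, grace time.Duration) bool {
	return now.Sub(z.aliveAt) > grace
}

func main() {
	z := zombie{ip: net.ParseIP("192.0.2.10"), aliveAt: time.Now()}
	fmt.Println(z.collectable(time.Now(), time.Hour)) // false: seen alive recently
}
```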

pkg/fqdn/doc.go Outdated
// connections to the same IPs, however, so these IPs may be inserted into the
// global cache for the same domain or a different one (or both).
// Note: The CT GC has a variable run period. This ranges from 30s to 12 hours
// and is shorter when more connection churn is observed.
Member:

minor nit for maintainability, I wonder for these constants like 1000, 30s, 12h whether we should refer to the variable that holds these configurations?

Contributor Author:

I considered it but I erred on being lazy. These would be the defaults like defaults.ToFQDNsMaxDeferredConnectionDeletes. I wasn't sure how valuable that would be since the names are a tad opaque and finding them would be straightforward. The actual value seemed more important to me, since knowing that something may take 12 hours would change the debugging choices. I added the references in with the numbers.

Member:

I was thinking about someone editing the default value, then wanting to grep to find references, so having the variable there should help keep these docs up to date in such a case (hopefully).
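A sketch of the adaptive interval idea for intuition (the 30s/12h bounds come from the text above; the halving/doubling policy here is invented, and the real bounds live in defaults.* constants):

```go
package main

import (
	"fmt"
	"time"
)

// nextInterval sketches the "variable run period": shrink the CT GC interval
// under connection churn, grow it when quiet, clamped to the documented
// 30s..12h range.
func nextInterval(cur time.Duration, highChurn bool) time.Duration {
	const (
		minPeriod = 30 * time.Second
		maxPeriod = 12 * time.Hour
	)
	if highChurn {
		cur /= 2
	} else {
		cur *= 2
	}
	if cur < minPeriod {
		cur = minPeriod
	}
	if cur > maxPeriod {
		cur = maxPeriod
	}
	return cur
}

func main() {
	fmt.Println(nextInterval(time.Hour, true)) // 30m0s
}
```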


WARNING: You are looking at unreleased Cilium documentation.
Please use the official rendered version released here:
http://docs.cilium.io
Member:

Suggested change:
- http://docs.cilium.io
+ https://docs.cilium.io

We switched to the https version for this URL in all the documentation a few days ago.

The interactions of L3 toFQDNs and L7 DNS rules can be difficult to debug
around. Unlike many other policy rules, these are resolved at runtime with
unknown data. Pods may create large numbers of IPs in the cache or the IPs
returned are not compatible with our datapath implementation. Sometimes we also
Member:

Suggested change:
- returned are not compatible with our datapath implementation. Sometimes we also
+ returned may not be compatible with our datapath implementation. Sometimes we also


If an IP exists in the FQDN cache (check with ``cilium fqdn cache list``) then
``toFQDNs`` rules that select a domain name, either explicitly via
``matchName`` or via ``matchPattern`` should cause IPs for that domain to have
Member:

Suggested change:
- ``matchName`` or via ``matchPattern`` should cause IPs for that domain to have
+ ``matchName`` or via ``matchPattern``, should cause IPs for that domain to have

16777217 cidr:104.198.14.52/32
reserved:world

Note that CIDR identities are allocated locally on the node and have a high-bit set so they are often in the 16 Million range.
Member:

Suggested change:
- Note that CIDR identities are allocated locally on the node and have a high-bit set so they are often in the 16 Million range.
+ Note that CIDR identities are allocated locally on the node and have a high-bit set so they are often in the 16-million range.

Also, break the long line for consistency?
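The 16-million range follows from the local-identity scope bit; a quick check of the arithmetic (assuming bit 24, which matches the 16777217 in the example above):

```go
package main

import "fmt"

func main() {
	// Locally-allocated (e.g. CIDR) identities set a high scope bit, which
	// is why they land in the 16-million range.
	const localIdentityFlag = 1 << 24
	fmt.Println(localIdentityFlag)     // 16777216
	fmt.Println(localIdentityFlag | 1) // 16777217: first local identity
}
```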

for IPs that came from the DNS Proxy. Other CIDR identities would not be
included.

*Datapath Plumbing*
Member:

Does Sphinx render this as a paragraph title? I'd expect it to join it with the next line.

Contributor Author:

Nope! I had placeholders for the sections and I forgot to switch this one. Good catch!

16777217
&LabelSelector{MatchLabels:map[string]string{reserved.none: ,},MatchExpressions:[]LabelSelectorRequirement{},} 1

In this example 16777217 is used by two selectors, one with ``matchPattern: "*"`` and another empty one. This is because of the policy in use:
Member:

Break the line?

Egress cidr:104.198.14.52/32 ANY NONE 477 6
reserved:world

Note that the labels for identities are resolved here. This can be skipped, or there may be cases where this doesn't occur:
Member:

Break the line?

The fqdn package has grown considerably since the first version of the
package doc. It is very out-of-date. It now includes a more expanded
discussion of how data moves within cilium and where the fqdn components
sit in that.

Signed-off-by: Ray Bejjani <[email protected]>
Debugging FQDN-related issues comes up often, and when developing the
issues are more subtle. We don't have a section for developer debugging
so we now do, and it includes a walkthrough of how to investigate the
DNS proxy and fqdn DNS and policy components. It also touches briefly on
how to interrogate the datapath Policy Map.

Signed-off-by: Ray Bejjani <[email protected]>
@raybejjani (Contributor, Author) commented:

test-docs-please

@raybejjani raybejjani merged commit 6fcfd39 into cilium:master Apr 27, 2020
@raybejjani raybejjani deleted the fqdn-add-docs branch April 27, 2020 13:13
Labels
area/documentation (Impacts the documentation, including textual changes, sphinx, or other doc generation code.) · release-note/misc (This PR makes changes that have no direct user impact.)
7 participants