Fix Traceflow implementation for external IPs and gateway IP #1884
    ob.TunnelDstIP = tunnelDstIP
    ob.Action = opsv1alpha1.Forwarded
} else if ipDst == gatewayIP.String() && outputPort == config.HostGatewayOFPort {
    ob.Action = opsv1alpha1.Delivered
} else if c.networkConfig.TrafficEncapMode.SupportsEncap() && outputPort == config.HostGatewayOFPort {
Can we check whether ipDst is in the Pod CIDR here? Currently we support tracing inter-Node packets without encapsulation, but this PR changes that behavior.
Otherwise the e2e test will fail: https://github.com/vmware-tanzu/antrea/pull/1884/checks?check_run_id=1924336861
Thanks, I can see that I forgot an if case here. Added it back.
Codecov Report
@@ Coverage Diff @@
## main #1884 +/- ##
=======================================
Coverage ? 53.48%
=======================================
Files ? 200
Lines ? 17272
Branches ? 0
=======================================
Hits ? 9238
Misses ? 6873
Partials ? 1161
/test-all
The new behavior sounds good to me.
pkg/agent/openflow/pipeline.go
Outdated
    Action().SendToController(uint8(PacketInReasonTF)).
    Cookie(c.cookieAllocator.Request(category).Raw()).
    Done())
// Only SendToController if output port is local gateway.
If we still output when kube-proxy is used, could we still trace the Service traffic?
In theory, we do not support tracing Service traffic when AntreaProxy is disabled. This is documented (https://github.com/vmware-tanzu/antrea/blob/main/docs/traceflow-guide.md#prerequisites), and the Agent logs the following error: "Error: using Service destination requires AntreaProxy feature enabled". However, this restriction dates back to before we started using DSCP to carry the Traceflow dataplane tag. Now that we do, it may be possible to lift the restriction, since iptables will preserve the DSCP field. I commented out the error in the Agent code, and this is what I get for a Traceflow request from a Pod to a Service when AntreaProxy is disabled:
name: default-toolbox-jvmv5-to-kube-system-kube-dns-xs25d865
phase: Succeeded
source: default/toolbox-jvmv5
destination: kube-system/kube-dns
results:
- node: k8s-node-worker-1
  timestamp: 1613785297
  observations:
  - component: Forwarding
    componentInfo: Classification
    action: Received
  - component: Forwarding
    componentInfo: Output
    action: Delivered
- node: k8s-node-control-plane
  timestamp: 1613785297
  observations:
  - component: SpoofGuard
    action: Forwarded
  - component: Forwarding
    componentInfo: Output
    action: Forwarded
    tunnelDstIP: 192.168.77.101
- node: k8s-node-control-plane
  timestamp: 1613785297
  observations:
  - component: SpoofGuard
    action: Forwarded
  - component: Forwarding
    componentInfo: Output
    action: Forwarded
This is definitely not as straightforward as the one when AntreaProxy is enabled (although it is accurate):
name: default-toolbox-jvmv5-to-kube-system-kube-dns-m489kxwk
phase: Succeeded
source: default/toolbox-jvmv5
destination: kube-system/kube-dns
results:
- node: k8s-node-control-plane
  timestamp: 1613785535
  observations:
  - component: SpoofGuard
    action: Forwarded
  - component: LB
    action: Forwarded
    pod: kube-system/coredns-74ff55c5b-vc7z7
    translatedDstIP: 10.10.1.2
  - component: Forwarding
    componentInfo: Output
    action: Forwarded
    tunnelDstIP: 192.168.77.101
- node: k8s-node-worker-1
  timestamp: 1613785535
  observations:
  - component: Forwarding
    componentInfo: Classification
    action: Received
  - component: Forwarding
    componentInfo: Output
    action: Delivered
I am personally not sure it is worth lifting the restriction, but I can open a PR to do it and update the documentation. I don't know if there is any edge case I am missing. @jianjuns what do you think?
You are right. Probably not worthwhile, given AntreaProxy is enabled by default now.
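As background for the DSCP point discussed above: DSCP occupies the upper six bits of the IPv4 ToS byte, with the lower two bits used for ECN. The sketch below shows how a small tag could be packed into and read back from that field. The helper names and the assumption that a tag may use the full 6-bit range are illustrative only and are not taken from the Antrea code.

```go
package main

import "fmt"

const (
	dscpShift = 2    // DSCP is the upper 6 bits of the ToS byte.
	ecnMask   = 0x03 // The lower 2 bits carry ECN and are left untouched.
	maxTag    = 63   // Largest value that fits in 6 bits (assumed tag range).
)

// tosWithTag returns the ToS byte with the DSCP bits set to tag and the
// ECN bits preserved. Hypothetical helper for illustration.
func tosWithTag(tos, tag uint8) (uint8, error) {
	if tag > maxTag {
		return 0, fmt.Errorf("tag %d does not fit in the 6-bit DSCP field", tag)
	}
	return tag<<dscpShift | tos&ecnMask, nil
}

// tagFromTos extracts the DSCP bits from a ToS byte.
func tagFromTos(tos uint8) uint8 {
	return tos >> dscpShift
}

func main() {
	tos, err := tosWithTag(0x01, 5)
	if err != nil {
		panic(err)
	}
	fmt.Printf("tos=%#02x tag=%d\n", tos, tagFromTos(tos))
}
```

Because the tag travels in the IP header itself, any hop that leaves the DSCP bits alone (as iptables is said to do above) keeps the packet identifiable as a Traceflow packet.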
LGTM, just a typo
PR antrea-io#1883 fixes a panic in libOpenflow triggered when OVS receives reply traffic for a Traceflow request with a valid dataplane tag as the ToS field and the Linux packet mark set. However, it should be noted that reply packets for Traceflow requests are generally meaningless and should be ignored. In encapMode, the Traceflow implementation should also not time out when a Traceflow request leaves the overlay: as soon as the request is forwarded through the gateway port, we should consider the request complete, and ignore any potential reply packet. So we include the following changes:
* add a new "ForwardedOutOfOverlay" Traceflow action when a request is forwarded out of the network managed by Antrea in encapMode. The Controller can then mark the request as "succeeded". In theory, something similar could be done for other traffic modes, but it would be much more complex.
* add support for Traceflow requests for which the destination is the gateway's IP, by reporting a "Delivered" action.
* add an OVS flow in charge of dropping reply traffic for Traceflow requests (using the conntrack state to match this traffic), thus ensuring it is not sent to the Agent. In our testing, this is especially useful when the destination IP is the local Node's IP, as the IP ToS field seems to be preserved in that case, causing the reply packet to be treated as a Traceflow request.
We add end-to-end tests for both cases (external destination IP and Antrea gateway destination IP).
See antrea-io#1878
Co-authored-by: Quan Tian <[email protected]>
a778c0a to 95f43fd
LGTM
/test-all
/test-conformance
PR #1883 fixes a panic in libOpenflow triggered when OVS receives reply
traffic for a Traceflow request with a valid dataplane tag as the ToS
field and the Linux packet mark set. However, it should be noted that
reply packets for Traceflow requests are generally meaningless and
should be ignored. In encapMode, the Traceflow implementation should
also not time out when a Traceflow request leaves the overlay: as soon as
the request is forwarded through the gateway port, we should consider
the request complete, and ignore any potential reply packet. So we
include the following changes:
* add a new "ForwardedOutOfOverlay" Traceflow action when a request is
  forwarded out of the network managed by Antrea in encapMode. The
  Controller can then mark the request as "succeeded". In theory,
  something similar could be done for other traffic modes, but it would
  be much more complex.
* add support for Traceflow requests for which the destination is the
  gateway's IP, by reporting a "Delivered" action.
* add an OVS flow in charge of dropping reply traffic for Traceflow
  requests (using the conntrack state to match this traffic), thus
  ensuring it is not sent to the Agent. In our testing, this is
  especially useful when the destination IP is the local Node's IP, as
  the IP ToS field seems to be preserved in that case, causing the reply
  packet to be treated as a Traceflow request.
We add end-to-end tests for both cases (external destination IP and
Antrea gateway destination IP).
See #1878
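
To illustrate the completion rule described in this message, here is a simplified sketch of how a controller could decide that a request is finished instead of waiting for a timeout. The types and the `isComplete` helper are hypothetical and are not the Antrea Controller's code; only the action names that appear in this PR are used, and other terminal outcomes are deliberately left out.

```go
package main

import "fmt"

// Observation and NodeResult loosely mirror the fields shown in the
// Traceflow output earlier in this conversation (illustrative types only).
type Observation struct {
	Component string
	Action    string
}

type NodeResult struct {
	Node         string
	Observations []Observation
}

// isComplete reports whether any Node reported an action that ends the
// trace: the packet was delivered, or it left the overlay through the
// gateway port. This is a simplification made for this sketch.
func isComplete(results []NodeResult) bool {
	for _, r := range results {
		for _, ob := range r.Observations {
			if ob.Action == "Delivered" || ob.Action == "ForwardedOutOfOverlay" {
				return true
			}
		}
	}
	return false
}

func main() {
	results := []NodeResult{{
		Node: "k8s-node-control-plane",
		Observations: []Observation{
			{Component: "SpoofGuard", Action: "Forwarded"},
			{Component: "Forwarding", Action: "ForwardedOutOfOverlay"},
		},
	}}
	fmt.Println("complete:", isComplete(results))
}
```

Under this rule, a request to an external IP completes as soon as the source Node reports that the packet left the overlay, so any reply packet can be ignored.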