hubble: Populate traffic direction for trace and drop events #11062
Conversation
test-me-please
// We need to access the connection tracking result from the `Reason`
// field to invert the direction for reply packets. The `Reason` field
// is populated for TRACE_TO_{LXC,HOST,STACK,PROXY} events.
// TRACE_TO_PROXY events happen for both ingress and egress flows,
> TRACE_TO_PROXY events happen for both ingress and egress flows

So do `TraceToLxc`, `TraceToStack`, and `TraceToHost` flows, depending on whether the endpoint, remote peer, or host initiates the connection or is responding to it. `decodeIsReply()` below appears to rely on connection-tracking state to determine whether the traffic is a reply or not, so this seems to be more to do with whether CT runs on the particular path through the datapath or not.

I'd expect that for us to decide to send something to the proxy we'd need CT state, so we should be able to provide the same visibility, though it's possible that for historic reasons the trace messages don't always contain this information.

I guess this brings us to, basically: it looks like these observation points reliably provide the information today, and while other observation points might be able to provide the info, they may or may not be reliably providing it with the current implementation. If this PR is improving the ability to detect flow direction, then great. I wouldn't be surprised if we need to come back and revisit this for other observation points, though (for instance, we're missing stuff like `ToOverlay`, which is a common deployment setup but will handle traffic from local endpoints OR the local host, so there's no guarantee that we run the connection tracker).
Thanks a lot for the feedback @joestringer! Much appreciated.

> TRACE_TO_PROXY events happen for both ingress and egress flows
>
> So do `TraceToLxc`, `TraceToStack`, `TraceToHost` flows depending on whether the endpoint, remote peer or host initiates the connection or is responding to the connection.

You are, of course, absolutely correct. I somehow completely overlooked the obvious fact that I also have to look at the source/destination of the traffic to determine its direction. The code in this PR is not sufficient to determine the traffic direction 🤦♂️ And yes, I should also be able to support `ToProxy`, since it reliably contains the CT result. I'm putting this PR back into draft mode.

FWIW: the reason why `ToOverlay` is absent from the list is that it doesn't seem to actually populate the CT reason in the `send_trace_notify` call (second-to-last argument):
Lines 138 to 139 in ae5588c:

```c
send_trace_notify(ctx, TRACE_TO_OVERLAY, seclabel, 0, 0, ENCAP_IFINDEX,
		  0, monitor);
```
Okay, I have reworked the code to actually infer the traffic direction. It now compares the endpoint ID behind the actual source of the IP packet with the endpoint ID of the trace event to determine the traffic direction.

Note that I still consider the traffic direction information best-effort. I have added support for `DropNotify` as well, with the (in reality not always valid) assumption that we don't need connection tracking there, since for dropped packets there is (probably) no ongoing connection. However, since we don't have the connection tracking state available for drop events anyway, there is not much we can do otherwise.
> It is now comparing the endpoint ID behind the actual source of the IP packet with the endpoint ID of the trace event to determine the traffic direction.

I don't quite follow this part. A packet can be coming from or going to arbitrary peers, but that doesn't tell you whether the connection was initiated from there or somewhere else. That's why we bring in the connection tracker, which stores state that tells you, based upon a 5-tuple lookup, the originally-seen packet 5-tuple (and hence the direction when compared to the current tuple).

> However, since we don't have the connection tracking state available for drop events anyway, there is not much we can do otherwise.

On average, for drops we can roughly assume they're forward direction, because we're dropping them and it's hard to sustain a communication channel if one of the directions is dropped :) ...but yeah, in corner cases like a policy transition from allow -> deny, or other weird cases like a Cilium restart (where the proxy connection gets disrupted), drops can get tricky to classify with 100% confidence.
Looking at the implementation I think we're on the same page on these points.
Awesome, thanks for the review and input @joestringer!

You have made me think about some more cases that I hadn't considered. For example, I'm currently only comparing the source address of the packet. I wonder if I should check for `tn.Source == dstEP` as well (and assign traffic direction `unknown` if neither source nor destination is the endpoint at which the event occurred). I also have not considered whether NAT has an impact on correctness here. Maybe the proper way would be to just extend the trace points in the datapath with a direction flag, instead of trying to re-implement the logic here.

For now, I think I'm going to leave it as is. I might revisit the approach in the future, but I currently don't want to allocate too many cycles to a best-effort feature.
Signed-off-by: Sebastian Wicki <[email protected]>
Where possible, populate the `TrafficDirection` field from the trace and drop notifications. This is done by comparing the `Source` endpoint field of the trace/drop event with the source address of the captured IP packet.

For drop notifications, we assume there are no ongoing connections to which this packet belongs and just compare the source address. For trace notifications, however, we do assume that there might be an ongoing connection. Therefore, we rely on the connection tracking state to invert the traffic direction for reply packets. For trace observation points without connection tracking, we do not assign any traffic direction.

Signed-off-by: Sebastian Wicki <[email protected]>
force-pushed from f60216c to fa7f33e
test-me-please
Pulling in the datapath team for a quick review as well to confirm that.
/cc @michi-covalent